Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
tree: b7af9f12b6
Fetching contributors…

Cannot retrieve contributors at this time

99 lines (71 sloc) 3.814 kb
.. -*- rest -*-
.. vim:syntax=rest
========================================
Datarray: Numpy arrays with named axes
========================================
Scientists, engineers, mathematicians and statisticians don't just work with
matrices; they often work with structured data, just like you'd find in a
table. However, functionality for this is missing from Numpy, and there are
efforts to create something to fill the void. This is one of those efforts.
.. warning::
This code is currently experimental, and its API *will* change! It is meant
to be a place for the community to understand and develop the right
semantics and have a prototype implementation that will ultimately
(hopefully) be folded back into Numpy.
Datarray provides a subclass of Numpy ndarrays that support:
- individual dimensions (axes) being labeled with meaningful descriptions
- labeled 'ticks' along each axis
- indexing and slicing by named axis
- indexing on any axis with the tick labels instead of only integers
- reduction operations (like .sum, .mean, etc) support named axis arguments
instead of only integer indices.
Prior Art
=========
At present, there is no accepted standard solution to dealing with tabular data
such as this. However, based on the following list of ad-hoc and proposal-level
implementations of something such as this, there is *definitely* a demand for
it. For examples, in no particular order:
* [Tabular](http://bitbucket.org/elaine/tabular/src) implements a
spreadsheet-inspired datatype, with rows/columns, csv/etc. IO, and fancy
tabular operations.
* [scikits.statsmodels](http://scikits.appspot.com/statsmodels) sounded as
though it had some features we'd like to eventually see implemented on top of
something such as datarray, and [Skipper](http://scipystats.blogspot.com/)
seemed pretty interested in something like this himself.
* [scikits.timeseries](http://scikits.appspot.com/timeseries) also has a
time-series-specific object that's somewhat reminiscent of labeled arrays.
* [pandas](http://pandas.sourceforge.net/) is based around a number of
DataFrame-esque datatypes.
* [pydataframe](http://code.google.com/p/pydataframe/) is supposed to be a
clone of R's data.frame.
* [larry](http://github.com/kwgoodman/la), or "labeled array," often comes up
in discussions alongside pandas.
* [divisi](http://github.com/commonsense/divisi2) includes labeled sparse and
dense arrays.
* [pymvpa](https://github.com/PyMVPA/PyMVPA) provides Dataset class
encapsulating the data together with matching in length sets of
attributes for the first two (samples and features) dimensions.
Dataset is not a subclass of numpy array to allow other data
structures (e.g. sparse matrices).
* [ptsa](http://git.debian.org/?p=pkg-exppsy/ptsa.git) subclasses
ndarray to provide attributes per dimensions aiming to ease
slicing/indexing given the values of the axis attributes
Project Goals
=============
1. Get something akin to this in the numpy core.
2. Stick to basic functionality such that projects like scikits.statsmodels and
pandas can use it as a base datatype.
3. Make an interface that allows for simple, pretty manipulation that doesn't
introduce confusion.
4. Oh, and make sure that the base numpy array is still accessible.
Code
====
You can find our sources and single-click downloads:
* `Main repository`_ on Github.
* Documentation_ for all releases and current development tree.
* Download as a tar/zip file the `current trunk`_.
* Downloads of all `available releases`_.
.. _main repository: http://github.com/fperez/datarray
.. _Documentation: http://fperez.github.com/datarray-doc
.. _current trunk: http://github.com/fperez/datarray/archives/master
.. _available releases: http://github.com/fperez/datarray/downloads
Jump to Line
Something went wrong with that request. Please try again.