.. -*- rest -*-
.. vim:syntax=rest

========================================
 Datarray: Numpy arrays with named axes
========================================

Scientists, engineers, mathematicians and statisticians don't just work with
matrices; they often work with structured data, like the rows and columns of a
table. Numpy has no built-in support for labeling data this way, and several
projects are trying to fill that gap.  Datarray is one of them.

.. warning::

   This code is currently experimental, and its API *will* change!  It is meant
   to be a place for the community to understand and develop the right
   semantics and have a prototype implementation that will ultimately
   (hopefully) be folded back into Numpy.

Datarray provides a subclass of Numpy's ndarray that supports:

- labeling individual dimensions (axes) with meaningful descriptions
- labeled 'ticks' along each axis
- indexing and slicing by named axis
- indexing on any axis with the tick labels instead of only integers
- reduction operations (like .sum, .mean, etc.) that accept named axis
  arguments instead of only integer indices
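
The features listed above can be illustrated with a small stand-alone sketch.
It does not use datarray itself: ``LabeledArray``, its ``select`` method and
the sample data are hypothetical names invented for this example, and the class
is a plain wrapper around an ndarray rather than a subclass.  It is meant only
to show the idea of named axes and tick labels, not datarray's actual API.

```python
import numpy as np


class LabeledArray:
    """Hypothetical wrapper pairing an ndarray with axis names and ticks."""

    def __init__(self, data, axes):
        # axes: one (name, ticks) pair per dimension of the data.
        self.data = np.asarray(data)
        if len(axes) != self.data.ndim:
            raise ValueError("need one (name, ticks) pair per dimension")
        self.names = [name for name, _ in axes]
        self.ticks = {name: list(ticks) for name, ticks in axes}

    def _axis_index(self, name):
        # Translate an axis name into its integer position.
        return self.names.index(name)

    def _without(self, name):
        # Axis descriptions that remain once the named axis is dropped.
        return [(n, self.ticks[n]) for n in self.names if n != name]

    def select(self, name, tick):
        """Index one axis by tick label; the selected axis is dropped."""
        axis = self._axis_index(name)
        position = self.ticks[name].index(tick)
        picked = np.take(self.data, position, axis=axis)
        return LabeledArray(picked, self._without(name))

    def sum(self, axis):
        """Reduce over an axis given by name instead of by integer."""
        index = self._axis_index(axis)
        return LabeledArray(self.data.sum(axis=index), self._without(axis))


# Made-up sample data: two cities observed on three days.
temps = LabeledArray(
    np.arange(6).reshape(2, 3),
    [("city", ["Oslo", "Lima"]), ("day", ["mon", "tue", "wed"])],
)
lima = temps.select("city", "Lima")   # index by tick label
per_city = temps.sum(axis="day")      # reduce over a named axis
```

Here ``lima`` keeps only the ``day`` axis and ``per_city`` keeps only the
``city`` axis; the underlying ndarray stays reachable through ``.data``.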

Prior Art
=========

At present, there is no accepted standard solution for dealing with labeled,
tabular data such as this. However, the following list of ad-hoc and
proposal-level implementations shows that there is *definitely* a demand for
one.  Some examples, in no particular order:

* `Tabular <http://bitbucket.org/elaine/tabular/src>`_ implements a
  spreadsheet-inspired datatype, with rows/columns, csv/etc. IO, and fancy
  tabular operations.

* `scikits.statsmodels <http://scikits.appspot.com/statsmodels>`_ sounded as
  though it had some features we'd like to eventually see implemented on top of
  something such as datarray, and `Skipper <http://scipystats.blogspot.com/>`_
  seemed pretty interested in something like this himself.

* `scikits.timeseries <http://scikits.appspot.com/timeseries>`_ also has a
  time-series-specific object that's somewhat reminiscent of labeled arrays.

* `pandas <http://pandas.sourceforge.net/>`_ is based around a number of
  DataFrame-esque datatypes.

* `pydataframe <http://code.google.com/p/pydataframe/>`_ is supposed to be a
  clone of R's data.frame.

* `larry <http://github.com/kwgoodman/la>`_, or "labeled array," often comes up
  in discussions alongside pandas.

* `divisi <http://github.com/commonsense/divisi2>`_ includes labeled sparse and
  dense arrays.

* `pymvpa <https://github.com/PyMVPA/PyMVPA>`_ provides a Dataset class that
  encapsulates the data together with sets of attributes, matching in length,
  for the first two dimensions (samples and features).  Dataset is not a
  subclass of the numpy array, so that it can hold other data structures
  (e.g. sparse matrices).

* `ptsa <http://git.debian.org/?p=pkg-exppsy/ptsa.git>`_ subclasses
  ndarray to provide attributes per dimension, aiming to ease
  slicing/indexing given the values of the axis attributes.

Project Goals
=============

1. Get something akin to this in the numpy core.

2. Stick to basic functionality such that projects like scikits.statsmodels and
   pandas can use it as a base datatype.

3. Make an interface that allows for simple, pretty manipulation that doesn't
   introduce confusion.

4. Oh, and make sure that the base numpy array is still accessible.


Code
====

You can find our sources and single-click downloads:

* `Main repository`_ on Github.
* Documentation_ for all releases and the current development tree.
* Download the `current trunk`_ as a tar/zip file.
* Downloads of all `available releases`_.

.. _main repository: http://github.com/fperez/datarray
.. _Documentation: http://fperez.github.com/datarray-doc
.. _current trunk: http://github.com/fperez/datarray/archives/master
.. _available releases: http://github.com/fperez/datarray/downloads