
Prototyping numpy arrays with named axes for data management. Docs are available at the URL below.


Datarray: Numpy arrays with named axes

Scientists, engineers, mathematicians and statisticians don't just work with matrices; they often work with structured data, like what you'd find in a table. Numpy currently lacks functionality for this, and several efforts are under way to fill the gap. This is one of them.

Warning

This code is currently experimental, and its API will change! It is meant to be a place for the community to understand and develop the right semantics, and to provide a prototype implementation that will ultimately (hopefully) be folded back into Numpy.

Datarray provides a subclass of Numpy ndarrays that supports:

  • individual dimensions (axes) being labeled with meaningful descriptions
  • labeled 'ticks' along each axis
  • indexing and slicing by named axis
  • indexing on any axis with the tick labels instead of only integers
  • reduction operations (like .sum, .mean, etc.) that accept named axis arguments instead of only integer indices.
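To make the feature list concrete, here is a minimal sketch of how an ndarray subclass can carry axis names and tick labels, reduce over an axis by name, and index by tick label. It is not datarray's API: the class `NamedArray`, its `axes` and `ticks` attributes and the `at()` helper are hypothetical names chosen for illustration. To keep it short, derived arrays (slices, copies, arithmetic results) simply start without labels.

```python
import numpy as np


class NamedArray(np.ndarray):
    """Hypothetical ndarray subclass with named axes and tick labels.

    Illustration only: this is not datarray's API.
    """

    def __new__(cls, data, axes, ticks=None):
        obj = np.asarray(data).view(cls)
        axes = tuple(axes)
        if len(axes) != obj.ndim or len(set(axes)) != len(axes):
            raise ValueError("need one unique name per dimension")
        obj.axes = axes
        # ticks maps an axis name to one label per position along that axis
        obj.ticks = {name: list(labels) for name, labels in (ticks or {}).items()}
        for name, labels in obj.ticks.items():
            if len(labels) != obj.shape[axes.index(name)]:
                raise ValueError(f"wrong number of ticks for axis {name!r}")
        return obj

    def __array_finalize__(self, obj):
        # Slices, copies and ufunc results start unlabeled in this sketch;
        # only the methods below propagate labels deliberately.
        self.axes = None
        self.ticks = {}

    def _axis(self, axis):
        """Translate an axis name to its integer position; ints and None pass through."""
        if isinstance(axis, str):
            if self.axes is None or axis not in self.axes:
                raise ValueError(f"no axis named {axis!r}")
            return self.axes.index(axis)
        return axis

    def _relabel(self, result, dropped):
        """Wrap a result that lost axis `dropped`, keeping the remaining labels."""
        kept = [name for name in self.axes if name != dropped]
        if not kept:
            return result  # a scalar has nothing left to label
        ticks = {n: t for n, t in self.ticks.items() if n != dropped}
        return NamedArray(result, kept, ticks)

    def _reduce(self, func, axis, **kwargs):
        i = self._axis(axis)
        result = func(np.asarray(self), axis=i, **kwargs)
        if self.axes is None or not isinstance(i, int) or kwargs.get("keepdims"):
            return result  # full, multi-axis or keepdims reduction: plain NumPy output
        return self._relabel(result, self.axes[i])

    def sum(self, axis=None, **kwargs):
        return self._reduce(np.sum, axis, **kwargs)

    def mean(self, axis=None, **kwargs):
        return self._reduce(np.mean, axis, **kwargs)

    def at(self, axis, tick):
        """Select the position labeled `tick` along `axis` (given by name or index)."""
        if self.axes is None:
            raise ValueError("array has no axis labels")
        i = self._axis(axis)
        name = self.axes[i]
        position = self.ticks[name].index(tick)
        return self._relabel(np.take(np.asarray(self), position, axis=i), name)


sales = NamedArray(
    [[1, 2, 3], [4, 5, 6]],
    axes=("store", "month"),
    ticks={"store": ["north", "south"], "month": ["jan", "feb", "mar"]},
)
by_store = sales.sum(axis="month")  # labels: ('store',), values [6, 15]
feb = sales.at("month", "feb")      # labels: ('store',), values [2, 5]
```

Returning plain NumPy output whenever the labels can't be tracked reliably keeps this sketch from ever attaching stale names; the hard part of a real implementation is propagating labels correctly through slicing, transposes and broadcasting.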

Prior Art

At present, there is no accepted standard solution for dealing with tabular data like this. However, the following list of ad-hoc and proposal-level implementations shows that there is clearly a demand for one. For example, in no particular order:

Project Goals

  1. Get something akin to this in the numpy core.
  2. Stick to basic functionality such that projects like scikits.statsmodels and pandas can use it as a base datatype.
  3. Make an interface that allows for simple, pretty manipulation that doesn't introduce confusion.
  4. Oh, and make sure that the base numpy array is still accessible.
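The last goal, keeping the base numpy array accessible, comes almost for free with an ndarray subclass, because NumPy can strip the subclass off without copying the data. A short sketch, using a hypothetical `Labeled` class as a stand-in rather than anything from datarray:

```python
import numpy as np


class Labeled(np.ndarray):
    """Hypothetical labeled ndarray subclass (a stand-in, not datarray's class)."""

    def __new__(cls, data, axes):
        obj = np.asarray(data).view(cls)
        obj.axes = tuple(axes)
        return obj

    def __array_finalize__(self, obj):
        # Derived arrays start unlabeled in this sketch.
        self.axes = None


x = Labeled(np.arange(6).reshape(2, 3), axes=("row", "col"))

base = x.view(np.ndarray)  # plain ndarray view of the same buffer, no copy
also_base = np.asarray(x)  # asarray also strips the subclass without copying

base[0, 0] = 100           # writing through the base view changes x[0, 0] too
```

Because the labels live only on the subclass, code that needs raw NumPy semantics can always drop down to the underlying array this way, while the labeled object keeps its names.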

Code

You can find our sources and single-click downloads:
