Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Meet larry, the labeled numpy array
Python C Shell
Tag: v0.4.0

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
doc
la
sandbox
.gitignore
LICENSE
MANIFEST.in
README.rst
RELEASE.rst
setup.py

README.rst

Who's larry?

The main class of the la package is a labeled array, larry. A larry consists of data and labels. The data is stored as a NumPy array and the labels as a list of lists (one list per dimension).

Here's larry in schematic form:

             date1    date2    date3
    'AAPL'   209.19   207.87   210.11
y = 'IBM'    129.03   130.39   130.55
    'DELL'    14.82    15.11    14.94

The larry above is stored internally as a Numpy array and a list of lists:

y.label = [['AAPL', 'IBM', 'DELL'], [date1, date2, date3]]
y.x = np.array([[209.19, 207.87, 210.11],
                [129.03, 130.39, 130.55],
                [ 14.82,  15.11,  14.94]])

A larry can have any number of dimensions except zero. Here, for example, is one way to create a one-dimensional larry:

>>> import la
>>> y = la.larry([1, 2, 3])

In the statement above the list is converted to a Numpy array and the labels default to range(n), where n in this case is 3.

larry has built-in methods such as movingsum, ranking, merge, shuffle, zscore, demean, lag as well as typical Numpy methods like sum, max, std, sign, clip. NaNs are treated as missing data.

Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys.

You can archive larrys in HDF5 format using save and load or using a dictionary-like interface:

>>> io = la.IO('/tmp/dataset.hdf5')
>>> io['y'] = y   # <--- save
>>> z = io['y']   # <--- load
>>> del io['y']   # <--- delete from archive

For the most part larry acts like a Numpy array. And, whenever you want, you have direct access to the Numpy array that holds your data. For example if you have a function, myfunc, that works on Numpy arrays and doesn't change the shape or ordering of the array, then you can use it on a larry, y, like this:

y.x = myfunc(y.x)

larry adds the convenience of labels, provides many built-in methods, and let's you use your existing array functions.

License

The la package is distributed under a Simplified BSD license. Parts of NumPy, Scipy, and numpydoc, which all have BSD licenses, are included in la. See the LICENSE file, which is distributed with the la package, for details.

Installation

The la package requires Python and Numpy. Numpy 1.4.1 or newer is recommended for its improved NaN handling. Also some of the unit tests in the la package require Numpy 1.4 or newer and many require nose.

To save and load larrys in HDF5 format, you need h5py with HDF5 1.8.

To install la:

$ python setup.py build
$ sudo python setup.py install

Or, if you wish to specify where la is installed, for example inside /usr/local:

$ python setup.py build
$ sudo python setup.py install --prefix=/usr/local

After you have installed la, run the suite of unit tests:

>>> import la
>>> la.test()
<snip>
Ran 2987 tests in 1.351s
OK
<nose.result.TextTestResult run=2987 errors=0 failures=0>

The la package contains C extensions that speed up common alignment operations such as adding two unaligned larrys. If the C extensions don't compile when you build la then there's an automatic fallback to python versions of the functions. To see whether you are using the C functions or the Python functions:

>>> la.info()
la version      0.4.0
la file         /usr/local/lib/python2.6/dist-packages/la/__init__.pyc
HDF5 archiving  Available
listmap         Faster C version
listmap_fill    Faster C version

Since la can run in a pure python mode, you can use la by just saving it and making sure that python can find it.

URLs

docs http://larry.sourceforge.net
download http://pypi.python.org/pypi/la
code http://github.com/kwgoodman/la
list 1 http://groups.google.ca/group/pystatsmodels
list 2 http://groups.google.com/group/labeled-array

la at a glance

la package

package name la
web site http://larry.sourceforge.net
license Simplified BSD
programming languages Python, Cython
required dependencies Python, NumPy
optional dependencies h5py, Scipy, nose, C-compiler
year started (open source) 2008 (2010)

Data object

data object (main class) larry
number of dimensions supported nd > 0d
data container Numpy array
direct access to data container yes
data types homogenous: float, int, str, object
label container list of lists
direct access to label container yes
label types heterogenous, hashable
label constraints unique along any one axis, hashable
missing values NaN (float), partial: '' (str), None (object)
binary operations on two data objects intersection of labels
IO HDF5, partial support for CSV

Similar to Numpy

Numpy array la larry
arr = np.array([[1, 2], [3, 4]]) lar = la.larry([[1, 2], [3, 4]])
np.nansum(arr) lar.sum()
arr.shape, arr.dtype, lar.shape, lar.dtype
arr.ndim, arr.T lar.ndim, lar.T
arr.astype(float) lar.astype(float)
arr1 + arr2 lar1 + lar2
arr[:,0] lar[:,0]
Something went wrong with that request. Please try again.