Renamed package from DSNA to Bottleneck.
kwgoodman committed Nov 28, 2010
1 parent b5877c2 commit 9df1719
Showing 44 changed files with 4,378 additions and 4,271 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1 +1 @@
The license file is one level down from this file: dsna/LICENSE.
The license file is one level down from this file: bottleneck/LICENSE.
8 changes: 4 additions & 4 deletions MANIFEST.in
@@ -1,6 +1,6 @@
include LICENSE README.rst RELEASE.rst dsna/LICENSE
include dsna/src/MakeFile
include dsna/src/func/setup.py
recursive-include dsna/src/func *.pyx
include LICENSE README.rst RELEASE.rst bottleneck/LICENSE
include bottleneck/src/MakeFile
include bottleneck/src/func/setup.py
recursive-include bottleneck/src/func *.pyx
recursive-include doc *
recursive-exclude doc/build *
131 changes: 63 additions & 68 deletions README.rst
@@ -1,78 +1,77 @@
====
DSNA
====
==========
Bottleneck
==========

Introduction
============

DSNA uses the magic of Cython to give you fast, NaN-aware descriptive
statistics of NumPy arrays.
Bottleneck is a collection of fast, NumPy array functions written in Cython.

The three categories of Bottleneck functions:

- Faster, drop-in replacement for NaN functions in NumPy and SciPy
- Moving window functions
- Group functions that bin calculations by like-labeled elements

The functions in dsna fall into three categories:
Function signatures (using mean as an example):

=============== ===============================
General sum(arr, axis=None)
Moving window move_sum(arr, window, axis=0)
Group by group_sum(arr, label, axis=0)
=============== ===============================
=============== ============================================
NaN functions mean(arr, axis=None)
Moving window move_mean(arr, window, axis=0)
Group by group_mean(arr, label, order=None, axis=0)
=============== ============================================

For example, create a NumPy array::
Let's give it a try. Create a NumPy array::
>>> import numpy as np
>>> arr = np.array([1, 2, np.nan, 4, 5])

Then find the sum::
Find the sum::

>>> import dsna as ds
>>> ds.sum(arr)
>>> import bottleneck as bn
>>> bn.sum(arr)
12.0

Moving window sum::

>>> ds.move_sum(arr, window=2)
>>> bn.move_sum(arr, window=2)
array([ nan, 3., 2., 4., 9.])

Group mean::

>>> label = ['a', 'a', 'b', 'b', 'a']
>>> ds.group_mean(arr, label)
>>> bn.group_mean(arr, label)
(array([ 2.66666667, 4. ]), ['a', 'b'])
>>> ds.group_mean(arr, label, order=['b', 'a'])
(array([ 4. , 2.66666667]), ['b', 'a'])
>>> ds.group_mean(arr, label, order=['b'])
(array([ 4.]), ['b'])

Fast
====

DNSA is fast::
Bottleneck is fast::

>>> import dsna as ds
>>> import numpy as np
>>> arr = np.random.rand(100, 100)
>>> arr = np.random.rand(100, 100)
>>> timeit np.nansum(arr)
10000 loops, best of 3: 68.4 us per loop
>>> timeit ds.sum(arr)
>>> timeit bn.sum(arr)
100000 loops, best of 3: 17.7 us per loop

Let's not forget to add some NaNs::

>>> arr[arr > 0.5] = np.nan
>>> timeit np.nansum(arr)
1000 loops, best of 3: 417 us per loop
>>> timeit ds.sum(arr)
>>> timeit bn.sum(arr)
10000 loops, best of 3: 64.8 us per loop

DSNA comes with a benchmark suite that compares the performance of the dsna
functions that have a NumPy/SciPy equivalent. To run the benchmark::
Bottleneck comes with a benchmark suite that compares the performance of the
bottleneck functions that have a NumPy/SciPy equivalent. To run the
benchmark::
>>> ds.benchit(verbose=False)
DSNA performance benchmark
DSNA 0.0.1dev
Numpy 1.5.1
Scipy 0.8.0
Speed is numpy (or scipy) time divided by dsna time
>>> bn.benchit(verbose=False)
Bottleneck performance benchmark
Bottleneck 0.1.0dev
Numpy 1.5.1
Scipy 0.8.0
Speed is numpy (or scipy) time divided by Bottleneck time
NaN means all NaNs
Speed Test Shape dtype NaN?
4.8103 nansum(a, axis=-1) (500,500) int64
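Reviewer note: the README examples in this hunk can be reproduced with a slow, pure-NumPy reference, which is handy for checking the Cython functions after the rename. This is only a sketch and not the package's implementation. The names `nan_move_sum` and `nan_group_mean` are hypothetical, only 1d input is handled (the `axis` argument from the README signatures is omitted), and returning NaN for an all-NaN window or group is an assumption. It is written for Python 3, whereas the code in this commit is Python 2.

```python
import numpy as np


def nan_move_sum(arr, window):
    """NaN-aware moving-window sum of a 1d array (reference version)."""
    arr = np.asarray(arr, dtype=np.float64)
    # The first window - 1 outputs have no complete window, so they stay NaN.
    out = np.full(arr.shape, np.nan)
    for i in range(window - 1, arr.size):
        w = arr[i - window + 1:i + 1]
        # Sum only the non-NaN values; assumption: an all-NaN window gives NaN.
        if not np.isnan(w).all():
            out[i] = np.nansum(w)
    return out


def nan_group_mean(arr, label, order=None):
    """NaN-aware mean of a 1d array, binned by like-labeled elements."""
    arr = np.asarray(arr, dtype=np.float64)
    label = np.asarray(label)
    if order is None:
        # Default to the sorted unique labels, as in the README output.
        order = sorted(set(label.tolist()))
    means = []
    for key in order:
        vals = arr[label == key]
        vals = vals[~np.isnan(vals)]
        # Assumption: a group with no non-NaN values gives NaN.
        means.append(vals.mean() if vals.size else np.nan)
    return np.array(means), list(order)


arr = np.array([1, 2, np.nan, 4, 5])
moving = nan_move_sum(arr, window=2)
means, keys = nan_group_mean(arr, ['a', 'a', 'b', 'b', 'a'])
```

With the README's input, `moving` holds NaN followed by 3, 2, 4, 9, and `means` holds roughly 2.667 and 4 for the labels `['a', 'b']`, matching the outputs shown above.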
@@ -119,54 +118,49 @@ functions that have a NumPy/SciPy equivalent. To run the benchmark::
Faster
======

Under the hood dsna uses a separate Cython function for each combination of
ndim, dtype, and axis. A lot of the overhead in ds.max, for example, is
in checking that your axis is within range, converting non-array data to an
Under the hood Bottleneck uses a separate Cython function for each combination
of ndim, dtype, and axis. A lot of the overhead in bn.max(), for example, is
in checking that the axis is within range, converting non-array data to an
array, and selecting the function to use to calculate the maximum.

You can get rid of the overhead by doing all this before you, say, enter
an inner loop::

>>> arr = np.random.rand(10,10)
>>> func, a = ds.func.max_selector(arr, axis=0)
>>> func, a = bn.func.max_selector(arr, axis=0)
>>> func
<built-in function max_2d_float64_axis0>

Let's see how much faster than runs::
Let's see how much faster than runs::
>> timeit np.nanmax(arr, axis=0)
10000 loops, best of 3: 25.7 us per loop
>> timeit ds.max(arr, axis=0)
>> timeit bn.max(arr, axis=0)
100000 loops, best of 3: 5.25 us per loop
>> timeit func(a)
100000 loops, best of 3: 2.5 us per loop

Note that ``func`` is faster than the Numpy's non-nan version of max::
Note that ``func`` is faster than Numpy's non-NaN version of max::
>> timeit arr.max(axis=0)
100000 loops, best of 3: 3.28 us per loop

So adding NaN protection to your inner loops has a negative cost!
So adding NaN protection to your inner loops comes at a negative cost!

Functions
=========

DSNA is in the prototype stage.
Bottleneck is in the prototype stage.

DSNA contains the following functions (an asterisk means not yet complete):
Bottleneck contains the following functions:

========= ============== ===============
sum* move_sum* group_sum*
mean move_mean* group_mean*
var move_var* group_var*
std move_std* group_std*
min move_min* group_min*
max move_max* group_max*
median* move_median* group_median*
zscore* move_zscore* group_zscore*
ranking* move_ranking* group_ranking*
quantile* move_quantile* group_quantile*
count* move_count* group_count*
sum move_sum
mean group_mean
var
std
min
max
========= ============== ===============

Currently only 1d, 2d, and 3d NumPy arrays with dtype int32, int64, and
@@ -175,23 +169,24 @@ float64 are supported.
License
=======

DSNA is distributed under a Simplified BSD license. Parts of NumPy and Scipy,
which both have BSD licenses, are included in dsna. See the LICENSE file,
which is distributed with dsna, for details.
Bottleneck is distributed under a Simplified BSD license. Parts of NumPy,
Scipy and numpydoc, all of which have BSD licenses, are included in
Bottleneck. See the LICENSE file, which is distributed with Bottleneck, for
details.

Install
=======
Download and install
====================

You can grab dsna from http://github.com/kwgoodman/dsna
You can grab Bottleneck from http://github.com/kwgoodman/bottleneck

**GNU/Linux, Mac OS X, et al.**

To install dsna::
To install Bottleneck::

$ python setup.py build
$ sudo python setup.py install
Or, if you wish to specify where dsna is installed, for example inside
Or, if you wish to specify where Bottleneck is installed, for example inside
``/usr/local``::

$ python setup.py build
@@ -210,10 +205,10 @@ commands::

**Post install**

After you have installed dsna, run the suite of unit tests::
After you have installed Bottleneck, run the suite of unit tests::

>>> import dsna
>>> dsna.test()
>>> import bottleneck as bn
>>> bn.test()
<snip>
Ran 10 tests in 13.756s
OK
8 changes: 4 additions & 4 deletions RELEASE.rst
@@ -4,11 +4,11 @@ Release Notes
=============

These are the major changes made in each release. For details of the changes
see the commit log at http://github.com/kwgoodman/dsna
see the commit log at http://github.com/kwgoodman/bottleneck

dsna 0.1.0
==========
Bottleneck 0.1.0
================

*Release date: Not yet released, in development*

The first release of dsna (descriptive statistics of NumPy arrays).
The first release of Bottleneck.
15 changes: 8 additions & 7 deletions dsna/LICENSE → bottleneck/LICENSE
@@ -2,11 +2,12 @@
License
=======

DSNA is distributed under a Simplified BSD license. Parts of NumPy, SciPy and
and numpydoc, which all have BSD licenses, are included in dsna.
Bottleneck is distributed under a Simplified BSD license. Parts of NumPy,
SciPy and and numpydoc, which all have BSD licenses, are included in
Bottleneck.

DSNA license
============
Bottleneck license
==================

Copyright (c) 2010, Archipel Asset Management AB.
All rights reserved.
@@ -37,8 +38,8 @@ POSSIBILITY OF SUCH DAMAGE.
Other licenses
==============

DSNA contains doc strings from NumPy and SciPy and Sphinx extensions from
numpydoc.
Bottleneck contains doc strings from NumPy and SciPy and Sphinx extensions
from numpydoc.


NumPy license
@@ -115,4 +116,4 @@ DAMAGE.
numpydoc license
----------------

The numpydoc license is in dsna/doc/sphinxext/LICENSE.txt
The numpydoc license is in bottleneck/doc/sphinxext/LICENSE.txt
6 changes: 3 additions & 3 deletions dsna/__init__.py → bottleneck/__init__.py
@@ -3,12 +3,12 @@
from move import move_sum
from group import group_mean

from dsna.version import __version__
from dsna.bench.bench import *
from bottleneck.version import __version__
from bottleneck.bench.bench import *

try:
from numpy.testing import Tester
test = Tester().test
del Tester
except (ImportError, ValueError):
print "No dsna unit testing available."
print "No Bottleneck unit testing available."
File renamed without changes.
File renamed without changes.
40 changes: 20 additions & 20 deletions dsna/bench/bench.py → bottleneck/bench/bench.py
@@ -1,7 +1,7 @@

import numpy as np
import scipy
import dsna as ds
import bottleneck as bn
from autotimeit import autotimeit

__all__ = ['benchit']
@@ -35,20 +35,20 @@ def suite():
statements = {}
setups = {}

setups['(10000,) float64'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=10000; a = geta((N,), 'float64')"
setups['(500,500) float64'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=500; a = geta((N, N), 'float64')"
setups['(10000,) float64 NaN'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=10000; a = geta((N,), 'float64', True)"
setups['(500,500) float64 NaN'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=500; a = geta((N, N), 'float64', True)"
setups['(10000,) int32'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=10000; a = geta((N,), 'int32')"
setups['(500,500) int32'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=500; a = geta((N, N), 'int32')"
setups['(10000,) int64'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=10000; a = geta((N,), 'int64')"
setups['(500,500) int64'] = "import numpy as np; import scipy.stats as sp; import dsna as ds; from dsna.bench.bench import geta; N=500; a = geta((N, N), 'int64')"
setups['(10000,) float64'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=10000; a = geta((N,), 'float64')"
setups['(500,500) float64'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=500; a = geta((N, N), 'float64')"
setups['(10000,) float64 NaN'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=10000; a = geta((N,), 'float64', True)"
setups['(500,500) float64 NaN'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=500; a = geta((N, N), 'float64', True)"
setups['(10000,) int32'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=10000; a = geta((N,), 'int32')"
setups['(500,500) int32'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=500; a = geta((N, N), 'int32')"
setups['(10000,) int64'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=10000; a = geta((N,), 'int64')"
setups['(500,500) int64'] = "import numpy as np; import scipy.stats as sp; import bottleneck as bn; from bottleneck.bench.bench import geta; N=500; a = geta((N, N), 'int64')"

# DSNA
s = ['ds.sum(a, axis=-1)', 'ds.max(a, axis=-1)',
'ds.min(a, axis=-1)', 'ds.mean(a, axis=-1)',
'ds.std(a, axis=-1)']
statements['dsna'] = s
# Bottleneck
s = ['bn.sum(a, axis=-1)', 'bn.max(a, axis=-1)',
'bn.min(a, axis=-1)', 'bn.mean(a, axis=-1)',
'bn.std(a, axis=-1)']
statements['bottleneck'] = s

# Numpy
s = ['np.nansum(a, axis=-1)', 'np.nanmax(a, axis=-1)',
@@ -60,14 +60,14 @@ def suite():

def display(results):
results = list(results)
na = [i for i in results if i[0].startswith('ds.')]
na = [i for i in results if i[0].startswith('bn.')]
nu = [i for i in results if i[0].startswith('np.') or
i[0].startswith('sp.')]
print 'DSNA performance benchmark'
print "\tDSNA %s" % ds.__version__
print "\tNumpy %s" % np.__version__
print "\tScipy %s" % scipy.__version__
print "\tSpeed is numpy (or scipy) time divided by dsna time"
print 'Bottleneck performance benchmark'
print "\tBottleneck %s" % bn.__version__
print "\tNumpy %s" % np.__version__
print "\tScipy %s" % scipy.__version__
print "\tSpeed is numpy (or scipy) time divided by Bottleneck time"
print "\tNaN means all NaNs"
print " Speed Test Shape dtype NaN?"
for nai in na:
4 changes: 2 additions & 2 deletions dsna/src/Makefile → bottleneck/src/Makefile
@@ -31,10 +31,10 @@ groups:
mv group.so ../group.so

test:
python -c "import dsna; dsna.test()"
python -c "import bottleneck; bottleneck.test()"

bench:
python -c "import dsna; dsna.benchit()"
python -c "import bottleneck; bottleneck.benchit()"

# Phony targets for cleanup and similar uses
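
Reviewer note: a rename that touches 44 files is easy to leave incomplete, so it may be worth scanning the tree for leftover references to the old package name. The helper below is a hypothetical sketch and not part of this commit. The name `find_leftovers`, the list of file extensions, and the skipped directories are all assumptions to adjust for the actual repository. It is written for Python 3.

```python
import os
import re


def find_leftovers(root, old_name="dsna",
                   exts=(".py", ".pyx", ".rst", ".in", ".txt")):
    """Return (path, line number, line) for each whole-word match of old_name."""
    # Whole-word, case-insensitive match so both "dsna" and "DSNA" are found.
    pattern = re.compile(r"\b%s\b" % re.escape(old_name), re.IGNORECASE)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control and build directories (assumed names).
        dirnames[:] = [d for d in dirnames if d not in (".git", "build")]
        for name in filenames:
            if not (name.endswith(exts) or name in ("Makefile", "LICENSE")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.rstrip()))
    return hits
```

Calling `find_leftovers(".")` from the repository root returns one tuple per leftover line; an empty list suggests the rename reached every scanned file.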
