CLN/PERF: remove used functions; use C skip list for rolling median #11450

Merged
merged 1 commit into from Nov 2, 2015

Conversation

Projects
None yet
2 participants
Contributor

kawochen commented Oct 28, 2015

removes some unused code
reverts this commit a40226e

performance consideration

import pandas
import numpy
arr = numpy.random.rand(1000000)
%timeit pandas.rolling_median(arr, 1000)

master

1 loops, best of 3: 4.94 s per loop

branch

1 loops, best of 3: 821 ms per loop

jreback added this to the 0.17.1 milestone Oct 28, 2015

@jreback jreback commented on an outdated diff Oct 28, 2015

doc/source/whatsnew/v0.17.1.txt
@@ -58,7 +58,7 @@ Performance Improvements
- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
-
+- ``rolling_median`` uses c skip list implementation
@jreback

jreback Oct 28, 2015

Contributor

add the pr number (as the issue number)

@jreback jreback commented on the diff Oct 28, 2015

pandas/src/skiplist.h
}
static void node_destroy(node_t *node) {
int i;
if (node) {
- if (node->ref_count == 1) {
+ if (node->ref_count <= 1) {
@jreback

jreback Oct 28, 2015

Contributor

are sure about this?

@kawochen

kawochen Oct 28, 2015

Contributor

I think it makes clean-up code cleaner. I will write something to test for leaks a bit later.

Contributor

jreback commented Oct 28, 2015

you may be able to insert nogil (at the top-level), and make almost all of skiplist.pyx nogil (obviously where you raise you have re-acquire).

Contributor

kawochen commented Oct 30, 2015

In [2]: import pandas
import numpy
arr = numpy.random.rand(1000000)

In [4]: from pandas.util import testing

In [5]: @testing.test_parallel(2)
def g():
    pandas.rolling_median(arr, 1000)
   ...:     

In [6]: @testing.test_parallel(1)
def f():
    pandas.rolling_median(arr, 1000)


In [7]: %timeit f()
1 loops, best of 3: 752 ms per loop


In [8]: %timeit g()
1 loops, best of 3: 972 ms per loop
Contributor

jreback commented Oct 30, 2015

can you add an asv benchmark? not sure we have much for rolling in general......

@jreback jreback commented on an outdated diff Oct 30, 2015

doc/source/whatsnew/v0.17.1.txt
@@ -59,7 +59,7 @@ Performance Improvements
- Release the GIL on most datetime field operations (e.g. ``DatetimeIndex.year``, ``Series.dt.year``), normalization, and conversion to and from ``Period``, ``DatetimeIndex.to_period`` and ``PeriodIndex.to_timestamp`` (:issue:`11263`)
-
+- ``rolling_median`` uses c skip list implementation (:issue:`11450`)
@jreback

jreback Oct 30, 2015

Contributor

say Improved performance of ....

Contributor

kawochen commented Oct 30, 2015

OK. might as well release the GIL on roll_*

Contributor

jreback commented Oct 30, 2015

yep - same or another

Contributor

kawochen commented Nov 1, 2015

Added asv benchmarks for the gil release. But I can't get any of those to show in asv bench at all. Have I written the tests incorrectly?
I do asv continuous upstream/master HEAD -b gil while on my branch.

A simple timeit does show the improvement, e.g.
for rolling_mean

import pandas
import numpy
from pandas.util.testing import test_parallel
arr = numpy.random.rand(1000000)
@test_parallel(num_threads=2)
def f():
     pandas.rolling_mean(arr, 100)
%timeit f()

branch

100 loops, best of 3: 12.5 ms per loop

master

10 loops, best of 3: 19.5 ms per loop

for rolling_kurt
branch

10 loops, best of 3: 65.5 ms per loop

master

10 loops, best of 3: 98.8 ms per loop

the time_rolling_median in stat_ops.py shows the same 5x improvement I get using timeit.

@jreback jreback added a commit that referenced this pull request Nov 2, 2015

@jreback jreback Merge pull request #11450 from kawochen/CLN-PERF-roll-median
CLN/PERF: remove used functions; use C skip list for rolling median
eb66bcc

@jreback jreback merged commit eb66bcc into pandas-dev:master Nov 2, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Contributor

jreback commented Nov 2, 2015

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment