New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN/PERF: remove used functions; use C skip list for rolling median #11450

Merged
merged 1 commit into from Nov 2, 2015

Conversation

Projects
None yet
2 participants
@kawochen
Contributor

kawochen commented Oct 28, 2015

removes some unused code
reverts this commit a40226e

performance consideration

import pandas
import numpy
arr = numpy.random.rand(1000000)
%timeit pandas.rolling_median(arr, 1000)

master

1 loops, best of 3: 4.94 s per loop

branch

1 loops, best of 3: 821 ms per loop

@jreback jreback added this to the 0.17.1 milestone Oct 28, 2015

@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.17.1.txt Outdated
}
static void node_destroy(node_t *node) {
int i;
if (node) {
if (node->ref_count == 1) {
if (node->ref_count <= 1) {

This comment has been minimized.

@jreback

jreback Oct 28, 2015

Contributor

are sure about this?

@jreback

jreback Oct 28, 2015

Contributor

are sure about this?

This comment has been minimized.

@kawochen

kawochen Oct 28, 2015

Contributor

I think it makes clean-up code cleaner. I will write something to test for leaks a bit later.

@kawochen

kawochen Oct 28, 2015

Contributor

I think it makes clean-up code cleaner. I will write something to test for leaks a bit later.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Oct 28, 2015

Contributor

you may be able to insert nogil (at the top-level), and make almost all of skiplist.pyx nogil (obviously where you raise you have re-acquire).

Contributor

jreback commented Oct 28, 2015

you may be able to insert nogil (at the top-level), and make almost all of skiplist.pyx nogil (obviously where you raise you have re-acquire).

@kawochen

This comment has been minimized.

Show comment
Hide comment
@kawochen

kawochen Oct 30, 2015

Contributor
In [2]: import pandas
import numpy
arr = numpy.random.rand(1000000)

In [4]: from pandas.util import testing

In [5]: @testing.test_parallel(2)
def g():
    pandas.rolling_median(arr, 1000)
   ...:     

In [6]: @testing.test_parallel(1)
def f():
    pandas.rolling_median(arr, 1000)


In [7]: %timeit f()
1 loops, best of 3: 752 ms per loop


In [8]: %timeit g()
1 loops, best of 3: 972 ms per loop
Contributor

kawochen commented Oct 30, 2015

In [2]: import pandas
import numpy
arr = numpy.random.rand(1000000)

In [4]: from pandas.util import testing

In [5]: @testing.test_parallel(2)
def g():
    pandas.rolling_median(arr, 1000)
   ...:     

In [6]: @testing.test_parallel(1)
def f():
    pandas.rolling_median(arr, 1000)


In [7]: %timeit f()
1 loops, best of 3: 752 ms per loop


In [8]: %timeit g()
1 loops, best of 3: 972 ms per loop
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Oct 30, 2015

Contributor

can you add an asv benchmark? not sure we have much for rolling in general......

Contributor

jreback commented Oct 30, 2015

can you add an asv benchmark? not sure we have much for rolling in general......

@jreback

View changes

Show outdated Hide outdated doc/source/whatsnew/v0.17.1.txt Outdated
@kawochen

This comment has been minimized.

Show comment
Hide comment
@kawochen

kawochen Oct 30, 2015

Contributor

OK. might as well release the GIL on roll_*

Contributor

kawochen commented Oct 30, 2015

OK. might as well release the GIL on roll_*

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Oct 30, 2015

Contributor

yep - same or another

Contributor

jreback commented Oct 30, 2015

yep - same or another

@kawochen

This comment has been minimized.

Show comment
Hide comment
@kawochen

kawochen Nov 1, 2015

Contributor

Added asv benchmarks for the gil release. But I can't get any of those to show in asv bench at all. Have I written the tests incorrectly?
I do asv continuous upstream/master HEAD -b gil while on my branch.

A simple timeit does show the improvement, e.g.
for rolling_mean

import pandas
import numpy
from pandas.util.testing import test_parallel
arr = numpy.random.rand(1000000)
@test_parallel(num_threads=2)
def f():
     pandas.rolling_mean(arr, 100)
%timeit f()

branch

100 loops, best of 3: 12.5 ms per loop

master

10 loops, best of 3: 19.5 ms per loop

for rolling_kurt
branch

10 loops, best of 3: 65.5 ms per loop

master

10 loops, best of 3: 98.8 ms per loop

the time_rolling_median in stat_ops.py shows the same 5x improvement I get using timeit.

Contributor

kawochen commented Nov 1, 2015

Added asv benchmarks for the gil release. But I can't get any of those to show in asv bench at all. Have I written the tests incorrectly?
I do asv continuous upstream/master HEAD -b gil while on my branch.

A simple timeit does show the improvement, e.g.
for rolling_mean

import pandas
import numpy
from pandas.util.testing import test_parallel
arr = numpy.random.rand(1000000)
@test_parallel(num_threads=2)
def f():
     pandas.rolling_mean(arr, 100)
%timeit f()

branch

100 loops, best of 3: 12.5 ms per loop

master

10 loops, best of 3: 19.5 ms per loop

for rolling_kurt
branch

10 loops, best of 3: 65.5 ms per loop

master

10 loops, best of 3: 98.8 ms per loop

the time_rolling_median in stat_ops.py shows the same 5x improvement I get using timeit.

jreback added a commit that referenced this pull request Nov 2, 2015

Merge pull request #11450 from kawochen/CLN-PERF-roll-median
CLN/PERF: remove used functions; use C skip list for rolling median

@jreback jreback merged commit eb66bcc into pandas-dev:master Nov 2, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Nov 2, 2015

Contributor

thanks!

Contributor

jreback commented Nov 2, 2015

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment