PERF: Speedup minute to session sampling. #1549

Merged
ehebert merged 1 commit into master from speedup-resample on Oct 24, 2016

ehebert commented Oct 20, 2016

The minute-to-session sampling read path was creating two DataFrame
objects: the first to hold the minute data, and a second returned
by `DataFrame.groupby` to sample down to sessions.

Instead, use the arrays returned by the minute reader's `load_raw_arrays`
and implement sampling logic that takes advantage of the fact that the
minutes being passed start with the first minute of the first session
and end with the last minute of the last session.

On my machine this takes the tests in `test/test_continuous_futures`
from ~4.0 seconds to ~0.1 seconds.
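
For illustration, the array-based sampling described above could look roughly like the sketch below. This is not the code from this PR: the function name `minute_to_session`, its signature, and the `close_locs` argument (the index of the last minute of each session) are assumptions made for the example, and it assumes 1-D arrays with no missing (NaN) minutes. It relies on the minutes starting at the first minute of the first session and ending at the last minute of the last session, so session boundaries can be derived from `close_locs` alone.

```python
import numpy as np


def minute_to_session(opens, highs, lows, closes, volumes, close_locs):
    """Downsample 1-D minute arrays to one bar per session.

    Hypothetical helper for illustration only. ``close_locs[i]`` is the
    index of the last minute of session ``i``; the final entry must be
    the last index of the minute arrays. Assumes no NaN values.
    """
    close_locs = np.asarray(close_locs, dtype=np.int64)
    # Each session starts one minute after the previous session's close;
    # the first session starts at index 0.
    start_locs = np.concatenate(([0], close_locs[:-1] + 1))
    return {
        # First minute of each session.
        "open": np.asarray(opens)[start_locs],
        # reduceat reduces each slice [start_i, start_{i+1}), and the
        # last slice runs to the end of the array.
        "high": np.maximum.reduceat(np.asarray(highs), start_locs),
        "low": np.minimum.reduceat(np.asarray(lows), start_locs),
        # Last minute of each session.
        "close": np.asarray(closes)[close_locs],
        "volume": np.add.reduceat(np.asarray(volumes), start_locs),
    }


# Two sessions of three minutes each.
bars = minute_to_session(
    opens=np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    highs=np.array([2.0, 5.0, 3.0, 7.0, 6.0, 9.0]),
    lows=np.array([0.5, 1.0, 2.0, 3.0, 2.0, 4.0]),
    closes=np.array([1.5, 2.5, 3.5, 4.5, 5.5, 6.5]),
    volumes=np.array([10, 20, 30, 40, 50, 60]),
    close_locs=[2, 5],
)
```

Working on plain arrays this way avoids building either the minute DataFrame or the grouped result. Handling of missing minutes is left out of the sketch.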

ehebert commented Oct 20, 2016

This can be further improved by using Cython to get direct access to the NumPy arrays. However, this patch already gives an order-of-magnitude speedup, so I am starting with this change.

EDIT: This PR now includes a Cython extension providing `_minute_to_session_open`, etc.

coveralls commented Oct 20, 2016

Coverage increased (+0.04%) to 86.933% when pulling 255ea2d on speedup-resample into 7c72eef on master.

ehebert force-pushed the speedup-resample branch from 255ea2d to cbd6170 on Oct 21, 2016

coveralls commented Oct 21, 2016

Coverage increased (+0.04%) to 86.937% when pulling cbd6170 on speedup-resample into 7c72eef on master.

ehebert force-pushed the speedup-resample branch from cbd6170 to 7773b76 on Oct 21, 2016

coveralls commented Oct 21, 2016

Coverage increased (+0.08%) to 86.97% when pulling 7773b76 on speedup-resample into 7c72eef on master.

ehebert force-pushed the speedup-resample branch from 7773b76 to a4205a0 on Oct 24, 2016

coveralls commented Oct 24, 2016

Coverage increased (+0.01%) to 86.97% when pulling a4205a0 on speedup-resample into 54ebd9e on master.

ehebert merged commit 506832e into master on Oct 24, 2016

2 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

ehebert deleted the speedup-resample branch on Oct 24, 2016
