PERF: Updated andrews_curves to use Numpy arrays for its samples #11534
Conversation
khs26 changed the title from "Updated andrews_curves to use Numpy arrays for its samples" to "PERF: Updated andrews_curves to use Numpy arrays for its samples" on Nov 6, 2015
sinhrks added the Visualization label on Nov 7, 2015
lgtm. @TomAugspurger ?
sinhrks added the Performance label on Nov 7, 2015
Yeah. @khs26 could you add an item to the release notes under enhancements?
jreback added this to the 0.17.1 milestone on Nov 7, 2015
@khs26 are those times in seconds?
jreback and 1 other commented on an outdated diff on Nov 7, 2015
```diff
 x1 = amplitudes[0]
 result = x1 / sqrt(2.0)
 harmonic = 1.0
 for x_even, x_odd in zip(amplitudes[1::2], amplitudes[2::2]):
-    result += (x_even * sin(harmonic * x) +
-               x_odd * cos(harmonic * x))
+    result += (x_even * np.sin(harmonic * t) +
```
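For context, the quoted diff swaps scalar `sin`/`cos` calls for `np.sin`/`np.cos`, so a whole array of sample points can be evaluated in one pass. Below is a minimal self-contained sketch of the resulting shape of the code; the `andrews_function` wrapper and the sample values are illustrative, not the pandas implementation.

```python
import numpy as np

def andrews_function(amplitudes):
    """Build f(t) for one row of coefficients (a sketch, not pandas code).

    Because the trig calls are np.sin/np.cos, t can be an entire array of
    sample points and the curve is evaluated in one vectorized pass.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)

    def f(t):
        t = np.asarray(t, dtype=float)
        result = amplitudes[0] / np.sqrt(2.0)
        harmonic = 1.0
        # Pair the remaining coefficients: amplitudes[1::2] with sin,
        # amplitudes[2::2] with cos, bumping the harmonic each pair.
        for x_even, x_odd in zip(amplitudes[1::2], amplitudes[2::2]):
            result = result + (x_even * np.sin(harmonic * t) +
                               x_odd * np.cos(harmonic * t))
            harmonic += 1.0
        return result

    return f

t = np.linspace(-np.pi, np.pi, 200)
curve = andrews_function([1.0, 0.5, 0.25])
samples = curve(t)  # one array evaluation instead of 200 scalar calls
```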
jreback commented on an outdated diff on Nov 8, 2015
```diff
@@ -64,6 +64,7 @@ Performance Improvements
 - Improved performance of ``rolling_median`` (:issue:`11450`)
 - Improved performance to ``to_excel`` (:issue:`11352`)
+- Improved performance of ``andrews_curves``
```
ok, some comments. Let's create another issue that shows the performance implications (after this PR), and maybe you (or another brave soul) can address it in the future.
khs26 referenced this pull request on Nov 8, 2015: PERF: Use numpy arrays in andrews_curves plots #11554 (closed)
I actually found a bit of time to fix it up. I'm going to update the branch shortly. This change didn't really change things hugely; I suspected it would only make a significant difference on wide dataframes (i.e. those with a large number of coefficients), and I also think that would be pushing the utility of Andrews plots as a visualisation technique. It's definitely a close thing between a slight performance increase and slightly lessened readability. Here are the timings:

[timings image not captured]

I've left it as a separate commit for the moment, but will squash it if we're happy with it.
can you add an asv benchmark for this? http://pandas.pydata.org/pandas-docs/stable/contributing.html#running-the-performance-test-suite Otherwise looks good.
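An asv benchmark along the lines requested might look like the sketch below. The class layout, sizes, and `goal_time` value are illustrative only, and note that at the time of this PR `andrews_curves` lived in `pandas.tools.plotting` rather than today's `pandas.plotting`.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so the benchmark runs without a display
from pandas.plotting import andrews_curves  # pandas.tools.plotting in 0.17.x

class AndrewsCurves(object):
    # asv discovers classes like this under benchmarks/ and times every
    # method whose name starts with time_.
    goal_time = 0.2

    def setup(self):
        rng = np.random.RandomState(42)
        n = 200
        self.df = pd.DataFrame(rng.randn(n, 4), columns=list('abcd'))
        self.df['Name'] = rng.randint(0, 3, n)

    def time_andrews_curves(self):
        andrews_curves(self.df, 'Name')
```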
jreback commented on the diff on Nov 13, 2015

```diff
@@ -560,14 +575,14 @@ def f(x):
 for i in range(n):
```
jreback modified the milestone: Next Major Release, 0.17.1 on Nov 18, 2015
@khs26 can you rebase / update?
Sorry that took a while, been quite busy. I ran a profile on it, and it turns out that with the existing changes the slowest part is the drawing itself. I added an asv benchmark as well.
Also rebased the commit on top of the latest master, so it should be good to merge.
jreback commented on an outdated diff on Nov 22, 2015
```diff
@@ -146,6 +146,7 @@ Performance Improvements
 - Performance improvement in ``Categorical.remove_unused_categories``, (:issue:`11643`).
 - Improved performance of ``Series`` constructor with no data and ``DatetimeIndex`` (:issue:`11433`)
 - Improved performance of ``shift``, ``cumprod``, and ``cumsum`` with groupby (:issue:`4095`)
+- Improved performance of ``andrews_curves`` (:issue:`11534`)
```
jreback modified the milestone: 0.18.0, Next Major Release on Nov 22, 2015
looks good. ping on green.
Travis is happy and I'm gonna leave it be now, so you can merge whenever you want.
jreback added a commit that referenced this pull request on Nov 24, 2015: 5bc191a
jreback merged commit 5bc191a into pandas-dev:master on Nov 24, 2015 (1 check passed)
thank you sir!
khs26 deleted the khs26:numpify-andrews-curves branch on Nov 24, 2015
jreback added a commit that referenced this pull request on Nov 29, 2015: d9e679a
|
khs26 commented on Nov 6, 2015
Hello,
I hope I've followed the contribution guidelines correctly, but am happy to change things if necessary.
I noticed that andrews_curves doesn't make use of numpy arrays in what I thought was a sensible use case: generating its samples.
I added a test which uses variable-length random data, so that I could check the timing changes between the numpy and non-numpy versions, and found the following (rough data):

[timing data not captured]

The test adds some overhead (though it is decorated with @slow), so I'm happy to amend the commit and remove it. Otherwise, the changes seem to have resulted in a small speed up, which becomes more important for larger data (my original motivation, since I was trying to do it with a 100k x 5 dataframe).

Thanks,
Kyle
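The numpy-vs-non-numpy timing comparison described above can be reproduced with a small harness like the one below. This is a sketch, not the test from the PR: the function names, coefficient values, and repetition count are all illustrative.

```python
import math
import timeit
import numpy as np

def sample_scalar(amplitudes, n_samples=100):
    # One scalar math.sin/math.cos call per sample point (pre-PR style).
    out = []
    for t in np.linspace(-math.pi, math.pi, n_samples):
        result = amplitudes[0] / math.sqrt(2.0)
        harmonic = 1.0
        for x_even, x_odd in zip(amplitudes[1::2], amplitudes[2::2]):
            result += (x_even * math.sin(harmonic * t) +
                       x_odd * math.cos(harmonic * t))
            harmonic += 1.0
        out.append(result)
    return np.array(out)

def sample_numpy(amplitudes, n_samples=100):
    # np.sin/np.cos applied to the whole sample array at once (post-PR style).
    t = np.linspace(-np.pi, np.pi, n_samples)
    amplitudes = np.asarray(amplitudes, dtype=float)
    result = np.full(n_samples, amplitudes[0] / np.sqrt(2.0))
    harmonic = 1.0
    for x_even, x_odd in zip(amplitudes[1::2], amplitudes[2::2]):
        result += x_even * np.sin(harmonic * t) + x_odd * np.cos(harmonic * t)
        harmonic += 1.0
    return result

amps = [1.0, -0.3, 0.7, 0.2, -0.5]
# Both implementations must agree before comparing their speed.
assert np.allclose(sample_scalar(amps), sample_numpy(amps))

for fn in (sample_scalar, sample_numpy):
    print(fn.__name__, timeit.timeit(lambda: fn(amps), number=200))
```

As the discussion above notes, the gap only becomes significant for wide dataframes (many coefficients) or many sample points, since that is where the per-call Python overhead of the scalar version dominates.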