PERF: some regressions #11084

jreback · 2015-09-13T00:08:24Z

http://pydata.github.io/pandas/# is a view since 0.14 (it's not every tag, but a sampling).
The regressions page is now working here

timeseries / period related:

jorisvandenbossche · 2015-09-20T22:41:30Z

@sinhrks I was looking at the time series plotting slowdown (time_plot_timeseries_period; there is a roughly 5x slowdown in timeseries plotting since 0.16.2).

It is related to some of the Period changes, namely that freq is no longer a string but a DateOffset object.
If you profile df.plot(), most of the time is spent in to_offset. At a certain point (in converter.py:convert), an object-dtype array of Period objects is converted back to a PeriodIndex:

In [1]: values = pd.period_range('1/1/1975', periods=2000).astype(object).values

In [2]: values
Out[2]:
array([Period('1975-01-01', 'D'), Period('1975-01-02', 'D'),
       Period('1975-01-03', 'D'), ..., Period('1980-06-20', 'D'),
       Period('1980-06-21', 'D'), Period('1980-06-22', 'D')], dtype=object)

In [3]: %timeit pd.PeriodIndex(values, freq='D')
100 loops, best of 3: 1.86 ms per loop

Above is with 0.16.2; on master this gives me 109 ms instead of 1.86 ms. The reason for the slowdown is that PeriodIndex._from_arraylike tries to extract the freq from each object and checks whether it equals the given freq. Previously this was a string equality check; now it is a DateOffset/string equality check.
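To make the cost of that per-element check concrete, here is a minimal toy model in plain Python. It is not the pandas code: `ToyOffset`, `toy_to_offset`, and both `check_freq_*` helpers are hypothetical stand-ins, and the assumption is simply that comparing an offset object against a string triggers a parse of that string each time.

```python
# Toy model only: ToyOffset and toy_to_offset are hypothetical stand-ins,
# not pandas classes or functions.
PARSE_CALLS = {"n": 0}


def toy_to_offset(freqstr):
    """Pretend to parse a frequency string; counts how often it is called."""
    PARSE_CALLS["n"] += 1
    return ToyOffset(freqstr)


class ToyOffset:
    def __init__(self, freqstr):
        self.freqstr = freqstr

    def __eq__(self, other):
        # Assumption: comparing against a string parses it on every call.
        if isinstance(other, str):
            other = toy_to_offset(other)
        return isinstance(other, ToyOffset) and self.freqstr == other.freqstr

    def __hash__(self):
        return hash(self.freqstr)


def check_freq_as_offset(offsets, freq):
    """Object/string comparison per element (the slow pattern)."""
    return all(off == freq for off in offsets)


def check_freq_as_string(offsets, freq):
    """Plain string comparison per element (the older, cheaper pattern)."""
    return all(off.freqstr == freq for off in offsets)


offsets = [ToyOffset("D") for _ in range(2000)]

PARSE_CALLS["n"] = 0
ok_slow = check_freq_as_offset(offsets, "D")
parses_slow = PARSE_CALLS["n"]  # one parse per element

PARSE_CALLS["n"] = 0
ok_fast = check_freq_as_string(offsets, "D")
parses_fast = PARSE_CALLS["n"]  # no parsing at all
```

In this toy, the object/string check parses the frequency string once per element, while the string check never parses it. The real overhead in pandas will differ, but the shape of the problem is the same.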

Looking for a possible fix, this commit: jorisvandenbossche@55ecbf0 (making it compare strings again) solves most of the perf issue. But I was wondering, do you know a better approach?
Maybe we could prevent this step (array of Period objects -> PeriodIndex) altogether in the plotting code? (although this is initially called from

sinhrks · 2015-09-21T00:13:45Z

Thanks for catching this. In addition to your ideas, caching the str -> freq mapping in to_offset may work. This is already done in get_offset, and I think the same cache can be used.

Let me look into this.

jreback · 2020-01-01T15:55:38Z

closing as stale

jreback added the Performance and Regression labels Sep 13, 2015

jreback modified the milestones: 0.17.0, 0.17.1 Sep 13, 2015

jorisvandenbossche modified the milestones: 0.17.0, 0.17.1 Sep 20, 2015

jreback modified the milestones: 0.17.1, 0.17.0 Sep 20, 2015

chris-b1 mentioned this issue Sep 21, 2015

PERF: nested dict DataFrame construction #11158

Merged

jorisvandenbossche mentioned this issue Sep 26, 2015

PERF: compare freq strings (timeseries plotting perf) #11194

Merged

jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015

jreback closed this as completed Jan 1, 2020

jreback modified the milestones: Contributions Welcome, No action Jan 1, 2020
