PERF: Period factorization very slow in 0.19.0 #14338
Comments
|
Can you simplify the example here to the simplest possible setup? e.g., by removing the |
|
It's not the
In [24]: %time p = pd.DatetimeIndex(df.date).to_period('D')
CPU times: user 339 ms, sys: 6.71 ms, total: 345 ms
Wall time: 350 ms
Also, post the actual code you're running if you could (imports too). |
|
Here's a smaller example:
import time
import pandas as pd
p = pd.period_range('2010-01-01', freq='D', periods=100000)
t0 = time.time()
pd.factorize(p)
t1 = time.time()
print('{}: {:.2f}s'.format(pd.__version__, t1 - t0))
Some outputs:
|
|
This is probably due to
# 0.18.1
In [5]: np.asarray(p)
Out[5]: array([ 14610, 14611, 14612, ..., 114607, 114608, 114609])
# 0.19
In [4]: np.asarray(p)
Out[4]:
array([Period('2010-01-01', 'D'), Period('2010-01-02', 'D'),
       Period('2010-01-03', 'D'), ..., Period('2283-10-14', 'D'),
       Period('2283-10-15', 'D'), Period('2283-10-16', 'D')], dtype=object)
cc @sinhrks I think. |
|
Probably just need a check similar to datetimetz around here to view as an https://github.com/pydata/pandas/blob/v0.19.0/pandas/core/algorithms.py#L294 |
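Until a fix lands, the idea of viewing the periods as integers can be tried from user code. The sketch below is only a workaround under stated assumptions, not the change that went into pandas: `factorize_periods` is a hypothetical helper name, it relies on the `asi8` attribute of `PeriodIndex` (the int64 ordinals), and it does not treat `NaT` as missing.

```python
import pandas as pd


def factorize_periods(p):
    """Factorize a PeriodIndex through its int64 ordinals (hypothetical helper).

    Assumes `p` contains no NaT: on the integer view NaT is an ordinary
    value, so it would get its own code instead of -1.
    """
    ordinals = p.asi8  # int64 ordinals, avoids the object-array path
    codes, _ = pd.factorize(ordinals)
    # Keep the first occurrence of each ordinal; this matches the
    # first-appearance order that pd.factorize uses for its uniques.
    first_seen = ~pd.Index(ordinals).duplicated()
    return codes, p[first_seen]


p = pd.period_range('2010-01-01', freq='D', periods=100000)
codes, uniques = factorize_periods(p)
print(len(codes), len(uniques))
```

The codes come from the integer array, so the slow conversion to an object array of `Period` values never happens; the uniques are taken from the original index, so they keep their period dtype and frequency.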
chris-b1 changed the title from "to_period now very slow in 0.19.0" to "PERF: Period factorization very slow in 0.19.0" on Oct 3, 2016
chris-b1 added the Performance and Regression labels on Oct 3, 2016
chris-b1 added this to the 0.19.1 milestone on Oct 3, 2016
|
@MattRijk Personally, I use SublimeText, usually just on a laptop. But this is off topic for this issue. |
jreback added the Period label on Oct 3, 2016
|
yeah this is a pretty easy fix, IIRC this was in @sinhrks PeriodBlock PR, but must have been backed out...something like
|
jreback added the Difficulty Novice and Effort Low labels on Oct 3, 2016
bmoscon referenced this issue in manahl/arctic on Oct 3, 2016: "Chunkstore to_chunks very slow in latest version of pandas" #252 (Closed)
|
Caused by #13988. I think the logic of period/datetimetz can be merged using
And the following comment is no longer correct... |
|
Looks like a 0.19.1 may be close around the corner... |
bmoscon commented Oct 3, 2016
Expected Output
outputs dataframe
Output of pd.show_versions()
0.19.0
The output is not the issue, the issue is that in any version before 0.19.0, this was incredibly fast, like ~1 second or less. With 0.19.0, after waiting many minutes I just give up.