BUG: fix PeriodConverter issue when given a list of integers (GH9012) #9050

jorisvandenbossche · 2014-12-10T12:38:24Z

For example coming up when [0, 1] is passed by axhline

test and whatsnew are coming

jreback · 2014-12-10T13:09:56Z

the perf diff was massive
you need to find a better way

jreback · 2014-12-10T14:05:01Z

something like this

In [8]: from pandas.tseries.converter import get_datevalue

In [9]: pr = pd.period_range('2000',periods=10000,freq='D')

In [10]: [ get_datevalue(v,pr.freq) for v in pr ][0:10]
Out[10]: [10957, 10958, 10959, 10960, 10961, 10962, 10963, 10964, 10965, 10966]

In [11]: pd.PeriodIndex(pr).values
Out[11]: array([10957, 10958, 10959, ..., 20954, 20955, 20956])

In [12]: %timeit pd.PeriodIndex(pr).values 
100000 loops, best of 3: 3.96 us per loop

In [13]: %timeit [ get_datevalue(v,pr.freq) for v in pr ]      
10 loops, best of 3: 116 ms per loop

Though i don't think this is actually causing the bug, they return the same results AFAICT

jorisvandenbossche · 2014-12-10T14:13:44Z

In this case they return the same (as it are periods), but not when you provide eg ints:

In [2]: values = [0, 1]

In [3]: [ get_datevalue(v, 'B') for v in values ]
Out[3]: [0, 1]

In [4]: pd.PeriodIndex(values, freq='D').values
...
DateParseError: day is out of range for month

jorisvandenbossche · 2014-12-10T14:16:09Z

And the thing is, if it is a PeriodIndex, it is already special cased: https://github.com/pydata/pandas/blob/v0.15.1/pandas/tseries/converter.py#L116, so this will not impact perfomance.

But, I think if you just do ts.plot() the x values are not provided as a PeriodIndex, but as an ndarray of Periods, so that hits the line in question, and in this way it will indeed impact performance.

jreback · 2014-12-10T14:18:13Z

ok, so obviously need a test for that, I thing should be straightforward to fix. The thing is to avoid reinterpreting the freq when you already have a PeriodIndex.

jorisvandenbossche · 2014-12-10T14:23:44Z

Maybe specifically check for an array of Periods?

In [38]: pr = pd.period_range('2000',periods=10000,freq='D')

In [39]: a = np.array([p for p in pr])

In [40]: a
Out[40]:
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), ..., Period('2027-05-16', 'D'),
       Period('2027-05-17', 'D'), Period('2027-05-18', 'D')], dtype=object)

In [41]: pd.lib.infer_dtype(a)
Out[41]: 'period'

In [42]: pd.core.common.is_period_arraylike(a)
Out[42]: True

This is the performance difference (that you fixed for the general time series plotting):

In [49]: pd.PeriodIndex(a, freq='D').values
Out[49]: array([10957, 10958, 10959, ..., 20954, 20955, 20956], dtype=int64)

In [50]: [get_datevalue(x, 'D') for x in a][0:5]
Out[50]: [10957L, 10958L, 10959L, 10960L, 10961L]

In [51]: %timeit pd.PeriodIndex(a, freq='D').values
100 loops, best of 3: 7.88 ms per loop

In [52]: %timeit [get_datevalue(x, 'D') for x in a]
10 loops, best of 3: 167 ms per loop

In [53]: s = pd.Series(np.random.randn(10000), index=pd.date_range('2012-01-01', periods=10000, freq='s'))

# current master
In [36]: %%timeit
   ....: fig, ax = plt.subplots()
   ....: s.plot(ax=ax)

1 loops, best of 3: 207 ms per loop

# patched with how it was before
In [38]: %%timeit
   ....: fig, ax = plt.subplots()
   ....: s.plot(ax=ax)

1 loops, best of 3: 353 ms per loop

jreback · 2014-12-10T14:32:14Z

you prob can use isinstance(a, np.ndarray) and is_period_arraylike(a)

because then a PeriodIndex would be excluded

jorisvandenbossche · 2014-12-10T14:36:17Z

yes, but I don't think that is needed, because: PeriodIndex is already checked for in an if before that line, and also lists of Periods can go directly to PeriodIndex(..) I think

jreback · 2014-12-10T14:36:44Z

ok then!

jorisvandenbossche · 2014-12-10T14:38:03Z

ah, but is_period_arraylike does not seem to handle lists of Periods (gives False)

jreback · 2014-12-10T14:39:49Z

In [1]: pr = pd.period_range('20130101',periods=3,freq='D')

In [2]: pr.tolist()
Out[2]: 
[Period('2013-01-01', 'D'),
 Period('2013-01-02', 'D'),
 Period('2013-01-03', 'D')]

In [3]: np.array(pr.tolist())
Out[3]: 
array([Period('2013-01-01', 'D'), Period('2013-01-02', 'D'),
       Period('2013-01-03', 'D')], dtype=object)

In [4]: pandas.core.common.is_period_arraylike(np.array(pr.tolist()))
Out[4]: True

really

jorisvandenbossche · 2014-12-10T14:42:29Z

yeah, but I expected it to handle lists without having to convert it to array:

In [67]: pandas.core.common.is_period_arraylike(pr.tolist())
Out[67]: False

But in any case, not going to worry about that particular case for now

jreback · 2014-12-10T14:42:49Z

ahh, ok, that makes sense then (it assuem ndarray ness)

jorisvandenbossche · 2014-12-10T14:46:08Z

OK, updated the PR, no time now to further look at it, but will try to add tests later this evening

jreback · 2014-12-10T14:46:30Z

ok, will try to do the release tonight in that case

For example coming up when [0, 1] is passed by axhline Restore old behaviour, but special case arrays of Periods to keep the performance gain of using PeriodIndex instead of get_datevalue for that case.

jreback · 2014-12-11T01:34:14Z

merged via b69b9d3

jorisvandenbossche added the Visualization plotting label Dec 10, 2014

jorisvandenbossche added this to the 0.15.2 milestone Dec 10, 2014

jorisvandenbossche force-pushed the gh9012 branch from 71418ea to 879d6db Compare December 10, 2014 14:44

BUG: fix PeriodConverter issue when given a list of integers (GH9012)

200f6fa

For example coming up when [0, 1] is passed by axhline Restore old behaviour, but special case arrays of Periods to keep the performance gain of using PeriodIndex instead of get_datevalue for that case.

jorisvandenbossche force-pushed the gh9012 branch from 879d6db to 200f6fa Compare December 10, 2014 14:48

jreback closed this Dec 11, 2014

jreback mentioned this pull request Dec 11, 2014

Plotting problem in pandas 0.15.1 for DataFrame with datetime-index #9012

Closed

jorisvandenbossche mentioned this pull request Dec 11, 2014

TST: start tests for PeriodConverter #9055

Merged

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: fix PeriodConverter issue when given a list of integers (GH9012) #9050

BUG: fix PeriodConverter issue when given a list of integers (GH9012) #9050

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jreback commented Dec 11, 2014

BUG: fix PeriodConverter issue when given a list of integers (GH9012) #9050

BUG: fix PeriodConverter issue when given a list of integers (GH9012) #9050

Conversation

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jorisvandenbossche commented Dec 10, 2014

jreback commented Dec 10, 2014

jreback commented Dec 11, 2014