PERF: fastpath on Period construction from PeriodIndex #5155

Closed
jreback opened this Issue Oct 8, 2013 · 11 comments

Comments

Projects
None yet
3 participants
Contributor

jreback commented Oct 8, 2013

vbench already in timeseries/period_setitem
very inefficient because on the boxing of the PeriodIndex (to an object index), it recomputes the freq for each Period (even though it is passed), need a fastpath on this type of construction

In [1]: rng = period_range('1/1/1990', freq='S', periods=20000)

In [2]: df = DataFrame(index=range(len(rng)))

In [3]: def f():
   ...:     df['col'] = rng
   ...:     
   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    40000    0.055    0.000    0.095    0.000 abc.py:128(__instancecheck__)
    20000    0.043    0.000    0.345    0.000 period.py:66(__init__)
    20000    0.037    0.000    0.381    0.000 period.py:683(<lambda>)
    80031    0.037    0.000    0.131    0.000 {isinstance}
    20000    0.034    0.000    0.223    0.000 frequencies.py:76(get_freq_code)
    60003    0.029    0.000    0.029    0.000 _weakrefset.py:68(__contains__)
    20000    0.028    0.000    0.063    0.000 frequencies.py:464(_base_and_stride)
    20000    0.028    0.000    0.028    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
    20000    0.027    0.000    0.038    0.000 frequencies.py:767(_period_str_to_code)
    40000    0.023    0.000    0.144    0.000 common.py:1730(is_integer)
        1    0.016    0.016    0.398    0.398 {pandas.lib.map_infer}
    40007    0.010    0.000    0.010    0.000 {getattr}
    20000    0.009    0.000    0.012    0.000 frequencies.py:112(_get_freq_str)
    60000    0.008    0.000    0.008    0.000 {method 'get' of 'dict' objects}
    40000    0.006    0.000    0.006    0.000 {method 'group' of '_sre.SRE_Match' objects}
    20000    0.004    0.000    0.004    0.000 {method 'lower' of 'str' objects}
    20000    0.002    0.000    0.002    0.000 {method 'upper' of 'str' objects}
    20015    0.002    0.000    0.002    0.000 {len}
Contributor

jtratner commented Oct 15, 2013

@jreback did you resolve this already? I might have to deal with this soon anyways, as I'm about to tackle Period and DatetimeIndex fixups.

Contributor

jreback commented Oct 15, 2013

nope
it shouldn't be too hard; Period needs a fast path where it just sets stuff

it works now but does lots of work when u have a long PeriodIndex

Contributor

jreback commented Oct 15, 2013

IOW it reinterprets the freq code on every Period construction even when u r passing valid values (but it doesn't know that)

Contributor

jtratner commented Oct 15, 2013

You should also make sure you're fastpathing asobject (not sure if it matters for PI though)

Contributor

jreback commented Oct 15, 2013

not planning in this till 0.14

Contributor

jtratner commented Oct 15, 2013

got it

On Mon, Oct 14, 2013 at 11:10 PM, jreback notifications@github.com wrote:

not planning in this till 0.14


Reply to this email directly or view it on GitHubhttps://github.com/pydata/pandas/issues/5155#issuecomment-26306475
.

Contributor

jreback commented Apr 9, 2014

revisit after #5148

Contributor

nehalecky commented Apr 24, 2014

Any relevance: pydata#2949?

Contributor

jreback commented Apr 24, 2014

that's a separate issue...this is a pretty easy fix

Contributor

nehalecky commented Apr 24, 2014

Thanks, just wanted to throw it in the mix. I love (conceptually) PeriodIndexs, but they need a lot of tlc soon to be all they can be. :)

@jreback jreback modified the milestone: 0.15.0, 0.14.0 Apr 25, 2014

jreback referenced this issue Aug 3, 2014

Merged

CLN/INT: remove Index as a sub-class of NDArray #7891

11 of 11 tasks complete

@jreback jreback modified the milestone: 0.15.1, 0.15.0 Aug 3, 2014

Contributor

jreback commented Aug 3, 2014

@nehalecky turns out this was pretty trivial, see #7891

Period constructor has lots of inference going on. So when you know that you simply want to wrap an ordinal/freq its trivial to construct. (you can usually only do this AFTER its constructed, but that's usually when you pay the cost, eg. when passing to matplotlib or setting actual periods in a data frame).

jreback closed this in #7891 Aug 7, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment