PERF: vectorized DateOffset with months #11205

Merged: jreback merged 1 commit into pandas-dev:master from the faster-offsets branch on Oct 2, 2015

Conversation

chris-b1
Contributor

This is a follow-up to #10744, which implemented vectorized versions of some offsets, mostly by converting to periods and back.

Shifting by years/months (which is actually the case most useful to me) required some extra hoops and performed worse. This PR implements a special Cython routine for that case, for about a 10x improvement.

In [3]: s = pd.Series(pd.date_range('1900-1-1', periods=100000))

# Master
In [4]: %timeit s + pd.DateOffset(months=1)
1 loops, best of 3: 140 ms per loop

# PR
In [2]: %timeit s + pd.DateOffset(months=1)
100 loops, best of 3: 14.2 ms per loop
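
For readers unfamiliar with month shifting, here is a minimal scalar sketch of the semantics a vectorized routine has to preserve: the month and year roll over, and the day is clamped to the last day of the target month. It uses only the standard library and is not the Cython code from this PR; the names `add_months` and `shift_months` below are hypothetical.

```python
import calendar
from datetime import datetime


def add_months(year, month, months):
    """Return the (year, month) reached by moving `months` months from (year, month)."""
    # Count months from zero so floor division also handles negative shifts.
    total = year * 12 + (month - 1) + months
    return total // 12, total % 12 + 1


def shift_months(dt, months):
    """Shift `dt` by whole months, clamping the day to the end of the target month."""
    year, month = add_months(dt.year, dt.month, months)
    last_day = calendar.monthrange(year, month)[1]
    # The time of day is left untouched; only the calendar fields change.
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))


print(shift_months(datetime(2015, 1, 31), 1))  # 2015-02-28 00:00:00
```

A vectorized version would apply the same arithmetic to every element of an int64 array in one compiled loop instead of calling a Python function per timestamp, which is presumably where the speedup reported above comes from.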

to_timedelta(base.days_in_month - 1, unit='D'))
i = base + day_offset + time
shifted = tslib.shift_months(i.asi8, months)
i = i._constructor(shifted)
Contributor

This will lose the name of the Index - I think you want to use _shallow_copy:

In [57]: df.index._constructor(df.index)
Out[57]: 
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
            19],
           dtype='int64')

In [58]: df.index._shallow_copy(df.index)
Out[58]: 
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
            19],
           dtype='int64', name='hi')

(there's a broader issue about the constructor keeping the name)
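
The same point can be illustrated with public API only. This is a small sketch, not the code under review: rebuilding an index from its raw values drops the name unless it is passed along explicitly. The variable names are illustrative, and `_shallow_copy` is deliberately not called here because it is a private method whose signature may differ between pandas versions.

```python
import pandas as pd

idx = pd.Index(range(20), name="hi")

# Rebuilding from the raw values alone loses the metadata.
rebuilt = pd.Index(idx.values)

# Carrying the name over explicitly keeps it.
named = pd.Index(idx.values, name=idx.name)

print(rebuilt.name)  # None
print(named.name)    # hi
```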

Contributor Author

It actually doesn't cause a problem in this case because the values are going to be unboxed/boxed here anyways:
https://github.com/pydata/pandas/blob/master/pandas/tseries/index.py#L716

But I'll change it if it's the more idiomatic way to get the constructor?

Contributor

Yes, _shallow_copy is the idiom.

@chris-b1
Contributor Author

@jreback - pushed changes for your comments and added MonthEnd and MonthBegin

For your comment about using this routine in the DateOffset, are you talking about cythonizing the actual apply method too? On that front, I see there is an offsets.pyx file, but it doesn't look like it's actually used?

@jreback jreback added the Performance (Memory or execution speed performance) and Frequency (DateOffsets) labels Oct 1, 2015
@@ -4386,6 +4387,73 @@ cpdef normalize_date(object dt):
    raise TypeError('Unrecognized type: %s' % type(dt))


cdef inline int _year_add_months(pandas_datetimestruct dts,
                                 int months):
Contributor

Can you add a docstring to these?

@jreback
Contributor

jreback commented Oct 1, 2015

@jreback jreback added this to the 0.17.0 milestone Oct 1, 2015
@chris-b1
Contributor Author

chris-b1 commented Oct 1, 2015

Made those doc changes. Yeah, there is an asv for DateOffset

    before     after       ratio
  [8ea8968 ] [c1d53f9 ]
-  121.83ms    18.49ms      0.15  timeseries.timeseries_datetimeindex_offset_fast.time_timeseries_datetimeindex_offset_fast
-  121.50ms    21.70ms      0.18  timeseries.timeseries_series_offset_fast.time_timeseries_series_offset_fast
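
For reference, the ratio column is the "after" time divided by the "before" time, so values below 1 mean the PR is faster. A quick sketch of that arithmetic with the numbers from the table (the helper name `ratio` is just for illustration):

```python
def ratio(before_ms, after_ms):
    """Return after/before, which is how the ratio column is read here."""
    return after_ms / before_ms


results = {
    "timeseries_datetimeindex_offset_fast": (121.83, 18.49),
    "timeseries_series_offset_fast": (121.50, 21.70),
}

for name, (before_ms, after_ms) in results.items():
    r = ratio(before_ms, after_ms)
    print(f"{name}: ratio={r:.2f}, speedup={1 / r:.1f}x")
```

On these two benchmarks that works out to roughly a 6.6x and a 5.6x speedup.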

jreback added a commit that referenced this pull request Oct 2, 2015
PERF: vectorized DateOffset with months
@jreback jreback merged commit 9fc9201 into pandas-dev:master Oct 2, 2015
@jreback
Contributor

jreback commented Oct 2, 2015

@chris-b1 thanks! These PRs are awesome! Keep 'em coming!

@chris-b1 chris-b1 deleted the faster-offsets branch October 4, 2015 17:01
Labels: Frequency (DateOffsets), Performance (Memory or execution speed performance)
3 participants