Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/PERF: offsets.apply doesnt preserve nanosecond #7697

Merged
merged 1 commit into from
Jul 25, 2014

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Jul 8, 2014

Main Fix is to preserve nanosecond info which can lost during offset.apply, but it also includes:

  • Support dateutil timezone
  • Little performance improvement. Even though v0.14.1 should take longer than v0.14.0 because perf test in v0.14 doesn't perform timestamp conversion which was fixed in BUG: offsets.apply may return datetime #7502.
    NOTE: This caches Tick.delta because it was calculated 3 times repeatedly, but does it cause any side effect?

Before

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_year_incr                         |   0.0164 |   0.0103 |   1.5846 |
timeseries_year_apply                        |   0.0153 |   0.0094 |   1.6356 |
timeseries_day_incr                          |   0.0187 |   0.0053 |   3.5075 |
timeseries_day_apply                         |   0.0164 |   0.0033 |   4.9048 |

Target [d0076db] : PERF: Improve index.min and max perf
Base   [da0f7ae] : RLS: 0.14.0 final

After the fix

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_year_incr                         |   0.0150 |   0.0087 |   1.7339 |
timeseries_year_apply                        |   0.0126 |   0.0073 |   1.7283 |
timeseries_day_incr                          |   0.0130 |   0.0053 |   2.4478 |
timeseries_day_apply                         |   0.0107 |   0.0033 |   3.2143 |

Target [64dd021] : BUG: offsets.apply doesnt preserve nanosecond
Base   [da0f7ae] : RLS: 0.14.0 final

@@ -26,15 +27,17 @@

# convert to/from datetime/timestamp to allow invalid Timestamp ranges to pass thru
def as_timestamp(obj):
if isinstance(obj, Timestamp):
return obj
if type(obj) == date:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be isinstance(obj, date)? not sure if you need to include (np.datetime64,datetime,date)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because `datetimeis a subclass ofdate``. Modified to use``isinstance``.

>>> dt = datetime.datetime(2011, 1, 1)
>>> isinstance(dt, datetime.date)
True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Timestamp can accept datetime.date. Thus this check seems not to be required at all.

@jreback jreback added this to the 0.15.0 milestone Jul 8, 2014
@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

@sinhrks revisit?

@sinhrks
Copy link
Member Author

sinhrks commented Jul 21, 2014

@jreback Rebased. Is any other thing required?

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

  • can you show an example of where this fails (in current master)
  • can you point to where Tick is calculated several times (currently)?

@sinhrks
Copy link
Member Author

sinhrks commented Jul 22, 2014

Affected offset (reset nanosecond)

If the aply logic includes datetime conversion, nanosecond will be lost.

  • CustomBusinessDay
  • CustomBusinessMonthEnd
  • CustomBusinessMonthBegin
  • MonthBegin
  • BusinessMonthBegin
  • MonthEnd
  • BusinessMonthEnd
  • YearBegin
  • BYearBegin
  • YearEnd
  • BYearEnd
  • QuarterBegin
  • BQuarterBegin

Affected offset (dateutil support)

tz.localize raises AttributeError: 'tzfile' object has no attribute 'localize' if tz is dateutil timezone.

  • BQuarterBegin
  • QuarterEnd
  • BQuarterEnd
  • LastWeekOfMonth
  • FY5253Quarter
  • FY5253
  • WeekOfMonth
  • Easter

@sinhrks
Copy link
Member Author

sinhrks commented Jul 22, 2014

can you point to where Tick is calculated several times (currently)?

Caused by these:

  • Check hasattr

https://github.com/pydata/pandas/blob/master/pandas/tslib.pyx#784

  • Twice again in nanosecond conversion (hasattr and actual addition)

https://github.com/pydata/pandas/blob/master/pandas/tslib.pyx#784

@jreback
Copy link
Contributor

jreback commented Jul 22, 2014

not sure what you mean (about Tick being calced more than once). The 2nd time is simply int_64 addition. AFAICT. delta_to_nanoseconds is necessary. What do you think this should change to?

@sinhrks
Copy link
Member Author

sinhrks commented Jul 22, 2014

Yeah all operations are necessary. What I meant is every time delta is being calculated, thus it may better to change cache_readonly
https://github.com/pydata/pandas/blob/master/pandas/tseries/offsets.py#L2016.

Actually this is not affects to performance so much, thus it is possible to leave it as normal property.

@jreback
Copy link
Contributor

jreback commented Jul 22, 2014

I think that delta and nanos could be cache_readonly. Once you create an offset they are not changed (I don't think). You can try that (but separate PR). I am not sure how to test that, maybe just trace the code and see.

@jreback
Copy link
Contributor

jreback commented Jul 22, 2014

@sinhrks otherwise this looks ok. It just changes a lot of code so trying to review.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 22, 2014

OK, modified to normal property.

@jreback
Copy link
Contributor

jreback commented Jul 22, 2014

seems, apply_wraps is on every apply, except for in DateOffset. maybe add a note there why this is (or is it right?)

value = result.value
result = Timestamp(value + nano)

if tz is not None and result.tzinfo is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be result= tslib._localize_pydatetime(result, tz) as well here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a flow for Timestamp, no need to care for datetime here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe add a note (or you can simply use the other routine). I found it confusing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Modified to use tslib._localize_pydatetime(result, tz) to avoid any confusion.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 24, 2014

DateOffset needs apply_wraps. I missed because of misunderstanding that DateOffset cannot be used by itself (#7375). Fixed and added tests.

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

ok, ping when green

@sinhrks
Copy link
Member Author

sinhrks commented Jul 24, 2014

@jreback now green.

jreback added a commit that referenced this pull request Jul 25, 2014
BUG/PERF: offsets.apply doesnt preserve nanosecond
@jreback jreback merged commit 415fbfc into pandas-dev:master Jul 25, 2014
@jreback
Copy link
Contributor

jreback commented Jul 25, 2014

@sinhrks thanks for this...cleans up a large amount of code.....

@sinhrks sinhrks deleted the offsetnano branch July 25, 2014 20:42
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Frequency DateOffsets Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants