Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: passing TDA and wrong freq to TimedeltaIndex #31268

Merged
merged 4 commits into from
Jan 25, 2020

Conversation

jbrockmendel
Copy link
Member

Caught while working towards making TimedeltaArray._simple_new more strict about its inputs.

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there perf implications of going through _from_sequence instead of _shallow_copy / _simple_new ?

@jbrockmendel
Copy link
Member Author

Are there perf implications of going through _from_sequence instead of _shallow_copy / _simple_new ?

Yes, this removes two fast-paths. The alternative is adding the relevant checks to those fastpaths, but I prefer to avoid the duplication that entails.

In [3]: tdi = pd.timedelta_range("1 day", periods=10000)                                                                                                                                                                                                
In [4]: tda = tdi._data                                                                                                                                                                                                                                 
In [5]: %timeit pd.TimedeltaIndex(tdi)                                                                                                                                                                                                                  
7.48 µs ± 261 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)   # <-- master
20.6 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)   # <-- PR

In [6]: %timeit pd.TimedeltaIndex(tda)                                                                                                                                                                                                                  
4.93 µs ± 49.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)   # <-- master
17.4 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)   # <-- PR

Once this bug is fixed, we'll be able to simplify TDA/TDI._simple_new which should make up some lost ground.

Also there's an easy optimization to make in sequence_to_td64ns that brings the times here to 15.5us and 11.9us, respectively.

@jbrockmendel
Copy link
Member Author

I guess we could just add and freq is None to the TDA check and keep the fastpaths.

@jorisvandenbossche
Copy link
Member

I guess we could just add and freq is None to the TDA check and keep the fastpaths.

That sounds good to me as well. In any case, I would find it a bit strange that passing an TimedeltaArray to TimedeltaIndex(..) wouldn't be a relatively fast path.

@jbrockmendel
Copy link
Member Author

updated to restore fastpath, just skips it if freq is non-None

@jreback jreback added Timedelta Timedelta data type Frequency DateOffsets labels Jan 25, 2020
@jreback jreback added this to the 1.1 milestone Jan 25, 2020
@jreback jreback merged commit 839e7f1 into pandas-dev:master Jan 25, 2020
@jreback
Copy link
Contributor

jreback commented Jan 25, 2020

thanks, if we don't have already, some asvs on Index constructions from equiv Arrays would be good.

@jbrockmendel jbrockmendel deleted the bug-tdi-freq-constructor branch January 25, 2020 16:31
keechongtan added a commit to keechongtan/pandas that referenced this pull request Jan 27, 2020
…ndexing-1row-df

* upstream/master: (194 commits)
  DOC Remove Python 2 specific comments from documentation (pandas-dev#31198)
  Follow up PR: pandas-dev#28097 Simplify branch statement (pandas-dev#29243)
  BUG: DatetimeIndex.snap incorrectly setting freq (pandas-dev#31188)
  Move DataFrame.info() to live with similar functions (pandas-dev#31317)
  ENH: accept a dictionary in plot colors (pandas-dev#31071)
  PERF: add shortcut to Timestamp constructor (pandas-dev#30676)
  CLN/MAINT: Clean and annotate stata reader and writers (pandas-dev#31072)
  REF: define _get_slice_axis in correct classes (pandas-dev#31304)
  BUG: DataFrame.floordiv(ser, axis=0) not matching column-wise bheavior (pandas-dev#31271)
  PERF: optimize is_scalar, is_iterator (pandas-dev#31294)
  BUG: Series rolling count ignores min_periods (pandas-dev#30923)
  xfail sparse warning; closes pandas-dev#31310 (pandas-dev#31311)
  REF: DatetimeIndex.get_value wrap DTI.get_loc (pandas-dev#31314)
  CLN: internals.managers (pandas-dev#31316)
  PERF: avoid copies if possible in fill_binop (pandas-dev#31300)
  Add test for multiindex json (pandas-dev#31307)
  BUG: passing TDA and wrong freq to TimedeltaIndex (pandas-dev#31268)
  BUG: inconsistency between PeriodIndex.get_value vs get_loc (pandas-dev#31172)
  CLN: remove _set_subtyp (pandas-dev#31301)
  CI: Updated version of macos image (pandas-dev#31292)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants