Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: restore DatetimeIndex.__iter__ performance by using non-EA implementation #26702

Merged
merged 1 commit into from Jun 7, 2019

Conversation

@qwhelan
Copy link
Contributor

commented Jun 7, 2019

The timeseries.TimeIteration.time_iter benchmark shows a 6.4x regression here: https://qwhelan.github.io/pandas/#timeseries.Iteration.time_iter?p-time_index=%3Cfunction%20date_range%3E&commits=08395af4-fc24c2cd . An iteration of asv find blames 1fc76b8#diff-26a6d2ca7adfca586aabbb1c9dd8bf36R74 . I believe what is happening is:

  • pandas.core.indexes.datetimelike.ea_passthrough was modified from calling getattr() to being passed a classmethod directly
  • The corresponding change in DatetimeIndexOpsMixin.__init__ then implicitly changed __iter__ from DatetimeArray.__iter__ to DatetimeLikeArrayMixin.__iter__. The latter must assume an ExtensionArray, so is intrinsically slower.
  • This regression was then hidden in asv due to inadvertent inclusion of memory addresses when using certain parameters. This is resolved in airspeed-velocity/asv#771 and explains why my asv results shows this regression while http://pandas.pydata.org/speed/pandas/#timeseries.Iteration.time_iter? does not

Reverting back to old behavior yields a 7x speedup:

$ asv compare HEAD^ HEAD --only-changed
       before           after         ratio
     [649ad5c9]       [4d0c13cc]
     <any_all_fix>       <datetime_iter>
-      3.30±0.02s          469±6ms     0.14  timeseries.Iteration.time_iter(<function date_range>)
-        34.7±1ms       11.2±0.2ms     0.32  timeseries.Iteration.time_iter_preexit(<function date_range>)
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@codecov

This comment has been minimized.

Copy link

commented Jun 7, 2019

Codecov Report

Merging #26702 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26702      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50701    50702       +1     
==========================================
- Hits        46588    46584       -4     
- Misses       4113     4118       +5
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.91% <100%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.37% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.6% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 649ad5c...4d0c13c. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Jun 7, 2019

Codecov Report

Merging #26702 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26702      +/-   ##
==========================================
- Coverage   91.88%   91.88%   -0.01%     
==========================================
  Files         174      174              
  Lines       50702    50703       +1     
==========================================
- Hits        46590    46587       -3     
- Misses       4112     4116       +4
Flag Coverage Δ
#multiple 90.42% <100%> (ø) ⬆️
#single 41.9% <100%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.37% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ff4f38...a1e703f. Read the comment docs.

@TomAugspurger
Copy link
Contributor

left a comment

LGTM. cc @jbrockmendel if you have a chance to verify.

Release note in 0.25.0.rst would be nice to have @qwhelan. We have a performance improvements section.

@TomAugspurger TomAugspurger added this to the 0.25.0 milestone Jun 7, 2019

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Jun 7, 2019

LGTM. Only question is if in principle we should be wrapping self._data.__iter__, but I don't think its worth the hassle.

@qwhelan qwhelan force-pushed the qwhelan:datetime_iter branch from 4d0c13c to a1e703f Jun 7, 2019

@qwhelan

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

Thanks. In theory, a similar chunking trick could be done for PeriodArray & TimedeltaArray, right? The only thing to change would be the function doing the conversion in

            converted = tslib.ints_to_pydatetime(data[start_i:end_i],
                                                 tz=self.tz, freq=self.freq,
                                                 box="timestamp")

Happy to leave that as a followup.

@jreback jreback merged commit cdc07fd into pandas-dev:master Jun 7, 2019

11 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190607.35 succeeded
Details
pandas-dev.pandas (Checks) Checks succeeded
Details
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
Details
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
Details
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
Details
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
Details
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
Details
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

thanks @qwhelan

@qwhelan qwhelan deleted the qwhelan:datetime_iter branch Jun 7, 2019

@qwhelan

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2019

@jreback Thanks for merging!

@TomAugspurger Yes, definitely a possibility. There's probably mid-single-digit factor speedups available there. Unfortunately, I'm headed to Europe for vacation next week and starting a new job shortly thereafter so can't promise to make any headway on those.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 7, 2019

datapythonista added a commit to datapythonista/pandas that referenced this pull request Jun 10, 2019

Docs azure (#8)
* CI/DOC: Building the documentation with azure-pipelines

* Synchronizing requirements-dev.txt

* Changing dependencies of the doc build in azure

* Reverting changes to dependencies

* Fixing typos, and improving instructions about private/public keys

* Adding missing parenthesis in comment

* DEPS: Using cpplint from conda-forge (pandas-dev#26693)

* DEPS: Using cpplint from conda-forge

* Perf: use python int in RangeIndex.get_loc (pandas-dev#26697)

* BUG: TDI/PI comparison with zero-dim array (pandas-dev#26707)

* Add Sponsors Button to GitHub Repo (pandas-dev#26694)

* PERF: restore DatetimeIndex.__iter__ performance by using non-EA implementation (pandas-dev#26702)

* TST: avoid matplotlib warnings in tests (pandas-dev#26678)

* TST: avoid matplotlib warnings in tests

* ASV: remove large IntervalIndex benchmark (pandas-dev#26712)

* Updated error messages (pandas-dev#26726)

* TST: skip assert for mpl 3.1+ (pandas-dev#26715)

* TST: skip assert for mpl 3.1+

* MPL compat

* remove pyarrow

* mpl compat

* CI: pin to sphinx 2.0.1 (pandas-dev#26727)

* Fixturize frame/test_rank.py (pandas-dev#26733)

* Fix mpl 3.0.3 grid settings test failure (pandas-dev#26737)

* Excel Tests Continued Fixture Cleanup (pandas-dev#26662)

* TST/CLN: remove TestData from tests\frame\test_rank.py (pandas-dev#26739)

*  TST: return pytest MarkDecorator from td.skip_if_no (pandas-dev#26735)

* Remove isort dep (pandas-dev#26732)

* PERF: building MultiIndex with categorical levels (pandas-dev#26721)

* Move ExcelWriter Tests (pandas-dev#26740)

* Remove isort dep (pandas-dev#26741)

* Add typing annotation to assert_index_equal (pandas-dev#26535)

* Convert Unions to TypeVar (pandas-dev#26588)

* PLOT: Split matplotlib specific code from pandas plotting (pandas-dev#26414)

* CLN: remove Panel-specific parts of core.generic (pandas-dev#26738)

* Split test_excel into submodule (pandas-dev#26755)

* DOC: Skip creating redirects if docs build fails (pandas-dev#26752)

* TST/CLN: reuse float_frame fixture in tests\io\formats\test_format.py (pandas-dev#26756)

* BUG: .isin on datetimelike indexes do not validate input of level parameter (pandas-dev#26677)

* Not failing the build if the push of the docs fails (making the push fail to test if it works)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.