PERF: improved performance of CategoricalIndex.is_monotonic* #21025

topper-123 · 2018-05-14T00:16:10Z

passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

>>> n = 1000000
>>> ci = pd.CategoricalIndex(list('a' * n + 'b' * n + 'c' * n))
>>> %timeit ci.is_monotonic_increasing
22 ms # v0.22 and master
227 ns  # this commit

There seem to be a few more like this, where CategoricalIndex should use self._engine but doesn't.

@TomAugspurger?

jreback · 2018-05-14T00:23:05Z

this hit the same code path; so check this

topper-123 · 2018-05-14T00:36:20Z

Not sure I follow, but these two versions do not follow the same code path, as the old version required creating a new Int64Index which is expensive.
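The difference between the two code paths can be sketched in plain Python. This is only a toy model, not the pandas implementation: `CodesEngine` and `ToyCategoricalIndex` are hypothetical names, and the "slow" method merely stands in for rebuilding a throwaway index object on every call.

```python
from functools import cached_property


class CodesEngine:
    """Hypothetical stand-in for an index engine wrapping integer codes."""

    def __init__(self, codes):
        self._codes = codes

    @cached_property
    def is_monotonic_increasing(self):
        # Single pass over the codes; the result is cached on the engine.
        codes = self._codes
        return all(a <= b for a, b in zip(codes, codes[1:]))


class ToyCategoricalIndex:
    """Hypothetical toy index: values stored as codes into `categories`."""

    def __init__(self, values, categories):
        lookup = {cat: i for i, cat in enumerate(categories)}
        self.codes = [lookup[v] for v in values]

    @cached_property
    def _engine(self):
        # Built once per index and reused by every later property access.
        return CodesEngine(self.codes)

    def is_monotonic_increasing_slow(self):
        # Copies the codes and builds a fresh engine on every call.
        return CodesEngine(list(self.codes)).is_monotonic_increasing

    @property
    def is_monotonic_increasing(self):
        # Delegates to the cached engine, so repeat calls do no new work.
        return self._engine.is_monotonic_increasing
```

Both methods return the same answer; the cached path only pays for the scan once, which is the kind of gap the timings above suggest.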

CategoricalIndex.is_monotonic is already tested in indexes/test_category.py::TestCategoricalIndex::test_is_monotonic.

codecov · 2018-05-14T01:20:37Z

Codecov Report

Merging #21025 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #21025   +/-   ##
=======================================
  Coverage   91.83%   91.83%           
=======================================
  Files         153      153           
  Lines       49495    49495           
=======================================
  Hits        45454    45454           
  Misses       4041     4041

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.23% <100%> (ø)` | ⬆️ |
| #single | `41.88% <0%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/indexes/category.py | `97.03% <100%> (ø)` | ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 501f041...c815d62.

jreback · 2018-05-14T12:32:35Z

can you add additional tests using strings (and not just integers) in that same test. otherwise lgtm.
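A minimal sketch of what such a check might look like with both integer and string data. The helper name `check_is_monotonic` is hypothetical and this is not the test that was actually added to `test_category.py`; it assumes only the public `pd.CategoricalIndex` constructor and its `categories` argument.

```python
import pandas as pd


def check_is_monotonic(data):
    # `data` is assumed to be sorted and unique, so the default
    # category order matches the order of the values.
    c = pd.CategoricalIndex(data)
    assert c.is_monotonic_increasing
    assert not c.is_monotonic_decreasing

    # Reversing the category order flips the result, because
    # monotonicity follows the category order, not the raw values.
    c = pd.CategoricalIndex(data, categories=list(reversed(data)))
    assert not c.is_monotonic_increasing
    assert c.is_monotonic_decreasing


check_is_monotonic([1, 2, 3])
check_is_monotonic(list("abc"))
```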

jreback · 2018-05-14T12:32:57Z

do we have sufficient asv's for this?

topper-123 · 2018-05-14T17:10:06Z

There were no asv's for this. However, running my code snippet above on the old version causes a huge spike in RAM usage. I've even gotten a few MemoryErrors.

So my ASV uses only N = 1000 to limit memory usage. The result is 60 microseconds (old version) vs 260 ns (new version).

Also, Series.is_monotonic* wasn't added until 0.19. Should that be put in a try/except clause to avoid failing on older versions of pandas?

jreback

minor comment on the asv. it's ok if it fails under 0.19, that's pretty far back now

jreback · 2018-05-14T23:58:49Z

asv_bench/benchmarks/categoricals.py

+        self.c = pd.CategoricalIndex(list('a'*N + 'b'*N + 'c'*N))
+        self.s = pd.Series(self.c)
+
+    def time_categorical_index_is_monotonic(self):


these shouldn't be in the same asv, you can do this with params I think
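One way the `params` suggestion might look, as a sketch rather than the benchmark that was merged. The class name `IsMonotonic` and the parameter name `kind` are hypothetical; `params`, `param_names`, `setup` and the `time_` prefix follow asv's usual conventions.

```python
import pandas as pd


class IsMonotonic:
    # asv runs each time_* method once per value in `params`.
    params = ["index", "series"]
    param_names = ["kind"]

    def setup(self, kind):
        N = 1000
        index = pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N))
        # Hypothetical layout: pick the object under test from the parameter.
        self.obj = index if kind == "index" else pd.Series(index)

    def time_is_monotonic_increasing(self, kind):
        self.obj.is_monotonic_increasing

    def time_is_monotonic_decreasing(self, kind):
        self.obj.is_monotonic_decreasing
```

With this layout the index and the Series are reported as separate benchmark rows instead of sharing one setup.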

pep8speaks · 2018-05-15T16:15:39Z

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 16, 2018 at 19:04 Hours UTC

jreback · 2018-05-15T23:43:46Z

doc/source/whatsnew/v0.23.0.txt

@@ -1079,6 +1079,7 @@ Performance Improvements
 - Improved performance of :func:`pandas.core.groupby.GroupBy.pct_change` (:issue:`19165`)
 - Improved performance of :func:`Series.isin` in the case of categorical dtypes (:issue:`20003`)
 - Improved performance of ``getattr(Series, attr)`` when the Series has certain index types. This manifiested in slow printing of large Series with a ``DatetimeIndex`` (:issue:`19764`)
+- Improved performance of :meth:`CategoricalIndex.is_monotonic_increasing`, :meth:`CategoricalIndex.is_monotonic_decreasing` and :meth:`CategoricalIndex.is_monotonic` (:issue:`21025`)


will need to be in 0.23.1 (not yet in repo, soon)

Moved to 0.23.1.

jreback · 2018-05-17T00:21:58Z

thanks @topper-123

topper-123 force-pushed the is_monotonic_perf branch from b7f6e04 to a775186 Compare May 14, 2018 00:17

topper-123 force-pushed the is_monotonic_perf branch from a775186 to 3378b3a Compare May 14, 2018 00:27

jreback added the Performance and Categorical labels May 14, 2018

topper-123 force-pushed the is_monotonic_perf branch 2 times, most recently from 1ee1d93 to 6bdbb5d Compare May 14, 2018 17:06

jreback reviewed May 14, 2018

topper-123 force-pushed the is_monotonic_perf branch from 6bdbb5d to 6d4aea9 Compare May 15, 2018 16:15

topper-123 force-pushed the is_monotonic_perf branch from 6d4aea9 to 2e34678 Compare May 15, 2018 16:19

jreback requested changes May 15, 2018

jreback added this to the 0.23.1 milestone May 15, 2018

improved performance of CategoricalIndex.is_monotonic*

c815d62

topper-123 force-pushed the is_monotonic_perf branch from 2e34678 to c815d62 Compare May 16, 2018 19:04

jreback approved these changes May 17, 2018

jreback merged commit 1ee5ecf into pandas-dev:master May 17, 2018

jreback added the Needs Backport label May 17, 2018

topper-123 deleted the is_monotonic_perf branch May 21, 2018 21:00

jorisvandenbossche removed the Needs Backport label Jun 8, 2018

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jun 8, 2018

improved performance of CategoricalIndex.is_monotonic* (pandas-dev#21025) (cherry picked from commit 1ee5ecf)

57f6f45

fjetter mentioned this pull request Jun 9, 2018

PERF: __contains__ method for Categorical #21022

Closed

4 tasks

jorisvandenbossche pushed a commit that referenced this pull request Jun 9, 2018

improved performance of CategoricalIndex.is_monotonic* (#21025)

e469400

(cherry picked from commit 1ee5ecf)
