Rank categorical perf #15518

jeetjitsu · 2017-02-27T13:54:13Z

closes PERF: categorical rank #15498
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew

codecov-io · 2017-02-27T14:44:37Z

Codecov Report

Merging #15518 into master will decrease coverage by -0.71%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15518      +/-   ##
==========================================
- Coverage   91.06%   90.36%   -0.71%     
==========================================
  Files         136      136              
  Lines       49099    49556     +457     
==========================================
+ Hits        44714    44783      +69     
- Misses       4385     4773     +388

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.48% <ø> (ø)`	✅
pandas/core/categorical.py	`96.92% <100%> (+0.01%)`	✅
pandas/io/gbq.py	`16.46% <0%> (-70.21%)`	❌
pandas/util/decorators.py	`68.6% <0%> (-0.63%)`	❌
pandas/tools/merge.py	`91.78% <0%> (-0.35%)`	❌
pandas/core/common.py	`91.02% <0%> (-0.34%)`	❌
pandas/types/cast.py	`85.42% <0%> (-0.24%)`	❌
pandas/core/groupby.py	`94.98% <0%> (-0.02%)`	❌
pandas/core/frame.py	`97.83% <0%> (-0.01%)`	❌
pandas/core/generic.py	`96.31% <0%> (-0.01%)`	❌
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d0a281f...30b49b9. Read the comment docs.

jreback · 2017-02-27T14:50:41Z

can you add an asv similar to from the issue (ideally tests multple dtypes as well)

jreback · 2017-02-27T14:51:03Z

also a whatsnew (in Performance) entry would be great. code lgtm.

jorisvandenbossche

Looks good to me!

jorisvandenbossche · 2017-02-27T20:44:08Z

pandas/tests/series/test_analytics.py

+            [1, 2, 3, 4, 5, 6],
+        ).astype('category').cat.set_categories(
+            [1, 2, 3, 4, 5, 6],
+            ordered=False


you can simplify this creation by doing astype('category', categories=[1, 2, 3, 4, 5, 6], ordered=False)

jorisvandenbossche · 2017-02-27T20:46:27Z

pandas/core/categorical.py

            values = np.array(self)
+        else:
+            values = np.array(
+                self.rename_categories(Series(self.categories).rank())


Can you add a small comment on why do this here?

jeetjitsu · 2017-03-01T09:29:05Z

I have made all the minor changes but can't seem to run the asv benchmarks. Tried running them on upstream/master as well but the tests fail with the following error

TypeError: unbound methods must have non-NULL im_class

I used the following command to run asv benchmarks

asv continuous -f 1.1 -E virtualenv upstream/master HEAD -b ^categorical

Googling for a solution hasn't helped. If you can point me towards a solution i will run the benchmarks and push my changes.

Here's a complete trace

� Creating environments
� Discovering benchmarks
�� Uninstalling from virtualenv-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
�� Installing into virtualenv-py2.7-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt..
�� Error running /home/jeet/Projects/pandas/asv_bench/env/dbb8113eac34359a9ef26b93e54a1d03/bin/python /home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmark.py discover /home/jeet/Projects/pandas/asv_bench/benchmarks /tmp/tmpklwg1xod
             STDOUT -------->
             
             STDERR -------->
             Traceback (most recent call last):
               File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmark.py", line 818, in <module>
                 commands[mode](args)
               File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmark.py", line 763, in main_discover
                 list_benchmarks(benchmark_dir, fp)
               File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmark.py", line 748, in list_benchmarks
                 for benchmark in disc_benchmarks(root):
               File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmark.py", line 653, in disc_benchmarks
                 for name, class_attr in inspect.getmembers(module_attr):
               File "/usr/lib/python2.7/inspect.py", line 253, in getmembers
                 value = getattr(object, key)
               File "/home/jeet/Projects/pandas/asv_bench/env/dbb8113eac34359a9ef26b93e54a1d03/local/lib/python2.7/site-packages/pandas/util/decorators.py", line 265, in __get__
                 return types.MethodType(self, instance)
             TypeError: unbound methods must have non-NULL im_class
Traceback (most recent call last):
  File "/home/jeet/.virtualenvs/pandas-dev/bin/asv", line 11, in <module>
    load_entry_point('asv==0.3.dev1099+a95b68de', 'console_scripts', 'asv')()
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/main.py", line 38, in main
    result = args.func(args)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/commands/__init__.py", line 48, in run_from_args
    return cls.run_from_conf_args(conf, args)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/commands/continuous.py", line 61, in run_from_conf_args
    quick=args.quick, **kwargs
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/commands/continuous.py", line 86, in run
    _returns=run_objs, _machine_file=_machine_file)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/commands/run.py", line 185, in run
    benchmarks = Benchmarks(conf, repo, environments, regex=bench)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmarks.py", line 336, in __init__
    benchmarks = self.disc_benchmarks(conf, repo, environments)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmarks.py", line 385, in disc_benchmarks
    dots=False)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/plugins/virtualenv.py", line 192, in run
    return self.run_executable('python', args, **kwargs)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/environment.py", line 556, in run_executable
    return util.check_output([exe] + args, **kwargs)
  File "/home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/util.py", line 571, in check_output
    raise ProcessError(args, retcode, stdout, stderr)
asv.util.ProcessError: Command '/home/jeet/Projects/pandas/asv_bench/env/dbb8113eac34359a9ef26b93e54a1d03/bin/python /home/jeet/.virtualenvs/pandas-dev/lib/python3.6/site-packages/asv/benchmark.py discover /home/jeet/Projects/pandas/asv_bench/benchmarks /tmp/tmpklwg1xod' returned non-zero exit status 1

jorisvandenbossche · 2017-03-01T11:01:02Z

@ikilledthecat Can you push your changes anyway (regardless whether you get asv running), so we can already review (and getting the asv benchs running is not a blocker for merging this PR in this case).

Regarding the asv, I don't have experience with running it using virtualenv, only used it with conda and never saw such an error.
As the error seems to come from asv itself, you may want to raise an issue on their tracker (https://github.com/spacetelescope/asv)

jorisvandenbossche · 2017-03-01T11:12:40Z

@ikilledthecat something went wrong with your rebase, as you have many other commits included now as well. Doing

git fetch upstream
git checkout rank_categorical_perf
git rebase upstream/master
git push -f origin rank_categorical_perf

should normally always do the right thing

no need to rename categories where they are already ordered

check for numeric instead of monotonic

no need to rename categories where they are already ordered

jeetjitsu · 2017-03-01T11:19:49Z

@jorisvandenbossche sorry, pushed after re-mergeing upstream/master.

jreback · 2017-03-01T19:22:45Z

you'll want to rebase on master as travis was having some issues

jreback · 2017-03-01T21:56:32Z

thanks @ikilledthecat

closes pandas-dev#15498 Author: Prasanjit Prakash <jeet@gmail.com> Closes pandas-dev#15518 from ikilledthecat/rank_categorical_perf and squashes the following commits: 30b49b9 [Prasanjit Prakash] PERF: GH15498 - pep8 changes ad38544 [Prasanjit Prakash] PERF: GH15498 - asv tests and whatsnew 1ebdb56 [Prasanjit Prakash] PERF: categorical rank GH#15498 a67cd85 [Prasanjit Prakash] PERF: categorical rank GH#15498 81df7df [Prasanjit Prakash] PERF: categorical rank GH#15498 45dd125 [Prasanjit Prakash] PERF: categorical rank GH#15498 33249b3 [Prasanjit Prakash] PERF: categorical rank GH#15498

jreback added Categorical Categorical Data Type Performance Memory or execution speed performance labels Feb 27, 2017

jorisvandenbossche reviewed Feb 27, 2017

View reviewed changes

jeet63 added 6 commits March 1, 2017 16:43

PERF: categorical rank GH#15498

33249b3

PERF: categorical rank GH#15498

45dd125

no need to rename categories where they are already ordered

PERF: categorical rank GH#15498

81df7df

check for numeric instead of monotonic

PERF: categorical rank GH#15498

a67cd85

PERF: categorical rank GH#15498

1ebdb56

no need to rename categories where they are already ordered

PERF: GH15498 - asv tests and whatsnew

ad38544

jeetjitsu force-pushed the rank_categorical_perf branch from 2e790d2 to ad38544 Compare March 1, 2017 11:18

PERF: GH15498 - pep8 changes

30b49b9

jreback added this to the 0.20.0 milestone Mar 1, 2017

jreback closed this in 1c106c8 Mar 1, 2017

jeetjitsu deleted the rank_categorical_perf branch March 1, 2017 23:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rank categorical perf #15518

Rank categorical perf #15518

jeetjitsu commented Feb 27, 2017 •

edited

Loading

codecov-io commented Feb 27, 2017 •

edited

Loading

jreback commented Feb 27, 2017

jreback commented Feb 27, 2017

jorisvandenbossche left a comment

jorisvandenbossche Feb 27, 2017

jorisvandenbossche Feb 27, 2017

jeetjitsu commented Mar 1, 2017 •

edited

Loading

jorisvandenbossche commented Mar 1, 2017

jorisvandenbossche commented Mar 1, 2017

jeetjitsu commented Mar 1, 2017

jreback commented Mar 1, 2017

jreback commented Mar 1, 2017

Rank categorical perf #15518

Rank categorical perf #15518

Conversation

jeetjitsu commented Feb 27, 2017 • edited Loading

codecov-io commented Feb 27, 2017 • edited Loading

Codecov Report

jreback commented Feb 27, 2017

jreback commented Feb 27, 2017

jorisvandenbossche left a comment

Choose a reason for hiding this comment

jorisvandenbossche Feb 27, 2017

Choose a reason for hiding this comment

jorisvandenbossche Feb 27, 2017

Choose a reason for hiding this comment

jeetjitsu commented Mar 1, 2017 • edited Loading

jorisvandenbossche commented Mar 1, 2017

jorisvandenbossche commented Mar 1, 2017

jeetjitsu commented Mar 1, 2017

jreback commented Mar 1, 2017

jreback commented Mar 1, 2017

jeetjitsu commented Feb 27, 2017 •

edited

Loading

codecov-io commented Feb 27, 2017 •

edited

Loading

jeetjitsu commented Mar 1, 2017 •

edited

Loading