jtratner
Contributor

Closes #5367. mode() is generally as fast as or faster than value_counts (which makes sense, because value_counts has to construct a Series), so it should be reasonably good performance-wise. It also doesn't get stuck on value_counts' pathological case (a huge array where the number of unique values is close to or equal to the size of the array).

Not using the result of hashtable.value_count() under the hood, and instead iterating over the klib table directly, gives a huge speedup. I just moved the value_count table-creation step into a separate function (zero perf hit). For performance breakdowns, check out this gist:

https://gist.github.com/jtratner/7225878
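
As a rough illustration of the idea above (build the counts table once, then let both value counting and mode read from it), here is a minimal pure-Python sketch. It is not the pandas or klib implementation: the function names `build_count_table`, `mode_from_table`, and `value_counts_from_table` are hypothetical, and a plain dict stands in for the hash table.

```python
from collections import defaultdict


def build_count_table(values):
    """Count occurrences of each value in a single pass (dict stands in for the hash table)."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    return counts


def mode_from_table(counts):
    """Return every value tied for the highest count, sorted; empty list for an empty table."""
    if not counts:
        return []
    max_count = max(counts.values())
    # Iterate over the table directly instead of building a sorted counts result first.
    return sorted(v for v, c in counts.items() if c == max_count)


def value_counts_from_table(counts):
    """Return (value, count) pairs ordered by descending count, then by value."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


data = [3, 1, 2, 3, 2, 5]
table = build_count_table(data)
print(mode_from_table(table))
print(value_counts_from_table(table))
```

The mode only needs one scan of the table for the maximum count, whereas a full value-counts result also has to order every entry, which is one plausible reason the direct approach comes out faster.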

DataFrame version delegates to Series' version at each level.
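
The delegation could look roughly like the sketch below, written in plain Python rather than against pandas internals. `series_mode` and `frame_mode` are hypothetical names, a dict of lists stands in for a DataFrame, and padding shorter results with `None` is an assumption of this sketch (chosen so that all columns end up the same length).

```python
from collections import Counter


def series_mode(values):
    """Return all values tied for the highest count, sorted."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def frame_mode(columns):
    """Compute the mode of each column by delegating to series_mode.

    `columns` maps a column name to a list of values. Columns with fewer
    modes are padded with None so every result has the same length.
    """
    per_column = {name: series_mode(vals) for name, vals in columns.items()}
    longest = max((len(m) for m in per_column.values()), default=0)
    return {
        name: modes + [None] * (longest - len(modes))
        for name, modes in per_column.items()
    }


frame = {"a": [1, 1, 2, 2], "b": [7, 7, 8, 9]}
print(frame_mode(frame))
```

Keeping the per-column logic in one function means the frame-level version only has to handle alignment of the results.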

Contributor

chained indexing! not in the tests!

Contributor Author

my bad - it's so natural feeling haha

@jtratner
Contributor Author

@jreback should I put this on hold until after 0.13?

@jreback
Contributor

jreback commented Oct 30, 2013

@jtratner seems reasonably tested... I think it's OK. Does it need a doc mention? (It's in the API docs, but maybe somewhere near value_counts too?)

@jtratner
Contributor Author

Yeah, I'll add it.

@jorisvandenbossche
Member

Can you also add entries to api.rst?

@jtratner
Contributor Author

jtratner commented Nov 4, 2013

I missed putting it there. I'll do that--thanks for noticing that!

@jtratner
Contributor Author

jtratner commented Nov 5, 2013

added doc mention for Series and DataFrame

Abstract building up the tables with counts so value_count and mode can share. Aww!
DOC: Add documentation about mode()
jtratner added a commit that referenced this pull request Nov 5, 2013
ENH: Add mode method to Series and DataFrame
@jtratner jtratner merged commit 2d2e8b5 into pandas-dev:master Nov 5, 2013
@jtratner jtratner deleted the add-mode-to-series-and-frame branch November 5, 2013 06:00
Development

Successfully merging this pull request may close these issues.

Add mode() function to pandas.Series

3 participants