Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: series rank has a percentage rank option #5978

Merged
merged 1 commit into from
Feb 16, 2014
Merged

ENH: series rank has a percentage rank option #5978

merged 1 commit into from
Feb 16, 2014

Conversation

MichaelWS
Copy link
Contributor

closes #5971
This pull allows people to compute percentage ranks in cython for a series. I found myself computing this all the time and it will make this calculation much faster.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2014

can you add a vbench for both cases (the original and new one?)

@@ -940,6 +940,7 @@ New features
- Access to historical Google Finance data in pandas.io.data (:issue:`3814`)
- DataFrame plotting methods can sample column colors from a Matplotlib
colormap via the ``colormap`` keyword. (:issue:`3860`)
- ``Series.rank()`` has a percentage rank option (:issue: `5971`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you put it accidentally in pandas 0.12 section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that is fixed.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2014

any thoughts on 'percentage' vs. say 'per'? @jorisvandenbossche @y-p ?

@MichaelWS
Copy link
Contributor Author

I prefer 'pct' to 'per' @jorisvandenbossche @y-p

@jreback
Copy link
Contributor

jreback commented Jan 16, 2014

@MichaelWS so change to that..

@MichaelWS
Copy link
Contributor Author

done

@MichaelWS
Copy link
Contributor Author

also, I have never worked with vbench before, so I am not sure if I am doing what is intended

@jreback
Copy link
Contributor

jreback commented Jan 16, 2014

add a benchmark in vb_suite/groupby.py.....just copy-paste what you want to do

./test.perf.sh -b prior_commit -t last hash_of_yours

@MichaelWS
Copy link
Contributor Author

I am getting some odd errors. I will have to do some more research later today on it.

sqlalchemy.exc.IntegrityError: (IntegrityError) column checksum is not unique u'INSERT INTO benchmarks (checksum, name, description) VALUES (?, ?, ?)' ('ea1993ef61c3cc4e871d2cce3c5d983c', 'eval_frame_chained_cmp_python', None)

@@ -3894,6 +3894,11 @@ def test_rank(self):
iranks = iseries.rank()
exp = iseries.astype(float).rank()
assert_series_equal(iranks, exp)
iseries = Series(np.arange(5)) + 1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a couple of more tests, maybe all nan series and for groupby, group that has 1 element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added more tests with partial nan's and duplicate values. nan's will always be nan's so not sure if we would ever catch a bug if all nan's.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that is the check; make sure you propogate nans; the edge cases are always important to test (and usually the hardest to get right)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easy enough. I added that as well.

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

this looks fine
can u rebase and move release notes to 0.14

@MichaelWS
Copy link
Contributor Author

easy enough. That is done.

@@ -79,9 +79,12 @@ Improvements to existing features
- ``plot(legend='reverse')`` will now reverse the order of legend labels for most plot kinds.
(:issue:`6014`)
- Allow multi-index slicers (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`)
<<<<<<< HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your rebase introduced these....need to edit

@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

can you reset the release.rst to master HEAD then add your change again. Their were some formatting updates which you are reverting....this should only be your 1 line change

jreback added a commit that referenced this pull request Feb 16, 2014
ENH: series rank has a percentage rank option
@jreback jreback merged commit d3dd67c into pandas-dev:master Feb 16, 2014
@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

thanks @MichaelWS !

@MichaelWS
Copy link
Contributor Author

Thanks @jreback

@jreback
Copy link
Contributor

jreback commented Mar 27, 2014

@MichaelWS I think we need to have this pct option on DataFrame for consistency as well. Can you add?

issue is here: #6717

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion: percentile ranks
3 participants