ENH: series rank has a percentage rank option #5978

MichaelWS · 2014-01-16T19:05:10Z

closes #5971
This pull request lets users compute percentage ranks for a Series in Cython. I found myself computing this all the time, and doing it in Cython makes the calculation much faster.
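
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what a percentage rank computes. It assumes the average tie-breaking method and division by the number of non-NaN values; the helper name `pct_rank` is hypothetical and this is not the Cython code from the pull request.

```python
import math


def pct_rank(values):
    """Average-method rank of each value divided by the count of non-NaN values.

    Hypothetical illustration only; NaN inputs stay NaN in the output.
    """
    # Keep (value, original position) for every non-NaN entry, sorted by value.
    valid = [(v, i) for i, v in enumerate(values) if not math.isnan(v)]
    valid.sort(key=lambda pair: pair[0])

    result = [math.nan] * len(values)
    n = len(valid)
    start = 0
    while start < n:
        end = start
        # Extend the run over values tied with valid[start].
        while end + 1 < n and valid[end + 1][0] == valid[start][0]:
            end += 1
        # Tied values share the mean of their 1-based positions.
        avg_rank = ((start + 1) + (end + 1)) / 2.0
        for _, idx in valid[start:end + 1]:
            result[idx] = avg_rank / n
        start = end + 1
    return result


print(pct_rank([10.0, 20.0, 20.0, float("nan"), 40.0]))
# prints [0.25, 0.625, 0.625, nan, 1.0]
```

The two tied values share rank 2.5, and the NaN is excluded from the count of 4, so the largest value maps to 1.0.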

jreback · 2014-01-16T19:07:45Z

can you add a vbench for both cases (the original and the new one)?

jorisvandenbossche · 2014-01-16T19:08:48Z

doc/source/release.rst

@@ -940,6 +940,7 @@ New features
  - Access to historical Google Finance data in pandas.io.data (:issue:`3814`)
  - DataFrame plotting methods can sample column colors from a Matplotlib
    colormap via the ``colormap`` keyword. (:issue:`3860`)
+  - ``Series.rank()`` has a percentage rank option (:issue: `5971`)


I think you accidentally put it in the pandas 0.12 section

Thanks, that is fixed.

jreback · 2014-01-16T19:15:21Z

any thoughts on 'percentage' vs. say 'per'? @jorisvandenbossche @y-p ?

MichaelWS · 2014-01-16T19:18:20Z

I prefer 'pct' to 'per' @jorisvandenbossche @y-p

jreback · 2014-01-16T19:21:09Z

@MichaelWS so change to that..

MichaelWS · 2014-01-16T19:27:46Z

done

MichaelWS · 2014-01-16T20:15:52Z

also, I have never worked with vbench before, so I am not sure if I am doing what is intended

jreback · 2014-01-16T20:50:07Z

add a benchmark in vb_suite/groupby.py.....just copy-paste what you want to do

./test.perf.sh -b prior_commit -t last hash_of_yours

MichaelWS · 2014-01-16T21:50:00Z

I am getting some odd errors. I will have to do some more research later today on it.

sqlalchemy.exc.IntegrityError: (IntegrityError) column checksum is not unique u'INSERT INTO benchmarks (checksum, name, description) VALUES (?, ?, ?)' ('ea1993ef61c3cc4e871d2cce3c5d983c', 'eval_frame_chained_cmp_python', None)

jreback · 2014-01-17T13:51:49Z

pandas/tests/test_series.py

@@ -3894,6 +3894,11 @@ def test_rank(self):
        iranks = iseries.rank()
        exp = iseries.astype(float).rank()
        assert_series_equal(iranks, exp)
+        iseries = Series(np.arange(5)) + 1.0


can you add a couple more tests, maybe an all-NaN series and, for groupby, a group that has 1 element?

added more tests with partial NaNs and duplicate values. NaNs will always stay NaN, so I'm not sure we would ever catch a bug with an all-NaN series.

but that is the check: make sure you propagate NaNs. The edge cases are always important to test (and usually the hardest to get right)

easy enough. I added that as well.
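
The edge cases discussed above could be checked roughly as follows. This is a sketch only: it assumes a recent pandas release where `Series.rank(pct=True)`, groupby `rank(pct=True)`, and `pd.testing.assert_series_equal` are available, and it is not the test code added in this pull request.

```python
import numpy as np
import pandas as pd

# All-NaN input: every percentage rank should stay NaN.
all_nan = pd.Series([np.nan, np.nan, np.nan])
assert all_nan.rank(pct=True).isna().all()

# Partial NaN with a tie: the NaN is propagated and excluded from the count,
# so the average ranks 1, 2.5, 2.5 are divided by 3.
partial = pd.Series([1.0, np.nan, 2.0, 2.0])
expected = pd.Series([1.0 / 3, np.nan, 2.5 / 3, 2.5 / 3])
pd.testing.assert_series_equal(partial.rank(pct=True), expected)

# groupby with a single-element group: its only member ranks at 1.0.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [3.0, 1.0, 7.0]})
grouped = df.groupby("key")["val"].rank(pct=True)
assert grouped.tolist() == [1.0, 0.5, 1.0]
```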

jreback · 2014-02-16T12:16:54Z

this looks fine
can you rebase and move the release notes to 0.14?

MichaelWS · 2014-02-16T13:10:50Z

easy enough. That is done.

jreback · 2014-02-16T13:14:40Z

doc/source/release.rst

@@ -79,9 +79,12 @@ Improvements to existing features
 - ``plot(legend='reverse')`` will now reverse the order of legend labels for most plot kinds.
  (:issue:`6014`)
 - Allow multi-index slicers (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`)
+<<<<<<< HEAD


your rebase introduced these....need to edit

jreback · 2014-02-16T16:54:02Z

can you reset the release.rst to master HEAD, then add your change again? There were some formatting updates which you are reverting. This should only be your 1-line change.

ENH: series rank has a percentage rank option

jreback · 2014-02-16T21:23:18Z

thanks @MichaelWS !

MichaelWS · 2014-02-16T21:26:56Z

Thanks @jreback

jreback · 2014-03-27T12:08:01Z

@MichaelWS I think we need to have this pct option on DataFrame for consistency as well. Can you add?

issue is here: #6717

jorisvandenbossche reviewed Jan 16, 2014

jreback reviewed Jan 17, 2014

jreback added Algos labels Feb 16, 2014

jreback reviewed Feb 16, 2014

checkin of percentage rank

7b37858

jreback added a commit that referenced this pull request Feb 16, 2014

Merge pull request #5978 from MichaelWS/master

d3dd67c

ENH: series rank has a percentage rank option

jreback merged commit d3dd67c into pandas-dev:master Feb 16, 2014

jreback mentioned this pull request Feb 16, 2014

ENH: add 'dense' ranking method #6333

Closed

rosnfeld mentioned this pull request Mar 9, 2014

ENH: including offset/freq in Timestamp repr (#4553) #6575

Merged
