Feature request: sorted() methods on everything #9816

Closed
brandon-rhodes opened this issue Apr 5, 2015 · 7 comments · Fixed by #10726
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff API Design
Comments

@brandon-rhodes
Contributor

It would make Pandas easier to teach, easier to learn, and easier to use if the sorting behavior were the same between series and dataframes. But the existing order() and sort() methods are locked into their old behaviors by all of the code that already depends on them.

But a new sorted() method could bring symmetry between series and dataframes for code written from now on:

Series.sorted()      =>  same as existing Series.order()
DataFrame.sorted()   =>  same as existing DataFrame.sort()

Having this new pair of methods with identical conventions, where possible, would solve several different problems that learners have with Pandas today:

  • In Pandas, nearly all methods return a new object by default instead of doing modification in-place, but learners discover that Series.sort() is a special case.
  • In Python, a sort() method traditionally returns None and does an in-place sort, but learners have to discover that DataFrame.sort() violates this convention in order to match the behavior of the rest of Pandas.
  • The new-object sorter for series objects is Series.order() which is very difficult to discover, as nothing else in the Python ecosystem is named order(), and since one would normally expect an order() method to tell you the order (ascending? descending? none?) instead of imposing a new order.
  • The standard Python name for a sort that returns a new object is sorted(), per the universally loved Python built-in, but learners cannot transfer this knowledge to Pandas, where that concept exists but under the two different names Series.order() and DataFrame.sort().
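The built-in convention that the last bullets rely on can be seen with a plain Python list. This is standard-library behavior, not Pandas code, and the variable names are only illustrative:

```python
# A plain list, used only to illustrate the built-in naming convention.
original = [3, 1, 2]

# sorted() returns a new list and leaves its argument untouched.
copy_sorted = sorted(original)

# list.sort() sorts in place and returns None.
in_place = list(original)
returned = in_place.sort()

print(original)     # the input list is unchanged
print(copy_sorted)  # a new, sorted list
print(returned)     # None, because sort() works in place
```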

Yes, the ed at the end of sorted() would be one character longer than order() and two characters longer than the current practice of df.sort(). But, on balance, I think that most programmers would happily cede two characters in order to be able to use the same method name when they are flipping code between handling series and handling dataframes, and would be happy to have the option of using the standard Python name for the concept of a non-in-place sort.

I suspect that deprecating the old names would be overly disruptive at this point, and they could probably live alongside the new sorted() methods without much trouble — new documentation could adopt the new, consistent terminology where possible, if the Pandas developers did not want to disrupt current users of the old inconsistent names.
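A rough sketch of how the old and new names could coexist, using a hypothetical `ToySeries` class rather than real Pandas code (the class, its attribute, and its method bodies are all assumptions made for illustration):

```python
class ToySeries:
    """Hypothetical stand-in for a series; not the Pandas implementation."""

    def __init__(self, values):
        self.values = list(values)

    def sort(self):
        # Legacy in-place behavior: mutates self and returns None.
        self.values.sort()

    def order(self):
        # Legacy new-object behavior: returns a sorted copy.
        # Inside a method body, sorted() still resolves to the built-in.
        return ToySeries(sorted(self.values))

    def sorted(self):
        # Proposed name: same result as order(), under the standard Python name.
        return self.order()


s = ToySeries([3, 1, 2])
t = s.sorted()       # new object; s itself is not modified
print(s.values, t.values)
```

Existing callers of `sort()` and `order()` keep working in this sketch, while new code can use `sorted()` throughout.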

@jreback
Contributor

jreback commented Apr 5, 2015

see #8239 for much of the same discussion

this would actually be a nice soln as changing the existing behavior of order/sort is backward incompatible
but a new method would solve this problem and we could easily deprecate the original methods.

pull requests are welcome!
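One way the deprecation mentioned above might look, sketched with the standard `warnings` module and a hypothetical `ToyFrame` class (none of these names come from Pandas, and the warning category and message are assumptions):

```python
import warnings


class ToyFrame:
    """Hypothetical stand-in for a dataframe; not the Pandas implementation."""

    def __init__(self, rows):
        self.rows = list(rows)

    def sorted(self):
        # New method: returns a sorted copy and leaves self untouched.
        return ToyFrame(sorted(self.rows))

    def sort(self):
        # Old name kept for compatibility: warns, then delegates to sorted().
        warnings.warn(
            "sort() is deprecated; use sorted() instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.sorted()
```

Code that still calls `sort()` gets the same result as before plus a warning, so the old name could be removed in a later release.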

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Difficulty Intermediate labels Apr 5, 2015
@jreback jreback added this to the 0.17.0 milestone Apr 5, 2015
@brandon-rhodes
Contributor Author

Thanks for the vote of interest — I will look for the Pandas team at the PyCon sprints :)

@jreback
Contributor

jreback commented Apr 5, 2015

awesome! this would be amazing to do then!

@shoyer
Member

shoyer commented Apr 5, 2015

Agreed, this is a nice solution! 👍

@jreback jreback changed the title Feature request: sorted() methods on everything Feature request: sorted() methods on everything Apr 5, 2015
@jorisvandenbossche jorisvandenbossche removed the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 6, 2015
@jorisvandenbossche
Member

From me a 👍 as well!

But, I think there are some other aspects of the interface that need discussion (as seen in #8239): how to specify to sort on a certain column, or column/index combination, default of sorting on index or values, ...
But in any case: a good proposal to deal with the possible back compat problems. Now finding a good interface.

@BrenBarn

Is there a reason that we can't just add an order method to DataFrame that does the same as what order does for Series? What exactly are the different capabilities that each method provides that we want to keep?

@jreback
Contributor

jreback commented Apr 26, 2015

order is actually an odd term (from R I believe) and sort/sorted is more pythonic

the intention would be to replicate sort for DataFrame and order for Series
iow the non-in-place behavior (as sort for a Series is in place: this came from numpy's ndarray.sort originally, which is in place)

5 participants