API: unified sorting #8239
Comments
jreback
added API Design Reshaping
labels
Sep 11, 2014
jreback
added this to the
0.16
milestone
Sep 11, 2014
-1. Although I'm not a fan of this behaviour, IMO we're stuck with it*: this would break lots of code and there's no way to make a clean deprecation/migration path. +1 to cleaning up the sort API; the other changes seem reasonable. *Perhaps we should name everything order ?? :S
|
I tried to come up with a way to detect when users might expect the original behavior, but couldn't find anything clean. The best one was probably:
    def sort(..., inplace=None):
        ...
        # None means the caller did not pass inplace, so fall back to the old default
        inplace = True if inplace is None else inplace
        ...
Unfortunately it's still a pretty bad idea. Backward-compatible, sure, but it's really just kicking the can down the road.
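A minimal sketch of that sentinel-default idea, for anyone who wants to see it run. `MiniSeries` is a hypothetical toy class, not pandas code, and the `FutureWarning` is an assumption added for illustration; the comment above does not propose one.

```python
import warnings


class MiniSeries:
    """Toy stand-in for a Series (hypothetical, not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    def sort(self, inplace=None):
        # inplace=None means "caller did not say": keep the legacy
        # in-place behaviour, but warn that the default may change.
        if inplace is None:
            warnings.warn(
                "sort() currently sorts in place by default; "
                "pass inplace explicitly to silence this warning",
                FutureWarning,
                stacklevel=2,
            )
            inplace = True
        if inplace:
            self.values.sort()
            return None
        # Explicit inplace=False: leave self untouched, return a sorted copy.
        return MiniSeries(sorted(self.values))
```

Callers who pass `inplace` explicitly never see the warning, so only code relying on the implicit default is flagged.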
|
I just don't see the value proposition in this particular breakage: it will affect a lot of users, and you're not even "fixing" anything (i.e. fixing their buggy code); you'd just be changing syntax. To quote @y-p:
I'd say I was on the liberal side of API breakages, but I don't see how this one can fly! |
|
The more unified the sort API becomes, the more glaring the inplace inconsistency will be. That said, I think the argument is stronger to have consistent behavior vs. consistent signatures. Such a change should wait for a major version bump. (wait, who said 1.0?) So keeping inplace=false for Series.sort means:
|
big if! Definitely such a change should be discussed on the ML, but I think it's a tough sell. I agree the inconsistency sucks, but practicality beats purity... and this will (fairly) annoy a lot of people. I think if you're changing the API there needs to be some carrot cake rather than just stick (with this change I just see stick). I was being serious about using/preferring
Edit: To me "sort" sounds inplace, whereas "order" is a temporary arrangement.
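That naming intuition has a precedent in Python's own built-ins, where `list.sort()` mutates in place and returns `None`, while `sorted()` returns a new list and leaves its input alone. A small plain-Python illustration (not pandas):

```python
values = [3, 1, 2]

# sorted() builds and returns a new list; the input is left as it was.
ordered = sorted(values)
snapshot = list(values)  # copy taken before the in-place sort below

# list.sort() reorders the list in place and returns None.
returned = values.sort()
```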
|
OK, that edit-note makes sense to me. I'll have a look at |
jreback
referenced
this issue
Sep 16, 2014
Open
CLN/TST: move consoliate sort_index to core/generic.py #8283
jreback
modified the milestone: 0.16, 0.15.1
Oct 7, 2014
jreback
referenced
this issue
Jan 26, 2015
Closed
Enable referring to index level in functions like DataFrame.sort_index #2615
jreback
added the
Master Tracker
label
Mar 6, 2015
jreback
modified the milestone: 0.16.0, Next Major Release
Mar 6, 2015
jreback
referenced
this issue
Apr 5, 2015
Closed
Feature request: sorted() methods on everything #9816
|
@patricktokeeffe see also #9816 |
I don't think this is equivalent. |
jorisvandenbossche
referenced
this issue
Jul 7, 2015
Closed
sort seems to sort inplace by default unlike documentation #10522
patricktokeeffe
closed this
Aug 2, 2015
jreback
added a commit
to jreback/pandas
that referenced
this issue
Aug 18, 2015
|
|
jreback |
13d2d71
|
patricktokeeffe commented Sep 11, 2014
originally #5190
xref #9816
xref #3942
This issue is for creating a unified API to Series & DataFrame sorting methods. Panels are not addressed (yet) but a unified API should be easy to extend to them. Related are #2094, #5190, #6847, #7121, #2615. As discussion proceeds, this post will be edited.
For reference, the 0.14.1 signatures are:
Proposed unified signature for `Series.sort` and `DataFrame.sort` (except Series version retains current `inplace=True`):
The `sort_index` signatures change too and `sort_columns` is created:
Proposed changes:
1. make `inplace=False` default (changes `Series.sort`): maybe, possibly in 1.0
2. `by` argument to accept column-name/list-of-column-names in first position
   a. `columns` keyword of `DataFrame.sort`, replaced with `by` (df.sort signature would need to retain columns keyword until finally removed but it's not shown in proposal)
   b. (`columns` arg of `DataFrame.sort` allows tuples); use new `level` argument instead
   c. `by`/`axis` in `DataFrame.sort_index` (see change 7)
   d. `axis` is too so for the sake of working with dataframes, it gets first position
3. `level` argument to accept integer/level-name/list-of-ints/list-of-level-names for sorting (multi)index by particular level(s)
   a. `columns` arg of `DataFrame.sort`
   b. `level` argument to `sort_index` in first position so level(s) of multilevel index can be specified; this makes `sort_index`==`sortlevel` (see change 8)
   c. `sort_remaining` arg to handle multi-level indexes
4. `DataFrame.sort_columns`==`sort(axis=1)` (see syntax below)
5. `Series.order` since change 1 makes `Series.sort` equivalent (?)
6. `inplace`, `kind`, and `na_position` arguments to `Series.sort_index` (to match `DataFrame.sort_index`); `by` and `axis` args are not added since they don't make sense for series
7. `by` argument from `DataFrame.sort_index` since it makes `sort_index` equivalent to `sort`
8. `sortlevel` since change 3b makes `sort_index` equivalent
Notes:
- `sort` is still object-dependent: for series, sorts by values and for data frames, sorts by index
- `level` arg makes `sort_index` and `sortlevel` equivalent. if sortlevel is retained:
  - `sortlevel` to `sort_level` for naming conventions
  - `Series.sortlevel` should have `inplace` argument added
  - `level` and `sort_remaining` args to `sort_index` so it's not equivalent to `sort_level` (intentionally limiting sort_index seems like a bad idea though)
- `level=None` for `sort_columns`. probably not since level=None falls back to level=0 anyway
- `by` and `axis` arguments should be ignored by `Series.sort`
Syntax:
- `sort()`==`sort(level=0)`==`sort_index()`==`sortlevel()`
- `sort(['A','B'])`
- `sort(level='spam')`==`sort_index('spam')`==`sortlevel('spam')`
- `sort(['A','B'], level='spam')`
  - `level` controls here even though columns are specified so sort happens along row index named 'spam' first, then nested sort occurs using columns 'A' and 'B'
- `sort(axis=1)`==`sort(axis=1, level=0)`==`sort_columns()`
- `sort(['A','B'], axis=1)`==`sort_columns(['A','B'])`
- `sort(['A','B'], axis=1, level='spam')`==`sort_columns(['A','B'], level='spam')`
  - `axis` controls `level` so sort will be on columns named 'A' and 'B' in column index named 'spam'
- `sort()`==`order()` -- sorts on values
- `level` specified, sorts on index/named index/level of multi-index:
  - `sort(level=0)`==`sort_index()`==`sortlevel()`
  - `sort(level='spam')`==`sort_index('spam')`==`sortlevel('spam')`
Comments welcome.
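The data-frame equivalences in the Syntax list can be read as "every method is a thin wrapper around `sort`". Here is a rough sketch of that delegation. It is not pandas code: the functions are hypothetical stand-ins that return a normalized `(by, axis, level)` description instead of sorting anything. Treating `level=None` as `level=0` when no `by` is given is an assumption based on the note that `level=None` falls back to `level=0`.

```python
def sort(by=None, axis=0, level=None):
    """Describe the requested sort as a normalized (by, axis, level) tuple."""
    if isinstance(by, str):
        by = [by]  # accept a single column name as well as a list
    by = tuple(by) if by is not None else None
    # Assumption: with nothing else specified, sort on the first index level.
    if by is None and level is None:
        level = 0
    return (by, axis, level)


def sort_index(level=None):
    # Proposal: level comes first, and the by argument is gone.
    return sort(axis=0, level=level)


def sortlevel(level=0):
    # Equivalent to sort_index once sort_index accepts level.
    return sort(axis=0, level=level)


def sort_columns(by=None, level=None):
    # Proposal: shorthand for sort(axis=1).
    return sort(by, axis=1, level=level)


# A few of the equivalences from the Syntax list:
same_rows = sort() == sort(level=0) == sort_index() == sortlevel()
same_cols = sort(["A", "B"], axis=1) == sort_columns(["A", "B"])
```

If the wrappers really reduce to one call each like this, that seems to support the point about `sortlevel` being redundant once `sort_index` takes `level`.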