Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add ignore_index for df.sort_values and series.sort_values #30402

Merged
merged 24 commits into from
Dec 27, 2019

Conversation

charlesdong1991
Copy link
Member

@charlesdong1991 charlesdong1991 commented Dec 22, 2019

This is the first step for closing the PR #30114, I will come up with another PR for drop_duplicates

@gfyoung gfyoung added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 22, 2019
pandas/core/frame.py Show resolved Hide resolved
@@ -474,6 +474,7 @@ Other API changes
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`)
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`)
- :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`).
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` to be able to get reset index after sorting (:issue:`30114`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` to be able to get reset index after sorting (:issue:`30114`)
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` to be able to reset index after sorting (:issue:`30114`)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! will change!

@charlesdong1991 charlesdong1991 changed the title ENH: Add ignore_index for df.sort_values ENH: Add ignore_index for df.sort_values and series.sort_values Dec 23, 2019
@@ -2855,6 +2858,9 @@ def _try_kind_sort(arr):

result = self._constructor(arr[sortedIdx], index=self.index[sortedIdx])

if ignore_index:
result = result.reset_index(drop=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this to differ with the frame implementation? Should be the same in both cases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry! I thought the other one was faster in the first place, and then later found basically no difference, and forgot to align them to keep consistency.

Changed! thanks! @WillAyd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh!! i recall why they are different, this new_data here is BlockManager while in series is not. I revert the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result.index = ibase.default_index(len(indexer))

this is NOT a BlockManager here, but a series

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, sorry for my bad English, i meant new_data in frame.py is BlockManager (should have used there other than here to be precise 😅 ), here it is Series. @jreback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will see my comment in the PR about drop_duplicates. We do not want to use .reset_index directly at all. This causes yet another copy of the data which we can easily avoid here.

you have to be careful to avoid mutating things though.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good, some comments.

pandas/core/generic.py Show resolved Hide resolved
pandas/core/frame.py Show resolved Hide resolved
@@ -474,6 +474,7 @@ Other API changes
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`)
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`)
- :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`).
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` to be able to reset index after sorting (:issue:`30114`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse the ordering a bit :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` have gained the ignore_index......

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed and moved!

@@ -474,6 +474,7 @@ Other API changes
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`)
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`)
- :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`).
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` to be able to reset index after sorting (:issue:`30114`)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to other enhancements

@@ -2855,6 +2858,9 @@ def _try_kind_sort(arr):

result = self._constructor(arr[sortedIdx], index=self.index[sortedIdx])

if ignore_index:
result = result.reset_index(drop=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result.index = ibase.default_index(len(indexer))

this is NOT a BlockManager here, but a series

@@ -2820,7 +2825,7 @@ def _try_kind_sort(arr):
return arr.argsort(kind="quicksort")

arr = self._values
sortedIdx = np.empty(len(self), dtype=np.int32)
sorted_index = np.empty(len(self), dtype=np.int32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orthogonal to this but I wonder why this is specified as np.int32; might be a bug for input with more than 2 ** 32 - 1 rows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy uses int32 for argsort so best can do (to be honest 2B is enough)

@jreback jreback added this to the 1.0 milestone Dec 27, 2019
@jreback jreback merged commit f738581 into pandas-dev:master Dec 27, 2019
@jreback
Copy link
Contributor

jreback commented Dec 27, 2019

thanks @charlesdong1991

keechongtan added a commit to keechongtan/pandas that referenced this pull request Dec 29, 2019
…ndexing-1row-df

* upstream/master: (333 commits)
  CI: troubleshoot Web_and_Docs failing (pandas-dev#30534)
  WARN: Ignore NumbaPerformanceWarning in test suite (pandas-dev#30525)
  DEPR: camelCase in offsets, get_offset (pandas-dev#30340)
  PERF: implement scalar ops blockwise (pandas-dev#29853)
  DEPR: Remove Series.compress (pandas-dev#30514)
  ENH: Add numba engine for rolling apply (pandas-dev#30151)
  [ENH] Add to_markdown method (pandas-dev#30350)
  DEPR: Deprecate pandas.np module (pandas-dev#30386)
  ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405)
  BUG: The setting xrot=0 in DataFrame.hist() doesn't work with by and subplots pandas-dev#30288 (pandas-dev#30491)
  CI: Fix GBQ Tests (pandas-dev#30478)
  Bug groupby quantile listlike q and int columns (pandas-dev#30485)
  ENH: Add ignore_index for df.sort_values and series.sort_values (pandas-dev#30402)
  TYP: Typing hints in pandas/io/formats/{css,csvs}.py (pandas-dev#30398)
  BUG: raise on non-hashable Index name, closes pandas-dev#29069 (pandas-dev#30335)
  Replace "foo!r" to "repr(foo)" syntax pandas-dev#29886 (pandas-dev#30502)
  BUG: preserve EA dtype in transpose (pandas-dev#30091)
  BLD: add check to prevent tempita name error, clsoes pandas-dev#28836 (pandas-dev#30498)
  REF/TST: method-specific files for test_append (pandas-dev#30503)
  marked unused parameters (pandas-dev#30504)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants