BUG: fix memory error in sortlevel when many multiindex levels. close #…
wesm committed Jan 21, 2013
1 parent d738b64 commit 2842ad1
Showing 3 changed files with 18 additions and 1 deletion.
3 changes: 3 additions & 0 deletions RELEASE.rst
@@ -104,6 +104,8 @@ pandas 0.10.1
 - Raise a more helpful error message in merge operations when one DataFrame
   has duplicate columns (GH2649_)
 - Fix partial date parsing issue occuring only when code is run at EOM (GH2618_)
+- Prevent MemoryError when using counting sort in sortlevel with
+  high-cardinality MultiIndex objects (GH2684_)
 
 **API Changes**
 
@@ -133,6 +135,7 @@ pandas 0.10.1
 .. _GH2643: https://github.com/pydata/pandas/issues/2643
 .. _GH2649: https://github.com/pydata/pandas/issues/2649
 .. _GH2668: https://github.com/pydata/pandas/issues/2668
+.. _GH2684: https://github.com/pydata/pandas/issues/2684
 .. _GH2689: https://github.com/pydata/pandas/issues/2689
 .. _GH2690: https://github.com/pydata/pandas/issues/2690
 .. _GH2692: https://github.com/pydata/pandas/issues/2692
7 changes: 6 additions & 1 deletion pandas/core/groupby.py
@@ -2244,7 +2244,12 @@ def _indexer_from_factorized(labels, shape, compress=True):
         comp_ids = group_index
         max_group = np.prod(shape)
 
-    indexer, _ = _algos.groupsort_indexer(comp_ids.astype(np.int64), max_group)
+    if max_group > 1e6:
+        # Use mergesort to avoid memory errors in counting sort
+        indexer = comp_ids.argsort(kind='mergesort')
+    else:
+        indexer, _ = _algos.groupsort_indexer(comp_ids.astype(np.int64),
+                                              max_group)
 
     return indexer
 
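The groupby.py hunk replaces a counting sort with a stable comparison sort once the number of possible groups exceeds 1e6. The following is a minimal pure-Python sketch of that dispatch, not the pandas implementation: `counting_sort_indexer`, `comparison_sort_indexer`, `indexer_from_ids` and `COUNTING_SORT_LIMIT` are hypothetical names. Python's stable `sorted` stands in for NumPy's `argsort(kind='mergesort')`, and group ids are assumed to be non-negative integers below `max_group`.

```python
from typing import List

# Hypothetical constant mirroring the `max_group > 1e6` check in the diff.
COUNTING_SORT_LIMIT = 10**6


def counting_sort_indexer(comp_ids: List[int], max_group: int) -> List[int]:
    """Stable counting sort over ids assumed to lie in range(max_group).

    Allocates one counter per *possible* group, so memory is O(max_group)
    no matter how few rows there are.
    """
    counts = [0] * max_group
    for g in comp_ids:
        counts[g] += 1
    # starts[g] is the next free output slot for rows in group g
    starts = [0] * max_group
    total = 0
    for g in range(max_group):
        starts[g] = total
        total += counts[g]
    indexer = [0] * len(comp_ids)
    for pos, g in enumerate(comp_ids):
        indexer[starts[g]] = pos
        starts[g] += 1
    return indexer


def comparison_sort_indexer(comp_ids: List[int]) -> List[int]:
    """Stable O(n log n) argsort; memory is O(n), independent of max_group."""
    # Python's sort (Timsort) is stable, like NumPy's kind='mergesort'.
    return sorted(range(len(comp_ids)), key=comp_ids.__getitem__)


def indexer_from_ids(comp_ids: List[int], max_group: int) -> List[int]:
    """Choose a strategy based on the number of possible groups."""
    if max_group > COUNTING_SORT_LIMIT:
        return comparison_sort_indexer(comp_ids)
    return counting_sort_indexer(comp_ids, max_group)


print(indexer_from_ids([2, 0, 2, 1, 0], 3))     # counting sort path
print(indexer_from_ids([5, 3, 5], 4000 ** 3))   # comparison sort path
```

For the new test's index (three levels of 4000 values each), if the uncompressed branch shown in the hunk is taken, `max_group` is 4000**3 = 64,000,000,000. One 8-byte counter per possible group would then need about 512 GB, while the comparison sort's memory grows only with the 4000 rows. Because both sorts are stable, rows with equal ids keep their original order, so switching strategies does not change the resulting indexer.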
9 changes: 9 additions & 0 deletions pandas/tests/test_multilevel.py
@@ -612,6 +612,15 @@ def test_sortlevel(self):
         rs.sortlevel(0, inplace=True)
         assert_frame_equal(rs, self.frame.sortlevel(0))
 
+    def test_sortlevel_large_cardinality(self):
+        # #2684
+        index = MultiIndex.from_arrays([np.arange(4000)]*3)
+        df = DataFrame(np.random.randn(4000), index=index)
+
+        # it works!
+        result = df.sortlevel(0)
+        self.assertTrue(result.index.lexsort_depth == 3)
+
     def test_delevel_infer_dtype(self):
         tuples = [tuple for tuple in cart_product(['foo', 'bar'],
                                                   [10, 20], [1.0, 1.1])]
