Fix nsmallest/nlargest With Identical Values #15299

RogerThomas · 2017-02-03T16:38:25Z

closes Weird behavior using nlargest/nsmallest when there are the n smallest/largest values are identical #15297
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

RogerThomas · 2017-02-03T16:40:52Z

@jreback What do you think of this approach? The issue before was on this line
https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L917
when merging series with the frame when values are identical you end up duplicating the df potentially n times

(obviously PR isn't ready to go, just wanted your opinion on the approach first)

jreback · 2017-02-03T16:42:01Z

pandas/tests/frame/test_analytics.py

@@ -1323,6 +1323,12 @@ def test_nlargest_multiple_columns(self):
        expected = df.sort_values(['a', 'b'], ascending=False).head(5)
        tm.assert_frame_equal(result, expected)

+    def test_nlargest_identical_values(self):
+        df = pd.DataFrame({'a': [1] * 5, 'b': [1, 2, 3, 4, 5]})


add the issue number here as a comment

test on nsmallest as well.

jreback · 2017-02-03T16:42:12Z

pandas/tests/frame/test_analytics.py

+        result = df.nlargest(3, 'a')
+        expected = pd.DataFrame({'a': [1] * 3, 'b': [1, 2, 3]})
+        tm.assert_frame_equal(result, expected)
+


can you add similar tests for series

jreback · 2017-02-03T16:44:33Z

pandas/core/algorithms.py

@@ -911,10 +911,15 @@ def select_n_frame(frame, columns, n, method, keep):
    if not is_list_like(columns):
        columns = [columns]
    columns = list(columns)
+    tmp = Series(frame.index)
+    frame.reset_index(inplace=True, drop=True)
    ser = getattr(frame[columns[0]], method)(n, keep=keep)
    if isinstance(ser, Series):
        ser = ser.to_frame()


so the right way to fix this is to make the internal routines return an indexer instead (e.g. integers), select_n_series already has this (just needs to return it).

then this is a simple .take()

@jreback, i'm a little confused here. I don't see how the current implementation produces the desired result when using multiple columns, for example if we have:

df = pd.DataFrame(dict( a=[1, 1, 1, 2, 2], b=[4, 3, 8, 9, 5] ))

and we call df.nsmallest(2, ['a', 'b'])

There's no code in the current implementation that would try to order by both columns a and b in that order to give us a resulting frame of

pd.DataFrame(dict( a=[1, 1], b=[3, 4] ))

right I don't think it IS there. what you can easily do it simply use the routine to select the nsmallest/largest (its according to a single column), then sort accordting to all of the columns provided (this is still quite efficient as the selection uses partition sort which).

Hmm, not sure we can do that @jreback, take this for example

df = pd.DataFrame(dict( a=[2, 2, 1, 1, 1], b=[4, 3, 8, 9, 5], )) result = df.nsmallest(2, ['a', 'b'])

If we do nsmallest on col a and then sort on cols a and b
we'll get

a c 2 1 8 3 1 9

when the desired result is

a c 4 1 5 2 1 8

I'm not sure there's much more we can do than sort_values(columns).head/tail(n)

@RogerThomas that defeats the entire purpose of partition sort.

a way to do this might be to (for say nsmallest when multiple columns)
run nsmallest with the first column, take the last value (or values if they are duplicates), select from the frame and argsort them. Then your result are the top values from the first run + the (totalrequested - top values of the 2nd run).

e.g in your example above since nsmallest(2, 'a') yields 1 (two rows of them), you could then select the 1 from the frame and just argsort at that point.

This would allow the single column to be quite efficient and avoid the next-n problem.

You would only need to do this if you have multiple columns.

I would look around and see if there is a better solution out there (IOW google and/or look at how some other stats packages do this).

codecov-io · 2017-02-03T18:11:57Z

Codecov Report

Merging #15299 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15299      +/-   ##
==========================================
+ Coverage   90.96%   90.96%   +<.01%     
==========================================
  Files         145      145              
  Lines       49534    49560      +26     
==========================================
+ Hits        45057    45082      +25     
- Misses       4477     4478       +1

Flag	Coverage Δ
#multiple	`88.72% <100%> (ø)`	⬆️
#single	`40.61% <3.12%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`93.92% <100%> (+0.23%)`	⬆️
pandas/core/common.py	`90.68% <0%> (-0.35%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fbdc23...d3964f8. Read the comment docs.

jreback · 2017-02-16T17:53:02Z

any progress?

RogerThomas · 2017-02-16T17:54:17Z

Ah, sorry @jreback been super busy, haven't gotten around to this. Will try and get to it over the weekend, is that ok?

jreback · 2017-02-16T18:00:20Z

@RogerThomas np. just pushing things along. you can ping when you are ready for a review.

jreback · 2017-02-21T14:43:52Z

here is another example: #14846 (comment)

jreback · 2017-03-28T18:00:43Z

@RogerThomas can you rebase / update.

RogerThomas · 2017-03-30T10:26:13Z

@jreback sorry took so long on this, been super busy.
This should be good to go now

jreback · 2017-03-30T12:51:14Z

pandas/core/algorithms.py

+    reverse = method == 'nlargest'
+    for i, column in enumerate(columns):
+        series = frame[column]
+        if reverse:


doesn't this defeat the purpose of .nlargest()? e.g. you are sorting the entire series.

why not just use series.nlargest(n) here? (or .nsmallest). And defer to the duplicate handling there (I think that works).

But if there are duplicates I need to do another filter using isin this returns an unsorted view of the series, I then have no way of aligning the series according to the sorting requirement, i'd still need to do an argsort after the nsmallest/nlargest call

Is this sorta what you mean @jreback ?

RogerThomas#4

Note I had to copy the index incase there is duplicate index in incoming frame

jreback · 2017-03-30T12:51:55Z

pandas/core/algorithms.py

+        else:
+            inds = series.argsort()[:n]
+        values = series.iloc[inds]
+        if i != len(columns) - 1 and values.duplicated().any():


values.is_unique is more idiomatic

jreback · 2017-03-30T12:54:24Z

pandas/tests/series/test_analytics.py

@@ -1455,6 +1455,16 @@ def test_nsmallest_nlargest(self):
        expected = s.sort_values().head(3)
        assert_series_equal(result, expected)

+        # GH 15297
+        s = Series([1] * 5, index=[1, 2, 3, 4, 5])
+        expected = Series([1] * 3, index=[1, 2, 3])


add to the doc-string for both Series/DataFrame what the policy for dealing with duplicates is. IOW, we are going to always return n (or less values), so the duplicated portion would be in order of the index (see if you can reword!)

@jreback are you sure we need to add to the docstring, I think it's obvious enough here. keep=first in this example, and the series values are all duplicated, so you would expect it to return [1, 2, 3] as the index regardless of using nsmallest or nlargest

jreback · 2017-03-30T19:58:49Z

pandas/core/algorithms.py

+            cur_frame = cur_frame[series.isin(values)].sort_values(
+                column, ascending=ascending
+            )
+    frame.index = tmp.ix[frame.index]


don't use .ix

jreback · 2017-03-30T19:59:45Z

pandas/core/algorithms.py

+
+    ascending = method == 'nsmallest'
+    tmp = Series(frame.index)
+    frame.reset_index(inplace=True, drop=True)


what are you trying to do here?

inplace is not idiomatic at all.

I need to reorder the incoming frame based on the specified sorting method. I can't do that using the original index as it might contain duplicates which gives rise to the original issue. Therefore I save the original index to temp and reset the index of frame inplace so we're not needlessly copying the entire frame.

ok, try to make as simple as possible then. Generally you want to use indexers (meaning locations/positions) for anything that could possibly be duplicated, then do a take at the end.

also you don't want to modify the input AT ALL.

Hey @jreback, i understand the need to not change the input, but as you see on L968 I change the index back.
I don't think this solution is ideal, but I don't see how we can support duplicate indexes and not temporarily change the index on frame.

you can copy it if you need to change it.

copy is quite cheap (compared to everything else). we cannot modify the input at all.

OK fair enough, just in my personal experience I deal with large dataframes (frames that just about fit in memory) and I have to be very careful about copying.

I have to sign off for the day, buy I'll clean this up tomorrow with a copy (also a positive is we only have to copy of the index is duplicated)

*if the index is duplicated

@RogerThomas yes, this procedure only needs to be done on duplicated anyhow (which to be honest is not very common :)

thanks!

ping when you need a look

RogerThomas · 2017-03-31T10:43:31Z

OK @jreback pretty sure this is ready to go now.

Saving and restoring duplicated index turned out to be challenging when considering multi-indexes, but managed to get it.

jreback

acutally I don't see any reason you need to actually copy the original frame, you simply need to keep a reference to it original_index. Then you drop the index. I will reverse what I said earlier, it seems you can just do this for both duplicated and unique indexes.

jreback · 2017-03-31T13:05:41Z

doc/source/whatsnew/v0.20.0.txt

@@ -1057,3 +1057,4 @@ Bug Fixes
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
 - Bug in ``pd.read_msgpack()`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+- Bug in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` where identical values resulted in duplicated rows (:issue:`15297`)


pls move this to the correct section (reshaping) in Bug FIxes. be sure to rebase on master.

jreback · 2017-03-31T13:08:14Z

pandas/core/algorithms.py

+        # We must save frame's index to tmp
+        tmp = Series(np.arange(len(frame)), index=frame.index)
+        frame = frame.reset_index(drop=True)
+    for i, column in enumerate(columns):


can you add a comment on what this algo is doing.

jreback · 2017-03-31T13:10:27Z

pandas/core/algorithms.py

+    if not index_is_unique:
+        # If index not unique we must reset index to allow re-indexing below
+        # We must save frame's index to tmp
+        tmp = Series(np.arange(len(frame)), index=frame.index)


you can just do

original_index = frame.index

jreback · 2017-03-31T19:50:20Z

pandas/tests/frame/test_analytics.py

+                 ['c'],
+                 ['a', 'b'],
+                 ['a', 'c'],
+                 ['b', 'a'],


@RogerThomas you will see this test failing for the below. The issues is that if one of the columns you are n* on is object this will fail. So I think that should raise a nice error message (you can check if any are object upfront).

___________________________________________________________________________________________ TestNLargestNSmallest.test_n[10-order119] ___________________________________________________________________________________________ pandas/tests/frame/test_analytics.py:1945: in test_n result = df.nsmallest(n, order) pandas/core/frame.py:3478: in nsmallest return algorithms.select_n_frame(self, columns, n, 'nsmallest', keep) pandas/core/algorithms.py:966: in select_n_frame values = getattr(series, method)(n, keep=keep) pandas/core/series.py:1907: in nsmallest method='nsmallest') pandas/core/algorithms.py:915: in select_n_series raise TypeError("Cannot use method %r with dtype %s" % (method, dtype)) E TypeError: Cannot use method 'nsmallest' with dtype object

RogerThomas · 2017-04-03T13:58:26Z

@jreback updated tests for object dtypes

Also had to update algorithm as your tests exposed a flaw in the previous algorithm

jreback · 2017-04-03T14:01:56Z

pandas/core/algorithms.py

-    return ser.merge(frame, on=columns[0], left_index=True)[frame.columns]
+    for column in columns:
+        dtype = frame[column].dtype
+        if not issubclass(dtype.type, (np.integer, np.floating, np.datetime64,


use is_integer_dtype, is_float_dtype, is_datetime64_any_dtype, is_timedelta64_dtype; this would break on others (e.g. categorical).

its possible we need additional tests for this (just checking)

though, I think you call EACH column (as a Series) right? so this check is already there (well not 100% that is actually checks things).

Or you may skip a column if no dupes?

What I am trying to say is that I want to defer the dtype checks to the other algos if possible (and not do it here).

Ah right, ok, sorry I thought by your "check upfront" comment you wanted to do it for each column in this method.
But you're right the line getattr(series, method)(cur_n, keep=keep) should defer the check to n* on the series

right, if we ARE checking all columns with this method (whether they are duplicated or not). Then this is fine. Is that always true?

jreback · 2017-04-03T14:03:27Z

pandas/core/algorithms.py

+        series = cur_frame[column]
+        values = getattr(series, method)(cur_n, keep=keep)
+        is_last_column = len(columns) - 1 == i
+        if is_last_column or len(values.unique()) == sum(series.isin(values)):


use values.nunique() == series.isin(values).sum() (you don't want to use python operators)

jreback · 2017-04-03T14:06:28Z

pandas/core/algorithms.py

+        duplicated_filter = series.duplicated(keep=False)
+        non_duplicated = values[~duplicated_filter]
+        duplicated = values[duplicated_filter]
+        if method == 'nsmallest':


might be a tad more clear to do

def get_indexer(x, y): if method == ....

then (eg.)

indexer = get_indexer(indexer, values.index)

You know what, I was SO close to doing that, but wasn't sure how you guys felt about inline functions. But, I agree, i'll do it now

yes, I am big fan of DRY, even at the expense of some readability.

jreback · 2017-04-03T14:10:24Z

pandas/tests/frame/test_analytics.py

+            expected = df.sort_values(order, ascending=False).head(n)
+            tm.assert_frame_equal(result, expected)
+
+    def test_n_error(self, df_strings):


prob should expand this with a different df that has pretty much all dtypes, then can test ones which have an error

heres a sample

df = pd.DataFrame( {'group': [1, 1, 2], 'int': [1, 2, 3], 'float': [4., 5., 6.], 'string': list('abc'), 'category_string': pd.Series(list('abc')).astype('category'), 'category_int': [7, 8, 9], 'datetime': pd.date_range('20130101', periods=3), 'datetimetz': pd.date_range('20130101', periods=3, tz='US/Eastern'), 'timedelta': pd.timedelta_range('1 s', periods=3, freq='s')}, columns=['group', 'int', 'float', 'string', 'category_string', 'category_int', 'datetime', 'datetimetz', 'timedelta'])

jreback · 2017-04-03T14:11:02Z

pandas/tests/series/test_analytics.py

+    @pytest.mark.parametrize(
+        "r", [Series([3., 2, 1, 2, '5'], dtype='object'),
+              Series([3., 2, 1, 2, 5], dtype='object'),
+              # not supported on some archs


maybe add a mini-example from test_error in frame

RogerThomas · 2017-04-03T15:58:20Z

Hey @jreback
How about this?

RogerThomas#5

jreback · 2017-04-05T13:07:01Z

pandas/tests/frame/test_analytics.py

+    def test_n(self, df_strings, method, n, order):
+        # GH10393
+        df = df_strings
+        if order[0] == 'b':


shouldn't this error if 'b' is in order (IOW, it is by-definition an unorderable column)

Well, no because the only time the algorithm considers 'b' is if it's the first field it encounters, otherwise, because 'a' and 'c' are unique, 'b' will not be considered. See comment on line 1967

hmm, I think that is a bit odd then. Why shouldn't it raise if a non-valid column is in here?

jreback · 2017-04-05T13:08:45Z

pandas/tests/frame/test_analytics.py

+    # Top / bottom
+    @pytest.mark.parametrize(
+        'method, n, order',
+        product(['nsmallest', 'nlargest'], range(1, 11),


I think adding method doesn't add much here (and instead add both as comparision tests below)

but after seeing your other methods, maybe ok. if its easier to debug then ok.

jreback · 2017-04-05T13:09:03Z

pandas/tests/frame/test_analytics.py

+            # Only expect error when 'b' is first in order, as 'a' and 'c' are
+            # unique
+            error_msg = (
+                "'b' has dtype: object, cannot use method 'nsmallest' "


you can use tm.assertRaisesRegexp when you want to compare the message as well (also a context manager)

jreback · 2017-04-05T13:26:40Z

this is prob because Series is not tested as well. Let me see if I can fix this (and you can rebase on top).

=================================== FAILURES ===================================
___________________ TestNLargestNSmallest.test_n_all_dtypes ____________________
[gw0] linux2 -- Python 2.7.13 /home/travis/miniconda3/envs/pandas/bin/python
self = <pandas.tests.frame.test_analytics.TestNLargestNSmallest object at 0x7ff6da2795d0>
df_main_dtypes =    group  int  float string category_string  category_int   datetime  \
0     ...1 2013-01-02 00:00:00-05:00  00:00:02  
2 2013-01-03 00:00:00-05:00  00:00:03  
    def test_n_all_dtypes(self, df_main_dtypes):
        df = df_main_dtypes
>       df.nsmallest(2, list(set(df) - {'category_string', 'string'}))
/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/pandas/tests/frame/test_analytics.py:1998: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/pandas/core/frame.py:3478: in nsmallest
    return algorithms.select_n_frame(self, columns, n, 'nsmallest', keep)
/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/pandas/core/algorithms.py:976: in select_n_frame
    values = getattr(series, method)(cur_n, keep=keep)
/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/pandas/core/series.py:1907: in nsmallest
    method='nsmallest')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
series = 0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
Name: datetimetz, dtype: datetime64[ns, US/Eastern]
n = 2, keep = 'first', method = 'nsmallest'
    def select_n_series(series, n, keep, method):
        """Implement n largest/smallest for pandas Series
    
        Parameters
        ----------
        series : pandas.Series object
        n : int
        keep : {'first', 'last'}, default 'first'
        method : str, {'nlargest', 'nsmallest'}
    
        Returns
        -------
        nordered : Series
        """
        dtype = series.dtype
        if not issubclass(dtype.type, (np.integer, np.floating, np.datetime64,
                                       np.timedelta64)):
>           raise TypeError("Cannot use method %r with dtype %s" % (method, dtype))
E           TypeError: Cannot use method 'nsmallest' with dtype datetime64[ns, US/Eastern]
/home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/pandas/core/algorithms.py:915: TypeError
============ 1 failed, 8537 passed, 1991 skipped in 546.94 seconds =============

RogerThomas · 2017-04-05T13:37:27Z

pandas/core/algorithms.py

@@ -944,14 +944,60 @@ def select_n_frame(frame, columns, n, method, keep):
    -------
    nordered : DataFrame
    """
-    from pandas.core.series import Series
+    from pandas import Int64Index


@jreback I only put this import here as the previous algorithm imported Series here, why do you do inline imports?
I would always have all my imports at the top

because some of the routines from algorithms are imported at the very top level (e.g. pd.factorize and you would get circular imports otherwise.

AH ok, thanks!

jreback · 2017-04-05T14:05:50Z

@RogerThomas ok #15902 fixes the tz problem in series nsmallest. after I merge you should be able to easilly rebase on top, just takepandas/tests/series/test_analytics.py from master and you should be good to go.

xref #15299 Author: Jeff Reback <jeff@reback.net> Closes #15902 from jreback/series_n and squashes the following commits: 657eac8 [Jeff Reback] TST: better testing of Series.nlargest/nsmallest

jreback · 2017-04-05T19:19:21Z

@RogerThomas ok merged that PR in. pls rebase and should work!

RogerThomas · 2017-04-05T21:12:25Z

OK @jreback tests passing for me

jreback · 2017-04-05T21:24:41Z

pandas/core/algorithms.py

@@ -981,14 +981,60 @@ def select_n_frame(frame, columns, n, method, keep):
    -------
    nordered : DataFrame
    """
-    from pandas.core.series import Series
+    from pandas import Int64Index
    if not is_list_like(columns):
        columns = [columns]
    columns = list(columns)


I think we need to do this check on all the columns upfront (its cheap).

you can put this in a separate function on core/algorithms (called by there and here): https://github.com/pandas-dev/pandas/blob/master/pandas/core/algorithms.py#L949

Do u want to reference the column in the exception? IOW

def _ensure_is_valid_dtype_n_method(method, dtype): if not ((is_numeric_dtype(dtype) and not is_complex_dtype(dtype)) or needs_i8_conversion(dtype)): raise TypeError("Cannot use method '{method}' with " "dtype {dtype}".format(method=method, dtype=dtype))

or something like

def _is_valid_dtype_n_method(dtype): return ((is_numeric_dtype(dtype) and not is_complex_dtype(dtype)) or needs_i8_conversion(dtype)):

and then in select_n_frame function something like

if not _is_valid_dtype_n_method(dtype): raise TypeError("Column `column` has dtype `dtype` can't use n with column")

and then in select_n_series function something like

if not _is_valid_dtype_n_method(dtype): raise TypeError("Cannot use method `method` with dtype")

def _is_valid_dtype_n_method(dtype): return ((is_numeric_dtype(dtype) and not is_complex_dtype(dtype)) or needs_i8_conversion(dtype))

then you can have separate messages for a series calling this (alone) and a dataframe with many columns.

though keep the message for Series pretty much the same (actually it IS tested).

jreback · 2017-04-06T13:44:30Z

thanks @RogerThomas very nice PR! and excellent responsiveness on changes.

keem em coming!

jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 3, 2017

jreback requested changes Feb 3, 2017

View reviewed changes

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch 5 times, most recently from d9db74c to 739704e Compare March 30, 2017 10:24

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch from f070082 to 4955d67 Compare March 30, 2017 10:48

jreback reviewed Mar 30, 2017

View reviewed changes

jreback requested changes Mar 30, 2017

View reviewed changes

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch 6 times, most recently from 1664eb5 to 1a8043f Compare March 31, 2017 10:42

jreback requested changes Mar 31, 2017

View reviewed changes

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch from 2b47439 to 5f772db Compare March 31, 2017 13:37

jreback reviewed Mar 31, 2017

View reviewed changes

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch from d08a60e to 7f8cd04 Compare April 3, 2017 13:54

jreback reviewed Apr 3, 2017

View reviewed changes

jreback reviewed Apr 5, 2017

View reviewed changes

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch from 75ac0b7 to 901ab4f Compare April 5, 2017 13:35

RogerThomas commented Apr 5, 2017

View reviewed changes

jreback mentioned this pull request Apr 5, 2017

TST: better testing of Series.nlargest/nsmallest #15902

Closed

jreback mentioned this pull request Apr 5, 2017

CLN: fix core/algorithms.py dtype detection #15903

Closed

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch 2 times, most recently from 1d4be5e to 333e126 Compare April 5, 2017 21:08

jreback reviewed Apr 5, 2017

View reviewed changes

Fix nsmallest/nlargest With Identical Values

d3964f8

RogerThomas force-pushed the fix_nsmallest_nlargest_with_n_identical_values branch from 4c1720d to d3964f8 Compare April 6, 2017 09:18

jreback closed this in c112252 Apr 6, 2017

jreback mentioned this pull request May 12, 2017

DataFrame.nlargest result error #16314

Closed

jreback mentioned this pull request Sep 9, 2017

Result of groupby -> select -> nlargest duplicates index levels #17477

Open

Fix nsmallest/nlargest With Identical Values #15299

Fix nsmallest/nlargest With Identical Values #15299

Conversation

RogerThomas commented Feb 3, 2017 • edited Loading

RogerThomas commented Feb 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas Feb 6, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback Feb 6, 2017 • edited Loading

Choose a reason for hiding this comment

codecov-io commented Feb 3, 2017 • edited by codecov bot Loading

Codecov Report

jreback commented Feb 16, 2017

RogerThomas commented Feb 16, 2017

jreback commented Feb 16, 2017

jreback commented Feb 21, 2017

jreback commented Mar 28, 2017

RogerThomas commented Mar 30, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas Mar 31, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas commented Mar 31, 2017

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas commented Apr 3, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas Apr 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas commented Apr 3, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Apr 5, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Apr 5, 2017

jreback commented Apr 5, 2017

RogerThomas commented Apr 5, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RogerThomas commented Feb 3, 2017 •

edited

Loading

RogerThomas commented Feb 3, 2017 •

edited

Loading

RogerThomas Feb 6, 2017 •

edited

Loading

jreback Feb 6, 2017 •

edited

Loading

codecov-io commented Feb 3, 2017 •

edited by codecov bot

Loading

RogerThomas Mar 31, 2017 •

edited

Loading

RogerThomas Apr 3, 2017 •

edited

Loading