DataFrame .duplicated() / .drop_duplicates() flagging unique rows as duplicated in 0.17.1 #11864
jreback added the Bug, Reshaping, Difficulty Intermediate and Effort Low labels on Dec 18, 2015
jreback added this to the 0.18.0 milestone on Dec 18, 2015
cc @behzadnouri
64-bit integers are getting truncated to 32 bits in
I've got an easy fix (just change a type from int to int64_t). I'll do a PR tonight.
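The comment attributes the bug to 64-bit integers being narrowed to 32 bits. Here is a minimal sketch of why that produces spurious matches. It rests on a hypothetical assumption: each row is reduced to one combined int64 key, and keep=False marks every key that occurs more than once. The helper flag_all_duplicates and the key values are illustrative only, not pandas internals. NumPy's astype keeps only the low 32 bits when narrowing, so two keys that differ by 2**32 collide.

```python
from collections import Counter

import numpy as np


def flag_all_duplicates(keys):
    """keep=False semantics: mark every occurrence of a key that appears more than once."""
    values = keys.tolist()
    counts = Counter(values)
    return [counts[v] > 1 for v in values]


# Hypothetical combined row keys: the first two rows are distinct,
# but their 64-bit keys differ by exactly 2**32.
keys64 = np.array([7, 7 + 2**32, 12345], dtype=np.int64)

# Narrowing to int32 keeps only the low 32 bits, so 7 + 2**32 wraps to 7.
keys32 = keys64.astype(np.int32)

print(flag_all_duplicates(keys64))  # [False, False, False]
print(flag_all_duplicates(keys32))  # [True, True, False]
```

Under the same assumption, this would plausibly explain why the problem only shows up on large DataFrames. Combined keys exceed the 32-bit range only once the number of distinct value combinations is large. Reordering the columns changes which keys wrap onto each other.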
evanpw referenced this issue on Dec 24, 2015 in #11894 (closed): BUG: Spurious matches in DataFrame.duplicated when keep=False
jreback added a commit that referenced this issue on Jan 7, 2016: b431f85 (evanpw + jreback)
closed by #11894
jreback closed this on Jan 7, 2016
capelastegui commented on Dec 18, 2015
DataFrame.duplicated() flags rows as duplicates when they are in fact distinct. This happens with large DataFrames and duplicated(keep=False):
Out[]: 0
Out[]: 110
Changing the column order results in different (but still incorrect) behavior:
Out[]: 2138
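The reporter's input code did not survive on this page, so the following is not their reproduction. It is a minimal sketch, assuming a recent pandas and NumPy, of one way to cross-check duplicated(keep=False) against an independent reference that counts full row tuples in plain Python. The helper names, column names and sizes are hypothetical. Each random column has many distinct values, so the space of possible value combinations is far beyond 2**32.

```python
from collections import Counter

import numpy as np
import pandas as pd


def reference_duplicated_keep_false(df):
    """Flag a row if its full tuple of values occurs more than once (keep=False semantics)."""
    rows = list(df.itertuples(index=False, name=None))
    counts = Counter(rows)
    return pd.Series([counts[r] > 1 for r in rows], index=df.index)


def count_disagreements(df):
    """Number of rows where DataFrame.duplicated(keep=False) differs from the reference."""
    expected = reference_duplicated_keep_false(df)
    actual = df.duplicated(keep=False)
    return int((expected != actual).sum())


# Hypothetical large frame: 100,000 rows, three columns with up to 10**5
# distinct values each (about 1e15 possible combinations).
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({c: rng.integers(0, 10**5, n) for c in "abc"})

print(count_disagreements(df))                   # original column order
print(count_disagreements(df[["c", "a", "b"]]))  # reordered columns
```

On a pandas release that includes the fix from #11894, both calls should report 0. A nonzero count identifies rows where duplicated(keep=False) disagrees with the tuple-counting reference.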
Tested on 0.17.1. Environment details are provided below:
>>> pd.util.print_versions.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
This looks like the same kind of problem described in #11668, though the specific examples provided in that issue work correctly in 0.17.1.