Potential bug: drop_duplicates() and duplicated() fail for multiple integer columns #11543

pekaalto · 2015-11-07T14:08:35Z

It seems that the drop_duplicates() and duplicated() methods do not work properly for large integer columns. Here is my example data frame: http://pastebin.com/KVHxUpgz

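```python
import pandas as pd

# Load the example data (copied from the pastebin link) from the clipboard
x = pd.read_clipboard(sep=',')
# keep=False flags every row that has a duplicate, not only the later occurrences
r = x.duplicated(keep=False)
print(x[r])
```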

This gives me:

```
             x1       x2
8   16000010001  8470207
95  16000010009  8470039
```

Clearly these are not duplicates, but pandas seems to think they are!
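For comparison, here is a minimal self-contained sketch of the expected behaviour, since the pastebin data is not reproduced here. The frame is hypothetical: it reuses two (x1, x2) pairs from the output above and adds one made-up repeated row. With keep=False, duplicated() should flag every occurrence of a repeated (x1, x2) pair and nothing else.

```python
import pandas as pd

# Hypothetical data: large int64 values in x1, smaller ones in x2.
# Only the (16000010001, 8470207) pair occurs twice (rows 0 and 2).
x = pd.DataFrame({
    "x1": [16000010001, 16000010009, 16000010001, 16000010002],
    "x2": [8470207, 8470039, 8470207, 8470039],
})

# keep=False marks all members of each duplicated group, not only later ones.
r = x.duplicated(keep=False)
print(x[r])
```

On a version with the fix, only rows 0 and 2 should be printed; (16000010009, 8470039) appears once and must not be flagged.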

drop_duplicates() also seems to fail:

```python
print(len(x), len(x.drop_duplicates()))
```

This gives `101 100`, i.e. one row is dropped even though no row is actually duplicated.
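The two symptoms are consistent with each other: drop_duplicates() keeps exactly the rows that duplicated() does not flag (with the same keep setting), so one falsely flagged row shows up as one missing row (101 - 100 = 1). A minimal sketch of that equivalence, on hypothetical data with one genuine duplicate:

```python
import pandas as pd

# Hypothetical data with exactly one genuine duplicate row (index 2).
x = pd.DataFrame({
    "x1": [16000010001, 16000010009, 16000010001],
    "x2": [8470207, 8470039, 8470207],
})

dropped = x.drop_duplicates()        # default keep="first"
kept_by_mask = x[~x.duplicated()]    # same rows, selected via the boolean mask

# Both routes should yield the same frame, so a false positive in
# duplicated() also appears as a missing row in drop_duplicates().
pd.testing.assert_frame_equal(dropped, kept_by_mask)
print(len(x), len(dropped))
```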

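When I convert the columns to strings, the rows are no longer duplicates:

```python
# Build one "x1-x2" string per row and check those strings for duplicates instead
r1 = x.apply(lambda row: '%d-%d' % tuple(row), axis=1).duplicated()
print(r1.sum())
```

This prints 0, as it should.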
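A similar cross-check that avoids string formatting is to count rows as plain Python tuples, so no pandas hashing or factorizing is involved. This is a sketch only; `duplicated_rows_reference` is a hypothetical helper name, not a pandas function, and the frame is made-up data with no repeated rows.

```python
from collections import Counter

import pandas as pd


def duplicated_rows_reference(df: pd.DataFrame) -> pd.Series:
    """Hypothetical reference for duplicated(keep=False): counts each row
    as a plain Python tuple and flags rows whose tuple occurs more than once."""
    rows = list(df.itertuples(index=False, name=None))
    counts = Counter(rows)
    return pd.Series([counts[row] > 1 for row in rows], index=df.index)


# Hypothetical data: no (x1, x2) pair repeats.
x = pd.DataFrame({
    "x1": [16000010001, 16000010009, 16000010002],
    "x2": [8470207, 8470039, 8470207],
})

ref = duplicated_rows_reference(x)
print(ref.sum())

# On a pandas build without the bug, the built-in method should agree.
assert ref.equals(x.duplicated(keep=False))
```

Running the reference against the real data and comparing it with `x.duplicated(keep=False)` would pinpoint exactly which rows are falsely flagged.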

Here are the versions:

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fi_FI

pandas: 0.17.0
nose: None
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None


jreback · 2015-11-07T14:12:28Z

Thanks for the report; this is a dupe of #11376.

It was already fixed in #11403

and will be in the forthcoming 0.17.1 release (it's in master now).

pekaalto · 2015-11-07T14:13:49Z

Thanks! My searching skills suck :(
Edit: ...for some reason I was only searching open issues.

pekaalto changed the title ~~drop_duplicates() and duplicated() fail for multiple integer columns~~ Potential bug: drop_duplicates() and duplicated() fail for multiple integer columns Nov 7, 2015

jreback closed this as completed Nov 7, 2015

jreback added the Bug and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Nov 7, 2015
