It seems that the drop_duplicates() and duplicated() methods do not work properly for large integer columns. Here is my example data frame: http://pastebin.com/KVHxUpgz
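import pandas as pd

# Copy the pastebin contents to the clipboard first, then read them in.
x = pd.read_clipboard(sep=',')
# keep=False marks every member of a duplicate group, not just the repeats.
r = x.duplicated(keep=False)
print(x[r])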
pekaalto changed the title from "drop_duplicates() and duplicated() fail for multiple integer columns" to "Potential bug: drop_duplicates() and duplicated() fail for multiple integer columns" on Nov 7, 2015
This gives me:
             x1       x2
8   16000010001  8470207
95  16000010009  8470039
Clearly these are not duplicates, but pandas seems to think they are!
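One way to double-check this independently of pandas is to count the rows as plain Python tuples. The sketch below is only an illustration: the helper name `duplicated_rows` is made up for this example, and it is fed just the two rows printed above rather than the full pastebin data.

```python
from collections import Counter


def duplicated_rows(rows):
    """Return the positions of rows whose values occur more than once.

    Hypothetical helper, not part of pandas. It mirrors the idea of
    duplicated(keep=False): every member of a duplicate group is reported.
    """
    counts = Counter(rows)
    return [i for i, row in enumerate(rows) if counts[row] > 1]


# The two rows pandas reported above, as (x1, x2) tuples.
flagged = [(16000010001, 8470207), (16000010009, 8470039)]
print(duplicated_rows(flagged))  # prints []
```

Python integers do not overflow, so an empty list here means the two rows really are distinct. For a whole frame, the tuples could be taken from `x.itertuples(index=False)`.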
Also drop_duplicates() seems to fail:
gives: 101 100
When I convert my columns to strings, the rows are no longer flagged as duplicates:
is 0, as it should be.
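The snippets for the last two checks are not shown above, so the following is only a guess at what they might have looked like, not the original code. It assumes the first check compared row counts before and after drop_duplicates() and the second counted flagged rows after casting to strings. It also uses only the two rows from the output above, which may not be enough to trigger the behaviour on pandas 0.17.0.

```python
import pandas as pd

# Only the two rows shown in the output above; the full 101-row frame
# from the pastebin link is not reproduced here.
x = pd.DataFrame(
    {"x1": [16000010001, 16000010009], "x2": [8470207, 8470039]},
    index=[8, 95],
)

# Row counts before and after dropping duplicates; equal counts mean
# that nothing was dropped.
n_before, n_after = len(x), len(x.drop_duplicates())

# Number of rows flagged as duplicates once every column is a string.
n_dup_as_str = int(x.astype(str).duplicated(keep=False).sum())

print(n_before, n_after, n_dup_as_str)
```

On a version where the integer path behaves correctly, the two counts should match and the last number should be 0.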
Here are the versions:
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fi_FI
pandas: 0.17.0
nose: None
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None