I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

left = pd.DataFrame({'a': [1567808378753000000], 'b': [0]})
right = pd.DataFrame({'a': [1567808378753274000], 'b': [0]})
print(left.compare(right))
Problem description
The current output is:
   a
                  self                other
0  1567808378752999936  1567808378753274112
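which differs from the values in the original DataFrames. The printed numbers are exactly the originals rounded to the nearest float64, so the int64 column a appears to have been cast to float along the way.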
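The altered values can be reproduced without pandas. Python's `float` is an IEEE 754 double (the same format as float64), and between 2**60 and 2**61 adjacent doubles are 256 apart, so an int64 in that range can move by up to 128 when converted. A minimal check, using a small helper `float64_roundtrip` defined here for illustration:

```python
# Original int64 values of column a from the example above
left_a = 1567808378753000000
right_a = 1567808378753274000

def float64_roundtrip(value: int) -> int:
    """Convert an int to a double (float64) and back, as a float cast would."""
    return int(float(value))

print(float64_roundtrip(left_a))   # 1567808378752999936
print(float64_roundtrip(right_a))  # 1567808378753274112

# Both values lie in [2**60, 2**61), where doubles are 2**(60 - 52) = 256 apart
assert 2**60 <= left_a < 2**61 and 2**60 <= right_a < 2**61
print(abs(float64_roundtrip(left_a) - left_a))  # 64
```

These match the "current output" exactly, which is consistent with column a passing through float64.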
Expected Output
   a
                  self                other
0  1567808378753000000  1567808378753274000
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 7d32926
python : 3.8.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-1036-gcp
Version : #39-Ubuntu SMP Thu Jan 14 18:41:17 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.2
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.22
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

As part of generating the result, positions with equal values are masked out. Since the values in column b are equal, a NaN ends up being inserted at that position.
This issue could be fixed, but likely not easily. Because a and b are both integer columns, they are stored in a single block. Masking then hits logic that essentially says: if the block can't hold NaN, coerce it to a dtype that can. So column a is coerced as well, even though it does not need to be.
Because of this, if b is a float column the comparison works as expected, since a and b are then stored in different blocks.
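The block coercion described in the comment above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not pandas' actual internals: the hypothetical `where_block` treats a "block" as a dict of columns sharing one dtype and upcasts every column in it to float as soon as any cell must become NaN. Storing a and b together reproduces the precision loss in a; storing them separately (as when b is a float column) leaves a exact.

```python
import math

# Toy model only (hypothetical, NOT pandas internals): a "block" is a dict of
# columns sharing one dtype. keep[name][i] is False where the value is masked
# out (replaced with NaN), as compare() does for positions with equal values.
def where_block(block, keep):
    # Python ints (like int64) cannot hold NaN, so if any cell in the block
    # is masked, every column in the block is upcast to float
    needs_nan = any(not k for col in keep.values() for k in col)
    result = {}
    for name, values in block.items():
        if needs_nan:
            values = [float(v) for v in values]
        result[name] = [v if k else math.nan for v, k in zip(values, keep[name])]
    return result

a = [1567808378753000000]
keep = {"a": [True], "b": [False]}  # a differs (kept), b is equal (masked)

# Case 1: a and b consolidated in one integer block -> a is coerced too
together = where_block({"a": a, "b": [0]}, keep)
print(int(together["a"][0]))  # 1567808378752999936

# Case 2: b is float, so it lives in its own block and a keeps its int dtype
alone = where_block({"a": a}, {"a": keep["a"]})
b_block = where_block({"b": [0.0]}, {"b": keep["b"]})
print(alone["a"][0])    # 1567808378753000000
print(b_block["b"][0])  # nan
```

In this toy model the damage comes entirely from treating the two columns as one unit; masking each column on its own avoids it regardless of b's dtype.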