Describe the bug
While writing a float64 or float32 column to CSV, cudf truncates the output to 9 digits of precision for both float64 and float32. This leads to data truncation in the case of float64 and an incorrect data representation in the case of float32.
Steps/Code to reproduce bug
For float64:
In [55]: pdf = pd.DataFrame({'a': [1.1234567891234564367]})
In [56]: pdf
Out[56]:
          a
0  1.123457
In [57]: pdf.to_csv()
Out[57]: ',a\n0,1.1234567891234564\n'
# Notice how pandas allows 16 digits after decimal for float64
In [58]: gdf = cudf.DataFrame({'a': [1.1234567891234564367]})
In [59]: gdf
Out[59]:
          a
0  1.123457
In [60]: gdf.to_csv()
Out[60]: ',a\n0,1.123456789\n'
# cudf seems to be allowing only 9 digits after decimal for float64
In [61]: pdf['a']
Out[61]:
0    1.123457
Name: a, dtype: float64
In [62]: gdf['a']
Out[62]:
0    1.123457
Name: a, dtype: float64
For float32:
In [41]: pdf = pd.DataFrame({'a': pd.Series([1.123456789123456], dtype='float32')})
In [42]: pdf
Out[42]:
          a
0  1.123457
In [43]: pdf.to_csv()
Out[43]: ',a\n0,1.1234568\n'
# Notice how pandas allows 7 digits after decimal for float32
In [44]: gdf = cudf.DataFrame({'a': cudf.Series([1.123456789123456], dtype='float32')})
In [45]: gdf
Out[45]:
          a
0  1.123457
In [46]: gdf.to_csv()
Out[46]: ',a\n0,1.123456836\n'
# cudf seems to allow up to 9 digits for float32
Expected behavior
Ideally we should match pandas behavior here, as the resolutions of float32 and float64 are 1e-06 and 1e-15 respectively.
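As a rough check on those resolution figures, the sketch below derives the decimal digit counts from the IEEE 754 significand widths (24 bits for float32, 53 bits for float64). The helper names `digits10` and `max_digits10` are illustrative only and borrowed from C++ terminology; they are not part of cudf or pandas.

```python
import math


def digits10(significand_bits):
    """Decimal digits guaranteed to survive a text -> float -> text round trip."""
    return math.floor((significand_bits - 1) * math.log10(2))


def max_digits10(significand_bits):
    """Significant decimal digits needed so float -> text -> float is exact."""
    return math.ceil(1 + significand_bits * math.log10(2))


# IEEE 754 significand widths, including the implicit leading bit.
FLOAT32_BITS = 24
FLOAT64_BITS = 53

for name, bits in (("float32", FLOAT32_BITS), ("float64", FLOAT64_BITS)):
    # A resolution of 10**-digits10 corresponds to the 1e-06 and 1e-15 figures quoted above.
    print(name, "digits10 =", digits10(bits), "max_digits10 =", max_digits10(bits))
```

Under these formulas, a fixed number of digits cannot suit both dtypes: float32 needs at most 9 significant digits to round-trip, while float64 needs up to 17.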
This 9 digits of precision (actually 10 significant digits) is hardcoded in the float-to-string conversion cudf::strings::from_floats() used by cuio's CSV writer.
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This difference in precision could lead to inequality while comparing data, for example:
cudf.txt
pandas.txt
Diff of these two files: https://www.diffchecker.com/qaAbEXt7
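The inequality can be illustrated without a GPU. The following is a minimal sketch using only the Python standard library: it writes the float64 value from this report to CSV text twice, once with 9 digits after the decimal point (a stand-in for the reported cudf output, not cudf's actual code path) and once with the shortest round-trip text (matching the pandas output shown above), then reads both back and compares. The helpers `write_column` and `read_column` are hypothetical names introduced for this example.

```python
import csv
import io


def write_column(values, fmt):
    """Write one float column named 'a' to CSV text, formatting each value with fmt."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["a"])
    for v in values:
        writer.writerow([fmt(v)])
    return buf.getvalue()


def read_column(text):
    """Parse the CSV text back into a list of Python floats, skipping the header."""
    rows = list(csv.reader(io.StringIO(text)))
    return [float(row[0]) for row in rows[1:]]


values = [1.1234567891234564367]

# Assumption: 9 digits after the decimal point mimics the truncated output in this report.
truncated_csv = write_column(values, lambda v: f"{v:.9f}")
# repr() gives the shortest text that parses back to the same float64.
full_csv = write_column(values, repr)

truncated_equal = read_column(truncated_csv) == values
full_equal = read_column(full_csv) == values

print(repr(truncated_csv), "round-trips:", truncated_equal)
print(repr(full_csv), "round-trips:", full_equal)
```

Only the full-precision text compares equal to the original data after being read back, which is why files written by the two libraries can differ when diffed or compared value by value.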
Environment overview (please complete the following information)
0.16
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
Additional context
Surfaced while running fuzz tests #6001