DOC: floating point precision on writing/reading to csv #13159

FBartlett · 2016-05-12T18:35:50Z

Code Sample

import pandas as pd

x0 = 18292498239.824
df1 = pd.DataFrame({'One': x0},index=["bignum"])
df1.to_csv('repr_test.csv')
df2 = pd.DataFrame.from_csv('repr_test.csv')
df3 = pd.read_csv('repr_test.csv')
x1 = df1['One'][0]
x2 = df2['One'][0]
x3 = df3['One'][0]
fh = open('repr_test.csv','rb')
ll = fh.readlines()
x4 = float(ll[1].split(',')[1].split()[0])
print "x0 = %f; x1 = %f; Are they equal? %s" % (x0,x1,(x0 == x1))
print "x0 = %f; x2 = %f; Are they equal? %s" % (x0,x2,(x0 == x2))
print "x0 = %f; x3 = %f; Are they equal? %s" % (x0,x3,(x0 == x3))
print "x0 = %f; x4 = %f; Are they equal? %s" % (x0,x4,(x0 == x4))

Expected Output

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x3 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

output of `pd.show_versions()`

(Note that there are two, presented side-by-side, with results underneath)

INSTALLED VERSIONS                      INSTALLED VERSIONS
------------------                      ------------------
commit: None                            commit: None
python: 2.7.5.final.0                   python: 2.7.11.final.0
python-bits: 64                         python-bits: 64
OS: Linux                               OS: Linux
OS-release: 2.6.32-431.56.1.el6.x86_64  OS-release: 2.6.32-431.56.1.el6.x86_64
machine: x86_64                         machine: x86_64
processor: x86_64                       processor: x86_64
byteorder: little                       byteorder: little
LC_ALL: None                            LC_ALL: None
LANG: en_US.UTF-8                       LANG: en_US.UTF-8

pandas: 0.15.1                          pandas: 0.18.0
nose: 1.3.4                             nose: 1.3.7
Cython: 0.21.2                          Cython: 0.23.4
numpy: 1.9.1                            numpy: 1.10.4
scipy: 0.14.0                           scipy: 0.17.0                 
statsmodels: 0.6.0                      statsmodels: 0.6.1            
IPython: 2.3.0                          IPython: 4.1.2 
sphinx: 1.2.3                           sphinx: 1.3.5  
patsy: 0.3.0                            patsy: 0.4.0   
dateutil: 2.2                           dateutil: 2.5.1
pytz: 2014.9                            pytz: 2016.2   
bottleneck: None                        bottleneck: 1.0.0
tables: 3.1.1                           tables: 3.2.2    
numexpr: 2.4                            numexpr: 2.5     
matplotlib: 1.4.2                       matplotlib: 1.5.1
openpyxl: None                          openpyxl: 2.3.2  
xlrd: 0.9.3                             xlrd: 0.9.4      
xlwt: 0.7.5                             xlwt: 1.0.0      
xlsxwriter: 0.6.3                       xlsxwriter: 0.8.4
lxml: 3.3.3                             lxml: 3.6.0      
bs4: 4.3.2                              bs4: 4.4.1       
html5lib: None                          html5lib: None   
httplib2: None                          httplib2: None   
apiclient: None                         apiclient: None  
rpy2: None                              
sqlalchemy: None                        sqlalchemy: 1.0.12                                                    
pymysql: None                           pymysql: None 
psycopg2: None                          psycopg2: None
                                        pip: 8.1.1      
                                        xarray: None    
                                        setuptools: 20.3
                                        blosc: None     
                                        jinja2: 2.8     
                                        boto: 2.39.0

Results from left setup (0.15.1):

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.823997; Are they equal? False
x0 = 18292498239.824001; x3 = 18292498239.823997; Are they equal? False
x0 = 18292498239.824001; x4 = 18292498239.824001; Are they equal? True

Results from right setup (0.18.0):

x0 = 18292498239.824001; x1 = 18292498239.824001; Are they equal? True
x0 = 18292498239.824001; x2 = 18292498239.799999; Are they equal? False
x0 = 18292498239.824001; x3 = 18292498239.799999; Are they equal? False
x0 = 18292498239.824001; x4 = 18292498239.799999; Are they equal? False

Expectations

I expect to be able to write a DataFrame to a csv file and later read it in to a new DataFrame such that the two DataFrames will be identical. The older version (result 0.15.1) is quite a bit better than the newer (since I can round to three decimal places to get the expected results or read from a filehandle instead of using from_csv() or read_csv()). The newer version (0.18.0) loses information, which is not acceptable.

Note that the documentation at http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.from_csv.html reads

It is preferable to use the more powerful pandas.read_csv() for most general purposes, but from_csv makes for an easy roundtrip to and from a file (the exact counterpart of to_csv), especially with a DataFrame of time series data.

But this does not describe what actually happens, as demonstrated above.

sinhrks · 2016-05-12T23:20:53Z

Specify required precision via float_format.

df1.to_csv('repr_test.csv', float_format='%.6f')
df2 = pd.DataFrame.from_csv('repr_test.csv')
df2.iloc[0, 0]
# 18292498239.824001

Maybe the docs should have a float_format section (for output), as they do for float_precision (for input).

http://pandas.pydata.org/pandas-docs/stable/io.html#specifying-method-for-floating-point-conversion
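As a rough illustration of the suggestion above, the sketch below writes the value with 17 significant digits and reads it back with the round-trip parser. It assumes a recent pandas on Python 3, keeps the CSV in memory instead of on disk, and uses `'%.17g'` as one possible format choice; it is not necessarily the format the maintainers would recommend.

```python
import io

import pandas as pd

x0 = 18292498239.824
df1 = pd.DataFrame({"One": [x0]}, index=["bignum"])

# '%.17g' keeps 17 significant digits, enough to identify any float64 uniquely.
csv_text = df1.to_csv(float_format="%.17g")

# Read the text back; 'round_trip' asks the C parser for exact conversion.
df2 = pd.read_csv(io.StringIO(csv_text), index_col=0, float_precision="round_trip")
x2 = df2.loc["bignum", "One"]

print(csv_text)
print("Are they equal?", x2 == x0)
```

The trade-off is that every float is written with up to 17 significant digits, so files grow larger and values such as 0.1 appear as 0.10000000000000001.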

jreback · 2016-05-13T00:17:01Z

Yes, this is a tradeoff between speed of reading and exactness out to a certain ULP. As @sinhrks indicated, for reading we offer a higher-precision option; writing is subject to the vagaries of floating-point stringification.
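To make the reading side of that tradeoff concrete, here is a small sketch that parses the same CSV text with each `float_precision` setting of `read_csv` (C engine). The helper name `read_value` is made up for this example, and whether the default and `'high'` settings reproduce the value exactly may depend on the pandas version, so no particular output is claimed for them.

```python
import io

import pandas as pd

x0 = 18292498239.824
# Build the CSV by hand with repr(), which is round-trip safe on Python 3.
csv_text = f"name,One\nbignum,{x0!r}\n"


def read_value(float_precision=None):
    """Parse the sample CSV with the given float_precision and return the one float."""
    df = pd.read_csv(
        io.StringIO(csv_text), index_col=0, float_precision=float_precision
    )
    return df.loc["bignum", "One"]


for option in (None, "high", "round_trip"):
    value = read_value(option)
    print(f"float_precision={option!r}: {value!r}; equal to x0? {value == x0}")
```

Only `'round_trip'` is documented as an exact conversion; the other settings favour parsing speed.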

kawochen · 2016-05-13T12:54:52Z

I think writing should have something similar to float_precision, since the round-trip-ability is based mostly on the number of significant digits, not the number of digits after the decimal point.

I haven't looked at the code, but the difference here seems to be related to defaulting to __str__() vs __repr__() on Python 2. __repr__() has enough digits for a round trip.
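The point about significant digits versus digits after the decimal point can be checked without pandas. The sketch below uses `'%.12g'` as a stand-in for Python 2's `str()` on floats (which kept 12 significant digits) and `'%.17g'` as a format that always carries enough digits; the helper `survives` and the second sample value are made up for illustration.

```python
x0 = 18292498239.824    # large magnitude, from the original report
small = 1.2345678e-7    # small magnitude, added for contrast


def survives(value, fmt):
    """Return True if formatting with fmt and parsing back yields the same float."""
    return float(fmt % value) == value


for fmt in ("%.12g", "%.6f", "%.17g"):
    print(fmt, "->", fmt % x0, "|", survives(x0, fmt))
    print(fmt, "->", fmt % small, "|", survives(small, fmt))
```

Here `'%.12g'` truncates `x0` to `18292498239.8`, matching the 0.18.0 results in the report, while `'%.6f'` happens to preserve `x0` but collapses `small` to `0.000000`. Only the 17-significant-digit format preserves both.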

BlGene · 2017-11-15T18:09:05Z

Please also consider the case where different columns have different rounding levels.
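`to_csv` accepts a single `float_format` for the whole frame, so one possible workaround for per-column precision is to format the columns as strings before writing. The helper `to_csv_with_column_formats`, the column names, and the format strings below are hypothetical, chosen only for this sketch.

```python
import pandas as pd


def to_csv_with_column_formats(df, formats, **to_csv_kwargs):
    """Format selected columns with per-column format strings, then write CSV.

    `formats` maps a column name to a str.format template such as '{:.2f}'.
    Formatted columns become strings, so missing values are written as 'nan'.
    """
    out = df.copy()
    for column, fmt in formats.items():
        out[column] = out[column].map(fmt.format)
    return out.to_csv(**to_csv_kwargs)


df = pd.DataFrame({"price": [1.23456, 10.0], "ratio": [0.000123456, 0.5]})
csv_text = to_csv_with_column_formats(
    df, {"price": "{:.2f}", "ratio": "{:.3e}"}, index=False
)
print(csv_text)
```

Columns not listed in `formats` are written with the default float handling.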

dhavide · 2022-10-19T17:01:58Z

take

hualiu01 · 2024-08-20T16:47:21Z

@dhavide Is this issue resolved? Can I take this issue?

hualiu01 · 2024-08-20T16:58:07Z

take

hualiu01 · 2024-08-20T18:34:41Z

Tested the original example with Python 3 (3.10.14); the printed values are as expected.

Specifically, see code:

import pandas as pd

WORKDIR = '../tmp'

x0 = 18292498239.824
df1 = pd.DataFrame({'One': x0},index=["bignum"])
df1.to_csv(f'{WORKDIR}/repr_test.csv')
# df2 = pd.DataFrame.from_csv('repr_test.csv')
df3 = pd.read_csv(f'{WORKDIR}/repr_test.csv')
x1 = df1['One'].loc[df1.index[0]]
# x2 = df2['One'][0]
x3 = df3['One'].loc[df3.index[0]]
fh = open(f'{WORKDIR}/repr_test.csv','rb')
ll = fh.readlines()

# x4 = float(ll[1].split(',')[1].split()[0])
x4 = float(ll[1].decode().split(',')[1].split()[0])

print(f"x0 = {x0}; x1 = {x1}; Are they equal? {x0 == x1}")
# print(f"x0 = {x0}; x2 = {x2}; Are they equal? {x0 == x2}")
print(f"x0 = {x0}; x3 = {x3}; Are they equal? {x0 == x3}")
print(f"x0 = {x0}; x4 = {x4}; Are they equal? {x0 == x4}")

output

x0 = 18292498239.824; x1 = 18292498239.824; Are they equal? True
x0 = 18292498239.824; x3 = 18292498239.824; Are they equal? True
x0 = 18292498239.824; x4 = 18292498239.824; Are they equal? True

sinhrks added the Docs label May 12, 2016

jreback added Difficulty Novice labels May 13, 2016

jreback added this to the Next Major Release milestone May 13, 2016

jreback changed the title ~~to_csv() / from_csv() roundtrip breaks for floats in 0.15.1 and 0.18.0~~ DOC: floating point precision on writing/reading to csv May 13, 2016

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

jbrockmendel removed the Effort Low label Oct 21, 2019

jbrockmendel added the IO CSV read_csv, to_csv label Jan 11, 2022

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

github-actions bot assigned dhavide Oct 19, 2022

github-actions bot assigned hualiu01 Aug 20, 2024
