Precision of floats with to_csv - difference between Python 2 and Python 3 #10777

scls19fr · 2015-08-09T06:57:23Z

Hello,

I noticed that the size of a CSV file generated with pandas differs depending on which version of Python was used.

I was expecting the same behaviour for the same code.

With Python 3

pc:~ scls$ cat big.py
import pandas as pd
import numpy as np
(rows, cols) = (400, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)
filename = "big_random.csv"
df.to_csv(filename, index=False)
pc:~ scls$ python --version
Python 3.4.3 :: Anaconda 2.3.0 (x86_64)
pc:~ scls$ python big.py
pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.7175194974125143,0.9868374217576047,0.032049602250014075,0.6289681136928122,0.8096042270600179,0.33685028982497345,0.15762455315620005,0.4691775579462879,0.4456870865050826,0.18795963399879667
0.6587841850716022,0.9138862585748279,0.725186213931711,0.8455808154946725,0.7513894749589103,0.1264561639354813,0.22313629403106283,0.7082854809639854,0.6372581410511284,0.39526526133363016
0.41655411171973145,0.82608272240786,0.39046502732419675,0.21280845299958473,0.7260928524192569,0.13413288736071716,0.6403422588148618,0.38493112678936114,0.1008469225955716,0.7569988810703301
0.4153526009936582,0.31647402611493414,0.9975731184442808,0.6426165566647016,0.09261643366931205,0.3227891182788275,0.8867457623057338,0.27223526407455145,0.3281299815210015,0.9740848774163636
0.18492378563510736,0.6467683901479606,0.040191223061303516,0.06418796210918698,0.6377758098323728,0.3015310590768058,0.35801398526272554,0.3847352145606483,0.5169639983061501,0.7688238573672432
0.12776779442246045,0.13988857304612567,0.5174730743084831,0.48860306709655155,0.6430744296754209,0.7043353997674583,0.9036918523659346,0.8363827082165963,0.10904005101984726,0.3467075055731551
0.8735436905183718,0.3094682378308442,0.3425056806446519,0.6327109907812603,0.027768508379761192,0.7572863534573687,0.013631783039836698,0.9498400284024592,0.7489006948603708,0.26146706653431384
0.00706906732485435,0.398808829510499,0.1603837067149072,0.1162434740119399,0.6308407696050173,0.38437501090290294,0.7084745025285255,0.6766732951295603,0.09640698119674629,0.16475759581133098
0.37288409337600503,0.8170980071434518,0.10346296752178363,0.22734655867481057,0.977310707692392,0.2058569589426188,0.810879704065204,0.4644448946189589,0.7872748134058031,0.21634203693609444

Let's do the same with Python 2

pc:~ scls$ source activate py2
discarding //anaconda/bin from PATH
prepending //anaconda/envs/py2/bin to PATH
(py2)pc:~ scls$ python --version
Python 2.7.10 :: Anaconda 2.3.0 (x86_64)
(py2)pc:~ scls$ python big.py
(py2)pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.5579481683,0.684701543521,0.754306080917,0.618156128389,0.172254680145,0.0174204117472,0.42003733688,0.544810598703,0.501523693218,0.254650528482
0.245211610381,0.242803787702,0.74730831067,0.902427362626,0.79284128878,0.759901967668,0.138869495692,0.409657542539,0.800543764611,0.126875692556
0.157008551856,0.196911813758,0.427114483552,0.513200703916,0.629485103457,0.158393748929,0.725090100741,0.997671387723,0.168756770968,0.307894016467
0.277986851471,0.841819960853,0.948682092484,0.0698344807858,0.843959698756,0.124105138469,0.685600301284,0.638439389501,0.153843520073,0.00693283214343
0.825322391369,0.246830314636,0.76342798427,0.588335209531,0.0639153711562,0.277168287326,0.660799511539,0.246912047114,0.525794863223,0.606527113773
0.422893634037,0.416014910374,0.0282877421175,0.479474754244,0.562079226872,0.554424129574,0.850810096081,0.980219346119,0.376727776223,0.0202092423104
0.107718832593,0.82063197471,0.293988837033,0.0741333403483,0.223505401274,0.506775135928,0.411408416805,0.828313119764,0.670612028027,0.67312260052
0.822882425742,0.0355538636782,0.0453556725915,0.483123830922,0.726536606867,0.265317264415,0.190839972237,0.63416336544,0.776559958794,0.198684003523
0.0159240676555,0.082225869763,0.9188622672,0.628898793501,0.598847602455,0.479313877636,0.830676086143,0.930886044804,0.979980325282,0.42786165221

Compare the number of digits in

0.7175194974125143
0.5579481683

The same code leads to a 6-digit difference.
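As a quick check of the arithmetic, the sketch below counts the digits written after the decimal point in the two quoted values. These are two different random draws, and the digit count varies from value to value in both files (for example 0.684701543521 in the Python 2 output has 12), so 6 is the difference for this particular pair only. The helper name `digits_after_point` is hypothetical.

```python
# The two values quoted above, copied from the Python 3 and Python 2 CSV output
py3_value = "0.7175194974125143"
py2_value = "0.5579481683"


def digits_after_point(s: str) -> int:
    """Count the digits written after the decimal point of a plain decimal string."""
    return len(s.split(".", 1)[1])


py3_digits = digits_after_point(py3_value)
py2_digits = digits_after_point(py2_value)
print(py3_digits, py2_digits, py3_digits - py2_digits)  # 16 10 6
```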

Kind regards


kawochen · 2015-08-09T07:14:01Z

In Python 3, str() is the same as repr() for floats (which gives more digits), so that distinct numbers have distinct representations.
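A minimal illustration in Python 3, assuming that the Python 2 CSV was written via str(), which in Python 2.7 formats floats with 12 significant digits (equivalent to '%.12g'). That assumption is consistent with the Python 2 output above, where no value has more than 12 significant digits. The snippet also shows why the extra digits matter: only the full repr() string reads back as the same float.

```python
x = 0.7175194974125143  # first value from the Python 3 CSV above

# Python 3: str() and repr() agree and give the shortest string that round-trips
print(repr(x), str(x))  # 0.7175194974125143 0.7175194974125143

# Python 2's str() used 12 significant digits, i.e. the same as '%.12g'
py2_style = "%.12g" % x
print(py2_style)  # 0.717519497413

# The 12-digit string no longer parses back to the original float
print(float(repr(x)) == x, float(py2_style) == x)  # True False
```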

jreback · 2015-08-09T08:05:04Z

Further, when you compare random numbers like this, you should use np.random.seed to ensure they are actually the same.
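A minimal sketch of that suggestion, assuming a reasonably recent pandas and NumPy. Seeding NumPy's global generator makes repeated runs produce the same data, and passing `float_format` to `to_csv` fixes how many digits are written instead of relying on the default float formatting. The helper `make_csv` is hypothetical, and `'%.12g'` is chosen only to approximate the shorter Python 2 output.

```python
import numpy as np
import pandas as pd


def make_csv(seed, float_format=None, shape=(400, 10)):
    """Hypothetical helper: build the random frame from big.py and return its CSV text."""
    np.random.seed(seed)  # same seed -> same random draws on every run
    df = pd.DataFrame(np.random.random(shape))
    # With no path argument, to_csv returns the CSV as a string instead of writing a file
    return df.to_csv(index=False, float_format=float_format)


full = make_csv(seed=42)
again = make_csv(seed=42)
short = make_csv(seed=42, float_format="%.12g")

print(full == again)           # True: the seed makes the data identical
print(len(short) < len(full))  # True: at most 12 significant digits per value
```

With the seed fixed, any remaining difference between two CSV files comes from formatting alone, which makes a comparison like the one in this issue meaningful.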

jreback · 2015-08-09T08:08:27Z

https://docs.python.org/2/tutorial/floatingpoint.html

scls19fr mentioned this issue on Aug 9, 2015 in TabViewer/gtabview#18 ("UnboundLocalError: local variable 'intermediate' referenced before assignment", closed)

jreback closed this as completed Aug 9, 2015
