Precision of floats with to_csv - difference between Python 2 and Python 3 #10777

Closed
scls19fr opened this issue Aug 9, 2015 · 3 comments

Comments

@scls19fr
Contributor

scls19fr commented Aug 9, 2015

Hello,

I noticed that the size of a CSV file generated with pandas differs depending on which version of Python is used.

I was expecting the same behaviour for the same code.

With Python 3

pc:~ scls$ cat big.py
import pandas as pd
import numpy as np
(rows, cols) = (400, 10)
a = np.random.random((rows, cols))
df = pd.DataFrame(a)
filename = "big_random.csv"
df.to_csv(filename, index=False)
pc:~ scls$ python --version
Python 3.4.3 :: Anaconda 2.3.0 (x86_64)
pc:~ scls$ python big.py
pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.7175194974125143,0.9868374217576047,0.032049602250014075,0.6289681136928122,0.8096042270600179,0.33685028982497345,0.15762455315620005,0.4691775579462879,0.4456870865050826,0.18795963399879667
0.6587841850716022,0.9138862585748279,0.725186213931711,0.8455808154946725,0.7513894749589103,0.1264561639354813,0.22313629403106283,0.7082854809639854,0.6372581410511284,0.39526526133363016
0.41655411171973145,0.82608272240786,0.39046502732419675,0.21280845299958473,0.7260928524192569,0.13413288736071716,0.6403422588148618,0.38493112678936114,0.1008469225955716,0.7569988810703301
0.4153526009936582,0.31647402611493414,0.9975731184442808,0.6426165566647016,0.09261643366931205,0.3227891182788275,0.8867457623057338,0.27223526407455145,0.3281299815210015,0.9740848774163636
0.18492378563510736,0.6467683901479606,0.040191223061303516,0.06418796210918698,0.6377758098323728,0.3015310590768058,0.35801398526272554,0.3847352145606483,0.5169639983061501,0.7688238573672432
0.12776779442246045,0.13988857304612567,0.5174730743084831,0.48860306709655155,0.6430744296754209,0.7043353997674583,0.9036918523659346,0.8363827082165963,0.10904005101984726,0.3467075055731551
0.8735436905183718,0.3094682378308442,0.3425056806446519,0.6327109907812603,0.027768508379761192,0.7572863534573687,0.013631783039836698,0.9498400284024592,0.7489006948603708,0.26146706653431384
0.00706906732485435,0.398808829510499,0.1603837067149072,0.1162434740119399,0.6308407696050173,0.38437501090290294,0.7084745025285255,0.6766732951295603,0.09640698119674629,0.16475759581133098
0.37288409337600503,0.8170980071434518,0.10346296752178363,0.22734655867481057,0.977310707692392,0.2058569589426188,0.810879704065204,0.4644448946189589,0.7872748134058031,0.21634203693609444

Let's do the same with Python 2

pc:~ scls$ source activate py2
discarding //anaconda/bin from PATH
prepending //anaconda/envs/py2/bin to PATH
(py2)pc:~ scls$ python --version
Python 2.7.10 :: Anaconda 2.3.0 (x86_64)
(py2)pc:~ scls$ python big.py
(py2)pc:~ scls$ head big_random.csv
0,1,2,3,4,5,6,7,8,9
0.5579481683,0.684701543521,0.754306080917,0.618156128389,0.172254680145,0.0174204117472,0.42003733688,0.544810598703,0.501523693218,0.254650528482
0.245211610381,0.242803787702,0.74730831067,0.902427362626,0.79284128878,0.759901967668,0.138869495692,0.409657542539,0.800543764611,0.126875692556
0.157008551856,0.196911813758,0.427114483552,0.513200703916,0.629485103457,0.158393748929,0.725090100741,0.997671387723,0.168756770968,0.307894016467
0.277986851471,0.841819960853,0.948682092484,0.0698344807858,0.843959698756,0.124105138469,0.685600301284,0.638439389501,0.153843520073,0.00693283214343
0.825322391369,0.246830314636,0.76342798427,0.588335209531,0.0639153711562,0.277168287326,0.660799511539,0.246912047114,0.525794863223,0.606527113773
0.422893634037,0.416014910374,0.0282877421175,0.479474754244,0.562079226872,0.554424129574,0.850810096081,0.980219346119,0.376727776223,0.0202092423104
0.107718832593,0.82063197471,0.293988837033,0.0741333403483,0.223505401274,0.506775135928,0.411408416805,0.828313119764,0.670612028027,0.67312260052
0.822882425742,0.0355538636782,0.0453556725915,0.483123830922,0.726536606867,0.265317264415,0.190839972237,0.63416336544,0.776559958794,0.198684003523
0.0159240676555,0.082225869763,0.9188622672,0.628898793501,0.598847602455,0.479313877636,0.830676086143,0.930886044804,0.979980325282,0.42786165221

Compare the number of digits of

  • 0.7175194974125143 (Python 3)
  • 0.5579481683 (Python 2)

The same code leads to a difference of 6 digits.

Kind regards

@kawochen
Contributor

kawochen commented Aug 9, 2015

In Python 3, str is the same as repr for floats (more digits), so that distinct numbers have different representations.
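
A minimal sketch of the difference, runnable on Python 3. It assumes that Python 2's str() for floats behaved like formatting with 12 significant digits ('%.12g'), which is emulated here. The value is taken from the first row of the Python 3 output above.

```python
x = 0.7175194974125143

# On Python 3, str() and repr() agree for floats, and the text
# parses back to exactly the same float.
print(str(x) == repr(x))
print(float(repr(x)) == x)

# Emulate the shorter Python 2 style str() with 12 significant digits
# (an assumption about the old behaviour, not pandas code).
py2_style = "%.12g" % x
print(py2_style)

# The shorter text no longer round-trips to the same float.
print(float(py2_style) == x)
```

So the shorter Python 2 output is more compact but lossy, while the Python 3 output is longer and round-trips exactly.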

@jreback
Contributor

jreback commented Aug 9, 2015

Further, when you compare random numbers like this, you should use
np.random.seed to ensure they are actually the same.
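
A rough sketch of how the script above could be made comparable between runs. The helper name make_csv and the seed value 0 are made up for illustration. float_format is an existing to_csv parameter; using '%.12g' to get the shorter output on Python 3 is an assumption about what the Python 2 output looked like, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd


def make_csv(seed=0, float_format=None):
    # Hypothetical helper: seeding makes the "random" data reproducible.
    np.random.seed(seed)
    df = pd.DataFrame(np.random.random((400, 10)))
    # With no path given, to_csv returns the CSV text as a string.
    return df.to_csv(index=False, float_format=float_format)


first = make_csv()
second = make_csv()
# Same seed, same data, same CSV text.
print(first == second)

# Fixing the float format removes the dependence on str()/repr().
short = make_csv(float_format="%.12g")
print(len(short) < len(first))
```

With the seed fixed, any remaining difference in file size between two interpreters comes from the float formatting rather than from the data.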

@jreback jreback closed this as completed Aug 9, 2015
@jreback
Contributor

jreback commented Aug 9, 2015
