Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption during Excel export #10970

Closed
bertrandhaut opened this issue Sep 2, 2015 · 5 comments · Fixed by #11237
Closed

Data corruption during Excel export #10970

bertrandhaut opened this issue Sep 2, 2015 · 5 comments · Fixed by #11237
Labels
IO Excel read_excel, to_excel
Milestone

Comments

@bertrandhaut
Copy link
Contributor

In some case a DataFrame exported to excel present some bad values.
It's is not a problem of Excel reading (the data inside the sheet1.xml of the .xlsx file is also incorrect).

The same DataFrame exported to ".csv" is correct.

The problem could be "solved" by renaming the column header as [col-1, col-2,...]. Maybe an encoding problem ?

The issue is that there is no warning/error during the export. It's very easy to miss it.

To reproduce:

import pandas as pd

df = pd.read_pickle('problematic_df.pkl')
df.to_excel('problematic_df.xlsx')
df.to_csv('problematic_df.csv')

with the file available here: https://drive.google.com/file/d/0Bzz_ZaP_wS_HMFdlMkVzaTR0cjA/view?usp=sharing

Note that the content of cell M14 is different in both file (at least when run on my computer)

Using:

  • Python 3.4.3 |Anaconda 2.3.0 (64-bit)
  • pandas 0.16.2
  • Windows 7 64 bits
@jreback
Copy link
Contributor

jreback commented Sep 2, 2015

can you show the dataframe in question above (.info and .dtypes), and a sample .head()

@jreback jreback added the IO Excel read_excel, to_excel label Sep 2, 2015
@bertrandhaut
Copy link
Contributor Author

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 15 entries, 2015-08-11 00:04:59.999000 to 2015-08-11 01:14:59.999000
Freq: 5T
Data columns (total 20 columns):
TEMP 1 (A11)_°C     3 non-null float64
TEMP 2 (A12)_°C     3 non-null float64
TEMP 3 (A21)_°C     3 non-null float64
TEMP 4 (A23)_°C     3 non-null float64
TEMP 5 (A14)_°C     3 non-null float64
TEMP 6 (A15)_°C     3 non-null float64
TEMP 7 (A16)_°C     3 non-null float64
TEMP 8 (A25)_°C     3 non-null float64
TEMP 9 (B11)_°C     3 non-null float64
TEMP 10 (B21)_°C    3 non-null float64
Ref #11/ADIT_°C     3 non-null Ref #11/ADIT_°C    float64
Ref #11/ADIT_°C    float64
dtype: object
TEMP 11 (B14)_°C    3 non-null float64
TEMP 12 (B24)_°C    3 non-null float64
TEMP 13 (C11)_°C    3 non-null float64
TEMP 14 (C21)_°C    3 non-null float64
TEMP 15 (C14)_°C    3 non-null float64
TEMP 16_°C          3 non-null float64
TEMP 17 (C24)_°C    3 non-null float64
TEMP 18_°C          3 non-null float64
Ref #11/ADIT_°C     3 non-null Ref #11/ADIT_°C    float64
Ref #11/ADIT_°C    float64
dtype: object
dtypes: float64(20)
memory usage: 2.5 KB
None

df.dtypes

TEMP 1 (A11)_°C     float64
TEMP 2 (A12)_°C     float64
TEMP 3 (A21)_°C     float64
TEMP 4 (A23)_°C     float64
TEMP 5 (A14)_°C     float64
TEMP 6 (A15)_°C     float64
TEMP 7 (A16)_°C     float64
TEMP 8 (A25)_°C     float64
TEMP 9 (B11)_°C     float64
TEMP 10 (B21)_°C    float64
Ref #11/ADIT_°C     float64
TEMP 11 (B14)_°C    float64
TEMP 12 (B24)_°C    float64
TEMP 13 (C11)_°C    float64
TEMP 14 (C21)_°C    float64
TEMP 15 (C14)_°C    float64
TEMP 16_°C          float64
TEMP 17 (C24)_°C    float64
TEMP 18_°C          float64
Ref #11/ADIT_°C     float64
dtype: object

df.head()

                        TEMP 1 (A11)_°C  TEMP 2 (A12)_°C  TEMP 3 (A21)_°C  \
2015-08-11 00:04:59.999              NaN              NaN              NaN   
2015-08-11 00:09:59.999              NaN              NaN              NaN   
2015-08-11 00:14:59.999              NaN              NaN              NaN   
2015-08-11 00:19:59.999              NaN              NaN              NaN   
2015-08-11 00:24:59.999              NaN              NaN              NaN   
2015-08-11 00:29:59.999              NaN              NaN              NaN   
2015-08-11 00:34:59.999              NaN              NaN              NaN   
2015-08-11 00:39:59.999              NaN              NaN              NaN   
2015-08-11 00:44:59.999              NaN              NaN              NaN   
2015-08-11 00:49:59.999              NaN              NaN              NaN   
2015-08-11 00:54:59.999              NaN              NaN              NaN   
2015-08-11 00:59:59.999              NaN              NaN              NaN   
2015-08-11 01:04:59.999        28.743081         5.851268        26.925449   
2015-08-11 01:09:59.999        18.642970         3.182361         6.454038   
2015-08-11 01:14:59.999        24.646857        24.838391        21.959323   

                         TEMP 4 (A23)_°C  TEMP 5 (A14)_°C  TEMP 6 (A15)_°C  \
2015-08-11 00:04:59.999              NaN              NaN              NaN   
2015-08-11 00:09:59.999              NaN              NaN              NaN   
2015-08-11 00:14:59.999              NaN              NaN              NaN   
2015-08-11 00:19:59.999              NaN              NaN              NaN   
2015-08-11 00:24:59.999              NaN              NaN              NaN   
2015-08-11 00:29:59.999              NaN              NaN              NaN   
2015-08-11 00:34:59.999              NaN              NaN              NaN   
2015-08-11 00:39:59.999              NaN              NaN              NaN   
2015-08-11 00:44:59.999              NaN              NaN              NaN   
2015-08-11 00:49:59.999              NaN              NaN              NaN   
2015-08-11 00:54:59.999              NaN              NaN              NaN   
2015-08-11 00:59:59.999              NaN              NaN              NaN   
2015-08-11 01:04:59.999        10.989247        17.369523        10.847034   
2015-08-11 01:09:59.999         8.634052         2.338829         5.450807   
2015-08-11 01:14:59.999        21.646028         9.444759        25.819109   

                         TEMP 7 (A16)_°C  TEMP 8 (A25)_°C  TEMP 9 (B11)_°C  \
2015-08-11 00:04:59.999              NaN              NaN              NaN   
2015-08-11 00:09:59.999              NaN              NaN              NaN   
2015-08-11 00:14:59.999              NaN              NaN              NaN   
2015-08-11 00:19:59.999              NaN              NaN              NaN   
2015-08-11 00:24:59.999              NaN              NaN              NaN   
2015-08-11 00:29:59.999              NaN              NaN              NaN   
2015-08-11 00:34:59.999              NaN              NaN              NaN   
2015-08-11 00:39:59.999              NaN              NaN              NaN   
2015-08-11 00:44:59.999              NaN              NaN              NaN   
2015-08-11 00:49:59.999              NaN              NaN              NaN   
2015-08-11 00:54:59.999              NaN              NaN              NaN   
2015-08-11 00:59:59.999              NaN              NaN              NaN   
2015-08-11 01:04:59.999         7.031353        14.956100         4.171982   
2015-08-11 01:09:59.999         0.147043        21.383754        23.385664   
2015-08-11 01:14:59.999        21.723265        19.405757        20.091859   

                         TEMP 10 (B21)_°C  Ref #11/ADIT_°C  TEMP 11 (B14)_°C  \
2015-08-11 00:04:59.999               NaN              NaN               NaN   
2015-08-11 00:09:59.999               NaN              NaN               NaN   
2015-08-11 00:14:59.999               NaN              NaN               NaN   
2015-08-11 00:19:59.999               NaN              NaN               NaN   
2015-08-11 00:24:59.999               NaN              NaN               NaN   
2015-08-11 00:29:59.999               NaN              NaN               NaN   
2015-08-11 00:34:59.999               NaN              NaN               NaN   
2015-08-11 00:39:59.999               NaN              NaN               NaN   
2015-08-11 00:44:59.999               NaN              NaN               NaN   
2015-08-11 00:49:59.999               NaN              NaN               NaN   
2015-08-11 00:54:59.999               NaN              NaN               NaN   
2015-08-11 00:59:59.999               NaN              NaN               NaN   
2015-08-11 01:04:59.999          0.696912         4.977991         10.410759   
2015-08-11 01:09:59.999         10.104453         9.947120         19.196995   
2015-08-11 01:14:59.999         14.178148        38.991964         12.298831   

                         TEMP 12 (B24)_°C  TEMP 13 (C11)_°C  TEMP 14 (C21)_°C  \
2015-08-11 00:04:59.999               NaN               NaN               NaN   
2015-08-11 00:09:59.999               NaN               NaN               NaN   
2015-08-11 00:14:59.999               NaN               NaN               NaN   
2015-08-11 00:19:59.999               NaN               NaN               NaN   
2015-08-11 00:24:59.999               NaN               NaN               NaN   
2015-08-11 00:29:59.999               NaN               NaN               NaN   
2015-08-11 00:34:59.999               NaN               NaN               NaN   
2015-08-11 00:39:59.999               NaN               NaN               NaN   
2015-08-11 00:44:59.999               NaN               NaN               NaN   
2015-08-11 00:49:59.999               NaN               NaN               NaN   
2015-08-11 00:54:59.999               NaN               NaN               NaN   
2015-08-11 00:59:59.999               NaN               NaN               NaN   
2015-08-11 01:04:59.999         15.923412         16.179728         18.776198   
2015-08-11 01:09:59.999          7.768040         12.242002         19.361663   
2015-08-11 01:14:59.999         21.629414         27.349045          9.152102   

                         TEMP 15 (C14)_°C  TEMP 16_°C  TEMP 17 (C24)_°C  \
2015-08-11 00:04:59.999               NaN         NaN               NaN   
2015-08-11 00:09:59.999               NaN         NaN               NaN   
2015-08-11 00:14:59.999               NaN         NaN               NaN   
2015-08-11 00:19:59.999               NaN         NaN               NaN   
2015-08-11 00:24:59.999               NaN         NaN               NaN   
2015-08-11 00:29:59.999               NaN         NaN               NaN   
2015-08-11 00:34:59.999               NaN         NaN               NaN   
2015-08-11 00:39:59.999               NaN         NaN               NaN   
2015-08-11 00:44:59.999               NaN         NaN               NaN   
2015-08-11 00:49:59.999               NaN         NaN               NaN   
2015-08-11 00:54:59.999               NaN         NaN               NaN   
2015-08-11 00:59:59.999               NaN         NaN               NaN   
2015-08-11 01:04:59.999         11.898102           0         11.688812   
2015-08-11 01:09:59.999         26.681062           0         14.318932   
2015-08-11 01:14:59.999         13.217890           0         13.049905   

                         TEMP 18_°C  Ref #11/ADIT_°C  
2015-08-11 00:04:59.999         NaN              NaN  
2015-08-11 00:09:59.999         NaN              NaN  
2015-08-11 00:14:59.999         NaN              NaN  
2015-08-11 00:19:59.999         NaN              NaN  
2015-08-11 00:24:59.999         NaN              NaN  
2015-08-11 00:29:59.999         NaN              NaN  
2015-08-11 00:34:59.999         NaN              NaN  
2015-08-11 00:39:59.999         NaN              NaN  
2015-08-11 00:44:59.999         NaN              NaN  
2015-08-11 00:49:59.999         NaN              NaN  
2015-08-11 00:54:59.999         NaN              NaN  
2015-08-11 00:59:59.999         NaN              NaN  
2015-08-11 01:04:59.999           0        20.809154  
2015-08-11 01:09:59.999           0        27.623193  
2015-08-11 01:14:59.999           0         4.088537  

@bertrandhaut
Copy link
Contributor Author

For information the value incorrectly exported to Excel is TEMP 11 (B14)_°C at time 11/08/2015 01:05:00.

In the .csv file and as printed by ".head()" the value is 10.4107588318426.
In the .xlsx file the value is 20.8091542146332

@dsm054
Copy link
Contributor

dsm054 commented Sep 3, 2015

This might be the old excel-duplicate-column-name problem. The problem goes away if I rename the frame so that no columns are duplicated, and it looks like the errors come in after the first duplicate (Ref #11/ADIT_°C).

@bertrandhaut
Copy link
Contributor Author

I've just tried to modify the name of the last column (the second Ref #11/ADIT_°C ) and indeed it solved the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants