to_csv() and read_csv() do not preserve dtype for a column with integer values but whose original dtype is `object`. #27749

svadali1 · 2019-08-05T03:49:19Z

Code Sample, a copy-pastable example if possible

import csv

import pandas as pd

data_dict = {'visitor_id': [123, 456],
             'name': ['John Doe', 'Jane Doe']}
data_df = pd.DataFrame(data_dict)
data_df['visitor_id'] = data_df['visitor_id'].astype(str)
# Original dtype for visitor_id is object
print(data_df.dtypes)

# dtype for visitor_id is int64 when the data file is read back using read_csv()
data_df.to_csv('./data_file.csv', index=False)
read_data_df = pd.read_csv('./data_file.csv')
print(read_data_df.dtypes)

# dtype for visitor_id is float64 when the data file is written and read back
# with csv.QUOTE_NONNUMERIC
data_df.to_csv('./data_file.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
read_data_df = pd.read_csv('./data_file.csv', quoting=csv.QUOTE_NONNUMERIC)
print(read_data_df.dtypes)

Problem description

When a Pandas dataframe has a column with integer values but whose dtype is actually object, using to_csv() and then reading back the csv file using read_csv() does not preserve the column's dtype.

I have not seen a similar issue posted before (I may have missed it in my search). I am on pandas 0.23, and upgrading to pandas 0.25 does not solve the issue.

Expected Output

The dtype for a column should be preserved for columns which have integer values but with dtype object.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: 4.3.1
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.28.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: 0.11.1
xarray: None
IPython: 7.2.0
sphinx: 1.7.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml.etree: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.18
pymysql: 0.9.3
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: 0.2.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

rileyschack · 2019-08-08T03:12:34Z

You cannot preserve dtypes with a CSV. This isn't a bug in pandas, but a limitation of the CSV format. Try using Parquet or HDF5 if you want dtypes preserved.

jorisvandenbossche · 2019-08-08T11:54:05Z

@rileyschack is correct. CSV is in general not a good format if you want exact type preservation in roundtrips.
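For readers who have to stay with CSV, one common workaround is to tell `read_csv()` the dtype explicitly instead of relying on type inference. The following is a minimal sketch, not something proposed in this thread: it reuses the column names from the report, writes to an in-memory buffer rather than a file, and uses the variable names `buffer`, `inferred_df`, and `typed_df` purely for illustration.

```python
import io

import pandas as pd

# Same data as in the report: integer-looking IDs stored as strings.
data_df = pd.DataFrame({"visitor_id": ["123", "456"],
                        "name": ["John Doe", "Jane Doe"]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
data_df.to_csv(buffer, index=False)

# Default read: type inference turns the column into integers.
buffer.seek(0)
inferred_df = pd.read_csv(buffer)

# Read with an explicit dtype: the values stay strings.
buffer.seek(0)
typed_df = pd.read_csv(buffer, dtype={"visitor_id": str})

print(inferred_df["visitor_id"].dtype)    # int64
print(typed_df["visitor_id"].tolist())    # ['123', '456']
```

This only helps when the reader already knows which columns should stay strings; the CSV file itself still carries no type information.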

There seems to be something strange with the handling of csv.QUOTE_NONNUMERIC giving float vs int. If you want, you can open a specific issue about that. But I am going to close this one, as this type preservation is not something we can guarantee in general with CSV.

jorisvandenbossche added the Usage Question label Aug 8, 2019

jorisvandenbossche added this to the No action milestone Aug 8, 2019

jorisvandenbossche added the IO Data label (IO issues that don't fit into a more specific label) Aug 8, 2019

jorisvandenbossche closed this as completed Aug 8, 2019

ghost mentioned this issue Jun 16, 2020

Store/read dtypes together with CSV (FileDataSink/Source) schuderer/mllaunchpad#112

Closed
