import csv
import pandas as pd

data_dict = {'visitor_id': [123, 456],
             'name': ['John Doe', 'Jane Doe']}
data_df = pd.DataFrame(data_dict)
data_df['visitor_id'] = data_df['visitor_id'].astype(str)

# Original dtype for visitor_id is object
print(data_df.dtypes)

# dtype for visitor_id is int64 when the file written by to_csv() is read back with read_csv()
data_df.to_csv('./data_file.csv', index=False)
read_data_df = pd.read_csv('./data_file.csv')
print(read_data_df.dtypes)

# dtype for visitor_id is float64 when the file is written and read back
# with csv.QUOTE_NONNUMERIC
data_df.to_csv('./data_file.csv', index=False, quoting=csv.QUOTE_NONNUMERIC)
read_data_df = pd.read_csv('./data_file.csv', quoting=csv.QUOTE_NONNUMERIC)
print(read_data_df.dtypes)
Problem description
When a pandas DataFrame has a column that holds integer values but has dtype object, writing it with to_csv() and reading the file back with read_csv() does not preserve the column's dtype.
I have not seen a similar issue posted before (I may have missed it in my search). I am on pandas 0.23, and upgrading to pandas 0.25 does not solve the issue.
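Looking at the CSV text itself shows why the dtype cannot survive the roundtrip. This minimal sketch (it relies only on `to_csv()` returning the CSV as a string when no path is given) compares the output for an int64 column with the output for the same values stored as strings:

```python
import pandas as pd

# Same values, two different dtypes for visitor_id
df_int = pd.DataFrame({'visitor_id': [123, 456],
                       'name': ['John Doe', 'Jane Doe']})
df_str = df_int.copy()
df_str['visitor_id'] = df_str['visitor_id'].astype(str)

# With no path argument, to_csv() returns the CSV text instead of writing a file
text_int = df_int.to_csv(index=False)
text_str = df_str.to_csv(index=False)

# The two outputs are identical, so read_csv() cannot tell which dtype
# the column originally had; it infers int64 in both cases
print(text_int == text_str)
print(text_str)
```

A CSV file stores only text, so any dtype information the column had is gone before `read_csv()` ever sees the file.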
Expected Output
The dtype should be preserved for columns that hold integer values but have dtype object.
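Because the file cannot carry the dtype, the common workaround is to state it when reading. `read_csv()` accepts a `dtype` argument that maps column names to types. A minimal sketch, using an in-memory buffer in place of `./data_file.csv`:

```python
import io

import pandas as pd

data_df = pd.DataFrame({'visitor_id': [123, 456],
                        'name': ['John Doe', 'Jane Doe']})
data_df['visitor_id'] = data_df['visitor_id'].astype(str)

# Write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
data_df.to_csv(buf, index=False)
buf.seek(0)

# Tell the parser to keep visitor_id as strings instead of inferring int64
read_data_df = pd.read_csv(buf, dtype={'visitor_id': str})
print(read_data_df.dtypes)
print(read_data_df['visitor_id'].tolist())  # ['123', '456']
```

Passing the types explicitly also protects values such as IDs with leading zeros, which type inference would otherwise turn into integers.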
You cannot preserve dtypes with a CSV. This isn't a bug in pandas but a limitation of the CSV format. Try Parquet or HDF5 if you want dtypes preserved.
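Here is a sketch of the binary-format roundtrip suggested above. Parquet support in pandas needs an extra engine (pyarrow or fastparquet), so the sketch checks for pyarrow. Purely so it runs without pyarrow, it falls back to pandas' pickle format, which is not Parquet but also stores dtypes. Only read pickles from trusted sources.

```python
import importlib.util
import os
import tempfile

import pandas as pd

data_df = pd.DataFrame({'visitor_id': [123, 456],
                        'name': ['John Doe', 'Jane Doe']})
data_df['visitor_id'] = data_df['visitor_id'].astype(str)

with tempfile.TemporaryDirectory() as tmp:
    if importlib.util.find_spec('pyarrow') is not None:
        # Parquet files carry a schema, so column types travel with the data
        path = os.path.join(tmp, 'data_file.parquet')
        data_df.to_parquet(path, engine='pyarrow', index=False)
        read_data_df = pd.read_parquet(path, engine='pyarrow')
    else:
        # Fallback (assumption, not Parquet): pickle also preserves dtypes
        path = os.path.join(tmp, 'data_file.pkl')
        data_df.to_pickle(path)
        read_data_df = pd.read_pickle(path)

# visitor_id keeps the same dtype it had before writing
print(read_data_df.dtypes)
```

HDF5 (`to_hdf`/`read_hdf`) works the same way but requires the PyTables package.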
@rileyschack is correct. CSV is in general not a good format if you want exact type preservation in roundtrips.
There seems to be something strange in how csv.QUOTE_NONNUMERIC is handled, giving float instead of int. If you want, you can open a separate issue about that. But I'm going to close this one, as this kind of type preservation is not something we can guarantee in general with CSV.
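For comparison on the float-vs-int point, Python's standard `csv` module defines `QUOTE_NONNUMERIC` as follows. The writer quotes every non-numeric field, and the reader converts every unquoted field to `float` while leaving quoted fields as strings. Assume pandas quotes the string values of the object column when writing, as the stdlib writer does. Under these rules a quoted `"123"` would come back as a string, which is why the float64 in the report looks surprising. A stdlib-only sketch:

```python
import csv
import io

rows = [['visitor_id', 'name'],
        [123, 'John Doe'],      # int: written unquoted
        ['456', 'Jane Doe']]    # str: written quoted

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
print(buf.getvalue())

# On read, unquoted fields become float and quoted fields stay str
buf.seek(0)
read_rows = list(csv.reader(buf, quoting=csv.QUOTE_NONNUMERIC))
print(read_rows)  # [['visitor_id', 'name'], [123.0, 'John Doe'], ['456', 'Jane Doe']]
```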
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.24.1
pytest: 4.3.1
pip: 19.0.1
setuptools: 40.7.3
Cython: 0.28.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: 0.11.1
xarray: None
IPython: 7.2.0
sphinx: 1.7.4
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml.etree: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.18
pymysql: 0.9.3
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: 0.2.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None