cannot output csv with IntervalIndex #28210

Closed
AlJohri opened this issue Aug 29, 2019 · 4 comments · Fixed by #28229

Comments

@AlJohri

@AlJohri AlJohri commented Aug 29, 2019

Code Sample, a copy-pastable example if possible

Using pd.interval_range:

import numpy as np
import pandas as pd

pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 1, periods=3)).to_csv('hello.csv')

Using pd.IntervalIndex.from_arrays:

pd.DataFrame({'a': [1, 2, 3]}, index=pd.IntervalIndex.from_arrays(np.array([0, 1, 2]), np.array([1, 2, 3]))).to_csv('hello.csv')

Problem description

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-6f913f6c0211> in <module>
----> 1 pd.DataFrame({'a': [1, 2, 3]}, index=pd.IntervalIndex.from_arrays(np.array([0, 1, 2]), np.array([1, 2, 3]))).to_csv('hello.csv')

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3226             decimal=decimal,
   3227         )
-> 3228         formatter.save()
   3229 
   3230         if path_or_buf is None:

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in save(self)
    200                 self.writer = UnicodeWriter(f, **writer_kwargs)
    201 
--> 202             self._save()
    203 
    204         finally:

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save(self)
    322                 break
    323 
--> 324             self._save_chunk(start_i, end_i)
    325 
    326     def _save_chunk(self, start_i, end_i):

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save_chunk(self, start_i, end_i)
    354         )
    355 
--> 356         libwriters.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)

TypeError: Argument 'data_index' has incorrect type (expected numpy.ndarray, got list)

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.0
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.2
setuptools       : 41.1.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.7.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.14.1
pytables         : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None

@jschendel

Member

@jschendel jschendel commented Aug 29, 2019

Thanks for the report. I can confirm this behavior on master:

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 3))

In [3]: df.to_csv()
---------------------------------------------------------------------------
TypeError: Argument 'data_index' has incorrect type (expected numpy.ndarray, got list)

A temporary workaround is to cast the index to string prior to writing the output:

In [4]: df.index = df.index.astype(str)

In [5]: df.to_csv()
Out[5]: ',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'
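
Note that `df.index = df.index.astype(str)` mutates the frame in place. If you want to keep the IntervalIndex on `df` and only stringify it for writing, a small wrapper can apply the same cast to a copy. This is a minimal sketch; `stringify_interval_index` is a hypothetical helper name, not a pandas API, and it only handles a plain (non-Multi) IntervalIndex:

```python
import pandas as pd


def stringify_interval_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df whose IntervalIndex is cast to str.

    Frames with any other kind of index are returned unchanged.
    """
    if isinstance(df.index, pd.IntervalIndex):
        out = df.copy()
        # Same workaround as above, but applied to a copy so df keeps its intervals.
        out.index = out.index.astype(str)
        return out
    return df


df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))
csv_text = stringify_interval_index(df).to_csv()
print(csv_text)
```

The written text matches the workaround output above, while `df.index` is still an IntervalIndex afterwards.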
@MarcoGorelli

Contributor

@MarcoGorelli MarcoGorelli commented Aug 29, 2019

In the example

import pandas

df = pandas.DataFrame({'a': [1, 2, 3]}, index=pandas.interval_range(0, 3))

is the expected output

',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

or is that just a temporary workaround to avoid an error?

@AlJohri

Author

@AlJohri AlJohri commented Aug 29, 2019

I think the string representation would be fine as the expected output. If a user wants non-string output, they can break the IntervalIndex out into index.left and index.right.
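
A rough sketch of that split, assuming a right-closed IntervalIndex like the one from interval_range: the endpoints go into two ordinary columns (the names `left` and `right` are arbitrary choices here), and after reading the CSV back the index is rebuilt with `pd.IntervalIndex.from_arrays`, passing the original `closed` side explicitly since a CSV does not store it.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))

# Move the interval endpoints into plain numeric columns and replace the
# IntervalIndex with a default RangeIndex before writing.
flat = df.assign(
    left=df.index.left.to_numpy(),
    right=df.index.right.to_numpy(),
).reset_index(drop=True)
csv_text = flat.to_csv(index=False)

# Reading back: rebuild the IntervalIndex from the two endpoint columns.
# closed= must match the original side ("right" for interval_range's default).
restored = pd.read_csv(io.StringIO(csv_text))
restored.index = pd.IntervalIndex.from_arrays(
    restored["left"], restored["right"], closed=df.index.closed
)
restored = restored.drop(columns=["left", "right"])
print(restored)
```

Keeping the endpoints numeric means they survive a round trip without any string parsing, at the cost of having to remember which side is closed.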

@jschendel

Member

@jschendel jschendel commented Aug 29, 2019

Yes, that should be the expected output, or at the very least it is consistent with previous versions where this was working:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.23.4'

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 3))

In [3]: df.to_csv()
Out[3]: ',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

Note that my example produces string output because I didn't pass anything for path_or_buf in to_csv; in that case to_csv returns the CSV contents as a string instead of writing them to a file. This may not be common knowledge, but it is useful for debugging and the like.
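
A short illustration of the two modes: calling to_csv with no path_or_buf returns the text, while passing a file-like object writes the same text into it and returns None. The index is stringified first (the workaround from this thread) so the snippet also runs on pandas versions affected by this bug; `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Index cast to str up front, so this runs whether or not the bug is present.
df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3).astype(str))

# No path_or_buf: to_csv returns the CSV text as a str.
as_string = df.to_csv()

# With a file-like object: the text goes into the buffer and to_csv returns None.
buf = io.StringIO()
returned = df.to_csv(buf)

print(repr(as_string))
print(returned)
```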
