
BUG: Writing Interval columns to Parquet fails #34643

@Hoeze

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.

Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame({"x": [pd.Interval(0, 1), ]})
df.to_parquet("test.parquet")

Problem description

Writing a DataFrame that contains an Interval column with to_parquet raises an ArrowException from pyarrow:

---------------------------------------------------------------------------
ArrowException                            Traceback (most recent call last)
<ipython-input-2-f61461018f44> in <module>
      1 import pandas as pd
      2 df = pd.DataFrame({"x": [pd.Interval(0, 1), ]})
----> 3 df.to_parquet("~/test2.parquet")

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    212                 else:
    213                     kwargs[new_arg_name] = new_arg_value
--> 214             return func(*args, **kwargs)
    215 
    216         return cast(F, wrapper)

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pandas/core/frame.py in to_parquet(self, path, engine, compression, index, partition_cols, **kwargs)
   2114             index=index,
   2115             partition_cols=partition_cols,
-> 2116             **kwargs,
   2117         )
   2118 

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pandas/io/parquet.py in to_parquet(df, path, engine, compression, index, partition_cols, **kwargs)
    267         index=index,
    268         partition_cols=partition_cols,
--> 269         **kwargs,
    270     )
    271 

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pandas/io/parquet.py in write(self, df, path, compression, coerce_timestamps, index, partition_cols, **kwargs)
    122                 compression=compression,
    123                 coerce_timestamps=coerce_timestamps,
--> 124                 **kwargs,
    125             )
    126         if should_close:

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pyarrow/parquet.py in write_table(table, where, row_group_size, version, use_dictionary, compression, write_statistics, use_deprecated_int96_timestamps, coerce_timestamps, allow_truncated_timestamps, data_page_size, flavor, filesystem, compression_level, use_byte_stream_split, data_page_version, **kwargs)
   1620                 data_page_version=data_page_version,
   1621                 **kwargs) as writer:
-> 1622             writer.write_table(table, row_group_size=row_group_size)
   1623     except Exception:
   1624         if _is_path_like(where):

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pyarrow/parquet.py in write_table(self, table, row_group_size)
    590             raise ValueError(msg)
    591 
--> 592         self.writer.write_table(table, row_group_size=row_group_size)
    593 
    594     def close(self):

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetWriter.write_table()

/opt/modules/anaconda/envs/env3/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowException: Unknown error: data type leaf_count != builder_leaf_count1 2
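
Possible workaround (a minimal sketch, not something confirmed by the pandas or pyarrow maintainers): split the Interval column into plain left/right columns before writing, and rebuild it after reading. The helper names split_interval_column and restore_interval_column are hypothetical and not part of pandas. The sketch assumes a single Interval column without missing values, and that the caller keeps track of the closed side, since it is not stored in the flattened frame.

```python
import pandas as pd


def split_interval_column(df, column):
    """Hypothetical helper: replace an Interval column by its left/right bounds.

    Returns the flattened frame and the `closed` side of the intervals,
    which the caller must keep in order to rebuild the column later.
    """
    intervals = pd.arrays.IntervalArray(df[column])
    out = df.drop(columns=[column])
    # Store the bounds as ordinary NumPy-backed columns.
    out[f"{column}_left"] = intervals.left.to_numpy()
    out[f"{column}_right"] = intervals.right.to_numpy()
    return out, intervals.closed


def restore_interval_column(df, column, closed):
    """Hypothetical helper: rebuild the Interval column from its bounds."""
    intervals = pd.arrays.IntervalArray.from_arrays(
        df[f"{column}_left"].to_numpy(),
        df[f"{column}_right"].to_numpy(),
        closed=closed,
    )
    out = df.drop(columns=[f"{column}_left", f"{column}_right"])
    out[column] = intervals
    return out
```

With these helpers, the flattened frame returned by split_interval_column(df, "x") contains only numeric columns and can be passed to to_parquet; after read_parquet, restore_interval_column(flat, "x", closed) rebuilds the original column.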

Expected Output

The DataFrame is written to Parquet without error, or pandas raises a clear error stating that Interval columns are not supported.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.6.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.3.11-1.el7.elrepo.x86_64
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.4
numpy            : 1.17.5
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 45.1.0.post20200119
Cython           : 0.29.14
pytest           : 5.3.5
hypothesis       : None
sphinx           : 2.0.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.12.0
pandas_datareader: None
bs4              : 4.9.0
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : 0.17.1
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : 3.6.1
tabulate         : 0.8.6
xarray           : 0.15.0
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : None
numba            : 0.48.0
