
Write partitioned Parquet file using to_parquet #23283

Closed
ispmarin opened this issue Oct 22, 2018 · 4 comments

@ispmarin commented Oct 22, 2018

Hi,

I'm trying to write a partitioned Parquet file using the `to_parquet` function:

```python
df.to_parquet('table_name', engine='pyarrow', partition_cols=['partone', 'parttwo'])
```

which raises:

```
TypeError: __cinit__() got an unexpected keyword argument 'partition_cols'
```

Problem description

It was my understanding that the to_parquet method passes the kwargs through to pyarrow and saves a partitioned table.

Expected Output

Partitioned Parquet file saved.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-5-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 32.3.1
Cython: None
numpy: 1.15.2
scipy: 1.1.0
pyarrow: 0.11.0
xarray: None
IPython: 7.0.1
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Thanks!

@TomAugspurger (Contributor) commented Oct 22, 2018

pandas uses pyarrow.parquet.write_table. It seems like multi-part Datasets are written using pyarrow.parquet.write_to_dataset.

I'm not sure whether it makes sense for us to (optionally) use write_to_dataset, or whether pyarrow should support partition_cols in write_table.

cc @wesm if you have thoughts here.

@xhochy (Contributor) commented Oct 22, 2018

In the case of partition_cols, one should use write_to_dataset; write_table is a much simpler, lower-level function.

@TomAugspurger (Contributor) commented Oct 22, 2018

So, pandas could look for kwargs like partition_cols (any others?) and if that's detected use write_to_dataset(table, ...). That seems fine to me.
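A minimal sketch of that dispatch (a hypothetical helper, not pandas' actual internals): inspect the keyword arguments and route to `write_to_dataset` only when `partition_cols` is present.

```python
def choose_writer(**kwargs):
    """Pick the pyarrow writer based on the kwargs passed through
    df.to_parquet (sketch only, not the real pandas code path)."""
    if kwargs.get("partition_cols"):
        return "write_to_dataset"  # handles Hive-style partitioning
    return "write_table"           # simple single-file write
```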

@anjsudh (Contributor) commented Oct 24, 2018

Will pick this up

@anjsudh referenced this issue Oct 24, 2018

Merged

Support for partition_cols in to_parquet #23321


anjsudh added commits to anjsudh/pandas that referenced this issue Oct 25–27, 2018

@jreback added this to the 0.24.0 milestone Oct 28, 2018

JustinZhengBC added a commit to JustinZhengBC/pandas that referenced this issue Nov 14, 2018

brute4s99 added a commit to brute4s99/pandas that referenced this issue Nov 19, 2018

Pingviinituutti added commits to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
