
Further optimization for `read_parquet_glob` and `read_pickle_distributed` that create 1 partition from 1 file #6857

Open

anmyachev opened this issue Jan 12, 2024 · 0 comments

Labels

P2 pandas.io Performance 🚀

anmyachev (Collaborator) commented Jan 12, 2024

As a model, one could follow the more complex implementation of the `read_csv_glob` function, which can create several partitions from a single file when necessary.

However, we need to weigh this against the cost of maintaining it, especially since these are additional public functions that pandas does not provide.
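To illustrate the difference from the current 1-file-1-partition scheme, here is a minimal sketch of a partition planner. It assumes that the row count of every file is known up front, which is a simplifying assumption. `plan_partitions` is a hypothetical helper, not Modin's API and not how `read_csv_glob` is actually implemented. A large file can be spread over several partitions, and small files can share one.

```python
from typing import List, Tuple

# Hypothetical slice descriptor (not Modin's API): (file_index, start_row, stop_row)
Slice = Tuple[int, int, int]


def plan_partitions(row_counts: List[int], num_partitions: int) -> List[List[Slice]]:
    """Split the concatenated rows of all files into roughly equal partitions.

    Assumes row counts per file are known in advance. Returns at most
    `num_partitions` partitions; each partition is a list of row slices,
    possibly spanning several files.
    """
    total = sum(row_counts)
    if total == 0 or num_partitions < 1:
        return []
    target = -(-total // num_partitions)  # ceiling division: rows per partition
    partitions: List[List[Slice]] = []
    current: List[Slice] = []
    filled = 0
    for file_idx, count in enumerate(row_counts):
        start = 0
        while start < count:
            # Take as many rows from this file as the current partition can hold
            take = min(count - start, target - filled)
            current.append((file_idx, start, start + take))
            start += take
            filled += take
            if filled == target:
                # Current partition is full; start a new one
                partitions.append(current)
                current, filled = [], 0
    if current:
        partitions.append(current)
    return partitions


print(plan_partitions([10, 2], 4))
```

For files with 10 and 2 rows, a 1-file-1-partition reader produces two unbalanced partitions (10 rows and 2 rows). The sketch instead yields four partitions of 3 rows each, the last one combining the tail of the first file with the whole second file.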


anmyachev added Performance 🚀 P2 labels

anmyachev mentioned this issue in FEAT-#6831: Implement read_parquet_glob and to_parquet_glob #6854 (Merged)

anmyachev added the pandas.io label
