Further optimization for read_parquet_glob and read_pickle_distributed that create 1 partition from 1 file #6857

Open
anmyachev opened this issue Jan 12, 2024 · 0 comments
Labels
P2 (Minor bugs or low-priority feature requests), pandas.io, Performance 🚀 (Performance related issues and pull requests)

Comments

@anmyachev
Collaborator

As a reference, one can look at the more complex implementation of the read_csv_glob function, which can create several partitions from a single file when necessary.
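
To make the idea concrete, here is a minimal sketch of how a single file could be mapped to several partitions. It is not Modin's implementation: `plan_row_chunks` is a hypothetical helper, and it assumes the row count of each file is already known (for Parquet it could presumably be taken from the file metadata), whereas read_csv_glob works on CSV files and may split them differently.

```python
from typing import List, Tuple


def plan_row_chunks(
    rows_per_file: List[int], num_partitions: int
) -> List[Tuple[int, int, int]]:
    """Hypothetical helper (not part of Modin): plan row ranges per partition.

    Returns (file_index, start_row, stop_row) tuples. A file with more rows
    than the target chunk size is split into several chunks, so one large
    file can yield several partitions. Chunks never span two files.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    total_rows = sum(rows_per_file)
    if total_rows == 0:
        return []

    # Ceiling division: target number of rows per partition.
    chunk_size = -(-total_rows // num_partitions)

    chunks = []
    for file_index, n_rows in enumerate(rows_per_file):
        for start in range(0, n_rows, chunk_size):
            stop = min(start + chunk_size, n_rows)
            chunks.append((file_index, start, stop))
    return chunks


# One file with 1000 rows and 4 target partitions gives 4 row ranges
# instead of a single partition for the whole file.
single_file_plan = plan_row_chunks([1000], 4)
```

Because chunks do not cross file boundaries in this sketch, many small files still produce one partition each; merging small files into a shared partition would be a separate step.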

However, we need to keep the maintenance cost in mind, especially since these are additional public functions that have no counterpart in pandas.

@anmyachev added the Performance 🚀 (Performance related issues and pull requests) and P2 (Minor bugs or low-priority feature requests) labels on Jan 12, 2024