Add support for HDFS to scan_parquet #16064

Open
barak1412 opened this issue May 5, 2024 · 2 comments
Labels
A-io-parquet (Area: reading/writing Parquet files), enhancement (New feature or an improvement of an existing feature)

Comments

@barak1412
barak1412 commented May 5, 2024

Description

As described in the title, HDFS support for the scan_parquet function would be welcome.

The alternative, scan_pyarrow_dataset, is not enough, since it doesn't support streaming.
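
For reference, the scan_pyarrow_dataset workaround looks roughly like the sketch below. It is a minimal illustration, not a tested setup: the helper names split_hdfs_uri and scan_hdfs_parquet are hypothetical, the fallback port 8020 is an assumption, and pyarrow's HadoopFileSystem needs libhdfs and a configured Hadoop client on the machine.

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri: str, default_port: int = 8020) -> tuple[str, int, str]:
    """Split 'hdfs://host[:port]/path' into (host, port, path).

    Hypothetical helper; the default port is an assumption.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError(f"expected an hdfs://host[:port]/path URI, got {uri!r}")
    return parsed.hostname, parsed.port or default_port, parsed.path


def scan_hdfs_parquet(uri: str):
    """Return a polars LazyFrame backed by a pyarrow dataset stored on HDFS."""
    # Imported lazily so the URI helper above works without these packages.
    import polars as pl
    import pyarrow.dataset as ds
    from pyarrow import fs

    host, port, path = split_hdfs_uri(uri)
    # Connecting requires libhdfs and a working Hadoop client configuration.
    hdfs = fs.HadoopFileSystem(host, port)
    dataset = ds.dataset(path, format="parquet", filesystem=hdfs)
    # Works for lazy queries, but does not go through the streaming engine.
    return pl.scan_pyarrow_dataset(dataset)
```

Usage would be something like scan_hdfs_parquet("hdfs://namenode:8020/data/events"), where the namenode host and path are placeholders.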

Is an fsspec fallback an option?

Thanks in advance.

@barak1412 barak1412 added the enhancement New feature or an improvement of an existing feature label May 5, 2024
@stinodego stinodego added the A-io-parquet Area: reading/writing Parquet files label May 5, 2024
@bruriah1999
+1

@ion-elgreco
Contributor

Might be possible with: https://github.com/Kimahriman/hdfs-native
