Use pyarrow native filesystems for reading from URI string #316

`fsspec`, which we use via `UPath`, looks suboptimal for most of our users; see https://github.com/astronomy-commons/lsdb/issues/936. `pandas` doesn't really use `fsspec` when the given URI string is natively supported by `pyarrow`, which makes pandas HTTP and S3 reads much faster than ours (especially when selecting columns).

I propose to:
1. Switch to pyarrow filesystems for supported URI strings.
2. Introduce a small block size (e.g. 32 KiB) for the HTTP filesystem, which is not natively supported by pyarrow, as discussed in https://github.com/astronomy-commons/lsdb/issues/936#issuecomment-3115018085.
3. Add S3 and HTTPS ASV benchmarks for file-read performance with column selection.
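
A minimal sketch of how points 1 and 2 could fit together, not a proposed final implementation. The helper names (`choose_backend`, `open_filesystem`, `read_columns`) and the `PYARROW_NATIVE_SCHEMES` set are hypothetical; the set of schemes pyarrow actually handles depends on its version and build options, and Parquet is assumed as the file format.

```python
from urllib.parse import urlparse

# Assumption: URI schemes handled natively by pyarrow.fs.FileSystem.from_uri.
# The real set depends on the installed pyarrow version and build options.
PYARROW_NATIVE_SCHEMES = frozenset({"file", "s3", "gs", "gcs", "hdfs", "viewfs"})

# Block size suggested in the issue for the fsspec HTTP filesystem (32 KiB).
HTTP_BLOCK_SIZE = 32 * 1024


def choose_backend(uri: str) -> str:
    """Return "pyarrow" for natively supported schemes, "fsspec" otherwise."""
    scheme = urlparse(uri).scheme.lower()
    return "pyarrow" if scheme in PYARROW_NATIVE_SCHEMES else "fsspec"


def open_filesystem(uri: str):
    """Return a (filesystem, path) pair for the given URI string."""
    if choose_backend(uri) == "pyarrow":
        from pyarrow import fs as pa_fs

        return pa_fs.FileSystem.from_uri(uri)

    from fsspec.core import url_to_fs

    scheme = urlparse(uri).scheme.lower()
    # Only the HTTP(S) filesystem gets the small block size.
    kwargs = {"block_size": HTTP_BLOCK_SIZE} if scheme in ("http", "https") else {}
    return url_to_fs(uri, **kwargs)


def read_columns(uri: str, columns: list[str]):
    """Read only the selected columns of a Parquet file into a pyarrow Table."""
    import pyarrow.parquet as pq

    filesystem, path = open_filesystem(uri)
    return pq.read_table(path, columns=columns, filesystem=filesystem)
```

With this layout, `read_columns("s3://bucket/file.parquet", ["ra", "dec"])` would go through pyarrow's S3 filesystem, while an `https://` URI would fall back to `fsspec` with the smaller block size. The bucket and column names here are placeholders.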
