removing fsspec in python in favour of object_store in rust #11056

@svaningelgem

Description

Hi @ritchie46,

I was discussing the following with @Qqwy:
In Rust we now have the *_cloud methods for sinking to a cloud service via object_store.
Would it make sense to generalize every call to the read/scan/write methods so that they use object_store, instead of relying on fsspec on the Python side?

My idea is this:

  • normal methods: accept str | Path | BytesIO (a writable byte stream for write/sink, a readable one for read/scan)
  • cloud methods: accept str | Path (like S3Path) | BytesIO (a writable byte stream for write/sink, a readable one for read/scan) + cloud options

Within the Python API I would combine these two into a single method, with the cloud options as an optional parameter (defaulting to None) that only needs to be passed for cloud paths. Sadly the same is not possible in Rust, which has no default arguments.
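To make the combined Python signature concrete, here is a minimal sketch of how one entry point could dispatch on the source type. The names `resolve_source`, `cloud_options` and `CLOUD_SCHEMES` are hypothetical and only illustrate the idea; none of this is the actual polars API.

```python
from __future__ import annotations

from io import BytesIO
from pathlib import Path
from typing import Any

# Hypothetical list of URL schemes that would be routed to object_store.
CLOUD_SCHEMES = ("s3://", "gs://", "az://", "abfs://", "http://", "https://")


def resolve_source(
    source: str | Path | BytesIO,
    cloud_options: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Describe how a source would be dispatched (illustration only)."""
    if isinstance(source, BytesIO):
        # In-memory streams never touch a cloud backend.
        if cloud_options is not None:
            raise ValueError("cloud_options cannot be used with a byte stream")
        return {"kind": "stream", "target": source}

    # Note: a plain pathlib.Path collapses "s3://" to "s3:/", so cloud
    # locations are assumed to arrive as str or as an S3Path-like object
    # whose str() keeps the scheme.
    text = str(source)
    if text.startswith(CLOUD_SCHEMES):
        return {"kind": "cloud", "target": text, "options": cloud_options or {}}

    if cloud_options is not None:
        raise ValueError("cloud_options were given for a local path")
    return {"kind": "local", "target": text}
```

With this shape, `resolve_source("data/local.csv")` and `resolve_source("s3://bucket/data.parquet", {"region": "eu-west-1"})` go through the same function, and the cloud options stay optional for local users.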

My only concern was globs and how to handle them, but it seems (Qqwy checked) that these are handled by object_store as well, so we're fine there.

The second thing Qqwy brought up is that there are methods like parse_url that parse a given URI and try to figure out what you're trying to do. [Relative paths are not supported.]
==> For the parse_url and glob logic he concluded that it's already present in polars/object_store.
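For readers unfamiliar with what such URL and glob handling involves, the sketch below shows one common approach: split the URL into scheme, bucket and key, list objects under the longest literal prefix, then filter the listing against the pattern. This is an assumption about the general technique, not a description of how polars or object_store implement it, and `split_glob_url` and `filter_keys` are hypothetical helper names.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def split_glob_url(url: str) -> Tuple[str, str, str, str]:
    """Return (scheme, bucket, literal_prefix, key_pattern) for a cloud URL."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # Mirrors the restriction mentioned above: no relative paths.
        raise ValueError(f"not an absolute URL: {url!r}")
    bucket, _, key = rest.partition("/")

    # The literal prefix ends at the first glob metacharacter, if any.
    positions = [key.find(ch) for ch in "*?["]
    first = min((p for p in positions if p != -1), default=len(key))
    return scheme, bucket, key[:first], key


def filter_keys(keys: List[str], pattern: str) -> List[str]:
    """Keep the keys that match the glob pattern.

    fnmatchcase lets '*' match '/' as well, which is looser than a
    shell-style glob; a real implementation may want stricter rules.
    """
    return [k for k in keys if fnmatchcase(k, pattern)]
```

For example, `split_glob_url("s3://bucket/data/2024-*/part.parquet")` yields the listing prefix `data/2024-`, so only keys under that prefix need to be fetched before filtering.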

My ultimate wish for the "file interface API" would be a struct that accepts a list of writable/readable streams and does whatever is needed for the specific file type.
Once that's done, any file format could be implemented easily as the ability to read from/write to a stream. All the other code would be boilerplate.
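The stream-based interface described above could look roughly like the following. It is sketched in Python for brevity, although the proposal targets a Rust struct, and it uses the standard-library csv module as a stand-in format. `StreamFormat`, `CsvFormat` and `read_many` are hypothetical names, not existing polars types.

```python
import csv
import io
from typing import BinaryIO, Dict, Iterable, List, Protocol

Row = Dict[str, str]


class StreamFormat(Protocol):
    """What a file format would need to provide: stream in, stream out."""

    def read(self, stream: BinaryIO) -> List[Row]: ...

    def write(self, rows: List[Row], stream: BinaryIO) -> None: ...


class CsvFormat:
    """CSV expressed purely in terms of byte streams."""

    def read(self, stream: BinaryIO) -> List[Row]:
        text = io.StringIO(stream.read().decode("utf-8"), newline="")
        return list(csv.DictReader(text))

    def write(self, rows: List[Row], stream: BinaryIO) -> None:
        if not rows:
            return  # nothing to write, not even a header
        text = io.StringIO(newline="")
        writer = csv.DictWriter(text, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
        stream.write(text.getvalue().encode("utf-8"))


def read_many(fmt: StreamFormat, streams: Iterable[BinaryIO]) -> List[Row]:
    """Format-agnostic boilerplate: read every stream and concatenate."""
    rows: List[Row] = []
    for stream in streams:
        rows.extend(fmt.read(stream))
    return rows
```

Where the streams come from (local file, BytesIO, or a cloud object) is then a separate concern from the format, which is the separation the proposal is after.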

My question for this ticket is: what is your view on the API?
(and secondary: is this possible at all in Rust?)

Thanks

Metadata

Assignees

No one assigned

    Labels

    accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

    Projects

    Status

    Ready
