Description
Hi @ritchie46,
I was in discussion with @Qqwy about the following:
In Rust we now have the *_cloud methods for sinking to a cloud service via object_store.
Would it be an idea to generalize every call to the read/scan/write methods to use object_store, instead of relying on fsspec on the Python side?
My idea is this:
- normal methods: accept str | Path | BytesIO (writable bytestream for write/sink, readable for read/scan)
- cloud methods: accept str | Path (like S3Path) | BytesIO (writable bytestream for write/sink, readable for read/scan) + cloud options
In the Python API I would combine these two, with a default parameter for the cloud options that can optionally be passed for cloud paths. Sadly, the same is not possible in Rust.
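To make the combined Python signature a bit more concrete, here is a minimal sketch of how one entry point could dispatch on the target type. Everything in it is an assumption for illustration: `resolve_backend`, `CLOUD_SCHEMES` and the `cloud_options` parameter name are hypothetical and not part of the existing Polars API.

```python
from __future__ import annotations

import io
from pathlib import Path
from typing import Any, BinaryIO, Mapping, Optional, Union

Target = Union[str, Path, BinaryIO]

# Hypothetical set of URI prefixes treated as cloud locations (assumption).
CLOUD_SCHEMES = ("s3://", "gs://", "az://")


def resolve_backend(
    target: Target,
    cloud_options: Optional[Mapping[str, Any]] = None,
) -> str:
    """Decide which backend a read/scan/write call would use (hypothetical helper)."""
    # File-like objects are used directly; cloud options make no sense for them.
    if hasattr(target, "read") or hasattr(target, "write"):
        if cloud_options is not None:
            raise ValueError("cloud_options cannot be combined with a byte stream")
        return "stream"
    # Only plain strings are checked here: on POSIX, pathlib.Path would
    # normalize "s3://bucket" to "s3:/bucket" and lose the scheme marker.
    if isinstance(target, str) and target.startswith(CLOUD_SCHEMES):
        return "cloud"
    if cloud_options is not None:
        raise ValueError("cloud_options were given for a non-cloud path")
    return "local"
```

With this shape, `resolve_backend("s3://bucket/data.parquet", {"region": "eu-west-1"})` picks the cloud backend, while `resolve_backend(Path("data.parquet"))` and `resolve_backend(io.BytesIO())` need no options at all.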
My only concern was how to handle globs, but it seems (Qqwy checked it) that object_store handles these as well. So we're fine there.
The second thing Qqwy brought up was that there are methods like parse_url that parse a given URI and try to figure out what you're trying to do. [And relative paths are not supported].
==> For the parse_url and glob logic, he concluded that it's already present in polars/object_store.
My ultimate wish for the "file interface API" would be a struct that accepts a list of writable/readable streams and does whatever is needed for the specific file type.
With that in place, any file format could be implemented simply as the ability to read from or write to a stream. All the other code would be boilerplate.
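As a rough illustration of that idea (sketched in Python for brevity, not in Rust), a format would only need to know how to read from and write to byte streams. The names `StreamFormat`, `LinesFormat` and `read_all` are hypothetical, and the line-based format is a toy stand-in for a real file type.

```python
import io
from typing import BinaryIO, Iterable, List, Protocol


class StreamFormat(Protocol):
    """Hypothetical interface: a format only knows how to use byte streams."""

    def read(self, source: BinaryIO) -> List[bytes]: ...

    def write(self, records: Iterable[bytes], sink: BinaryIO) -> None: ...


class LinesFormat:
    """Toy format with one record per line (stand-in for CSV, Parquet, ...)."""

    def read(self, source: BinaryIO) -> List[bytes]:
        # Iterating a binary stream yields lines; blank lines are skipped.
        return [line.rstrip(b"\n") for line in source if line.strip()]

    def write(self, records: Iterable[bytes], sink: BinaryIO) -> None:
        for record in records:
            sink.write(record + b"\n")


def read_all(fmt: StreamFormat, sources: Iterable[BinaryIO]) -> List[bytes]:
    """Generic boilerplate: read every stream in order and concatenate the records."""
    records: List[bytes] = []
    for source in sources:
        records.extend(fmt.read(source))
    return records
```

Where the streams come from (a local file, an object_store download, an in-memory buffer) would then be decided entirely outside the format code.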
My question for this ticket is: what is your view on the API?
(And secondly: is this possible in Rust at all?)
Thanks