Precasting of series before final dataframe is created #3805
Comments
Why not cast after reading? There is no difference in the amount of compute.
Curious: is that metadata in a Rust struct (e.g. somewhere in the file's metadata), or in an Arrow [...]
Sorry, I meant in an Arrow DataType. You can find base64-encoded examples here:
It should be equally fast. Do note that that has no equivalent in Arrow's native types. Arrow logical timestamps have a single offset/tz stored in the [...]
Is there a recommended way to do so? I'm asking because for https://github.com/elixir-nx/explorer we want to remap arbitrary types the user passes in, which we could do via cast (which works great). With Snowflake IPC streams it becomes a lot trickier, as you mentioned above, so my idea is to:
4c. Create a new datetime using chrono

Of course, any other types that need custom casting would also be handled in step 4. My testing shows this is very fast (thanks to the trifecta of arrow2, polars, rust 👍). Snowflake implementation.

Would it be better to use apply, or is that advised against for changing the type? We'd also be interested in passing in a custom mapping from Elixir somehow, but that needs much more thought.

Sorry for all the questions. I'd be happy to move this from an issue to a discussion board somewhere. 🙇
Describe your feature request
With Snowflake, we receive Arrow Streaming IPC files, which we can parse.
However, they send us timestamp data in a Struct, which we have to process. This struct contains two fields, an epoch in i64 seconds, and a fraction (nanoseconds).
Other columns also arrive in somewhat unusual formats (int32 etc.), so we'll need to do the same for a couple of other types as well.
I think it's because they adopted Arrow early, before certain column types were finished in Arrow? (Just guessing)
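To make the timestamp struct described above concrete, here is a minimal sketch of collapsing the two fields into one nanosecond timestamp. It uses plain Rust only, not the polars or arrow2 API. The function names are hypothetical, and the fraction is assumed to be an `i32` count of nanoseconds, which the description above does not confirm.

```rust
/// Nanoseconds in one second.
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Combine an epoch in seconds and a nanosecond fraction into a single
/// count of nanoseconds since the Unix epoch.
/// Assumption: the fraction field is an i32 (the source only says "nanoseconds").
/// Returns None if the result does not fit in an i64.
fn struct_to_epoch_nanos(epoch_secs: i64, fraction_nanos: i32) -> Option<i64> {
    epoch_secs
        .checked_mul(NANOS_PER_SEC)?
        .checked_add(i64::from(fraction_nanos))
}

/// Convert a whole column of (epoch, fraction) pairs.
/// Null rows stay null; note that overflowing rows also become None here,
/// so this sketch does not distinguish the two cases.
fn convert_column(rows: &[Option<(i64, i32)>]) -> Vec<Option<i64>> {
    rows.iter()
        .map(|&row| row.and_then(|(secs, frac)| struct_to_epoch_nanos(secs, frac)))
        .collect()
}
```

The resulting `i64` values could then be handed to whatever builds the final datetime column with a nanosecond time unit.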
I would like to have something like this:
I'm not sure what the best way to pass the casting functions would be, and I know this might add a lot of complexity, so I would like ideas around this.
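As one possible starting point for that discussion, the sketch below passes casting functions as a map from column name to function and applies them before the final frame is assembled. Everything in it (`RawColumn`, `CastFn`, `widen_to_i64`, `precast`) is hypothetical and written in plain Rust. It is not an existing polars API and not necessarily the shape the request has in mind.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a column as it comes off the IPC stream.
#[derive(Debug, Clone, PartialEq)]
enum RawColumn {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

/// Hypothetical signature for a user-supplied casting function.
type CastFn = fn(RawColumn) -> RawColumn;

/// Example cast: widen an int32 column to int64, leave anything else as-is.
fn widen_to_i64(col: RawColumn) -> RawColumn {
    match col {
        RawColumn::Int32(values) => {
            RawColumn::Int64(values.into_iter().map(i64::from).collect())
        }
        other => other,
    }
}

/// Apply the registered cast (if any) to each named column before the
/// columns are assembled into the final frame.
fn precast(
    columns: Vec<(String, RawColumn)>,
    casts: &HashMap<String, CastFn>,
) -> Vec<(String, RawColumn)> {
    columns
        .into_iter()
        .map(|(name, col)| {
            // Look up a cast by column name; fn pointers are Copy.
            let cast = casts.get(&name).copied();
            let col = match cast {
                Some(f) => f(col),
                None => col,
            };
            (name, col)
        })
        .collect()
}
```

Keying the map by column name keeps the caller in control of which columns are touched. Keying by source data type instead would be an equally plausible design.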