Add support for IPC Streaming Read/Write #3783
Thanks for the PR.
Yes, could you do that? And would you then feature gate this under |
// predicates in predicate pushdown predicates were always combined with: lit(true) && predicate now, we simply insert the predicate. // joins joins branch the query executor, the schemas therefore need to be split by join branch // python activate timezone feature
That makes sense, I've added it. Pretty straightforward process.
I've moved it to the We just need to bump arrow2 to support this -- #3793 Sorry as well for the multiple commits, wasn't 100% sure how to do it for feature flags (I'm new to Rust, so apologies as I'm still learning). |
No worries, I have got a squash button! :) |
Yeah, I was just concerned because some people review individual commits or get notifications on pushes. As this is my first PR to this repository, and I'm new to Rust, please give any feedback you find helpful 🙌
Codecov Report
@@ Coverage Diff @@
## master #3783 +/- ##
==========================================
- Coverage 78.12% 77.96% -0.16%
==========================================
Files 445 446 +1
Lines 73691 73838 +147
==========================================
+ Hits 57569 57570 +1
- Misses 16122 16268 +146
Thanks for the PR. I have a few comments, nothing big.
I'm running this through my streaming files from SF and will let you know if I have any dramas with them. |
I have run this across lots of Snowflake files, using their sample dataset plus some test files of my own, and it seems to be working well. The two main changes I've made are:
use std::fs::File;
use std::io::BufReader;
use polars::prelude::*; // assumes the IPC streaming feature is enabled

let f = File::open("file.arrow-stream").unwrap();
let reader = BufReader::new(f);
let mut stream_reader = IpcStreamReader::new(reader);
let schema = stream_reader.arrow_schema().unwrap();
let df = stream_reader.finish().unwrap(); // can't access schema again.
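For anyone reading along, here is a rough, standalone sketch of the ownership pattern behind that last comment, assuming it refers to `finish` taking the reader by value. `Schema`, `Frame`, `StreamReader` and `read_with_schema` are hypothetical stand-ins written for illustration only; they are not the polars or arrow2 types, and the real reader parses the stream rather than cloning a stored value.

```rust
/// Hypothetical stand-in for a schema; not the arrow2/polars type.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

/// Hypothetical stand-in for a data frame; only tracks a row count.
#[derive(Debug, PartialEq)]
struct Frame {
    rows: usize,
}

/// Hypothetical reader that mimics the borrow-then-consume shape of the snippet above.
struct StreamReader {
    schema: Schema,
    rows: usize,
}

impl StreamReader {
    fn new(fields: &[&str], rows: usize) -> Self {
        StreamReader {
            schema: Schema {
                fields: fields.iter().map(|s| s.to_string()).collect(),
            },
            rows,
        }
    }

    /// Borrows the reader mutably and returns an owned copy of the schema.
    fn arrow_schema(&mut self) -> Result<Schema, String> {
        Ok(self.schema.clone())
    }

    /// Takes `self` by value, so the reader cannot be used afterwards.
    fn finish(self) -> Result<Frame, String> {
        Ok(Frame { rows: self.rows })
    }
}

/// Reads the schema first, then consumes the reader to build the frame.
fn read_with_schema(mut reader: StreamReader) -> Result<(Schema, Frame), String> {
    let schema = reader.arrow_schema()?; // owned copy, outlives the reader
    let frame = reader.finish()?; // `reader` is moved here
    // Calling reader.arrow_schema() after this point would not compile (use of moved value).
    Ok((schema, frame))
}
```

Under that assumption, anything needed from the reader has to be taken before `finish` is called, for example `let (schema, frame) = read_with_schema(StreamReader::new(&["id", "name"], 3)).unwrap();`.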
This is waiting for jorgecarleitao/arrow2#1095 to be reviewed and merged.
I've pretty much copied IPC, and made changes to use StreamWriter instead. I also think it might be worth moving IPC into its own directory, similar to parquet/csv_core/ndjson_core?
Closes #3778