Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update async API #510

Merged
merged 5 commits into from
Apr 19, 2024
Merged

Update async API #510

merged 5 commits into from
Apr 19, 2024

Conversation

kylebarron
Copy link
Owner

@kylebarron kylebarron commented Apr 19, 2024

Change list

  • Rename AsyncParquetFile to ParquetFile
  • Combine object-store reader and local file reader into single ParquetFile API.
  • Remove SharedIO trait, since now we only have a single struct.
  • Remove with_batch_size and select_columns. I don't see a need for the ParquetFile struct itself to maintain any reader state. That information is only used in the read phase, not the constructor phase, and so I think it's fine to pass those options into read or stream.
  • Remove readRowGroup in favor of read. read now has options including a rowGroups parameter, that takes a list of integers.
  • Define ReaderOptions struct for read options, used by both sync and async readers.

I couldn't get the row group selection to work in stream() here
https://github.com/kylebarron/parquet-wasm/pull/510/files#diff-e1a77beecd2634c6c0489c20cc3cae036ed6668d62c4d47f00760ab60b0d404eR188-R192

I was hitting lifetime errors with having a Vec<usize> there that wouldn't live long enough for the stream.

@H-Plus-Time I'd like to get a release out in the next day or two, because otherwise I'll forget about it again and it'll never get released. I already did a lot of other cleanup, so I think it's just this and a little more README updates and then I'm ready to publish 0.6. I don't want to spend a lot more time on this. But I wanted to give you a heads up in case you wanted to make any more edits before the release!

Copy link

github-actions bot commented Apr 19, 2024

Asset Sizes

Asset Uncompressed Size Compressed Size
async_full/parquet_wasm_bg.wasm 5.44MB $\color{green}\textbf{-18.6KB -0\%}$ 1.27MB $\color{green}\textbf{-5.32KB -0\%}$
slim/parquet_wasm_bg.wasm 3.46MB $\color{red}\textbf{+1.57KB +0\%}$ 548KB $\color{red}\textbf{+565B +0\%}$
sync/parquet_wasm_bg.wasm 4.74MB $\color{red}\textbf{+1.57KB +0\%}$ 1.04MB $\color{green}\textbf{-79B -0\%}$

@H-Plus-Time
Copy link
Contributor

Yep, I'll give the read_stream row groups bit a crack. I've seen some quirks relating to the package exports too, will check a few import scenarios too.

src/reader_async.rs Outdated Show resolved Hide resolved
src/reader_async.rs Outdated Show resolved Hide resolved
@kylebarron kylebarron merged commit 10dada0 into main Apr 19, 2024
7 checks passed
@kylebarron kylebarron deleted the kyle/update-streams branch April 19, 2024 13:56
@kylebarron
Copy link
Owner Author

I've seen some quirks relating to the package exports too, will check a few import scenarios too.

Would love bug reports if you have any!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants