Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object store wasm usage #490

Merged
merged 15 commits into from
Apr 16, 2024

Conversation

H-Plus-Time
Copy link
Contributor

NB: Starting this a little earlier for design feedback on the object-store-wasm crate itself (primarily around ehttp vs reqwest).

The usage is very straightforward (given all the plumbing is present for ParquetObjectReader), I ended up doing it as a straight swap, leaving the existing HttpFileReader code for posterity/easy switchback.

@H-Plus-Time
Copy link
Contributor Author

The main question I had was around your experiments with ehttp (as a replacement for reqwest) - did you find it was worthwhile?

I did quite like the lower impedance model (that is, the very fetch-like behaviour), though the lack of header enums was a slight annoyance (the primary reason I've left the reqwest dep in on the object-store-wasm - I'm fairly certain the compiler should just leave in the enums, string conversion logic and strip everything else).

I had hoped that ehttp would be the solution to the streaming problem (having to fake it, that is), but the mpsc channel + spawn_local trick actually ended up being the solve. It looks like it shouldn't be a problem to do the exact same thing with reqwest (I suspect I just didn't see it at the time), so it's really just a question of ergonomics (for me, and any other maintainers), how much appetite there is for porting the changes over to the OG object-store.

@H-Plus-Time
Copy link
Contributor Author

Ah, also, given it does change the deps (hopefully not significantly), I'll get the delta generation working again in a separate PR (before this one's ready).

@H-Plus-Time H-Plus-Time marked this pull request as draft April 8, 2024 00:44
@kylebarron
Copy link
Owner

I ended up doing it as a straight swap, leaving the existing HttpFileReader code for posterity/easy switchback.

🙏 I'm happy to have both in the the codebase for a while as we figure out which design makes more sense.

The main question I had was around your experiments with ehttp (as a replacement for reqwest) - did you find it was worthwhile?

I think I just got to a point where I didn't fully understand Rust-async-streams or something like that, and there was existing code using reqwest that I could look at and figure out how to make work. So I don't think I got to the point where I formed a concrete opinion over whether to use ehttp or reqwest.

I'll get the delta generation working again in a separate PR (before this one's ready).

That would be awesome 🙏

@jdoig
Copy link
Contributor

jdoig commented Apr 11, 2024

This is great! I actually sat down last night trying to use parquet-wasm to make my own wasm object-store replacement for Datafusion... It's great to see someone else is already on it 🥳

Copy link

github-actions bot commented Apr 11, 2024

Asset Sizes

Asset Uncompressed Size Compressed Size
async_full/parquet_wasm_bg.wasm 5.45MB $\color{red}\textbf{+173KB +3\%}$ 1.28MB $\color{red}\textbf{+63.2KB +5\%}$
slim/parquet_wasm_bg.wasm 3.46MB $\color{red}\textbf{+3.92KB +0\%}$ 544KB $\color{red}\textbf{+1.03KB +0\%}$
sync/parquet_wasm_bg.wasm 4.74MB $\color{green}\textbf{-328B -0\%}$ 1.04MB $\color{red}\textbf{+84B +0\%}$

@kylebarron
Copy link
Owner

How are you doing on this? Is this ready for a review?

@H-Plus-Time
Copy link
Contributor Author

Yeah, I've published the MVP version of the crate (didn't feel comfortable contributing a git-pinned dep), I think it's good to go.

@H-Plus-Time H-Plus-Time marked this pull request as ready for review April 11, 2024 23:56
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Copy link
Owner

@kylebarron kylebarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing, thank you! Just a couple more comments

src/reader_async.rs Outdated Show resolved Hide resolved
pub async fn new(url: String) -> WasmResult<AsyncParquetFile> {
let client = Client::new();
let mut reader = HTTPFileReader::new(url.clone(), client.clone(), 1024);
pub async fn new(url: String, options: Option<Object>) -> WasmResult<AsyncParquetFile> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using Object here better than using JsValue? Is that just for TS types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting 🤔 apparently OptionFromWasmAbi is not defined for JsValue (but it is for Object) - I wondered why I did that 😮 . Yeah, both necessary and very slightly more accurate (since JsValue includes things like numbers and so on).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Previously I took in JsValue but parsed it via serde_wasm_bindgen to Option<OptionsStruct>. https://github.com/geoarrow/geoarrow-rs/blob/c296ced466b34e5a9dd487058d34910f08738833/js/src/io/parquet/async.rs#L175-L176

I'm not sure if either way is better than the other. I think in that case where we have strict object types, we should be able to define the typescript code for the input though.

src/reader_async.rs Outdated Show resolved Hide resolved
@kylebarron
Copy link
Owner

Can you merge in main? Then I think we should be good to go

@kylebarron
Copy link
Owner

Amazing, thank you!

@kylebarron kylebarron merged commit ef8ca3b into kylebarron:main Apr 16, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants