For a while, before switching to `arrow2`/`parquet2` (i.e. up until this commit), I was using the `arrow` and `parquet` crates from https://github.com/apache/arrow-rs. I repeatedly hit an issue with some files where the Parquet file would be readable in Rust, but the generated Arrow IPC data wouldn't be readable in JS. This caused a ton of frustration, and switching to `arrow2`/`parquet2` seemed to solve it, but I didn't know why.
With more debugging (logging the vector in Rust right before returning, and the `Uint8Array` on the JS side, was crucial), I realized that the data wasn't being transferred back to JS correctly! E.g. when testing at this commit with the test file `1-partition-snappy.parquet`, the arrays on the JS and Rust sides had the same length but different contents.
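The comparison described above can be made less error-prone by logging a short fingerprint of the buffer instead of eyeballing the raw bytes. The following is only a sketch of that idea: the `Fingerprint` struct and `fingerprint` function are hypothetical names, and the checksum used (32-bit FNV-1a) is an arbitrary choice for illustration, not something the original code used.

```rust
/// Hypothetical helper: a small summary of a byte buffer that is cheap to log
/// on the Rust side and to recompute on the JS side.
#[derive(Debug, PartialEq, Eq)]
pub struct Fingerprint {
    /// Number of bytes in the buffer.
    pub len: usize,
    /// 32-bit FNV-1a checksum of the buffer contents.
    pub fnv1a: u32,
}

/// Compute the length and a 32-bit FNV-1a checksum of `bytes`.
pub fn fingerprint(bytes: &[u8]) -> Fingerprint {
    // Standard 32-bit FNV-1a constants: offset basis and prime.
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    Fingerprint {
        len: bytes.len(),
        fnv1a: hash,
    }
}
```

Logging `fingerprint(&file)` right before returning, and computing the same checksum over the `Uint8Array` received in JS, would flag the case seen here: equal lengths but different contents.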
It appears the entire issue was the reliance on `unsafe { Uint8Array::view(&file) }`. When I instead create a new `Uint8Array` and copy `file` into it, the arrays in JS and in Rust match, and the file is read successfully by Arrow JS.
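A minimal sketch of the copying approach, assuming the `wasm-bindgen` and `js-sys` crates and a `wasm32` target. `Uint8Array::new_with_length` and `Uint8Array::copy_from` are existing `js-sys` methods; `encode_ipc` and `read_parquet` are hypothetical stand-ins for the real conversion and export, not the project's actual code. The binding is gated behind `#[cfg(target_arch = "wasm32")]` so the rest of the sketch still compiles natively.

```rust
/// Hypothetical placeholder for the real Parquet -> Arrow IPC conversion.
/// Here it only copies its input so the sketch stays self-contained.
pub fn encode_ipc(parquet_file: &[u8]) -> Vec<u8> {
    parquet_file.to_vec()
}

// Only compiled for the wasm32 target, where `wasm-bindgen` and `js-sys`
// are assumed to be available as dependencies.
#[cfg(target_arch = "wasm32")]
mod bindings {
    use js_sys::Uint8Array;
    use wasm_bindgen::prelude::*;

    /// Hypothetical export: returns the IPC bytes as a JS-owned `Uint8Array`.
    #[wasm_bindgen]
    pub fn read_parquet(parquet_file: &[u8]) -> Uint8Array {
        let file: Vec<u8> = super::encode_ipc(parquet_file);

        // Allocate a new buffer on the JS side and copy the bytes into it.
        // Unlike `unsafe { Uint8Array::view(&file) }`, the result does not
        // alias WebAssembly memory, so it stays valid after `file` is
        // dropped at the end of this function.
        let out = Uint8Array::new_with_length(file.len() as u32);
        out.copy_from(&file);
        out
    }
}
```

The trade-off is one extra copy of the output per call, in exchange for not having to reason about when the backing memory may be freed or resized.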
From the `wasm-bindgen` docs:
> Views into WebAssembly memory are only valid so long as the backing buffer isn’t resized in JS. Once this function is called any future calls to `Box::new` (or malloc of any form) may cause the returned value here to be invalidated. Use with caution!
>
> Additionally the returned object can be safely mutated but the input slice isn’t guaranteed to be mutable.
>
> Finally, the returned object is disconnected from the input slice’s lifetime, so there’s no guarantee that the data is read at the right time.
To be honest, I'm not entirely sure where I was violating these principles (or maybe it was some internals of the arrow `FileWriter`). So it makes sense (at least for now) to remove the `unsafe` code and create a new `Uint8Array` buffer to solve this 🙂 .
Note that creating another Uint8Array buffer would put more memory pressure on WebAssembly, which seems to run out of memory after using 1GB, but that's a problem for the future (ideally we'll be able to return a stream of record batches to JS).
Going to close this now because I exclusively copy the Vec<u8> into a new Uint8Array for returning to the client, and don't return views, at least for now.