Expected Behavior
I ran a benchmark on a feature table with 50M rows x 10 columns, expecting to materialize records into the online store efficiently. In the Bytewax engine, the latest records are extracted to a staging location as Parquet files; in my case each file contains ~140k rows.
Ideally, each Bytewax pod should process its file with as small a memory footprint as possible.
Current Behavior
Currently, every Bytewax pod pulls the entire Parquet file into memory before writing to the online store, which causes a large memory footprint (~3 GB per pod).
Steps to reproduce
Run a materialization with the Bytewax engine and observe the pod's memory usage.
Specifications
Version: master
Possible Solution
Use pyarrow's streaming APIs to read the Parquet files in record batches and process them on the fly before pushing to the online store, so peak memory is bounded by the batch size rather than the file size.