Bytewax pods consume lots of resource #3825

sudohainguyen · 2023-11-04T04:47:34Z

Expected Behavior

I ran a benchmark on a feature table with 50M rows x 10 columns and expected to be able to materialize the records into the online store efficiently. With the Bytewax engine, the latest records are extracted to a staging location as parquet files; in my case each file contains ~140k rows.
Ideally, Bytewax pods should process each file with as small a memory footprint as possible.

Current Behavior

Currently, every Bytewax pod pulls the entire parquet file into memory before writing to the online store, which causes a huge memory footprint, ~3GB of memory.

Steps to reproduce

Run materialization with the Bytewax engine.

Specifications

Version: master

Possible Solution

Apply the zero-copy mechanism from pyarrow to stream the parquet files and process them on the fly before pushing to the online store.

sudohainguyen added kind/bug priority/p2 labels Nov 4, 2023

sudohainguyen mentioned this issue Nov 4, 2023

feat: Optimize bytewax pod resource with zero-copy #3826

Merged

achals closed this as completed in #3826 Nov 14, 2023
