Summary
- `out_azure_kusto` converts every record in a chunk into one giant `flb_sds_t` via `flb_msgpack_raw_to_json_sds()` before the data is gzipped/uploaded, so heap usage grows with the largest chunk ever seen even when filesystem buffering is enabled.
- A streaming JSON formatter that emits one record at a time (bounded by `Upload_File_Size`) would keep heap usage proportional to the configured upload limit instead of the full chunk size.
Root cause
- `plugins/out_azure_kusto/azure_kusto.c` concatenates every msgpack record in a chunk into a single `flb_sds_t` (`out_buf`) via `flb_msgpack_raw_to_json_sds()` (a simplified sketch of this pattern follows the list).
- The filesystem buffering path writes that giant string to disk but still keeps it in heap memory; glibc keeps the arena sized for the largest allocation it has seen (per worker) until process exit.
- The JSON blob grows to the largest chunk (often hundreds of MiB) before the upload path truncates it, so every flush leaves a large "scar" in the allocator.
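
For illustration, here is a minimal sketch of the concatenation pattern described above. This is not the verbatim plugin code (the function name `format_whole_chunk` is made up, and the real formatter also injects tag/timestamp fields); it just shows how the whole chunk's JSON ends up in one allocation:

```c
#include <fluent-bit/flb_pack.h>
#include <fluent-bit/flb_sds.h>
#include <msgpack.h>

static flb_sds_t format_whole_chunk(const void *data, size_t bytes)
{
    size_t off = 0;
    size_t prev = 0;
    msgpack_unpacked result;
    flb_sds_t out_buf;
    flb_sds_t rec;

    out_buf = flb_sds_create_size(1024);
    if (!out_buf) {
        return NULL;
    }

    msgpack_unpacked_init(&result);
    while (msgpack_unpack_next(&result, data, bytes, &off)
           == MSGPACK_UNPACK_SUCCESS) {
        /* JSON for a single record */
        rec = flb_msgpack_raw_to_json_sds((const char *) data + prev,
                                          off - prev);
        prev = off;
        if (!rec) {
            continue;
        }
        /* out_buf keeps growing until it holds the JSON for the entire
         * chunk; this single large allocation is what pins the glibc
         * arena size until the process exits */
        flb_sds_cat_safe(&out_buf, rec, flb_sds_len(rec));
        flb_sds_destroy(rec);
    }
    msgpack_unpacked_destroy(&result);

    return out_buf;
}
```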
Proposed fix
- Introduce a streaming formatter (e.g., `flb_azure_kusto_format_emit()`) that iterates msgpack records, converts each to JSON once, and passes the bytes to a caller-supplied sink. That sink can be a file descriptor or a gzip writer, so the heap never holds more than one record plus `Upload_File_Size`. (See the sketch after this list.)
- Update the buffered path to stream records directly into the local filesystem buffer, keeping the in-memory footprint bounded by `Upload_File_Size` + gzip overhead.
- Keep the legacy concatenation path only for the immediate-upload branch (where a contiguous blob is still required).
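
A possible shape for the streaming formatter, as a sketch only: the function name is taken from the suggestion above, while the `kusto_sink_cb` callback type is my assumption and not an existing Fluent Bit API. The point is that at most one record's JSON is ever resident on the heap; the sink decides where the bytes go.

```c
#include <fluent-bit/flb_pack.h>
#include <fluent-bit/flb_sds.h>
#include <msgpack.h>

/* Caller-supplied sink, e.g. a write(2) wrapper over the local buffer
 * file, or a gzip writer that rotates every Upload_File_Size bytes.
 * Returns 0 on success, non-zero on error. */
typedef int (*kusto_sink_cb)(void *ctx, const char *buf, size_t len);

static int flb_azure_kusto_format_emit(const void *data, size_t bytes,
                                       kusto_sink_cb sink, void *sink_ctx)
{
    size_t off = 0;
    size_t prev = 0;
    int ret;
    msgpack_unpacked result;
    flb_sds_t rec;

    msgpack_unpacked_init(&result);
    while (msgpack_unpack_next(&result, data, bytes, &off)
           == MSGPACK_UNPACK_SUCCESS) {
        /* convert exactly one record to JSON */
        rec = flb_msgpack_raw_to_json_sds((const char *) data + prev,
                                          off - prev);
        prev = off;
        if (!rec) {
            msgpack_unpacked_destroy(&result);
            return -1;
        }

        /* hand the bytes to the sink (newline-delimited JSON here),
         * then free them immediately so the heap stays record-sized */
        ret = sink(sink_ctx, rec, flb_sds_len(rec));
        if (ret == 0) {
            ret = sink(sink_ctx, "\n", 1);
        }
        flb_sds_destroy(rec);
        if (ret != 0) {
            msgpack_unpacked_destroy(&result);
            return -1;
        }
    }
    msgpack_unpacked_destroy(&result);

    return 0;
}
```

In the buffered path, the sink could simply append to the chunk file while the caller counts bytes against `Upload_File_Size` to decide when to cut an upload; the immediate-upload branch can keep building a contiguous blob as it does today.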
There is sample code at master...yaananth:fluent-bit:master with test cases, but it is probably not ready for production use and needs more testing. I would appreciate it if someone could dig in and take this forward.
Thank you!