
Azure kusto plugin: out_azure_kusto concatenates entire chunks when buffering is enabled #11189

@yaananth

Summary

  • out_azure_kusto converts every record in a chunk into one giant flb_sds_t via flb_msgpack_raw_to_json_sds() before the data is gzipped/uploaded, so heap usage grows with the largest chunk ever seen even when filesystem buffering is enabled.
  • A streaming JSON formatter that emits one record at a time (bounded by Upload_File_Size) would keep heap usage proportional to the configured upload limit instead of the full chunk size.

Root cause

  • plugins/out_azure_kusto/azure_kusto.c concatenates every msgpack record in a chunk into a single flb_sds_t (out_buf) via flb_msgpack_raw_to_json_sds().
  • The filesystem buffering path writes that giant string to disk but still keeps it in heap memory; glibc keeps the arena sized for the largest allocation it has seen (per worker) until process exit.
  • The JSON blob grows to the largest chunk (often hundreds of MiB) before the upload path truncates it, so every flush leaves a large "scar" in the allocator.
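To make the growth pattern concrete, here is a minimal sketch of the concatenation behaviour described above. It does not use the plugin's real types: `struct grow_buf`, `grow_buf_append()` and `concat_chunk()` are hypothetical stand-ins for `out_buf` and the conversion call, and the records are assumed to be already-formatted JSON lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the single growing output string (out_buf). */
struct grow_buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append n bytes, doubling the capacity when needed. Returns 0 on success. */
static int grow_buf_append(struct grow_buf *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t new_cap = b->cap ? b->cap : 64;
        while (new_cap < b->len + n) {
            new_cap *= 2;
        }
        char *p = realloc(b->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/*
 * Concatenate every record of a chunk into one buffer.
 * The buffer ends up at least as large as the whole chunk,
 * regardless of any upload size limit applied afterwards.
 */
static int concat_chunk(const char **records, size_t count,
                        struct grow_buf *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (grow_buf_append(out, records[i], strlen(records[i])) != 0) {
            return -1;
        }
    }
    return 0;
}
```

With this shape the peak allocation tracks the sum of all record sizes in the chunk, which is the behaviour the proposed fix is meant to avoid.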

Proposed fix

  1. Introduce a streaming formatter (e.g., flb_azure_kusto_format_emit()) that iterates msgpack records, converts each to JSON once, and passes the bytes to a caller-supplied sink. That sink can be a file descriptor or a gzip writer, so the heap never holds more than one record plus Upload_File_Size.
  2. Update the buffered path to stream records directly into the local filesystem buffer, keeping the in-memory footprint bounded by Upload_File_Size + gzip overhead.
  3. Keep the legacy concatenation path only for the immediate-upload branch (where a contiguous blob is still required).
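A rough sketch of what steps 1 and 2 could look like, under simplifying assumptions. It is not the code from the linked branch. All names here (`record_sink_fn`, `struct kv_record`, `format_emit()`, `struct bounded_sink`) are hypothetical, records are simplified to a single key and integer value instead of msgpack, keys are assumed not to need JSON escaping, and `limit` stands in for Upload_File_Size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: receives one formatted record at a time. */
typedef int (*record_sink_fn)(void *ctx, const char *buf, size_t len);

/* Simplified record; the real plugin would iterate msgpack objects. */
struct kv_record {
    const char *key;   /* assumed to need no JSON escaping */
    long        value;
};

/* Format each record once into a small stack buffer and hand it to the sink. */
static int format_emit(const struct kv_record *recs, size_t count,
                       record_sink_fn sink, void *ctx)
{
    char line[256];

    assert(sink != NULL);
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(line, sizeof(line), "{\"%s\":%ld}\n",
                         recs[i].key, recs[i].value);
        if (n < 0 || (size_t) n >= sizeof(line)) {
            return -1; /* record does not fit the per-record buffer */
        }
        int ret = sink(ctx, line, (size_t) n);
        if (ret != 0) {
            return ret;
        }
    }
    return 0;
}

/* Example sink whose buffer never grows past `limit` bytes. */
struct bounded_sink {
    char  *buf;
    size_t len;
    size_t limit;   /* stands in for Upload_File_Size */
    size_t peak;    /* largest buffered size observed */
    size_t flushes; /* number of times the batch was handed off */
    size_t total;   /* total bytes accepted */
};

static int bounded_sink_write(void *ctx, const char *data, size_t n)
{
    struct bounded_sink *s = ctx;

    if (n > s->limit) {
        return -1; /* a single record exceeds the limit */
    }
    if (s->len + n > s->limit) {
        /* Real code would write to the file buffer or gzip/upload here. */
        s->flushes++;
        s->len = 0;
    }
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    s->total += n;
    if (s->len > s->peak) {
        s->peak = s->len;
    }
    return 0;
}
```

The point of the callback is that the formatter does not need to know whether the bytes go to a file descriptor, a gzip writer, or an in-memory batch; the memory bound is enforced entirely by the sink.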

There is sample code with test cases at master...yaananth:fluent-bit:master, but it is probably not ready for production use and needs more testing. I would appreciate it if someone could dig in and take this forward.

Thank you!
