Question: HTTP Output and Kubernetes / RecordModifier Filters #579

Closed
brandoncole opened this issue May 2, 2018 · 3 comments · Fixed by #581

brandoncole commented May 2, 2018

I’m trying to use Fluent-Bit to send our log data from a Kubernetes cluster to Loggly. I’ve used FluentD in the past, but I’m really enamored with Fluent-Bit because of its simple components, which can be pieced together in novel ways, and its inclusion of things like CPU, memory, and disk utilization.

Part 1 - Roadblock with HTTP Output

https://fluentbit.io/documentation/current/output/http.html

There’s one roadblock I’m facing right now that I wanted to ask about, in case a solution already exists. Loggly expects bulk uploads ( https://www.loggly.com/docs/http-bulk-endpoint/ ) of log data to be sent on separate lines, but when the HTTP module sends JSON (vs. MSGPACK) it creates a message with a single line consisting of an array with multiple entries, like:

[{"log":"message"},{"log":"message"},..]

Is there a way to easily transform this into:

{"log":"message"}\n
{"log":"message"}

Alternatively, there is an endpoint for sending single lines ( https://www.loggly.com/docs/http-endpoint/ ), although that might be a bit too chatty. I also can't find a way to force Fluent-Bit to send a single line at a time.

Part 2 - Proper Way Of Subsetting K8s Fields

https://fluentbit.io/documentation/current/filter/kubernetes.html
https://fluentbit.io/documentation/current/filter/record_modifier.html

We use the awesome Kubernetes Filter but have a lot of K8s annotations on our pods, so cutting down on log volume would be a nice-to-have. We looked into using the Record Modifier with the Kubernetes Filter to prune the output down to only the fields we care about, but had a bit of trouble getting the configuration exactly how we wanted. I'm curious whether selecting a subset of the K8s data should be handled by improvements to the Kubernetes Filter, or by the Record Modifier as a more general approach?

Whitelist_key kubernetes
Whitelist_key kubernetes.namespace_name
Whitelist_key kubernetes.labels.release
Whitelist_key kubernetes.host
Whitelist_key kubernetes.pod_name
Whitelist_key kubernetes.container_name

brandoncole commented May 2, 2018

Thinking out loud: for Part 1, maybe the HTTP Output could support two additional formats:

  1. json_newline_stream - use \n separator
  2. json_inline_stream - use \s separator

There is currently a json_stream type that is implemented with a space here:

https://github.com/fluent/fluent-bit/blob/master/plugins/out_http/http.c#L124

...
        for (p = json_buf; p!=end; p++) {
            if (in_escape)
                in_escape = FLB_FALSE;
            else if (*p == '\\')
                in_escape = FLB_TRUE;
            else if (*p == '"')
                in_string = !in_string;
            else if (!in_string) {
                if (*p == '{')
                    level++;
                else if (*p == '}')
                    level--;
                else if ((*p == '[' || *p == ']' || *p == ',') && level == 0)
                    *p=' ';
            }
...
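
To make the idea concrete, here is a minimal, self-contained sketch of the same scanning loop with the separator passed in as a parameter. The function name `json_array_to_stream` and the `sep` argument are hypothetical and chosen for illustration only; this is not the code in Fluent Bit and not necessarily what #581 implements. It assumes the buffer is a NUL-terminated JSON array whose top-level elements are objects.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Fluent Bit): rewrite a serialized JSON
 * array in place so its top-level objects are separated by `sep`.
 * Assumes `buf` is NUL-terminated and its top-level elements are objects.
 */
void json_array_to_stream(char *buf, char sep)
{
    bool in_string = false;   /* inside a "..." string literal */
    bool in_escape = false;   /* previous character was a backslash */
    int level = 0;            /* current {...} nesting depth */
    char *end = buf + strlen(buf);

    for (char *p = buf; p != end; p++) {
        if (in_escape) {
            in_escape = false;            /* skip the escaped character */
        } else if (*p == '\\') {
            in_escape = true;
        } else if (*p == '"') {
            in_string = !in_string;
        } else if (!in_string) {
            if (*p == '{') {
                level++;
            } else if (*p == '}') {
                level--;
            } else if (*p == ',' && level == 0) {
                *p = sep;                 /* boundary between two records */
            } else if ((*p == '[' || *p == ']') && level == 0) {
                *p = ' ';                 /* blank out the outer brackets */
            }
        }
    }
}
```

Calling it with `'\n'` on `[{"log":"a,b"},{"log":"c"}]` leaves the comma inside the string untouched and puts each record on its own line. Because the outer brackets are blanked rather than removed, the result starts and ends with a single space.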


ffscl commented May 3, 2018

@brandoncole I originally implemented the json_streaming feature for an advanced scenario: sending logs to the Splunk HEC. Your PR #581 looks like a reasonable way to solve your first issue.

For issue 2, please check out #531, which may fit your needs.

An example on how you might use the nest plugin:

[FILTER]
    Name nest
    Match *
    Operation lift
    Nested_under kubernetes
    Prefix_with k8s:

[FILTER]
    Name nest
    Match *
    Operation lift
    Nested_under k8s:labels
    Prefix_with k8s:labels:

[FILTER]
    Name nest
    Match *
    Operation lift
    Nested_under k8s:annotations
    Prefix_with k8s:annotations:

[FILTER]
    Name record_modifier
    Match *
    Remove_key k8s:annotations:kubernetes.io*

This will prefix all fields in the kubernetes metadata. Your example fields would become:

k8s:namespace_name
k8s:labels:release
k8s:host
k8s:pod_name
k8s:container_name

You can then easily apply a whitelist.
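
As a rough sketch, a whitelist over the lifted keys might look like the filter below. It reuses the Whitelist_key option from the original question and assumes that this filter is placed after the nest filters above and that the log line itself is stored under the key log; adjust the key names to match your records.

```
[FILTER]
    Name record_modifier
    Match *
    Whitelist_key log
    Whitelist_key k8s:namespace_name
    Whitelist_key k8s:labels:release
    Whitelist_key k8s:host
    Whitelist_key k8s:pod_name
    Whitelist_key k8s:container_name
```

Any key not listed is dropped from the record, which is why log is included explicitly.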

brandoncole commented

@ffscl thanks for the feedback on #581 and the nest plugin! That's awesome and good to know. I love that this project is moving fast and doing such a great job of making small, reusable pieces!

I was looking through the source code for the Kubernetes filter and also found that it has an undocumented option, Annotations:

    [FILTER]
        Name            kubernetes
        Match           kube.*
        Annotations     Off

That got me 80% of the way there and, combined with what you posted above, will work perfectly.
