Add kubernetes_metadata in fluentd as optional filter #1148

Closed
peterbosalliandercom opened this issue Dec 14, 2022 · 11 comments

@peterbosalliandercom

What is the problem:
We are using the logging-operator with fluentbit/fluentd for our log collection. The output flows through fluentd to logstash to ES. In ES we find that only a subset of the kubernetes metadata reaches the index. We use kubernetes.namespace_labels to identify which tenant a log comes from, based on the following label, which is set by the Capsule operator (https://capsule.clastix.io/): capsule.clastix.io/tenant: tenantxxx. In logstash we want to split indexes per tenant by using this namespace label. The problem is that the logging-operator does not send namespace_labels by default, and this cannot be configured through enhanceK8s.
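
To make the intended per-tenant split concrete, here is a minimal Python sketch of the routing decision (the real pipeline would express this in logstash). It assumes the record carries `kubernetes.namespace_labels` with the label key unmodified; the function name `tenant_index`, the `logs` prefix, and the `unknown` fallback are hypothetical choices for illustration only.

```python
# Label named in the report above; everything else here is illustrative.
TENANT_LABEL = "capsule.clastix.io/tenant"


def tenant_index(record: dict, prefix: str = "logs", fallback: str = "unknown") -> str:
    """Return a per-tenant index name, e.g. 'logs-tenantxxx', for one log record."""
    kubernetes = record.get("kubernetes") or {}
    labels = kubernetes.get("namespace_labels") or {}
    # Records without the tenant label land in a shared fallback index.
    tenant = labels.get(TENANT_LABEL) or fallback
    return f"{prefix}-{tenant}".lower()
```

Without namespace_labels in the record, every log would fall through to the fallback index, which is the situation described above.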

What we would like:
We would like a fluentd filter option (in enhanceK8s?) that lets the logging-operator add kubernetes_metadata itself, overriding the fluentbit metadata. The filter option would look like this: <filter **> @type kubernetes_metadata </filter>

Background
We are using the BanzaiCloud logging-operator (https://banzaicloud.com/docs/one-eye/logging-operator), so the configuration is fixed and defaults to using fluentbit for kubernetes metadata. We cannot change that, so there is no way to fix it without breaking the operator functionality (related issue: #704). The only way is to override everything the operator does (rbac) and the fluentd config, but then the whole point of using the operator is lost and the setup cannot be maintained properly.

@peterbosalliandercom
Author

I think hostname is also part of the kubernetes_metadata so this may be related: #1133

@pepov
Member

pepov commented Mar 8, 2023

I believe this is already supported with https://kube-logging.github.io/docs/configuration/plugins/filters/enhance_k8s/ @peterbosalliandercom could you please check?

Feel free to reopen if this is not a solution to your problem!

@pepov
Member

pepov commented Mar 8, 2023

Sorry, it seems that the original plugin does not support namespace metadata, so I recommend opening an issue there, and then we can pull the change into the fluentd image: https://github.com/SumoLogic/sumologic-kubernetes-fluentd/blob/main/fluent-plugin-enhance-k8s-metadata/lib/fluent/plugin/filter_enhance_k8s_metadata.rb

@pepov pepov closed this as completed Mar 8, 2023
@sebracs

sebracs commented Aug 11, 2023

Even if FluentBit and enhance_k8s supported this, would it not still make sense to have the option to get this information from FluentD? FluentD is generally positioned as heavier but also more feature-rich than FluentBit.
So what speaks against exposing a feature that is already supported there?

@pepov
Member

pepov commented Aug 14, 2023

In general the issue with depending on kubernetes metadata which is enhanced in the aggregator layer is that the pod might already be long gone, when you try to get this information.

We can take a look and support the mentioned filter if that helps users in certain use cases, but I would strongly recommend to add the labels you need to the pod directly. I understand that this can be difficult to do, but in case someone is running a multi-tenant cluster it should be beneficial to mutate pods (with the help of a policy engine) even if they don't have the right labels.

@pepov pepov reopened this Aug 14, 2023
@pepov
Member

pepov commented Aug 14, 2023

My previous comment is not entirely accurate: the plugin holds a cache of namespace objects, so it does not necessarily require the pod to exist. However, the information might still be inconsistent, since the namespace metadata might change between log generation and processing.

Anyway, I checked and we already pull that plugin into the image; we just don't provide a wrapper for it, which is something we can do.
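
The consistency caveat can be illustrated with a small Python sketch of aggregator-side enrichment from a namespace cache. This is not the plugin's actual implementation; `NamespaceLabelCache`, `enrich`, the TTL value, and the injected `fetch` and `clock` callables are all hypothetical, and `kubernetes.namespace_name` is assumed to be present on the record.

```python
import time
from typing import Callable, Dict, Tuple


class NamespaceLabelCache:
    """Hypothetical TTL cache of namespace labels, keyed by namespace name."""

    def __init__(
        self,
        fetch: Callable[[str], Dict[str, str]],  # stands in for a Kubernetes API lookup
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Dict[str, str]]] = {}

    def labels_for(self, namespace: str) -> Dict[str, str]:
        now = self._clock()
        cached = self._entries.get(namespace)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # may be stale relative to the cluster
        labels = dict(self._fetch(namespace))
        self._entries[namespace] = (now, labels)
        return labels


def enrich(record: dict, cache: NamespaceLabelCache) -> dict:
    """Return a copy of the record with kubernetes.namespace_labels filled in."""
    kubernetes = dict(record.get("kubernetes") or {})
    namespace = kubernetes.get("namespace_name")
    if namespace:
        # Labels reflect the namespace at processing (or caching) time,
        # not at the time the log line was written.
        kubernetes["namespace_labels"] = cache.labels_for(namespace)
    return {**record, "kubernetes": kubernetes}
```

A record produced before a label change but processed afterwards would receive the new labels, and one processed within the TTL after a change would receive the old ones.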

@sebracs

sebracs commented Aug 14, 2023

> In general the issue with depending on kubernetes metadata which is enhanced in the aggregator layer is that the pod might already be long gone, when you try to get this information.
>
> We can take a look and support the mentioned filter if that helps users in certain use cases, but I would strongly recommend to add the labels you need to the pod directly. I understand that this can be difficult to do, but in case someone is running a multi-tenant cluster it should be beneficial to mutate pods (with the help of a policy engine) even if they don't have the right labels.

Is this also the case with the enhanceK8s plugin? I would think so. If not, namespace_labels could theoretically also be added there by using the code from kubernetes_metadata, which seems a lot easier than adding it to the FluentBit built-in Kubernetes filter, as was mentioned in fluent/fluent-bit#6544.

@pepov
Member

pepov commented Aug 15, 2023

I'm open to adding anything that could help users here, even if I still think that doing it on the aggregator level is suboptimal.

If someone can show a working solution in fluentd lingo using any of the above sumologic plugins, I'm happy to help integrate it.

In the meantime, I'm thinking the logging operator could help with a mutating webhook. Users wouldn't need another tool: the operator could add specific labels to pods on the fly if those labels exist on the namespace. That would be the most reliable solution in my opinion.
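
As a rough sketch of that webhook idea (not an existing operator feature), the core step would be building a JSON Patch that copies selected namespace labels onto a pod. The Python below only computes the patch; `label_patch` and `escape_pointer` are hypothetical names, and the admission review handling, TLS, and namespace lookup are left out.

```python
from typing import Dict, Iterable, List


def escape_pointer(token: str) -> str:
    """Escape a key for use in a JSON Pointer path (RFC 6901): '~' -> '~0', '/' -> '~1'."""
    return token.replace("~", "~0").replace("/", "~1")


def label_patch(pod: dict, namespace_labels: Dict[str, str], wanted: Iterable[str]) -> List[dict]:
    """Build JSON Patch operations that copy the wanted namespace labels onto the pod.

    Labels already set on the pod are left untouched.
    """
    existing = (pod.get("metadata") or {}).get("labels")
    missing = {
        key: namespace_labels[key]
        for key in wanted
        if key in namespace_labels and key not in (existing or {})
    }
    if not missing:
        return []
    if existing is None:
        # No labels map on the pod yet, so add the whole map in one operation.
        return [{"op": "add", "path": "/metadata/labels", "value": missing}]
    return [
        {"op": "add", "path": f"/metadata/labels/{escape_pointer(key)}", "value": value}
        for key, value in missing.items()
    ]
```

For a pod that already has labels, the tenant label from this issue would be added at the path `/metadata/labels/capsule.clastix.io~1tenant`, since the slash in the key has to be escaped.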

@pepov
Member

pepov commented Jan 11, 2024

There is a good chance this is going to be supported in fluentbit soon: fluent/fluent-bit#8279

@pepov
Member

pepov commented Mar 28, 2024

Namespace labels are already available, see:
https://kube-logging.dev/docs/whats-new/#kubernetes-namespace-labels-and-annotations

The use case described above can already be solved by using logging operator version 4.6+ and fluentbit 3.+

Also the use case can be implemented using the multi-tenant architecture supported by logging operator version 4.5+ using isolated aggregators for the tenants with the loggingroute resource:

I'm closing this now

@pepov pepov closed this as completed Mar 28, 2024
@pepov pepov removed this from the 4.x milestone Mar 28, 2024