
Add condition to leverage kubernetes provider for controllermanager and scheduler data streams #1324

Conversation

@MichaelKatsoulis (Contributor) commented Jul 15, 2021

What does this PR do?

This PR adds a condition to the controller manager and scheduler data streams so as to leverage the Kubernetes dynamic provider.
Only the agent running on the k8s node where kube-controller-manager and kube-scheduler are also running (the control plane) will fetch their metrics.
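A sketch of the rendered stream configuration the agent ends up with (field values are illustrative; only the condition line is added by this PR, and the scheduler stream gets the equivalent condition for its own label value):

metricsets:
  - controllermanager
period: 10s
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: none
condition: ${kubernetes.pod.labels.component} == 'kube-controller-manager'   # added by this PR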

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • If I'm introducing a new feature, I have modified the Kibana version constraint in my package's manifest.yml file to point to the latest Elastic stack release (e.g. ^7.13.0).

How to test this PR locally

  1. Build the kubernetes package with elastic-package build inside the integrations/packages/kubernetes folder.
  2. Bring up the Elastic Stack: elastic-package stack up --version=7.15.0-SNAPSHOT -v -d
  3. Set up a multi-node k8s cluster with kind (https://kind.sigs.k8s.io/docs/user/quick-start/#multinode-clusters).
  4. Connect the kind nodes to the Elastic Stack network (see the attached docker-connect.sh.zip).
  5. In the Kibana UI, create a new agent policy with the kube-controller-manager and kube-scheduler integrations. Enable Collect Kubernetes metrics from Kubernetes Scheduler and Collect metrics from Kubernetes controller-manager. Save the integration to the policy and copy the enrollment token of the new policy.
  6. Update the elastic-agent-managed daemonset variables (FLEET_URL: "http://fleet-server:8220", FLEET_ENROLLMENT_TOKEN: the token from step 5) and set the image to docker.elastic.co/beats/elastic-agent:7.15.0-SNAPSHOT (see the sketch after this list). Apply the manifests with kubectl apply -f .
  7. After the agents are successfully enrolled in Fleet and assigned the policy, check in Discover that controller-manager and scheduler metrics (for example metricset=controllermanager) come only from agent.name == kind-control-plane, which is the agent running on the node where the controller-manager runs.
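A sketch of the container spec changes from step 6 (layout follows the elastic-agent-managed manifest; the token value is a placeholder):

containers:
  - name: elastic-agent
    image: docker.elastic.co/beats/elastic-agent:7.15.0-SNAPSHOT
    env:
      - name: FLEET_URL
        value: "http://fleet-server:8220"
      - name: FLEET_ENROLLMENT_TOKEN
        value: "<token copied in step 5>"   # placeholder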

In a scenario where the controller manager pod does not already have the component=kube-controller-manager label set, the user can add a custom label such as my_label=controller to the running pod and, in step 5, specify that label key and value in the advanced options of the kube-controller-manager integration. After the policy is applied, the agent running on kind-control-plane will start fetching metrics from the pod.
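With the example label above set on the pod and configured in the advanced options, the rendered condition would be roughly:

condition: ${kubernetes.pod.labels.my_label} == 'controller'   # my_label=controller is the example label from this description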

Screenshots

  • agents
  • discover-controllermanager
  • labels
  • custom_label

@MichaelKatsoulis marked this pull request as draft July 15, 2021 09:48
@MichaelKatsoulis added the enhancement (New feature or request) and Team:Integrations (Label for the Integrations team) labels Jul 15, 2021
@elasticmachine commented Jul 15, 2021

💚 Build Succeeded


Build stats

  • Start Time: 2021-08-11T15:45:54.505+0000

  • Duration: 29 min 34 sec

  • Commit: ea53c24

Test stats 🧪: Total 112 · Passed 112 · Failed 0 · Skipped 0


@MichaelKatsoulis (Contributor, Author)

Tests fail because the added conditions are never matched, as the Kubernetes autodiscovery is not working properly.
An update is needed in agent/fleet to enable (by default?) the Kubernetes dynamic provider through configuration.
In the standalone agent version this block is used in agent.yml:

agent:
  monitoring:
    enabled: true
    use_output: default
    logs: true
    metrics: true
  providers.kubernetes:
    node: ${NODE_NAME}
    scope: node

When the providers.kubernetes block is not configured, the node name is by default derived from DiscoverKubernetesNode.
When node is not set in the config, the code tries to get the container hostname and then fetch the node_name from the pod description.
If the hostname of the pod is not its actual name (which happens in kind k8s deployments), the pod is not found and the provider fails.
An addition to that function could be an extra step that looks up node_name in the environment variables, as it is already defined in the agent startup manifest: https://github.com/elastic/beats/blob/master/deploy/kubernetes/elastic-agent-managed/elastic-agent-managed-daemonset.yaml#L44
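For reference, the managed daemonset linked above already exposes the node name through the downward API, roughly:

# env entry in elastic-agent-managed-daemonset.yaml (downward API)
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName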

Thoughts on the latter?
cc @ChrsMark @jsoriano

@@ -9,3 +9,4 @@ period: {{period}}
bearer_token_file: {{bearer_token_file}}
ssl.verification_mode: {{ssl.verification_mode}}
{{/if}}
condition: ${kubernetes.pod.labels.component} == 'kube-controller-manager'
Member:

Maybe it would be worth giving users the option to change it via the UI, in case they have different labels than the default ones for these components.

Contributor Author:

It is a bit over-engineered to give users that option, as hardly anyone would change the default kube-controller-manager value, but if we take that approach in general I can go with it.

Member:

If upstream k8s or a special distro of k8s changes the value of this label, then the integration will be incompatible. I feel it's too fragile to depend on a specific value of a specific label. Since the default value will be there, users will not really be bothered by this option, like what happens for bearer_token and others.

@ChrsMark (Member)

Thoughts on the latter?
cc @ChrsMark @jsoriano

I think that DiscoverKubernetesNode needs to be revisited/improved a little bit; there is also elastic/beats#23400 reporting some issues.

As far as I have seen we usually set node: ${NODE_NAME}, for example at https://github.com/elastic/beats/blob/dccf5cab6c9db04eec6be57fdaaa06aa3be5a357/deploy/kubernetes/metricbeat-kubernetes.yaml#L55. I would guess it would be safe, instead of returning defaultNode, to try to get NODE_NAME from the env and return that instead.

@MichaelKatsoulis (Contributor, Author)

As far as I have seen we usually set node: ${NODE_NAME}, for example at https://github.com/elastic/beats/blob/dccf5cab6c9db04eec6be57fdaaa06aa3be5a357/deploy/kubernetes/metricbeat-kubernetes.yaml#L55. I would guess it would be safe, instead of returning defaultNode, to try to get NODE_NAME from the env and return that instead.

I agree that defaultNode (localhost) needs to be replaced with the value extracted from the env var. But, also following the discussion in elastic/beats#23400, we should handle the case where the env var is not set (equals "").

  1. Return an error when os.Getenv("NODE_NAME") == "" and handle it later on the caller's side, or
  2. Return the env var as the default in any case, and in the case of an empty string have the caller fail the metricbeat/filebeat process.

@jsoriano (Member)

An update is needed in agent/fleet to enable (by default?) Kubernetes dynamic provider through configuration.

This reminds me of this discussion; it'd be nice if providers started automatically when used 🙂

If we enable providers by default, we have to make them fail silently in environments where they are not used or expected to work; something like this is done in some add_*_metadata processors.

When node is not set in the config, the code tries to get the container hostname and then fetch the node_name from the pod description.
If the hostname of the pod is not its actual name (which happens in kind k8s deployments), the pod is not found and the provider fails.

Yes, this is a bit of a hack. It shouldn't be assumed that the host name of a kubernetes node is its node name; we should take this into account if we refactor DiscoverKubernetesNode.

Checking for the existence of NODE_NAME sounds good to me, but I still think that when all discovery methods fail (including NODE_NAME not being set), this method should return an error.

@ChrsMark (Member)

An update is needed in agent/fleet to enable (by default?) Kubernetes dynamic provider through configuration.

This reminds me of this discussion; it'd be nice if providers started automatically when used 🙂

If we enable providers by default, we have to make them fail silently in environments where they are not used or expected to work; something like this is done in some add_*_metadata processors.

I think that in this case the provider is not initialised properly due to issues with DiscoverKubernetesNode, but it doesn't crash. It just uses the wrong node in the watcher, so events are never retrieved and hence the added condition: ${kubernetes.pod.labels.component} == 'kube-controller-manager' will never become true. As a result, the test fails because the data_stream never ships events to ES.

@MichaelKatsoulis (Contributor, Author)

@ChrsMark, @jsoriano I will create a new PR to update DiscoverKubernetesNode and we can move the implementation discussion there.

@MichaelKatsoulis marked this pull request as ready for review August 10, 2021 10:30
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@MichaelKatsoulis (Contributor, Author)

@ChrsMark @jsoriano after the update in DiscoverKubernetesNode, the defaults work well for the Kubernetes dynamic provider in the 7.15.0-SNAPSHOT version.
I think that if you don't have any more comments on this, we can finally merge it. Tested locally and it works as expected.

@@ -9,3 +9,4 @@ period: {{period}}
bearer_token_file: {{bearer_token_file}}
ssl.verification_mode: {{ssl.verification_mode}}
{{/if}}
condition: ${kubernetes.pod.labels.component} == '{{controller_manager_label}}'
Member:

I wonder how we could support deployments that don't set the component label.

Contributor Author:

I also thought about that: giving the user the possibility to define the key and the value of a label, like setting 'component=kube-controller-manager'.
The problem is that we would somehow need to split this variable inside the handlebars file in order to create the correct condition. That requires registering some helpers, and I don't know if this is supported.

Contributor Author:

@jsoriano after the proposals I went with a different approach. I added controller_manager_label_key and controller_manager_label_value as hidden vars in the advanced settings (see the last screenshots), where you can set the label key and value you want. The default stays the component label. It works perfectly for custom labels defined by the user.
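For reference, the two vars would presumably be defined in the package's manifest.yml along these lines (names from above; types and titles are approximate):

- name: controller_manager_label_key
  type: text
  title: Controller manager label key
  show_user: false   # keeps the var under the advanced options
  default: component
- name: controller_manager_label_value
  type: text
  title: Controller manager label value
  show_user: false
  default: kube-controller-manager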

@jsoriano (Member) left a comment:

👍

@@ -9,3 +10,5 @@ period: {{period}}
bearer_token_file: {{bearer_token_file}}
ssl.verification_mode: {{ssl.verification_mode}}
{{/if}}

condition: ${kubernetes.pod.labels.{{~controller_manager_label_key~}} } == '{{controller_manager_label_value}}'
Member:

Cool, handlebars is processed before the config 👍
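With the default values this template line should render to roughly:

condition: ${kubernetes.pod.labels.component} == 'kube-controller-manager'

The ~ markers in {{~controller_manager_label_key~}} trim the surrounding whitespace, so the closing } of the ${...} reference ends up directly after the label key.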

@MichaelKatsoulis merged commit c7b31c0 into elastic:master Aug 12, 2021
eyalkraft pushed a commit to build-security/integrations that referenced this pull request Mar 30, 2022
Add condition to leverage kubernetes provider for controllermanager and scheduler data streams (elastic#1324)

* Add condition to leverage kubernetes provider for controllermanager and scheduler data streams
* Allow users to change controller manager and scheduler labels
Labels: enhancement (New feature or request), Team:Integrations (Label for the Integrations team)
Projects: None yet
4 participants