Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable beats stack monitoring configuration #5878

Merged
merged 69 commits into from
Sep 26, 2022

Conversation

naemono
Copy link
Contributor

@naemono naemono commented Jul 18, 2022

Closes #5563

This change enables easy stack monitoring configuration for Beats, such as we already have for both Elasticsearch and Kibana, by adding the following configuration stanza in the Beats CRD

apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
spec:
  monitoring:
    metrics:
      elasticsearchRefs:
      - name: elasticsearch
    logs:
      elasticsearchRefs:
      - name: elasticsearch

@naemono naemono added the >feature Adds or discusses adding a feature to the product label Jul 18, 2022
@naemono naemono marked this pull request as ready for review July 18, 2022 18:13
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick first look, I haven't done any testing yet.

pkg/controller/beat/common/pod_test.go Outdated Show resolved Hide resolved
test/e2e/test/beat/checks.go Outdated Show resolved Hide resolved
@pebrc pebrc self-assigned this Jul 28, 2022
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM. I am having second thoughts though whether we should stick to the common

monitoring:
    metrics:
      elasticsearchRefs:
      - name: metrics-es
    logs:
      elasticsearchRefs:
      - name: logs-es

schema just to keep the door open for future enhancements that would for example export logs as well and to keep the CRDs somewhat uniform. Can't quite make my mind up. Maybe @thbkrkr has an opinion as well given that he wrote the original implementation.

test/e2e/test/checks/monitoring.go Outdated Show resolved Hide resolved
pkg/apis/beat/v1beta1/validations.go Outdated Show resolved Hide resolved
Removing snapshot from minimal beats stack version.
Adding some documentation around why 7.2 is the minimum version.
Copy link
Collaborator

@pebrc pebrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought about it some more and asked for opinions from others and I think now we should indeed stick with the same API schema we have for Kibana and Elasticsearch. The reasons are as originally stated: to be consistent across CRDs and to be able to add log extraction through Filebeat. It would be ok to do the log bit in a separate PR though if you prefer. It might also be good to reach out to the Observability team to get their take on the preferred way to do Beats monitoring and log extraction given that this leads to a kind of Russian doll problem where we add more Beats eg. Filebeat to monitor another Beat, which then raises the question where does one stop (but we already have that problem with the monitoring of Kibana and ES so maybe that is ok)

@naemono
Copy link
Contributor Author

naemono commented Aug 2, 2022

@pebrc I've reached out to observability, and they also suggested we use sidecars to monitor these instead of internal collectors. My changes to allow sidecars is nearly complete, with some unit tests for the functionality being taken care of, and I'll note that this is ready for another look.

@naemono
Copy link
Contributor Author

naemono commented Sep 1, 2022

run/e2e-tests tags=beat

@thbkrkr thbkrkr self-assigned this Sep 8, 2022
Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I'm testing with this manifest and I get a socket permission error for Filebeat:

{
    "log.level": "error",
    "@timestamp": "2022-09-09T13:40:34.362Z",
    "log.origin": {
        "file.name": "module/wrapper.go",
        "file.line": 256
    },
    "message": "Error fetching data for metricset beat.state: error making http request:
 Get \"http://unix/state\": dial 
 unix /var/shared/filebeat-default-filebeat.sock: connect: permission denied",
    "service.name": "metricbeat",
    "ecs.version": "1.6.0"
}

pkg/apis/beat/v1beta1/beat_types.go Outdated Show resolved Hide resolved
pkg/controller/association/controller/beat_monitoring.go Outdated Show resolved Hide resolved
pkg/controller/beat/common/pod_test.go Outdated Show resolved Hide resolved
docs/advanced-topics/stack-monitoring.asciidoc Outdated Show resolved Hide resolved
pkg/controller/beat/common/stackmon/filebeat.yml Outdated Show resolved Hide resolved
@naemono
Copy link
Contributor Author

naemono commented Sep 12, 2022

This looks good! I'm testing with this manifest and I get a socket permission error for Filebeat:

{
    "log.level": "error",
    "@timestamp": "2022-09-09T13:40:34.362Z",
    "log.origin": {
        "file.name": "module/wrapper.go",
        "file.line": 256
    },
    "message": "Error fetching data for metricset beat.state: error making http request:
 Get \"http://unix/state\": dial 
 unix /var/shared/filebeat-default-filebeat.sock: connect: permission denied",
    "service.name": "metricbeat",
    "ecs.version": "1.6.0"
}

I'm investigating these permission issues...

@naemono
Copy link
Contributor Author

naemono commented Sep 13, 2022

@thbkrkr meant to update yesterday. The permission errors were from attempting to use /usr/share/filebeat-sidecar as the path for the filbeat sidecar, which was unnecessary. I removed that from the config template, tested your manifest locally, and saw no issues. This should be ready for 👀 again.

When primary beat is running as root, also run sidecars as root to be able to read socket.
Tests around sidecars running as root.
@naemono
Copy link
Contributor Author

naemono commented Sep 14, 2022

@thbkrkr There were additional issues found surrounding sidecar permissions reading the unix socket. These were resolved, and tests were added around this feature. It's again ready for 👀

@pebrc
Copy link
Collaborator

pebrc commented Sep 21, 2022

I stepped into a trap when testing this yesterday by using one of our existing manifests with the -e option and only with @thbkrkr 's help was able to identify that this was the reason why my logs would not show up in the monitoring cluster.

I think we have a usability issue here with some of the existing recipes using the -e option. We can assume that some of our users also might have specified it when overriding the default beats command. I think we should at least document this requirement and maybe even consider, as @thbkrkr suggested, to automatically remove the -e option if log delivery is configured.

…ontainer args when stack monitoring is enabled.

Update test to ensure that it's removed.
@naemono
Copy link
Contributor Author

naemono commented Sep 21, 2022

I think we have a usability issue here with some of the existing recipes using the -e option. We can assume that some of our users also might have specified it when overriding the default beats command. I think we should at least document this requirement and maybe even consider, as @thbkrkr suggested, to automatically remove the -e option if log delivery is configured.

I didn't think of the scenario where the customer/recipe already had the -e option. Resolved in 52cd829. I'll also get the documentation updated to note this.

…moved when logs stack monitoring is enabled.
@naemono
Copy link
Contributor Author

naemono commented Sep 21, 2022

@pebrc documentation added. Let me know how you feel about the wording.

@thbkrkr thbkrkr added the v2.5.0 label Sep 22, 2022
Copy link
Contributor

@thbkrkr thbkrkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Co-authored-by: Thibault Richard <thbkrkr@users.noreply.github.com>
@naemono naemono merged commit d1dd8eb into elastic:main Sep 26, 2022
@naemono naemono deleted the 5563-enable-beats-monitoring branch September 26, 2022 18:26
fantapsody pushed a commit to fantapsody/cloud-on-k8s that referenced this pull request Feb 7, 2023
* Add stack monitoring for Beats 
* Public Documentation for Beats stack monitoring
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature Adds or discusses adding a feature to the product v2.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enable beats monitoring in ECK
4 participants