
[Stack Monitoring] update packages dataset #4018

Merged
6 commits merged into elastic:main on Sep 6, 2022

Conversation

klacabane
Contributor

@klacabane klacabane commented Aug 17, 2022

Summary

Closes #3929

This change updates the datasets of the Stack Monitoring packages to include a stack_monitoring part (e.g. elasticsearch.stack_monitoring.node). The identifier has two purposes: 1) clarify the intent of these data streams, and 2) free the stack products' namespaces for the upcoming Platform Observability (PO) initiative. Only the metrics are impacted, because logs will be collected the same way in both Stack Monitoring and Platform Observability.
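For illustration, here is how the rename affects a resulting data stream name; the node metricset and the default namespace are just examples, the concrete names come from each package's data streams:

    before: metrics-elasticsearch.node-default
    after:  metrics-elasticsearch.stack_monitoring.node-default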

The change also bumps the elasticsearch, kibana and logstash packages to their next major version while keeping their release as experimental until elastic/kibana#120415 is completed. This allows the packages to be built into the registry so that we don't have to build them locally by hand when testing. I thought that aligning the metrics mappings was a relevant milestone for the bump.

Testing

  • Start an elastic-package stack with elasticsearch, kibana and logstash packages installed:
    • We can automate the package installation by providing the right fleet configuration to kibana, and we can use elastic-package profiles to do that. Let's download a profile that does exactly that. You can skip this step if you want to install the packages manually:
    curl https://drive.google.com/uc\?export\=download\&id\=1aqdqNb9JaYXL-C3WjiT2t58Q65Cl4NIV -L -o /tmp/stack_monitoring-profile.zip && \
    unzip -o /tmp/stack_monitoring-profile.zip -d ~/.elastic-package/profiles && \
    rm /tmp/stack_monitoring-profile.zip
    
    • Now cd to the root of the integrations repository and let's build the packages, start the stack with the downloaded profile, and also start a logstash service with some predefined pipelines. The command may take a moment to complete; it is done once the logstash service is started and this log message appears: Service is up, please use ctrl+c to take it down
    (cd packages/elasticsearch && elastic-package build) && \
    (cd packages/kibana && elastic-package build) && \
    (
      cd packages/logstash && elastic-package build && \
      elastic-package stack up -v -d --profile stack_monitoring --version 8.5.0-SNAPSHOT && \
      elastic-package service up -v
    )
    
  • Start a local Kibana with this change: [Stack Monitoring] Add stack_monitoring suffix to metrics-* index pattern kibana#137904. See how to connect a local kibana
  • Open kibana at http://localhost:5602 and navigate to the Stack Monitoring app. The Elasticsearch, Kibana and Logstash sections should show up with all their views populated. Inspect all the views and report anything bizarre
  • Verify that every stack monitoring metrics-* data stream is formatted as metrics-{product}.stack_monitoring.{metricset} (a spot-check sketch follows this list)
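A minimal way to spot-check the naming from the command line, assuming the default elastic-package stack credentials (elastic/changeme) and Elasticsearch listening on https://localhost:9200 (jq is optional and only used to extract the names):

    curl -sk -u elastic:changeme "https://localhost:9200/_data_stream/metrics-*" | \
      jq -r '.data_streams[].name'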

@klacabane klacabane added the v8.5.0 and Team:Infra Monitoring UI labels Aug 17, 2022
@klacabane klacabane self-assigned this Aug 17, 2022
@elasticmachine

elasticmachine commented Aug 17, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-09-06T22:31:39.774+0000

  • Duration: 14 min 53 sec

Test stats 🧪

Test Results
Failed 0
Passed 62
Skipped 0
Total 62

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine

elasticmachine commented Aug 17, 2022

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (0/0) 💚
Files 100.0% (0/0) 💚 2.745
Classes 100.0% (0/0) 💚 2.745
Methods 49.462% (46/93) 👎 -40.185
Lines 100.0% (0/0) 💚 9.503
Conditionals 100.0% (0/0) 💚

@klacabane klacabane marked this pull request as ready for review August 17, 2022 15:24
@klacabane klacabane requested a review from a team as a code owner August 17, 2022 15:24
@crespocarlos
Contributor

For Logstash internal monitoring I had to add the lines below to packages/logstash/_dev/deploy/docker/config/logstash.yml

monitoring.enabled: true
monitoring.elasticsearch.hosts: https://elasticsearch:9200
monitoring.elasticsearch.username: elastic
monitoring.elasticsearch.password: changeme

And for some reason the docker container name is elastic-package-service-logstash-1, with hyphens instead of underscores.

I've tested this branch using local kibana with changes from elastic/kibana#137904 and everything looks ok, except for this standalone cluster. I'm trying to understand why this is there.
[screenshot]

I don't see some data streams like ml_job, but that's probably because I don't have any ML jobs running.

@crespocarlos
Contributor

crespocarlos commented Sep 5, 2022

Not related to this change, but enrich consistently fails due to lack of permission. I think the agent user lacks some privileges

[screenshot]

@klacabane
Contributor Author

klacabane commented Sep 5, 2022

For Logstash internal monitoring I had to add the lines below

Are these settings allowing the Agent to collect data, or are they publishing data to .monitoring-logstash?

And for some reason the docker container name is elastic-package-service-logstash-1, without _.

It seems dependent on the docker version you're running, which makes the profiles difficult to share and probably not the best approach when it comes to installing packages. @matschaffer had a similar issue when testing the elasticsearch package.

except for this standalone cluster

The logstash service defined in _dev contains a standalone pipeline so that is expected, unless the Standalone Cluster view breaks

Not related to this change but enrich consistently fails due to lack of permission. I think the agent user lacks some privileges

Interesting, I'm wondering if this happens in standalone metricbeat as well. We can track this as a separate ticket

Contributor

@crespocarlos crespocarlos left a comment


LGTM. I just couldn't verify ccr and enrich (because enrich fails).

The standalone cluster was fixed after setting monitoring.cluster_uuid in logstash.yml.
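A minimal sketch of that fix, assuming the same _dev docker config path mentioned above and a placeholder UUID (the real value is the cluster_uuid returned by a GET / request to Elasticsearch):

# hypothetical: associate standalone Logstash metrics with the monitored cluster
echo 'monitoring.cluster_uuid: "<elasticsearch-cluster-uuid>"' >> packages/logstash/_dev/deploy/docker/config/logstash.yml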

@crespocarlos
Contributor

crespocarlos commented Sep 5, 2022

Are these settings allowing the Agent to collect data or is it publishing data in .monitoring-logstash ?

I removed those config lines. The problem was actually that I used the wrong logstash host. Even with those lines in the config file, .monitoring-logstash remains empty.

@matschaffer
Contributor

matschaffer commented Sep 6, 2022

Glad to hear it wasn't the settings from #4018 (comment) - those would be for internal collection which wouldn't exercise the agent at all.

Also, the underscore vs. hyphen is a difference between docker-compose v1 (underscores) and v2 (hyphens). I was running v1 a while ago, then switched to v2 to see if it would help at all with stack up slowness. You can run docker-compose version to check (a quick sketch below).
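For reference, a minimal check; the container-name separator follows the major version reported here:

# v1 prints e.g. "docker-compose version 1.29.2, build ..." and joins container names with "_"
# v2 prints e.g. "Docker Compose version v2.x.y" and joins container names with "-"
docker-compose version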

This is part of why I'm a little skeptical of the "canned profile" approach and would like to work on better elastic-package level support for additional services used in testing. There's already code in there that deals with docker-compose differences so I think we can probably deal with it at the golang layer more easily.

@matschaffer
Contributor

matschaffer commented Sep 6, 2022

Wondering if we might have a new issue with the saved profile now. When I try to connect a main kibana it seems to fail at this stage:

[2022-09-06T13:12:35.789+09:00][INFO ][savedobjects-service] [.kibana] OUTDATED_DOCUMENTS_REFRESH -> UPDATE_TARGET_MAPPINGS. took: 34ms.
[2022-09-06T13:12:35.889+09:00][INFO ][savedobjects-service] [.kibana] UPDATE_TARGET_MAPPINGS -> UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK. took: 100ms.
{"log.level":"error","@timestamp":"2022-09-06T04:12:44.321Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"APM Server transport error: intake response timeout: APM server did not respond within 10s of gzip stream finish"}
{"log.level":"error","@timestamp":"2022-09-06T04:13:34.146Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"APM Server transport error: error fetching APM Server version: timeout (30000ms) fetching APM Server version"}
[2022-09-06T13:13:35.492+09:00][ERROR][savedobjects-service] [.kibana_task_manager] Action failed with '[timeout_exception] Timed out waiting for completion of [Task{id=19169, type='transport', action='indices:data/write/update/byquery', description='update-by-query [.kibana_task_manager_8.5.0_001]', parentTask=unset, startTime=1662437555506, startTimeNanos=1380955459148883}]'. Retrying attempt 1 in 2 seconds.
[2022-09-06T13:13:35.493+09:00][INFO ][savedobjects-service] [.kibana_task_manager] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK -> UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK. took: 59993ms.

update: nm... just yet another potential manifestation of the low-docker-disk issue 🤦🏻

					"explanation": "the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%], having less than the minimum required [5.8gb] free space, actual free: [5.5gb], actual used: [90.5%]"

Contributor

@matschaffer matschaffer left a comment


This seems good to merge. The data stream names are as expected. I wish we had a more complete stack to test with (ccr, logstash, beats, monitoring), but I'm not sure how easy/hard that'd be with the current setup.

I'm catching some logstash docs that are showing up as "standalone", but they cause the UI to just look "blank".

[screenshots]

But I'm fairly certain that's up to elastic/kibana#137904 to fix.

One thing on my mind for this PR is the logs. I guess they don't really need a .stack_monitoring data stream though, since the shape won't be changing between now and platform observability.

@matschaffer
Contributor

Yeah, looks like those are connection errors.

[screenshot]

So we can fix that part up in kibana.

@matschaffer
Contributor

Ah, and the connection failures are due to elastic-package-service_logstash_1 (docker compose v1 style) :)

So many little things to iron out here but I think this PR is fine.

@klacabane
Contributor Author

Merging - the ghost standalone cluster will be fixed with elastic/kibana#140102

@klacabane klacabane merged commit aadbf6e into elastic:main Sep 6, 2022