-
Notifications
You must be signed in to change notification settings - Fork 8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the telemetry collection of Logstash with metricbeat monitoring. #182304
Fix the telemetry collection of Logstash with metricbeat monitoring. #182304
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
x-pack/plugins/monitoring/server/telemetry_collection/get_cluster_uuids.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a look at this! We needed someone with knowledge about the new shape of the data to fix this. 🧡
It Looks Great To Me! My only comment is: can we normalize the response of get_es_stats
so that it doesn't leak out to other plugins and functions?
x-pack/plugins/monitoring/server/telemetry_collection/get_es_stats.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Show resolved
Hide resolved
8ba410c
to
c20f3f9
Compare
FYI: I have updated the unit test cases which align with current changes, wil try to add for metricbeat. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bearing with me and explaining the reasoning behind the changes.
FWIW, my main concern is that we're leaking code outside the monitoring
plugin. Other than that, all looking good aside from the global logstash request now being moved inside the loop.
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_all_stats.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts
Outdated
Show resolved
Hide resolved
x-pack/plugins/monitoring/server/telemetry_collection/get_es_stats.ts
Outdated
Show resolved
Hide resolved
152d6a8
to
f86444c
Compare
Hmm, the failing tests indicate that we're somehow not returning the monitoring data... Maybe the new query is failing? UPDATE: Found it in the logs:
|
… shape, telemetry fetch logics updated to save and send metricbeat data shape.
…stats and state if no cluster Logstash monitoring data found. Co-authored-by: Alejandro Fernández Haro <afharo@gmail.com>
f86444c
to
1701038
Compare
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This is great! Thanks for such an effort!
Thank you so much @afharo. This happened because of your huge help, appreciate! |
💚 Build Succeeded
Metrics [docs]Unknown metric groupsESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @mashhurs |
@afharo, @neptunian can we please backport this change to upcoming 8.14.x releases? |
I've added the appropriate label to back port this PR to the previous minor. Did the same with #182857 Hopefully, our kibanamachine bot backports them for us. |
…lastic#182304) ## Summary Telemetry data collection is broken for Logstash, monitoring with metricbeat. This PR change covers following issues faced: **1) Resolve cluster UUID** - With self monitoring, KB creates `.monitoring-es*` index with mapping `type` field and defaults to [`type:cluster_state`](https://github.com/elastic/kibana/blob/main/packages/kbn-apm-synthtrace-client/src/lib/monitoring/cluster_stats.ts#L25) key-value. It uses [`type:cluster_state` condition when fetching cluster UUID](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_cluster_uuids.ts#L48). - However, with metricbeat, this [_`type` field doesn't exist_ under mapping](https://github.com/elastic/beats/blob/v8.13.2/metricbeat/module/elasticsearch/_meta/fields.yml) which metricbeat creates, so cluster UUID will not be resolved as query is wrong (results empty output). **2) `type` field mismatch in (especially in _state_) queries, also collapse field** - metricbeat sends the _state_ event with `metricset.name:node` and state fetch query doesn't care about this condition, instead uses [legacy `type:logstash_state` condition](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L349) which is incorrect. - in the queries, `collapse` field is not correct: it is due to data shape change from legacy to metricbeat monitoring and queries are tightly coupled with legacy one ([1](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L355), [2](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L346), [3](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L301-L302)) - in the queries, `filter_path` is also not correct: in both [_state_ query](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L332) and [_stats_ query](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L273) **3) Logstash state data frequency** - metricbeat sends _state_ event (_node/stats/pipeline?graph=true) data once but legacy frequently sends. KB telemetry fetcher [queries against ES with latest update period](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L343-L344) where state data will not be available. It might be a reasonable design which results network efficiency. If that's the case, we should decide the expectation - to still collect: we have to tune the query to collect the state data. - leave it as an empty assuming state didn't change (personally, I would not align with this option since collecting once it a data loss risky) --------- Co-authored-by: Alejandro Fernández Haro <afharo@gmail.com> Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com> (cherry picked from commit 26f6977)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…oring. (#182304) (#183331) # Backport This will backport the following commits from `main` to `8.14`: - [Fix the telemetry collection of Logstash with metricbeat monitoring. (#182304)](#182304) <!--- Backport version: 9.4.3 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Mashhur","email":"99575341+mashhurs@users.noreply.github.com"},"sourceCommit":{"committedDate":"2024-05-09T01:19:11Z","message":"Fix the telemetry collection of Logstash with metricbeat monitoring. (#182304)\n\n## Summary\r\n\r\nTelemetry data collection is broken for Logstash, monitoring with\r\nmetricbeat. This PR change covers following issues faced:\r\n\r\n**1) Resolve cluster UUID**\r\n- With self monitoring, KB creates `.monitoring-es*` index with mapping\r\n`type` field and defaults to\r\n[`type:cluster_state`](https://github.com/elastic/kibana/blob/main/packages/kbn-apm-synthtrace-client/src/lib/monitoring/cluster_stats.ts#L25)\r\nkey-value. It uses [`type:cluster_state` condition when fetching cluster\r\nUUID](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_cluster_uuids.ts#L48).\r\n- However, with metricbeat, this [_`type` field doesn't exist_ under\r\nmapping](https://github.com/elastic/beats/blob/v8.13.2/metricbeat/module/elasticsearch/_meta/fields.yml)\r\nwhich metricbeat creates, so cluster UUID will not be resolved as query\r\nis wrong (results empty output).\r\n\r\n**2) `type` field mismatch in (especially in _state_) queries, also\r\ncollapse field**\r\n- metricbeat sends the _state_ event with `metricset.name:node` and\r\nstate fetch query doesn't care about this condition, instead uses\r\n[legacy `type:logstash_state`\r\ncondition](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L349)\r\nwhich is incorrect.\r\n- in the queries, `collapse` field is not correct: it is due to data\r\nshape change from legacy to metricbeat monitoring and queries are\r\ntightly coupled with legacy one\r\n([1](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L355),\r\n[2](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L346),\r\n[3](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L301-L302))\r\n- in the queries, `filter_path` is also not correct: in both [_state_\r\nquery](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L332)\r\nand [_stats_\r\nquery](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L273)\r\n\r\n**3) Logstash state data frequency** \r\n- metricbeat sends _state_ event (_node/stats/pipeline?graph=true) data\r\nonce but legacy frequently sends. KB telemetry fetcher [queries against\r\nES with latest update\r\nperiod](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L343-L344)\r\nwhere state data will not be available. It might be a reasonable design\r\nwhich results network efficiency. If that's the case, we should decide\r\nthe expectation\r\n- to still collect: we have to tune the query to collect the state data.\r\n- leave it as an empty assuming state didn't change (personally, I would\r\nnot align with this option since collecting once it a data loss risky)\r\n\r\n---------\r\n\r\nCo-authored-by: Alejandro Fernández Haro <afharo@gmail.com>\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>","sha":"26f6977aa126b129a2f7a5cb8f693618c0ae9e80","branchLabelMapping":{"^v8.15.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:fix","backport:prev-minor","apm:review","8.14 candidate","v8.15.0"],"title":"Fix the telemetry collection of Logstash with metricbeat monitoring.","number":182304,"url":"#182304 the telemetry collection of Logstash with metricbeat monitoring. (#182304)\n\n## Summary\r\n\r\nTelemetry data collection is broken for Logstash, monitoring with\r\nmetricbeat. This PR change covers following issues faced:\r\n\r\n**1) Resolve cluster UUID**\r\n- With self monitoring, KB creates `.monitoring-es*` index with mapping\r\n`type` field and defaults to\r\n[`type:cluster_state`](https://github.com/elastic/kibana/blob/main/packages/kbn-apm-synthtrace-client/src/lib/monitoring/cluster_stats.ts#L25)\r\nkey-value. It uses [`type:cluster_state` condition when fetching cluster\r\nUUID](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_cluster_uuids.ts#L48).\r\n- However, with metricbeat, this [_`type` field doesn't exist_ under\r\nmapping](https://github.com/elastic/beats/blob/v8.13.2/metricbeat/module/elasticsearch/_meta/fields.yml)\r\nwhich metricbeat creates, so cluster UUID will not be resolved as query\r\nis wrong (results empty output).\r\n\r\n**2) `type` field mismatch in (especially in _state_) queries, also\r\ncollapse field**\r\n- metricbeat sends the _state_ event with `metricset.name:node` and\r\nstate fetch query doesn't care about this condition, instead uses\r\n[legacy `type:logstash_state`\r\ncondition](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L349)\r\nwhich is incorrect.\r\n- in the queries, `collapse` field is not correct: it is due to data\r\nshape change from legacy to metricbeat monitoring and queries are\r\ntightly coupled with legacy one\r\n([1](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L355),\r\n[2](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L346),\r\n[3](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L301-L302))\r\n- in the queries, `filter_path` is also not correct: in both [_state_\r\nquery](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L332)\r\nand [_stats_\r\nquery](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L273)\r\n\r\n**3) Logstash state data frequency** \r\n- metricbeat sends _state_ event (_node/stats/pipeline?graph=true) data\r\nonce but legacy frequently sends. KB telemetry fetcher [queries against\r\nES with latest update\r\nperiod](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L343-L344)\r\nwhere state data will not be available. It might be a reasonable design\r\nwhich results network efficiency. If that's the case, we should decide\r\nthe expectation\r\n- to still collect: we have to tune the query to collect the state data.\r\n- leave it as an empty assuming state didn't change (personally, I would\r\nnot align with this option since collecting once it a data loss risky)\r\n\r\n---------\r\n\r\nCo-authored-by: Alejandro Fernández Haro <afharo@gmail.com>\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>","sha":"26f6977aa126b129a2f7a5cb8f693618c0ae9e80"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.15.0","branchLabelMappingKey":"^v8.15.0$","isSourceBranch":true,"state":"MERGED","url":"#182304 the telemetry collection of Logstash with metricbeat monitoring. (#182304)\n\n## Summary\r\n\r\nTelemetry data collection is broken for Logstash, monitoring with\r\nmetricbeat. This PR change covers following issues faced:\r\n\r\n**1) Resolve cluster UUID**\r\n- With self monitoring, KB creates `.monitoring-es*` index with mapping\r\n`type` field and defaults to\r\n[`type:cluster_state`](https://github.com/elastic/kibana/blob/main/packages/kbn-apm-synthtrace-client/src/lib/monitoring/cluster_stats.ts#L25)\r\nkey-value. It uses [`type:cluster_state` condition when fetching cluster\r\nUUID](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_cluster_uuids.ts#L48).\r\n- However, with metricbeat, this [_`type` field doesn't exist_ under\r\nmapping](https://github.com/elastic/beats/blob/v8.13.2/metricbeat/module/elasticsearch/_meta/fields.yml)\r\nwhich metricbeat creates, so cluster UUID will not be resolved as query\r\nis wrong (results empty output).\r\n\r\n**2) `type` field mismatch in (especially in _state_) queries, also\r\ncollapse field**\r\n- metricbeat sends the _state_ event with `metricset.name:node` and\r\nstate fetch query doesn't care about this condition, instead uses\r\n[legacy `type:logstash_state`\r\ncondition](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L349)\r\nwhich is incorrect.\r\n- in the queries, `collapse` field is not correct: it is due to data\r\nshape change from legacy to metricbeat monitoring and queries are\r\ntightly coupled with legacy one\r\n([1](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L355),\r\n[2](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L346),\r\n[3](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L301-L302))\r\n- in the queries, `filter_path` is also not correct: in both [_state_\r\nquery](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L332)\r\nand [_stats_\r\nquery](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L273)\r\n\r\n**3) Logstash state data frequency** \r\n- metricbeat sends _state_ event (_node/stats/pipeline?graph=true) data\r\nonce but legacy frequently sends. KB telemetry fetcher [queries against\r\nES with latest update\r\nperiod](https://github.com/elastic/kibana/blob/main/x-pack/plugins/monitoring/server/telemetry_collection/get_logstash_stats.ts#L343-L344)\r\nwhere state data will not be available. It might be a reasonable design\r\nwhich results network efficiency. If that's the case, we should decide\r\nthe expectation\r\n- to still collect: we have to tune the query to collect the state data.\r\n- leave it as an empty assuming state didn't change (personally, I would\r\nnot align with this option since collecting once it a data loss risky)\r\n\r\n---------\r\n\r\nCo-authored-by: Alejandro Fernández Haro <afharo@gmail.com>\r\nCo-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>","sha":"26f6977aa126b129a2f7a5cb8f693618c0ae9e80"}}]}] BACKPORT--> Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
@afharo Thank you so much for sharing all your knowledge here and getting this to done! |
Summary
Telemetry data collection is broken for Logstash, monitoring with metricbeat. This PR change covers following issues faced:
1) Resolve cluster UUID
.monitoring-es*
index with mappingtype
field and defaults totype:cluster_state
key-value. It usestype:cluster_state
condition when fetching cluster UUID.type
field doesn't exist under mapping which metricbeat creates, so cluster UUID will not be resolved as query is wrong (results empty output).2)
type
field mismatch in (especially in state) queries, also collapse fieldmetricset.name:node
and state fetch query doesn't care about this condition, instead uses legacytype:logstash_state
condition which is incorrect.collapse
field is not correct: it is due to data shape change from legacy to metricbeat monitoring and queries are tightly coupled with legacy one (1, 2, 3)filter_path
is also not correct: in both state query and stats query3) Logstash state data frequency
Checklist
Delete any items that are not applicable to this PR.
[ ] Any text added follows EUI's writing guidelines, uses sentence case text and includes i18n support[ ] Documentation was added for features that require explanation or tutorials[ ] Flaky Test Runner was used on any tests changed[ ] Any UI touched in this PR is usable by keyboard only (learn more about keyboard accessibility)[ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: FF, Chrome)[ ] If a plugin configuration key changed, check if it needs to be allowlisted in the cloud and added to the docker list[ ] This renders correctly on smaller devices using a responsive layout. (You can test this in your browser)[ ] This was checked for cross-browser compatibilityRisk Matrix
Delete this section if it is not applicable to this PR.
Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.
When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:
For maintainers