[POC] Run log rate analysis on metrics data #182652

Open
benakansara opened this issue May 6, 2024 · 3 comments

@benakansara

Create a POC to evaluate whether Log rate analysis produces meaningful results when run on metrics data.

Acceptance criteria

  • Test Log rate analysis with metrics data in the Custom threshold rule
  • Capture analysis results
@benakansara benakansara added the Team:obs-ux-management Observability Management User Experience Team label May 6, 2024
@benakansara benakansara self-assigned this May 6, 2024
@elasticmachine

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

@benakansara

Using CCS (cross-cluster search)

Scenario 1: Group by host.hostname, Document count alert on metrics data

Screenshot 2024-05-23 at 23 33 30

Scenario 2: Group by kubernetes.pod.name, avg (kubernetes.pod.cpu.usage.limit.pct) alert

Screenshot 2024-05-23 at 23 48 25

Scenario 3: (sum (kubernetes.pod.network.rx.bytes)/1000000000000) + avg (kubernetes.container.cpu.usage.limit.pct) + max (kubernetes.node.cpu.capacity.cores) alert, Without group by

Screenshot 2024-05-24 at 00 00 51

Scenario 4: (avg (kubernetes.container.cpu.usage.limit.pct) + avg (kubernetes.container.memory.usage.bytes)) / 10000000 alert, Group by kubernetes.pod.name

Screenshot 2024-05-24 at 00 21 01

Using kbn-data-forge

Metrics data

Scenario 1: Increase in number of hosts, Before spike: 2 hosts (host-0, host-1), At spike: 10 hosts (host-0 ... host-9), Without group by, Average(system.cpu.user.pct) alert, Higher CPU usage in new hosts

The analysis identifies the new hosts, containers, labels, and network values.

Screenshot 2024-05-22 at 22 15 58
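
To make the data shape in this scenario concrete, here is a minimal sketch in Python. It is not how kbn-data-forge or Log rate analysis works internally. The helper names (`make_docs`, `value_counts`), the number of documents per host, and the CPU values are all assumptions made for illustration. It only shows why a keyword field such as host.name separates the two time windows.

```python
from collections import Counter


def make_docs(host_count, docs_per_host, cpu_pct):
    """Build hypothetical metric documents (assumed shape, not real kbn-data-forge output)."""
    return [
        {"host.name": f"host-{i}", "system.cpu.user.pct": cpu_pct}
        for i in range(host_count)
        for _ in range(docs_per_host)
    ]


def value_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs)


# Before the spike: 2 hosts. At the spike: 10 hosts with higher CPU usage.
baseline = make_docs(host_count=2, docs_per_host=5, cpu_pct=0.2)
spike = make_docs(host_count=10, docs_per_host=5, cpu_pct=0.9)

baseline_counts = value_counts(baseline, "host.name")
spike_counts = value_counts(spike, "host.name")

# Keyword values that exist only in the spike window.
new_hosts = sorted(set(spike_counts) - set(baseline_counts))
print(new_hosts)
```

The hosts that exist only in the spike window (host-2 through host-9) are exactly the kind of keyword-level difference the analysis reported here.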

Scenario 2: Increase in number of hosts, Before spike: 2 hosts (host-0, host-1), At spike: 10 hosts (host-0 ... host-9), Without group by, Document count alert on metrics data

The analysis identifies the new hosts, containers, labels, and network values.

Screenshot 2024-05-22 at 14 21 34

Scenario 3: Change in metric values for system.cpu.user.pct metric, Without group by, Average(system.cpu.user.pct) alert

The analysis did not find any results.

Scenario 4: Change in metric values for system.cpu.user.pct metric, new metric added (system.cpu.system.pct) in spiked documents, Without group by, Average(system.cpu.user.pct) alert

The analysis did not find any results.

Scenario 5: Change in metric values for system.cpu.user.pct metric, Group by host.name, Average(system.cpu.user.pct) alert

The analysis did not find any results.

Scenario 6: Change in document count, Group by host.name, Average(system.cpu.user.pct) alert

The analysis did not find any results.

Scenario 7: Change in document count, Group by host.name, Document count alert on metrics data

The analysis did not find any results.

@benakansara

Based on these observations, "Log rate analysis" works best when documents differ in particular keyword/text fields whose values appear more or less frequently. It does not detect differences in numeric values.

In a more controlled setup such as kbn-data-forge, where we decide the shape of the test data, the results are not promising. This is because we generate documents with the same set of fields and change only a particular metric value.
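
The following Python sketch illustrates why this data shape yields no results. It is a simplification under stated assumptions. The real feature uses a statistical significance test, whereas this sketch only compares the share of documents containing each keyword field/value pair. The function names (`keyword_shares`, `changed_pairs`) and the `min_delta` threshold are hypothetical.

```python
from collections import Counter


def keyword_shares(docs):
    """Share of documents containing each (field, value) pair; string values only."""
    counts = Counter(
        (field, value)
        for doc in docs
        for field, value in doc.items()
        if isinstance(value, str)  # numeric metric values are ignored on purpose
    )
    total = len(docs)
    return {pair: n / total for pair, n in counts.items()}


def changed_pairs(baseline, deviation, min_delta=0.1):
    """Return keyword pairs whose document share moved by at least min_delta (assumed threshold)."""
    base = keyword_shares(baseline)
    dev = keyword_shares(deviation)
    return sorted(
        pair
        for pair in set(base) | set(dev)
        if abs(dev.get(pair, 0.0) - base.get(pair, 0.0)) >= min_delta
    )


hosts = ["host-0", "host-1"]
baseline = [{"host.name": h, "system.cpu.user.pct": 0.2} for h in hosts for _ in range(10)]

# Only the metric value changes between the two windows.
value_change = [{"host.name": h, "system.cpu.user.pct": 0.95} for h in hosts for _ in range(10)]

# The metric value changes and a new numeric metric field appears in the spiked documents.
new_metric = [
    {"host.name": h, "system.cpu.user.pct": 0.95, "system.cpu.system.pct": 0.4}
    for h in hosts
    for _ in range(10)
]

print(changed_pairs(baseline, value_change))  # no keyword pair changed its share
print(changed_pairs(baseline, new_metric))    # the extra numeric field is not a keyword pair
```

In both cases the keyword field distribution is identical before and during the spike, so a frequency-based comparison has nothing to report, even though the metric values differ substantially.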

In conclusion, the analysis itself could work with any kind of document, not just logs. In real metrics data, a document generally has other "keyword" fields besides the metric fields. The analysis would find variations in those fields when run on metrics data.
