Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kubernetes][Dashboard] Fix proxy, controller manager and scheduler dashboards to support TSDB #5992

Merged
merged 10 commits into from
May 12, 2023

Conversation

constanca-m
Copy link
Contributor

@constanca-m constanca-m commented Apr 25, 2023

What does this PR do?

In proxy, controller manager and scheduler dashboards, there are some visualizations that use counter fields with now unsupported formulas - sum and last_value - when TSDB is enabled. This PR fixes those issues so we can have similar visualizations for the same metrics.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.

How to test this PR locally

  1. Clone this branch and run elastic-package build inside Kubernetes package.
  2. Run elastic-package stack up -v -d --version=8.8.0-SNAPSHOT.
  3. Add Kubernetes integration with TSDB enabled for these 3 data streams.

Related issues

Screenshots

These screenshots only include the before/after of the broken visualizations. Others that used a counter with last_value were switched to max, but there was no change in the visualization. This is needed because last_value is also no longer supported for counter fields once TSDB is enabled.

Proxy dashboard

Before:
Screenshot from 2023-04-25 09-15-51

After:
After-Proxy

Note: "No results found" is shown because there are no documents for the applied filter in these two visualizations. There is nothing wrong with them.

Controller manager dashboard

Before:
Screenshot from 2023-04-25 09-17-47
Screenshot from 2023-04-25 09-17-35

After:
After-1
After-2

Note: Since we can no longer use sum we have to breakdown the metrics by the labels. This will likely lead to very confusing visualizations but as of now, there is no workaround.

Scheduler dashboard

Before:
Screenshot from 2023-04-25 09-18-52
Screenshot from 2023-04-25 09-18-32
Screenshot from 2023-04-25 09-18-42

After:
Screenshot from 2023-04-25 16-44-11
Screenshot from 2023-04-25 16-44-19
Screenshot from 2023-04-25 16-44-29

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m added enhancement New feature or request Integration:kubernetes Kubernetes labels Apr 25, 2023
@constanca-m constanca-m requested review from tetianakravchenko and a team April 25, 2023 14:57
@constanca-m constanca-m self-assigned this Apr 25, 2023
@constanca-m constanca-m requested a review from a team as a code owner April 25, 2023 14:57
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@elasticmachine
Copy link

elasticmachine commented Apr 25, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-05-11T17:17:06.749+0000

  • Duration: 29 min 4 sec

Test stats 🧪

Test Results
Failed 0
Passed 92
Skipped 0
Total 92

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine
Copy link

elasticmachine commented Apr 25, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 100.0% (0/0) 💚
Files 100.0% (0/0) 💚 1.961
Classes 100.0% (0/0) 💚 1.961
Methods 96.154% (75/78) 👍 6.257
Lines 100.0% (0/0) 💚 5.711
Conditionals 100.0% (0/0) 💚

Copy link
Contributor

@drewdaemon drewdaemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problems from my end!

@gizas
Copy link
Contributor

gizas commented Apr 26, 2023

I would suggest to change the labels of "Average Request" to "Max Average Request" .
Just to be on safe side and more precise to what we depict.

In general you can add the Max ref to all visualisations that include it

…11ed-b760-5d1bccb47f56.json

Co-authored-by: Andrew Gizas <andreas.gkizas@elastic.co>
@constanca-m
Copy link
Contributor Author

I would suggest to change the labels of "Average Request" to "Max Average Request" . Just to be on safe side and more precise to what we depict.

I think that might be confusing. Since it is a counter this max will be the same as using last value. @gizas

@gizas
Copy link
Contributor

gizas commented Apr 26, 2023

@constanca-m I would say the opposite. The last value can be diffrent than the max and can lead to totally diffrent results in the calculation. For eg. last value of cpu can be something low but the max could have been a spike for the given period. So users should understand what they see

@constanca-m
Copy link
Contributor Author

I am not sure I understand. Since kubernetes.proxy.process.cpu.sec is a counter, we know that the last value will always be equal to the max, since it can only increase. So having counter_rate(last_value(kubernetes.proxy.process.cpu.sec)) and counter_rate(last_value(kubernetes.proxy.process.cpu.sec)) (like before) should be the same thing. The reason to change is that last_value is no longer supported for counters @gizas

@gizas
Copy link
Contributor

gizas commented Apr 26, 2023

Ah indeed is a counter, was not paying attention. What happens if the counter resets to zero for example?

@constanca-m
Copy link
Contributor Author

That will be a problem @gizas, but I don't see a workaround for that right now.

@@ -220,7 +220,7 @@
"suffix": "s"
}
},
"formula": "last_value(kubernetes.proxy.sync.networkprogramming.duration.us.sum)/(pick_max(last_value(kubernetes.proxy.sync.networkprogramming.duration.us.count),1))/1000000",
"formula": "max(kubernetes.proxy.sync.networkprogramming.duration.us.sum)/(pick_max(max(kubernetes.proxy.sync.networkprogramming.duration.us.count),1))/1000000",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if last_value is already supported, according to elastic/kibana#156168, why it was changed here last_value -> max?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not critical because of this explanation #5992 (comment), trying to understand if there is any other reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I pushed the changes, last value was still not available @tetianakravchenko, but that should be the correct aggregation. I will check that now.

Signed-off-by: Tetiana Kravchenko <tetiana.kravchenko@elastic.co>
@tetianakravchenko tetianakravchenko merged commit 993e972 into elastic:main May 12, 2023
@elasticmachine
Copy link

Package kubernetes - 1.37.0 containing this change is available at https://epr.elastic.co/search?package=kubernetes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Integration:kubernetes Kubernetes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants