Bug 1806518: Monitoring Dashboard metrics corrections #4323

Merged: 1 commit merged into openshift:master on Feb 28, 2020


abhi-kn
Contributor

@abhi-kn abhi-kn commented Feb 14, 2020

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1806518
https://issues.redhat.com/browse/ODC-2943

Analysis / Root cause:
The metrics queries for the Dashboard and Metrics tabs were updated in the design doc, so the code needs to be synced with respect to the queries, their order, and their labels.
The step count in the request was too low (6), which produced a huge data set in the response and caused performance issues.
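To see why a small step inflates the response: a Prometheus range query (`/api/v1/query_range`) evaluates the expression at `start`, `start + step`, `start + 2*step`, and so on up to `end`, so each series carries at most floor((end - start) / step) + 1 points. The sketch below assumes the step is expressed in seconds and uses a hypothetical 30-minute window; the 25-series figure comes from the `topk(25, ...)` queries in this PR. The helper names are illustrative only.

```typescript
// Sketch: estimate the upper bound on points in a Prometheus range-query response.
// Assumption: start, end and step are all in seconds.

/** Maximum points per series: one sample at start, then one every `stepSec` up to end. */
function pointsPerSeries(startSec: number, endSec: number, stepSec: number): number {
  if (stepSec <= 0) {
    throw new Error('step must be positive');
  }
  return Math.floor((endSec - startSec) / stepSec) + 1;
}

/** Maximum total points for `seriesCount` series (e.g. the 25 kept by topk(25, ...)). */
function totalPoints(startSec: number, endSec: number, stepSec: number, seriesCount: number): number {
  return pointsPerSeries(startSec, endSec, stepSec) * seriesCount;
}

const windowSec = 30 * 60; // hypothetical 30-minute dashboard window
console.log(totalPoints(0, windowSec, 6, 25)); // 7525
console.log(totalPoints(0, windowSec, 60, 25)); // 775
```

The response size scales inversely with the step, so a tenfold larger step cuts the payload by roughly a factor of ten for the same window.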

Solution Description:
Updated labels for OOB queries in Monitoring/Metrics.
Synchronised OOB queries in Dashboard & Workload metrics.
Increased the step count to 60 by setting the defaultSamples count, which fixed the performance issue.
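One common way to apply a fixed samples count is to derive the step from the visible timespan, so the number of points per series stays roughly constant as the window grows. The sketch below illustrates that idea under this assumption only; `DEFAULT_SAMPLES`, `getStepSeconds`, and `buildRangeQueryParams` are hypothetical names, not the console's actual code, and the example query is arbitrary.

```typescript
// Hypothetical: fixed number of samples per series, mirroring defaultSamples = 60 in this PR.
const DEFAULT_SAMPLES = 60;

/** Derive a step (seconds) so that a window of `timespanMs` yields about `samples` points. */
function getStepSeconds(timespanMs: number, samples = DEFAULT_SAMPLES): number {
  // Floor at 1s (a choice for this sketch); Prometheus rejects zero or negative steps.
  return Math.max(1, Math.floor(timespanMs / samples / 1000));
}

interface RangeQueryParams {
  query: string;
  start: number; // Unix seconds
  end: number; // Unix seconds
  step: number; // seconds
}

/** Parameters for a Prometheus /api/v1/query_range request covering `timespanMs` before `endMs`. */
function buildRangeQueryParams(
  query: string,
  endMs: number,
  timespanMs: number,
  samples = DEFAULT_SAMPLES,
): RangeQueryParams {
  return {
    query,
    start: (endMs - timespanMs) / 1000,
    end: endMs / 1000,
    step: getStepSeconds(timespanMs, samples),
  };
}

const params = buildRangeQueryParams('sum(rate(container_cpu_usage_seconds_total[5m]))', Date.now(), 30 * 60 * 1000);
console.log(params.step); // 30
```

With this approach a 30-minute window yields a 30s step and a 1-hour window a 60s step, each returning about 60 points per series.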

Screenshots / GIFs for design review:
@openshift/team-devconsole-ux
[Screenshot: oob_metrics_update]

Test setup: N/A

Browser conformance:

  • Chrome
  • Firefox
  • Safari
  • Edge

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Feb 14, 2020
@openshift-ci-robot
Contributor

@abhi-kn: This pull request references Bugzilla bug 1802300, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1802300: [WIP] Monitoring Dashboard metrics corrections

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 14, 2020
@openshift-ci-robot openshift-ci-robot added the component/dev-console Related to dev-console label Feb 14, 2020
@andrewballantyne
Contributor

/kind bug

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 14, 2020
@kyoto
Member

kyoto commented Feb 15, 2020

\cc @openshift/openshift-team-monitoring

@@ -71,12 +71,21 @@ export const monitoringDashboardQueries: MonitoringQuery[] = [
humanize: humanizeBinaryBytes,
byteDataType: ByteDataTypes.BinaryBytes,
},
{
query: _.template(
`topk(25, sort_desc(sum((kubelet_volume_stats_used_bytes {namespace='<%= namespace %>'}* on (namespace,persistentvolumeclaim) group_right() kube_pod_spec_volumes_persistentvolumeclaims_info)) by (namespace,pod)))`,
Contributor

As far as I know, these metrics refer to persistent volumes, but the title states Local Storage Usage. Is this intended?

Contributor

ping @abhi-kn
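The query under discussion attributes claim usage to pods: `kubelet_volume_stats_used_bytes` is reported per `(namespace, persistentvolumeclaim)`, and `* on (namespace,persistentvolumeclaim) group_right() kube_pod_spec_volumes_persistentvolumeclaims_info` copies each value onto every pod series that mounts that claim (the kube-state-metrics info metric has value 1 and carries the `pod` label). `sum(...) by (namespace,pod)` then adds up the claims per pod. The sketch below simulates those semantics over hypothetical in-memory samples, which supports the reviewer's reading that the result measures persistent-volume usage. The `Sample` type and helpers are illustrative; the sketch ignores PromQL details such as dropping the metric name and raising an error when several left-hand series share a match key.

```typescript
// Hypothetical in-memory representation of an instant-vector sample.
interface Sample {
  labels: Record<string, string>;
  value: number;
}

/** Key built from the given label names, used to match or group series. */
function matchKey(s: Sample, names: string[]): string {
  return names.map((l) => `${l}=${s.labels[l] || ''}`).join(',');
}

/**
 * Simulates `one * on(<on>) group_right() many`: each right-hand ("many") series is
 * multiplied by the matching left-hand series and keeps its own labels.
 */
function multiplyGroupRight(one: Sample[], many: Sample[], on: string[]): Sample[] {
  const byKey = new Map<string, Sample>();
  for (const s of one) byKey.set(matchKey(s, on), s);
  const result: Sample[] = [];
  for (const m of many) {
    const match = byKey.get(matchKey(m, on));
    if (match) result.push({ labels: { ...m.labels }, value: match.value * m.value });
  }
  return result;
}

/** Simulates `sum(...) by (<by>)`: only the grouping labels survive. */
function sumBy(samples: Sample[], by: string[]): Sample[] {
  const groups = new Map<string, Sample>();
  for (const s of samples) {
    const key = matchKey(s, by);
    const existing = groups.get(key);
    if (existing) {
      existing.value += s.value;
    } else {
      const labels: Record<string, string> = {};
      for (const l of by) labels[l] = s.labels[l] || '';
      groups.set(key, { labels, value: s.value });
    }
  }
  return Array.from(groups.values());
}

// Hypothetical data: two claims in namespace "demo"; pod web-1 mounts both.
const usedBytes: Sample[] = [
  { labels: { namespace: 'demo', persistentvolumeclaim: 'data' }, value: 100 },
  { labels: { namespace: 'demo', persistentvolumeclaim: 'logs' }, value: 50 },
];
const claimInfo: Sample[] = [
  { labels: { namespace: 'demo', pod: 'web-1', persistentvolumeclaim: 'data' }, value: 1 },
  { labels: { namespace: 'demo', pod: 'web-1', persistentvolumeclaim: 'logs' }, value: 1 },
  { labels: { namespace: 'demo', pod: 'web-2', persistentvolumeclaim: 'data' }, value: 1 },
];

const perPod = sumBy(
  multiplyGroupRight(usedBytes, claimInfo, ['namespace', 'persistentvolumeclaim']),
  ['namespace', 'pod'],
);
console.log(perPod.map((s) => `${s.labels.pod}: ${s.value}`).join(', ')); // web-1: 150, web-2: 100
```

Note that a claim mounted by two pods is counted once per pod, so per-pod totals can add up to more than the physical usage.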

@@ -85,7 +94,7 @@ export const monitoringDashboardQueries: MonitoringQuery[] = [
`topk(25, sort_desc(sum(rate(container_network_receive_bytes_total{ container="POD", pod!= "", namespace = '<%= namespace %>'}[5m]) + rate(container_network_transmit_bytes_total{ container="POD", pod!= "", namespace = '<%= namespace %>'}[5m])) BY (namespace, pod)))`,
),
chartType: GraphTypes.line,
title: 'Network Received ',

why change this?

Contributor Author

The query includes both received and transmitted network traffic, so I changed the title to match the query.

Are you sure about that? If so, that seems like a bug, because there is a separate metric, container_network_transmit_bytes_total.

Contributor Author

Separated received & transmitted bandwidth.
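A minimal sketch of what splitting the combined `rate(receive) + rate(transmit)` expression into one query per direction could look like, assuming the same label filters and 5m rate window as the original query. The `networkQueries` object and `renderQuery` helper are hypothetical, a plain string substitution stands in for lodash's `_.template` so the example has no dependencies, and this is not necessarily the query text that was merged.

```typescript
// Hypothetical: one query per direction instead of summing receive and transmit rates.
const networkQueries = {
  received: `topk(25, sort_desc(sum(rate(container_network_receive_bytes_total{ container="POD", pod!= "", namespace = '<%= namespace %>'}[5m])) BY (namespace, pod)))`,
  transmitted: `topk(25, sort_desc(sum(rate(container_network_transmit_bytes_total{ container="POD", pod!= "", namespace = '<%= namespace %>'}[5m])) BY (namespace, pod)))`,
};

/** Substitute the namespace placeholder (stand-in for lodash `_.template`). */
function renderQuery(template: string, namespace: string): string {
  return template.split('<%= namespace %>').join(namespace);
}

console.log(renderQuery(networkQueries.received, 'my-project'));
console.log(renderQuery(networkQueries.transmitted, 'my-project'));
```

Keeping the two directions in separate queries lets each chart carry an accurate title, which addresses the concern raised above about the combined expression.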

{
query: _.template(
`topk(25, sort_desc(sum(pod:container_fs_usage_bytes:sum{container="",pod!="",namespace='<%= namespace %>'}) BY (pod, namespace)))`,
),
chartType: GraphTypes.line,
- title: 'Filesystem Usage',
+ title: 'Attached Storage Usage',

I believe the metric used in the query refers to the amount of space the containers' rootfs uses, which I don't think is particularly interesting to display here.

Contributor Author

@serenamarie125, do you still want to keep it, or shall I remove it?

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 18, 2020
@spadgett
Member

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. and removed bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Feb 20, 2020
@openshift-ci-robot
Contributor

@spadgett: This pull request references Bugzilla bug 1802300, which is invalid:

  • expected the bug to target the "4.5.0" release, but it targets "4.4.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh


@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 24, 2020
@abhi-kn abhi-kn changed the title from "Bug 1802300: [WIP] Monitoring Dashboard metrics corrections" to "Bug 1806518: Monitoring Dashboard metrics corrections" Feb 24, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Feb 24, 2020
@openshift-ci-robot
Contributor

@abhi-kn: This pull request references Bugzilla bug 1806518, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1806518: Monitoring Dashboard metrics corrections


@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Feb 24, 2020
@openshift-ci-robot
Contributor

@abhi-kn: This pull request references Bugzilla bug 1806518, which is valid.

In response to this:

Bug 1806518: Monitoring Dashboard metrics corrections


@christianvogt
Contributor

@serenamarie125 please have a look

@invincibleJai
Member

@abhi-kn, this PR needs a rebase. cc @vikram-raj

@abhi-kn
Contributor Author

abhi-kn commented Feb 26, 2020

/retest

@openshift-ci-robot
Contributor

@abhi-kn: This pull request references Bugzilla bug 1806518, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1806518: Monitoring Dashboard metrics corrections


@invincibleJai
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2020
@abhi-kn
Contributor Author

abhi-kn commented Feb 27, 2020

/cherry-pick release-4.4

@openshift-cherrypick-robot

@abhi-kn: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.4


@christianvogt
Contributor

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhi-kn, christianvogt, invincibleJai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

12 similar comments

@openshift-ci-robot
Contributor

@abhi-kn: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
|---|---|---|---|
| ci/prow/verify | fe3f6c3 | link | /test verify |
| ci/prow/e2e-gcp-console | 019eb0b | link | /test e2e-gcp-console |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit f978e6e into openshift:master Feb 28, 2020
@openshift-ci-robot
Contributor

@abhi-kn: All pull requests linked via external trackers have merged. Bugzilla bug 1806518 has been moved to the MODIFIED state.

In response to this:

Bug 1806518: Monitoring Dashboard metrics corrections


@andrewballantyne
Contributor

/cherry-pick release-4.4

@openshift-cherrypick-robot

@andrewballantyne: new pull request created: #4566

In response to this:

/cherry-pick release-4.4


@spadgett spadgett added this to the v4.5 milestone Mar 1, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. component/dev-console Related to dev-console kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.