Bug 1780405: Gather ~10 metrics that tell us workloads that are being used #579

Merged

Conversation

smarterclayton
Contributor

These are the first metrics that give a rough picture of what is running on
a cluster, so that we can assess usage. They capture scale (the number of
each workload type) and give a small hint about usage type (are people
using DCs or Deployments, StatefulSets or Jobs). The container count can be
contrasted with the pod count to estimate the approximate number of
containers per pod, which is also useful.

For a cardinality of about 10, we get much better insight into whether a
cluster is in use. We can approximate the infrastructure resource count
against the user's workload resource count by looking at empty clusters,
but sheer scale helps us determine better values.
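
As a rough illustration of the containers-per-pod estimate mentioned above, here is a minimal Python sketch. The function name and the sample totals are hypothetical and are not taken from the rules in this PR; it only shows the arithmetic of contrasting a container total with a pod total.

```python
from typing import Optional


def approx_containers_per_pod(container_total: float, pod_total: float) -> Optional[float]:
    """Estimate the average number of containers per pod.

    Both inputs are assumed to be cluster-wide totals reported at the same
    point in time. Returns None when there are no pods, because the ratio
    is undefined in that case.
    """
    if pod_total <= 0:
        return None
    return container_total / pod_total


if __name__ == "__main__":
    # Hypothetical totals: 30 containers spread over 12 pods.
    print(approx_containers_per_pod(30, 12))
```

The result is only an average, so it cannot tell a cluster of uniform two-container pods apart from one that mixes single-container pods with a few large ones.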

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 5, 2019
@smarterclayton
Contributor Author

@derekwaynecarr @bparees this is a first attempt at capturing a small number of high-value usage metrics. We obviously can't get everything, but this answers some fundamental questions. Suggestions for other critical metrics (from the perspective of the whole product) would be useful.

Contributor

@squat squat left a comment


/lgtm
/hold

Until openshift/telemeter#270 is merged

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2019
@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, squat


@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 5, 2019
@lilic
Contributor

lilic commented Dec 5, 2019

Actually, we don't need to hold: the PR you referenced targets 4.3, while this one targets master. The whitelisting PR for master has already been merged in telemeter.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2019
@lilic
Contributor

lilic commented Dec 5, 2019

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


- expr: count(count (kube_pod_restart_policy{type!="Always",namespace!~"openshift-.+"})
    by (namespace,pod))
  record: cluster:usage:pods:terminal:workload:sum
- expr: sum(max (kubelet_containers_per_pod_count_sum) by (instance))
Contributor


Nit:

Suggested change
- expr: sum(max (kubelet_containers_per_pod_count_sum) by (instance))
- expr: sum(max(kubelet_containers_per_pod_count_sum) by (instance))
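
For readers less familiar with PromQL, the two expressions in this hunk can be paraphrased in Python. This is only a sketch of the aggregation semantics, not how Prometheus evaluates them: the sample shape (a list of `(labels, value)` pairs) and both function names are made up for illustration.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]  # hypothetical shape: (labels, value)

# PromQL regex matchers are fully anchored, hence fullmatch() below.
PLATFORM_NAMESPACE = re.compile(r"openshift-.+")


def count_terminal_workload_pods(samples: List[Sample]) -> int:
    """Paraphrase of:
    count(count(kube_pod_restart_policy{type!="Always",
                namespace!~"openshift-.+"}) by (namespace,pod))
    """
    groups = set()
    for labels, _value in samples:
        if labels.get("type", "") == "Always":
            continue  # keep only pods whose restart policy is not Always
        if PLATFORM_NAMESPACE.fullmatch(labels.get("namespace", "")):
            continue  # drop platform namespaces
        # The inner count(...) by (namespace,pod) collapses duplicate series
        # for the same pod into a single group.
        groups.add((labels.get("namespace", ""), labels.get("pod", "")))
    # The outer count(...) counts the groups. Note that PromQL returns an
    # empty result rather than 0 when nothing matches.
    return len(groups)


def sum_of_max_by_instance(samples: List[Sample]) -> float:
    """Paraphrase of: sum(max(<metric>) by (instance))."""
    maxima: Dict[str, float] = {}
    for labels, value in samples:
        instance = labels.get("instance", "")
        maxima[instance] = max(value, maxima.get(instance, value))
    return sum(maxima.values())
```

The nested `count(count(...) by (...))` form is what makes the first rule count distinct pods rather than raw series, and the `max ... by (instance)` step keeps duplicate series for one instance from being added twice.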


@bparees
Contributor

bparees commented Dec 5, 2019

@adambkaplan FYI, with this change telemeter will include metrics on the number of build objects on clusters.

@bparees
Contributor

bparees commented Dec 5, 2019

/hold
per @lilic's comment about failing tests

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2019
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2019
@openshift-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 5, 2019
@smarterclayton
Contributor Author

Fixed, applying label

@smarterclayton smarterclayton added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Dec 5, 2019
@smarterclayton
Contributor Author

/cherry-pick release-4.3

@openshift-cherrypick-robot

@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 80312c8 into openshift:master Dec 5, 2019
@openshift-cherrypick-robot

@smarterclayton: #579 failed to apply on top of branch "release-4.3":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	assets/prometheus-k8s/rules.yaml
M	jsonnet/rules.jsonnet
M	pkg/manifests/bindata.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/manifests/bindata.go
CONFLICT (content): Merge conflict in pkg/manifests/bindata.go
Auto-merging jsonnet/rules.jsonnet
Auto-merging assets/prometheus-k8s/rules.yaml
Patch failed at 0001 jsonnet: Gather ~10 metrics that tell us workloads that are being used

In response to this:

/cherry-pick release-4.3


@smarterclayton smarterclayton changed the title jsonnet: Gather ~10 metrics that tell us workloads that are being used Bug 1780405: Gather ~10 metrics that tell us workloads that are being used Dec 5, 2019
@openshift-ci-robot
Contributor

@smarterclayton: All pull requests linked via external trackers have merged. Bugzilla bug 1780405 has been moved to the MODIFIED state.

In response to this:

Bug 1780405: Gather ~10 metrics that tell us workloads that are being used


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

8 participants