Add metrics for resource #2868

Merged on Nov 28, 2022 (1 commit into karmada-io:master)

@Poor12 Poor12 commented Nov 25, 2022

Signed-off-by: Poor12 shentiecheng@huawei.com

What type of PR is this?
/kind feature

What this PR does / why we need it:
Resource distribution latency is one of the key metrics for evaluating a multi-cluster system, so we need metrics that cover the individual steps of a distribution.

A complete distribution consists of four steps: find a matched policy -> apply the policy -> sync works -> sync the workload to the target cluster.

# HELP resource_find_matched_policy_duration_seconds Duration in seconds to find a matched propagation policy for the resource template.
# TYPE resource_find_matched_policy_duration_seconds histogram
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.001"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.002"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.004"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.008"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.016"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.032"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.064"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.128"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.256"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="0.512"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="1.024"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="2.048"} 1
resource_find_matched_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",le="+Inf"} 1
resource_find_matched_policy_duration_seconds_sum{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default"} 0.000183851
resource_find_matched_policy_duration_seconds_count{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default"} 1
# HELP resource_apply_policy_duration_seconds Duration in seconds to apply a propagation policy for the resource template. By the result, 'error' means a resource template failed to apply the policy. Otherwise 'success'.
# TYPE resource_apply_policy_duration_seconds histogram
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.001"} 0
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.002"} 0
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.004"} 0
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.008"} 0
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.016"} 1
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.032"} 1
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.064"} 2
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.128"} 2
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.256"} 2
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="0.512"} 2
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="1.024"} 2
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="2.048"} 2
resource_apply_policy_duration_seconds_bucket{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success",le="+Inf"} 2
resource_apply_policy_duration_seconds_sum{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success"} 0.052528434
resource_apply_policy_duration_seconds_count{apiVersion="apps/v1",kind="Deployment",name="nginx",namespace="default",result="success"} 2
# HELP policy_apply_attempts_total Number of attempts to be applied for a propagation policy. By the result, 'error' means a resource template failed to apply the policy. Otherwise 'success'.
# TYPE policy_apply_attempts_total counter
policy_apply_attempts_total{name="nginx-propagation",namespace="default",result="success"} 2
# HELP binding_sync_work_duration_seconds Duration in seconds to sync works for a binding object. By the result, 'error' means a binding failed to sync works. Otherwise 'success'.
# TYPE binding_sync_work_duration_seconds histogram
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.001"} 2
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.002"} 2
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.004"} 2
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.008"} 2
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.016"} 2
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.032"} 5
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.064"} 9
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.128"} 10
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.256"} 10
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="0.512"} 10
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="1.024"} 10
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="2.048"} 10
binding_sync_work_duration_seconds_bucket{name="nginx-deployment",namespace="default",result="success",le="+Inf"} 10
binding_sync_work_duration_seconds_sum{name="nginx-deployment",namespace="default",result="success"} 0.3456216
binding_sync_work_duration_seconds_count{name="nginx-deployment",namespace="default",result="success"} 10
# HELP work_sync_workload_duration_seconds Duration in seconds to sync the workload to a target cluster. By the result, 'error' means a work failed to sync workloads. Otherwise 'success'.
# TYPE work_sync_workload_duration_seconds histogram
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.001"} 0
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.002"} 0
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.004"} 0
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.008"} 0
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.016"} 0
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.032"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.064"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.128"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.256"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="0.512"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="1.024"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="2.048"} 1
work_sync_workload_duration_seconds_bucket{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error",le="+Inf"} 1
work_sync_workload_duration_seconds_sum{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error"} 0.02985906
work_sync_workload_duration_seconds_count{name="karmada-impersonator-7cbb6bd5c9",namespace="karmada-es-member1",result="error"} 1

Which issue(s) this PR fixes:
Part of #2472

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`Instrumentation`: Introduced the `resource_find_matched_policy_duration_seconds`, `resource_apply_policy_duration_seconds`, `policy_apply_attempts_total`, `binding_sync_work_duration_seconds`, `work_sync_workload_duration_seconds` metrics.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 25, 2022
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 25, 2022
@RainbowMango RainbowMango modified the milestones: v1.5, v1.4 Nov 25, 2022
Signed-off-by: Poor12 <shentiecheng@huawei.com>

@RainbowMango RainbowMango left a comment

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 28, 2022
@karmada-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango


@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2022
@karmada-bot karmada-bot merged commit 9f660f8 into karmada-io:master Nov 28, 2022