This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Add metrics server support for provision controller #796

Merged (1 commit) on Jun 19, 2018

Contributor

@cofyc cofyc commented Jun 13, 2018

What this PR does / why we need it:

This PR adds metrics server support for the provision controller.

Like the kubelet's storage_operation_xxx metrics, it would be good to have metrics for monitoring the latency of provision/delete operations and failures in provisioners.

Current metrics:

  • controller_persistentvolumeclaim_provision_total
  • controller_persistentvolumeclaim_provision_failed_total
  • controller_persistentvolumeclaim_provision_duration_seconds
  • controller_persistentvolume_delete_total
  • controller_persistentvolume_delete_failed_total
  • controller_persistentvolume_delete_duration_seconds

The metrics server is disabled by default. Each provisioner can enable it as needed.
rbd/cephfs example: https://github.com/kubernetes-incubator/external-storage/pull/797/files

Examples:

...
controller_persistentvolume_delete_duration_seconds_bucket{class="rbd",le="10"} 4
controller_persistentvolume_delete_duration_seconds_bucket{class="rbd",le="+Inf"} 4
controller_persistentvolume_delete_duration_seconds_sum{class="rbd"} 1.0242772740000001
controller_persistentvolume_delete_duration_seconds_count{class="rbd"} 4
controller_persistentvolume_delete_total{class="rbd"} 4
...
controller_persistentvolumeclaim_discovery_duration_seconds_bucket{class="rbd",le="10"} 4
controller_persistentvolumeclaim_discovery_duration_seconds_bucket{class="rbd",le="+Inf"} 4
controller_persistentvolumeclaim_discovery_duration_seconds_sum{class="rbd"} 1.467650504
controller_persistentvolumeclaim_discovery_duration_seconds_count{class="rbd"} 4
controller_persistentvolumeclaim_provision_failed_total{class="rbd"} 9
controller_persistentvolumeclaim_provision_total{class="rbd"} 4
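
As a reading aid for the output above, the `_sum` and `_count` series of a histogram give the mean latency when divided. The sketch below parses two of the sample lines and computes that mean with the Go standard library only. `parseSample` and `meanSeconds` are hypothetical helpers, and the parser assumes lines without a trailing timestamp, as in the samples shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one exposition line such as
//   name{class="rbd"} 4
// into the series (name plus labels) and its numeric value.
// Assumption: the line carries no trailing timestamp.
func parseSample(line string) (series string, value float64, err error) {
	idx := strings.LastIndex(line, " ")
	if idx < 0 {
		return "", 0, fmt.Errorf("malformed sample: %q", line)
	}
	value, err = strconv.ParseFloat(line[idx+1:], 64)
	if err != nil {
		return "", 0, err
	}
	return line[:idx], value, nil
}

// meanSeconds returns sum/count, or 0 when nothing was observed.
func meanSeconds(sum, count float64) float64 {
	if count == 0 {
		return 0
	}
	return sum / count
}

func main() {
	lines := []string{
		`controller_persistentvolume_delete_duration_seconds_sum{class="rbd"} 1.0242772740000001`,
		`controller_persistentvolume_delete_duration_seconds_count{class="rbd"} 4`,
	}
	samples := map[string]float64{}
	for _, l := range lines {
		series, v, err := parseSample(l)
		if err != nil {
			panic(err)
		}
		samples[series] = v
	}
	mean := meanSeconds(
		samples[`controller_persistentvolume_delete_duration_seconds_sum{class="rbd"}`],
		samples[`controller_persistentvolume_delete_duration_seconds_count{class="rbd"}`],
	)
	fmt.Printf("mean delete latency: %.3fs\n", mean) // prints "mean delete latency: 0.256s"
}
```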

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 13, 2018
@cofyc
Contributor Author

cofyc commented Jun 13, 2018

/area lib
/assign @rootfs

@cofyc
Contributor Author

cofyc commented Jun 13, 2018

cc @childsb What do you think?

@cofyc
Contributor Author

cofyc commented Jun 14, 2018

cc @wongma7

@rootfs
Contributor

rootfs commented Jun 18, 2018

/assign @jsafrane

Contributor

@wongma7 wongma7 left a comment

lgtm, there is just one question I have about the port

@@ -156,6 +170,12 @@ const (
DefaultRetryPeriod = 2 * time.Second
// DefaultTermLimit is used when option function TermLimit is omitted
DefaultTermLimit = 30 * time.Second
// DefaultMetricsPort is used when option function MetricsPort is omitted
DefaultMetricsPort = 0
Contributor

Should we default to a fixed port rather than a random one? Like some convention for all provisioners. Will the port need to be exposed via the pod yaml? Because then a convention would make it easy to distribute yamls with the port already filled in

Contributor Author

Though port 0 means picking a random port in network programming, I used it to disable the metrics server here, like the kubelet --cadvisor-port flag. Sorry, the comment here is a bit confusing.

Will the port need to be exposed via the pod yaml? Because then a convention would make it easy to distribute yamls with the port already filled in

It depends on the provisioner implementation. A provisioner can set a non-zero metrics port and pass it to NewProvisionController() to enable the metrics server by default.

My intent is to make this feature optional for backward compatibility, so each provisioner can customize it as needed, e.g. disable or enable it by default, or choose a default port. We could have a convention for the default port (e.g. 8000), but then we may need to add another option to toggle the feature.

What do you prefer? Do you want to enable the metrics server by default?

Contributor

Hmm, okay, I agree with you actually. Let's disable it by default.

@wongma7
Contributor

wongma7 commented Jun 19, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 19, 2018
@wongma7 wongma7 merged commit 732d6e7 into kubernetes-retired:master Jun 19, 2018
@cofyc cofyc deleted the metrics-server branch June 20, 2018 03:09
@sandaymin123

sandaymin123 commented Jul 2, 2018

@cofyc and @wongma7
I want to clarify whether these metrics are also supported for out-of-tree flex drivers with this PR:

kubelet_volume_stats_available_bytes
kubelet_volume_stats_capacity_bytes
kubelet_volume_stats_inodes
kubelet_volume_stats_inodes_free
kubelet_volume_stats_inodes_used
kubelet_volume_stats_used_bytes

@wongma7
Copy link
Contributor

wongma7 commented Jul 3, 2018

@sandaymin123 No, those stats are calculated by Kubernetes for in-tree volumes that support the GetMetrics call. We will have to figure out a different solution for out-of-tree.
