Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure cloud operation count metrics for azure cloud provider #82574

Open
wants to merge 1 commit into
base: master
from

Conversation

@kkmsft
Copy link
Contributor

commented Sep 11, 2019

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind api-change
/kind bug
/kind cleanup
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake

What this PR does / why we need it:
Currently there are no metrics which records the number of cloud operations and also the ones which got rate limited. This PR adds two metrics - one for successful operation and other for rate limited operation.

This PR was arrived in the context of issue kubernetes/cloud-provider-azure#228

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:
Testing performed by back porting to the cloud-provider-azure code.
TODO: Test after the new dependancy code is vendored in cloud-provider-azure.

Does this PR introduce a user-facing change?:

Added cloud operation count metrics to azure cloud controller manager.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

Hi @kkmsft. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kkmsft

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

/area platform/azure

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

@kkmsft: The label(s) sig/azure cannot be appled. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/sig azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

@kkmsft: The label(s) sig/platform/azure cannot be appled. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/sig platform/azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested review from andyzhangx and justaugustus Sep 11, 2019

@kkmsft

This comment has been minimized.

Copy link
Contributor Author

commented Sep 11, 2019

/assign @feiskyer

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2019

@kkmsft: The label(s) area/platform/azure cannot be appled. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/area platform/azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andyzhangx

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

/ok-to-test

@andyzhangx

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

/priority important-soon
/sig cloud-provider
/area provider/azure

@@ -201,7 +204,7 @@ func (az *azVirtualMachinesClient) CreateOrUpdate(ctx context.Context, resourceG
klog.V(10).Infof("azVirtualMachinesClient.CreateOrUpdate(%q, %q): end", resourceGroupName, VMName)
}()

mc := newMetricContext("vm", "create_or_update", resourceGroupName, az.client.SubscriptionID, source)
mc.Start()

This comment has been minimized.

Copy link
@feiskyer

feiskyer Sep 11, 2019

Member

why did you add this? I don't think this is required

This comment has been minimized.

Copy link
@kkmsft

kkmsft Sep 11, 2019

Author Contributor

We want to measure the time taken for the specific cloud operation. This should not include time from any other items like logging, hence the need to start the timer.

@@ -191,7 +191,10 @@ func newAzVirtualMachinesClient(config *azClientConfig) *azVirtualMachinesClient

func (az *azVirtualMachinesClient) CreateOrUpdate(ctx context.Context, resourceGroupName string, VMName string, parameters compute.VirtualMachine, source string) (resp *http.Response, err error) {
// /* Write rate limiting */
mc := newMetricContext("vm", "create_or_update", resourceGroupName, az.client.SubscriptionID, source)
mc.Count()

This comment has been minimized.

Copy link
@feiskyer

feiskyer Sep 11, 2019

Member

The count here is actually not true. When mc.Count() is invoked and ratelimited by the following line, then the requests won't be sent to Azure.

I think we should move this before az.client.CreateOrUpdate(). Same all other clients.

This comment has been minimized.

Copy link
@kkmsft

kkmsft Sep 11, 2019

Author Contributor

There are two metrics - one which is the count of operations which is actually made by various components. And then the count which are ratelimited. We want to be able to know how many calls we are making and how many we are rate limiting. Just knowing how many were sent to azure is not enough information to arrive at conclusions. This gives us a better picture. Also, we can gather the information of how many went to azure from these two counters.

This comment has been minimized.

Copy link
@feiskyer

feiskyer Sep 12, 2019

Member

I mean we should expose the counts sent to Azure as well as the ratelimited. The total count could be calculated by sum both.

@@ -308,8 +320,11 @@ func newAzInterfacesClient(config *azClientConfig) *azInterfacesClient {
}

func (az *azInterfacesClient) CreateOrUpdate(ctx context.Context, resourceGroupName string, networkInterfaceName string, parameters network.Interface) (resp *http.Response, err error) {
mc := newMetricContext("interfaces", "create_or_update", resourceGroupName, az.client.SubscriptionID, "")

This comment has been minimized.

Copy link
@aramase

aramase Sep 11, 2019

Member

since the prefix and request in newMetricContext are mostly repeated, we should define as const and use it.

This comment has been minimized.

Copy link
@kkmsft

kkmsft Sep 12, 2019

Author Contributor

Thanks @aramase. will address in future PR.

This comment has been minimized.

Copy link
@khenidak

khenidak Sep 16, 2019

Contributor

Let us have this change as part of this PR please. Let us have a private var, create on first call NewMetricContext()

This comment has been minimized.

Copy link
@kkmsft

kkmsft Sep 17, 2019

Author Contributor

As discussed, the metrics context is fresh for every operation.

@justaugustus

This comment has been minimized.

Copy link
Member

commented Sep 12, 2019

/uncc
/milestone v1.17

@k8s-ci-robot k8s-ci-robot removed the request for review from justaugustus Sep 12, 2019

@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Sep 12, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 12, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kkmsft
To complete the pull request process, please assign feiskyer
You can assign the PR to them by writing /assign @feiskyer in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kkmsft kkmsft force-pushed the kkmsft:azure_metrics branch from d58730b to b15444f Sep 13, 2019

@feiskyer
Copy link
Member

left a comment

Thanks, KK. Could you squash the commits?

@kkmsft kkmsft force-pushed the kkmsft:azure_metrics branch from b15444f to 036d01b Sep 16, 2019

@kkmsft kkmsft force-pushed the kkmsft:azure_metrics branch from 036d01b to b3ecf28 Sep 16, 2019

@kkmsft

This comment has been minimized.

Copy link
Contributor Author

commented Sep 16, 2019

@feiskyer - Thank you, squashed the commits.

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

commented Sep 16, 2019

@kkmsft: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-kubemark-e2e-gce-big b3ecf28 link /test pull-kubernetes-kubemark-e2e-gce-big
pull-kubernetes-e2e-gce b3ecf28 link /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.