scale-test: Scrape APIServer only metrics #16029

Merged
merged 3 commits into from Oct 26, 2023

hakuna-matatah
Contributor

No description provided.

k8s-ci-robot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels Oct 18, 2023
k8s-ci-robot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) Oct 18, 2023
@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

2 similar comments
@hakman
Member

hakman commented Oct 19, 2023

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

kubernetes deleted a comment from k8s-ci-robot Oct 22, 2023
@hakman
Member

hakman commented Oct 22, 2023

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

1 similar comment
@hakman
Member

hakman commented Oct 23, 2023

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@hakman
Member

hakman commented Oct 23, 2023

/test pull-kops-kubernetes-e2e-ubuntu-gce-build

@hakuna-matatah
Contributor Author

The small-scale load test succeeds consistently, but the large-scale one fails with the following error:

```
Failure: no endpoints available for service "prometheus-k8s" (ServiceUnavailable)
F1020 03:14:27.263151   13549 clusterloader.go:326] Error while setting up prometheus stack: timed out waiting for the condition
```

We can't debug the issue directly because of access issues, so we have to take an educated guess for now. My guess is that the Prometheus pod cannot come up because of its memory requirements: we provide a [large EC2 instance type](https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/pkg/prometheus/manifests/prometheus-prometheus.yaml#L33-L35) with 4G of memory, so the 10G that the Prometheus pod needs according to the calculation here - https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/pkg/prometheus/manifests/prometheus-prometheus.yaml#L40-L41 - is not available.
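The guess above boils down to a capacity check: a pod is only scheduled if its memory request fits in what a node can offer, and until a Prometheus pod is running and ready, the `prometheus-k8s` service has no endpoints, which matches the error. A minimal sketch using the 4G and 10G figures from the comment (treated as GiB here); `fitsOnNode` is a hypothetical helper, and it deliberately ignores reserved memory and other pods on the node, so a real node would have even less room.

```go
package main

import "fmt"

// GiB in bytes; the comment's "G" figures are treated as GiB for this sketch.
const GiB int64 = 1 << 30

// fitsOnNode reports whether a pod's memory request can be satisfied by a
// node's memory. Simplified and hypothetical: it ignores kube-reserved and
// system-reserved memory as well as pods already running on the node.
func fitsOnNode(requestBytes, nodeBytes int64) bool {
	return requestBytes <= nodeBytes
}

func main() {
	promRequest := 10 * GiB // Prometheus memory requirement, per the comment
	nodeMemory := 4 * GiB   // memory of the instance type, per the comment
	if !fitsOnNode(promRequest, nodeMemory) {
		// With no schedulable node, the pod stays Pending and the
		// prometheus-k8s service never gets an endpoint.
		fmt.Printf("prometheus request %d GiB exceeds node memory %d GiB: pod stays Pending\n",
			promRequest/GiB, nodeMemory/GiB)
	}
}
```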

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

The last test failed because not enough EC2 nodes were available:

```
InstanceGroup	nodes-us-east-2b	InstanceGroup "nodes-us-east-2b" did not have enough nodes 800 vs 1667
InstanceGroup	nodes-us-east-2c	InstanceGroup "nodes-us-east-2c" did not have enough nodes 891 vs 1666
```
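The shortfall messages above can be tallied to see how far short the fleet fell overall. A minimal sketch that parses lines in the exact wording shown in the log; the regular expression and the `shortfall` helper are hypothetical and would need updating if the validation message format differs.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// igPattern matches the validation message format seen in the log, e.g.
// InstanceGroup "nodes-us-east-2b" did not have enough nodes 800 vs 1667
var igPattern = regexp.MustCompile(`InstanceGroup "([^"]+)" did not have enough nodes (\d+) vs (\d+)`)

// shortfall sums ready and desired node counts across all matching lines.
// Lines that are not instance-group shortfall messages are skipped.
func shortfall(lines []string) (ready, desired int) {
	for _, line := range lines {
		m := igPattern.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		// The regex guarantees digits, so Atoi can only fail on overflow.
		r, _ := strconv.Atoi(m[2])
		d, _ := strconv.Atoi(m[3])
		ready += r
		desired += d
	}
	return ready, desired
}

func main() {
	lines := []string{
		`InstanceGroup "nodes-us-east-2b" did not have enough nodes 800 vs 1667`,
		`InstanceGroup "nodes-us-east-2c" did not have enough nodes 891 vs 1666`,
	}
	ready, desired := shortfall(lines)
	fmt.Printf("%d of %d nodes ready (%d missing)\n", ready, desired, desired-ready)
	// prints: 1691 of 3333 nodes ready (1642 missing)
}
```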

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

Same error as above; the EC2 instance types don't seem to be available, so let me add more types to the mixed fleet.

```
InstanceGroup	nodes-us-east-2b	InstanceGroup "nodes-us-east-2b" did not have enough nodes 967 vs 1667
InstanceGroup	nodes-us-east-2c	InstanceGroup "nodes-us-east-2c" did not have enough nodes 950 vs 1666
```

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

1 similar comment
@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@hakman
Member

hakman commented Oct 25, 2023

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@hakman
Member

hakman commented Oct 25, 2023

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakman
Member

hakman commented Oct 25, 2023

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@hakman
Member

hakman commented Oct 25, 2023

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

1 similar comment
@hakman
Member

hakman commented Oct 25, 2023

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

```
Your cluster e2e-ff02749ef8-a423a.test-cncf-aws.k8s.io is ready
I1025 17:08:27.065287   13030 validate_cluster.go:220] (will retry): cluster passed validation 8 consecutive times
Error: validation failed: wait time exceeded during validation
Error: exit status 1
```

It looks like the wait time was exceeded even though the cluster had passed validation 8 consecutive times. We will need to either increase the validation wait time or reduce the required number of consecutive successful validation checks from 10 to 5.
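The failure mode above comes from combining a streak requirement with a deadline: validation only succeeds after N consecutive passes, so if the cluster becomes healthy late in the wait window, the window can close mid-streak. (`kops validate cluster` exposes these knobs as `--count` and `--wait`.) A minimal sketch of that logic, modeling the window as a fixed number of attempts instead of wall-clock time; `validateUntil` is hypothetical and not the kops implementation.

```go
package main

import "fmt"

// validateUntil runs check up to maxAttempts times and succeeds only once
// `required` consecutive checks have passed; a failed check resets the
// streak. It returns whether validation succeeded and the final streak.
func validateUntil(check func(attempt int) bool, required, maxAttempts int) (bool, int) {
	streak := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if check(attempt) {
			streak++
			if streak >= required {
				return true, streak
			}
		} else {
			streak = 0
		}
	}
	return false, streak
}

func main() {
	// Hypothetical timeline: the cluster is healthy from attempt 12 onward,
	// and the wait window allows 20 attempts in total.
	healthyFrom12 := func(attempt int) bool { return attempt >= 12 }

	ok, streak := validateUntil(healthyFrom12, 10, 20)
	fmt.Println(ok, streak) // false 8: window closed mid-streak

	ok, streak = validateUntil(healthyFrom12, 5, 20)
	fmt.Println(ok, streak) // true 5: a lower count fits in the same window
}
```

Both proposed fixes show up here: a longer window (more attempts) or a smaller required count lets the same streak complete.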

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

1 similar comment
@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

It looks like we hit AWS API throttling while creating the cluster in the latest run:

```
W1025 18:59:46.369792   12985 executor.go:140] error running task "AutoscalingGroup/nodes-us-east-2c.e2e-ff02749ef8-a423a.test-cncf-aws.k8s.io" (-12m11s remaining to succeed): error listing AutoscalingGroups: Throttling: Rate exceeded
	status code: 400, request id: 50cc0b52-fcb6-4845-9af3-a6865416f6d0
Error: error running tasks: deadline exceeded executing task AutoscalingGroup/nodes-us-east-2c.e2e-ff02749ef8-a423a.test-cncf-aws.k8s.io. Example error: error listing AutoscalingGroups: Throttling: Rate exceeded
	status code: 400, request id: 50cc0b52-fcb6-4845-9af3-a6865416f6d0
Error: exit status 1
```

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-scale-amazonvpc-using-cl2

@hakuna-matatah
Contributor Author

/test presubmit-kops-aws-small-scale-amazonvpc-using-cl2

hakuna-matatah changed the title from "[WIP] Scrape APIServer only metrics" to "Scrape APIServer only metrics" Oct 26, 2023
k8s-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Oct 26, 2023
hakman changed the title from "Scrape APIServer only metrics" to "scale-test: Scrape APIServer only metrics" Oct 26, 2023
k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Oct 26, 2023
hakman added the tide/merge-method-squash label (denotes a PR that should be squashed by tide when it merges) Oct 26, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Oct 26, 2023
k8s-ci-robot merged commit 104393c into kubernetes:master Oct 26, 2023
24 checks passed
k8s-ci-robot added this to the v1.29 milestone Oct 26, 2023
3 participants