Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to C3 machine types for prow build cluster #6525

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

upodroid
Copy link
Member

@upodroid upodroid commented Mar 5, 2024

/cc @ameukam @BenTheElder @xmudrii

We should get better performance out of these instances. I also used a custom type to insert 2 extra cores for all the extra daemonset we need to run.

C3 instance types are a bit tricky. They don't support custom sizing so our options are either c3-standard-8-lssd(same problem as today) or c3-standard-22-lssd(no guarantees on a job being dedicated to a specific node).
https://cloud.google.com/compute/docs/general-purpose-machines#disk_and_capacity_limits_2

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 5, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: upodroid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/infra Infrastructure management, infrastructure design, code in infra/ area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters approved Indicates a PR has been approved by an approver from all required OWNERS files. area/terraform Terraform modules, testing them, writing more of them, code in infra/gcp/clusters/ sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Mar 5, 2024
@upodroid upodroid changed the title Switch to N2 machine types Switch to N2 machine types for prow build cluster Mar 5, 2024
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 5, 2024
@BenTheElder
Copy link
Member

BenTheElder commented Mar 5, 2024

I also used a custom type to insert 2 extra cores for all the extra daemonset we need to run.

We really shouldn't need 2 cores of daemonsets ...?
/hold

We should be able to tighten this up, discarding 20% of the node as overhead is excessive.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 5, 2024
@BenTheElder
Copy link
Member

Maybe we can import from the existing metrics collection?

@upodroid
Copy link
Member Author

upodroid commented Mar 5, 2024

According to the calculations here:
We could adopt c3-standard-8-lssd right now and disable calico. I'm not sure why we need a network policy enabled on the build cluster. The capacity being used by calico would be available for node-exporter

@BenTheElder
Copy link
Member

[...] and disable calico. I'm not sure why we need a network policy enabled on the build cluster. The capacity being used by calico would be available for node-exporter

That sounds good to me (have not reviewed machine types yet). I don't recall an discussion about enabling network policy and I don't think we need it, 400m CPU is quite a bit back to fit other daemonsets cc @ameukam

@BenTheElder
Copy link
Member

Just checked: k8s-prow-builds in google.com (cluster: default) has calico network policy disabled.

@upodroid
Copy link
Member Author

upodroid commented Mar 5, 2024

One more question, are we ok with nodes having 32gb memory instead of 64gb?

@BenTheElder
Copy link
Member

BenTheElder commented Mar 5, 2024

EDIT: jinx
c3-standard-8-lssd has lower memory / core (32gb for 8 cores) versus n1-highmem-8 (52 gb for 8 cores).

I'm not sure if that's a problem for any of our workloads or not. I think we might have a few that are a bit lopsided in memory>cpu requests currently

@upodroid upodroid added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Mar 5, 2024
@upodroid
Copy link
Member Author

upodroid commented Mar 5, 2024

Apply plan:

  1. apply the 1st commit using terraform
  2. Cordon and Drain the old nodepool in the console
  3. Apply the 2nd commit

After lgtm is applied on the PR.

@upodroid upodroid changed the title Switch to N2 machine types for prow build cluster Switch to C3 machine types for prow build cluster Mar 5, 2024
@BenTheElder
Copy link
Member

I think we should probably query the jobs to see if any have 30+ GB or a larger memory : core ratio than c3, if so, we can do n2?

@ameukam
Copy link
Member

ameukam commented Mar 6, 2024

[...] and disable calico. I'm not sure why we need a network policy enabled on the build cluster. The capacity being used by calico would be available for node-exporter

That sounds good to me (have not reviewed machine types yet). I don't recall an discussion about enabling network policy and I don't think we need it, 400m CPU is quite a bit back to fit other daemonsets cc @ameukam

Network Policy was enabled by default at the time the cluster was created. We never took the opportunity to disable it. +1 for the disablement.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2024
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@upodroid
Copy link
Member Author

upodroid commented Mar 7, 2024

Instead of tweaking the prowjobs, lets go with N2 instances(Ice Lake platform) and keep the same machine-type and merge that change this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/infra Infrastructure management, infrastructure design, code in infra/ area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters area/terraform Terraform modules, testing them, writing more of them, code in infra/gcp/clusters/ cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants