replace the existing PR and release blocking kube-up based jobs with CAPI jobs #82532

Open
neolit123 opened this issue Sep 10, 2019 · 12 comments

@neolit123
Member

commented Sep 10, 2019

had a discussion with @timothysc about this yesterday.

this is a tracking issue for replacing the pending PR and release blocking jobs based on kube-up with Cluster API (CAPI) provider alternatives over time.

ideally:

  • the replacement jobs should coexist in parallel with the kube-up jobs.
  • once the replacement jobs are stable the kube-up jobs should be removed.

xref issue for removing the /cluster directory:
#78995
which can be considered a parent of this one.

TODO:

  • enumerate the current list of PR and release blocking jobs
    assigned: @neolit123

  • experiment replacing pull-kubernetes-e2e-gce with a CAPA (cluster-api AWS) job
    assigned: TODO
    ATM CAPA is the most complete provider.
    there was a discussion here and during the sig-arch code organization meeting about this being the first step we can take.
    ideally we should be using CAPG (GCP provider).

  • estimate whether/how we can replace the big SIG scalability jobs using Cluster API
    assigned: @timothysc
    @timothysc mentioned that there are currently TODOs there to enable support for a large number of nodes in CAPI.

  • replace SIG-Windows kube-up jobs
    assigned: TODO
    this is one of the trickier items actually, since CAPI does not support Windows.

/sig testing cloud-provider cluster-lifecycle release scalability
/kind feature
/area test
/priority important-longterm
/assign @spiffxp @justinsb @timothysc
/assign

@k8s-ci-robot

Contributor

commented Sep 10, 2019

@neolit123: The label(s) area/testing cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other


@neolit123

Member Author

commented Sep 10, 2019

/area test

@neolit123

Member Author

commented Sep 10, 2019

experiment replacing pull-kubernetes-e2e-gce with a CAPA (cluster-api AWS) job

i don't have information on the state of CAPA e2e tests ATM; we had AWS account issues there.
are we ready to add a CAPA PR non-blocking job (potentially making it blocking)?

@detiber do you know?

@neolit123

Member Author

commented Sep 10, 2019

i'd appreciate comments on CAPG (GCP) vs CAPA (AWS).
is it a strict requirement to have GCP jobs that provide signal for Google infrastructure or are we OK with having AWS infrastructure replacements? do we care about infrastructure or are we going to focus on testing k8s?

@neolit123

Member Author

commented Sep 10, 2019

/sig windows

@timothysc

Member

commented Sep 10, 2019

There are open PRs to help uplevel the GCP provider.
/cc @vincepri

@timothysc

Member

commented Sep 10, 2019

/assign @dims

@alejandrox1

Contributor

commented Sep 10, 2019

/cc

@vincepri

Member

commented Sep 10, 2019

GCP PR is here kubernetes-sigs/cluster-api-provider-gcp#143

Would love to get some feedback / requirements to see what the expectations are.

@mariantalla

Contributor

commented Sep 17, 2019

I can start work on the second TODO (=experiment replacing pull-kubernetes-e2e-gce with a CAPA (cluster-api AWS) job)

Here's what I had in mind more specifically:

Local dev environment setup

  • Get a cluster (perhaps even a kind one will work) and an IaaS account (using GCP would help identify what, if anything, needs to be added to the provider; AWS could be an alternative, as its Cluster API provider seems more complete from what I've read)
  • Deploy prow on it
  • Run pull-kubernetes-e2e-gce and have it run successfully using that local prow instance
  • Take notes on 1) flake rate and 2) duration (this data could even come from existing runs of the job)
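The flake-rate and duration notes could be computed from a list of past runs roughly as follows. This is a minimal sketch, not how Prow or TestGrid compute these numbers: `JobRun` is a hypothetical record type, and a "flake" is assumed here to mean a commit that saw both a passing and a failing run of the job.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class JobRun:
    # Hypothetical record for one run of a job such as pull-kubernetes-e2e-gce.
    commit: str        # SHA the run tested
    passed: bool       # overall job result
    duration_s: float  # wall-clock duration in seconds


def flake_rate(runs):
    """Fraction of commits with both a pass and a failure (assumed flake definition)."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[run.commit].add(run.passed)
    if not outcomes:
        return 0.0
    flaky = sum(1 for results in outcomes.values() if len(results) == 2)
    return flaky / len(outcomes)


def median_duration(runs):
    """Median wall-clock duration in seconds; 0.0 when there are no runs."""
    return median(run.duration_s for run in runs) if runs else 0.0


if __name__ == "__main__":
    runs = [
        JobRun("a", True, 3000), JobRun("a", False, 3600),
        JobRun("b", True, 2900), JobRun("c", False, 3100),
    ]
    print(flake_rate(runs), median_duration(runs))
```

Grouping by commit keeps a consistently failing commit (a real breakage) from being counted as a flake; the same two numbers, computed before and after the switch to Cluster API, make the comparison straightforward.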

Experimentation

  • Find usage of k/k/cluster/<script> in anything that gets called by pull-kubernetes-e2e-gce
  • For each mention, replace with Cluster API calls
  • Document flake rate and duration, and compare them with the values from before (plus anything else worth noting)
  • Write up and share
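Finding usages of `k/k/cluster/<script>` could start with a simple scan of the checked-out sources. The sketch below is hypothetical: the regex and function names are assumptions, it only catches literal `cluster/<name>.sh` path references (not scripts reached through variables or indirection), and a plain `grep -rn 'cluster/' <dir>` gives roughly the same first pass.

```python
import os
import re
from pathlib import Path

# Assumed pattern: a path that contains "cluster/" and ends in ".sh".
# The leading \b keeps names like "mycluster/" from matching.
CLUSTER_SCRIPT_RE = re.compile(r"\bcluster/[\w./-]+\.sh\b")


def scan_text(text):
    """Yield (line_number, script_path) for each cluster/ script reference in text."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CLUSTER_SCRIPT_RE.finditer(line):
            yield lineno, match.group(0)


def find_cluster_script_refs(root):
    """Walk root and return sorted (file, line, script) tuples for cluster/ references."""
    refs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            for lineno, script in scan_text(text):
                refs.append((str(path), lineno, script))
    return sorted(refs)


if __name__ == "__main__":
    for file, line, script in find_cluster_script_refs("."):
        print(f"{file}:{line}: {script}")
```

Run over a local checkout, this lists each file and line that calls into `cluster/`, i.e. the call sites that would need Cluster API replacements.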

As always, please let me know:

  • what you think, especially if I've said something outrageous 😅
  • if you want to divide and conquer, or if someone already has work in progress on this bit.

@dims

Member

commented Sep 18, 2019

@mariantalla for CAPA (cluster-api AWS) you will need an AWS IaaS account, i believe. @justinsb @detiber may be able to help get access.
