Update CI test cluster version to 1.16 #1316

Merged

andreyvelich
Member

It seems that we can't create a new GKE cluster with version 1.14.

Ref: https://cloud.google.com/kubernetes-engine/docs/release-notes#august_27_2020_r28.
I updated the version to 1.16; KF Serving uses it as well.

/assign @johnugeorge @gaocegege
/cc @yuzisun

@kubeflow-bot

This change is Reviewable

@andreyvelich
Member Author

/retest

@yuzisun
Member

yuzisun commented Aug 27, 2020

Looks like this is hitting errors similar to the ones KFServing was having earlier:

W
Activating service-account
ERROR: (gcloud.auth.activate-service-account) There was a problem refreshing your current auth tokens: Unable to find the server at oauth2.googleapis.com
Please run:
 $ gcloud auth login
to obtain new credentials, or if you have already logged in with a
different account:
 $ gcloud config set account ACCOUNT
to select an already authenticated account to use.

@andreyvelich
Member Author

@yuzisun Yes, this error appears sometimes; I don't know why.
The v1alpha3 workflow was successful: http://testing-argo.kubeflow.org/workflows/kubeflow-test-infra/kubeflow-katib-presubmit-e2e-v1alpha3-1316-1e0684c-6208-8d16?tab=workflow

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

@yuzisun Did you fix it somehow for KF Serving?

@yuzisun
Member

yuzisun commented Aug 27, 2020

> @yuzisun Did you fix it somehow for KF Serving?

No, it is random; sometimes a retry works. I think it is a GCP issue.

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

@johnugeorge @gaocegege I added handling for the connection error in the Activate Service Account step.
The tests passed. Should we try to merge this change?
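
For readers following along, a workaround of this kind could look roughly like the sketch below: catch the failed activation and try again a few times with a pause in between. This is not the code from the PR. The helper name `activate_with_retry`, its `run` and `sleep` parameters, and the key-file path are hypothetical; the only real piece is the `gcloud auth activate-service-account --key-file=...` command itself.

```python
import subprocess
import time


def activate_with_retry(key_file, attempts=5, delay_seconds=10.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run `gcloud auth activate-service-account`, retrying on failure.

    `run` and `sleep` are injectable (an assumption of this sketch) so the
    helper can be exercised without gcloud being installed.
    Returns the number of attempts that were needed.
    """
    cmd = ["gcloud", "auth", "activate-service-account",
           "--key-file=" + key_file]
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # check=True raises CalledProcessError on a non-zero exit code.
            run(cmd, check=True)
            return attempt
        except subprocess.CalledProcessError as err:
            last_error = err
            # Pause before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay_seconds)
    raise RuntimeError(
        "service account activation failed after %d attempts" % attempts
    ) from last_error
```

A fixed delay keeps the sketch short; since the failure is described as random and network related, a growing delay between attempts would be an equally reasonable choice.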

@johnugeorge
Member

Is the service account not getting activated because of a GKE bug?

@andreyvelich
Member Author

@johnugeorge I think it is a GCP networking issue; KF Serving has the same problem: kserve/kserve#979 (comment)

@gaocegege
Member

/lgtm

@andreyvelich
Member Author

Instead of adding a try/catch around Activate Service Account, we can try adding a retryStrategy to the Argo workflow, like what they are trying to do in KFServing.
What do you think, @gaocegege @johnugeorge?
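
As a rough illustration of the idea, an Argo Workflows template accepts a `retryStrategy` field with keys such as `limit` and `retryPolicy`, so the retry can live in the workflow definition rather than in the step's own script. The sketch below builds such a template as a plain Python dict. The helper `with_retry_strategy`, the step name, the image, the key-file path, and the chosen limit and policy are all assumptions for the example, not the values used in this PR.

```python
import json


def with_retry_strategy(template, limit=3, retry_policy="Always"):
    """Return a copy of an Argo template dict with a retryStrategy added.

    `limit` caps the number of retries; `retry_policy` is one of the
    policies Argo understands (for example "Always" or "OnFailure").
    """
    updated = dict(template)
    updated["retryStrategy"] = {
        "limit": limit,
        "retryPolicy": retry_policy,
    }
    return updated


# Hypothetical step definition; names, image, and paths are placeholders.
activate_step = {
    "name": "activate-service-account",
    "container": {
        "image": "example.com/test-worker:latest",
        "command": [
            "gcloud", "auth", "activate-service-account",
            "--key-file=/secret/key.json",
        ],
    },
}

retrying_step = with_retry_strategy(activate_step, limit=3)
print(json.dumps(retrying_step, indent=2))
```

Keeping the retry in the workflow definition means the step script stays unchanged and any step in the workflow can opt in the same way.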

@johnugeorge
Member

It looks much cleaner, right?

@gaocegege
Member

I think it is better!

@andreyvelich
Member Author

The test passed with the retry strategy. Can you approve it, @gaocegege @johnugeorge?

@johnugeorge
Member

/lgtm

@johnugeorge
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 0b7a5f2 into kubeflow:master Sep 1, 2020
@andreyvelich andreyvelich deleted the update-ci-cluster-version-1.16 branch October 6, 2021 00:27