This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Enable istio injection in Kubeflow namespace #131

Merged
k8s-ci-robot merged 9 commits into kubeflow:master from enable_istio_injection_in_kf_ns on Mar 23, 2020

Conversation

@Bobgy Bobgy commented Dec 10, 2019

/cc @jlewi
Can you help me review?

Resolves kubeflow/kubeflow#3866
This PR is modeled on #123, which makes a similar change.
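
For context, Istio's automatic sidecar injection is normally switched on per namespace with the `istio-injection: enabled` label. A minimal sketch of that label on a Namespace manifest, written as a Python dict. The helper name `namespace_with_injection` is hypothetical, and this is not necessarily how kfctl applies the label:

```python
import json


def namespace_with_injection(name: str) -> dict:
    """Return a Namespace manifest labelled for Istio sidecar injection."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Istio's injection webhook selects namespaces carrying this label.
            "labels": {"istio-injection": "enabled"},
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON, which kubectl also accepts.
    print(json.dumps(namespace_with_injection("kubeflow"), indent=2))
```

With the label in place, new pods in the namespace get a sidecar unless they opt out individually.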


Bobgy commented Dec 10, 2019

/assign @kkasravi

jlewi commented Dec 10, 2019

@Bobgy This change looks good, but before we can merge it, I think we need to update kubeflow/manifests and disable Istio sidecar injection for most services to preserve the current behavior.
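
One way to keep an individual service out of the mesh is Istio's per-pod opt-out annotation, `sidecar.istio.io/inject: "false"`, set on the pod template. A rough sketch, assuming a Deployment-shaped manifest held as a Python dict. `disable_sidecar` is a hypothetical helper and is not taken from kubeflow/manifests:

```python
import copy


def disable_sidecar(workload: dict) -> dict:
    """Return a copy of a Deployment-like manifest whose pods opt out of injection."""
    patched = copy.deepcopy(workload)
    template = patched.setdefault("spec", {}).setdefault("template", {})
    annotations = template.setdefault("metadata", {}).setdefault("annotations", {})
    # Annotation values are strings, so this is "false", not a boolean.
    annotations["sidecar.istio.io/inject"] = "false"
    return patched


if __name__ == "__main__":
    # Illustrative manifest only; the name is made up.
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": "example-service"},
        "spec": {"template": {"metadata": {"labels": {"app": "example"}}}},
    }
    print(disable_sidecar(deployment)["spec"]["template"]["metadata"])
```

The annotation belongs on the pod template rather than on the Deployment's own metadata, because the injection decision is made per pod.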

Bobgy commented Dec 11, 2019

@jlewi I didn't manage to get enough time to do this today.
Is there any documentation on building kfctl myself? I want to build a specific kfctl binary just for the current bug bash.

jlewi commented Dec 11, 2019

@Bobgy you should just be able to run make

kfctl/Makefile, line 122 in 10c8271:

build-kfctl: deepcopy generate fmt vet

Bobgy commented Dec 12, 2019

@jlewi Thanks! I will build a binary by myself first.

/hold for kubeflow manifest change

Bobgy commented Dec 12, 2019

FYI, I built the binary directly and it's working well. Thanks!

Bobgy commented Jan 6, 2020

@jlewi We tried using the build-kfctl-tgz target (kfctl/Makefile, line 126 in 10c8271):

build-kfctl-tgz: build-kfctl

but the resulting tar package doesn't work on macOS. When we execute kfctl, it fails with

cannot execute binary file

It seems it is still built for Linux. How do we build one for Mac?

/cc @gaoning777
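
"cannot execute binary file" is what a shell typically reports for a binary built for a different OS or architecture. Go cross-compiles when the `GOOS` and `GOARCH` environment variables are set, so a macOS build can be sketched as below. This assumes a plain `go build` invocation and is not what the kfctl Makefile or kubeflow/kfctl#161 does; the package path `./cmd/kfctl` and the output path are placeholders:

```python
import os


def go_build_command(goos: str, goarch: str, output: str, package: str):
    """Build the argv and environment for a cross-compiled `go build`."""
    env = dict(os.environ)
    env["GOOS"] = goos      # target operating system, e.g. "darwin" for macOS
    env["GOARCH"] = goarch  # target architecture, e.g. "amd64"
    argv = ["go", "build", "-o", output, package]
    return argv, env


if __name__ == "__main__":
    # Placeholder paths; adjust them to the real layout of the repository.
    argv, env = go_build_command("darwin", "amd64", "bin/darwin/kfctl", "./cmd/kfctl")
    print("GOOS=%s GOARCH=%s %s" % (env["GOOS"], env["GOARCH"], " ".join(argv)))
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` on a machine with the Go toolchain installed.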

jlewi commented Jan 7, 2020

@Bobgy see kubeflow/kfctl/#161

@gaoning777

Awesome. I guess it should be working from #162, then.

Bobgy commented Jan 29, 2020

The last blocker: kubeflow/manifests#716

UPDATE: I found the fix, submitted new PR: kubeflow/manifests#804

jlewi commented Jan 29, 2020

Looks like a bunch of tests are failing.

Bobgy commented Jan 30, 2020

The error message says that kustomize apply fails for the 0.7-branch because the webhook doesn't respond.

That's expected: with the flag turned on, the webhook is inaccessible by default.

The deploy only works when using the master branch.

jlewi commented Jan 30, 2020

It looks like two workflows failed.
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_kfctl/131/kubeflow-kfctl-presubmit/1222548130831011840/

The first failure is the upgrade test. The build deploy step failed. Here are the logs
kubeflow-kfctl-presubmit-kfctl-upgrade-131-60282bb-1840-be2e-2151024252.log.txt

The error is

2020-01-29T16:14:22.161951781Z  main            util.py                     72 INFO     failed to apply:  (kubeflow.error): Code 500 with message: kfApp Apply failed for kustomize:  (kubeflow.error): Code 500 with message: Apply.Run  Error error when creating "/tmp/kout371770818": Internal error occurred: failed calling webhook "pilot.validation.istio.io": Post https://istio-galley.istio-system.svc:443/admitpilot?timeout=30s: No SSH tunnels currently open. Were the targets able to accept an ssh-key for user "gke-a2f5e105802182d81b09"?

The test is trying to apply the 0.7.0 config.

2020-01-29T15:55:05.423739423Z  wait            time="2020-01-29T15:55:05Z" level=info msg="Executor (version: v2.2.1, build_date: 2018-10-11T16:27:29Z) initialized with template:\nactiveDeadlineSeconds: 3000\narchiveLocation: {}\ncontainer:\n  command:\n  - pytest\n  - kfctl_go_test.py\n  - -s\n  - --app_name=kfctl-2eae\n  - --config_path=https://raw.githubusercontent.com/kubeflow/manifests/v0.7-branch/kfdef/kfctl_gcp_iap.0.7.0.yaml\n 

This looks like it might be related to the GKE 1.14 bug with workload identity. This should be fixed in the 0.7.1 manifests but I'm guessing not the 0.7.0 configs.

@richardsliu what is the best way to fix the upgrade tests?

@Bobgy Could you please file a P0 bug about the upgrade test failures and assign it to @richardsliu?

Looking at our periodic test grid.
https://k8s-testgrid.appspot.com/sig-big-data#kubeflow-kfctl-postsubmit&group-by-hierarchy-pattern=%5B%5Cw-%5D%2B

It looks like the build_deploy step is passing. So you might need to rebase to pick up changes to fix the test.

Bobgy commented Jan 31, 2020

Thanks for the investigation, I will continue efforts on this after vacation.

Bobgy commented Mar 17, 2020

/retest

jlewi commented Mar 17, 2020

@Bobgy do you know what the test failures are? Is it still TFJob? Did you try patching the test to not run with ISTIO side car injection?

Bobgy commented Mar 17, 2020

@jlewi, thanks for following up!

> @Bobgy do you know what the test failures are? Is it still TFJob?

Yes, tfjob-smoke-test is still timing out without any explicit error messages.
The actual error should be very clear if I can look at tfjob pod status or log, but even after I requested permission in kubeflow/internal-acls#221, I am still not able to view kubeflow-ci and kubeflow-ci-deployment projects in GCP. If that's resolved, I can take a look.

> Did you try patching the test to not run with ISTIO side car injection?

I tried patching the test in b3ddd39, but now I understand I wasn't patching the right yaml. I should be patching TFJob, but I haven't found it yet.

Bobgy commented Mar 17, 2020

OK, now I understand that this line: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_kfctl/131/kubeflow-kfctl-presubmit/1239771165149368320/#1:build-log.txt%3A749 is actually calling the test in https://github.com/kubeflow/tf-operator/blob/master/py/kubeflow/tf_operator/simple_tfjob_tests.py

But I still don't know which version it is pulling. Is it tf-operator's master? Should I send a PR to the tf-operator repo to add that annotation?

jlewi commented Mar 17, 2020

The google groups need to be sync'd manually (there's an open issue to fix that). So you may or may not have been added to the group and that could explain why you still don't have access (kubeflow/internal-acls#22).

If you need to fix the test you will need to submit a PR to the repo that fixes the test.

I believe it's pulling the test from tf-operator at HEAD:

"kubeflow/tf-operator@HEAD"

/cc @johnugeorge @richardsliu

Bobgy commented Mar 18, 2020

Thanks for the help! I will try to make a PR for the tf-operator test.

jlewi commented Mar 19, 2020

@Bobgy I just checked and it looks like the groups hadn't been sync'd since you submitted your PR to join ci-viewer team. I sync'd them and you should now be a member.

Bobgy commented Mar 20, 2020

/retest
The upstream TFJob test now adds an annotation to disable sidecar injection.
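
For a TFJob, the opt-out annotation has to reach the pod template of every replica type under `spec.tfReplicaSpecs`. A sketch under that assumption; the helper name and the sample job are illustrative only and are not the actual change made in the tf-operator test:

```python
import copy

INJECT_ANNOTATION = "sidecar.istio.io/inject"


def tfjob_without_sidecar(tfjob: dict) -> dict:
    """Return a copy of a TFJob manifest whose replica pods skip Istio injection."""
    patched = copy.deepcopy(tfjob)
    replica_specs = patched.get("spec", {}).get("tfReplicaSpecs", {})
    # Each replica type (for example Worker or PS) carries its own pod template.
    for replica in replica_specs.values():
        metadata = replica.setdefault("template", {}).setdefault("metadata", {})
        metadata.setdefault("annotations", {})[INJECT_ANNOTATION] = "false"
    return patched


if __name__ == "__main__":
    # Made-up job used only to show the shape of the result.
    job = {
        "kind": "TFJob",
        "spec": {"tfReplicaSpecs": {"Worker": {"replicas": 1, "template": {"spec": {}}}}},
    }
    print(tfjob_without_sidecar(job)["spec"]["tfReplicaSpecs"]["Worker"]["template"])
```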

Bobgy commented Mar 20, 2020

@jlewi Thanks! I verified I can see kubeflow-ci-deployment project logs now

jlewi commented Mar 20, 2020

Fantastic work! Thanks for the persistence.

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Bobgy commented Mar 23, 2020

@jlewi Thanks for your help!

/unhold

@k8s-ci-robot k8s-ci-robot merged commit 6fc3f29 into kubeflow:master Mar 23, 2020
@Bobgy Bobgy deleted the enable_istio_injection_in_kf_ns branch March 23, 2020 23:15
richardsliu added a commit that referenced this pull request Apr 6, 2020
crobby pushed a commit to crobby/kfctl that referenced this pull request Feb 25, 2021
* Enable istio injection in Kubeflow namespace

* Test disabling sidecar in e2e test workflow

* Update kfctl_e2e_workflow.py

* Update kfctl_e2e_workflow.py

* Update kfctl_e2e_workflow.py

* Update kfctl_e2e_workflow.py
crobby pushed a commit to crobby/kfctl that referenced this pull request Feb 25, 2021
crobby pushed a commit to crobby/kfctl that referenced this pull request Jun 3, 2021
Co-authored-by: Kebechet <noreply+kebechet@redhat.com>
crobby pushed a commit to crobby/kfctl that referenced this pull request Jun 3, 2021
Co-authored-by: Kebechet <noreply+kebechet@redhat.com>