Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable istio-injection in namespace profile controller creates? #3935

Closed
Jeffwan opened this issue Aug 17, 2019 · 15 comments
Closed

Disable istio-injection in namespace profile controller creates? #3935

Jeffwan opened this issue Aug 17, 2019 · 15 comments

Comments

@Jeffwan
Copy link
Member

Jeffwan commented Aug 17, 2019

/kind enhancement

Why you need this feature:
Training job by default is not working because profile controller enables istio-injection in the namespace data scientist will work on.

In order to ask istio not inject sidecar, user has to explicitly add annotations to pods like this

apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "distributed-tensorflow-job"
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 4
      restartPolicy: Never
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "false"
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0

I am thinking if we should make it namespace level switch on by default. If there're more workloads using istio, it's worth, otherwise, it's better use pod level control.

Seems user uses jupyter notebook needs sidecar. For most other components, there's no need to inject sidecar

Describe the solution you'd like:
Namespace: istio-injection=disabled
Pod: sidecar.istio.io/inject: "true"

Anything else you would like to add:

@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the label improvement/enhancement to this issue, with a confidence of 0.69. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@jlewi jlewi added priority/p1 area/tfjob Issues related to TFJobs. labels Aug 19, 2019
@jlewi
Copy link
Contributor

jlewi commented Aug 19, 2019

@lluunn @kunmingg Should we not inject ISTIO side cars by default and instead just turn it on for our notebooks and other resources that need it?

@richardsliu Any idea if this was/should have been caught by the E2E tests?

@Jeffwan How does the training job fail? I would expect training jobs to still work even if they are sending traffic through ISTIO.

An issue I encountered kubeflow/examples#619 is that you might need to wait for the ISTIO side car to start before you can communicate with other services. Is it possible that's what's happening?

@Jeffwan
Copy link
Member Author

Jeffwan commented Aug 19, 2019

Yeah. I think that's similar problem

Looks like istio-proxy is not up at the beginning and it failed readiness on 15020. Once a while, sidecar starts up but tensorflow pod is already failed at that time.

k logs -f distributed-tensorflow-job-ps-1 tensorflow
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/var/tf_dist_mnist/dist_mnist.py", line 303, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "/var/tf_dist_mnist/dist_mnist.py", line 110, in main
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 240, in read_data_sets
    source_url + TRAIN_IMAGES)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 208, in maybe_download
    temp_file_name, _ = urlretrieve_with_retry(source_url)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 165, in wrapped_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py", line 190, in urlretrieve_with_retry
    return urllib.request.urlretrieve(url, filename)
  File "/usr/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/usr/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 443, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 1053, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 897, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 859, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1270, in connect
    HTTPConnection.connect(self)
  File "/usr/lib/python2.7/httplib.py", line 836, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
IOError: [Errno socket error] [Errno 99] Cannot assign requested address

 Normal   Pulled     5m5s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Container image "docker.io/istio/proxy_init:1.1.6" already present on machine
  Normal   Created    5m5s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Created container
  Normal   Started    5m4s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Started container
  Normal   Pulled     5m2s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Container image "gcr.io/kubeflow-ci/tf-dist-mnist-test:1.0" already present on machine
  Normal   Created    5m2s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Created container
  Normal   Started    5m2s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Started container
  Normal   Pulled     5m2s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Container image "docker.io/istio/proxyv2:1.1.6" already present on machine
  Normal   Created    5m2s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Created container
  Normal   Started    5m1s                kubelet, ip-192-168-83-91.us-west-2.compute.internal  Started container
  Warning  Unhealthy  4m56s (x3 over 5m)  kubelet, ip-192-168-83-91.us-west-2.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 503

@lluunn
Copy link
Contributor

lluunn commented Aug 19, 2019

Do we know why it takes so long for sidecar container to be up?
I think it should be pretty light-weight?

@kunmingg
Copy link
Contributor

@jlewi
I think ideally network traffics should go through Istio by default and we can apply control policy easily.
Will check if there are Istio-best-practices that operators can follow.

@tmc
Copy link

tmc commented Aug 20, 2019

Just chiming in, I very much want all components to have sidecars injected. Some compliance requirements mandate traffic encrypted in transit and Istio mTLS can provide this for many use cases.

@krishnadurai
Copy link
Contributor

/cc @krishnadurai

@ghost
Copy link

ghost commented Sep 3, 2019

I second what @tmc is saying. I have a lot of regulations to follow since we're in clinical genomics, and I would prefer to have as many components as possible have sidecars injected.

I have tried recently to enable sidecar injection in the kubeflow namespace, but this causes the metadata deployment to always drop into error/crashbackoff.

[EDIT] Actually, I think it might be wrong headed to have sidecar injection in the kubeflow namespace (we don't use sidecars in kube/istio-system namespace why in the kubeflow one?). I think encrypted pod overlay is enough for me to meet my hippa/gdpr requirements.

@jlewi
Copy link
Contributor

jlewi commented Sep 4, 2019

I think we should distinguish between what the defaults are and turning it on/off.
I think we should definitely support using ISTIO to mediate traffic to shared services in the kubeflow namespace.

@jamesthegiantpeach if you have observed problems with specific services and ISTIO can you open specific bugs for each service.

@ryandawsonuk
Copy link

ryandawsonuk commented Oct 17, 2019

FWIW if you have istio sidecar injection on the namespace where you run kubeflow pipelines then you find argo jobs getting stuck - argoproj/argo-workflows#1616 (and kubeflow/pipelines#1774 and argoproj/argo-workflows#1282). This does not seem to be on by default, just something to be aware of (I happened to turn injection on at namespace level and so I hit the problem).

@discordianfish
Copy link
Member

I ran into the same problem when trying to run workflows in namespaces created by the profile controller.

I also agree there is a case to be made for istio injection by default. We need a solution to run argo workflows in the profile controller namespaces for kubeflow/pipelines#1223 anyway.

@ryandawsonuk
Copy link

ryandawsonuk commented Nov 13, 2019

@discordianfish I suppose the pipelines DSL could be extended with an option to add the sidecar.istio.io/inject annotation to Pods. Then it would be able to work with sidecars, which presumably there will be demand for (e.g. policy enforcement or MTLS).

@jlewi
Copy link
Contributor

jlewi commented Nov 23, 2019

We will need to support running KFP pipelines in profile controlled namespaces as part of the multi-user support in pipelines kubeflow/pipelines/issues/1223

@Ark-kun Does the pipelines DSL already support setting labels on individual pods in the workflow? This allows the default behavior of istio side car injection to be changed on the per pod level and overrides the namespace setting.

ISTIO side car injection can also be controlled at the namespace level. So if you have a namespace where you are running pipelines and nothing else you could add the label to the namespace to disable side car injection.

The issue I see with making disabling istio-sidecar injection an option in the profile controller is now every application needs to support both possible default values.

e.g. if we disable side car injection by default for a namespace, then the jupyter web app now needs to explicitly add the annotation to add the side car to jupyter pods which need it for access control

@jlewi
Copy link
Contributor

jlewi commented Dec 5, 2019

Correction; I misunderstood the meaning of the namespace label. The namespace label is not a default value. The namespace label is used to enable the istio side car mutating webhook for a particular namespace. So if we want to enable ISTIO for any pods in a namespace this would need to be set to true.

@stale
Copy link

stale bot commented Mar 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants