Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rootless Kubeflow part one (istio-cni) #2455

Merged
merged 32 commits into from Sep 7, 2023

Conversation

juliusvonkohout
Copy link
Member

@juliusvonkohout juliusvonkohout commented May 9, 2023

Which issue is resolved by this Pull Request:
Resolves #2014

Description of your changes:

The main steps are adding an additional profile for istio-cni or ambient mesh, updating the documentation and manifest generation process. Then adding the baseline and restricted PSS as kustomize component to /contrib and extending the profile controller to annotate user namespaces with configurable PSS labels.

First Stage:

  1. Implement Istio 1.17.5 and use it by default. This is important for the Kubeflow 1.8 feature freeze
  2. Implement istio-cni (--set components.cni.enabled=true --set components.cni.namespace=kube-system) as second option.
  3. Add simple tests similar to tests/gh-actions/install_istio.sh and tests/gh-actions/install_knative.sh for istio-cni and support both rootfull and rootless istio at the same time and give users one release to test

Second stage in a second PR:
4. Add pod security standards (https://kubernetes.io/docs/concepts/security/pod-security-standards/) base/restricted to manifests/contrib
5. Add istio-ambient to Kubeflow 1.9
6. Enforce PSS baseline (here you can still build OCI containers via Podman and buildah) in Kubeflow 1.9. It works with any istio
7. Optionally Enforce PSS restricted (this is where minor corner cases are affected) in Kubeflow 1.10

Third stage:
10. Upgrade Istio to 1.19 and use ambient service mesh by default in Kubeflow 1.10

Checklist:

  • Unit tests pass:
    Make sure you have installed kustomize == 3.2.1
    1. make generate-changed-only
    2. make test

@google-cla
Copy link

google-cla bot commented May 9, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@Talador12
Copy link

Talador12 commented May 9, 2023

To clarify, this pull request is the rootful daemonset with rootless sidecar solution?

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented May 10, 2023

To clarify, this pull request is the rootful daemonset with rootless sidecar solution?

yes. The damonset is completely isolated and can be managed by the cluster operators. It resides in kube-system so it becomes part of the underlying cluster then. There is no network interaction at all

@juliusvonkohout
Copy link
Member Author

@agilgur5 do you have any thoughts about this PR?

Copy link

@moorthy156 moorthy156 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also running istio-cni to remove sidecar root requirement privileges, it is working fine us

@Talador12
Copy link

We ended up shifting to use a dedicated physical cluster instead of a virtual cluster for kubeflow, but we still want to try out this model to improve our security footprint. Looking forward to when this work is merged

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented Jun 19, 2023

@moorthy156 do you want to help with an upstream implementation? If yes, please contact me on slack.

@kimwnasptd
Copy link
Member

@juliusvonkohout what's the state of this PR? Is it ready for review?

My only concern with the CNI is the initContainer redirection, which could break KServe etc that also has init container that fetches data #2014 (comment).

What's the state with this?

@moorthy156
Copy link

@moorthy156 do you want to help with an upstream implementation? If yes, please contact me on slack.

I would love too. Can i get directions on this

@MuhammadUsman90
Copy link

@moorthy156 do you want to help with an upstream implementation? If yes, please contact me on slack.

I would love too. Can i get directions on this

I would also like to be part of it. @juliusvonkohout @moorthy156 Can we arrange a small point on this?

@juliusvonkohout
Copy link
Member Author

Yes, always reach out in the Kubeflow slack.

juliusvonkohout and others added 5 commits July 26, 2023 16:20
The istio package is upgraded to the latest stable version, 1.18.1. This
upgrade is needed for running Kubeflow with rootless containers, as
there are a lot of improvements in the latest istio version for that.

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
@@ -0,0 +1,59 @@
name: Build & Apply KServe manifests in KinD, using istio CNI
on:
workflow_dispatch:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a trigger here based on changes to specific directories, not just workflow_dispatch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tzstoyanov will check this on monday

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this Istio CNI functionality is still experimental and not enabled by default yet, I added only a manual trigger of the github action. However, added also an automatic trigger on every change that uses Istio CNI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the trigger correctly, only people with write access will have access to trigger the workflow based on workflow_dispatch. So, not sure if it's necessary to keep

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this would be the easiest way to run Istio CNI tests. Lets keep it at least for one release, until Istio CNI is not enabled by default. We can remove the manual trigger at any time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test be triggered when there is a change to istio-cni-*? Can we trigger the test to ensure it runs successfully?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes more sense than the current logic. Changed the trigger action, thanks Anna.

@juliusvonkohout
Copy link
Member Author

juliusvonkohout commented Aug 11, 2023

It would be easier to review if the PR is separated into

* 1.17.5 upgrade

* Istio-cni

* contrib components

Also for contrib, can you edit to follow https://github.com/kubeflow/manifests/blob/master/proposals/20220926-contrib-component-guidelines.md?

I will just remove /contrib from this PR for now. I think the istio changes should stay together here.

juliusvonkohout and others added 2 commits August 11, 2023 15:04
The Istio CNI functionality is still experimental, that's why the github
action is triggered manually for now. Added an automatic trigger as well
- on every PR that uses ISTIO CNI code.

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
@tzstoyanov
Copy link
Contributor

@annajung comments are addressed, please can you take a look again

tests/gh-actions/install_knative-cni.sh Outdated Show resolved Hide resolved
common/istio-cni-1-17/README.md Outdated Show resolved Hide resolved
@@ -0,0 +1,59 @@
name: Build & Apply KServe manifests in KinD, using istio CNI
on:
workflow_dispatch:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the trigger correctly, only people with write access will have access to trigger the workflow based on workflow_dispatch. So, not sure if it's necessary to keep

juliusvonkohout and others added 3 commits August 16, 2023 15:29
Co-authored-by: Anna <antheaj@vmware.com>
Trigger the test on every PR which changes istio-cni code only.

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
rename to istio-cni
@annajung
Copy link
Member

Thanks everyone!
/lgtm

cc @kimwnasptd

@kimwnasptd
Copy link
Member

Thanks for the review @annajung! I'll also try to take a look within the weekend and merge this

@mpaulgreen
Copy link

mpaulgreen commented Sep 5, 2023

@juliusvonkohout The knative serving and kubeflow pods are still stuck with isito-init failing as described in Issue #2338. I was trying to run on Openshift 4.12.x and 4.13.x.

ENVOY_PORT=
INBOUND_CAPTURE_PORT=
ISTIO_INBOUND_INTERCEPTION_MODE=
ISTIO_INBOUND_TPROXY_ROUTE_TABLE=
ISTIO_INBOUND_PORTS=
ISTIO_OUTBOUND_PORTS=
ISTIO_LOCAL_EXCLUDE_PORTS=
ISTIO_EXCLUDE_INTERFACES=
ISTIO_SERVICE_CIDR=
ISTIO_SERVICE_EXCLUDE_CIDR=
ISTIO_META_DNS_CAPTURE=
INVALID_DROP=

2023-09-04T11:38:02.320411Z	info	Istio iptables variables:
PROXY_PORT=15001
PROXY_INBOUND_CAPTURE_PORT=15006
PROXY_TUNNEL_PORT=15008
PROXY_UID=1337
PROXY_GID=1337
INBOUND_INTERCEPTION_MODE=REDIRECT
INBOUND_TPROXY_MARK=1337
INBOUND_TPROXY_ROUTE_TABLE=133
INBOUND_PORTS_INCLUDE=*
INBOUND_PORTS_EXCLUDE=15090,15021,15020
OUTBOUND_OWNER_GROUPS_INCLUDE=*
OUTBOUND_OWNER_GROUPS_EXCLUDE=
OUTBOUND_IP_RANGES_INCLUDE=*
OUTBOUND_IP_RANGES_EXCLUDE=
OUTBOUND_PORTS_INCLUDE=
OUTBOUND_PORTS_EXCLUDE=
KUBE_VIRT_INTERFACES=
ENABLE_INBOUND_IPV6=false
DNS_CAPTURE=false
DROP_INVALID=false
CAPTURE_ALL_DNS=false
DNS_SERVERS=[],[]
OUTPUT_PATH=
NETWORK_NAMESPACE=
CNI_MODE=false
HOST_NSENTER_EXEC=false
EXCLUDE_INTERFACES=

2023-09-04T11:38:02.320573Z	info	Writing following contents to rules file: /tmp/iptables-rules-1693827482320445198.txt2641253036
* nat
-N ISTIO_INBOUND
-N ISTIO_REDIRECT
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 15008 -j RETURN
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A ISTIO_INBOUND -p tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_OUTPUT -o lo -s 127.0.0.6/32 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --uid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -o lo ! -d 127.0.0.1/32 -m owner --gid-owner 1337 -j ISTIO_IN_REDIRECT
-A ISTIO_OUTPUT -o lo -m owner ! --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
COMMIT
2023-09-04T11:38:02.320602Z	info	Running command: iptables-restore --noflush /tmp/iptables-rules-1693827482320445198.txt2641253036
2023-09-04T11:38:02.321779Z	error	Command error output: xtables parameter problem: iptables-restore: unable to initialize table 'nat'

Error occurred at line: 1
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
2023-09-04T11:38:02.321791Z	error	Failed to execute: iptables-restore --noflush /tmp/iptables-rules-1693827482320445198.txt2641253036, exit status 2```

@juliusvonkohout
Copy link
Member Author

@mpaulgreen The istio-cni documentation says that for openshifts special cni implementation you need the networkattachmentdefinitions. But we will not cover this here.

And the logs you posted there look like the normal istio. This PR here is for istio-CNI. I used both istios on Openshift, so it works. But you have to configure both specifically for Openshift, since Openshift has some special defaults and a different CNI plugin that needs extra settings. As said in the other issue consulting is available, just reach out on slack or linkedin.

But please only discuss istio-CNI here. This PR is for development, not paid support.

@mpaulgreen
Copy link

mpaulgreen commented Sep 5, 2023

@juliusvonkohout You are correct. There are some additional configuration required istio-cni for OpenShift. I will stop posting defects of kubeflow with istio here.

@kimwnasptd
Copy link
Member

Tested as well in a KinD cluster and everything worked as expected. Great to see this in, thank you everyone that worked hard!

/lgtm
/approve

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout, kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit b5b9a7b into kubeflow:master Sep 7, 2023
4 checks passed
@juliusvonkohout juliusvonkohout changed the title Rootless Kubeflow Rootless Kubeflow part one (istio-cni) Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add proper PSPs to enforce security and safety for Kubeflow on Kubernetes
9 participants