Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add storage-initializer uid handling for OpenShift with istio-cni #18

Merged
merged 5 commits into from
Jul 12, 2023

Conversation

ReToCode
Copy link

@ReToCode ReToCode commented Jul 11, 2023

What this PR does / why we need it:

  • This PR adds custom code to make KServe run on OpenShift without the need for anyuid SCC.

More context:
OpenShift uses istio-cni which causes an issue with init-containers when calling external services
like S3 or similar. Setting the uid for the storage-initializer to the same uid as the
uid of the istio-proxy resolves the issue. In OpenShift the istio-proxy always gets assigned
the first uid from the namespaces uid range + 1 (The range is defined in an annotation on the namespace).

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:
Tested on OpenShift according to documentation in https://github.com/ReToCode/knative-kserve and opendatahub-io/odh-manifests#838.

Logs
Installation setup and documentation here.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

The `storage-initializer` container will now run with the same `uid` as the `istio-proxy` which resolves an issue when istio-cni is used.

/hold WIP - making tests pass.

Copy link

@danielezonca danielezonca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, adding few questions

pkg/webhook/admission/pod/mutator.go Show resolved Hide resolved
pkg/webhook/admission/pod/mutator.go Show resolved Hide resolved
pkg/webhook/admission/pod/storage_initializer_injector.go Outdated Show resolved Hide resolved
if initContainer.SecurityContext == nil {
initContainer.SecurityContext = &v1.SecurityContext{}
}
uidStr := targetNs.Annotations[OpenShiftUidRangeAnnotationKey]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see this mechanism is replacing the IstioSidecarUIDAnnotationKey alternative. Do you think it could make sense to keep it as fallback if OpenShiftUidRangeAnnotationKey is not available?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm not sure if it is better to fail or not (if the annotation is not there, something went pretty wrong on OCP). As the user in OCP probably will never know about the issue and how to resolve it (needing to know to add 1 for example), do we expect that he even could bring the correct user? I'm currently favouring erroring I think.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this mechanism with "+1" would work as Istio itself would not find that range, right ? I'm not sure what Istio/Service Mesh would do as a fallback when the range is not available as an annotation. I would then probably do the same here, too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it should be fine to assume it is there or error out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm more in agreement with @danielezonca here.

I think the code should honor the IstioSidecarUIDAnnotationKey if present. Otherwise, apply OpenShift logic. If both the OpenShift annotations and IstioSidecarUIDAnnotationKey aren't there, continue without doing any change and without error .

That logic ^ would make the code compatible with both upstream Istio-cni (which will want the 1337 annotation) and Maistra/OSSM (which will use OpenShift annotations). And the code would also remain compatible with an upstream non-cni-Istio installation (which presumably works well without UID hacking), in case community wants to try that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...I'm also thinking that such logic ^ (despite being a workaround) is perhaps more feasible to push to upstream, as it is less invasive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your proposal and explanation. I think that makes sense. I updated the PR accordingly, please re-review.

@rhuss
Copy link

rhuss commented Jul 11, 2023

/lgtm

Copy link

@danielezonca danielezonca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

I would like to have @Jooho or @israel-hdez approval too before merging

@danielezonca
Copy link

@ReToCode
Is the PR still WIP or it is just the title to be updated?

@Jooho @israel-hdez Please review when you have time :)

Copy link

@israel-hdez israel-hdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped a comment for your consideration: #18 (comment)

@ReToCode
Copy link
Author

Is the PR still WIP or it is just the title to be updated?

I'm still testing it with quay.io/rlehmann/kserve-controller:fix-anyuid. I'm targeting it to be ready today.

@ReToCode ReToCode changed the title [WIP] add storage-initializer uid handling for OpenShift with istio-cni add storage-initializer uid handling for OpenShift with istio-cni Jul 12, 2023
@ReToCode
Copy link
Author

I'm done with testing. Looks good. Please also see https://redhat-internal.slack.com/archives/C05742W6F7T/p1689149577193889 for a demo link.

@skonto
Copy link

skonto commented Jul 12, 2023

If the user actually wants to have anyuid in their namespace because they are running something privileged in there they could still use IstioSidecarUIDAnnotationKey according to the PR (storage init container uid should be set to 1337 in that case). We could even automate that too right (although a bit more complex eg. detect sa privileges etc)?

@ReToCode
Copy link
Author

If the user actually wants to have anyuid in their namespace because they are running something privileged in there they could still use IstioSidecarUIDAnnotationKey according to the PR (storage init container uid should be set to 1337 in that case). We could even automate that too right (although a bit more complex eg. detect sa privileges etc)?

Hm not sure if we even would want that, but you can run your user-container as privileged, the istio-container would still be 1337 or the OCP one (we just set the storage-initializer to the same id as istio). I doubt that there is a case for running the storage-initializer privileged. Or what case are you thinking of?

@skonto
Copy link

skonto commented Jul 12, 2023

we just set the storage-initializer to the same id as istio)

It seems to me that we only set it to project range uid + 1 unless there is some annotation that passes a specific uid. If user sets anyuid without setting any annotation, istio side car will run with 1337 but storage-initializer will run with a project range id no?

I doubt that there is a case for running the storage-initializer privileged.

It is not about the storage-initializer and probably this part will need some form of documentation. Given the fact that we dont control what users do (we had a recent case with setting anyuid in our operator ns causing problems) this might break stuff in some scenarios. As for the case that this may appear imagine a user wants to store some results in nfs in a ns and run his inference svc there (any such combination is a challenge I guess).

@israel-hdez
Copy link

israel-hdez commented Jul 12, 2023

It seems to me that we only set it to project range uid + 1 unless there is some annotation that passes a specific uid. If user sets anyuid without setting any annotation, istio side car will run with 1337 but storage-initializer will run with a project range id no?

I think this fully depends on which Mesh flavor you are using.

If you use Maistra/OSSM, AFAIK, the Istio sidecars will always use projectRangeUid + 1 regardless of the anyuid setting. If you use upstream Istio, this one will always run with uid 1337 and this is why the anyuid privilege is required in this case.

In any case, the initContainer must run with the same UID that the sidecar would use. Otherwise, the initContainer will have no network.

EDIT: And keep in mind that this is just a requirement for the CNI case.

We could even automate that too right (although a bit more complex eg. detect sa privileges etc)?

So, automating uid selection would depend more on detecting which Mesh flavor is being used, rather than detecting the anyuid privilege.

Quite doable, but I think, for now, I prefer to keep things simple.

@skonto
Copy link

skonto commented Jul 12, 2023

If you use Maistra/OSSM, AFAIK, the Istio sidecars will always use projectRangeUid + 1 regardless of the anyuid setting.

Ok if so makes sense then, however what I remember from my experiments when I was using anyuid in a ns, OSSM was picking 1337 for the Istio sidecar. Anyway I will double check.

Copy link

@israel-hdez israel-hdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@israel-hdez israel-hdez merged commit 4e7f0df into opendatahub-io:master Jul 12, 2023
55 checks passed
@skonto
Copy link

skonto commented Jul 12, 2023

@israel-hdez here are two fresh runs one without scc anyuid applied and one with scc anyuid applied. These are pods for the same knative svc created twice one for each run. The setup is done using Reto's repo and its Maistra/SM on OCP 4.12. As you see the istio sidecar uses 1337 in the second case. So I don't think this case is covered unless user knows and passes manually the right uid via the annotation (this case is not project range related). Probably some form of documentation would help here.

@Jooho
Copy link

Jooho commented Jul 12, 2023

@skonto Thanks for the test. I think there are very few people who want to use anyuid for their openshift. If the knative/kserve works well without anyuid, nobody wants to apply anyuid so I think we don't need to worry about the scenario. BTW, I tested it on my end, it works fine.

@israel-hdez
Copy link

The setup is done using Reto's repo and its Maistra/SM on OCP 4.12. As you see the istio sidecar uses 1337 in the second case.

That's quite interesting. Good find! So, I was wrong.

Digging on Maistra's code, for reference this is how they implement it: https://github.com/maistra/istio/blob/98f70f7641d14db1351c4b8f5ae1d8f19fc79a92/pkg/kube/inject/webhook.go#L922-L959. Looks like uid 1337 is a fallback.

Probably some form of documentation would help here.

I'm quite in agreement with JooHo comment and this is supposed to be a workaround. Perhaps, if we see that this workaround won't go away, we will need to improve it. So, I'm 👍 to cover it with docs for the time being.

I'll create a ticket.

israel-hdez pushed a commit that referenced this pull request Jul 12, 2023
**What this does / why we need it**:

    This PR adds custom code to make KServe run on OpenShift without the need for anyuid SCC.

**More context**:

OpenShift uses istio-cni which causes an issue with init-containers when calling external services
like S3 or similar. Setting the uid for the storage-initializer to the same uid as the
uid of the istio-proxy resolves the issue. In OpenShift the istio-proxy always gets assigned
the first uid from the namespaces uid range + 1 (The range is defined in an annotation on the namespace).

**Release note**:

```release-note
The `storage-initializer` container will now run with the same `uid` as the `istio-proxy` which resolves an issue when istio-cni is used.
```

---
Squashed commit titles:
* add storage-initializer uid handling for OpenShift with istio-cni
* update storage_initializer_injector tests
* Also use annotation on pod to override uid
@israel-hdez israel-hdez added the kind/bug Something isn't working label Jul 14, 2023
israel-hdez pushed a commit to israel-hdez/kserve that referenced this pull request Sep 20, 2023
…ream-merge

Merge upstream/release-v0.11 into master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants