Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: basic support of ModelMesh #17

Merged

Conversation

israel-hdez
Copy link

On-boarding ModelMesh to the MVP of the Service Mesh integration.

This PR depends on the following ones being merged:

This would install ModelMesh + its monitoring. Monitoring seems to be required, because the odh-model-controller uses a ClusterRole that is created by modelmesh-monitoring. It fails if it is not there.

Sidecar injection is, for now, turned off in etcd because it has an initContainer which fails to do its job when the Istio sidecar is present.

The resulting installation works at least from a pure ModelMesh perspective; i.e. without involving Istio in an infer request. VirtualServices for ModelMesh routing, although correctly created and associated to Gateways and the Ingress, do not work (trying to do an infer request results in an error). The right configurations still need to be figured out.

@@ -41,6 +41,7 @@ spec:
metadata:
labels:
component: model-mesh-etcd
sidecar.istio.io/inject: "false"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that needed? Are you using auto-inject?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you using auto-inject?

IDK, probably yes. I just saw the sidecar being added to anything in opendatahub namespace, so I had to add this label.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@israel-hdez and you explicitly do not want every pod to be part of the mesh? Curious what are the reasons for that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand the question :-)

I can say that specifically for this etcd deployment, the sidecar conflicted with the initContainer because of the blocked network. Apparently, it doesn't hurt that etcd is not part of the mesh, so I simply turned off injection here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now I know the reason :)

@@ -2,181 +2,182 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: manager-role
name: odh-model-controller-role

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this automatically prefixed by kustomize?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not prefixed.

I don't know the reason, but prefix is not turned on. See:

#namePrefix: model-serving-

@@ -2,181 +2,182 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: manager-role
name: odh-model-controller-role
rules:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are changes below only a side-effect of autoformatting or have you added some extra roles?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining why this temporary solution is needed. On the grounds of unblocking ModelMesh to move things forward, I think we are fine with merging it. There are just a few minor things I commented on.

I wonder if we shouldn't mark it in the code so it won't get lost. Something like # FIXME temporarily for unblocking ModelMesh comes to mind.

But most importantly I think we should identify what needs to be done so that you don't need to bypass security in the mesh.

Also curious about what exactly is failing here:

VirtualServices for ModelMesh routing, although correctly created and associated to Gateways and the Ingress, do not work (trying to do an infer request results in an error)

@@ -41,6 +41,7 @@ spec:
metadata:
labels:
component: model-mesh-etcd
sidecar.istio.io/inject: "false"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@israel-hdez would you say that using auto-injection is not a reasonable default then and being explicit in which pods should have injected side-cars is a better approach? The reason I'm asking is because we are actually considering switching the behaviour so that would be good argument for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that in the opendatahub namespace, we can be explicit about what is included in the mesh, because we own that namespace (if "own" is the right word).

In project namespaces, maybe it is the opposite.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
@israel-hdez
Copy link
Author

I wonder if we shouldn't mark it in the code so it won't get lost. Something like # FIXME temporarily for unblocking ModelMesh comes to mind.

Done!

But most importantly I think we should identify what needs to be done so that you don't need to bypass security in the mesh.

I do agree. I'm hoping we can find the right configs for this.

Also curious about what exactly is failing here:

VirtualServices for ModelMesh routing, although correctly created and associated to Gateways and the Ingress, do not work (trying to do an infer request results in an error)

It is basically because the ingress-gateway was blocking the request because the redirection to OAuth. This applied to both in-cluster and external requests. So, I had to do that temporary fix to bypass auth and be able to try it.

@bartoszmajsak bartoszmajsak changed the title Changes to add Service Mesh to ModelMesh feat: basic support of ModelMesh Jun 30, 2023
@bartoszmajsak bartoszmajsak merged commit 313cba2 into maistra:service-mesh-integration Jun 30, 2023
@israel-hdez israel-hdez deleted the add-modelmesh branch June 30, 2023 20:10
bartoszmajsak added a commit to bartoszmajsak/odh-manifests that referenced this pull request Sep 6, 2023
* feat(mvp): enabling service mesh for ODH

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Co-authored-by: Cameron Garrison <cgarriso@redhat.com>
Co-authored-by: Wambugu “Innocent” Kironji <37008228+InnocentK@users.noreply.github.com>

This PR implements our MVP of enabling service mesh with ODH, using Authorino combined with service mesh to facilitate authorization to ODH components.

To deploy ODH with Service Mesh, follow along with our instructions found in `enabling_ossm.md` to find a reference KfDef along with other steps.

Notes:
- Current implementation relies on some hardcoded namespaces: using `opendatahub` for the KfDef namespace, and `istio-system` + `auth-provider` for the OSSM SMCP and Authorino resources respectively. Ideally we would not be hardcoding `opendatahub`, we are looking for suggestions on how to avoid this.
- We use an initJob to dynamically create resources based on cluster configuration + to allow for secret propagation between requisite resources. In the future, this will ideally be implemented by a more sophisticated controller/control plane to create and reconcile these resources.
- We do not support a migration path for pre-existing ODH deployments yet.
- We do not use the opendatahub-operator `secret-generator` to create our `OauthClient` resource as we need the secret to be propagated to another resource - in the future, we may extend the `secret-generator` to perform this functionality rather than handle it in our initJob or in a controller. Is this something that folks well-versed in the operator think could be possible?
- We configure our service mesh deployment to use `autoInject`, injecting to all pods found in the mesh namespaces by default for convenience. This may cause problems if future components not yet considered should not be in the mesh, so this may be something to revisit.

* chore: re-add mistakenly removed cert secret (maistra#11)

re-add mistakenly removed cert secret

* fix: move secret to correct kustomization section (maistra#14)

move secret to correct kustomization section

* update OWNERS, add aslak

* fix: update jq download link (maistra#15)

fix jq download link

* use service-mesh-integration branch for kfdef

* fix: corrects kfdef file name in the docs

* fix(manifests): switches to use patches and overlays (maistra#18)

Co-authored-by: Cameron Garrison <cgarriso@redhat.com>

* fix: create hmac and token secrets as secrets rather than cfgmap (maistra#19)

Change from creating hmac and token secrets for oauth2 secrets

* chore(init-job): adds annotation to root namespace (maistra#21)

* feat: adds service-mesh env variables

They are optionally fetched from a config map, otherwise controller uses defaults

* chore: creates service-mesh specific OWNERS file

* fix: resolves ns in init-job (maistra#22)

* fix: extracts port and raw host (maistra#23)

* fix(job): defaults OAUTH_PORT to 443 if not defined in issuer

* chore: removes obsolete route (maistra#24)

* feat: basic support of ModelMesh (maistra#17)

* feat: migrate existing DS project namespaces to enable SM  (maistra#25)

* add ds proj migration to init job

* feat: patches SMCP instead of creating one (maistra#26)

Co-authored-by: Cameron Garrison <cgarriso@redhat.com>

* fix: specifies host in ODH dashboard gateway and virtualservice (maistra#27)

---------

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Co-authored-by: Edgar Hernández <ehernand@redhat.com>
bartoszmajsak added a commit that referenced this pull request Sep 13, 2023
* feat(mvp): enabling service mesh for ODH

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Co-authored-by: Cameron Garrison <cgarriso@redhat.com>
Co-authored-by: Wambugu “Innocent” Kironji <37008228+InnocentK@users.noreply.github.com>

This PR implements our MVP of enabling service mesh with ODH, using Authorino combined with service mesh to facilitate authorization to ODH components.

To deploy ODH with Service Mesh, follow along with our instructions found in `enabling_ossm.md` to find a reference KfDef along with other steps.

Notes:
- Current implementation relies on some hardcoded namespaces: using `opendatahub` for the KfDef namespace, and `istio-system` + `auth-provider` for the OSSM SMCP and Authorino resources respectively. Ideally we would not be hardcoding `opendatahub`, we are looking for suggestions on how to avoid this.
- We use an initJob to dynamically create resources based on cluster configuration + to allow for secret propagation between requisite resources. In the future, this will ideally be implemented by a more sophisticated controller/control plane to create and reconcile these resources.
- We do not support a migration path for pre-existing ODH deployments yet.
- We do not use the opendatahub-operator `secret-generator` to create our `OauthClient` resource as we need the secret to be propagated to another resource - in the future, we may extend the `secret-generator` to perform this functionality rather than handle it in our initJob or in a controller. Is this something that folks well-versed in the operator think could be possible?
- We configure our service mesh deployment to use `autoInject`, injecting to all pods found in the mesh namespaces by default for convenience. This may cause problems if future components not yet considered should not be in the mesh, so this may be something to revisit.

* chore: re-add mistakenly removed cert secret (#11)

re-add mistakenly removed cert secret

* fix: move secret to correct kustomization section (#14)

move secret to correct kustomization section

* update OWNERS, add aslak

* fix: update jq download link (#15)

fix jq download link

* use service-mesh-integration branch for kfdef

* fix: corrects kfdef file name in the docs

* fix(manifests): switches to use patches and overlays (#18)

Co-authored-by: Cameron Garrison <cgarriso@redhat.com>

* fix: create hmac and token secrets as secrets rather than cfgmap (#19)

Change from creating hmac and token secrets for oauth2 secrets

* chore(init-job): adds annotation to root namespace (#21)

* feat: adds service-mesh env variables

They are optionally fetched from a config map, otherwise controller uses defaults

* chore: creates service-mesh specific OWNERS file

* fix: resolves ns in init-job (#22)

* fix: extracts port and raw host (#23)

* fix(job): defaults OAUTH_PORT to 443 if not defined in issuer

* chore: removes obsolete route (#24)

* feat: basic support of ModelMesh (#17)

* feat: migrate existing DS project namespaces to enable SM  (#25)

* add ds proj migration to init job

* feat: patches SMCP instead of creating one (#26)

Co-authored-by: Cameron Garrison <cgarriso@redhat.com>

* fix: specifies host in ODH dashboard gateway and virtualservice (#27)

---------

Co-authored-by: Bartosz Majsak <bartosz.majsak@gmail.com>
Co-authored-by: Edgar Hernández <ehernand@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants