Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ambient support for redhat openshift #42341

Closed
linsun opened this issue Dec 7, 2022 · 23 comments
Closed

Ambient support for redhat openshift #42341

linsun opened this issue Dec 7, 2022 · 23 comments
Assignees
Labels
area/ambient Issues related to ambient mesh lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed

Comments

@linsun
Copy link
Member

linsun commented Dec 7, 2022

Would be nice for IBM/Redhat team to step in and contribute an ambient openshift profile when ambient is merged to upstream, similar like the openshift profile today.

@linsun linsun added the area/ambient Issues related to ambient mesh label Dec 7, 2022
@jewertow
Copy link
Member

jewertow commented Mar 1, 2023

The tricky part of ambient mode on OpenShift is installation of ztunnel, which requires NET_ADMIN capabilities, so privileged SCC must be added to istio-system (oc adm policy add-scc-to-group privileged system:serviceaccounts:istio-system).

I wonder if we could install ztunnel in kube-system namespace, where we can install privileged workload out of the box?
Then the only missing part would be creating istio-ca-root-cert config map in that namespace, which is required by ztunnel.

What do you think @howardjohn @jwendell @morvencao?

PoC:

cat <<EOF >> control-plane.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  hub: quay.io/jewertow
  tag: 1.18-latest
  profile: ambient
  components:
    cni:
      enabled: true
      namespace: kube-system
    ztunnel:
      enabled: true
      namespace: kube-system
  values:
    cni:
      cniBinDir: /var/lib/cni/bin
      cniConfDir: /etc/cni/multus/net.d
      chained: false
      cniConfFileName: "istio-cni.conf"
      excludeNamespaces:
       - istio-system
       - kube-system
      logLevel: info
      privileged: true
    sidecarInjectorWebhook:
      injectedAnnotations:
        k8s.v1.cni.cncf.io/networks: istio-cni
EOF
oc adm policy add-scc-to-group anyuid system:serviceaccounts:istio-system
istioctl install -y -n istio-system -f control-plane.yaml
// ztunnel will fail to start, so create missing config map manually
oc get cm istio-ca-root-cert -n istio-system -o json | jq -r '.data."root-cert.pem"' > root-cert.pem
oc create cm istio-ca-root-cert -n kube-system --from-file=root-cert.pem

@howardjohn
Copy link
Member

Definitely should (be able to) live in kube-system, just like CNI. I didn't known we don't have istio-ca-root-cert in kube-system, that is a bummer.

Do you know if we can give istiod abiliyt to write configmap to kube-system? I think some platforms may reject it... Does OpenShift allow this write?

If not CNI could read+mirror it, but that is pretty odd.

@linsun
Copy link
Member Author

linsun commented Mar 1, 2023

@jewertow just curious how do you install istio-cni (or your istio-cni equivalent) on openshift today for sidecars? Is it also in kube-system, or istio-system ns?

@keithmattix
Copy link
Contributor

Definitely should (be able to) live in kube-system, just like CNI. I didn't known we don't have istio-ca-root-cert in kube-system, that is a bummer.

I'm not sure of the historical reasons for this, but the only reason istio-ca-root-cert isn't in kube-system is because of a discoverySelector exclusion in NamespaceController.

@howardjohn
Copy link
Member

Definitely should (be able to) live in kube-system, just like CNI. I didn't known we don't have istio-ca-root-cert in kube-system, that is a bummer.

I'm not sure of the historical reasons for this, but the only reason istio-ca-root-cert isn't in kube-system is because of a discoverySelector exclusion in NamespaceController.

I vaguely recall some platforms reject kube-system writes and we exclude sidecars there anyways so there wasn't a need at the time

@jewertow
Copy link
Member

jewertow commented Mar 1, 2023

Do you know if we can give istiod abiliyt to write configmap to kube-system? I think some platforms may reject it... Does OpenShift allow this write?

I'm not sure, I will verify that. I will also check if we could check platform at runtime (e.g. based on git version) to conditionally create that config map in kube-system.

just curious how do you install istio-cni (or your istio-cni equivalent) on openshift today for sidecars?

@linsun, if you're asking about OpenShift Service Mesh, we install istio-cni in openshift-operators namespace alongside our operator, but I don't know why. I wasn't part of the team when that decision was made.

@linsun
Copy link
Member Author

linsun commented Mar 2, 2023

@jewertow yes, thanks, that is what I was asking. Would it be useful if we also put istio-cni & ztunnel in its dedicated operator ns (given potential limitation on what we can do in kube-system)?

@jewertow
Copy link
Member

@linsun, first of all, I'm sorry for the late reply, but I was on leave.

I guess that ztunnel could work fine in openshift-operators, but I didn't think about this approach, because I remember that @howardjohn was concerned about deploying istio-cni in openshift-operators.

I started prototyping profile openshift-ambient, but istio-cni logs errors, so I still work on it.

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 13, 2023
@jewertow jewertow removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 27, 2023
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 27, 2023
@jewertow jewertow added the lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed label Jun 27, 2023
@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 27, 2023
@linsun
Copy link
Member Author

linsun commented Dec 15, 2023

Hi @jewertow it is a good time to evaluate this again, I believe with the latest istio-cni change coming to upstream, ambient for openshift should be supported.

@linsun
Copy link
Member Author

linsun commented Dec 15, 2023

cc @srampal

@srampal can you open a PR to add yourself to istio/community as a member? check some of the merged PR for example. That way, i can assign issue to you.

@jewertow
Copy link
Member

Hey @linsun, I didn't test in-pod ambient networking yet, but I will do it early next week and will submit patches for OCP if it will be necessary.

@jewertow
Copy link
Member

@linsun I was trying to install in-pod ambient using hub: gcr.io/solo-test-236622/inpod-istio and tag: 1.19.4-1, but I got these errors in istio-cni:

2023-12-17T20:46:55.283288Z     info    ambient Node-level cleanup done
2023-12-17T20:46:55.283591Z     error   controllers     error handling kube-system/ztunnel-4d6cp, retrying (retry count: 1): failed to get veth device: no routes found for 10.217.0.86 controller=ambient
2023-12-17T20:49:05.867580Z     error   ambient Failed to list ipset entries: failed to list ipset ztunnel-pods-ips: no such file or directory

but these errors are no longer returned in the new implementation, so it seems that the istio-cni image was built from the old source code.

@jewertow
Copy link
Member

Ok, nevermind. I didn't use proper manifests.

@linsun
Copy link
Member Author

linsun commented Dec 18, 2023

Yes, the istio-cni was building using 1.19, but the PRs have been open if you want to build it yourself on latest code base. Thank you @jewertow for trying it on openshift!!

@linsun
Copy link
Member Author

linsun commented Dec 19, 2023

@jewertow any update? were you able to get it running?

@srampal
Copy link

srampal commented Dec 19, 2023

@linsun Some clarification on this topic. @jewertow can clarify further.

We have tested and verified the inpod redirection with the same CNI that Openshift uses (which is OVN-Kubernetes) but in a Kubernetes/ Kind environment. We (including @jewertow and others) will continue with more testing but what we have done so far gives us confidence that it should work even in a Openshift environment since that uses the same CNI that we have tested so far. This environment is different from the currently shipping Openshift Service Mesh (which is based on Maistra, a fork of Istio) and we may not test or support Ambient for the Maistra based version although that is still to be determined.

@jewertow
Copy link
Member

jewertow commented Dec 19, 2023

Hey @linsun,

any update? were you able to get it running?

Yes, I successfully installed ambient with "inpod" redirection mode on OpenShift. I tested flows from the Getting Started with Ambient Mesh and everything worked as expected and I didn't get any error in logs. And therefore I opened the PR with openshift-ambient profile.

@linsun
Copy link
Member Author

linsun commented Dec 20, 2023

Very nice, thanks for the update!

@linsun
Copy link
Member Author

linsun commented Dec 20, 2023

Hi @srampal - just to clarify, yep, I am not asking for support or test for ambient for maistra. :)

@sridhargaddam
Copy link
Contributor

Yes, I successfully installed ambient with "inpod" redirection mode on OpenShift. I tested flows from the Getting Started with Ambient Mesh and everything worked as expected and I didn't get any error in logs.

@jewertow did you try "L7 Authorization Policies" as well?

I was able to successfully install inPod on an OpenShift cluster with OpenShiftSDN as well as OVNK CNIs and verify the L4 Authorization policies. However, when I tried to verify L7 Authorization policies using K8s Gateway APIs, the bookinfo-gateway-istio pod is crash-looping due to readiness/startup probe failures. It appears like kubelet is unable to query the healthcheck url associated with the pod. But if I manually try the health-check url from another Pod it works fine. So, the issue seems to be - if we try to access from the hostNetwork, the healthcheck url could not be accessed, but the same url works fine when accessing from the podNetwork. I'm looking into it, but do let us know if the use-case worked in your setup.

@sridhargaddam
Copy link
Contributor

Yes, I successfully installed ambient with "inpod" redirection mode on OpenShift. I tested flows from the Getting Started with Ambient Mesh and everything worked as expected and I didn't get any error in logs.

@jewertow did you try "L7 Authorization Policies" as well?

I was able to successfully install inPod on an OpenShift cluster with OpenShiftSDN as well as OVNK CNIs and verify the L4 Authorization policies. However, when I tried to verify L7 Authorization policies using K8s Gateway APIs, the bookinfo-gateway-istio pod is crash-looping due to readiness/startup probe failures. It appears like kubelet is unable to query the healthcheck url associated with the pod. But if I manually try the health-check url from another Pod it works fine. So, the issue seems to be - if we try to access from the hostNetwork, the healthcheck url could not be accessed, but the same url works fine when accessing from the podNetwork. I'm looking into it, but do let us know if the use-case worked in your setup.

Please ignore the above error. The reason why the healthcheck was failing was because I forgot to update the namespace of the Gateway and HTTPRoute object, so it was installed in the default namespace. After changing it to istio-system, the healthcheck errors are no longer seen and the pod comes up fine.

@linsun
Copy link
Member Author

linsun commented Jan 29, 2024

This should be resolved, please try latest master or release 1.21 build, or wait for the official 1.21 release. see #48212.

@linsun linsun closed this as completed Jan 29, 2024
@linsun
Copy link
Member Author

linsun commented Jan 29, 2024

Sorry, just got confirmation from Eric that the change is not in beta 0 yet, but should be in -beta.1 since it was merged after beta.0 was released. I expect that the new build will be available in a day or 2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh lifecycle/staleproof Indicates a PR or issue has been deemed to be immune from becoming stale and/or automatically closed
Projects
Status: Done
Development

No branches or pull requests

7 participants