
Initial attempt at GKE detection in sysbox-deploy-k8s script #108

Merged: 3 commits into nestybox:master on Sep 15, 2023

Conversation

jamonation
Contributor

Here's an initial attempt at resolving nestybox/sysbox#680.

The idea is that the sysbox-deploy-k8s.sh script checks whether the link-local Google metadata endpoint is available at 169.254.169.254 to determine whether the host node is part of a GKE cluster.

If so, the script removes the conflicting bridge configuration and sets the correct paths in the crio.network toml config.
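In rough outline (the exact file names and config keys are in the diff; the paths below are just placeholders to illustrate the flow):

# Illustrative outline only; see the diff for the real paths and config keys.
if curl --connect-timeout 1 -s -H "Metadata-Flavor: Google" \
    169.254.169.254/computeMetadata/v1/instance/attributes/cluster-name >/dev/null; then
    # Running on GKE: drop the CNI bridge config that conflicts with GKE's own
    # networking, and point the [crio.network] settings at GKE's CNI directories.
    bridge_conf="/etc/cni/net.d/100-crio-bridge.conf"   # placeholder path
    rm -f "$bridge_conf"
    # ... then update network_dir / plugin_dirs in the crio.network config ...
fi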

Could definitely use some proper testing from someone who knows their way around the install process!

Signed-off-by: Jamon Camisso <jamonation+git@gmail.com>
@ctalledo ctalledo left a comment
Member

Thanks @jamonation for the contribution, much appreciated!

Looks good (just one review comment).

I assume you had a chance to test this on GKE and things work well (I've not tried it yet).

Thanks again!

@@ -948,6 +948,24 @@ function do_distro_adjustments() {
sed -i '/^kernel.unprivileged_userns_clone/ s/^#*/# /' ${sysbox_artifacts}/systemd/99-sysbox-sysctl.conf
}

# determines if running on a GKE cluster
function check_gke() {
cluster_name=$(curl --connect-timeout 1 -s -H "Metadata-Flavor: Google" 169.254.169.254/computeMetadata/v1/instance/attributes/cluster-name || false)
@ctalledo
Member

Curious, is this the GKE suggested way to find if the cluster is on GKE?

@kevholmes

@ctalledo it is possible not to enable Workload Identity on your GKE clusters if that's what you're getting at. This is a reliable way to tell if we are executing in GKE as long as WI is enabled. Another option is to look for fingerprint-like data on the GKE node's VM OS. However, the GKE docs warn against relying on the underlying OS not changing for your DaemonSets' functionality to work correctly. GKE releases are decoupled from the node OS in my experience.

Another option might be to make this a bool that we flip as a config item so the GKE-specific functionality is explicitly enabled at install time and that could be covered by a documentation update?

Thank you for working on this @jamonation. I am no longer at the employer where I was impacted by this issue, but I can help with any GKE-specific validations or testing if you'd like an assist. Just let me know 👍🏽

@ctalledo ctalledo Aug 3, 2023
Member

Hi @kevholmes, thanks for the response.

This is a reliable way to tell if we are executing in GKE as long as WI is enabled.

Got it, thanks. I am not too familiar with WI yet. Is it enabled by default, and when would it make sense for users to disable it?

Another option is to look for fingerprint-like data on the GKE node's VM OS. However, the GKE docs warn against relying on the underlying OS not changing for your DaemonSets' functionality to work correctly. GKE releases are decoupled from the node OS in my experience.

Agreed.

Another option might be to make this a bool that we flip as a config item so the GKE-specific functionality is explicitly enabled at install time and that could be covered by a documentation update?

In general we prefer to not add too many configs, unless absolutely required. In this case I think the solution you provide above might work just fine.

@kevholmes

@ctalledo: the only time I might disable WI support in a GKE cluster is in an environment where I'm executing workloads that I absolutely cannot trust. For instance, if I offered a CI/CD service that allowed folks external to my company to sign up with a credit card and run custom containers/code/cli commands in my k8s cluster. It sounds like a scenario where I might want Sysbox 😄. In a design like that, I would be:

a) disabling Workload Identity entirely

or

b) creating NetworkPolicies to drastically limit who can speak to 169.254.169.254 / kube-system namespace / the gke-metadata-server pods themselves from within the k8s cluster(s) running these untrustworthy workloads.

The goal would be to ensure that absolutely no private data could be leaked or exposed to bad actors via the Workload Identity service in GKE. Blocking access to the metadata service IP like this could cause comms issues at the time Sysbox is installed via its DaemonSet, if the NetworkPolicies were not crafted to allow it to reach that special IP / the WI pods.
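For example, something along these lines (purely illustrative; the namespace name is made up and it assumes the cluster has network policy enforcement enabled):

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-metadata-egress
  namespace: untrusted-workloads       # made-up namespace for the untrusted pods
spec:
  podSelector: {}                      # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32     # block the metadata / Workload Identity IP
EOF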

Typically the metadata service is "transparent" if we call it from a Compute Engine VM. However, in GKE it's actually something we can see, restart, log, and instrument through tooling like Prometheus, as it lives in the kube-system namespace.

@jamonation
Contributor Author

🤔 Thinking more about this, the 169.254.169.254 IP is used across cloud providers for instance metadata, e.g. Azure and AWS.

Since my is_gke function is not parsing the response from the metadata server, the resulting check on another cloud provider would treat any response as a positive indicator that the node is part of a GKE cluster, whereas it could very well be running in EKS or AKS or any other environment that has a metadata server at that IP address.

The solution is to check that the HTTP response code is 200, which isn't as clean, but it will at least ensure this logic only applies to GKE nodes (with Workload Identity enabled, as noted). I'll update my PR when I have some time to work on this and test it.
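Something roughly along these lines (a sketch only; the exact flags and variable names will be in the updated diff):

# Sketch: treat the node as GKE only when the Google-flavored metadata
# request returns HTTP 200. Other providers also answer on 169.254.169.254,
# but they shouldn't return 200 for this Google-specific path and header.
check_gke() {
    local http_code
    http_code=$(curl -s -o /dev/null -w "%{http_code}" \
        --connect-timeout 1 \
        -H "Metadata-Flavor: Google" \
        169.254.169.254/computeMetadata/v1/instance/attributes/cluster-name)
    [ "$http_code" -eq 200 ]
}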

@jamonation
Contributor Author

Right, I've pushed an updated check_gke function that looks for an HTTP 200 response from the metadata endpoint. This approach isn't the most robust, as you noted @kevholmes, but it at least covers those clusters with Workload Identity turned on.

To test, I've done this in a Dockerfile and built/pushed a sysbox:v0.6.2-dev image to my artifact registry:

FROM registry.nestybox.com/nestybox/sysbox-deploy-k8s:v0.6.2

COPY my-patched-sysbox-deploy-k8s.sh /opt/sysbox/scripts/sysbox-deploy-k8s.sh

Then I edited the sysbox-deploy-k8s DaemonSet to use my customised sysbox:v0.6.2-dev image. So far so good!
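Roughly (the registry path below is a placeholder for my Artifact Registry repo, and I'm assuming the DaemonSet lives in kube-system):

# Build and push the patched deploy image (registry path is a placeholder).
docker build -t us-docker.pkg.dev/my-project/my-repo/sysbox:v0.6.2-dev .
docker push us-docker.pkg.dev/my-project/my-repo/sysbox:v0.6.2-dev

# Swap the image: field in the DaemonSet to the -dev tag (namespace assumed).
kubectl -n kube-system edit daemonset sysbox-deploy-k8s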

--connect-timeout 1 \
-H "Metadata-Flavor: Google" \
169.254.169.254/computeMetadata/v1/instance/attributes/cluster-name)
if [ $is_cluster -ne 200 ]; then
@ctalledo ctalledo Sep 6, 2023
Member

@jamonation: If possible, please add a comment here on why we are looking for return code 200 (similar to what you wrote down in this PR's conversation thread). Thanks!

@jamonation
Contributor Author

Good call out @ctalledo, I added a comment explaining the logic.

@ctalledo
Member

ctalledo commented Sep 6, 2023

Right, I've pushed an updated check_gke function that looks for an HTTP 200 response from the metadata endpoint. This approach isn't the most robust, as you noted @kevholmes, but it at least covers those clusters with Workload Identity turned on.

Out of curiosity, have you had a chance to try it on a non-GKE cluster? That would be excellent, but if you haven't, we can do it.

@yachub

yachub commented Sep 14, 2023

FWIW, the fork's default branch is behind by a couple of commits, so after checking out your branch I ran git remote add upstream https://github.com/nestybox/sysbox-pkgr.git, git fetch upstream, and git rebase upstream/master, then ran the make commands to build the image.
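That is:

git remote add upstream https://github.com/nestybox/sysbox-pkgr.git
git fetch upstream
git rebase upstream/master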

IMHO I do feel that something like curl -Ls -o /dev/null "http://metadata.google.internal/computeMetadata/v1/instance/image" -H "Metadata-Flavor: Google" && true || false would be a bit more concise as opposed to checking specific HTTP response codes against the IP. GCP docs reference the use of metadata.google.internal, and I wouldn't expect that to resolve on other providers, so curl would return a non-zero exit code. But either certainly has the same result! :)
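In other words, roughly (a sketch, not code from this PR):

# metadata.google.internal shouldn't resolve outside Google infrastructure,
# so curl exits non-zero there and the check falls through.
if curl -Ls -o /dev/null -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/image"; then
    echo "looks like GKE/GCE"
else
    echo "not on GKE"
fi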

@ctalledo
Member

Thanks @yachub; let's go ahead and merge this PR then and close this other PR which does the same thing.

@ctalledo ctalledo left a comment
Member

LGTM ... thanks @jamonation!

@ctalledo ctalledo merged commit fed0ad3 into nestybox:master Sep 15, 2023