Fleet versions
- Discovered: 4.83.0
- Reproduced: 4.83.0
Web browser and operating system: N/A - it's an issue with the Helm chart
🧑💻 Expected behavior
I expected software installers in the web UI of my self-hosted instance in my GKE cluster to work out of the box once the S3 bucket was configured in the values.yml for the Helm chart.
💥 Actual behavior
The web UI of my self-hosted FleetDM instance, running on Google Kubernetes Engine (GKE) in GCP, returns an error stating "file system is read-only" when I attempt to add software to my Fleet library via the software installer functionality. This leaves me unable to install any software packages across any of my fleets.
🛠️ To fix
Triage
I spent a couple of days triaging this issue, including digging into my kube cluster settings, node storage settings, and cluster node pool settings, and attempting to mount an emptyDir at various mount points (including /tmp and /opt/fleet). I even spun up multiple GKE clusters with various node pool and local storage configurations.
I pulled the deployment.yaml generated by the Helm chart directly from my kube cluster and found that it sets spec.template.spec.containers.securityContext.readOnlyRootFilesystem: true on the main fleet container.
This config option mounts the container's root filesystem as read-only, so /tmp (and every other directory that is not backed by a separate writable volume) rejects writes, even though the directory permissions in the container/pod are configured to allow them.
My hypothesis
When I attempt to install a software package like 1Password, the Fleet server attempts to download the package and cache it locally before uploading the package to the bucket I configured in values.yml for long-term storage. Since the entire filesystem is mounted as read-only, however, the server fails when it attempts to cache the software package locally.
Validation
I fixed this issue in my own cluster by pulling the deployment.yml generated by the Helm chart directly from the kube cluster running my Fleet instance after an install, setting spec.template.spec.containers.securityContext.readOnlyRootFilesystem to false, and running kubectl apply on the modified deployment.yml. The software installation functionality now works correctly in my Fleet instance.
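For reference, a minimal sketch of that edit, showing only the relevant part of the pulled manifest. The container name fleet is the one mentioned above; all other fields are omitted, and the deployment name and namespace in the comments are placeholders rather than values taken from the chart.

```yaml
# Excerpt of the pulled Deployment manifest (unrelated fields omitted).
# Pulled with something like:
#   kubectl get deployment <deployment-name> -n <namespace> -o yaml > deployment.yml
spec:
  template:
    spec:
      containers:
        - name: fleet
          securityContext:
            # Rendered as `true` by the Helm chart; changed by hand here.
            readOnlyRootFilesystem: false
# Re-applied with:
#   kubectl apply -f deployment.yml
```

This is a manual workaround made outside the chart's values, not a fix to the chart itself.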
Proposed fix
Add a securityContext.readOnlyRootFilesystem setting to values.yml that the Helm chart can pass through to the deployment.yml template. This will allow users to easily fix this issue for themselves if/when they encounter it while preserving the current default behavior.
Additionally, consider removing spec.template.spec.containers.securityContext.readOnlyRootFilesystem: true from the deployment.yml template to change the default behavior, although this is a less conservative approach that may not be appropriate in all circumstances.
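A rough sketch of what the pass-through could look like. The key name follows the proposal above; the template excerpt is an assumption about how the chart might be wired, not a copy of the chart's actual template.

```yaml
# values.yml (proposed key; defaulting to true preserves current behavior)
securityContext:
  readOnlyRootFilesystem: true
```

```yaml
# Deployment template excerpt (hypothetical; the real template may be structured differently)
containers:
  - name: fleet
    securityContext:
      # Referenced directly rather than through `default true`, because Helm's
      # `default` treats an explicit `false` as empty and would override it.
      readOnlyRootFilesystem: {{ .Values.securityContext.readOnlyRootFilesystem }}
```

With something like this in place, a user who hits the error could set securityContext.readOnlyRootFilesystem to false in their own values file instead of editing the rendered manifest by hand.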
🧑💻 Steps to reproduce
- Deploy the Helm chart to a fresh namespace in a GKE cluster
- I suspect this repro will work in non-GKE clusters as well, since it didn't turn out to be a GCP/GKE-specific issue
- Navigate to the Software > Add Software web UI on a fresh self-hosted instance
- Attempt to add 1Password for macOS (just as an example)
- Note the web UI returns an error stating that the filesystem is read-only
🕯️ More info (optional)
#6990 Related bug report that was closed a few years ago as completed, but the issue either persists or has been reintroduced.