Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ownership change init container with K8s FSGroup #1078

Merged

Conversation

tangledbytes
Copy link
Member

@tangledbytes tangledbytes commented Mar 29, 2023

Explain the changes

This PR attempts to replace FS ownership change init container with K8s FSGroup in an attempt to:

  1. Fix Bug: failing to change ownership of the NFS based PVC for PostgreSQL pod by using kube_pv_chown utility noobaa-core#7030.

FSGroup unfortunately is not a silver bullet:

  • Its usage helps us gets rid of init container but that just delegates the responsibility of changing permissions and setting owner to kubelet. In theory, this should be more performant because CRI doesn't have to spin up a container just so that we can do our permission adjustment rather kubelet does that automatically on the mounted volume.
  • Kubelet will perform 2 actions on the mounted directory
    • Change the group of the owner to the fsGroup passed in pod spec. Kubelet does this simply by running os.Lchown(file, -1, int(*fsGroup)) which is run as a root user. This should work most of the time HOWEVER in the issue linked above, it WILL fail because any operation performed by the root user will be executed as nobody:nogroup by NFS. This does NOT causes any issues however because kubelet just logs the error instead of erroring out. This works out in our favor in THIS case.
    • Change the directory permissions to 0660 | os.ModeSetgid | 0110 which ensures that we get drwxdrwx-- permission on the mounted directory which is good enough for us because we run group ID 0 while os.ModeSetgid set the first octal to 4 implying that directories and files created within this directory will inherit the group ownership. This suffers the same problem in NFS (root_squash) mounts but because kubelet will not error out, it should work just fine.

So this does NOT exactly fixes out problems with NFS mounts but makes sure that our DB pod doesn't ends up in CLBO because of permission issues but will crash if our second init container fails to do its thing (which it will if NFS mount does not provide enough permissions - which I think are available in the above issue's environment)

Reference: Kubernetes FSGroup Source Code

Tests Performed

  1. Deploy and run in fresh Minikube Cluster.
  2. Deploy and run in fresh OpenShift Cluster (AWS gp2 storageclass).
  3. Deploy and run in fresh OpenShift Cluster (AWS NFS mount with drwxdrwxdrwx mount permissions and root_squash enabled).

Issues: Fixed #xxx / Gap #xxx

Fixes noobaa/noobaa-core#7030.

Testing Instructions:

Passing pre-existing CLI tests should be good enough indication as only deployment has changed.

@tangledbytes tangledbytes force-pushed the utkarsh-pro/remove/db-ownership-init branch from 0a34bf5 to 9a82c3c Compare March 30, 2023 02:39
@tangledbytes tangledbytes requested a review from a team March 30, 2023 08:45
@baum
Copy link
Contributor

baum commented Mar 30, 2023

Hello 👋 Ukarsh,

You have described test scenarios verbosely, however all reuse the existing PV. I'm wondering about clean install scenario in OpenShift environment. Is DB pod without ownership change init container able to use a fresh PV which was just provisioned for it?

For instance, for the "2. Normal NooBaa deployment in OpenShift" what happens if we change "ii. Delete the DB Pod." to "ii. Delete the DB Pod and PVC" making sure DB gets a fresh PV?

@tangledbytes
Copy link
Member Author

Hi @baum! That's a good test! I haven't performed it yet. Lemme get back to you once I am done testing this. Please lemme know if there are more tests that I should perform here.

@tangledbytes tangledbytes marked this pull request as draft April 10, 2023 13:34
@tangledbytes
Copy link
Member Author

tangledbytes commented Apr 10, 2023

Under development - ignore for review ATM.

@tangledbytes tangledbytes changed the title Remove ownership change init container Replace ownership change init container with K8s FSGroup Apr 11, 2023
@tangledbytes tangledbytes force-pushed the utkarsh-pro/remove/db-ownership-init branch from faccdf7 to 6a64533 Compare April 11, 2023 10:48
@pull-request-size pull-request-size bot added size/L and removed size/M labels Apr 11, 2023
@tangledbytes tangledbytes force-pushed the utkarsh-pro/remove/db-ownership-init branch from 36a01c7 to 721645d Compare April 11, 2023 12:36
@pull-request-size pull-request-size bot added size/M and removed size/L labels Apr 11, 2023
@tangledbytes tangledbytes marked this pull request as ready for review April 11, 2023 12:52
@tangledbytes
Copy link
Member Author

I don't think that golangci-lint is upset with the PR rather something related to actions/runner-images#7188? 🤔

@tangledbytes tangledbytes requested review from baum, a team and jackyalbo and removed request for a team April 11, 2023 14:11
@tangledbytes tangledbytes force-pushed the utkarsh-pro/remove/db-ownership-init branch from 721645d to 9dcfb0e Compare April 13, 2023 13:44
@tangledbytes tangledbytes requested review from dannyzaken, a team and liranmauda and removed request for a team April 13, 2023 13:54
@baum
Copy link
Contributor

baum commented Apr 16, 2023

I don't think that golangci-lint is upset with the PR rather something related to actions/runner-images#7188? 🤔

Please rebase, hope #1088 fixes the golangci-lint issue

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix fsgroup

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
@tangledbytes tangledbytes force-pushed the utkarsh-pro/remove/db-ownership-init branch from 9dcfb0e to a15e2e5 Compare April 17, 2023 03:15
@tangledbytes
Copy link
Member Author

Thank you @baum! It fixed the test.

@tangledbytes tangledbytes merged commit d5a8531 into noobaa:master Apr 17, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: failing to change ownership of the NFS based PVC for PostgreSQL pod by using kube_pv_chown utility
4 participants