
Possible Error in Docs: Incorrect Config suggested in docs related to integration of EFS storage and EKS IAM #3280

Open
neoakris opened this issue Nov 28, 2023 · 2 comments

neoakris commented Nov 28, 2023

Background Context:

  • I've never used or installed Jupyter Notebook before.
  • I ran into this on a 4-hour troubleshooting call while helping another engineer debug their environment.
  • The details I'll post are my recollection of a troubleshooting call that ultimately fixed the issue. The notes will be incomplete, as I'm going off memory of bits and pieces seen over screen share, but I figured it'd be worth documenting the relevant notes I can recall while they're still fresh in my head, in case this helps anyone else in the future.
  • Since this wasn't my environment, and it's a tool I'm not familiar with:
    • I don't know how to reproduce it,
    • and I'm limited in the amount of detail I can share.

Bug description

The following doc seems to suggest a problematic configuration:
https://z2jh.jupyter.org/en/stable/kubernetes/amazon/efs_storage.html

The gist of the config problem is:

  • If you follow the doc's recommended config, it does make EFS storage access work.
  • BUT that config causes an integration issue: if you were using EKS IAM from the Jupyter Notebook web terminal, that stops working.

Here's a screen-snip of what the docs looked like at the time of the issue:
[Screenshot of the EFS storage docs page; image not preserved]

Proposed change (a better suggested configuration):

I think the recommended config should look more like:
uid: 0
fsGid: 100 (or blank/omitted entirely)
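A minimal sketch of the proposed override as a z2jh Helm values fragment, written out from the shell. Assumptions: both keys sit under singleuser (the chart's singleuser.uid and singleuser.fsGid), the file name efs-iam-config.yaml is hypothetical, and the rest of the EFS guide's settings (storage, CHOWN_HOME, etc.) are assumed unchanged in your main config and are not shown here.

```shell
# Hypothetical file name; pass it as an extra --values file to your usual
# helm upgrade command so it overrides the matching keys in your main config.
cat > efs-iam-config.yaml <<'EOF'
singleuser:
  # Start as root so the image's startup scripts can chown the EFS home dir
  # before switching to jovyan (depends on the image, per the notes below).
  uid: 0
  # Keep the "users" group (GID 100) instead of the doc's fsGid: 0; this
  # matches the 0:100 token ownership seen in the working setup.
  # Per the report, leaving fsGid out entirely also worked.
  fsGid: 100
EOF
```

Helm merges multiple --values files in order, with later files winning, so this fragment only needs the two keys being changed.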

How to reproduce

  • I can't; see Background Context.

Your personal set up

  • I'm limited in the details I can share (see Background Context).
  • Running on EKS with EFS
  • Sometimes commands like aws sts get-caller-identity were run from the web terminal.
    So the web terminal was leveraging both:
    • AWS IAM integration
    • AWS EFS mount point integration.

Expected behavior

  • If a user has a workflow that uses AWS CLI commands in Jupyter Notebook's web terminal
    (like aws sts get-caller-identity)
  • and they then add EFS support by following the EFS support docs, that should not make EFS work at the expense of breaking EKS IAM. Both should work.

Actual behavior

  • If you have EKS IAM working and then add EFS support, following the docs makes EFS work but breaks IAM.
  • Notes from a post-call brain dump follow (hopefully this helps others running into a similar issue debug faster in the future).

Notes:

  • Adding
    uid: 0
    fsGid: 0
    CHOWN_HOME: "yes"
    which was mentioned in the docs:
    https://z2jh.jupyter.org/en/stable/kubernetes/amazon/efs_storage.html
  • Adding the above fixed EFS but broke IAM.
  • By IAM breaking I mean:
    Running aws sts get-caller-identity in the Jupyter web interface's terminal would fail with a filesystem permission error:
    [Errno 13] Permission denied: '/var/run/secrets/eks.amazonaws.com/serviceaccount/token'
  • Important detail: The Docker image had logic in startup.sh & single user.sh to switch the user from root to jovyan at startup; the temporary root access was likely used to change the ownership of home (EFS) to the jovyan user.
  • The reason it was broken seemed to be that the script changed the active shell user, and the ownership of most files on the container's file system, to jovyan, BUT a key IAM-related file was still owned by root as a result of the container starting off as the root user:
    /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    was owned by root user and group, root:root
    (per ls -lah /var/run/secrets/eks.amazonaws.com/serviceaccount/token),
    so the jovyan user didn't have access.
    (We played around a bit and found that even if you override the Kubernetes YAML defaults that mark it as read-only, it stays read-only by nature, so its permissions can't be updated at run time; they're only established at container creation time.)
  • We did discover a hacky workaround that allowed both (EFS and IAM) to work at the same time using the settings recommended in the doc (uid: 0, fsGid: 0).
    The workaround involved:
    • updating 2 settings to enable sudo in the container
      (https://z2jh.jupyter.org/en/stable/resources/reference.html#singleuser-storage-static
      was used as a point of reference):
      singleuser.allowPrivilegeEscalation was set to true,
      and we had to enable some setting in another spot that I can't recall off the top of my head.
      That allowed the following commands to work:
    • sudo cp /var/run/secrets/eks.amazonaws.com/serviceaccount/token /home/jovyan/token
    • sudo chown $USER:$USER /home/jovyan/token
    • export AWS_WEB_IDENTITY_TOKEN_FILE=/home/jovyan/token
    • aws sts get-caller-identity
    • (Before this, aws sts get-caller-identity threw a filesystem permission error when whoami returned jovyan.)
    • The above allowed both EFS and IAM to work. It was a hacky manual workaround, but it at least proved it was possible for both to work at the same time.
  • Removing the newly added settings (uid: 0, fsGid: 0) that the docs (https://z2jh.jupyter.org/en/stable/kubernetes/amazon/efs_storage.html) suggested adding to make EFS work brought EFS back into a broken state but fixed IAM (basically a rollback to the previous config).
    • When IAM was working
      ls -lah /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      showed jovyan had access to it.
      (I think the file system permissions were set to user_id:group_id, 1000:100, which would correspond to jovyan:users)
  • Through trial and error, we discovered a solution that allowed both (EKS IAM and EFS storage mount) to work at the same time.
    • We went against what the docs recommended and set uid: 0 with fsGid left blank (which I think falls back to a default of fsGid: 100, the filesystem group named users).
    • I think we also left the root-enabling settings (e.g. singleuser.allowPrivilegeEscalation) enabled from our testing, but I don't recall whether that was actually needed.
    • That allowed both AWS IAM calls and EFS file system access to work. Some observations of the setup, as best I recall:
      • Even though uid: 0 was set, startup scripts built into the container meant that the user you got when opening an interactive web terminal from the web UI was jovyan, as checked with whoami.
      • With that working config, ls -lah /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        showed 0:100 (owned by root (0), with the users group (100) granted access), which allowed AWS CLI commands run as jovyan to keep working.
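The ownership checks from these notes could be bundled into one function to paste into the web terminal. This is a sketch: the name inspect_token is hypothetical, it defaults to AWS_WEB_IDENTITY_TOKEN_FILE (the variable the AWS CLI reads for web identity credentials) and falls back to the EKS path from the notes, and it assumes GNU coreutils stat is available in the image.

```shell
# Hypothetical helper: show who you are and whether you can read the
# web identity token file the AWS CLI will use.
inspect_token() {
  local default_path=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
  local token="${1:-${AWS_WEB_IDENTITY_TOKEN_FILE:-$default_path}}"

  echo "whoami: $(id -un) (uid=$(id -u), groups=$(id -G))"
  if [ ! -e "$token" ]; then
    echo "missing: $token"
    return 2
  fi
  # -L follows symlinks so the real file's numeric owner:group and mode are
  # shown (GNU stat syntax assumed).
  echo "token: $token owner=$(stat -L -c '%u:%g' "$token") mode=$(stat -L -c '%a' "$token")"
  if [ -r "$token" ]; then
    echo "readable by $(id -un)"
  else
    echo "NOT readable by $(id -un): a likely cause of [Errno 13] from the AWS CLI"
    return 1
  fi
}
```

Running inspect_token with no arguments as jovyan should, going by the notes, hit the "NOT readable" branch in the root:root setup and the readable branch in the 0:100 setup, where access comes through the users group.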
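The manual sudo workaround from the notes can be sketched as a single function. Assumptions: the name copy_token_for_user is hypothetical, sudo must already work in the container (singleuser.allowPrivilegeEscalation plus the other setting mentioned above), and setting SUDO="" skips sudo entirely, which is only useful for testing.

```shell
# Hypothetical helper wrapping the copy/chown/export workaround.
# SUDO defaults to "sudo"; set SUDO="" to run the same steps without it.
SUDO="${SUDO-sudo}"

copy_token_for_user() {
  local src="${1:-/var/run/secrets/eks.amazonaws.com/serviceaccount/token}"
  local dest="${2:-$HOME/token}"

  # Copy the root-owned token to a path the current user can own.
  $SUDO cp "$src" "$dest" || return 1
  # Numeric IDs avoid depending on a group named after the user existing.
  $SUDO chown "$(id -u):$(id -g)" "$dest" || return 1
  # Keep the credential private to this user.
  chmod 600 "$dest" || return 1
  # Point the AWS CLI/SDKs at the copy for this shell session.
  export AWS_WEB_IDENTITY_TOKEN_FILE="$dest"
}
```

After copy_token_for_user, aws sts get-caller-identity should pick up the copy. One caveat with this design: the copy is a one-time snapshot, while the mounted token is periodically refreshed by the kubelet, so the copy will likely go stale once the original expires and the function would need re-running. That is part of why the config change above is the better fix.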
neoakris added the bug label Nov 28, 2023
manics (project member) commented Nov 28, 2023

Thanks for the notes. I think the EFS doc is aimed at a relative newcomer to AWS. If you've got more experience of AWS and you've got some time it might be worth checking the latest offerings from AWS. For example, it looks like there's an EKS CSI driver for EFS: https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html
If you have a chance to look at this please let us know if it's a better replacement for the current instructions!
