New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #285 and make few tweaks to notebook Dockerfile #311
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
note @jlewi - we'll need to re-push images to hub after it merges |
done | ||
fi | ||
|
||
# Ensure pv directory belongs to user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you no longer changing the ownership of the CONDA_DIR and JULIA_PKG_DIR.
Can you explain what the new code does differently? It looks like the only difference is that instead of doing chown with the UID we use the username now.
Do you know why that makes a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Difference is that, instead of changing only CONDA_DIR and JULIA_PKG_DIR (both of which lives in /home/$NB_USER) we chown whole /home/$NB_USER - most importantly /home/$NB_USER/work
Because we change user here in the script (line 48) and not in kubernetes manifest, PV isn't connected with correct user ownership, it assumes root (if you look at our notebook pod manifesto, it explicitly runs container as UID 0). I'm doing chown in startup script instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why run the container as UID 0? Should we not be doing that?
The original code was
for d in "$CONDA_DIR" "$JULIA_PKGDIR" "/home/$NB_USER"; do
So it chown'd the whole /home/$NB_USER dir.
It looks like the difference is that you are now doing this always as opposed to
if [ "$NB_UID" != $(id -u $NB_USER) ]
What I'm inferring from this is that NB_UID = id NB_USER therefore in the current startup script we are not executing the chown.
However, for some reason the permsions on /home/${NB_USER} are not set correctly to allow that user to use that container.
If UID is 0, doesn't that mean we are running as root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you're right, I'll restore CONDA_DIR etc, issue was "if" above.
I think we shouldn't run it as root, maybe ensure perms for these dirs in Dockerfile. I'll open separate issue for that one tho as someone did that on purpose:)
Yeah we run pod as root, but start.sh does su $NB_USER <> so we run jupyter itself as NB_USER. Not a typical way to deal with it tbh.
/assign |
/assign jlewi |
Besides adding chown /home/$USER also move non-install layers lower in Dockerfile to make better use of docker cache during edits
If we run as root ( #300 ) this issue wouldn't happen too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but can you update the PR description and include links to relevant issues and the slack link as well.
Done, I linked #101 which has slack link in discussion |
/assign @willb Adding Will because apparently running as root causes problems for open shift. |
@inc0 clued me in
|
* Update Kubeflow org admin * Update org.yaml
Besides adding chown /home/$USER also move non-install layers lower in
Dockerfile to make better use of docker cache during edits
Later we will probably switch to root user (issue #300), so it will make this whole section redundant.
Aslo touches #101
Fixes #285
This change is