
[BUG] - Mounting an NFS share as the user's home locks up ipykernel #1820

Closed
antofthy opened this issue Nov 1, 2022 · 8 comments
Labels
type:Bug A problem with the definition of one of the docker images maintained here

Comments


antofthy commented Nov 1, 2022

What docker image(s) are you using?

minimal-notebook

OS system and architecture running docker image

RHEL7 x86_64

What Docker command are you running?

It runs in a swarm environment...

docker service create --detach --init --restart-condition=none --replicas 1 --name s357751 --hostname s357751.**domain**  --env HOME=/home/jovyan  --network ingress_net jupyter/minimal-notebook:latest

and then with a bind mount of a directory on an NFS mount:

docker service create --detach --init --restart-condition=none --replicas 1 --name s357751 --hostname s357751.**domain** --mount type=bind,src=/export/s357751/jovyan,dst=/home/jovyan --env HOME=/home/jovyan  --network ingress_net jupyter/minimal-notebook:latest

The contents of the directory have the correct uid:gid of 1000:100 and are an exact replica of /home/jovyan within the image.
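For anyone reproducing this setup, a quick way to confirm the ownership claim on the host is to walk the exported directory and list anything not owned by 1000:100. This is a minimal sketch in Python; `find_wrong_owner` is a hypothetical helper, not part of docker-stacks, and it uses `os.lstat` so symlinks are checked themselves rather than followed.

```python
import os
from pathlib import Path


def find_wrong_owner(root, uid=1000, gid=100):
    """List paths under root (including root itself) not owned by uid:gid.

    Hypothetical helper: os.lstat is used so a symlink is checked itself,
    not the file it points to.
    """
    root = Path(root)
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(Path(dirpath) / name for name in dirnames + filenames)
    bad = []
    for path in paths:
        st = os.lstat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            bad.append(path)
    return bad
```

For example, `find_wrong_owner("/export/s357751/jovyan")` should return an empty list when every entry is owned by 1000:100.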

How to Reproduce the problem?

The first command above (no mount, just the image itself) works perfectly fine.
I get the token from the logs, log in, and can run ipykernel notebooks.

The second command (with a local bind mount of an NFS mount) also logs in fine: Lab pages are visible and terminals can be opened. But as soon as you try to run an ipykernel, the web site locks up.
The server is running but no longer responds to any queries, from anywhere.
The log output ends with the line "[I 2022-10-31 16:47:10.952 ServerApp] Creating new notebook in" and goes no further.

If the directory is mounted elsewhere, I can see and access it, and it looks perfectly fine, with the expected UIDs and files.
Mounting just the work sub-directory also works fine, even if you start Jupyter Notebook while Lab is in the mounted work sub-directory. As such, it likely involves the home 'dot' files.

This system of mounting has been used for many other Docker environments and has been in use by users for more than three years with code-server, NPM, Apache, and Laravel software. Only Jupyter Notebook locks up, despite many attempts to get it working, on and off, over several years. This is the first time I have traced the problem to the mounted home: it only happens with the mounted home and works fine without it.

Command output

No response

Expected behavior

The ipykernel starts and the notebook page appears.

Actual behavior

The notebook page does not appear, and a refresh times out while reading data.

Anything else?

The NFS mount that Docker bind-mounts from is:

storage.**domain**:/homes on /export type nfs (rw,nosuid,nodev,noatime,nodiratime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.250.201.58,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.250.201.58)

It is used for the home directories of many other Docker containers for our users (running code-server).

The only explanation I can think of is a lock file that does not work on the NFS mount Docker is bind-mounting from.
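One way to test the lock-file theory directly is to try taking a POSIX (fcntl) lock on a scratch file in the mounted directory. This is a minimal sketch assuming Python 3 on a Unix host; `probe_posix_lock` and the `.lock-probe` file name are hypothetical. It runs the lock attempt in a child process with a timeout because, depending on the mount options and the state of lockd, the lock call may fail outright or block indefinitely.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: take and release an exclusive POSIX (fcntl) lock
# on a scratch file inside the directory given as its first argument.
_CHILD = textwrap.dedent(
    """
    import fcntl, os, sys
    path = os.path.join(sys.argv[1], ".lock-probe")
    with open(path, "w") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # may block or fail if NFS locking is broken
        fcntl.lockf(f, fcntl.LOCK_UN)
    os.remove(path)
    """
)


def probe_posix_lock(directory, timeout=10.0):
    """Return "ok", "error" or "timeout" for a lock attempt in directory."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, str(directory)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best effort: a child stuck in an NFS wait may not die promptly,
        # so we do not wait for it again here.
        proc.kill()
        return "timeout"
    return "ok" if returncode == 0 else "error"
```

Getting "error" or "timeout" from `probe_posix_lock("/home/jovyan")` inside the container, while a local-disk directory returns "ok", would point at NFS locking rather than at Jupyter itself.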

@antofthy antofthy added the type:Bug A problem with the definition of one of the docker images maintained here label Nov 1, 2022
antofthy (Author) commented Nov 1, 2022

UPDATE:
It only happens when mounting from the NFS home share, which is a requirement for a full swarm service, where the Docker container may run on any of a number of hosts.
Running with the home mounted from the host machine's local disk works fine! I can run an ipykernel (notebook) without problems.

docker service create --detach --init --restart-condition=none --replicas 1 --name s357751 --hostname s357751.**domain** --mount type=bind,src=/home/jovyan,dst=/home/jovyan --env HOME=/home/jovyan --network ingress_net jupyter/minimal-notebook:latest

Mount info...

# mount | grep /home
/dev/mapper/rootvg-lv_home on /home type xfs (rw,nosuid,nodev,noatime,nodiratime,attr2,inode64,noquota)

So it is likely something about NFS, perhaps some sort of lock file failure. NFS is known for file-locking problems.

antofthy (Author) commented Nov 1, 2022

UPDATE 2

At the moment of failure, the system reports:
lockd: cannot monitor storage.**domain**

So is there anything that can be done? I have no control over the source of the storage.

Bidek56 (Contributor) commented Nov 1, 2022

Have you looked at SO?

antofthy (Author) commented Nov 2, 2022

Have you looked at SO?

Yes, I did. It is not relevant here:
1/ The whole home is being mounted, not just the "work" sub-directory (this will be used for multiple users, i.e. students).
2/ UID/GID is not a problem. As mentioned, I was very careful in ensuring the UID/GID was correct.
3/ Everything works fine without a bind mount, or with a bind mount not involving NFS.

I did track it down to being caused by NFS file locking. If I can determine which file is being locked, I could possibly move it out of the home directory (with a directory symlink?).

I am also investigating getting the NFS file locking working, but I don't have control of the source systems.

So the issue really boils down to:

  • What file is being locked by notebook?
  • Can something be done from the jupyter side?

Bidek56 (Contributor) commented Nov 2, 2022

  1. If you are trying to set up a multi user learning env, I would suggest using Google Colab and avoid all the setup issues.
  2. Jupyter NFS issues should be asked/addressed on SO or Jupyter Forum not on GH. :)

antofthy (Author) commented Nov 3, 2022

Thank you Bidek56, for the suggestion about asking on the Jupyter Forum.

A user there (big thanks to kevin-bates) was able to point me to some code documentation about NFS problems, which resolved the problem! See the forum thread "Notebook fails on NFS mount, lockd not available".

For reference, it is caused by the SQLite history database used by ipykernel (see the ipykernel issue), with the solution described in the code comment: "SQLite is known to misbehave on some NFS mounts".
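The code comment referred to is the help text of IPython's `HistoryManager.hist_file` option, which suggests pointing the history database at a local disk, or using the special value `:memory:` so that no history file is created at all. Below is a minimal sketch of applying that to a mounted home. It assumes the default IPython directory (`~/.ipython`, i.e. `IPYTHONDIR` is not set) and the default profile. `disable_sqlite_history` is a hypothetical helper that appends the setting to `ipython_kernel_config.py`, the config file read by IPython kernels.

```python
from pathlib import Path

# Uses IPython's HistoryManager.hist_file option; ":memory:" keeps the
# history database in RAM, so no SQLite file is created on the NFS home.
HIST_LINE = 'c.HistoryManager.hist_file = ":memory:"\n'


def disable_sqlite_history(home):
    """Append HIST_LINE to the default profile's kernel config (idempotent).

    Assumes ~/.ipython (IPYTHONDIR unset) and the default profile.
    """
    cfg = Path(home) / ".ipython" / "profile_default" / "ipython_kernel_config.py"
    cfg.parent.mkdir(parents=True, exist_ok=True)
    existing = cfg.read_text() if cfg.exists() else ""
    if HIST_LINE not in existing:
        if existing and not existing.endswith("\n"):
            existing += "\n"
        cfg.write_text(existing + HIST_LINE)
    return cfg
```

Run it as the notebook user (or chown the result to 1000:100 afterwards), e.g. `disable_sqlite_history("/home/jovyan")`. The trade-off is that command history is no longer kept between kernel sessions. Pointing `hist_file` at a path on local disk instead (the IPython help text uses /tmp/ipython_hist.sqlite as its example) keeps history for the container's lifetime.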

@antofthy antofthy closed this as completed Nov 3, 2022
antofthy (Author) commented Nov 3, 2022

Addendum: perhaps this should be added to the docker-stacks troubleshooting docs with regard to NFS-mounted homes?

Bidek56 (Contributor) commented Nov 3, 2022

Feel free to submit a PR; this stack has documentation and recipe sections. Perhaps add an entire recipe based on your findings.
I am glad you were able to resolve your issue.
