LSF integration issue #118
If you try submitting that batch script as the user the hub is running as, does it work? Depending on your config, this may be with a sudo wrapper or not; I'm not sure of the requirements for your cluster. The log line right above should have said what command was used to submit the job, which would help tell what is happening (can your user sudo to itself and lose something which is required to submit jobs?). But since it sounds like you are running as your username and spawning your username, the above may not apply. A more likely explanation: some environment variables needed for submission are missing. For example, on my cluster the job is submitted with "sudo -u {username}", and no further environment (like a Kerberos ticket) is required to submit jobs.
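One way to inspect or change the sudo wrapper mentioned above is batchspawner's `exec_prefix` trait. A minimal sketch of a `jupyterhub_config.py`, assuming batchspawner's `LSFSpawner` (the choice of dropping the prefix is an illustration for the run-as-yourself setup in this thread, not a general recommendation):

```python
# jupyterhub_config.py -- hedged sketch; site specifics are assumptions,
# not taken from this thread.
c = get_config()  # noqa: F821 (injected by JupyterHub at load time)

c.JupyterHub.spawner_class = 'batchspawner.LSFSpawner'

# batchspawner prepends exec_prefix to every submit/query/cancel call.
# The default is 'sudo -E -u {username}'; -E preserves the hub's
# environment, which LSF's eauth may depend on.
# When the hub already runs as the target user, it can be dropped:
c.BatchSpawnerBase.exec_prefix = ''
```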
Thank you for the feedback. JupyterHub is running as myself. We've made progress by defining an extra environment variable, EGO_CONFDIR, and adding it to the environment in the .py config. I have limited LSF to the local server for now; I can now log in, but the notebook shuts down after 30 s. Any idea how to address this? We will then test the hub running under root and more servers.
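For reference, one standard place to define such a variable is JupyterHub's `Spawner.environment` trait, which batchspawner also exports to the submitted batch job. A sketch (the EGO_CONFDIR path is a placeholder, not the poster's actual value):

```python
# jupyterhub_config.py -- hedged sketch; the path is a hypothetical
# placeholder, not the site's real value.
c = get_config()  # noqa: F821 (injected by JupyterHub at load time)

# Entries here are merged into the single-user server's environment,
# and batchspawner passes them along to the job it submits.
c.Spawner.environment = {
    'EGO_CONFDIR': '/path/to/lsf/conf',  # placeholder path
}
```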
To clarify: EGO_CONFDIR was added where? Is this a generic LSF variable or something special to your site? Assuming the job runs and the singleuser server starts, JupyterHub will automatically stop the server if it doesn't get a positive response that it's up; this should be visible in the logs. It connects back over the hub's API. If I get JH logs that show when it starts and is cancelled, I can possibly say more. Also, the singleuser server logs (stdout of the batch job) give important clues about whether it can't connect back.
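The 30-second shutdown matches JupyterHub's default HTTP startup check. A sketch of the standard traits involved (the address below is a placeholder, not taken from this thread):

```python
# jupyterhub_config.py -- hedged sketch of the connectivity/timeout knobs.
c = get_config()  # noqa: F821 (injected by JupyterHub at load time)

# Address the spawned server uses to reach the hub; on a multi-host
# cluster it must be routable from the execution nodes.
c.JupyterHub.hub_connect_ip = '10.0.0.1'  # placeholder address

# How long the hub waits for the single-user server's HTTP endpoint
# to respond (default 30 s), and for Spawner.start() to return.
c.Spawner.http_timeout = 120
c.Spawner.start_timeout = 300
```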
Is this working enough (or no longer relevant) so that it can be closed now? |
As an aside, we have batchspawner working with LSF.
Hi,
We have an LSF cluster that has been successfully tested with a Python bsub submission. JupyterHub / Anaconda installed on the same server works wonderfully with the out-of-the-box configuration.
I switched to batchspawner to test LSF, which comes as part of a SAS Grid installation.
To keep things simple, jupyterhub is running under my own account to avoid any spawning issues. I have replaced my username with myusername.
```
[I 2018-09-05 11:42:27.775 JupyterHub batchspawner:189] Spawner submitted script:
#!/bin/sh
#BSUB -R "select[type==any]" # Allow spawning on non-uniform hardware
#BSUB -R "span[hosts=1]"     # Only spawn job on one server
#BSUB -q
#BSUB -J spawner-jupyterhub
#BSUB -o /home/myusername/.jupyterhub.lsf.out
#BSUB -e /home/myusername/.jupyterhub.lsf.err
[D 2018-09-05 11:42:27.780 JupyterHub base:427] 0/100 concurrent spawns
[D 2018-09-05 11:42:27.781 JupyterHub base:430] 0 active servers
...
[E 2018-09-05 11:27:08.847 JupyterHub user:427] Unhandled error starting myusername's server: /opt/sas94/thirdparty/platform/lsf/9.1/linux2.6-glibc2.3-x86_64/etc/eauth: read conf error!
Failed in an LSF library call: External authentication failed. Job not submitted.
```