Setting up JupyterHub with HTCondor #125

Closed
nifuki opened this issue Nov 10, 2018 · 6 comments

Comments

@nifuki

nifuki commented Nov 10, 2018

Hi,

I'm trying to make batchspawner work with HTCondor but I'm stuck with the following error:

[I 2018-11-09 13:35:39.816 JupyterHub batchspawner:242] Spawner submitting job using sudo -i -u testuser condor_submit
[I 2018-11-09 13:35:39.816 JupyterHub batchspawner:243] Spawner submitted script:
    
    Executable = /bin/sh
    RequestMemory = 4gb
    RequestCpus = 1
    Arguments = "-c 'exec batchspawner-singleuser --ip=""0.0.0.0""'"
    Remote_Initialdir = /home/testuser
    Output = /home/testuser/.jupyterhub.condor.out
    Error = /home/testuser/.jupyterhub.condor.err
    ShouldTransferFiles = False
    GetEnv = True
    Universe = vanilla
    Queue
    
[I 2018-11-09 13:35:40.119 JupyterHub batchspawner:246] Job submitted. cmd: sudo -i -u testuser condor_submit output: Submitting job(s).
    1 job(s) submitted to cluster 19.
[D 2018-11-09 13:35:40.120 JupyterHub batchspawner:269] Spawner querying job: sudo -i -u testuser condor_q 19 -format "%s, " JobStatus -format "%s, " RemoteHost -format "
    " True
[E 2018-11-09 13:35:40.356 JupyterHub batchspawner:215] Subprocess returned exitcode 1
[E 2018-11-09 13:35:40.357 JupyterHub batchspawner:216] Stdout:
[E 2018-11-09 13:35:40.357 JupyterHub batchspawner:217] b''
[E 2018-11-09 13:35:40.357 JupyterHub batchspawner:218] Stderr:
[E 2018-11-09 13:35:40.357 JupyterHub batchspawner:219] Error: -format requires format and attribute parameters
[E 2018-11-09 13:35:40.357 JupyterHub batchspawner:274] Error querying job 19
[W 2018-11-09 13:35:40.358 JupyterHub batchspawner:372] Job  neither pending nor running.
    
[E 2018-11-09 13:35:40.359 JupyterHub user:477] Unhandled error starting testuser's server: The Jupyter batch job has disappeared while pending in the queue or died immediately after starting.
[D 2018-11-09 13:35:40.373 JupyterHub user:578] Deleting oauth client jupyterhub-user-testuser
[E 2018-11-09 13:35:40.410 JupyterHub web:1670] Uncaught exception GET /hub/user/testuser/ (159.93.40.25)
    HTTPServerRequest(protocol='http', host='jupyterhub.jinr.ru', method='GET', uri='/hub/user/testuser/', version='HTTP/1.1', remote_ip='159.93.40.25')
    Traceback (most recent call last):
      File "/usr/share/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
        result = yield result
      File "/usr/share/anaconda3/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 1052, in get
        await self.spawn_single_user(user)
      File "/usr/share/anaconda3/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 705, in spawn_single_user
        timedelta(seconds=self.slow_spawn_timeout), finish_spawn_future
      File "/usr/share/anaconda3/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 626, in finish_user_spawn
        await spawn_future
      File "/usr/share/anaconda3/lib/python3.7/site-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/usr/share/anaconda3/lib/python3.7/site-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/usr/share/anaconda3/lib/python3.7/site-packages/batchspawner/batchspawner.py", line 373, in start
        raise RuntimeError('The Jupyter batch job has disappeared'
    RuntimeError: The Jupyter batch job has disappeared while pending in the queue or died immediately after starting.

The condor_q command succeeds when run manually:

# sudo -i -u testuser condor_q 19 -format "%s, " JobStatus -format "%s, " RemoteHost -format "\n" True
1,

# echo $?
0
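For reference, the "1," printed above is the JobStatus attribute, where 1 means Idle and 2 means Running. RemoteHost contributes nothing, presumably because the idle job has not been matched to a machine yet. Below is a minimal Python sketch of how output like this could be classified as pending or running. The regular expressions and the classify and status_name helpers are hypothetical, modeled on the output format rather than copied from batchspawner. An empty result, like the one from the failed query in the log, matches neither pattern, which fits the "neither pending nor running" warning.

```python
import re

# A subset of HTCondor's numeric JobStatus codes.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}

# Hypothetical patterns for output of the form "<JobStatus>, <RemoteHost>, ".
PENDING_RE = re.compile(r"^1,")
RUNNING_RE = re.compile(r"^2,")


def classify(output: str) -> str:
    """Return 'pending', 'running' or 'unknown' for one line of condor_q output."""
    if PENDING_RE.match(output):
        return "pending"
    if RUNNING_RE.match(output):
        return "running"
    # Empty output (e.g. from a failed query) matches neither pattern.
    return "unknown"


def status_name(output: str) -> str:
    """Map the leading JobStatus number to a readable name."""
    code = int(output.split(",", 1)[0])
    return JOB_STATUS.get(code, f"other ({code})")


print(classify("1, "), status_name("1, "))      # pending Idle
print(classify("2, slot1@node01.example, "))    # running
print(classify(""))                             # unknown
```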

I'm using the latest batchspawner (from master):

# pip list |grep batchspawner
batchspawner                       0.9.0.dev0

And the spawner configuration:

c.JupyterHub.spawner_class = 'batchspawner.CondorSpawner'
c.Spawner.http_timeout = 120

c.BatchSpawnerBase.req_nprocs = '1'
c.BatchSpawnerBase.req_memory = '1gb'
c.BatchSpawnerBase.req_runtime = '12:00:00'

c.CondorSpawner.exec_prefix = 'sudo -i -u {username}'

What could be causing this error?

Thanks

@nifuki
Author

nifuki commented Nov 12, 2018

I think I figured it out: it is due to the -format "\n" True at the end of the job query command. With it removed, jobs now get monitored.
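The line break inside the logged query (between -format " and " True) suggests the command string already contains a real newline character: in an ordinary (non-raw) Python string literal, "\n" is an escape for a newline. The sketch below reproduces this with a query template reconstructed from the log (an assumption, not copied from batchspawner's source) and shows the workaround of dropping the last -format. It does not show what sudo -i and the login shell then do with that newline.

```python
# Query template reconstructed from the log; a regular string, so "\n" is a newline.
query = 'condor_q {job_id} -format "%s, " JobStatus -format "%s, " RemoteHost -format "\n" True'

# Workaround from this thread: drop the trailing -format "\n" True.
fixed = 'condor_q {job_id} -format "%s, " JobStatus -format "%s, " RemoteHost'

# The expanded query splits into two lines, exactly where the log breaks.
for line in query.format(job_id=19).splitlines():
    print(repr(line))

print("\n" in query, "\n" in fixed)  # True False
print(fixed.format(job_id=19))
```

If the installed batchspawner exposes the query command as a configurable trait (batch_query_cmd is assumed here; check your version), the shortened command could presumably be set in jupyterhub_config.py instead of editing the package.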

@nifuki
Author

nifuki commented Nov 12, 2018

I'm still struggling to make it work: the job is submitted now, but the server fails to start. Here is the command that is executed on the node: /bin/sh -c exec' '/usr/share/miniconda3/bin/batchspawner-singleuser' '--ip="0.0.0.0"

And in .jupyterhub.condor.err I only see:
JUPYTERHUB_API_TOKEN env is required to run jupyterhub-singleuser. Did you launch it manually?

Maybe the environment is not set properly, but I can't figure out how to set this up.

Any help appreciated.

Thanks.

@nifuki
Author

nifuki commented Nov 14, 2018

Found the problem: sudo wasn't passing the environment variables. I changed the exec_prefix to sudo -E -u {username} and it now works, so I'm closing the issue.
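This fits the JUPYTERHUB_API_TOKEN error above: with GetEnv = True in the submit script, the job presumably inherits condor_submit's environment, so anything sudo drops never reaches the node. As a rough analogy (plain subprocess calls, not sudo itself), a child process sees the token when the parent's environment is passed through, as with sudo -E, and loses it when it starts from a fresh, minimal environment, as with the login shell of sudo -i. The token value is a placeholder, and what sudo really keeps also depends on the sudoers policy (env_reset, env_keep).

```python
import os
import subprocess
import sys

# Child script: report whether the token is visible in its environment.
CHILD = "import os; print('JUPYTERHUB_API_TOKEN' in os.environ)"


def token_visible(env: dict) -> bool:
    """Run a child Python process with `env` and return what it reports."""
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"


# Parent environment with a dummy token (placeholder, not a real token).
parent = dict(os.environ, JUPYTERHUB_API_TOKEN="dummy-token")

# Analogous to sudo -E: the whole environment is passed through.
print(token_visible(parent))                      # True

# Analogous to sudo -i: the child starts from a fresh, minimal environment.
print(token_visible({"PATH": "/usr/bin:/bin"}))   # False
```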

@nifuki nifuki closed this as completed Nov 14, 2018
@mbmilligan
Member

OK, thanks for the follow-up. Quick question: had you changed the exec_prefix in your configuration file? sudo -E is supposed to be in the default prefix, so it would be helpful to know if that default setting is getting broken somehow.

Thanks!
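For comparison, the submit command in the original log (sudo -i -u testuser condor_submit) is what one gets by expanding the exec_prefix set in the first post's configuration, not the sudo -E default. A minimal sketch; full_command is a hypothetical helper, and the way batchspawner actually joins the prefix and the command is assumed.

```python
# Prefix templates taken from this thread: sudo -E is described as the
# default, while the first post's config overrides it with sudo -i.
DEFAULT_PREFIX = "sudo -E -u {username}"
CONFIGURED_PREFIX = "sudo -i -u {username}"


def full_command(prefix: str, cmd: str, username: str) -> str:
    """Hypothetical helper: expand the prefix and prepend it to a batch command."""
    return f"{prefix.format(username=username)} {cmd}"


print(full_command(CONFIGURED_PREFIX, "condor_submit", "testuser"))
print(full_command(DEFAULT_PREFIX, "condor_submit", "testuser"))
```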

@loadnabox

I know I'm a bit late to this party, but I'm having a very similar issue.

I'm trying to run a centralized JupyterHub server for all users, so I'm executing it as root. Our environment very carefully sets environment variables on login. If those variables are changed, jobs do not run properly.

So I'm stuck: using -E passes the JUPYTERHUB_API_TOKEN env properly, but nothing will run (not even batchspawner-singleuser), because the commands that load the needed packages and modules break under the overridden env.

If I use -i, the packages and modules load properly, but the JUPYTERHUB_API_TOKEN env is no longer passed, so the compute node fails to connect to or register with JupyterHub.

Advice would be greatly appreciated.
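Conceptually, the combination wanted here is the login environment from -i plus only the JupyterHub-specific variables from the hub. The sketch below just shows that merge on plain dictionaries; merge_env and all variable values are made up for illustration, and it is not a sudo or batchspawner feature. Achieving it in practice might involve something like a sudoers env_keep entry or a wrapper script, neither of which is tested here.

```python
def merge_env(login_env: dict, hub_env: dict, prefix: str = "JUPYTERHUB_") -> dict:
    """Start from the login environment; overlay only hub variables named prefix*."""
    merged = dict(login_env)
    merged.update({k: v for k, v in hub_env.items() if k.startswith(prefix)})
    return merged


# Made-up example values.
login_env = {"PATH": "/opt/site/bin:/usr/bin", "MODULEPATH": "/opt/site/modules"}
hub_env = {
    "PATH": "/usr/share/anaconda3/bin:/usr/bin",
    "JUPYTERHUB_API_TOKEN": "dummy-token",
    "JUPYTERHUB_API_URL": "http://hub.example:8081/hub/api",
}

merged = merge_env(login_env, hub_env)
print(merged["PATH"])                  # login value kept
print(merged["JUPYTERHUB_API_TOKEN"])  # hub value added
```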

@heavenkong

I have the same issue. How did you solve this problem?


4 participants