404 errors when combined with wrapspawner #129

Closed
loadnabox opened this issue Feb 1, 2019 · 5 comments

@loadnabox

loadnabox commented Feb 1, 2019

Apologies if this is the wrong place. I've been struggling through the documentation and hit a brick wall; this was my best guess for where to ask for help.

Scenario:

  • Slurm HPC cluster
  • jupyterhub server is a VM that is also a registered Slurm submission node (slurm jobs can be submitted directly from the jupyterhub node)
  • conda, jupyterhub, etc are located on a shared drive for easy access to all the compute nodes
  • modules are used to preserve multiple versions of software packages
  • jupyterhub has been given its own private anaconda module

Problem:
I got batchspawner working on its own first, then added the lines for wrapspawner.
After adding the wrapspawner options, I now get the errors below:

Slurm output:

[I 2019-02-01 13:23:06.093 BatchSingleUserNotebookApp manager:46] [nb_conda_kernels] enabled, 0 kernels found
[I 2019-02-01 13:23:06.919 BatchSingleUserNotebookApp extension:168] JupyterLab extension loaded from /packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterlab
[I 2019-02-01 13:23:06.919 BatchSingleUserNotebookApp extension:169] JupyterLab application directory is /packages/7x/anaconda3/2018.12-jh/share/jupyter/lab
[W 2019-02-01 13:23:06.931 BatchSingleUserNotebookApp auth:303] Failed to check authorization: [404] Not Found
[W 2019-02-01 13:23:06.931 BatchSingleUserNotebookApp auth:304] {"status": 404, "message": "Not Found"}
Traceback (most recent call last):
  File "/packages/7x/anaconda3/2018.12-jh/bin/batchspawner-singleuser", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/packages/7x/build/batchspawner/0.9.0dev0/batchspawner/scripts/batchspawner-singleuser", line 6, in <module>
    main()
  File "/packages/7x/build/batchspawner/0.9.0dev0/batchspawner/batchspawner/singleuser.py", line 18, in main
    return BatchSingleUserNotebookApp.launch_instance(argv)
  File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyter_core/application.py", line 266, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/packages/7x/build/batchspawner/0.9.0dev0/batchspawner/batchspawner/singleuser.py", line 14, in start
    json={'port' : self.port})
  File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/services/auth.py", line 305, in _api_request
    raise HTTPError(500, "Failed to check authorization")
tornado.web.HTTPError: HTTP 500: Internal Server Error (Failed to check authorization)

From JupyterHub side:

[I 2019-02-01 13:23:01.411 JupyterHub batchspawner:243] Spawner submitted script:
    #!/bin/bash
    #SBATCH -q debug
    #SBATCH -p debug
    #SBATCH -t 0-12:00:00
    #SBATCH -N 1
    #SBATCH -n 1
    #SBATCH -o /home/USER/jupyterhub.%j.out
    #SBATCH -e /home/USER/jupyterhub.%j.err
    #SBATCH --export ALL
    ###SBATCH -w cg1-6
    source /etc/profile
    unset XDG_RUNTIME_DIR
    module load anaconda3/.2018.12-jh
    batchspawner-singleuser --ip="0.0.0.0" --notebook-dir="~"
    
[I 2019-02-01 13:23:01.483 JupyterHub batchspawner:246] Job submitted. cmd: sudo -E -u USER sbatch --parsable output: 859661
[D 2019-02-01 13:23:01.484 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[D 2019-02-01 13:23:01.512 JupyterHub batchspawner:369] Job 859661 still pending
[D 2019-02-01 13:23:02.013 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[D 2019-02-01 13:23:02.044 JupyterHub batchspawner:369] Job 859661 still pending
[D 2019-02-01 13:23:02.547 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[W 2019-02-01 13:23:07.008 JupyterHub log:158] 404 POST /hub/api/batchspawner (USER@10.126.16.15) 1.09ms
[W 2019-02-01 13:23:11.100 JupyterHub base:714] User USER is slow to start (timeout=10)
[I 2019-02-01 13:23:11.178 JupyterHub log:158] 302 POST /hub/spawn?next=%2Fhub%2Fuser%2FUSER%2F -> /hub/user/USER/ (USER@10.126.17.240) 10146.75ms
[D 2019-02-01 13:23:11.280 JupyterHub base:1008] Waiting for USER pending spawn
[I 2019-02-01 13:23:21.281 JupyterHub base:1012] Pending spawn for USER didn't finish in 10.0 seconds
[I 2019-02-01 13:23:21.281 JupyterHub base:1018] USER is pending spawn
[I 2019-02-01 13:23:21.289 JupyterHub log:158] 200 GET /hub/user/USER/ (USER@10.126.17.240) 10079.46ms
[D 2019-02-01 13:23:21.344 JupyterHub log:158] 200 GET /hub/static/css/style.min.css?v=dd1df30ccc6c4d3e9705d78012d25b57 (@10.126.17.240) 2.31ms
[W 2019-02-01 13:24:01.483 JupyterHub user:471] USER's server failed to start in 60 seconds, giving up
[D 2019-02-01 13:24:01.484 JupyterHub batchspawner:269] Spawner querying job: sudo -E -u USER squeue -h -j 859661 -o '%T %B'
[D 2019-02-01 13:24:01.552 JupyterHub user:578] Deleting oauth client jupyterhub-user-USER
[E 2019-02-01 13:24:01.685 JupyterHub gen:974] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py:619> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 626, in finish_user_spawn
        await spawn_future
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout
    
[E 2019-02-01 13:24:01.699 JupyterHub gen:974] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py:619> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/tornado/gen.py", line 970, in error_callback
        future.result()
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/handlers/base.py", line 626, in finish_user_spawn
        await spawn_future
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 489, in spawn
        raise e
      File "/packages/7x/anaconda3/2018.12-jh/lib/python3.7/site-packages/jupyterhub/user.py", line 409, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout

Config file:

## Load batchspawner, which enables integration with Slurm
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
c.Spawner.http_timeout = 120

#------------------------------------------------------------------------------
# BatchSpawnerBase configuration
#    These are simply setting parameters used in the job script template below
#------------------------------------------------------------------------------
#c.BatchSpawnerBase.req_nprocs = '4'
#c.BatchSpawnerBase.req_queue = 'debug'
#c.BatchSpawnerBase.req_runtime = '0-8:00:00'
#c.BatchSpawnerBase.req_memory = '4gb'
c.Spawner.notebook_dir = '~'

#------------------------------------------------------------------------------
# SlurmSpawner configuration
#------------------------------------------------------------------------------
c.SlurmSpawner.batch_script = '''#!/bin/bash
#SBATCH -q {queue}
#SBATCH -p debug
#SBATCH -t {runtime}
#SBATCH -N 1
#SBATCH -n {nprocs}
#SBATCH -o {homedir}/jupyterhub.%j.out
#SBATCH -e {homedir}/jupyterhub.%j.err
#SBATCH --export ALL
###SBATCH -w cg1-6
source /etc/profile
unset XDG_RUNTIME_DIR
module load anaconda3/.2018.12-jh
{cmd}
'''

##  SSL Certificate locations
c.JupyterHub.ssl_cert = '/etc/pki/CA/certs/jupyter.crt'
c.JupyterHub.ssl_key = '/etc/pki/CA/private/jupyter.key'

##  URL for Jupyterhub to bind to
#c.JupyterHub.bind_url = 'https://jupyterhub.localdomain.com:443'

c.JupyterHub.ip = 'jupyterhub.localdomain.com'
c.JupyterHub.port = 443
c.JupyterHub.hub_ip = 'jupyterhub.localdomain.com'

##  Set authentication options
#  prevents JupyterHub from creating local users
c.LocalAuthenticator.create_system_users = False

#  Set admin users (admin users can run jobs and/or manage other users' notebook servers)
c.Authenticator.admin_users = {'USER1', 'USER2'}

# Set the Jupyterhub log file location
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'

# Set the log level by value or name.
c.JupyterHub.log_level = 'DEBUG'

#  Spawner Profiles
c.ProfilesSpawner.profiles = [
  ( "Local Server", 'local', 'jupyterhub.spawner.LocalProcessSpawner', {'ip':'0.0.0.0'} ),
  ('clustername - 1 core, 4.5GB, 12 hours', 'clustername1c12h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='1', req_queue='debug', req_runtime='0-12:00:00')),
  ('clustername - 4 cores, 18GB, 8 Hours', 'clustername4c8h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='4', req_queue='debug', req_runtime='0-08:00:00')),
  ('clustername - 14 cores, 63GB, 4 hours', 'clustername14c4h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='14', req_queue='debug', req_runtime='0-04:00:00')),
  ('clustername - 28 cores, 128GB 1 hour', 'clustername28c1h', 'batchspawner.SlurmSpawner',
    dict(req_nprocs='28', req_queue='debug', req_runtime='0-01:00:00')),
]

Thanks in advance; any help is greatly appreciated.

@YFLOPS

YFLOPS commented Mar 8, 2019

I have the exact same issue. Any luck getting this figured out?

Does it have anything to do with the 10s timeout for the spawner? I'm not sure how it maintains the tree information internally.

[W 2019-02-01 13:23:11.100 JupyterHub base:714] User USER is slow to start (timeout=10)

@YFLOPS

YFLOPS commented Mar 8, 2019

More information:

Same issue on both JupyterHub 0.9.4 and 0.8.1 when using the batchspawner master branch (0.9dev).

I got my system working using JupyterHub 0.8.1 and batchspawner tag 0.8.1.

@Hoeze
Contributor

Hoeze commented Jul 7, 2019

I got the same problem. What would be the correct API call?
Any ideas how to fix this?

Hoeze added a commit to Hoeze/batchspawner that referenced this issue Jul 7, 2019
@Hoeze
Contributor

Hoeze commented Jul 7, 2019

I found the root cause of this:
The API handler is never registered because "batchspawner.api" is never imported, so the single-user server's POST to /hub/api/batchspawner returns 404.

The best solution to this is to add the following line in your jupyterhub_config.py:

c.JupyterHub.extra_handlers = [(r"/api/batchspawner", 'batchspawner.api.BatchSpawnerAPIHandler')]

See also #126
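To check whether a Hub is hitting this same failure, one option is to look for the 404 on the batchspawner callback in the Hub log, as in the "404 POST /hub/api/batchspawner" line from the original report. The following is only a rough sketch: the function name `find_callback_404s` and the regular expression are illustrative assumptions, not part of batchspawner or JupyterHub, and the pattern assumes the access-log layout shown earlier in this thread.

```python
import re

# Matches JupyterHub access-log lines such as:
#   [W ... JupyterHub log:158] 404 POST /hub/api/batchspawner (USER@...) 1.09ms
# The layout is assumed from the log excerpt in this issue.
CALLBACK_404 = re.compile(r"\]\s+404\s+POST\s+\S*/api/batchspawner\b")


def find_callback_404s(log_lines):
    """Return the log lines where the batchspawner callback got a 404."""
    return [line.rstrip("\n") for line in log_lines if CALLBACK_404.search(line)]


# Two lines copied from the Hub log in the original report.
sample = [
    "[D 2019-02-01 13:23:02.044 JupyterHub batchspawner:369] Job 859661 still pending",
    "[W 2019-02-01 13:23:07.008 JupyterHub log:158] 404 POST /hub/api/batchspawner (USER@10.126.16.15) 1.09ms",
]

hits = find_callback_404s(sample)
for line in hits:
    print(line)
```

If this reports any lines, the single-user server reached the Hub but the callback route was not registered, which matches the diagnosis above. To scan a real log, pass an open file object instead of `sample`.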

@rkdarst
Contributor

rkdarst commented Sep 6, 2019

I think this is covered in the current README now. The solution there is to add import batchspawner to the config, which is a bit more generic and keeps working even if the API handling changes. Please let us know if more is needed.
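For anyone applying the README-based fix, the top of jupyterhub_config.py might look roughly like the fragment below. This is a sketch under two assumptions: batchspawner is installed in the Hub's own Python environment, and importing the package registers the /api/batchspawner handler as a side effect, as the comments above describe. The spawner_class line is copied from the config posted earlier in this thread.

```python
# jupyterhub_config.py (fragment; JupyterHub loads this file and provides get_config)
c = get_config()  # noqa: F821

# Imported only for its side effect: per the comments above, importing the
# package is what makes the Hub register the batchspawner API handler.
# The name is otherwise unused in this file.
import batchspawner  # noqa: F401

# Unchanged from the original report: wrapspawner selects the real spawner
# (for example batchspawner.SlurmSpawner) per profile.
c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
```

The import is needed here because the config names batchspawner.SlurmSpawner only as a string inside the profiles list, so nothing imports the package before the Hub sets up its handlers.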

@rkdarst rkdarst closed this as completed Sep 6, 2019