AttributeError: 'SingleUserLabApp' object has no attribute 'io_loop' #609

Closed
mriedem opened this issue Nov 10, 2021 · 3 comments

mriedem commented Nov 10, 2021

Description

I'm doing some stress and scale testing on a test-environment JupyterHub deployment using zero-to-jupyterhub-k8s 1.2.0 with jupyterhub 1.5.0, kubespawner 1.1.2 and jupyter-server 1.11.2:

>>> import jupyter_server
>>> print(jupyter_server.__version__)
1.11.2

We build our own user image based on jupyter/scipy-notebook:837c0c870545 from docker-stacks, with some additional extensions and updates (like jupyterhub 1.5.0).

I was running a scale-up test using hub-stress-test to 1000 singleuser-server pods (these are basically micro pods just to stress the hub and proxy [core pods]; we care less about the actual notebook servers doing anything).

In 1 out of 1000 notebook servers there was a failure to spawn. In the hub log I noticed this:

Nov 10 14:57:09 hub-85fcbd49b8-p52dt hub ERROR ERROR 2021-11-10T20:57:09.759Z [JupyterHub gen:623] Exception in Future <Task finished name='Task-33261' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py:900> exception=TimeoutError("Server at http://172.30.102.60:8888/user/hub-stress-test-979/ didn't respond in 30 seconds")> after timeout
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/tornado/gen.py", line 618, in error_callback
        future.result()
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/handlers/base.py", line 907, in finish_user_spawn
        await spawn_future
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 748, in spawn
        await self._wait_up(spawner)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 795, in _wait_up
        raise e
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/user.py", line 762, in _wait_up
        resp = await server.wait_up(
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/utils.py", line 236, in wait_for_http_server
        re = await exponential_backoff(
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/utils.py", line 184, in exponential_backoff
        raise TimeoutError(fail_message)
    TimeoutError: Server at http://172.30.102.60:8888/user/hub-stress-test-979/ didn't respond in 30 seconds

Looking in the notebook server pod logs for that pod I see this:

Nov 10 14:56:58 jupyter-hub-stress-test-979 notebook [C 2021-11-10 20:56:58.977 SingleUserLabApp notebookapp:1972] received signal 15, stopping
Nov 10 14:56:58 jupyter-hub-stress-test-979 notebook [E 2021-11-10 20:56:58.977 LabApp] Exception while loading config file jupyter_lab_config
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/genericpath.py", line 30, in isfile
        st = os.stat(path)
    FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/jupyter_lab_config.json'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/traitlets/config/application.py", line 738, in _load_config_files
        config = loader.load_config()
      File "/opt/conda/lib/python3.8/site-packages/traitlets/config/loader.py", line 560, in load_config
        self._find_file()
      File "/opt/conda/lib/python3.8/site-packages/traitlets/config/loader.py", line 542, in _find_file
        self.full_filename = filefind(self.filename, self.path)
      File "/opt/conda/lib/python3.8/site-packages/traitlets/utils/__init__.py", line 58, in filefind
        if os.path.isfile(testname):
      File "/opt/conda/lib/python3.8/genericpath.py", line 30, in isfile
        st = os.stat(path)
      File "/opt/conda/lib/python3.8/site-packages/notebook/notebookapp.py", line 1973, in _signal_stop
        self.io_loop.add_callback_from_signal(self.io_loop.stop)
    AttributeError: 'SingleUserLabApp' object has no attribute 'io_loop' 

I'm not sure what's going on there. Maybe a race condition while the pod is starting up? Or maybe, because it was taking a long time to start, something started failing in weird ways? Here is a gist of the notebook app logs:

https://gist.github.com/mriedem/384ff1578aca0163e743fe4ddea176a7

We can see that it's 34 seconds from the time the app starts up to the time that error happens. I'm wondering if maybe we hit the Spawner's http_timeout and the hub/kubespawner killed the pod?

https://jupyterhub.readthedocs.io/en/stable/api/spawner.html#jupyterhub.spawner.Spawner.http_timeout

Actually, digging more into the hub logs, it looks like yes, the hub hits that http_timeout and then kills the pod. Here are the hub logs scoped to that pod:

https://gist.github.com/mriedem/8d226d7f934ae27e71a35ef6ba8a13ec
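For readers following along, here is a rough sketch of the kind of poll-with-backoff loop that ends in the TimeoutError from the hub traceback. It is a simplified illustration, not JupyterHub's actual helper: the function name `wait_until_ready`, its parameters, and the `never_ready` check are all made up for this example.

```python
import asyncio
import time


async def wait_until_ready(check, timeout=30.0, start_wait=0.2,
                           scale_factor=2.0, max_wait=5.0):
    """Poll `check` with growing delays until it returns True or `timeout` elapses.

    Simplified illustration only; the hub's own helper differs in detail.
    """
    deadline = time.monotonic() + timeout
    wait = start_wait
    while True:
        if await check():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The caller (the hub, in the real system) treats this as a failed spawn.
            raise TimeoutError(f"Server didn't respond in {timeout} seconds")
        # Never sleep past the deadline; grow the delay up to max_wait.
        await asyncio.sleep(min(wait, remaining))
        wait = min(wait * scale_factor, max_wait)


async def demo():
    # Hypothetical server that never becomes ready, like a pod still starting up.
    async def never_ready():
        return False

    try:
        await wait_until_ready(never_ready, timeout=0.05, start_wait=0.01)
    except TimeoutError as exc:
        return str(exc)
    return "ready"


result = asyncio.run(demo())
print(result)
```

Once the deadline passes, the caller gives up and tears the server down, which is how a slow-starting pod can receive signal 15 while it is still initializing.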

So I guess the issue is just that there is an ugly AttributeError in the notebook app / jupyter-server logs when the pod is killed during startup, before everything is set up. That's probably hard to predict and account for, so it's probably a low-priority issue to resolve.

Reproduce

Hard to reproduce. I've been running scale tests all day and have hit maybe a couple of the http_timeouts in the hub logs but only this one AttributeError in the notebook server app logs.

Expected behavior

Not to see red-herring AttributeErrors in the app logs.

Context

  • Operating System and version: ubuntu focal
  • Browser and version: n/a
  • Jupyter Server version: 1.11.2
@mriedem mriedem added the bug label Nov 10, 2021

welcome bot commented Nov 10, 2021

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

Member

vidartf commented Nov 11, 2021

I see that your stack trace has the error in notebook package. The corresponding code in jupyter-server already has a fix for this:

def stop(self, from_signal=False):
    """Cleanup resources and stop the server."""
    if hasattr(self, "_http_server"):
        # Stop a server if its set.
        self.http_server.stop()
    if getattr(self, "io_loop", None):
        # use IOLoop.add_callback because signal.signal must be called
        # from main thread
        if from_signal:
            self.io_loop.add_callback_from_signal(self._stop)
        else:
            self.io_loop.add_callback(self._stop)
So please consider updating to use jupyter-server and then re-run the stress tests using that 😃
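To make the difference concrete, here is a minimal, self-contained sketch of the two patterns. It does not use tornado or either real application class: `FakeLoop`, `UnguardedApp`, `GuardedApp`, and `early_signal_outcome` are hypothetical names for illustration, and the handler is called directly instead of delivering a real signal.

```python
class FakeLoop:
    """Hypothetical stand-in for an IOLoop; it only records scheduled callbacks."""

    def __init__(self):
        self.scheduled = []

    def add_callback_from_signal(self, callback):
        self.scheduled.append(callback)

    def stop(self):
        pass


class UnguardedApp:
    """Mirrors the failing pattern: assumes io_loop has already been assigned."""

    def _signal_stop(self, sig, frame):
        self.io_loop.add_callback_from_signal(self.io_loop.stop)


class GuardedApp:
    """Mirrors the guarded pattern: tolerates a signal before io_loop exists."""

    def _signal_stop(self, sig, frame):
        loop = getattr(self, "io_loop", None)
        if loop is not None:
            loop.add_callback_from_signal(loop.stop)


def early_signal_outcome(app):
    """Call the handler as if SIGTERM (15) arrived before io_loop was assigned."""
    try:
        app._signal_stop(15, None)
    except AttributeError as exc:
        return f"AttributeError: {exc}"
    return "handled cleanly"


unguarded = early_signal_outcome(UnguardedApp())
guarded = early_signal_outcome(GuardedApp())

# Once the loop exists, the guarded handler schedules the stop as usual.
started = GuardedApp()
started.io_loop = FakeLoop()
started._signal_stop(15, None)

print(unguarded)
print(guarded)
print(len(started.io_loop.scheduled))
```

The guard does not remove the race itself. It only keeps an early termination signal from surfacing as a confusing traceback in the logs.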

Author

mriedem commented Nov 11, 2021

> I see that your stack trace has the error in notebook package.

Gah you're right:

$ kubectl -n jhub exec -it hub-b59477988-rr5k9 -- printenv JUPYTERHUB_SINGLEUSER_APP
command terminated with exit code 1

We have migrated to using the jupyterlab UI and have jupyterlab and all that installed in our singleuser-server images, but apparently the hub isn't using it to spawn those. 🤦
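A small sketch of the same check done from Python inside a single-user container, in case it is useful to others. The helper names are made up for this example, and the fallback value is an assumption about what JupyterHub 1.x does when the variable is unset, so verify it against your installed version.

```python
import os

# Assumption: with JUPYTERHUB_SINGLEUSER_APP unset, the JupyterHub 1.x single-user
# entry point falls back to the classic notebook app. Verify for your version.
ASSUMED_DEFAULT_APP = "notebook.notebookapp.NotebookApp"


def singleuser_app(environ=None):
    """Return the configured single-user app class path, or the assumed default."""
    environ = os.environ if environ is None else environ
    return environ.get("JUPYTERHUB_SINGLEUSER_APP") or ASSUMED_DEFAULT_APP


def uses_jupyter_server(environ=None):
    """True if the configured app class lives in the jupyter_server package."""
    return singleuser_app(environ).startswith("jupyter_server.")


print(singleuser_app())
print(uses_jupyter_server())
```

Passing a plain dict instead of the real environment makes it easy to try both cases, for example `uses_jupyter_server({"JUPYTERHUB_SINGLEUSER_APP": "jupyter_server.serverapp.ServerApp"})`.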

Thanks for looking at this, I'll close it and report back on anything I find from scale testing when we switch to actually using jupyter-server 😅 .

@mriedem mriedem closed this as completed Nov 11, 2021