Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return poll status after first load finish #742

Merged
merged 2 commits into from Aug 14, 2023
Merged

Conversation

ivyxjc
Copy link
Contributor

@ivyxjc ivyxjc commented Jun 7, 2023

Now, spawner does not wait for fist load finish. So it cannot detect the running pod and return incorrect status to hub.

Update by Erik

This is a bugfix for a regression introduced with KubeSpawner version 5.0.0 and Z2JH since 3.0.0 (or the pre-release 3.0.0-alpha.1 or the development release 3.0.0-0.dev.git.6133.hbfc583f8). It is resolved in KubeSpawner 6.1.0 and z2jh 3.1.0.

For more information and help cleaning up orphaned user pods, see https://discourse.jupyter.org/t/how-to-cleanup-orphaned-user-pods-after-bug-in-z2jh-3-0-and-kubespawner-6-0/21677

@welcome
Copy link

welcome bot commented Jun 7, 2023

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its super tricky to follow how KubeSpawner works with this, so I'm struggling to review this.

Did you run into issues with this using enable_user_namespaces set to True @ivyxjc ?

Related

kubespawner/spawner.py Outdated Show resolved Hide resolved
@ivyxjc
Copy link
Contributor Author

ivyxjc commented Jun 7, 2023

I ran into the issue with enable_user_namespaces set to default value False.

I am encountering the following problem:
When I restart the hub pod, some opening notebooks ran into Service Unavailable .

And I found out that when hub starts, hub will delete some users' server form database due to kubespawner.poll() does not return correct status.
The root cause is that kubespwaner.poll() return status even the first load doest not finish.

https://github.com/jupyterhub/jupyterhub/blob/be07c7ef31ea586fcb49143e8dc6b942aadbb8ec/jupyterhub/app.py#L2527

Comment on lines 2188 to 2190
reflector = await self._start_watching_pods()
if not reflector.first_load_future.done():
await reflector.first_load_future
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minrk I think this makes sense, but would appreciate your review help.

I'm especially thinking about if we should put this code in _start_watching_pods or _start_reflector.

Currently, self._start_watching_pods is awaited from _start, stop, and poll, where _start also awaits await self._start_watching_events.

Should do the await on first_load_future if needed from _start_reflector?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think awaiting it in _start_reflector makes sense. If I were to guess based on my foggy memory of the distant past, I think perhaps _start_reflector maybe used to have a requirement to not be async, so it couldn't wait for things? That doesn't appear to be the case anymore.

@danilopeixoto
Copy link

danilopeixoto commented Aug 12, 2023

We've implemented a copy of KubeSpawner with minor changes. We also noticed the Hub was deleting the spawner object of running servers because it couldn't find the server resources in the reflector at startup (init_spawners). The solution presented in the pull request solved our problem. Now the poll method waits for the reflector to flood for the first time.

We did not test the implementation in the original KubeSpawner.

@minrk
Copy link
Member

minrk commented Aug 14, 2023

Thanks for the comment @danilopeixoto! I think that this bug may be the cause of jupyterhub/mybinder.org-deploy#2686 leaving orphan pods taking up space on mybinder.org.

I moved the await of first_load to inside _start_reflector, so it's always awaited and hopefully less likely to get missed.

@minrk minrk merged commit 0e5bb46 into jupyterhub:main Aug 14, 2023
9 checks passed
@welcome
Copy link

welcome bot commented Aug 14, 2023

Congrats on your first merged pull request in this project! 🎉
congrats
Thank you for contributing, we are very proud of you! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants