Don't replace workers that exited normally #68

Merged: pattonw merged 1 commit into funkelab:master from aschampion:worker-explosion on Apr 14, 2026

@aschampion
Contributor

reap_dead_workers now keeps normally-exited workers in the pool dict
so they are still counted toward the target, preventing a reap-replace
cycle that caused unbounded worker growth. The bug was introduced in
#67.

There is still no clean separation of failure and retry logic, since
some crashes go through the retry path (if _spawn_wrapper catches the
exception), while others are reaped here. But this fixes the immediate
bug, and the reaper is still an improvement for terminations/kills not
caught in Python. There is a pre-existing bug where workers exiting with
code 0 (e.g., because of poor exception handling in some libraries) can
eventually fill the pool with None workers and stall. This PR does not
alter behavior in that case.
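
For readers skimming the thread, here is a minimal sketch of the counting behavior described above. It is not the code from this PR: `FakeWorker`, `workers_to_spawn`, and the pool layout are hypothetical stand-ins, and only the name `reap_dead_workers` comes from the description. It assumes a worker's exit code is `None` while running, `0` after a normal exit, and nonzero after a crash or kill.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeWorker:
    """Hypothetical stand-in for a process handle.

    exitcode is None while running, 0 after a normal exit,
    and nonzero after a crash or kill.
    """
    exitcode: Optional[int] = None


def reap_dead_workers(pool: Dict[int, FakeWorker]) -> int:
    """Drop only abnormally exited workers; return how many were dropped.

    Workers that exited with code 0 stay in the dict, so they still
    count toward the target and are not replaced.
    """
    dead = [wid for wid, w in pool.items() if w.exitcode not in (None, 0)]
    for wid in dead:
        del pool[wid]
    return len(dead)


def workers_to_spawn(pool: Dict[int, FakeWorker], target: int) -> int:
    """Number of replacements needed to bring the pool back to target."""
    return max(0, target - len(pool))


pool = {0: FakeWorker(None), 1: FakeWorker(0), 2: FakeWorker(-9)}
reaped = reap_dead_workers(pool)
print(reaped, sorted(pool), workers_to_spawn(pool, target=3))
```

In this sketch only the killed worker is replaced. If the worker that exited with 0 were also dropped, every normal exit would trigger a new spawn, which is the reap-replace cycle the description refers to.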

LLM disclosure: while I diagnosed and debugged the root issue, LLMs were used to exclude other causes and draft fixes. All code here has been human reviewed or created.

reap_dead_workers now keeps normally-exited workers in the pool dict
so they are still counted toward the target, preventing a reap-replace
cycle that caused unbounded worker growth. The bug was introduced in
funkelab#67.

There is still no clean separation of failure and retry logic, since
some crashes go through the retry path (if _spawn_wrapper catches the
exception), while others are reaped here. But this fixes the immediate
bug, and the reaper is still an improvement for terminations/kills not
caught in Python.
@pattonw pattonw merged commit ea05304 into funkelab:master Apr 14, 2026
2 checks passed