Stuck RELEASING jobs - debugging #83

Merged
praiskup merged 2 commits into main from releasing-logging on Apr 27, 2022

@praiskup
Owner

No description provided.

@praiskup force-pushed the releasing-logging branch 2 times, most recently from 755a849 to 1f6412f, on April 25, 2022 08:39
Sometimes workers get stuck in RELEASING state forever (until server
restart, when all the RELEASING resources are removed).  See the
Copr issue #2115 for more info [1].

Some related output exists in hook.log, but because the affected
resources usually enter the RELEASING state multiple times during their
lifetimes, it's not easy to tell whether
(a) the ReleaseWorker run for the affected resource failed to start, or
(b) it started, but the cmd_release command (Popen) failed, or
(c) ReleaseWorker only failed to store the release status in the DB.
With this commit, we should have evidence next time and be able to see
which problem needs to be fixed.

Relates: https://pagure.io/copr/copr/issue/2115
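
To illustrate how per-stage logging can tell cases (a), (b) and (c) apart, here is a minimal sketch. It is not the resalloc implementation: `release_resource`, the `store_state` callback and the logger name are hypothetical, and the release command is passed as an argument list rather than however resalloc actually invokes `cmd_release`.

```python
import logging
import subprocess

log = logging.getLogger("release-sketch")  # hypothetical logger name


def release_resource(resource_id, cmd_release, store_state):
    """Hypothetical release routine that logs each stage separately."""
    # Stage (a): if this line is missing from the log, the worker never ran.
    log.info("Release worker started for resource %s", resource_id)

    # Stage (b): run the release command and record its exit status.
    proc = subprocess.run(cmd_release, capture_output=True, text=True)
    if proc.returncode != 0:
        log.error("Release command failed for resource %s (exit %s): %s",
                  resource_id, proc.returncode, proc.stderr.strip())
        return False
    log.info("Release command succeeded for resource %s", resource_id)

    # Stage (c): storing the result is logged on its own, with a traceback
    # if the (assumed) database callback raises.
    try:
        store_state(resource_id)
    except Exception:
        log.exception("Storing release status failed for resource %s",
                      resource_id)
        return False
    log.info("Release status stored for resource %s", resource_id)
    return True
```

With one log line per stage, the last message recorded for a stuck resource points at the stage that went wrong.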
@praiskup force-pushed the releasing-logging branch from 1f6412f to ad9c2ed on April 25, 2022 08:44
Exceptions raised in Worker threads are totally unexpected, but they
probably happen occasionally; see Copr issue #2115.  We certainly don't
want to terminate the whole resalloc-server process because of them
(that would cause too much harm), but we should at least log them to
main.log appropriately for further analysis.

To the best of my judgement, as long as the Python interpreter
actually starts the Thread successfully, there must be some exception
that prevents the ReleaseWorker from storing "RState.UP" in the
database.  See how simple the related code is (check out 297f59e and
look at resallocserver/manager.py#L207-L209).

Relates: https://pagure.io/copr/copr/issue/2115
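
The idea of logging unexpected Worker exceptions without stopping the server can be sketched roughly as follows. This only illustrates the general pattern with the standard `threading` and `logging` modules; `LoggingWorker`, `FailingWorker`, `job()` and the logger name are hypothetical and do not mirror the actual resalloc classes.

```python
import logging
import threading

log = logging.getLogger("worker-sketch")  # hypothetical logger name


class LoggingWorker(threading.Thread):
    """Hypothetical base class: log unexpected exceptions with a traceback."""

    def job(self):
        """The actual work; subclasses are expected to override this."""
        raise NotImplementedError

    def run(self):
        try:
            self.job()
        except Exception:
            # log.exception() records the message together with the
            # traceback, so the failure can be analysed later.
            log.exception("Worker %s raised an unexpected exception",
                          self.name)


class FailingWorker(LoggingWorker):
    """Stand-in for a worker whose job fails, e.g. while storing state."""

    def job(self):
        raise RuntimeError("cannot store state")
```

Starting `FailingWorker()` and joining it leaves the main thread running, and the traceback ends up in whatever handler is attached to the logger.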
@praiskup force-pushed the releasing-logging branch from ad9c2ed to 956f7be on April 25, 2022 08:54
@praiskup changed the title from "server: more verbose RELEASING thread" to "Stuck RELEASING jobs - debugging" on Apr 25, 2022
@FrostyX
Collaborator

FrostyX commented Apr 26, 2022

+1
