task/runner: Fix NRE in publishing task result #92

Merged
victorges merged 1 commit into main from vg/fix/nre on Dec 2, 2022

Conversation

victorges
Member

In a recent change that turned ErrYieldExecution into a ContinueAsync output, I introduced a null reference exception (a nil pointer dereference) on line 202 of the runner that triggered any time a task failed: the output was nil, so the code panicked.
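To illustrate the kind of guard that avoids this panic, here is a minimal sketch. The names `TaskOutput`, `TaskResult`, `buildResult`, and `errTaskFailed` are hypothetical and are not taken from the actual runner code; the only point is that the error branch is handled before the output is ever dereferenced.

```go
package main

import "errors"

// errTaskFailed is a hypothetical error used only for illustration.
var errTaskFailed = errors.New("task failed")

// TaskOutput is a hypothetical stand-in for a task handler's output.
// ContinueAsync mirrors the idea from the PR description that a handler
// can ask the runner not to publish a result yet.
type TaskOutput struct {
	ContinueAsync bool
}

// TaskResult is a hypothetical stand-in for the published result.
type TaskResult struct {
	Error  string
	Output *TaskOutput
}

// buildResult decides what to publish. It checks err first, because a
// failed task has a nil output and reading output.ContinueAsync in that
// case would panic.
func buildResult(output *TaskOutput, err error) (result TaskResult, publish bool) {
	if err != nil {
		// Failure path: never touch output here.
		return TaskResult{Error: err.Error()}, true
	}
	if output != nil && output.ContinueAsync {
		// The task continues asynchronously, so nothing is published yet.
		return TaskResult{}, false
	}
	return TaskResult{Output: output}, true
}
```

Calling `buildResult(nil, errTaskFailed)` in this sketch returns a result carrying the error message instead of panicking.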

A panic at that point in the code won't restart the process. Instead, it is recovered by the AMQP consumer itself, which nacks the event instead of acking it. The event then goes back to the front of the queue, so any failing task keeps one task-runner goroutine stuck.

After 15 failed tasks in each region, the task-runner would stop executing any tasks at all! It just entered an endless loop of picking up a task, failing, panicking, putting the task back at the front of the queue, and repeating.
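The loop described above can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: it uses no real AMQP library, and `event`, `taskOutput`, `safeHandle`, `consume`, and `panickyHandler` are all hypothetical names. It assumes, as the description says, that a recovered panic leads to a nack that leaves the event at the front of the queue.

```go
package main

// event is a hypothetical queue message; Fails marks a task that errors.
type event struct {
	ID    string
	Fails bool
}

// taskOutput is a hypothetical handler output.
type taskOutput struct {
	Done bool
}

// safeHandle runs handler and turns a panic into ok=false, mimicking a
// consumer that recovers from the panic and nacks instead of acking.
func safeHandle(handler func(event), ev event) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	handler(ev)
	return true
}

// consume simulates one consumer goroutine. Acked events leave the queue;
// nacked events stay at the front. maxDeliveries only bounds the
// simulation so that it terminates.
func consume(queue []event, handler func(event), maxDeliveries int) (acked []string, deliveries map[string]int) {
	deliveries = map[string]int{}
	for n := 0; n < maxDeliveries && len(queue) > 0; n++ {
		ev := queue[0]
		deliveries[ev.ID]++
		if safeHandle(handler, ev) {
			acked = append(acked, ev.ID)
			queue = queue[1:]
		}
		// On a nack the event is still queue[0], so it is redelivered next.
	}
	return acked, deliveries
}

// panickyHandler mimics the bug: on failure the output stays nil, and
// reading a field from it panics with a nil pointer dereference.
func panickyHandler(ev event) {
	var output *taskOutput
	if !ev.Fails {
		output = &taskOutput{Done: true}
	}
	if !output.Done {
		panic("unexpected unfinished output")
	}
}
```

With a queue holding one failing event followed by one healthy event, `consume(queue, panickyHandler, 5)` delivers the failing event five times and never reaches the healthy one, which matches the stuck goroutine behaviour described in this PR.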

We probably need an alert for increased panics as well. We had one on Papertrail, but I don't think we've created one on Loki yet?

@victorges victorges requested a review from a team as a code owner December 1, 2022 20:56
@victorges victorges merged commit dda1b59 into main Dec 2, 2022
@victorges victorges deleted the vg/fix/nre branch December 2, 2022 16:30