Skip to content

[5.8] Fail job immediately after it timeouts if it wont be retried #29024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 3, 2019

Conversation

themsaid
Copy link
Member

@themsaid themsaid commented Jul 2, 2019

Here's the scenario:

  • You have a job with a timeout of 2 seconds, and retries = 1
  • You have your worker's retry_after set to 60 seconds
  1. The job runs, it takes more than 2 seconds so it timeouts and process gets killed.
  2. Jobs isn't marked failed yet.
  3. Worker starts again and after 60 minutes it picks the job again to process.
  4. It finds that this job has no retries left so it marks the job as failed.

So currently a timed out job will wait after the retry_after period to pass before it can be reported as failed even if there are no retries remained.

This PR will fail the job immediately if there are no more retries left, same as when the job throws an exception, we fail it immediately if we won't be retrying it anymore.

@driesvints
Copy link
Member

@themsaid can you rebase with 5.8 branch? Build should pass after that.

@themsaid themsaid force-pushed the failJobImmediately branch from 9287b23 to c126390 Compare July 2, 2019 17:34
@taylorotwell taylorotwell merged commit 446b009 into laravel:5.8 Jul 3, 2019
@cannahan
Copy link

Since this commit Im getting null pointer exceptions:

"Call to a member function getConnectionName() on null" in "/laravel/framework/src/Illuminate/Queue/Worker.php:144"

This commit changes the method "registerTimeoutHandler($job, WorkerOptions $options)" and calls "getConnectionName()" on "$job" in the registered signal handler for the alarm handler. As declared in the doc-param "$job" could be null, but imo its not handled appropriately.

Marked in Source: c126390#diff-b0152e2539005c772b52151cc5d4aa60R144

@djtarazona
Copy link
Contributor

djtarazona commented Aug 1, 2019

That's interesting. Yes, $job can technically be null if there is no job to process. However, if the timeout handler is being fired when there is no job processing, that means something else is taking long enough to reach the timeout. Perhaps fetching the next job is taking a long time (and doesn't reset the timeout in time), or your sleep value is longer than your timeout.

Anyways, simply checking that $job is not null first and then calling markJobAsFailedIfWillExceedMaxAttempts() in the timeout handler should fix the issue. I can open up a PR fix soon if someone else doesn't get to it first.

@djtarazona
Copy link
Contributor

Since this commit Im getting null pointer exceptions:

"Call to a member function getConnectionName() on null" in "/laravel/framework/src/Illuminate/Queue/Worker.php:144"

This commit changes the method "registerTimeoutHandler($job, WorkerOptions $options)" and calls "getConnectionName()" on "$job" in the registered signal handler for the alarm handler. As declared in the doc-param "$job" could be null, but imo its not handled appropriately.

Marked in Source: c126390#diff-b0152e2539005c772b52151cc5d4aa60R144

@cannahan I opened #29366 to fix this.

@koenvu
Copy link
Contributor

koenvu commented Jun 3, 2020

We recently noticed that we weren't getting the MaxAttempt exceptions in our error logging anymore. As it turns out there is a discrepancy between markJobAsFailedIfWillExceedMaxAttempts and markJobAsFailedIfAlreadyExceedsMaxAttempts on the Worker class. The former doesn't throw $e whereas the latter does.

Since timeouts are now handled "differently" due to this PR, using the first method, there no longer is that automatic error reporting.

Is this a deliberate difference between the methods, or is it an oversight and should throw $e perhaps be added to markJobAsFailedIfWillExceedMaxAttempts as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants