Monitor intermittent failures on long-running tasks #2422
Comments
Hey @anze3db! What are the grace and timeout settings of the …
Yes, I agree. I just thought I'd mention that the wrapper monitor passed. I think this should rule out any errors in my function.
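To make the setup from the issue body concrete: a `daily-sync` monitor wraps the whole job, and each step (for example the statuser step and `daily-sync-stats`) has its own monitor. The sketch below is a minimal, hypothetical stand-in that only records the check-in sequence such nesting would be expected to produce. `fake_monitor` and the `daily-sync-statuser` slug are assumptions for illustration; this is not the Sentry SDK's API.

```python
from contextlib import contextmanager
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a cron monitor: it records check-ins in a list
# instead of sending them to Sentry.
events: List[Tuple[str, str]] = []


@contextmanager
def fake_monitor(slug: str) -> Iterator[None]:
    # Opening check-in, recorded when the monitored block starts.
    events.append((slug, "in_progress"))
    try:
        yield
    except Exception:
        # Closing check-in for a block that raised.
        events.append((slug, "error"))
        raise
    else:
        # Closing check-in for a block that finished normally.
        events.append((slug, "ok"))


def run_daily_sync() -> None:
    # Outer "wrapper" monitor around the whole job.
    with fake_monitor("daily-sync"):
        # Per-step monitors (the statuser slug is hypothetical).
        with fake_monitor("daily-sync-statuser"):
            pass  # long-running step
        with fake_monitor("daily-sync-stats"):
            pass  # following step


run_daily_sync()
```

Every opening check-in is paired with a closing one, and the outer pair brackets the inner ones. In the failure described in this thread, the outer `daily-sync` pair completed while the statuser monitor stayed in progress.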
All the monitors have a grace period of 60 minutes and a max runtime of 300 minutes. I can try tightening these values and report back if the issue still reproduces.
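For reference, a minimal sketch of how these two settings bound a run, assuming the semantics Sentry documents for them: the grace period (`checkin_margin`) is how long Sentry waits past the scheduled time for the opening check-in before marking the run as missed, and the max runtime (`max_runtime`) is how long an in-progress check-in may stay open before it is marked as timed out. The helper names and the example date are hypothetical.

```python
from datetime import datetime, timedelta

# Values reported in this thread (60-minute grace period, 300-minute max runtime).
GRACE_PERIOD = timedelta(minutes=60)
MAX_RUNTIME = timedelta(minutes=300)


def missed_deadline(scheduled_at: datetime, grace: timedelta = GRACE_PERIOD) -> datetime:
    """Hypothetical helper: latest arrival time for the opening check-in."""
    return scheduled_at + grace


def timeout_deadline(started_at: datetime, max_runtime: timedelta = MAX_RUNTIME) -> datetime:
    """Hypothetical helper: when an unclosed in-progress check-in times out."""
    return started_at + max_runtime


# Example: a run scheduled and started at 02:00 that finishes after about 30 minutes.
start = datetime(2023, 1, 1, 2, 0)
finish = start + timedelta(minutes=30)
print(missed_deadline(start))             # 2023-01-01 03:00:00
print(timeout_deadline(start))            # 2023-01-01 07:00:00
print(finish < timeout_deadline(start))   # True
```

Under these assumed semantics, a 30-minute run sits well inside the 5-hour window, so a timeout would only be expected if the closing check-in never reached Sentry.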
Thanks for the update. Hm, that's very strange indeed. Right now I don't even have a hunch why this is happening.
The only difference is that
```python
from asgiref.sync import async_to_sync
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    # Main entry point for a Django Command:
    def handle(
        self,
        *args,
        offset=0,
        instances=None,
        skip_inactive_for: int = 0,
        pre_filter: bool = False,
        **options,
    ):
        self.main(
            offset=offset,
            instances=instances,
            skip_inactive_for=skip_inactive_for,
            pre_filter=pre_filter,
        )

    # My async method that does all the work:
    @async_to_sync
    async def main(
        self,
        offset: int,
        instances: str | None,
        skip_inactive_for: int = 0,
        pre_filter: bool = False,
    ):
        ...
```

I can try rewriting the code to not use async to see if async is the cause of this problem.
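If the goal is to rule out `asgiref`'s `async_to_sync` wrapper without rewriting the job itself, one option is to drive the coroutine with the standard library's `asyncio.run` instead, which runs it on a fresh event loop in the current thread. This is a minimal sketch, not a fix: the `Command` class below is a plain stand-in for Django's `BaseCommand` so it runs without Django, and both the body of `main` and the `processed_offset` attribute are hypothetical placeholders.

```python
import asyncio
from typing import Optional


class Command:
    """Plain stand-in for django.core.management.base.BaseCommand (assumption)."""

    def handle(
        self,
        *args,
        offset=0,
        instances=None,
        skip_inactive_for: int = 0,
        pre_filter: bool = False,
        **options,
    ):
        # asyncio.run creates a new event loop in the current thread, runs the
        # coroutine to completion, and closes the loop afterwards.
        asyncio.run(
            self.main(
                offset=offset,
                instances=instances,
                skip_inactive_for=skip_inactive_for,
                pre_filter=pre_filter,
            )
        )

    async def main(
        self,
        offset: int,
        instances: Optional[str],
        skip_inactive_for: int = 0,
        pre_filter: bool = False,
    ):
        # Placeholder for the real work: yield to the loop once, then record
        # the offset so the call can be observed.
        await asyncio.sleep(0)
        self.processed_offset = offset
```

In a real project the class would still subclass `BaseCommand`; only the `@async_to_sync` decorator and the direct `self.main(...)` call change.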
Even without any async code in …
Dang. And the monitor is marked as "Timed out", right?
Correct.
I've just done that. It is in the "in progress" state while the job is processing, and it stays in the "in progress" state after the job finishes:
So it stays "in progress" for the whole 5 hours and then is set to "failed"? Is this correct?
Yes, correct.
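The behaviour confirmed here matches a check-in that was opened but never closed. Below is a toy model of that lifecycle, assuming (from the settings quoted earlier in the thread) that an open check-in times out once the max runtime has elapsed. The class name and status strings are hypothetical and do not mirror the SDK's or the backend's internals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CheckInRun:
    """Hypothetical model of one monitored run."""

    started_at: datetime
    max_runtime: timedelta
    closing_status: Optional[str] = None  # "ok" or "error" once the closing check-in arrives

    def close(self, status: str) -> None:
        # Record the closing check-in sent when the job finishes.
        if status not in ("ok", "error"):
            raise ValueError(f"unexpected closing status: {status!r}")
        self.closing_status = status

    def status_at(self, now: datetime) -> str:
        # A closed run reports its closing status (late arrivals are not modeled).
        if self.closing_status is not None:
            return self.closing_status
        # An open run times out once the max runtime has elapsed.
        if now >= self.started_at + self.max_runtime:
            return "timed_out"
        return "in_progress"


start = datetime(2023, 1, 1, 2, 0)

# Job finishes after ~30 minutes, but its closing check-in never arrives.
run = CheckInRun(started_at=start, max_runtime=timedelta(minutes=300))
print(run.status_at(start + timedelta(minutes=30)))   # in_progress
print(run.status_at(start + timedelta(minutes=300)))  # timed_out

# Same job, but the closing "ok" check-in is received.
closed = CheckInRun(started_at=start, max_runtime=timedelta(minutes=300))
closed.close("ok")
print(closed.status_at(start + timedelta(minutes=300)))  # ok
```

In this model, the symptom reported here would correspond to `close()` never being called for the statuser run, even though the job itself finished.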
OK, thanks for confirming. I still don't know what is going wrong. I have asked the team that handles cron check-ins in our backend whether they have any ideas...
Just wanted to check in: if you try the newest version of the SDK, does this still happen?
Oh no, I am reopening the issue because it has just happened again. I can see from my logs that there were no errors and that the statuser job finished after about 30 minutes:
How do you use Sentry?
Sentry SaaS (sentry.io)
Version
1.31.0
Steps to Reproduce
Expected Result
No alerts when the job finishes without errors, as confirmed by the logs:
Actual Result
The monitor sometimes fails for the job step that takes more than 1 hour:
Even though the monitor after it (`daily-sync-stats`) worked OK:

And interestingly, the `daily-sync` monitor, which wraps the whole job function, also passed without problems.

I initially only had the `daily-sync` monitor configured, since I don't really need every step monitored, but when I only had a single monitor it failed consistently every day. With multiple monitors the failures are more intermittent.

I'm also not sure if it's relevant, but the job function is part of an always-running Django Command that looks like this: