Sentry error reporting doesn't quite work #350
Comments
Hey @wh5a, I made a fix for this on this branch. I've seen this unreliable Sentry delivery first-hand recently and wasn't sure what the root cause was, so thanks for reporting. Could you test whether this new setup works for you? I haven't given it a shot yet. If it works, I'll put this in prod myself after the weekend and release RQ 0.4.6 with this change applied. Note that this is a backward-incompatible change: you'll need to pass …
Hmm, I thought the failure was due to forking. But your patch still didn't fix it. Now I'm puzzled...
@nvie I'd actually like to see contrib.sentry deprecated and encourage the use of a more standard logging mechanism. I have an example logging configuration using django-rq here: https://github.com/ui/django-rq. Thoughts?
As a side note, I'm also using Sentry as my error logger in production (via django-rq) and I've never had any problem with it.
@selwin I copied your LOGGING setting, and created a Django function that intentionally triggers an exception. Then I used …
@wh5a can you show me your logging configuration (don't forget to censor out sensitive data)? Also, did you run …
@selwin I like your suggestion a lot, thanks.
@selwin Good point. I was directly running …, and the worker startup log shows `08:31:51 Registering birth of worker send-clone.19216`. My logging configuration is identical to the one in https://github.com/ui/django-rq/blob/master/README.rst.
From the …
@selwin that doesn't seem to be the real error. It still has plenty of memory, and the error only happens when it's started through …
How much memory does your box have? Workers started from …
@selwin Sorry, you were right. It was really running out of memory. After killing a few processes I didn't get the OOM, but for some reason I didn't get the Sentry log either.
It seems to be the same issue when going through the logging system: exceptions in the worker process for some reason don't get logged to Sentry.
Mind copy-pasting the relevant logging configuration so I can take a look?
I had the same logging configuration as your README:
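The configuration block itself didn't survive this copy. As a sketch (not the poster's exact settings), a django-rq README-style LOGGING setup that routes `rq.worker` errors to Sentry looks roughly like this; the `SentryHandler` path assumes Raven's Django integration is installed and a DSN is configured elsewhere:

```python
# Sketch of a django-rq README-style LOGGING setting: console output for the
# worker plus a Sentry handler for errors. Assumes raven's Django integration
# and a configured DSN.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "rq_console": {
            "format": "%(asctime)s %(message)s",
            "datefmt": "%H:%M:%S",
        },
    },
    "handlers": {
        "rq_console": {
            "level": "DEBUG",
            "class": "rq.utils.ColorizingStreamHandler",
            "formatter": "rq_console",
        },
        "sentry": {
            "level": "ERROR",
            "class": "raven.contrib.django.handlers.SentryHandler",
        },
    },
    "loggers": {
        "rq.worker": {
            "handlers": ["rq_console", "sentry"],
            "level": "DEBUG",
        },
    },
}
```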
Did you set up Sentry's DSN and other related settings correctly? You can test it by setting Django's error logger to Sentry and seeing if it logs correctly.
Yes, I have verified that the main process can log to Sentry, but the worker process cannot.
I thought it could be a bug in the raven library, but I created a simple test and it worked fine in the child process:
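The test snippet was also lost in this copy. A minimal sketch of that kind of check would be the following (placeholder DSN; the sleep is only there to give a possibly asynchronous transport time to send before the child exits):

```python
import os
import time

from raven import Client

client = Client('https://public:secret@app.getsentry.com/1')  # placeholder DSN

pid = os.fork()
if pid == 0:
    # Child process: send an event to Sentry from the forked process.
    client.captureMessage('hello from the forked child')
    time.sleep(5)  # give the transport a chance to deliver before exiting
    os._exit(0)
else:
    os.waitpid(pid, 0)
```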
Could you immediately call …
@nvie Yes, I believe you're on the right track. Calling …
Could you try changing your Sentry DSN to 'sync+https://...', i.e. prefix it with `sync+`?
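Assuming the Django/raven setup discussed above, where the DSN lives in settings, that change is just the scheme prefix (DSN below is a placeholder):

```python
# Assumed settings.py snippet: same DSN as before, but with the "sync+"
# scheme so Raven uses its blocking transport instead of the threaded one.
RAVEN_CONFIG = {
    'dsn': 'sync+https://public:secret@app.getsentry.com/1',
}
```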
Note: I'm not advocating that you permanently do this. This is not the final solution, but it definitely confirms the theory that this is the underlying issue. It's not the final solution because, if Sentry is down, it will block the worker processes and prevent them from terminating (and thus stop job execution unnecessarily). Here are a few solutions that we could consider: …
Option 2 sounds like the most desirable one, but how would this work if we move Sentry handling to the more standard logging method, @selwin? I don't think we can directly interact with Raven anymore that way, and so we cannot know if the event has been delivered or not. Which one has your preference?
Yes, indeed running synchronously made the Sentry delivery go through. Would it block job execution forever? I thought there would be a timeout after which the worker would be killed anyway? Or do you just not like the unnecessary timeout?
That, and the timeout itself causes the exception in the first place, after which Sentry delivery is due, so it's a bit of a corner case.
Unfortunately, I'm -1 on all three options. I think we should just document that if Sentry is used to handle the errors, the sync transport should be used. This is, after all, due to the way … I never ran into these issues because I'm running …
@selwin Some people run self-hosted Sentry servers, which may not be as reliable as the official cloud.
Then they should be aware of the risks of hosting their own error logging service, just kidding ;) We already have a mechanism in place to kill long-running processes using timeouts, and blocking Sentry calls seem to fall into this category of problems. Apart from that, perhaps we should encourage the use of custom Worker, Job or Queue classes? We can then document these or pull such custom workers/jobs into contrib. What do you think @nvie?
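As an illustration of that idea (not something shipped in RQ), a custom worker class could force a blocking Sentry transport. This sketch assumes RQ 0.4's `register_sentry(client, worker)` and Raven's `HTTPTransport`; the `SentryWorker` name and its `sentry_dsn` keyword are hypothetical:

```python
from raven import Client
from raven.transport.http import HTTPTransport
from rq import Worker
from rq.contrib.sentry import register_sentry


class SentryWorker(Worker):
    """Worker that reports job failures to Sentry over a blocking transport."""

    def __init__(self, *args, **kwargs):
        sentry_dsn = kwargs.pop('sentry_dsn', None)  # hypothetical kwarg
        super(SentryWorker, self).__init__(*args, **kwargs)
        if sentry_dsn:
            # HTTPTransport is synchronous, so the event is delivered before
            # the forked work horse exits.
            register_sentry(Client(sentry_dsn, transport=HTTPTransport), self)
```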
@selwin I think you're right. The timeout that I wanted to instate can actually be specified in the DSN string as well, if you encode it in the URL param …
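For reference, my understanding is that Raven accepts transport options as querystring parameters on the DSN, so a delivery timeout could be encoded there. The parameter name and value below are illustrative assumptions, not taken from this comment:

```python
RAVEN_CONFIG = {
    # "sync+" forces the blocking transport; "timeout=5" (seconds) is an
    # assumed example of Raven's DSN querystring options.
    'dsn': 'sync+https://public:secret@app.getsentry.com/1?timeout=5',
}
```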
@wh5a do you mind copy-pasting a working django-rq logging configuration using Raven's sync transport? I want to update django-rq's documentation to make sure it works with newer Raven versions.
@selwin I decided to run rqworker directly rather than going through the django-rq wrapper, since, as you pointed out, it's more memory efficient. The change was simply to prefix the DSN with `sync+`.
@wh5a alright, cool. I'm closing this issue since this is caused by Raven's transport method rather than by RQ itself.
Cool, I'll release RQ 0.4.6 in an hour.
In rqworker.py the main process creates a Sentry client and passes it to register_sentry. However, when an exception is raised in the worker process, the client inherited from the parent doesn't work any more: no error is logged to Sentry.
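For context, the setup in question looks roughly like the sketch below, assuming RQ 0.4's `register_sentry(client, worker)` (DSN is a placeholder): the client is created once in the main rqworker process and only inherited, via fork, by the work horse in which the exception is actually raised.

```python
from raven import Client
from redis import Redis
from rq import Queue, Worker
from rq.contrib.sentry import register_sentry

connection = Redis()
worker = Worker([Queue(connection=connection)], connection=connection)

# The client is created in the parent process; the forked work horse inherits
# it, but as reported above, its exceptions never reach Sentry.
register_sentry(Client('https://public:secret@app.getsentry.com/1'), worker)
worker.work()
```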