-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure Django errors are represented in Error Tracking #647
Comments
Started a thread on Slack with DD:
Here is the existing DD support ticket that mentions this issue, and needs to be spun off into a separate support ticket. |
I'm not clear if #625 will be more important once we fix all these issues, or only if we do alerting of Error Tracking issues. |
From the Slack thread noted above:
|
There's more updates on the Slack thread (so please read it in addition to these comments). It looks like Error Tracking may only work for Service Entry span errors. |
Comparing NR and DD error listings...
Looking for errors in DD that might not be handled properly:
Basically, I'm having trouble seeing the problem here. |
I don't understand this statement. I think your comments comparing DD and NR make the problem clear, right? Are you trying to say you don't see what the solution is? Or you don't see what the root cause is? Or you don't see why more errors aren't showing up under Error Tracking? Or are you trying to say that New Relic has the problem, and we shouldn't be seeing these errors in DD? Or what did you mean? :) Once you are more clear on the problem, can you spin off an appropriate DD support ticket and have DD help us understand how to resolve? Thank you. |
Well, the comparison certainly shows that NR's Error Inbox and DD's Error Tracking don't show the same thing. But I don't think that Datadog is wrong here, it's just showing a different selection. And that selection seems reasonable to me—operationally, I mostly don't care about 4xx errors unless there's a surge, but for 5xx errors I want to know about everyone of them. |
Thank you for clarifying. You are welcome to close this if you are satisfied, but I’ve got some thoughts that could relate to spinning off tickets:
|
Additionally, do we know how to promote an error to the error inbox if we wanted to, and is that useful to have documented? |
|
Spot-checking 5xx from prod-lms in DD, I found https://app.datadoghq.com/apm/trace/664f66d200000000830052761f18d571 -- EDIT: Tracking in https://help.datadoghq.com/hc/en-us/requests/1707057 |
I think what's going on here is that when a Django view raises an exception, Datadog is failing to annotate the service entry span with error stack, message, and type. We might need to add a middleware as a workaround until they can fill that gap. |
@timmc-edx: Note that exception handling for DRF might need to be covered in a DRF handler, and not via the middleware. I'm not sure if we actually have a problem with DRF and non-DRF? See our ignored error handling code in edx-platform for examples of the handler I am referring to. |
Here's an example of a DRF view that has the same problem: https://app.datadoghq.com/apm/trace/6656276100000000c495bf4489c97e42 (view raises exception, view span has appropriate error tags, root span is a 5xx with status:error but no error tags). So I think this actually isn't a DRF issue at all. Probably we can work around this by adding a custom middleware (or adjusting an existing one). |
[inform] See openedx/edx-platform#26980 for an example of how we might need to handle exceptions for DRF and Django in two ways (through middleware and through a DRF exception handler). |
Datadog has issues with tagging the root span with error information. This change adds the functionality to tag the root span on exceptions. This also renames the CachedCustomMonitoringMiddleware into MonitoringSupportMiddleware so that it can be a central place for most monitoring middleware changes instead of adding a new one every time. At some point, CachedCustomMonitoringMiddleware will be removed. edx/edx-arch-experiments#647
Datadog has issues with tagging the root span with error information. This change adds the functionality to tag the root span on exceptions. This also renames the CachedCustomMonitoringMiddleware into MonitoringSupportMiddleware so that it can be a central place for most monitoring middleware changes instead of adding a new one every time. At some point, CachedCustomMonitoringMiddleware will be removed. edx/edx-arch-experiments#647
Datadog has issues with tagging the root span with error information. This change adds the functionality to tag the root span on exceptions. This also renames the CachedCustomMonitoringMiddleware into MonitoringSupportMiddleware so that it can be a central place for most monitoring middleware changes instead of adding a new one every time. At some point, CachedCustomMonitoringMiddleware will be removed. edx/edx-arch-experiments#647
Django views that raise exceptions are not showing up in Datadog's "Error Tracking" feature. This is because the service entry span is not being annotated with
error.stack
,error.message
, anderror.type
span tags, which are all required for that feature. It appears this is just a gap in ddtrace's Django support.Acceptance Criteria:
record_exception
in edx-django-utils'DatadogBackend
).Implementation details:
process_exception
method for detecting these errors and passing it to the telemetry backends.Notes:
The text was updated successfully, but these errors were encountered: