
Sentry doesn't report errors in the traces_sampler function #2582

Open
brettdh opened this issue Dec 11, 2023 · 7 comments
Labels
Type: Bug Something isn't working

Comments

@brettdh

brettdh commented Dec 11, 2023

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.38.0

Steps to Reproduce

  1. Call sentry_sdk.init with the traces_sampler kwarg set to a function that raises an exception.
  2. Perform a request (in my case: HTTP request to Django app running in AWS Lambda)

Expected Result

Error in sampling function is reported to Sentry

Actual Result

Unhandled error occurs (found in logs), but no Sentry issue is created.

TypeError: expected string or bytes-like object

  File "/var/task/sentry_sdk/integrations/aws_lambda.py", line 151, in sentry_handler
    with hub.start_transaction(
  File "/var/task/sentry_sdk/hub.py", line 572, in start_transaction
    transaction._set_initial_sampling_decision(sampling_context=sampling_context)
  File "/var/task/sentry_sdk/tracing.py", line 841, in _set_initial_sampling_decision
    options["traces_sampler"](sampling_context)
  File "/var/task/myapp/config/settings/base.py", line 263, in my_traces_sampler
    if metrics_api_rx.search(path):

where path was None.
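For context, this particular failure can be avoided by writing the sampler defensively. A minimal sketch (the regex, rates, and structure here are hypothetical; for WSGI-based integrations the sampling context typically carries a wsgi_environ dict, but other integrations use different keys, so a missing or None path must be tolerated):

```python
import re

# Hypothetical regex for a metrics endpoint, standing in for the one
# in the traceback above.
metrics_api_rx = re.compile(r"^/api/metrics")


def my_traces_sampler(sampling_context):
    # The WSGI environ may be absent entirely (e.g. for non-HTTP
    # transactions), and PATH_INFO itself may be None, so fall back
    # to an empty string before matching.
    wsgi_environ = sampling_context.get("wsgi_environ") or {}
    path = wsgi_environ.get("PATH_INFO") or ""
    if metrics_api_rx.search(path):
        return 0.0  # never sample the metrics endpoint
    return 0.1  # default sample rate for everything else
```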

@antonpirker
Member

Hey @brettdh thanks for reporting this.
We will have a look.

@sl0thentr0py
Member

By design, we do not report exceptions in SDK code; you should see it in the logs if you add debug=True

@brettdh
Author

brettdh commented Dec 12, 2023

@sl0thentr0py can you elaborate on "by design"? It seems like an observability gap; i.e. an exception in the sampling function can quietly cause all requests to fail.

Further, the sampling function was a bit hard to write, due to the context object format being largely undocumented, and this makes me all the more nervous that I may have missed some corner case - which is why I wanted to make sure I still have error reporting for the sampling function (which is my code, not SDK code).

@sl0thentr0py
Member

For one, if we self-report, we can end up in a busy loop of reporting.
We could make the experience a bit better here, maybe by using logger.warning or logger.error so that it's clear to users.
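Until the SDK surfaces these errors itself, one user-side workaround is to wrap the sampler so its own exceptions are logged and swallowed. A minimal sketch (safe_sampler and the logger name are hypothetical helpers, not part of the SDK):

```python
import logging

logger = logging.getLogger("myapp.traces_sampler")


def safe_sampler(sampler, fallback_rate=0.0):
    """Wrap a user-supplied traces_sampler so an exception inside it
    is logged instead of failing the request."""
    def wrapped(sampling_context):
        try:
            return sampler(sampling_context)
        except Exception:
            logger.exception("traces_sampler raised; using fallback rate")
            return fallback_rate
    return wrapped
```

You would then pass traces_sampler=safe_sampler(my_traces_sampler) to sentry_sdk.init instead of the raw function.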

@sl0thentr0py
Member

OK, actually never mind; I just tried it out and I did get an issue in Sentry:

import sentry_sdk


def fail(_context):
    1 / 0


sentry_sdk.init(debug=True, traces_sampler=fail)

with sentry_sdk.start_transaction(name="traces sampler fail"):
    pass

sentry_sdk.flush()

@brettdh
Author

brettdh commented Dec 15, 2023

Hmm. I got bit by this again today - due to a real ZeroDivisionError, coincidentally - and I didn't notice until I dug into logs.

Oddly, I did see a seemingly spurious ServerlessTimeoutWarning in Sentry; jumping from that error to the linked CloudWatch logs is how I found the ZeroDivisionError:

  | 2023-12-14T19:30:49.641Z | START RequestId: d5d126c4-8cf3-4ae3-b931-14a7167dbea4 Version: $LATEST
  | 2023-12-14T19:30:49.645Z | Exception in thread Thread-4169:
  | 2023-12-14T19:30:49.645Z | Traceback (most recent call last):
  | 2023-12-14T19:30:49.645Z |   File "/var/lang/lib/python3.9/threading.py", line 980, in _bootstrap_inner
  | 2023-12-14T19:30:49.645Z |     self.run()
  | 2023-12-14T19:30:49.645Z |   File "/var/task/sentry_sdk/integrations/threading.py", line 72, in run
  | 2023-12-14T19:30:49.645Z |     reraise(*_capture_exception())
  | 2023-12-14T19:30:49.645Z |   File "/var/task/sentry_sdk/_compat.py", line 115, in reraise
  | 2023-12-14T19:30:49.645Z |     raise value
  | 2023-12-14T19:30:49.645Z |   File "/var/task/sentry_sdk/integrations/threading.py", line 70, in run
  | 2023-12-14T19:30:49.645Z |     return old_run_func(self, *a, **kw)
  | 2023-12-14T19:30:49.645Z |   File "/var/task/sentry_sdk/utils.py", line 1392, in run
  | 2023-12-14T19:30:49.645Z |     raise ServerlessTimeoutWarning(
  | 2023-12-14T19:30:49.646Z | sentry_sdk.utils.ServerlessTimeoutWarning: WARNING : Function is expected to get timed out. Configured timeout duration = 29 seconds.
  | 2023-12-14T19:30:49.649Z | [ERROR] ZeroDivisionError: float division by zero 
  Traceback (most recent call last):
   File "/var/task/sentry_sdk/integrations/aws_lambda.py", line 151, in sentry_handler
     with hub.start_transaction(
   File "/var/task/sentry_sdk/hub.py", line 572, in start_transaction
     transaction._set_initial_sampling_decision(sampling_context=sampling_context)
   File "/var/task/sentry_sdk/tracing.py", line 841, in _set_initial_sampling_decision
     options["traces_sampler"](sampling_context)
   File "/var/task/myproj/config/settings/base.py", line 340, in traces_sampler
     rate = 1.0 / Host.objects.filter(up=True).count()
  | 2023-12-14T19:30:49.677Z | END RequestId: d5d126c4-8cf3-4ae3-b931-14a7167dbea4
  | 2023-12-14T19:30:49.677Z | REPORT RequestId: d5d126c4-8cf3-4ae3-b931-14a7167dbea4 Duration: 35.35 ms Billed Duration: 36 ms Memory Size: 1024 MB Max Memory Used: 248 MB

I say the error is spurious because, as you see in the logs, the lambda execution was only 36 ms, not 29 seconds.
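One way to make this particular sampler robust is to factor the rate calculation into a helper that guards the division. A sketch (rate_for_host_count is a hypothetical helper; in the Django sampler above you would call it with Host.objects.filter(up=True).count()):

```python
def rate_for_host_count(up_count, max_rate=1.0):
    """Compute a sample rate inversely proportional to the number of
    hosts that are up, without dividing by zero."""
    if up_count <= 0:
        # No hosts up: sample nothing rather than raise ZeroDivisionError.
        return 0.0
    return min(max_rate, 1.0 / up_count)
```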

@sentrivana
Contributor

We should properly define the expected behavior when a custom traces_sampler (or before_send{_transaction}, or error_sampler, or any other hook) throws an exception. Currently we usually log the error, but apparently in some cases the exception also gets intercepted by excepthook and makes it to Sentry. If that's how it should be, then this is a bug in the AWS Lambda integration since it apparently doesn't do that.

FWIW, I do think we should make these sorts of errors easier to discover, or at least make it clearer how to notice them. We've had some cases recently where folks were confused about why no transactions were appearing in Sentry, and it took a long time to trace it back to a faulty before_send_transaction. Internal SDK exceptions should never affect the user's app, but these custom hooks are not SDK code.
