Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DedupeIntegration does not intercept messages from the logging #2459

Open
saippuakauppias opened this issue Oct 19, 2023 · 6 comments
Open
Labels
Enhancement New feature or request Triaged Has been looked at recently during old issue triage

Comments

@saippuakauppias
Copy link

Problem Statement

I happened to notice that we are using LoggingIntegration (the last one in the list of integrations, don't know if it's important), but for some reason logged messages are not deduplicated. This problem grew out of #2440 and #2456 (comment).

Integration source: https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/integrations/dedupe.py

Solution Brainstorm

It seems wrong to deduplicate only errors, because if DedupeIntegration is enabled, it is worth capturing log messages as well. However, I'm not sure if this way will work in my case, because it seems that DedupeIntegration.self._last_seen logic is tied to the local thread.

Can you tell me how to refine DedupeIntegration so that it deduplicates repeated messages from the logger?

@saippuakauppias saippuakauppias added the Enhancement New feature or request label Oct 19, 2023
@antonpirker
Copy link
Member

Hey @saippuakauppias
yes, Dedupe currently only works if there is an exception.

Hm, maybe we could do something with stack_info from python logging (see https://docs.python.org/3/library/logging.html#logging.Logger.debug)

@saippuakauppias
Copy link
Author

saippuakauppias commented Nov 3, 2023

Hi, @antonpirker , do you think it might be possible to improve it in the SDK itself? I'm talking about logger messages of the same message and log_level (not exceptions).

@antonpirker
Copy link
Member

Hey @saippuakauppias

Do you need the deduplication of log messages because you do not want to see the duplicate messages in Sentry, or to save quota by not sending to many messages to Sentry?

(I am thinking if we should have some deduplication logic in the LoggingIntegration instead. Not sure yet)

@saippuakauppias
Copy link
Author

Hi, @antonpirker!

Sorry for the long reply, didn't see the message in the mail.

This is to keep the quota and not send too many identical messages to Sentry (because we occasionally get 100k messages when one of our services restarts). And this queue takes a very long time to parse, so we might miss some more important events.

@antonpirker
Copy link
Member

Hey! Thanks for getting back. I will have a look.

@antonpirker
Copy link
Member

I looked at your issue and comment linked in the description. And I think you can achieve what you want with this code (it uses the new error_sampler we released):

already_seen = set()

def sample_errors(event, hint):
    msg = event["logentry"]["message"]
    if msg not in already_seen:
        already_seen.add(msg)
        return True
    
    return False

sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),
    release="test",
    before_send=prepare_event,
    error_sampler=sample_errors,
    default_integrations=False,
    debug=True,
    integrations=[
        StdlibIntegration(),
        ExcepthookIntegration(),
        DedupeIntegration(),
        AtexitIntegration(),
        ModulesIntegration(),
        LoggingIntegration(level=None, event_level=logging.WARNING),
    ],
)

When you then something like this only the first occurrence will be sent:

for x in range(10):
    try:
        raise RuntimeError('Oh no!')
    except Exception as e:
        logger.warning('Error get value from service', exc_info=True)

Not that this checks for the string of the warning message. So make sure you use variables when logging variable data.
Good:

logger.warning('Wrong  user id %s', user_id)

Bad:

msg = 'Wrong user id {}'.format(user_id)
logger.warning(msg)

@antonpirker antonpirker added the Triaged Has been looked at recently during old issue triage label Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request Triaged Has been looked at recently during old issue triage
Projects
Status: No status
Development

No branches or pull requests

2 participants