fix(profiling): Increase profile maximum size by an order of magnitude #1321
Conversation
Rather than changing the default here, would it be possible to run a canary test with a changed config instead? That would still yield test samples for you.
This is a way to help us find a better limit. I'm not expecting 100MB profiles and wouldn't leave the limit like this (it's way too big); this is intended to change in the near future. Remember, we're still operating at a ridiculously low scale, receiving fewer than 2 profiles per minute, with 90% of them under 500KB. This is really about detecting what a good maximum size is; right now, we're dropping profiles without a clue. We won't upload anything bigger than 20MB compressed for sure, as the clients have hard limits anyway (5MB for Android). I can lower this maximum to something bigger than what we had before and see. I brought it up to 100MB since it was the maximum payload we allowed for one other item type and seemed more than enough.
To my knowledge, the only thing stopping somebody from sending us a lot of profiles today is the fact that we have an organization-level feature flag. But our frontend DSN is public, and we don't have rate limits to drop profiles specifically (the load balancer limits are not low enough). Assuming that is true, and considering that profiles are processed within the same infrastructure as other traffic, I think this change is quite risky, and I also don't understand what we get from it. You already said that a limit of 100MB is not going to stick long term. What do you want to find out by deploying this limit? Is this about debugging those 100MB profiles and seeing why they got so big in the first place?
Yes, the idea is that we're receiving profiles that, when expanded, are bigger than the current limit, and we want to explore why. 100MB might be too big (though the max attachment size is also 100MB, so what's so different about attachments? I'd be happy to set the rate limits to the same values), but the idea is to understand those profiles better and limit/adjust accordingly.
@phacops I think the main two differences are that attachments are written in chunks to Kafka, and that they have rather strict abuse quotas passed down from Sentry. My understanding is that we don't have any rate limits configured for profiles at the moment. I would be fine with this as a temporary thing if it's not kept turned on overnight, and if we're prepared to quickly disable the profile feature flag via flagr in case things turn south. But I'll leave the decision to @jan-auer.
@untitaker OK, I see. We still have the feature flag we can turn off in case something goes weird, but it's true we don't have strict abuse quotas for profiles specifically. Thanks for the explanation.
And yeah @phacops, as soon as you have enough debugging data, let's revert this again.
* master:
  - fix(profiling): Increase profile maximum size by an order of magnitude (#1321)
  - feat(dynamic-sampling): Extend protocol [INGEST-1395 INGEST-1397] (#1318)
  - ref(dynamic-sampling): Sample trace after metrics extraction (#1317)
  - ref(project_redis): Add counter for redis cache usage (#1322)
  - feat(profiling): Add thread metadata to tell which thread is the main (#1320)
With a few customers onboarded, we're seeing profiles lasting a few seconds that can end up right above 10MiB after being expanded: they can be 1MiB coming from the client, but we expand them into a more useful format, and the resulting JSON sent to Kafka ends up being more than 10MiB. This change aims to let such profiles make it to the rest of the pipeline, as they are not a problem.
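To make the expansion concrete, here is a minimal sketch (not Relay's actual implementation; the helper names and the use of gzip/JSON are assumptions) of how a small compressed client payload can be measured against a size limit only after it has been decompressed and re-serialized:

```python
import gzip
import json

# Hypothetical limit for illustration; the PR raises the real limit by
# an order of magnitude, so the exact value here is an assumption.
MAX_PROFILE_SIZE = 10 * 1024 * 1024  # 10 MiB

def expanded_size(compressed_payload: bytes) -> int:
    """Size of the profile after decompression and re-serialization.

    A ~1 MiB gzipped payload from the client can expand well past this
    once decompressed and rewritten as the JSON sent to Kafka.
    """
    profile = json.loads(gzip.decompress(compressed_payload))
    return len(json.dumps(profile).encode("utf-8"))

def should_drop(compressed_payload: bytes, limit: int = MAX_PROFILE_SIZE) -> bool:
    # The check applies to the expanded size, not the wire size,
    # which is why profiles that looked small on ingest were dropped.
    return expanded_size(compressed_payload) > limit
```

The point of the sketch is that the limit is enforced on the expanded representation, so the client-side payload size alone doesn't predict whether a profile survives.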
There is a good discussion to have around the maximum size we want to allow (perhaps there's a limit to what Kafka can handle without too much trouble), and we could reduce the payload size by using a different format than JSON or by compressing the payload before it's sent to Kafka. But for now, I want to see those profiles through and assess whether they are outliers or the norm before we put too much work into reducing the size.
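The compression idea mentioned above can be sketched as follows. This is a hypothetical illustration, not code from this PR: the function name is invented, and gzip is just one candidate codec (Kafka producers also support compression natively via settings like `compression.type`):

```python
import gzip
import json

def kafka_value(profile: dict) -> bytes:
    """Serialize and compress a profile before producing it to Kafka.

    Expanded profile JSON is highly repetitive (frame names, stack
    references), so compressing it can keep an otherwise >10 MiB
    message under the broker's message size limit.
    """
    raw = json.dumps(profile).encode("utf-8")
    return gzip.compress(raw)

def profile_from_kafka(value: bytes) -> dict:
    """Inverse of kafka_value, for the consumer side."""
    return json.loads(gzip.decompress(value))
```

Whether to compress in the application or rely on producer-level compression is exactly the kind of trade-off the author defers until the size data is in.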