
fix(profiling): Increase profile maximum size by an order of magnitude #1321

Merged
phacops merged 4 commits into master from pierre/profiling-increase-profile-max-size on Jun 29, 2022

Conversation

phacops
Contributor

phacops commented Jun 27, 2022

With a few customers onboarded, we're seeing profiles lasting a few seconds that end up right above 10MiB after being expanded (they can be around 1MiB coming from the client, but we expand them into a more useful format and the resulting JSON sent to Kafka ends up being more than 10MiB). This change lets such profiles make it to the rest of the pipeline, as they are not a problem in themselves.

There is a good discussion to have around the maximum size we want to allow (perhaps there's a limit to what Kafka can handle without too much trouble), and we could reduce the size of the payload (using a different format than JSON, or compressing the payload before it's sent to Kafka). But for now, I want to see those profiles through and assess whether they are outliers or the norm before we put too much work into reducing the size.
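For context, a minimal Rust sketch of the kind of size guard this limit feeds into; the names (`MAX_PROFILE_SIZE`, `check_profile_size`) and the 100 MiB value are illustrative assumptions taken from this discussion, not Relay's actual code:

```rust
// Illustrative sketch only: names and values are assumptions, not Relay's actual code.
const MIB: usize = 1024 * 1024;

/// Hypothetical limit on the expanded profile payload (100 MiB per the discussion below).
const MAX_PROFILE_SIZE: usize = 100 * MIB;

/// Reject expanded profiles that would exceed the limit before they are produced to Kafka.
fn check_profile_size(expanded_json: &[u8]) -> Result<(), String> {
    if expanded_json.len() > MAX_PROFILE_SIZE {
        return Err(format!(
            "profile too large: {} bytes > {} bytes",
            expanded_json.len(),
            MAX_PROFILE_SIZE
        ));
    }
    Ok(())
}

fn main() {
    // A ~1 MiB client payload can expand to well over 10 MiB of JSON.
    let expanded = vec![b'x'; 12 * MIB];
    match check_profile_size(&expanded) {
        Ok(()) => println!("profile accepted ({} bytes)", expanded.len()),
        Err(err) => println!("profile dropped: {err}"),
    }
}
```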

phacops requested a review from a team June 27, 2022 15:15
@jan-auer
Member

  1. Have you run a performance test to check how well Relay is handling such a large JSON blob? This is now two orders of magnitude larger than any other JSON we process.
  2. Note that there are in fact two absolute size limits in production that you cannot reconfigure like this (and please don't for experiments 😄): 20MB overall request size and 100MB after inflating the content encoding (e.g. gzip). If your profiles do not compress to 20MB or less, they will still be dropped with 413.
  3. 100MB requires quite a bit of uploading. We've primarily allowed such large payloads for minidumps, which usually contain mostly empty memory, so they compress very well. I expect 100MB of JSON to compress much less, which also means that the upload from the client will take significant time (see the rough numbers sketched after this list). This means our request duration timings may be affected negatively.
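For rough intuition on points 2 and 3, a back-of-the-envelope sketch; the gzip ratio and uplink bandwidth are assumed values, not measurements:

```rust
// Back-of-the-envelope only; the compression ratio and uplink speed are assumptions, not measurements.
fn main() {
    let expanded_mb: f64 = 100.0;   // expanded JSON profile size
    let gzip_ratio: f64 = 7.0;      // assumed gzip ratio for JSON (minidumps compress far better)
    let uplink_mbit_s: f64 = 10.0;  // assumed client upload bandwidth

    let wire_mb = expanded_mb / gzip_ratio;
    let upload_secs = wire_mb * 8.0 / uplink_mbit_s;
    println!("~{wire_mb:.1} MB on the wire, ~{upload_secs:.0} s to upload");
}
```

Under these assumed numbers, a 100MB expanded profile would squeak in under the 20MB request cap but still take on the order of ten seconds to upload; a less compressible profile would be dropped with 413.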

Rather than changing the default here, would it be possible to run a canary test with a changed config instead? That would still yield test samples for you.

@phacops
Contributor Author

phacops commented Jun 28, 2022

This is sort of a way to help us find a better limit. I'm not expecting 100MB profiles and wouldn't leave the limit like this (it's way too big). This is intended to change in the near future.

Remember, we're still operating at a ridiculously low scale, receiving fewer than 2 profiles per minute, with 90% being less than 500KB. This is really about finding a good maximum size; right now, we're dropping the large ones without a clue.

We won't upload anything bigger than 20MB compressed, for sure, as the clients have hard limits anyway (5MB for Android).

I can lower this maximum to something smaller than 100MB but still bigger than what we had before and see. I brought it up to 100MB since it was the maximum payload we allowed for one other item type and it seemed more than enough.

@untitaker
Member

To my knowledge, the only thing stopping somebody from sending us a lot of profiles today is the fact that we have an organization-level feature flag. But our frontend DSN is public and we don't have rate limits to drop profiles specifically (the load balancer limits are not low enough). So assuming that is true, and considering that profiles are processed within the same infrastructure as other traffic, I think this change is quite risky, and I also don't understand what we get from it.

You already said that a limit of 100MB is not going to stick long term. What do you want to find out by deploying this limit? Is this about debugging those 100MB profiles and seeing why they got so big in the first place?

@phacops
Contributor Author

phacops commented Jun 28, 2022

Yes, the idea is that we're receiving profiles that, when expanded, are bigger than the current limit, and we want to explore why. 100MB might be too big (I mean, the max attachment size is also 100MB, so what's so different about attachments? I'd be happy to set the rate limits to the same values), but the idea is to understand those profiles better and limit/adjust accordingly.

@untitaker
Member

@phacops I think the main two differences are that attachments are written in chunks to Kafka, and that they have rather strict abuse quotas passed down from Sentry. My understanding is that we don't have any rate limits configured for profiles at the moment.
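For illustration, a minimal sketch of that chunking approach, assuming hypothetical names (`ProfileChunk`, `chunk_payload`) and chunk size; this is not Relay's actual implementation:

```rust
// Illustrative only: how a large payload could be split into fixed-size chunks
// (as attachments are) instead of being produced as one oversized Kafka message.
// Names and the chunk size are assumptions, not Relay's actual implementation.
const CHUNK_SIZE: usize = 1024 * 1024; // 1 MiB per chunk (assumed)

#[allow(dead_code)]
struct ProfileChunk<'a> {
    id: &'a str,
    index: usize,
    total: usize,
    payload: &'a [u8],
}

fn chunk_payload<'a>(id: &'a str, payload: &'a [u8]) -> Vec<ProfileChunk<'a>> {
    let total = (payload.len() + CHUNK_SIZE - 1) / CHUNK_SIZE;
    payload
        .chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(index, payload)| ProfileChunk { id, index, total, payload })
        .collect()
}

fn main() {
    let profile = vec![0u8; 10 * 1024 * 1024 + 123];
    let chunks = chunk_payload("profile-1", &profile);
    // Each chunk would then be produced as its own Kafka message.
    println!("{} chunks of at most {} bytes", chunks.len(), CHUNK_SIZE);
    assert!(chunks.iter().all(|c| c.payload.len() <= CHUNK_SIZE));
}
```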

I think I would be fine with this as a temporary thing if it's not kept turned on overnight, and if we're prepared to quickly disable the profile feature flag via Flagr in case things turn south. But I'll leave the decision to @jan-auer.

@phacops
Contributor Author

phacops commented Jun 29, 2022

@untitaker OK, I see. We still have the feature flag we can turn off in case something goes weird but it's true we don't have strict abuse quotas for profiles specifically. Thanks for the explanation.

phacops merged commit 6109cec into master Jun 29, 2022
phacops deleted the pierre/profiling-increase-profile-max-size branch June 29, 2022 13:19
@untitaker
Member

And yeah @phacops, as soon as you have enough debugging data, let's revert this again.

jan-auer added a commit that referenced this pull request Jun 30, 2022
* master:
  fix(profiling): Increase profile maximum size by an order of magnitude (#1321)
  feat(dynamic-sampling): Extend protocol [INGEST-1395 INGEST-1397] (#1318)
  ref(dynamic-sampling): Sample trace after metrics extraction (#1317)
  ref(project_redis): Add counter for redis cache usage (#1322)
  feat(profiling): Add thread metadata to tell which thread is the main (#1320)