Member

@mydea mydea commented Apr 17, 2023

This PR updates the replay network body capture to truncate bodies to a maximum size of 150k characters.

The key part is that we try to fix truncated JSON where possible, so it remains valid JSON even after truncation.

For this, the key design goals have been:

  • Only run JSON.parse once (it can be expensive to run it multiple times on large payloads)
  • Try to stay in O(n) complexity or similar (avoid nested loops etc.)

We achieve this by walking through the truncated JSON string once while keeping a stack of the open JSON tokens, and then completing the JSON so that it is valid.

Truncated output

For non-JSON content, bodies will simply be truncated to 150k characters, followed by a truncation marker.
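
A minimal sketch of the plain-text path, assuming the limit is applied to the JavaScript string length (UTF-16 code units, so a surrogate pair at the boundary could be split). The names `MAX_BODY_SIZE` and `truncateText` are hypothetical, and because the exact suffix is not preserved in the description, the default `…` marker is a placeholder assumption.

```typescript
// Hypothetical sketch: the limit and the default marker are assumptions.
const MAX_BODY_SIZE = 150000;

function truncateText(
  body: string,
  max: number = MAX_BODY_SIZE,
  marker: string = "…",
): { body: string; truncated: boolean } {
  // String length counts UTF-16 code units, not user-visible characters.
  if (body.length <= max) return { body, truncated: false };
  // Keep the first `max` units and append the marker to show where we cut.
  return { body: body.slice(0, max) + marker, truncated: true };
}
```

The `truncated` flag is what would drive the TEXT_TRUNCATED warning described under Meta.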

For JSON content, we add "~~" at the end of the JSON to indicate where it was truncated. Note that this can take various forms, depending on where the JSON is cut. For example:

  • ["aa~~"]
  • ["aa","bb","~~"]
  • {"aa":"~~"}
  • {"aa":"bb","~~":"~~"}

Edge cases

Some edge cases we handle:

  • Incomplete non-string values are replaced with "~~", so we never emit incomplete numbers: [1 becomes ["~~"].
  • Incomplete string values get ~~ appended, so ["aa becomes ["aa~~"].
  • Whitespace is ignored.
  • A trailing element is added when we don't know for sure that the last element is complete, e.g. ["aa" becomes ["aa","~~"].
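
The single-pass approach described above (walk the string once, keep a stack of open containers, then append "~~" markers and the missing closing brackets) can be sketched as follows. This is a hypothetical sketch, not the PR's fixJson: `fixJsonSketch`, `Frame` and `completionFor` are illustrative names, and the sketch only aims to reproduce the truncation examples and edge cases listed above.

```typescript
// Hypothetical sketch (not the PR's code): one linear pass over a truncated JSON
// string, tracking open containers on a stack, then completing it with "~~".
interface Frame {
  kind: "arr" | "obj";
  // What the container expects next.
  state: "key" | "colon" | "value" | "afterValue";
}

const MARK = "~~";

function completionFor(top: Frame | undefined, cutLiteral: boolean): string {
  const v = `"${MARK}"`;
  if (!top) return cutLiteral ? v : ""; // a top-level literal was cut off
  if (top.state === "colon") return `:${v}`;
  if (top.state === "key") return `${v}:${v}`;
  if (top.state === "afterValue") return top.kind === "obj" ? `,${v}:${v}` : `,${v}`;
  return v; // "value": an empty value slot is open
}

function fixJsonSketch(input: string): string {
  const stack: Frame[] = [];
  let inString = false;
  let stringIsKey = false;
  let escapeStart = -1; // index of an unfinished backslash escape
  let escapeLeft = 0; // characters still needed to finish that escape
  let primStart = -1; // start index of an unquoted literal (number, true, ...)

  // Record that a complete key or value just ended in the innermost container.
  const completeToken = (wasKey: boolean): void => {
    const top = stack[stack.length - 1];
    if (!top) return;
    top.state = top.kind === "obj" && wasKey ? "colon" : "afterValue";
  };

  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (inString) {
      if (escapeLeft > 0) {
        // "\x" needs one char after the backslash, "\uXXXX" needs five.
        if (i === escapeStart + 1 && ch === "u") escapeLeft = 4;
        else escapeLeft--;
        if (escapeLeft === 0) escapeStart = -1;
      } else if (ch === "\\") {
        escapeStart = i;
        escapeLeft = 1;
      } else if (ch === '"') {
        inString = false;
        completeToken(stringIsKey);
      }
      continue;
    }
    if (primStart !== -1) {
      // A literal ends at whitespace or a structural character.
      if (!/[\s,\]}:]/.test(ch)) continue;
      primStart = -1;
      completeToken(false);
    }
    if (/\s/.test(ch)) continue; // whitespace is ignored
    const top: Frame | undefined = stack[stack.length - 1];
    if (ch === "{") stack.push({ kind: "obj", state: "key" });
    else if (ch === "[") stack.push({ kind: "arr", state: "value" });
    else if (ch === "}" || ch === "]") {
      stack.pop();
      completeToken(false);
    } else if (ch === '"') {
      inString = true;
      stringIsKey = top !== undefined && top.kind === "obj" && top.state === "key";
    } else if (ch === ":") {
      if (top) top.state = "value";
    } else if (ch === ",") {
      if (top) top.state = top.kind === "obj" ? "key" : "value";
    } else {
      primStart = i;
    }
  }

  let out = input;
  if (inString) {
    // Drop a half-written escape, then close the string with the marker.
    if (escapeStart !== -1) out = out.slice(0, escapeStart);
    out += `${MARK}"`;
    if (stringIsKey) out += `:"${MARK}"`;
  } else {
    // An unfinished literal (e.g. the 1 in [1) is replaced by "~~".
    const cutLiteral = primStart !== -1;
    if (cutLiteral) out = out.slice(0, primStart);
    out += completionFor(stack[stack.length - 1], cutLiteral);
  }
  // Close every container that is still open, innermost first.
  for (const frame of stack.slice().reverse()) out += frame.kind === "obj" ? "}" : "]";
  return out;
}
```

Note that a body which only looks like JSON, such as `{ this is not JSON }`, comes back unchanged and still invalid, so the caller still has to validate the result with JSON.parse.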

Meta

Instead of the previous _meta.errors field (which is no longer set), the following _meta.warnings can be set:

  • JSON_TRUNCATED: The field is JSON & was truncated
  • TEXT_TRUNCATED: The field is plain text & was truncated
  • INVALID_JSON: We think the field should be JSON, but we could not parse it. This can happen if we try to truncate the JSON but something goes wrong; in this case, the body is sent as a plain string.
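
The flow described here (a single JSON.parse, caught in one central place, falling back to a plain string) can be sketched roughly as below. This is a hypothetical outline, not the actual networkUtils code: `normalizeNetworkBody`, the injected `repairJson` callback (standing in for fixJson), the "starts with { or [" JSON heuristic and the `…` text suffix are all assumptions.

```typescript
type NetworkBodyWarning = "JSON_TRUNCATED" | "TEXT_TRUNCATED" | "INVALID_JSON";

interface NormalizedBody {
  body: unknown;
  warnings?: NetworkBodyWarning[];
}

// Hypothetical sketch: the JSON repair step is injected so this stays self-contained.
function normalizeNetworkBody(
  raw: string,
  maxSize: number,
  repairJson: (truncated: string) => string,
): NormalizedBody {
  const exceedsLimit = raw.length > maxSize;
  const first = raw.trim().charAt(0);
  // Assumed heuristic: bodies starting with { or [ are treated as JSON.
  const looksLikeJson = first === "{" || first === "[";

  if (looksLikeJson) {
    const candidate = exceedsLimit ? repairJson(raw.slice(0, maxSize)) : raw;
    try {
      // The only JSON.parse call: a failure covers both a non-JSON body and a bad repair.
      const body: unknown = JSON.parse(candidate);
      return exceedsLimit ? { body, warnings: ["JSON_TRUNCATED"] } : { body };
    } catch {
      // Fall back to sending the body as a plain string.
      return { body: candidate, warnings: ["INVALID_JSON"] };
    }
  }

  if (exceedsLimit) {
    // The "…" suffix is a placeholder assumption.
    return { body: `${raw.slice(0, maxSize)}…`, warnings: ["TEXT_TRUNCATED"] };
  }
  return { body: raw };
}
```

Because the repair step is injected, the single `try`/`catch` covers both failure modes: a body that only looked like JSON, and a repair that produced invalid output.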

This is a replacement for #7730.

Closes #7531

@mydea mydea added the Package: replay label Apr 17, 2023
@mydea mydea self-assigned this Apr 17, 2023
Contributor

github-actions bot commented Apr 17, 2023

size-limit report 📦

| Path | Size |
|---|---|
| @sentry/browser - ES5 CDN Bundle (gzipped + minified) | 20.98 KB (0%) |
| @sentry/browser - ES5 CDN Bundle (minified) | 65.49 KB (0%) |
| @sentry/browser - ES6 CDN Bundle (gzipped + minified) | 19.52 KB (-0.01% 🔽) |
| @sentry/browser - ES6 CDN Bundle (minified) | 57.95 KB (0%) |
| @sentry/browser - Webpack (gzipped + minified) | 21.12 KB (0%) |
| @sentry/browser - Webpack (minified) | 68.9 KB (0%) |
| @sentry/react - Webpack (gzipped + minified) | 21.14 KB (0%) |
| @sentry/nextjs Client - Webpack (gzipped + minified) | 48.99 KB (0%) |
| @sentry/browser + @sentry/tracing - ES5 CDN Bundle (gzipped + minified) | 28.55 KB (-0.01% 🔽) |
| @sentry/browser + @sentry/tracing - ES6 CDN Bundle (gzipped + minified) | 26.78 KB (-0.01% 🔽) |
| @sentry/replay ES6 CDN Bundle (gzipped + minified) | 46 KB (+1.44% 🔺) |
| @sentry/replay - Webpack (gzipped + minified) | 39.91 KB (+1.63% 🔺) |
| @sentry/browser + @sentry/tracing + @sentry/replay - ES6 CDN Bundle (gzipped + minified) | 64.86 KB (+0.99% 🔺) |
| @sentry/browser + @sentry/replay - ES6 CDN Bundle (gzipped + minified) | 57.84 KB (+1.13% 🔺) |

Member

@billyvg billyvg left a comment

Looks good to me. Would it be worth benchmarking fixJson against existing libraries (or even against something intensive in our SDK, like compression) just to sanity-check ourselves?


/**
* Takes an incomplete JSON string, and returns a hopefully valid JSON string.
* Note that this _can_ fail, so you should check the return value is valid JSON.
Member

What would cause it to fail? If it fails, what would be the result? Would it be better to throw in that case?

Member Author

Basically, we only know it failed once we tried to JSON.parse() it. It can fail in two cases:

  1. We incorrectly identified a string as JSON (e.g. { this is not JSON }). In this case, it will still not be JSON after our fixing 😅
  2. It was correct JSON, but something went wrong in the implementation. This should not happen, and I know of no case where it would, but the implementation is complex enough that I wouldn't feel comfortable stating that it will never result in broken JSON.

My first implementation actually wrapped JSON.parse() in a try/catch here and threw an error, but the current implementation has only a single JSON.parse in networkUtils, where we catch failures in a central place. This also makes it easier to generate the correct warnings for the _meta field.


// Nested object/arrays
['{"a":{"bb', '{"a":{"bb~~":"~~"}}'],
['{"a":["bb",["cc","d', '{"a":["bb",["cc","d~~"]]}'],
Member

Maybe a few test cases with an array of nested objects

Member Author

true, can add a few more here!

Member Author

I added some more tests, esp. with a large JSON body of nested stuff (found some issues while doing this, great!)

Member

@Lms24 Lms24 left a comment

What a nice implementation! 🚀 You should really think about extracting fixJson to its own library!

} catch {
return {
body,
warnings: ['INVALID_JSON'],
Member

@Lms24 Lms24 Apr 19, 2023

So it seems we can end up in the catch block either because we didn't hit the size limit and something went wrong with parsing, or because the size limit was hit, we truncated and repaired, and then something went wrong while parsing. To distinguish these cases, should we add JSON_TRUNCATED (or perhaps a new warning) to the warnings?
(I guess this might be helpful for debugging later on, but I'm probably missing a lot of context on what we do with these warnings. Feel free to disregard if this isn't helpful.)

Member Author

Yeah, it's a good point, I wasn't sure. In that case, we can set both INVALID_JSON and JSON_TRUNCATED here, if that is helpful for the UI. @billyvg WDYT? We just need to make sure the UI handles this case properly (when both of these are present)!
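
The proposal in this thread (report both INVALID_JSON and JSON_TRUNCATED when parsing fails after truncation, and have the UI cope with the combination) could look roughly like this. Both function names and the UI messages are hypothetical; this is a sketch of the idea under discussion, not what was merged.

```typescript
type NetworkBodyWarning = "JSON_TRUNCATED" | "TEXT_TRUNCATED" | "INVALID_JSON";

// Hypothetical: warnings to attach when JSON.parse fails on a body.
function parseFailureWarnings(wasTruncated: boolean): NetworkBodyWarning[] {
  return wasTruncated ? ["INVALID_JSON", "JSON_TRUNCATED"] : ["INVALID_JSON"];
}

// Hypothetical UI-side summary that must handle both warnings being present.
function describeBodyWarnings(warnings: NetworkBodyWarning[]): string {
  const invalid = warnings.indexOf("INVALID_JSON") !== -1;
  const truncated =
    warnings.indexOf("JSON_TRUNCATED") !== -1 || warnings.indexOf("TEXT_TRUNCATED") !== -1;
  if (invalid && truncated) return "Body was truncated and could not be parsed as JSON; showing raw text.";
  if (invalid) return "Body could not be parsed as JSON; showing raw text.";
  if (truncated) return "Body was truncated.";
  return "";
}
```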

@mydea mydea force-pushed the fn/replay-body-truncate branch from 6c34d3a to e20f9ec April 19, 2023 10:28
@mydea mydea force-pushed the fn/replay-body-truncate branch from 1186380 to 3a18365 April 19, 2023 15:56
@mydea mydea force-pushed the fn/replay-body-truncate branch from d681d6d to 57efea8 April 20, 2023 07:53
@mydea mydea merged commit 100369e into develop Apr 20, 2023
@mydea mydea deleted the fn/replay-body-truncate branch April 20, 2023 09:27

Labels

Package: replay Issues related to the Sentry Replay SDK

Development

Successfully merging this pull request may close these issues.

Add truncation to request/response bodies

5 participants