
[Epic] Replay: Connect backend errors to Replays #45529

Closed
10 tasks done
bruno-garcia opened this issue Mar 8, 2023 · 6 comments

@bruno-garcia
Member

bruno-garcia commented Mar 8, 2023

Session Replay can give users the reproduction steps they need to solve problems across their stack. That includes not only frontend errors but also errors happening on the backend.

We want to better connect Session Replay with errors happening on the backend, to help backend engineers solve bugs faster.

Phases

  1. Propagate the replayId through traces. Today that requires performance monitoring to be enabled.

This will allow us to work on the product and CTAs, but it limits the impact to customers that have performance monitoring enabled. Ideally, we'll achieve this without SDK changes.

  2. Decouple trace propagation from performance monitoring.

This will allow "all" backend errors to get linked to Replays, regardless of whether users have performance monitoring enabled.
For this to happen, we'll need to change all Sentry SDKs. It will also affect other parts of the product, such as errors, and require more coordination with other teams.
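Phase 2 boils down to every SDK always having a trace to propagate, whether or not a transaction is being recorded. The following is a minimal Python sketch of that idea, under stated assumptions: a per-request propagation context emits the `sentry-trace` header (`trace_id-span_id`, with an optional sampled flag) and a `baggage` header, and it adds `sentry-replay_id` when a replay is active (Phase 1). `PropagationContext` and `outgoing_headers` are hypothetical names, not a Sentry SDK API.

```python
# Hypothetical sketch: PropagationContext and outgoing_headers are
# illustrative names, not part of any Sentry SDK.
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import quote


@dataclass
class PropagationContext:
    # Created for every incoming request, even when no transaction is
    # recorded (i.e. performance monitoring is disabled).
    trace_id: str = field(default_factory=lambda: secrets.token_hex(16))
    span_id: str = field(default_factory=lambda: secrets.token_hex(8))
    # Only set once a transaction has made a sampling decision.
    sampled: Optional[bool] = None
    # Only set on the frontend while a Session Replay is recording.
    replay_id: Optional[str] = None


def outgoing_headers(ctx: PropagationContext, public_key: str) -> Dict[str, str]:
    """Build the `sentry-trace` and `baggage` headers for an outgoing request."""
    sentry_trace = f"{ctx.trace_id}-{ctx.span_id}"
    if ctx.sampled is not None:
        # The sampled flag is optional; omit it when no decision was made.
        sentry_trace += "-1" if ctx.sampled else "-0"

    dsc = {"trace_id": ctx.trace_id, "public_key": public_key}
    if ctx.replay_id is not None:
        dsc["replay_id"] = ctx.replay_id
    # Baggage is a comma-separated list of key=value members; Sentry's
    # entries use the `sentry-` prefix. Values are percent-encoded.
    baggage = ",".join(f"sentry-{key}={quote(value, safe='')}" for key, value in dsc.items())
    return {"sentry-trace": sentry_trace, "baggage": baggage}


# Usage: no transaction (sampled stays None), but a replay is active.
ctx = PropagationContext(replay_id="c" * 32)
headers = outgoing_headers(ctx, public_key="examplePublicKey")
```

The point of the design is that `trace_id` no longer depends on a transaction existing, so an errors-only backend still receives a trace (and replay_id) to attach to its events.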

Internal Release

EA Release

GA Release



@bruno-garcia
Member Author

bruno-garcia commented Mar 8, 2023

Pulling in an update on the RFC from @JoshFerge:

After doing some prototyping, I've found that each individual backend SDK will have to be modified in order to set the replay_id on the event. This is because the dynamic sampling context is not sent along to ingest on error events.

I also found that, at least in the Python SDK, the dynamic sampling values are hardcoded, which means that we'll have to update SDKs to add another value.

I'm curious about this because, from reading the spec, it seems like we wanted more flexibility in the propagation:

Being able to simply copy key-value pairs from the baggage header onto the trace envelope header gives us the flexibility to provide dedicated API methods to propagate additional values using Dynamic Sampling Context. This, in return, allows users to define their own values in the Dynamic Sampling Context so they can sample by those in the Sentry interface.
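To make the contrast concrete, here is a minimal Python sketch of the two approaches: a fixed allowlist of DSC keys versus copying every Sentry entry from the baggage header. It assumes the W3C baggage format with Sentry's `sentry-` key prefix. The allowlist contents and all function names are hypothetical and only stand in for "hardcoded" values.

```python
# Hypothetical sketch: the allowlist and function names are illustrative only.
from typing import Dict
from urllib.parse import unquote

# Stand-in for a fixed set of DSC keys baked into an SDK.
HARDCODED_DSC_KEYS = {"trace_id", "public_key", "release", "environment", "transaction", "sample_rate"}


def parse_sentry_baggage(header: str) -> Dict[str, str]:
    """Return every `sentry-`-prefixed baggage member with the prefix stripped."""
    dsc: Dict[str, str] = {}
    for member in header.split(","):
        # Drop optional member properties (";key=value") and surrounding whitespace.
        key_value = member.split(";", 1)[0].strip()
        if "=" not in key_value:
            continue
        key, value = key_value.split("=", 1)
        key = key.strip()
        if key.startswith("sentry-"):
            dsc[key[len("sentry-"):]] = unquote(value.strip())
    return dsc


def dsc_with_allowlist(header: str) -> Dict[str, str]:
    # Rigid: any key the SDK does not know about (e.g. replay_id) is dropped,
    # so adding a new value requires an SDK release.
    return {k: v for k, v in parse_sentry_baggage(header).items() if k in HARDCODED_DSC_KEYS}


def dsc_copy_all(header: str) -> Dict[str, str]:
    # Flexible: every sentry- entry is forwarded onto the envelope header as-is.
    return parse_sentry_baggage(header)
```

With a header such as `sentry-trace_id=abc,sentry-public_key=pk,sentry-replay_id=r1`, only `dsc_copy_all` keeps `replay_id`. Non-Sentry vendor entries are ignored by both.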

Given these details, I believe we can take the following course of action to accomplish the initial goal of backend errors being tagged with replay_id when tracing is enabled:

  • Modify the sentry-javascript replay SDK to expose the current replayId on the integration
  • Modify the sentry-javascript DSC function to grab the replayId on the replay integration if it exists, and add it to the DSC obj.
  • For each Backend SDK:
    • Allow replay_id to propagate on the DSC
    • Set the replay_id on the scope, as we currently do for trace_id
    • When we send an event, enrich it with the scope's replay_id
    • TODO: prioritize SDK implementations by adoption/importance
  • From there, we shouldn't need any changes in Relay
  • Given this needs an SDK upgrade, we'd likely want to add non-intrusive SDK upgrade CTAs for this feature in the Replays parts of the product.
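The per-backend-SDK steps could look roughly like the following minimal Python sketch. `Scope`, `continue_from_dsc` and `apply_to_event` are hypothetical stand-ins rather than any Sentry SDK's real API. Storing the value under `contexts["replay"]` is an assumption borrowed from the `ReplayContext` that the Relay commit referenced in this thread introduces.

```python
# Hypothetical sketch: Scope and its methods are illustrative, not a real SDK API.
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Scope:
    trace_id: Optional[str] = None
    replay_id: Optional[str] = None

    def continue_from_dsc(self, dsc: Dict[str, str]) -> None:
        # Keep replay_id on the scope exactly like trace_id is kept today.
        self.trace_id = dsc.get("trace_id")
        self.replay_id = dsc.get("replay_id")

    def apply_to_event(self, event: Dict[str, Any]) -> Dict[str, Any]:
        # Enrich an outgoing event with whatever the scope carries.
        contexts = event.setdefault("contexts", {})
        if self.trace_id is not None:
            contexts.setdefault("trace", {})["trace_id"] = self.trace_id
        if self.replay_id is not None:
            contexts.setdefault("replay", {})["replay_id"] = self.replay_id
        return event


# Usage: a request arrives with a DSC parsed from its baggage header.
scope = Scope()
scope.continue_from_dsc({"trace_id": "abc", "replay_id": "r1"})
event = scope.apply_to_event({"message": "boom"})
```

Requests without an active replay simply leave `replay_id` unset, so no `replay` context is added.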

This would allow users to navigate to replays from backend errors/issues. The extra work I described in option A of the RFC would allow users to search for replays by a specific backend error, but I don't think that's where most of the value of this integration lies, so for the initial feature release, I believe the above is the minimum set we'd need.

Given the amount of SDK work and the friction in getting users to upgrade their SDKs, I do think it would be worthwhile to prototype some kind of query engine that links all of this via trace_id instead (Option D in the RFC). I plan to spend a couple of days on this to explore feasibility before moving forward with the implementation/delegation of these SDK changes.
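The trace_id-based alternative amounts to a join at query time rather than an SDK change. A minimal in-memory Python sketch, assuming each replay record stores the trace_ids observed during the session and each backend error carries its own trace_id. The record shapes and `link_errors_to_replays` are hypothetical; a real version would run as a query against stored data.

```python
# Hypothetical sketch: record shapes and the function name are illustrative.
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def link_errors_to_replays(replays: Iterable[dict], errors: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each error's event_id to the replay_ids that share its trace_id."""
    # Index replays by every trace_id seen during the session.
    replays_by_trace: Dict[str, Set[str]] = defaultdict(set)
    for replay in replays:
        for trace_id in replay["trace_ids"]:
            replays_by_trace[trace_id].add(replay["replay_id"])

    links: Dict[str, List[str]] = {}
    for error in errors:
        # .get() avoids inserting empty entries into the defaultdict.
        matches = replays_by_trace.get(error["trace_id"])
        if matches:
            links[error["event_id"]] = sorted(matches)
    return links
```

The trade-off is that no SDK upgrade is needed, but every lookup pays for the join, and an error only links if its trace_id was recorded on the replay.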

@JoshFerge
Member

Update: I'm going to write up some quick thoughts on trying to do this backend-only, but it's likely not a path we're going to pursue. I'll start work on the replay_id trace propagation project shortly.

JoshFerge added a commit to getsentry/relay that referenced this issue Apr 11, 2023
…#1983)

- [x] Defines a new context `ReplayContext` that will contain the
replay_id on events (errors, transactions, etc.). Note that the SDKs
never add this value to the event; it is always added in Relay via the
DSC.
- [x] Adds replay_id to the dynamic sampling context (DSC) schema
- [x] If replay_id exists on the DSC, adds it to the event being processed
in the replay context




Related: getsentry/rfcs#60
getsentry/team-webplatform-meta#41
getsentry/sentry#45529

Closes: #1958

---------

Co-authored-by: Oleksandr <1931331+olksdr@users.noreply.github.com>
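
The Relay change itself is written in Rust. The following minimal Python sketch only mirrors the behavior the commit message describes, assuming the DSC arrives under the `trace` key of the envelope header. `apply_replay_context` is a hypothetical name.

```python
# Hypothetical sketch of the described Relay behavior; not Relay's code.
from typing import Any, Dict


def apply_replay_context(envelope_headers: Dict[str, Any], event: Dict[str, Any]) -> Dict[str, Any]:
    """Copy replay_id from the DSC in the envelope header into the event."""
    # Assumption: the DSC travels in the envelope header under "trace".
    dsc = envelope_headers.get("trace") or {}
    replay_id = dsc.get("replay_id")
    if replay_id:
        # SDKs never set this themselves; it is derived from the DSC here.
        event.setdefault("contexts", {}).setdefault("replay", {})["replay_id"] = replay_id
    return event
```

Doing this at ingestion means any SDK that propagates the DSC gets the link without writing replay_id into events itself.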
@bruno-garcia
Member Author

bruno-garcia commented Apr 19, 2023

Once this design ticket is done, we'll better understand the changes needed, and new issues will be created:

@JoshFerge
Member

@bruno-garcia
Member Author

Since this is GA already, closing. Only docs are missing, but this is a fairly self-explanatory feature since Replays show up on the backend issues. We'll add docs as a follow-up.

github-actions bot locked and limited conversation to collaborators May 25, 2023

5 participants