
feat(server): Retry project configs asynchronously [INGEST-1220] #1263

Merged
merged 10 commits into master from feat/upstream-pending on May 17, 2022

Conversation

@flub (Contributor) commented May 6, 2022

Introduces version 3 of the endpoint protocol for project configs. It is
a superset of version 2 that adds a list of pending keys. Keys in that
list are computed asynchronously by the upstream and should be retried
after a short while.

The version 2 endpoint blocked the response of the entire batch request
on every requested project config. Especially for longer chains of
Relays, a single slow config could therefore fail or time out an entire
batch. Under load or in extraordinary circumstances, this had the
potential to accelerate the degradation of ingestion.

The version 3 implementation is twofold:

  1. The upstream project source puts every project key from the pending
    list back into the queue. It will be retried on the next fetch
    cycle(s), until a config can be retrieved or the project exceeds its
    retry limits.
  2. The endpoint now supports both versions, 2 and 3. In version 2, it
    continues to await the project state response and never populates
    pending. In version 3, it only responds with a config if the cached
    state is up-to-date (including soft outdated). Otherwise, version 3
    always responds with pending.

An example of the response schema:

{
  "configs": {
    "50f0f59961bce997c872b45942c57e9f": { /* ... */ },
    "8f19b1b4d628e4ef0a3014c70379c180": { /* ... */ }
  },
  "pending": ["e1eef6aef6b20fcdb8190d703ab6a56b"]
}
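
For illustration, a minimal sketch of how a downstream consumer might split such a response into ready configs and keys to retry. The struct, the main function, and the crate usage below are hypothetical stand-ins rather than the actual relay-server code:

// Sketch only; assumes the serde and serde_json crates.
use std::collections::HashMap;

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ProjectStatesResponse {
    // Configs that were computed in time for this response.
    #[serde(default)]
    configs: HashMap<String, serde_json::Value>,
    // Keys still being computed upstream; retry these on the next fetch cycle.
    #[serde(default)]
    pending: Vec<String>,
}

fn main() {
    let body = r#"{
        "configs": { "50f0f59961bce997c872b45942c57e9f": {} },
        "pending": ["e1eef6aef6b20fcdb8190d703ab6a56b"]
    }"#;

    let response: ProjectStatesResponse = serde_json::from_str(body).unwrap();

    for (key, config) in &response.configs {
        println!("config ready for {key}: {config}");
    }
    for key in &response.pending {
        println!("re-enqueue {key} for the next fetch cycle");
    }
}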

Requires getsentry/sentry#34588, do not deploy Relay before Sentry is deployed.

@jan-auer changed the title from "Feat/upstream pending" to "feat(server): Retry project configs asynchronously [INGEST-1220]" on May 9, 2022
#[derive(Debug, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct GetProjectStatesResponse {
    #[serde(default)]
    pub configs: HashMap<ProjectKey, ErrorBoundary<Option<ProjectState>>>,
    #[serde(default)]
    pub pending: Vec<ProjectKey>,
}
Member:

To make it impossible to get into an inconsistent state here (project key appears in both collections), should we turn configs into something like HashMap<ProjectKey, ErrorBoundary<MaybeProjectState>>, with

enum MaybeProjectState {
    None,
    Present(ProjectState),
    Pending,
}

Member:

We could also define precedence for when both are present, for instance that the supplied config always wins over the pending state.

The initial approach was to do exactly what you are suggesting. Making serialization work in an unobtrusive way (supporting both v2 and v3 without much overhead) is slightly harder in that scenario, but still possible.
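
As a rough illustration of that precedence idea (a sketch with placeholder types and names, not code from this PR): a key listed in both collections would simply not be treated as pending.

use std::collections::HashMap;

// Placeholder for the real project state type.
struct ProjectState;

// A supplied config always wins over an entry in the pending list.
fn effective_pending(
    configs: &HashMap<String, ProjectState>,
    pending: &[String],
) -> Vec<String> {
    pending
        .iter()
        .filter(|key| !configs.contains_key(*key))
        .cloned()
        .collect()
}

fn main() {
    let mut configs = HashMap::new();
    configs.insert("aaaa".to_owned(), ProjectState);

    // "aaaa" appears in both, so the config wins; only "bbbb" stays pending.
    let pending = vec!["aaaa".to_owned(), "bbbb".to_owned()];
    assert_eq!(effective_pending(&configs, &pending), vec!["bbbb".to_owned()]);
}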

Member:

What is the advantage of the currently implemented schema? The protocol is not backwards-compatible anyway: even if it is compatible at the syntactical level, we cannot send v3 to a v2 Relay. If we accidentally send a v3 response to a v2 Relay, a deserialization failure (unknown value "pending") is actually safer IMO than assuming the project does not exist. The latter causes silent data dropping while the former throws errors everywhere. (Presumably we will see alerts either way, but customer Relays may have less monitoring.)

Also, will we write the pending state into Redis, and if so, how?

Member:

After trying out both, this implementation was just massively easier to implement and read than the other one.

In theory, we can send this down to a V2 Relay, and this is also how the endpoint is implemented. Such a Relay would treat the now-missing project config as missing and stop polling. Since that would be incorrect, in practice the endpoint implementation never sends the new key to a V2 Relay.

relay-server/src/endpoints/project_configs.rs (resolved conversation not shown)
jan-auer and others added 4 commits May 12, 2022 12:27
* master:
  feat(sampling): Support custom tags and more contexts (#1268)
  deps: Update symbolic to pull in Unreal parser fixes (#1266)
  Bring in CLA Lite (#1257)
  release: 0.8.11
  ref(profiling): Normalize all profiles (#1250)
  feat(profiling): Add a profile data category and count profiles in an envelope to apply rate limits (#1259)
  fix: Raise a new InvalidCompression Outcome for invalid Unreal compression (#1237)
  chore: Update logo for dark or light theme (#1262)
  ref: Use Duration to compute breakdowns and span durations (#1260)
@jan-auer (Member) commented

Let's wait with the changelog entry until after May 15th, when the next OSS release runs.

@flub (Contributor, Author) commented May 13, 2022

> Let's wait with the changelog entry until after May 15th, when the next OSS release runs.

Whoops, feel free to revert it. But why?

@jan-auer marked this pull request as ready for review May 16, 2022 13:48
@jan-auer requested a review from a team May 16, 2022 13:48
@jan-auer (Member) commented

No worries about the changelog. Once the OSS release runs, the existing section will be renamed to 22.5.0, so we'll have to move the changelog entry to a new Unreleased section.

This PR is ready to be reviewed. Before merging and deploying it, we will have to accept version=3 requests in Sentry, though.


// Before (single supported version):
.map(|query| query.version == self.0)
.unwrap_or(false)

// After (accept both v2 and v3):
let query = VersionQuery::from_request(req);
query.version >= ENDPOINT_V2 && query.version <= ENDPOINT_V3
Member:

The previous move from v1 to v2 was implemented such that Relay only supports one version, forwards all other requests at the HTTP layer, and all legacy compat stuff is handled in Sentry. Did we decide against continuing down this path because we fear the increased traffic on Python web workers coming from existing v2 customer Relays? That's a bit unfortunate; it makes versioning more complicated. I think Sentry will have to support all versions forever anyway. I thought it would be possible to weather the increased traffic on Python web workers until customers upgrade their Relays.

Member:

The basic model is still the same. Relay supports V2 and V3 now, handling all deployed versions of Relay without relying on the HTTP endpoint in Sentry. It will forward V4+ though, just like we used to forward future versions before. So the difference now is just that we also keep supporting the "old" V2 for performance reasons.

> that's a bit unfortunate, it makes versioning more complicated.

Agreed, although we can drop V2 support some time in the future, returning to supporting just a single version. Given the current implementation, do you think this is too much added complexity? Personally, this still seems quite low-overhead to me.
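
To make the dispatch concrete, here is a rough sketch of the behavior described above, using placeholder names rather than the actual endpoint code: v2 and v3 are handled locally, anything newer is forwarded upstream.

const ENDPOINT_V2: u16 = 2;
const ENDPOINT_V3: u16 = 3;

#[derive(Debug, PartialEq)]
enum Handling {
    // Answer the request from this Relay's own project cache.
    HandleLocally(u16),
    // Pass the request through to the next Relay / Sentry.
    ForwardUpstream,
}

fn route(version: u16) -> Handling {
    if (ENDPOINT_V2..=ENDPOINT_V3).contains(&version) {
        Handling::HandleLocally(version)
    } else {
        Handling::ForwardUpstream
    }
}

fn main() {
    assert_eq!(route(2), Handling::HandleLocally(2));
    assert_eq!(route(3), Handling::HandleLocally(3));
    assert_eq!(route(4), Handling::ForwardUpstream);
}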

#[macro_export]
macro_rules! project_states_version {
    () => {
        3 // bumped from 2
    };
}
Member:

Nit: this constant is now also defined as a const item further down.

Member:

Good point. We might also get rid of this macro now. The ENDPOINT_V3 constant wasn't set to this macro because V3 != current_version. Thoughts?

@jan-auer (Member) commented

Closing & reopening as per suggestion from @chadwhitacre to get CLA checks working.

* master:
  fix(ci): Use correct trigger condition for CLA (#1269)
  release: 22.5.0
@jan-auer merged commit 7e63150 into master on May 17, 2022
@jan-auer deleted the feat/upstream-pending branch on May 17, 2022 11:54