Simplify upstream latency collector and measure gateway latency #4193

philipp-spiess · 2024-05-16T12:48:46Z

This PR changes the way we track upstream latency (= RTT to the Sourcegraph instance). Instead of measuring every 10 minutes and then exporting the median to the completion metadata, I thought it might be simpler to include the last measurement with the completion data (and then we can run an aggregation on that). While coding, completion suggestions should happen more frequently than every 10 minutes, so this gives us a higher resolution while we avoid maintaining a list.
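A minimal sketch of the "keep only the last measurement" idea, assuming a small store that is read whenever a completion event is logged. The names `LastLatencyStore`, `record`, and `getMetadata` are hypothetical and are not taken from the diff:

```typescript
// Illustrative sketch only; class and field names are hypothetical.
interface LatencyMetadata {
    upstreamLatency?: number // milliseconds, rounded
}

class LastLatencyStore {
    private last: LatencyMetadata = {}

    // Overwrite the previous value instead of appending to a list,
    // so there is nothing to trim and no median to compute client-side.
    public record(durationMs: number): void {
        this.last = { upstreamLatency: Math.round(durationMs) }
    }

    // Read when a completion event is logged; returns a copy so callers
    // cannot mutate the stored value.
    public getMetadata(): LatencyMetadata {
        return { ...this.last }
    }
}
```

Each completion event then carries whichever ping finished most recently, and any aggregation (median, percentiles) can run later on the collected events.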

Additionally, I've added a second parameter to measure the latency to Gateway. This is only turned on for Free/Pro users right now (to avoid outgoing traffic in air-gapped instances).
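Roughly, the gating could look like the sketch below. It assumes the caller already knows whether the user is on a Free/Pro plan; `selectPingTargets` and `PingTarget` are hypothetical names, and the upstream ping URL is passed in because this description does not name it:

```typescript
// Illustrative sketch only; names are hypothetical, not taken from the diff.
type PingTarget = { name: 'upstream' | 'gateway'; url: string }

// Endpoint named in a later commit on this PR ("Use /-/__version endpoint for CG").
const GATEWAY_PING_URL = 'https://cody-gateway.sourcegraph.com/-/__version'

function selectPingTargets(upstreamPingUrl: string, isFreeOrProUser: boolean): PingTarget[] {
    // The Sourcegraph instance is always measured.
    const targets: PingTarget[] = [{ name: 'upstream', url: upstreamPingUrl }]
    // Gateway is only pinged for Free/Pro users, so air-gapped
    // instances never see extra outgoing traffic.
    if (isFreeOrProUser) {
        targets.push({ name: 'gateway', url: GATEWAY_PING_URL })
    }
    return targets
}
```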

Furthermore, I added tracing to the upstream pings. With our current rate limit, one to two traces every ~10 minutes should be fine (again, I assume this happens far less frequently than completion suggestions), but we can revisit this later.

The benefit of exporting this as a trace is that we can later forward the trace to the SG instance and convert it to a metric.
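For context, trace propagation over HTTP usually relies on the W3C `traceparent` header, which is what lets a receiving server attach its spans to the same trace. The sketch below only illustrates that header format; `formatTraceparent` is a hypothetical helper and is not the `addTraceparent` used in the diff, whose implementation is not shown here:

```typescript
// Illustrative sketch of the W3C Trace Context `traceparent` header format.
// Hypothetical helper; not the addTraceparent implementation from the PR.
function formatTraceparent(traceId: string, spanId: string, sampled: boolean): string {
    // trace-id: 32 lowercase hex characters, must not be all zeros
    if (!/^[0-9a-f]{32}$/.test(traceId) || /^0+$/.test(traceId)) {
        throw new Error('invalid trace id')
    }
    // parent-id (span id): 16 lowercase hex characters, must not be all zeros
    if (!/^[0-9a-f]{16}$/.test(spanId) || /^0+$/.test(spanId)) {
        throw new Error('invalid span id')
    }
    // version "00", then trace-id, parent-id, and trace-flags ("01" = sampled)
    return `00-${traceId}-${spanId}-${sampled ? '01' : '00'}`
}
```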

Test plan

rafax · 2024-05-16T13:39:57Z

vscode/src/services/UpstreamHealthProvider.ts

+                headers.set('Authorization', `Bearer ${this.fastPathAccessToken}`)
+                addTraceparent(headers)
+                addCustomUserAgent(headers)
+                const uri = 'https://cody-gateway.sourcegraph.com/healthz'


The right URL seems to be https://cody-gateway.sourcegraph.com/-/healthz (your URL 404s for me)?

Hm, this one also does not seem right. I wasn't able to find an endpoint that works, so I decided that a 404 response from the same host should still have the same round-trip time.

philipp-spiess · 2024-05-17T10:17:10Z

Had one thought yesterday while I couldn't sleep at 3am (🙃):

We should run the initial ping a bit delayed. During extension startup, a lot of I/O and network activity is happening, so a ping started right away might not be the best sample. I think something like a 10-second delay for the initial ping should be enough, just to ensure there's not too much interference from other extensions booting up?
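Something along these lines, as a sketch: the first ping is deferred by 10 seconds and the recurring interval only starts after it. The `Timers` interface and `scheduleHealthPings` are hypothetical names, the timers are injected so the schedule can be tested without waiting, and the 10-minute interval is assumed from the description above:

```typescript
// Illustrative sketch only; names are hypothetical, not taken from the diff.
interface Timers {
    setTimeout(fn: () => void, ms: number): unknown
    setInterval(fn: () => void, ms: number): unknown
}

const INITIAL_PING_DELAY_MS = 10 * 1000 // let extension startup settle first
const PING_INTERVAL_MS = 10 * 60 * 1000 // assumed ~10 minute cadence

function scheduleHealthPings(ping: () => void, timers: Timers): void {
    timers.setTimeout(() => {
        // First sample, taken after the startup burst of I/O and network traffic.
        ping()
        // Recurring samples start only after the delayed first one.
        timers.setInterval(ping, PING_INTERVAL_MS)
    }, INITIAL_PING_DELAY_MS)
}
```

In the extension, the real timer functions would be passed in; a test can pass fakes that record the requested delays.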

philipp-spiess added 2 commits May 16, 2024 14:42

Simplify upstream latency collector and measure gateway latency

ef79339

Remove unecessary omit

aee5608

philipp-spiess requested review from rafax and a team May 16, 2024 12:48

philipp-spiess self-assigned this May 16, 2024

philipp-spiess requested a review from RXminuS May 16, 2024 12:48

rafax reviewed May 16, 2024

rafax approved these changes May 16, 2024

philipp-spiess added 3 commits May 17, 2024 13:39

Cleanup and run first test 10sec delayed

935753e

Use /-/__version endpoint for CG that doesn't require auth

b80cba7

Disable health pings in agent tests

760d39a

philipp-spiess merged commit 857b548 into main May 17, 2024
18 of 19 checks passed

philipp-spiess deleted the ps/simplify-upstream-health-and-measure-gateway-latency branch May 17, 2024 13:15

rafax pushed a commit that referenced this pull request May 20, 2024

Simplify upstream latency collector and measure gateway latency (#4193)

dee8e4a

rafax mentioned this pull request May 20, 2024

[vscode] 1.18.1 patch release #4223

Merged
