-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify upstream latency collector and measure gateway latency #4193
Simplify upstream latency collector and measure gateway latency #4193
Conversation
headers.set('Authorization', `Bearer ${this.fastPathAccessToken}`) | ||
addTraceparent(headers) | ||
addCustomUserAgent(headers) | ||
const uri = 'https://cody-gateway.sourcegraph.com/healthz' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The right URL seems to be https://cody-gateway.sourcegraph.com/-/healthz
(your URL 404s for me)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm this one also does not seem right. I wasn't able to find an endpoint that works so decided that a 404 endpoint from the same endpoint should still have the same round trip time.
Had one thought yesterday while I couldn't sleep at 3am (🙃):
|
Closes CODY-1605
This PR changes the way we track upstream latency (= RTT to the Sourcegraph instance). Instead of measuring every 10 minutes and then exporting the median to the completion metadata, I thought that it might be simpler to include the last measurement with the completion data (and then we can run an aggregation on that). While coding, completion suggestions should happen more frequent then every 10 minutes so this will give us a higher resolution while we avoid to maintain a list.
Additionally, I've added a second parameter to measure the latency to Gateway. This is only turned on for Free/Pro users right now (to avoid outgoing traffic in air-gapped instances).
Furthermore I added tracing to the upstream pings. With our current rate limit, one two two traces every ~10 minutes should be fine (again I assume this happen far less frequently than completion suggestions) but we can revisit this later.
The benefit of exporting this as a trace is that we can later forward the trace to the SG instance and convert it to a metric.
Test plan