Extension registry GraphQL API is slow #10554
Comments
What's the GraphQL request? |
Just ran a Postgres vacuum in case that is a quick win. Hopefully
it is faster now? Otherwise we will likely need to tackle this, since I
believe we recently made some changes to it to try to make it faster.
|
@tsenart do you still have your "kubectl run" for pghero in your shell
history? We don't have it running anymore, and from looking at the
history it never existed as a deployment, only as a pod. So I can't
recover it :)
|
@felixfbecker: We increased the RAM of the sourcegraph.com postgres database and tuned some parameters as part of #10598. I think this may impact the performance of this query. Would you be able to check? |
How would we do this cleanup? Is there any data that can be deleted safely? |
@felixfbecker: I opened a PR to instrument the extension registry API with Prometheus metrics so that we can do that. |
This commit re-enables and extends the Prometheus instrumentation of the extension registry API, so that we can chart latency in Grafana. Related to #10554
@slimsag: I can't seem to save a new dashboard on sourcegraph.com. It'd be useful to be able to play around with those dashboards in the UI to ensure we get the right design, and then export that as JSON to the right git repo for persistence and automated provisioning. Any leads on how to do that? I have this at the moment: {
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"dataLinks": []
},
"percentage": false,
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le)(rate(src_registry_requests_duration_seconds_bucket{code=~\"^2.+\", operation=\"list\"}[5m])))",
"interval": "",
"legendFormat": "2xx",
"refId": "A"
},
{
"expr": "histogram_quantile(0.99, sum by (le)(rate(src_registry_requests_duration_seconds_bucket{code!~\"^2.+\", operation=\"list\"}[5m])))",
"interval": "",
"legendFormat": "Non 2xx",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Extension Registry API: List Latency",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "seconds",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"schemaVersion": 22,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
]
},
"timezone": "",
"title": "Extension Registry API",
"uid": null,
"variables": {
"list": []
},
"version": 0
} |
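For readers unfamiliar with the two PromQL targets in the dashboard above: `histogram_quantile(0.99, ...)` estimates the 99th percentile from cumulative `le` buckets by linear interpolation inside the bucket that contains the target rank. The following is a minimal sketch of that idea, not Prometheus's actual implementation. The bucket bounds and counts are made up for illustration, and it assumes positive upper bounds with a final `+Inf` bucket. In the real query the per-bucket values are `rate(...)` results rather than raw counts, but the interpolation works the same way.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with the last upper_bound being math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: the best available
                # answer is the upper bound of the highest finite bucket.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket, assuming observations
            # are spread evenly between its lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical latency buckets in seconds (illustrative values only).
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, example_buckets)
p99 = histogram_quantile(0.99, example_buckets)
```

One consequence worth keeping in mind when reading the panel: if the slowest requests land in the `+Inf` bucket, the reported p99 is capped at the largest finite bucket bound, so a request that takes a minute can be under-reported unless the histogram's buckets extend that far.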
Does the @sourcegraph/web team have more context on this? |
Sounds like we are storing past versions in that table. Those could be deleted I think because we always serve the latest version (and don't provide any way to introspect or fetch old versions) AFAIK. @lguychard @sqs is this correct? It is likely that big because we store in it a) the extension bundle and b) the source map for the extension bundle, which likely has inline sources, i.e. the entire contents of the repository plus dependencies in strings in a JSON file. I wonder if instead of in Postgres, we should just save these in files on disk... I wonder though why the |
Yes, we can delete old versions.
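A cleanup along those lines could look roughly like the sketch below. It is only an illustration under stated assumptions: the `releases` table, its columns, and the sample rows are hypothetical and do not reflect the actual Sourcegraph schema, and it runs against an in-memory SQLite database rather than the production Postgres instance. The idea is to keep only the newest release per extension and delete the rest.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE releases (
    id INTEGER PRIMARY KEY,
    extension_id INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    bundle TEXT
);
INSERT INTO releases (extension_id, created_at, bundle) VALUES
    (1, '2020-01-01', 'a1'),
    (1, '2020-02-01', 'a2'),
    (2, '2020-01-15', 'b1'),
    (2, '2020-03-01', 'b2'),
    (2, '2020-02-10', 'b-mid');
""")


def delete_old_releases(conn):
    """Delete every release that is not the newest one for its extension.

    Returns the number of deleted rows. If two releases of one extension
    share the same created_at, both are kept.
    """
    cur = conn.execute("""
        DELETE FROM releases
        WHERE id NOT IN (
            SELECT r.id
            FROM releases r
            WHERE r.created_at = (
                SELECT MAX(created_at)
                FROM releases
                WHERE extension_id = r.extension_id
            )
        )
    """)
    conn.commit()
    return cur.rowcount


deleted = delete_old_releases(conn)
remaining = conn.execute(
    "SELECT extension_id, bundle FROM releases ORDER BY extension_id"
).fetchall()
```

On a real database it would be prudent to run the inner `SELECT` on its own first to confirm which rows survive, and to expect that disk space is only reclaimed after a subsequent vacuum.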
|
This approach is (intentionally) forbidden. You should add this to the frontend dashboard in the generator here: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/monitoring Specifically, you will want to add a hidden row like this: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/monitoring/frontend.go#L250-290 You should define I know the rationale for these constraints is not well documented currently; it's on me to write better docs here, and I am going to very soon. If you have any questions in the meantime, please don't hesitate to ask. |
How has your workflow been for iterating on a dashboard? Code-first seems quite cumbersome. I want to be able to iterate on a dashboard with the Grafana UI, then export that as JSON and check it in to whatever provisioning mechanism we have. What concerns do you have with this approach?
I don't want to define any alerts yet. I just want a dashboard for visualization. How shall we proceed? |
@felixfbecker is this still an issue? Adding to our backlog but feel free to close if it's resolved. |
Yes it is, but it isn't a web task to fix it I believe (unless web owns backend code now as well). |
We can take another look at this next cycle. |
I read through the thread and it sounds like you added a dashboard to track the performance. Do you have a link to that dashboard? I'd be curious to take a look at it. |
Heads up @Joelkw @felixfbecker - the "team/extensibility" label was applied to this issue. |
The graphql?Extensions API is returning a Cache-Control header of |
It also looks like the response payload contains every extension's icon as a base64-encoded string, and the readme of the extension, neither of which are actually shown on most page loads. Perhaps those could be URLs in the response instead of embedded content, and only be loaded (with caching) if the page needs to show them? This would probably decrease the size of the response payload a lot (it's currently over a megabyte for me, by far the largest response payload on a page load). |
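To make the size argument concrete, here is a small sketch of the suggested change: replacing embedded icon and readme content with URLs in each extension record. Everything in it is hypothetical, including the field names (`icon`, `readme`, `iconURL`, `readmeURL`), the URL layout, and the sample data; it is not the actual shape of the registry's GraphQL response.

```python
import base64
import json


def slim_extension(ext, base_url):
    """Return a copy of ext with embedded icon/readme replaced by URLs."""
    slim = {k: v for k, v in ext.items() if k not in ("icon", "readme")}
    slim["iconURL"] = f"{base_url}/{ext['id']}/icon"
    slim["readmeURL"] = f"{base_url}/{ext['id']}/readme"
    return slim


# Hypothetical extension record with a 20 kB icon embedded as base64.
full = {
    "id": "example/extension",
    "title": "Example extension",
    "icon": "data:image/png;base64," + base64.b64encode(b"\x00" * 20000).decode("ascii"),
    "readme": "# Example\n" + "Lorem ipsum dolor sit amet. " * 200,
}
slim = slim_extension(full, "https://registry.example.com/extensions")

full_size = len(json.dumps(full))
slim_size = len(json.dumps(slim))
```

With records like these the slimmed payload is a small fraction of the original, and the icon and readme can then be fetched (and cached by the browser) only on pages that actually render them.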
@tjkandala Tagging you as you recently did some performance work related to this I think? |
Reopening since #24626 said "Partially (but not fully) fixes" |
While we've arrived at a partial fix, the response time has dropped from 900ms to ~70ms and I think it's fair to consider 70ms fast enough :) Closing this issue since it's no longer relevant |
Currently, the extensions registry page usually takes a long time to load, sometimes up to a minute, because of the GraphQL request that queries extensions.
Worse, this request is made not only on the registry page, but also whenever extensions are loaded.
And since private instances query extensions from sourcegraph.com, all private instances are affected too.
The primary task here is a backend issue. It's likely that our DB tables and queries are not optimized (e.g. could it be that we have extensions and extension releases in the same table and run a SELECT DISTINCT on every query?).
Related: #1983
cc @sourcegraph/core-services