Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dashboard search performance regressed after upgraded to v9 #55332

Closed
yunkaisc opened this issue Sep 17, 2022 · 18 comments
Closed

Dashboard search performance regressed after upgraded to v9 #55332

yunkaisc opened this issue Sep 17, 2022 · 18 comments

Comments

@yunkaisc
Copy link

What happened:
After we upgraded Grafana from v8.5.4 to v9.1.4, we noticed a significant performance regression when searching the dashboard.

Based on the data points from logs,
In v8.5.4, the duration of a dashboard search request is much shorter. The percentiles are based on 2,500+ searches.

  • P50 0.60s
  • P90 0.87s
  • P95 1.02s
  • P99 1.56s

In V9.1.5, the duration of a dashboard search request is noticeably longer. The percentiles are based on 3,000+ searches.

  • P50 0.98s
  • P90 5.05s
  • P95 6.30s
  • P99 8.96s

What you expected to happen:
Expect the dashboard search is as fast as v8.

How to reproduce it (as minimally and precisely as possible):
Noticeably slower than before. Also

Anything else we need to know?:

Environment:

  • Grafana version: v9.1.4
  • Data source type & version: N/A
  • OS Grafana is installed on: k8s
  • User OS & Browser: mac & chrome
  • Grafana plugins: N/A
  • Others:
@torkelo
Copy link
Member

torkelo commented Sep 17, 2022

Try enabling our new search engine

[feature_toggles]
panelTitleSearch = true

@yunkaisc
Copy link
Author

Unfortunately, that is not an option for us. We tried to enable this feature toggle in our staging environment and Grafana just went into Crash loops with this error.

level=error msg="Server shutdown" error="*searchV2.StandardSearchService run error: can't build initial dashboard search index for org 1: can't build dashboard search index for org ID 1: error loading dashboards: context deadline exceeded"

Just to confirm, without enabling the panelTitleSearch toggle, Grafana v9 still uses the old search system?

@ryantxu
Copy link
Member

ryantxu commented Sep 20, 2022

Just to confirm, without enabling the panelTitleSearch toggle, Grafana v9 still uses the old search system?

Correct.

How many dashboards? Do you have many LARGE dashboards (perhaps snapshots?)

@yunkaisc
Copy link
Author

We have 12K dashboards. How do you define LARGE dashboards?

@ArturWierzbicki
Copy link
Contributor

Unfortunately, that is not an option for us. We tried to enable this feature toggle in our staging environment and Grafana just went into Crash loops with this error.

level=error msg="Server shutdown" error="*searchV2.StandardSearchService run error: can't build initial dashboard search index for org 1: can't build dashboard search index for org ID 1: error loading dashboards: context deadline exceeded"

Hey @yunkaisc!

First of all, thank you for your patience and for reporting this issue - we are still working on the new search, and are actively trying to make it better.

We have recently added a few configuration options that might help you with this problem. They are available in both 9.1.7 (https://github.com/grafana/grafana/blob/v9.1.x/conf/defaults.ini#L1289-L1300) and 9.2.0 releases. Would you be willing to try the following config

[search] 
dashboard_loading_batch_size = 50
full_reindex_interval = 15m

and letting us know if this issue still persists?

@gregvolk
Copy link

I have taken a similar search performance hit after upgrading from 8.5.3 to 9.2.4. I have 19,112 dashboards.
I also hit the "can't build dashboard search index for org ID 1: error loading dashboards: context deadline exceeded" problem too. Fortunately I am able to get past this error by launching the grafana-server process again but I'm concerned that one of these times it will keep erroring out and I'll be unable to get things going.

I turned on panelTitleSearch feature toggle but it doesn't seem to have helped the performance.

If I set full_reindex_interval=15m does that extend the context deadline mentioned above to 15 minutes?

@frankyi-gh
Copy link

I believe I'm also seeing this issue upgrading from 8.3.5 to 9.2.4. We have 15k dashboards and are using a Postgres DB. We saw large increases in Postgres CPU usage. I looked at the slow DB queries and queries like the one below were taking up like 98% of DB CPU time

SELECT
  dashboard.id,
  dashboard.uid,
  dashboard.title,
  dashboard.slug,
  dashboard_tag.term,
  dashboard.is_folder,
  dashboard.folder_id,
  folder.uid AS folder_uid,
  folder.slug AS folder_slug,
  folder.title AS folder_title 
FROM 
  ( SELECT dashboard.id FROM dashboard WHERE ((( dashboard.uid IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21,$22,$23,$24,$25,$26,$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$39,$40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$50,$51,$52,$53,$54,$55,$56,$57,$58,$59,$60,$61,$62,$63,$64,$65,$66,$67,$68,$69,$70,$71,$72,$73,$74,$75,$76,$77,$78,$79,$80,$81,$82,$83,$84,$85,$86,$87,$88,$89,$90,$91,$92,$93,$94,$95,$96,$97,$98,$99,$100,$101,$102,$103,$104,$105,$106,$107,$108,$109,$110,$111,$112,$113,$114,$115,$116,$117,$118,$119,$120,$121,$122,$123,$124,$125,$126,$127,$128,$129,$130,$131,$132,$133,$134,$135,$136,$137,$138,$139,$140,$141,$142,$143,$144,$145,$146,$147,$148,$149,$150,$151,$152,$153,$154,$155,$156,$157,$158,$159,$160,$161,$162,$163,$164,$165,$166,$167,$168,$169,$170,$171,$172,$173,$174,$175,$176,$177,$178,$179,$180,$181,$182,$183,$184,$185,$186,$187,$188,$189,$190,$191,$192,$193,$194,$195,$196,$197,$198,$199,$200,$201,$202,$203,$204,$205,$206,$207,$208,$209,$210,$211,$212,$213,$214,$215,$216,$217,$218,$219,$220,$221,$222,$223,$224,$225,$226,$227,$228,$229,$230,$231,$232,$233,$234,$235,$236,$237,$238,$239,$240,$241,$242,$243,$244,$245,$246,$247,$248,$249,$250,$251,$252,$253,$254,$255,$256,$257,$258,$259,$260,$261,$262,$263,$264,$265,$266,$267,$268,$269,$270,$271,$272,$273,$274,$275,$276,$277,$278,$279,$280,$281,$282,$283,$284,$285,$286) OR dashboard.folder_id IN(SELECT id FROM dashboard WHERE  dashboard.uid IN ($287,$288,$289,$290,$291,$292,$293,$294,$295,$296,$297,$298,$299,$300,$301,$302,$303,$304,$305,$306,$307,$308,$309,$310,$311,$312,$313,$314,$315,$316,$317,$318,$319,$320,$321,$322,$323,$324,$325,$326,$327,$328,$329,$330,$331,$332,$333,$334,$335,$336,$337,$338,$339,$340,$341,$342,$343,$344,$345,$346,$347,$348,$349,$350,$351,$352,$353,$354,$355,$356,$357,$358,$359,$360,$361,$362,$363,$364,$365,$366,$367,$368,$369,$370,$371,$372,$373,$374,$375,$376,$377,$378,$379,$380,$381,$382,$383,$384,$385,$386,$387,$388,$389,$390,$391,$392,$393,$394,$395,$396,$397,$398,$399,$400,$401,$402,$403,$404,$405,$406,$407,$408,$409,$410,$411,$412,$413,$414,$415,$416,$417,$418,$419,$420,$421,$422,$423,$424,$425,$426,$427,$428,$429,$430,$431,$432,$433,$434,$435,$436,$437,$438,$439,$440,$441,$442,$443,$444,$445,$446,$447,$448,$449,$450,$451,$452,$453,$454,$455,$456,$457,$458,$459,$460,$461,$462,$463,$464,$465,$466,$467,$468,$469,$470,$471,$472,$473,$474,$475,$476,$477,$478,$479,$480,$481,$482,$483,$484,$485,$486,$487,$488,$489,$490,$491,$492,$493,$494,$495,$496,$497,$498,$499,$500,$501,$502,$503,$504,$505,$506,$507,$508,$509,$510,$511,$512,$513,$514,$515,$516,$517,$518,$519,$520,$521,$522,$523,$524,$525,$526,$527,$528,$529,$530,$531,$532,$533,$534,$535,$536,$537,$538,$539,$540,$541,$542,$543,$544,$545,$546,$547,$548,$549,$550,$551,$552,$553,$554,$555,$556,$557,$558,$559,$560,$561,$562,$563,$564,$565,$566,$567,$568,$569,$570,$571,$572,$573,$574,$575,$576,$577,$578,$579,$580,$581,$582,$583,$584,$585,$586,$587,$588,$589,$590,$591,$592,$593,$594,$595,$596,$597,$598,$599,$600,$601,$602,$603,$604,$605,$606,$607,$608,$609,$610,$611,$612,$613,$614,$615,$616,$617,$618,$619,$620,$621,$622,$623,$624,$625,$626,$627,$628,$629,$630,$631,$632,$633,$634,$635,$636,$637,$638,$639,$640,$641,$642,$643,$644,$645,$646,$647,$648,$649,$650,$651,$652,$653,$654,$655,$656,$657,$658,$659,$660,$661,$662,$663,$664))) AND NOT dashboard.is_folder) OR ( dashboard.uid IN ($665,$666,$667,$668,$669,$670,$671,$672,$673,$674,$675,$676,$677,$678,$679,$680,$681,$682,$683,$684,$685,$686,$687,$688,$689,$690,$691,$692,$693,$694,$695,$696,$697,$698,$699,$700,$701,$702,$703,$704,$705,$706,$707,$708,$709,$710,$711,$712,$713,$714,$715,$716,$717,$718,$719,$720,$721,$722,$723,$724,$725,$726,$727,$728,$729,$730,$731,$732,$733,$734,$735,$736,$737,$738,$739,$740,$741,$742,$743,$744,$745,$746,$747,$748,$749,$750,$751,$752,$753,$754,$755,$756,$757,$758,$759,$760,$761,$762,$763,$764,$765,$766,$767,$768,$769,$770,$771,$772,$773,$774,$775,$776,$777,$778,$779,$780,$781,$782,$783,$784,$785,$786,$787,$788,$789,$790,$791,$792,$793,$794,$795,$796,$797,$798,$799,$800,$801,$802,$803,$804,$805,$806,$807,$808,$809,$810,$811,$812,$813,$814,$815,$816,$817,$818,$819,$820,$821,$822,$823,$824,$825,$826,$827,$828,$829,$830,$831,$832,$833,$834,$835,$836,$837,$838,$839,$840,$841,$842,$843,$844,$845,$846,$847,$848,$849,$850,$851,$852,$853,$854,$855,$856,$857,$858,$859,$860,$861,$862,$863,$864,$865,$866,$867,$868,$869,$870,$871,$872,$873,$874,$875,$876,$877,$878,$879,$880,$881,$882,$883,$884,$885,$886,$887,$888,$889,$890,$891,$892,$893,$894,$895,$896,$897,$898,$899,$900,$901,$902,$903,$904,$905,$906,$907,$908,$909,$910,$911,$912,$913,$914,$915,$916,$917,$918,$919,$920,$921,$922,$923,$924,$925,$926,$927,$928,$929,$930,$931,$932,$933,$934,$935,$936,$937,$938,$939,$940,$941,$942,$943,$944,$945,$946,$947,$948,$949,$950,$951,$952,$953,$954,$955,$956,$957,$958,$959,$960,$961,$962,$963,$964,$965,$966,$967,$968,$969,$970,$971,$972,$973,$974,$975,$976,$977,$978,$979,$980,$981,$982,$983,$984,$985,$986,$987,$988,
  $989,$990,$991,$992,$993,$994,$995,$996,$997,$998,$999,$1000,$1001,$1002,$1003,$1004,$1005,$1006,$1007,$1008,$1009,$1010,$1011,$1012,$1013,$1014,$1015,$1016,$1017,$1018,$1019,$1020,$
  1021,$1022,$1023,$1024,$1025,$1026,$1027,$1028,$1029,$1030,$1031,$1032,$1033,$1034,$1035,$1036,$1037,$1038,$1039,$1040,$1041,$1042,$1043) AND dashboard.is_folder)) AND dashboard.org_
  id=$1044 AND dashboard.is_folder = false AND dashboard.folder_id IN ($1045) ORDER BY dashboard.title ASC NULLS FIRST LIMIT 1000 OFFSET 0) AS ids
INNER JOIN dashboard ON ids.id = dashboard.id
LEFT OUTER JOIN dashboard AS folder ON folder.id = dashboard.folder_id
LEFT OUTER JOIN dashboard_tag ON dashboard.id = dashboard_tag.dashboard_id
ORDER BY dashboard.title ASC NULLS FIRST

@frankyi-gh
Copy link

scaling up Postgres 4x improved our search performance 4x

@gregvolk
Copy link

scaling up Postgres 4x improved our search performance 4x

@frankyi-gh when you say "scaling up Postgres 4x" do you mean 4x CPU, 4x memory, or something else? If 4x CPU, did you go from 2 CPUs to 8 CPUs? I'm just trying to get a feel for magnitude of your resource increase.

@frankyi-gh
Copy link

@gregvolk

4x CPU and 4x memory. Specifically, I went from an AWS db.r5.xlarge to db.r5.4xlarge

@frankyi-gh
Copy link

frankyi-gh commented Nov 22, 2022

Turning on panelTitleSearch with the default search settings seems to have actually increased DB load, which made general performance a little bit worse. It didn't do much for search times.

@gregvolk
Copy link

Thank you, I appreciate the tuning details.

I will say that I really do like the new panelTitleSearch feature as I have automation in place that stores a wealth of information (circuit ids, physical node names, etc.) in panel titles. It's really nice to be able to use Grafana's search to dig through that data.

@frankyi-gh
Copy link

I've been told that this issue will be fixed by #56813 in 9.3

@frankyi-gh
Copy link

9.3 with panelTitleSearch enabled doesn't seem any better...

@afeshti
Copy link

afeshti commented Dec 20, 2022

Hello, updating from grafana-8.5.5-1 to grafana-9.3.2-1 brought me to this case. We also are facing the same issue but what can I say is that the problem of slow dashboard lists impact only non-admin users.
We run mariadb-10.6.11 as grafana database and the new search engine nor the:

[search] 
dashboard_loading_batch_size = 50
full_reindex_interval = 15m

is helping me at the moment. The mariadb slow query log pointed us to the following query:

# Thread_id: 2136  Schema: grafana93  QC_hit: No
# Query_time: 23.690444  Lock_time: 0.000089  Rows_sent: 987  Rows_examined: 1222783
# Rows_affected: 0  Bytes_sent: 0
SET timestamp=1671524253;
SELECT `uid` FROM `dashboard` WHERE (((dashboard.uid IN (SELECT substr(scope, 16) FROM permission WHERE action IN ('dashboards:read') AND scope LIKE 'dashboards:uid:%' AND role_id IN(SELECT distinct id FROM role INNER JOIN (
                        SELECT ur.role_id
                        FROM user_role AS ur
                        WHERE ur.user_id = 175
                        AND (ur.org_id = 1 OR ur.org_id = 0)
                UNION
                        SELECT br.role_id FROM builtin_role AS br
                        WHERE br.role IN ('Editor')
                        AND (br.org_id = 1 OR br.org_id = 0)
                ) as all_role ON role.id = all_role.role_id) GROUP BY role_id, scope HAVING COUNT(action) = 1) AND NOT dashboard.is_folder) OR (dashboard.folder_id IN (SELECT id FROM dashboard as d WHERE d.uid IN (SELECT substr(scope, 13) FROM permission WHERE action IN ('dashboards:read') AND scope LIKE 'folders:uid:%' AND role_id IN(SELECT distinct id FROM role INNER JOIN (
                        SELECT ur.role_id
                        FROM user_role AS ur
                        WHERE ur.user_id = 175
                        AND (ur.org_id = 1 OR ur.org_id = 0)
                UNION
                        SELECT br.role_id FROM builtin_role AS br
                        WHERE br.role IN ('Editor')
                        AND (br.org_id = 1 OR br.org_id = 0)
                ) as all_role ON role.id = all_role.role_id) GROUP BY role_id, scope HAVING COUNT(action) = 1)) AND NOT dashboard.is_folder) OR (dashboard.uid IN (SELECT substr(scope, 13) FROM permission WHERE action IN ('folders:read') AND scope LIKE 'folders:uid:%' AND role_id IN(SELECT distinct id FROM role INNER JOIN (
                        SELECT ur.role_id
                        FROM user_role AS ur
                        WHERE ur.user_id = 175
                        AND (ur.org_id = 1 OR ur.org_id = 0)
                UNION
                        SELECT br.role_id FROM builtin_role AS br
                        WHERE br.role IN ('Editor')
                        AND (br.org_id = 1 OR br.org_id = 0)
                ) as all_role ON role.id = all_role.role_id) GROUP BY role_id, scope HAVING COUNT(action) = 1) AND dashboard.is_folder))) AND (org_id = 1);

If in that query we remove the:

GROUP BY role_id, scope HAVING COUNT(action) = 1

the query run super fast again.

@IevaVasiljeva
Copy link
Contributor

Hey all! We've worked on performance improvements for both the old and the new search (PR for the new search: #60729; PR for the old search: #60582). These will be released with Grafana v9.4. Let us know if that helps!

@afeshti
Copy link

afeshti commented Mar 2, 2023

Hello, updating from grafana-8.5.5-1 to grafana-9.3.2-1 brought me to this case. We also are facing the same issue but what can I say is that the problem of slow dashboard lists impact only non-admin users. We run mariadb-10.6.11 as grafana database and the new search engine nor the:

[search] 
dashboard_loading_batch_size = 50
full_reindex_interval = 15m

is helping me at the moment. The mariadb slow query log pointed us to the following query:

# Thread_id: 2136  Schema: grafana93  QC_hit: No
# Query_time: 23.690444  Lock_time: 0.000089  Rows_sent: 987  Rows_examined: 1222783
# Rows_affected: 0  Bytes_sent: 0
SET timestamp=1671524253;
SELECT `uid` FROM `dashboard` WHERE (((dashboard.uid IN (SELECT substr(scope, 16) FROM permission WHERE action IN ('dashboards:read') AND scope LIKE 'dashboards:uid:%' AND role_id IN(SELECT distinct id FROM role INNER JOIN (
                        SELECT ur.role_id
                        FROM user_role AS ur
                        WHERE ur.user_id = 175
                        AND (ur.org_id = 1 OR ur.org_id = 0)
                UNION
                        SELECT br.role_id FROM builtin_role AS br
                        WHERE br.role IN ('Editor')
                        AND (br.org_id = 1 OR br.org_id = 0)
                ) as all_role ON role.id = all_role.role_id) GROUP BY role_id, scope HAVING COUNT(action) = 1) AND NOT dashboard.is_folder) OR (dashboard.folder_id IN (SELECT id FROM dashboard as d WHERE d.uid IN (SELECT substr(scope, 13) FROM permission WHERE action IN ('dashboards:read') AND scope LIKE 'folders:uid:%' AND role_id IN(SELECT distinct id FROM role INNER JOIN (
                        SELECT ur.role_id
                        FROM user_role AS ur
                        WHERE ur.user_id = 175
                        AND (ur.org_id = 1 OR ur.org_id = 0)
                UNION
                        SELECT br.role_id FROM builtin_role AS br
                        WHERE br.role IN ('Editor')
                        AND (br.org_id = 1 OR br.org_id = 0)
                ) as all_role ON role.id = all_role.role_id) GROUP BY role_id, scope HAVING COUNT(action) = 1)) AND NOT dashboard.is_folder) OR (dashboard.uid IN (SELECT substr(scope, 13) FROM permission WHERE action IN ('folders:read') AND scope LIKE 'folders:uid:%' AND role_id IN(SELECT distinct id FROM role INNER JOIN (
                        SELECT ur.role_id
                        FROM user_role AS ur
                        WHERE ur.user_id = 175
                        AND (ur.org_id = 1 OR ur.org_id = 0)
                UNION
                        SELECT br.role_id FROM builtin_role AS br
                        WHERE br.role IN ('Editor')
                        AND (br.org_id = 1 OR br.org_id = 0)
                ) as all_role ON role.id = all_role.role_id) GROUP BY role_id, scope HAVING COUNT(action) = 1) AND dashboard.is_folder))) AND (org_id = 1);

If in that query we remove the:

GROUP BY role_id, scope HAVING COUNT(action) = 1

the query run super fast again.

Hello! Tested now with 9.4.2 and the dashboard list is OK, run very well. Thanks

@IevaVasiljeva
Copy link
Contributor

Thanks for confirming that the fix has improved the search performance @afeshti!

I'll close this issue now 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants