A better way to init stats #42160
xuyifangreeneyes added the type/enhancement, sig/planner (SIG: Planner), and component/statistics labels on Mar 13, 2023
ti-chi-bot pushed a commit that referenced this issue on Apr 3, 2023
This was referenced on Apr 7, 2023
ti-chi-bot pushed a commit that referenced this issue on Apr 11, 2023
ti-chi-bot pushed a commit that referenced this issue on Apr 14, 2023
ti-chi-bot pushed a commit that referenced this issue on Apr 19, 2023
This was referenced on Apr 26, 2023
Close this enhancement since it's done.
Enhancement
Currently, when a tidb-server starts, one of the things it does is init stats. It loads stats meta and indexes' Histogram and TopN. (It doesn't load columns' Histogram and TopN, in order to reduce stats memory usage; those are loaded when the optimizer needs them.) If there are many tables and partitions, init stats may take minutes or even an hour, especially when the cluster is under high pressure and the tidb-server restarts. While init stats is running, no table stats can be used by the optimizer. The optimizer uses pseudo stats to generate plans, which are probably inefficient and put more pressure on the cluster. (Besides, the wrong plans produced from pseudo stats may be added to the plan cache, and the optimizer would reuse them even after real stats are loaded, but that is another story.)
We have the on-demand stats loading mechanism, which comes in two kinds: async loading and sync loading (turned on by default in v6.5). However, neither kind helps with this issue. Async loading cannot start until init stats (https://github.com/pingcap/tidb/blob/master/domain/domain.go#L2041-L2083) is finished. Sync loading is not triggered if the table's stats meta doesn't exist in the stats cache (https://github.com/pingcap/tidb/blob/master/statistics/handle/handle_hist.go#L237-L262).
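The two gating conditions can be pictured with a small toy model in Go. This is only a sketch: statsCache, tableStats, asyncLoadCanRun, syncLoadTriggers and statsKind are hypothetical names made up for illustration, not the actual TiDB types or functions behind the links above.

```go
package main

import "fmt"

// tableStats is a hypothetical, simplified stats cache entry (not a TiDB type).
type tableStats struct {
	rowCount int64
}

// statsCache is a hypothetical model of the stats cache on one tidb-server.
type statsCache struct {
	initDone bool                  // set once init stats has finished
	tables   map[int64]*tableStats // table ID -> stats meta
}

// asyncLoadCanRun models the first condition: async loading only starts
// after init stats is finished.
func (c *statsCache) asyncLoadCanRun() bool {
	return c.initDone
}

// syncLoadTriggers models the second condition: sync loading is skipped
// when the table's stats meta is not in the cache yet.
func (c *statsCache) syncLoadTriggers(tableID int64) bool {
	_, ok := c.tables[tableID]
	return ok
}

// statsKind reports which stats the optimizer would plan with in this model.
func (c *statsCache) statsKind(tableID int64) string {
	if _, ok := c.tables[tableID]; !ok {
		return "pseudo"
	}
	return "real"
}

func main() {
	// While init stats is still running, the cache has no entry for table 1.
	c := &statsCache{tables: map[int64]*tableStats{}}
	fmt.Println(c.asyncLoadCanRun(), c.syncLoadTriggers(1), c.statsKind(1))

	// After init stats finishes, the meta is cached and both mechanisms can work.
	c.tables[1] = &tableStats{rowCount: 100}
	c.initDone = true
	fmt.Println(c.asyncLoadCanRun(), c.syncLoadTriggers(1), c.statsKind(1))
}
```

In this model, the longer init stats takes, the longer every query on table 1 is planned with pseudo stats, no matter which on-demand mode is enabled.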
I add time.Sleep(20 * time.Minute) into (*Handle).InitStats to simulate slow init stats. Here is an example to demonstrate the issue:
Then restart the tidb-server.
@winoros comes up with a better way to init stats. When initializing stats, we only read stats meta. Specifically, we only read mysql.stats_meta and mysql.stats_histograms. These two system tables are much smaller than other stats-related system tables such as mysql.stats_buckets and mysql.stats_top_n. There are some details:
When tidb_partition_prune_mode is dynamic (which is the default value), the optimizer only uses global stats and doesn't use partition stats. Currently, parts of partition stats (e.g., indexes' Histogram and TopN) are loaded into memory during init stats; they are unused and take up a lot of memory. If we use the new way to init stats, unused partition stats won't be loaded into memory.
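A minimal sketch of the proposed init in Go may help picture it. It assumes that once the meta is cached, Histogram and TopN data are fetched on demand the first time they are needed. All names here (metaRow, histMetaRow, liteInit, ensureLoaded, fullLoads) are hypothetical and only illustrate the idea; they are not TiDB's actual code.

```go
package main

import "fmt"

// metaRow is a hypothetical stand-in for a row of mysql.stats_meta.
type metaRow struct {
	tableID  int64
	rowCount int64
}

// histMetaRow is a hypothetical stand-in for a row of mysql.stats_histograms.
// It carries only metadata, no buckets or TopN values.
type histMetaRow struct {
	tableID int64
	histID  int64
	ndv     int64
}

// histEntry keeps histogram metadata plus a flag for the heavy data.
type histEntry struct {
	ndv    int64
	loaded bool // true once buckets/TopN have been fetched on demand
}

type tableEntry struct {
	rowCount int64
	hists    map[int64]*histEntry
}

type cache struct {
	tables    map[int64]*tableEntry
	fullLoads int // number of on-demand loads performed so far
}

// liteInit builds the cache from the two small tables only.
func liteInit(metas []metaRow, hists []histMetaRow) *cache {
	c := &cache{tables: make(map[int64]*tableEntry)}
	for _, m := range metas {
		c.tables[m.tableID] = &tableEntry{rowCount: m.rowCount, hists: make(map[int64]*histEntry)}
	}
	for _, h := range hists {
		if t, ok := c.tables[h.tableID]; ok {
			t.hists[h.histID] = &histEntry{ndv: h.ndv}
		}
	}
	return c
}

// ensureLoaded simulates an on-demand load (assumption: a real system would
// read mysql.stats_buckets and mysql.stats_top_n at this point).
func (c *cache) ensureLoaded(tableID, histID int64) bool {
	t, ok := c.tables[tableID]
	if !ok {
		return false // no meta cached, nothing to load
	}
	h, ok := t.hists[histID]
	if !ok {
		return false
	}
	if !h.loaded {
		h.loaded = true
		c.fullLoads++
	}
	return true
}

func main() {
	c := liteInit(
		[]metaRow{{tableID: 1, rowCount: 100}},
		[]histMetaRow{{tableID: 1, histID: 10, ndv: 50}},
	)
	fmt.Println("heavy loads after init:", c.fullLoads)
	fmt.Println(c.ensureLoaded(1, 10), c.fullLoads) // first use triggers one load
	fmt.Println(c.ensureLoaded(1, 10), c.fullLoads) // second use reuses it
}
```

In this sketch, stats that no query ever asks for (such as unused partition stats) never leave the cheap metadata-only state, which is where the memory saving would come from.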