Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL 如何估算 LIKE 的 return rows 數量 #2

Open
rueian opened this issue Jan 1, 2020 · 0 comments
Open

PostgreSQL 如何估算 LIKE 的 return rows 數量 #2

rueian opened this issue Jan 1, 2020 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@rueian
Copy link
Owner

rueian commented Jan 1, 2020

使用 LIKE 前,先看看 like_support.c

7e8a47dfc4d554b6f15d329bcbb69029_XL

接續在前篇文章中提過底層 scan node 的 row estimation 結果以及上層其他 plan node 操作會綜合影響資料庫最後使用哪一個 plan,所以 scan node 的 row estimation 相當重要。

PostgreSQL 如何估算 HashAggregate 的 Return Rows ,以及低估的後果
其中一個很重要的步驟就是預估 Return Rows 的數量,它被用來:

  1. 給底層的 Scan Node 比較各種 Access Method (Seq Scan, Index Scan, Bitmap Index Scan + Bitmap Heap Scan, …) 的 Cost
  2. 往往更重要的是 Return Rows 的估算對於上層其他 Plan Node 的 Cost 有巨大影響,例如 Nested LoopHash Join,因為預估它們就是會從底下的 Scan Node 拿出這麼多的 rows 數量來處理

這次我們遇到的案例則是 LIKE 的 row estimation 與實際差很多,想要一探究竟


案例

我們有一張表經常用 LIKE 進行 prefix pattern matching 查詢,表的結構如下:

CREATE TABLE IF NOT EXISTS counters (
  id TEXT PRIMARY KEY COLLATE "C",
  value BIGINT
);

注意 我們在 id 上面設定 COLLATE "C" 是為了要讓 PostgreSQL 能直接使用 PRIMARY KEY 的 B-tree 索引就能縮小 prefix pattern matching 的範圍,否則得額外建立一個 text_pattern_ops 的 B-tree 索引,參考: 11.10. Operator Classes and Operator Families

實際的 Query Plan:

postgres=# explain analyze select * from counters where id like 'ms:149895:%';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=488 width=36) (actual time=0.014..0.022 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.148 ms
 Execution Time: 0.034 ms
(5 rows)

可以看到原本預估回傳的是 488 個 rows,但是實際上只有 10 個,
雖然這次我們沒有把這個結果再拿去做其他處理,資料庫他依然如預期中將 prefix 改成了 range query 去用了 B-tree index scan,但又是為什麼會高估了快 48 倍呢?


讀原始碼

我們可以從 pg_operator 表中知道各個 operator 對應的 selectivity function,例如:

postgres=# select oid, oprname, oprcode, oprrest from pg_operator where oprname = '~~';
 oid  | oprname |  oprcode   | oprrest
------+---------+------------+---------
 1207 | ~~      | namelike   | likesel
 1209 | ~~      | textlike   | likesel
 1211 | ~~      | bpcharlike | likesel
 2016 | ~~      | bytealike  | likesel
(4 rows)

根據 oprrest 我們可以知道 LIKE (~~) 所使用的 selectivity function 就是 likesel, 我們可以進一步在 pg_proc 裡面找到它的訊息:

postgres=# select oid, proname, prosrc from pg_proc where oid = (select distinct oprrest from pg_operator where oprname = '~~');
oid  | proname | prosrc
------+---------+---------
 1819 | likesel | likesel
(1 row)

根據 prosrc 剛好它的 c function 名稱就是 likesel

其實 pg_operatorpg_proc 這兩張表的初始資料在 source code 中的 src/include/catalog/ 也找得到,詳細:Chapter 69. System Catalog Declarations and Initial Contents

上次提到 PostgreSQL 大部分 selectivity function 都可以在 selfuncs.c 裡面找到,
不過跟 like 相關的是其中一個例外,它們在 2019.2.14 49fa99e 被重構到了 like_support.c 裡面了

這次我們要找的就是其中的 likesel() 中使用到的 patternsel() 中的 patternsel_common()

patternsel_common() 主要基於欄位的 histogram 統計資料,具體流程:

  1. 先檢查我們的 pattern 是不是 exact match ,若是則轉用 = 的 selectivity function
    if (pstatus == Pattern_Prefix_Exact)
    {
    /*
    * Pattern specifies an exact match, so estimate as for '='
    */
    result = var_eq_const(&vardata, eqopr, prefix->constvalue,
    false, true, false);
    }
  2. 若我們超過(含) 100 個 histogram buckets,我們則直接將 pattern 拿去匹配去掉頭尾之後的每個 histogram boundary 當作 selectivity
    selec = histogram_selectivity(&vardata, &opproc, constval, true,
    10, 1, &hist_size);
    /* If not at least 100 entries, use the heuristic method */
    if (hist_size < 100)
    {
    Selectivity heursel;
    Selectivity prefixsel;

    double
    histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
    Datum constval, bool varonleft,
    int min_hist_size, int n_skip,
    int *hist_size)
    {
    double result;
    AttStatsSlot sslot;
    /* check sanity of parameters */
    Assert(n_skip >= 0);
    Assert(min_hist_size > 2 * n_skip);
    if (HeapTupleIsValid(vardata->statsTuple) &&
    statistic_proc_security_check(vardata, opproc->fn_oid) &&
    get_attstatsslot(&sslot, vardata->statsTuple,
    STATISTIC_KIND_HISTOGRAM, InvalidOid,
    ATTSTATSSLOT_VALUES))
    {
    *hist_size = sslot.nvalues;
    if (sslot.nvalues >= min_hist_size)
    {
    int nmatch = 0;
    int i;
    for (i = n_skip; i < sslot.nvalues - n_skip; i++)
    {
    if (varonleft ?
    DatumGetBool(FunctionCall2Coll(opproc,
    sslot.stacoll,
    sslot.values[i],
    constval)) :
    DatumGetBool(FunctionCall2Coll(opproc,
    sslot.stacoll,
    constval,
    sslot.values[i])))
    nmatch++;
    }
    result = ((double) nmatch) / ((double) (sslot.nvalues - 2 * n_skip));
    }
    else
    result = -1;
    free_attstatsslot(&sslot);
    }
    else
    {
    *hist_size = 0;
    result = -1;
    }
    return result;
    }
  3. 若 histogram buckets 不足 100,那我們將 pattern 拆成 prefix 與 非 prefix 兩部份分開計算 selectivity 再相乘合併
  • prefix selectivity 計算方式如同 range selectivity,估出 prefix 與 greater prefix 分別(例如 ms:149895:ms:149895;) 在兩個 histogram bucket 的大概位置然後再用兩個位置夾住的範圍占比當作 selectivity
    if (hist_size < 100)
    {
    Selectivity heursel;
    Selectivity prefixsel;
    if (pstatus == Pattern_Prefix_Partial)
    prefixsel = prefix_selectivity(root, &vardata,
    eqopr, ltopr, geopr,
    prefix);

    /*
    * Estimate the selectivity of a fixed prefix for a pattern match.
    *
    * A fixed prefix "foo" is estimated as the selectivity of the expression
    * "variable >= 'foo' AND variable < 'fop'".
    *
    * The selectivity estimate is with respect to the portion of the column
    * population represented by the histogram --- the caller must fold this
    * together with info about MCVs and NULLs.
    *
    * We use the specified btree comparison operators to do the estimation.
    * The given variable and Const must be of the associated datatype(s).
    *
    * XXX Note: we make use of the upper bound to estimate operator selectivity
    * even if the locale is such that we cannot rely on the upper-bound string.
    * The selectivity only needs to be approximately right anyway, so it seems
    * more useful to use the upper-bound code than not.
    */
    static Selectivity
    prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
    Oid eqopr, Oid ltopr, Oid geopr,
    Const *prefixcon)
    {
    Selectivity prefixsel;
    FmgrInfo opproc;
    AttStatsSlot sslot;
    Const *greaterstrcon;
    Selectivity eq_sel;
    /* Estimate the selectivity of "x >= prefix" */
    fmgr_info(get_opcode(geopr), &opproc);
    prefixsel = ineq_histogram_selectivity(root, vardata,
    &opproc, true, true,
    prefixcon->constvalue,
    prefixcon->consttype);
    if (prefixsel < 0.0)
    {
    /* No histogram is present ... return a suitable default estimate */
    return DEFAULT_MATCH_SEL;
    }
    /*-------
    * If we can create a string larger than the prefix, say
    * "x < greaterstr". We try to generate the string referencing the
    * collation of the var's statistics, but if that's not available,
    * use DEFAULT_COLLATION_OID.
    *-------
    */
    if (HeapTupleIsValid(vardata->statsTuple) &&
    get_attstatsslot(&sslot, vardata->statsTuple,
    STATISTIC_KIND_HISTOGRAM, InvalidOid, 0))
    /* sslot.stacoll is set up */ ;
    else
    sslot.stacoll = DEFAULT_COLLATION_OID;
    fmgr_info(get_opcode(ltopr), &opproc);
    greaterstrcon = make_greater_string(prefixcon, &opproc, sslot.stacoll);
    if (greaterstrcon)
    {
    Selectivity topsel;
    topsel = ineq_histogram_selectivity(root, vardata,
    &opproc, false, false,
    greaterstrcon->constvalue,
    greaterstrcon->consttype);
    /* ineq_histogram_selectivity worked before, it shouldn't fail now */
    Assert(topsel >= 0.0);
    /*
    * Merge the two selectivities in the same way as for a range query
    * (see clauselist_selectivity()). Note that we don't need to worry
    * about double-exclusion of nulls, since ineq_histogram_selectivity
    * doesn't count those anyway.
    */
    prefixsel = topsel + prefixsel - 1.0;
    }
    /*
    * If the prefix is long then the two bounding values might be too close
    * together for the histogram to distinguish them usefully, resulting in a
    * zero estimate (plus or minus roundoff error). To avoid returning a
    * ridiculously small estimate, compute the estimated selectivity for
    * "variable = 'foo'", and clamp to that. (Obviously, the resultant
    * estimate should be at least that.)
    *
    * We apply this even if we couldn't make a greater string. That case
    * suggests that the prefix is near the maximum possible, and thus
    * probably off the end of the histogram, and thus we probably got a very
    * small estimate from the >= condition; so we still need to clamp.
    */
    eq_sel = var_eq_const(vardata, eqopr, prefixcon->constvalue,
    false, true, false);
    prefixsel = Max(prefixsel, eq_sel);
    return prefixsel;
    }
  • 非 prefix 部分的 selectivity 則是用一些 magic number 猜測,例如多一個 % 的話 selectivity 就 * 5,多個 char selectivity 就 * 0.2
    /*
    * Pull out any fixed prefix implied by the pattern, and estimate the
    * fractional selectivity of the remainder of the pattern. Unlike many
    * other selectivity estimators, we use the pattern operator's actual
    * collation for this step. This is not because we expect the collation
    * to make a big difference in the selectivity estimate (it seldom would),
    * but because we want to be sure we cache compiled regexps under the
    * right cache key, so that they can be re-used at runtime.
    */
    patt = (Const *) other;
    pstatus = pattern_fixed_prefix(patt, ptype, collation,
    &prefix, &rest_selec);

    非 prefix 的部分是在 pattern_fixed_prefix() 時就取出儲存在 rest_selec,實際計算是在 like_selectivity()
    /*
    * Estimate the selectivity of a pattern of the specified type.
    * Note that any fixed prefix of the pattern will have been removed already,
    * so actually we may be looking at just a fragment of the pattern.
    *
    * For now, we use a very simplistic approach: fixed characters reduce the
    * selectivity a good deal, character ranges reduce it a little,
    * wildcards (such as % for LIKE or .* for regex) increase it.
    */
    #define FIXED_CHAR_SEL 0.20 /* about 1/5 */
    #define CHAR_RANGE_SEL 0.25
    #define ANY_CHAR_SEL 0.9 /* not 1, since it won't match end-of-string */
    #define FULL_WILDCARD_SEL 5.0
    #define PARTIAL_WILDCARD_SEL 2.0
    static Selectivity
    like_selectivity(const char *patt, int pattlen, bool case_insensitive)
    {
    Selectivity sel = 1.0;
    int pos;
    /* Skip any leading wildcard; it's already factored into initial sel */
    for (pos = 0; pos < pattlen; pos++)
    {
    if (patt[pos] != '%' && patt[pos] != '_')
    break;
    }
    for (; pos < pattlen; pos++)
    {
    /* % and _ are wildcard characters in LIKE */
    if (patt[pos] == '%')
    sel *= FULL_WILDCARD_SEL;
    else if (patt[pos] == '_')
    sel *= ANY_CHAR_SEL;
    else if (patt[pos] == '\\')
    {
    /* Backslash quotes the next character */
    pos++;
    if (pos >= pattlen)
    break;
    sel *= FIXED_CHAR_SEL;
    }
    else
    sel *= FIXED_CHAR_SEL;
    }
    /* Could get sel > 1 if multiple wildcards */
    if (sel > 1.0)
    sel = 1.0;
    return sel;
    }
  1. 若發現前面算出來的 selectivity 太大或太小,都捨棄,改成 0.0001 或 0.9999

    /* In any case, don't believe extremely small or large estimates. */
    if (selec < 0.0001)
    selec = 0.0001;
    else if (selec > 0.9999)
    selec = 0.9999;

  2. 最後再加上 mcv 統計與 null fraction 修正

    /*
    * If we have most-common-values info, add up the fractions of the MCV
    * entries that satisfy MCV OP PATTERN. These fractions contribute
    * directly to the result selectivity. Also add up the total fraction
    * represented by MCV entries.
    */
    mcv_selec = mcv_selectivity(&vardata, &opproc, constval, true,
    &sumcommon);
    /*
    * Now merge the results from the MCV and histogram calculations,
    * realizing that the histogram covers only the non-null values that
    * are not listed in MCV.
    */
    selec *= 1.0 - nullfrac - sumcommon;
    selec += mcv_selec;
    result = selec;

流程中可以看到 LIKE 的 selectivity 的計算主要基於欄位的 histogram 統計資料,再加上一些 mcv 與一些 magic number。

關於 histogram 與 mcv 可以參考 70.1. Row Estimation Examples 它們也是主要用來做 >, <, = 的統計資訊。

那我們的案例中預估 488 rows 是怎麼來的呢?

首先關於 histogram buckets 數量,這個是可以透過 ALTER TABLE SET STATISTICS 為個別欄位設定,而我們是使用預設的 default_statistics_target=100
14.2. Statistics Used by the Planner

而我們的 histogram 確實也有 100 個 buckets (101 個 boundary)

postgres=# SELECT array_length(histogram_bounds, 1) FROM pg_stats WHERE tablename='counters' AND attname='id';
 array_length
--------------
          101
(1 row)

也就是我們的案例是直接用 pattern 去匹配去掉頭尾的 101-2 = 99 個 boundary 當作 selectivity

而實際上我們的 boundary 大約是長這樣

postgres=# SELECT unnest(histogram_bounds::text::text[]) FROM pg_stats WHERE tablename='counters' AND attname='id';
                            unnest
--------------------------------------------------------------
 ms:103734:p
 ms:1074631:cl
 ms:1103301:c
 ms:1137423:pn
 ms:1168158:pr
 ms:1192862:p
 ms:1221019:cl
 ms:1243628:c
 ...
(101 rows)

我們的案例拿 ms:149895:% 去用這個 histogram_bounds 算 selectivity 出來會是 0,
然後經過上述流程第四步後會被改成 0.0001


驗證

為了驗證,我們看一下整張表的預估是多少 rows

postgres=# EXPLAIN SELECT * FROM counters;
                            QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on counters  (cost=0.00..89169.11 rows=4880311 width=36)
(1 row)

是 4880311,LIKE 的估算確實是它的萬分之一

若我們將 0.0001 改為 0.00001,然後重新編譯後再跑一次 EXPLAIN

                 /* In any case, don't believe extremely small or large estimates. */
		if (selec < 0.00001)
			selec = 0.00001;
		else if (selec > 0.9999)
			selec = 0.9999;
postgres=# explain analyze select * from counters where id like 'ms:149895:%';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=49 width=36) (actual time=0.015..0.023 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.146 ms
 Execution Time: 0.036 ms
(5 rows)

可以看到確實變成 49


解決方案

雖然在我們這次的案例中 row estimations 對於 scan table 的方式沒有影響,畢竟有適合的 index 可以用,就算是估萬分之一也不至於會到使用 full table scan,除非 random_page_cost 真的設到超級高

但若我們有再把這個結果集再拿去查詢其他 table 的話,這個萬分之一的估計可能就會導致掃描其他 table 的方式改變了。

在一開始的 EXPLAIN 中可以看到 PostgreSQL 幫我們 Query 改寫成 range query + like filter

postgres=# explain analyze select * from counters where id like 'ms:149895:%';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=488 width=36) (actual time=0.014..0.022 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.148 ms
 Execution Time: 0.034 ms
(5 rows)

不過實際上目前 PostgreSQL 是在估算完 return rows 之後才進行改寫,若我們真要讓他估算更準確的話,可以預先就寫成 range query + like filter:

postgres=# explain analyze select * from counters where id >= 'ms:149895:' and id <= 'ms:149895;' and id like 'ms:149895:%';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=1 width=36) (actual time=0.023..0.039 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id <= 'ms:149895;'::text) AND (id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.156 ms
 Execution Time: 0.036 ms
(5 rows)

總結

LIKE 的估算最少會是整張表萬分之一先估算再改寫 Query 只是目前 PostgreSQL 的實作細節,或許哪一天就改掉了。

不過目前來說若萬分之一的估算是嚴重高估,又直接把 LIKE 的結果再拿去查其他表的話,是有可能會因此挑選到比較差的 Query Plan,最好還是先 EXPLAIN 看一下是否有需要改寫 Query。


連結

前篇:PostgreSQL 如何估算 HashAggregate 的 Return Rows ,以及低估的後果

關於 histogram 與 mcv 統計在 PostgreSQL 手冊中的第 70.1 章有詳細說明
70.1. Row Estimation Examples

關於 default_statistics_target 除了在 PostgreSQL 手冊中的第 14.2 章有詳細說明 14.2. Statistics Used by the Planner 之外 cybertec 也有一篇 changing-histogram-sizes

另外這次我們看到的 LIKE 的 prefix pattern matching 被改寫成 range query + like filter 其實已經被重構到了 PostgreSQL 12 的新功能 SupportRequestIndexCondition 之下

else if (IsA(rawreq, SupportRequestIndexCondition))
{
/* Try to convert operator/function call to index conditions */
SupportRequestIndexCondition *req = (SupportRequestIndexCondition *) rawreq;
/*
* Currently we have no "reverse" match operators with the pattern on
* the left, so we only need consider cases with the indexkey on the
* left.
*/
if (req->indexarg != 0)
return NULL;
if (is_opclause(req->node))
{
OpExpr *clause = (OpExpr *) req->node;
Assert(list_length(clause->args) == 2);
ret = (Node *)
match_pattern_prefix((Node *) linitial(clause->args),
(Node *) lsecond(clause->args),
ptype,
clause->inputcollid,
req->opfamily,
req->indexcollation);
}
else if (is_funcclause(req->node)) /* be paranoid */
{
FuncExpr *clause = (FuncExpr *) req->node;
Assert(list_length(clause->args) == 2);
ret = (Node *)
match_pattern_prefix((Node *) linitial(clause->args),
(Node *) lsecond(clause->args),
ptype,
clause->inputcollid,
req->opfamily,
req->indexcollation);
}
}
return ret;

在 PostgreSQL 12 我們在創建 function 時可以透過 CREATE FUNCTION ... SUPPORT ... 來設定支援 SupportRequestIndexCondition, SupportRequestSelectivity ... 等的 c function 來客製化 planner 的行為。參考: 37.11. Function Optimization InformationCREATE FUNCTION

@rueian rueian added the documentation Improvements or additions to documentation label Jan 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

1 participant