Remove useless self-joins
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if it is proved that the join can be replaced with
a scan without impacting the query result.  The self-join and the inner
relation are replaced with the outer relation in the query, equivalence
classes, and planner info structures.  Also, the inner restrictlist is moved
to the outer one, with duplicated clauses removed.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses (and hence
of selectivity estimations), and can potentially improve the planner's
overall prediction for the query.

The SJE proof is based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
 1. At most one inner row matches the join clause.
 2. Each matched inner row must be (physically) the same row as the outer one.
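
As a rough illustration of the two conditions above, here is a toy C sketch
(not PostgreSQL code).  The Row type, sample_table, and both functions are
hypothetical names; "id" stands in for a unique key column, and NULL handling
is ignored because plain ints cannot be NULL.  When both conditions hold, the
join and the single scan with the inner filter moved onto it yield the same
rows:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy row; "id" plays the role of a unique key column. */
typedef struct Row
{
	int			id;
	int			val;
} Row;

/* Demo data (assumption): every id occurs exactly once. */
static const Row sample_table[] = {{1, 10}, {2, 20}, {3, 30}};

/*
 * Rows produced by the self-join
 *     t a JOIN t b ON a.id = b.id WHERE b.val > min_val
 * evaluated naively as a nested loop.
 */
size_t
self_join_count(const Row *t, size_t n, int min_val)
{
	size_t		count = 0;

	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < n; j++)
			if (t[i].id == t[j].id && t[j].val > min_val)
				count++;
	return count;
}

/*
 * Rows produced by the replacement: a single scan of t, with the inner
 * filter applied to the surviving relation.
 */
size_t
single_scan_count(const Row *t, size_t n, int min_val)
{
	size_t		count = 0;

	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (t[i].val > min_val)
			count++;
	return count;
}
```

Because id is unique, each outer row matches only itself, so both functions
return the same count for any threshold.  If id had duplicates, the nested
loop would produce extra rows and the replacement would no longer be valid.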

In this patch we use the following approach to identify a self-join:
 1. Collect all merge-joinable join quals which look like a.x = b.x.
 2. Add to the list above the baserestrictinfo of the inner table.
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables.
 4. If uniqueness was proved using baserestrictinfo clauses, check that the
    inner and outer clauses match exactly; only then can the self-join be
    eliminated.
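
The four steps above can be sketched with toy data structures.  This is a
simplified model under stated assumptions, not the planner code: JoinQual,
Filter, the demo arrays, and self_join_removable() are hypothetical names, a
bitmask of column numbers stands in for a unique index, and a subset test
stands in for innerrel_is_unique().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a join qual: outer.outer_col = inner.inner_col */
typedef struct JoinQual
{
	int			outer_col;
	int			inner_col;
} JoinQual;

/* Hypothetical model of a restriction clause: rel.col = value */
typedef struct Filter
{
	int			col;
	int			value;
} Filter;

/* Demo data (assumption): unique key on columns 0 and 1. */
static const JoinQual demo_quals[] = {{0, 0}};
static const Filter demo_inner[] = {{1, 42}};
static const Filter demo_outer_same[] = {{1, 42}};
static const Filter demo_outer_diff[] = {{1, 7}};

static bool
has_filter(const Filter *fs, size_t n, Filter f)
{
	for (size_t i = 0; i < n; i++)
		if (fs[i].col == f.col && fs[i].value == f.value)
			return true;
	return false;
}

bool
self_join_removable(unsigned unique_key_cols,
					const JoinQual *quals, size_t nquals,
					const Filter *outer_filters, size_t nouter,
					const Filter *inner_filters, size_t ninner)
{
	unsigned	join_cols = 0;
	unsigned	filter_cols = 0;

	if (unique_key_cols == 0)
		return false;			/* no unique key, nothing to prove */

	/* Step 1: collect quals of the form a.x = b.x. */
	for (size_t i = 0; i < nquals; i++)
	{
		assert(quals[i].outer_col >= 0 && quals[i].outer_col < 32);
		if (quals[i].outer_col == quals[i].inner_col)
			join_cols |= 1u << quals[i].outer_col;
	}

	/* Step 2: add the inner relation's restriction clauses. */
	for (size_t i = 0; i < ninner; i++)
	{
		assert(inner_filters[i].col >= 0 && inner_filters[i].col < 32);
		filter_cols |= 1u << inner_filters[i].col;
	}

	/* Step 3: uniqueness check; every key column must be constrained. */
	if ((unique_key_cols & ~(join_cols | filter_cols)) != 0)
		return false;

	/*
	 * Step 4: an inner filter that was needed to prove uniqueness must
	 * have an exact counterpart on the outer side.
	 */
	for (size_t i = 0; i < ninner; i++)
	{
		unsigned	bit = 1u << inner_filters[i].col;

		if ((unique_key_cols & bit) && !(join_cols & bit) &&
			!has_filter(outer_filters, nouter, inner_filters[i]))
			return false;
	}
	return true;
}
```

With the demo data, the join qual covers key column 0 and the inner filter
covers key column 1.  The pair is removable only when the outer side carries
the same filter; otherwise an outer row could match a physically different
inner row.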

The relation replacement procedure is not trivial, and it is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some regression tests changed due to the self-join
removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
akorotkov committed Oct 25, 2023
1 parent 8f0fd47 commit d3d55ce
Showing 14 changed files with 2,457 additions and 68 deletions.
16 changes: 16 additions & 0 deletions doc/src/sgml/config.sgml
@@ -5306,6 +5306,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>

<varlistentry id="guc-enable_self_join_removal" xreflabel="enable_self_join_removal">
<term><varname>enable_self_join_removal</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>enable_self_join_removal</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Enables or disables the query planner's optimization which analyses
the query tree and replaces self joins with semantically equivalent
single scans. Takes into consideration only plain tables.
The default is <literal>on</literal>.
</para>
</listitem>
</varlistentry>

<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>
39 changes: 39 additions & 0 deletions src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
{
return relation_has_unique_index_ext(root, rel, restrictlist,
exprlist, oprlist, NULL);
}

/*
* relation_has_unique_index_ext
* Same as relation_has_unique_index_for(), but supports extra_clauses
* parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
* which were used to derive uniqueness.
*/
bool
relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist,
List **extra_clauses)
{
ListCell *ic;

@@ -3549,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
List *exprs = NIL;

/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -3600,6 +3617,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */

if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
MemoryContextSwitchTo(root->planner_cxt);

/*
* Add the filter clause to a list, allowing the caller
* to know whether uniqueness was derived not only from
* join clauses.
*/
Assert(bms_is_empty(rinfo->left_relids) ||
bms_is_empty(rinfo->right_relids));
if (extra_clauses)
exprs = lappend(exprs, rinfo);
MemoryContextSwitchTo(oldMemCtx);
}

break;
}
}
@@ -3642,7 +3677,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,

/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
{
if (extra_clauses)
*extra_clauses = exprs;
return true;
}
}

return false;