Skip to content

Commit

Permalink
Rework planning and execution of UPDATE and DELETE.
Browse files Browse the repository at this point in the history
This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers
the new values of the changed columns (i.e., the expressions computed
in the query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old
values of any unchanged columns.  The core advantage of this is that
the changed columns are uniform across all tables of an inherited or
partitioned target relation, whereas the other columns might not be.
A secondary advantage, when the UPDATE involves joins, is that less
data needs to pass through the plan tree.  The disadvantage of course
is an extra fetch of each tuple to be updated.  However, that seems to
be very nearly free in context; even worst-case tests don't show it to
add more than a couple percent to the total query cost.  At some point
it might be interesting to combine the re-fetch with the tuple access
that ModifyTable must do anyway to mark the old tuple dead; but that
would require a good deal of refactoring and it seems it wouldn't buy
all that much, so this patch doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate
subplan for each target relation, we now generate a single subplan
that is just exactly like a SELECT's plan, then stick ModifyTable
on top of that.  To let ModifyTable know which target relation a
given incoming row refers to, a tableoid junk column is added to
the row identity information.  This gets rid of the horrid hack
that was inheritance_planner(), eliminating O(N^2) planning cost
and memory consumption in cases where there were many unprunable
target relations.

Point 2 of course requires point 1, so that there is a uniform
definition of the non-junk columns to be returned by the subplan.
We can't insist on uniform definition of the row identity junk
columns however, if we want to keep the ability to have both
plain and foreign tables in a partitioning hierarchy.  Since
it wouldn't scale very far to have every child table have its
own row identity column, this patch includes provisions to merge
similar row identity columns into one column of the subplan result.
In particular, we can merge the whole-row Vars typically used as
row identity by FDWs into one column by pretending they are type
RECORD.  (It's still okay for the actual composite Datums to be
labeled with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies
in this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so
has the method it must use for adding junk columns to the query.  Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable.  See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no
longer sufficient to compare its relid to root->parse->resultRelation.
Instead, check it against all_result_relids or leaf_result_relids,
as appropriate.

Amit Langote and Tom Lane

Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
  • Loading branch information
tglsfdc committed Mar 31, 2021
1 parent 055fee7 commit 86dc900
Show file tree
Hide file tree
Showing 55 changed files with 2,352 additions and 2,198 deletions.
17 changes: 9 additions & 8 deletions contrib/postgres_fdw/deparse.c
Expand Up @@ -1275,7 +1275,7 @@ deparseLockingClause(deparse_expr_cxt *context)
* that DECLARE CURSOR ... FOR UPDATE is supported, which it isn't
* before 8.3.
*/
if (relid == root->parse->resultRelation &&
if (bms_is_member(relid, root->all_result_relids) &&
(root->parse->commandType == CMD_UPDATE ||
root->parse->commandType == CMD_DELETE))
{
Expand Down Expand Up @@ -1867,6 +1867,7 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte,
* 'foreignrel' is the RelOptInfo for the target relation or the join relation
* containing all base relations in the query
* 'targetlist' is the tlist of the underlying foreign-scan plan node
* (note that this only contains new-value expressions and junk attrs)
* 'targetAttrs' is the target columns of the UPDATE
* 'remote_conds' is the qual clauses that must be evaluated remotely
* '*params_list' is an output list of exprs that will become remote Params
Expand All @@ -1888,8 +1889,9 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
deparse_expr_cxt context;
int nestlevel;
bool first;
ListCell *lc;
RangeTblEntry *rte = planner_rt_fetch(rtindex, root);
ListCell *lc,
*lc2;

/* Set up context struct for recursion */
context.root = root;
Expand All @@ -1908,14 +1910,13 @@ deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
nestlevel = set_transmission_modes();

first = true;
foreach(lc, targetAttrs)
forboth(lc, targetlist, lc2, targetAttrs)
{
int attnum = lfirst_int(lc);
TargetEntry *tle = get_tle_by_resno(targetlist, attnum);
TargetEntry *tle = lfirst_node(TargetEntry, lc);
int attnum = lfirst_int(lc2);

if (!tle)
elog(ERROR, "attribute number %d not found in UPDATE targetlist",
attnum);
/* update's new-value expressions shouldn't be resjunk */
Assert(!tle->resjunk);

if (!first)
appendStringInfoString(buf, ", ");
Expand Down
423 changes: 193 additions & 230 deletions contrib/postgres_fdw/expected/postgres_fdw.out

Large diffs are not rendered by default.

129 changes: 87 additions & 42 deletions contrib/postgres_fdw/postgres_fdw.c
Expand Up @@ -27,6 +27,7 @@
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
Expand Down Expand Up @@ -345,7 +346,8 @@ static void postgresBeginForeignScan(ForeignScanState *node, int eflags);
static TupleTableSlot *postgresIterateForeignScan(ForeignScanState *node);
static void postgresReScanForeignScan(ForeignScanState *node);
static void postgresEndForeignScan(ForeignScanState *node);
static void postgresAddForeignUpdateTargets(Query *parsetree,
static void postgresAddForeignUpdateTargets(PlannerInfo *root,
Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
static List *postgresPlanForeignModify(PlannerInfo *root,
Expand Down Expand Up @@ -1669,36 +1671,27 @@ postgresEndForeignScan(ForeignScanState *node)
* Add resjunk column(s) needed for update/delete on a foreign table
*/
static void
postgresAddForeignUpdateTargets(Query *parsetree,
postgresAddForeignUpdateTargets(PlannerInfo *root,
Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation)
{
Var *var;
const char *attrname;
TargetEntry *tle;

/*
* In postgres_fdw, what we need is the ctid, same as for a regular table.
*/

/* Make a Var representing the desired value */
var = makeVar(parsetree->resultRelation,
var = makeVar(rtindex,
SelfItemPointerAttributeNumber,
TIDOID,
-1,
InvalidOid,
0);

/* Wrap it in a resjunk TLE with the right name ... */
attrname = "ctid";

tle = makeTargetEntry((Expr *) var,
list_length(parsetree->targetList) + 1,
pstrdup(attrname),
true);

/* ... and add it to the query's targetlist */
parsetree->targetList = lappend(parsetree->targetList, tle);
/* Register it as a row-identity column needed by this target rel */
add_row_identity_var(root, var, rtindex, "ctid");
}

/*
Expand Down Expand Up @@ -1886,7 +1879,7 @@ postgresBeginForeignModify(ModifyTableState *mtstate,
rte,
resultRelInfo,
mtstate->operation,
mtstate->mt_plans[subplan_index]->plan,
outerPlanState(mtstate)->plan,
query,
target_attrs,
values_end_len,
Expand Down Expand Up @@ -2086,8 +2079,7 @@ postgresBeginForeignInsert(ModifyTableState *mtstate,
*/
if (plan && plan->operation == CMD_UPDATE &&
(resultRelInfo->ri_usesFdwDirectModify ||
resultRelInfo->ri_FdwState) &&
resultRelInfo > mtstate->resultRelInfo + mtstate->mt_whichplan)
resultRelInfo->ri_FdwState))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot route tuples into foreign table to be updated \"%s\"",
Expand Down Expand Up @@ -2283,6 +2275,65 @@ postgresRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
return true;
}

/*
* find_modifytable_subplan
* Helper routine for postgresPlanDirectModify to find the
* ModifyTable subplan node that scans the specified RTI.
*
* Returns NULL if the subplan couldn't be identified. That's not a fatal
* error condition, we just abandon trying to do the update directly.
*/
static ForeignScan *
find_modifytable_subplan(PlannerInfo *root,
ModifyTable *plan,
Index rtindex,
int subplan_index)
{
Plan *subplan = outerPlan(plan);

/*
* The cases we support are (1) the desired ForeignScan is the immediate
* child of ModifyTable, or (2) it is the subplan_index'th child of an
* Append node that is the immediate child of ModifyTable. There is no
* point in looking further down, as that would mean that local joins are
* involved, so we can't do the update directly.
*
* There could be a Result atop the Append too, acting to compute the
* UPDATE targetlist values. We ignore that here; the tlist will be
* checked by our caller.
*
* In principle we could examine all the children of the Append, but it's
* currently unlikely that the core planner would generate such a plan
* with the children out-of-order. Moreover, such a search risks costing
* O(N^2) time when there are a lot of children.
*/
if (IsA(subplan, Append))
{
Append *appendplan = (Append *) subplan;

if (subplan_index < list_length(appendplan->appendplans))
subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
}
else if (IsA(subplan, Result) && IsA(outerPlan(subplan), Append))
{
Append *appendplan = (Append *) outerPlan(subplan);

if (subplan_index < list_length(appendplan->appendplans))
subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
}

/* Now, have we got a ForeignScan on the desired rel? */
if (IsA(subplan, ForeignScan))
{
ForeignScan *fscan = (ForeignScan *) subplan;

if (bms_is_member(rtindex, fscan->fs_relids))
return fscan;
}

return NULL;
}

/*
* postgresPlanDirectModify
* Consider a direct foreign table modification
Expand All @@ -2297,13 +2348,13 @@ postgresPlanDirectModify(PlannerInfo *root,
int subplan_index)
{
CmdType operation = plan->operation;
Plan *subplan;
RelOptInfo *foreignrel;
RangeTblEntry *rte;
PgFdwRelationInfo *fpinfo;
Relation rel;
StringInfoData sql;
ForeignScan *fscan;
List *processed_tlist = NIL;
List *targetAttrs = NIL;
List *remote_exprs;
List *params_list = NIL;
Expand All @@ -2321,19 +2372,17 @@ postgresPlanDirectModify(PlannerInfo *root,
return false;

/*
* It's unsafe to modify a foreign table directly if there are any local
* joins needed.
* Try to locate the ForeignScan subplan that's scanning resultRelation.
*/
subplan = (Plan *) list_nth(plan->plans, subplan_index);
if (!IsA(subplan, ForeignScan))
fscan = find_modifytable_subplan(root, plan, resultRelation, subplan_index);
if (!fscan)
return false;
fscan = (ForeignScan *) subplan;

/*
* It's unsafe to modify a foreign table directly if there are any quals
* that should be evaluated locally.
*/
if (subplan->qual != NIL)
if (fscan->scan.plan.qual != NIL)
return false;

/* Safe to fetch data about the target foreign rel */
Expand All @@ -2354,32 +2403,28 @@ postgresPlanDirectModify(PlannerInfo *root,
*/
if (operation == CMD_UPDATE)
{
int col;
ListCell *lc,
*lc2;

/*
* We transmit only columns that were explicitly targets of the
* UPDATE, so as to avoid unnecessary data transmission.
* The expressions of concern are the first N columns of the processed
* targetlist, where N is the length of the rel's update_colnos.
*/
col = -1;
while ((col = bms_next_member(rte->updatedCols, col)) >= 0)
get_translated_update_targetlist(root, resultRelation,
&processed_tlist, &targetAttrs);
forboth(lc, processed_tlist, lc2, targetAttrs)
{
/* bit numbers are offset by FirstLowInvalidHeapAttributeNumber */
AttrNumber attno = col + FirstLowInvalidHeapAttributeNumber;
TargetEntry *tle;
TargetEntry *tle = lfirst_node(TargetEntry, lc);
AttrNumber attno = lfirst_int(lc2);

/* update's new-value expressions shouldn't be resjunk */
Assert(!tle->resjunk);

if (attno <= InvalidAttrNumber) /* shouldn't happen */
elog(ERROR, "system-column update is not supported");

tle = get_tle_by_resno(subplan->targetlist, attno);

if (!tle)
elog(ERROR, "attribute number %d not found in subplan targetlist",
attno);

if (!is_foreign_expr(root, foreignrel, (Expr *) tle->expr))
return false;

targetAttrs = lappend_int(targetAttrs, attno);
}
}

Expand Down Expand Up @@ -2430,7 +2475,7 @@ postgresPlanDirectModify(PlannerInfo *root,
case CMD_UPDATE:
deparseDirectUpdateSql(&sql, root, resultRelation, rel,
foreignrel,
((Plan *) fscan)->targetlist,
processed_tlist,
targetAttrs,
remote_exprs, &params_list,
returningList, &retrieved_attrs);
Expand Down
3 changes: 1 addition & 2 deletions doc/src/sgml/ddl.sgml
Expand Up @@ -4823,8 +4823,7 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate &gt;= DATE '2008-01-01';
well, provided that typical queries allow the query planner to prune all
but a small number of partitions. Planning times become longer and memory
consumption becomes higher when more partitions remain after the planner
performs partition pruning. This is particularly true for the
<command>UPDATE</command> and <command>DELETE</command> commands. Another
performs partition pruning. Another
reason to be concerned about having a large number of partitions is that
the server's memory consumption may grow significantly over
time, especially if many sessions touch large numbers of partitions.
Expand Down
65 changes: 37 additions & 28 deletions doc/src/sgml/fdwhandler.sgml
Expand Up @@ -424,7 +424,8 @@ GetForeignUpperPaths(PlannerInfo *root,
<para>
<programlisting>
void
AddForeignUpdateTargets(Query *parsetree,
AddForeignUpdateTargets(PlannerInfo *root,
Index rtindex,
RangeTblEntry *target_rte,
Relation target_relation);
</programlisting>
Expand All @@ -440,27 +441,31 @@ AddForeignUpdateTargets(Query *parsetree,
</para>

<para>
To do that, add <structname>TargetEntry</structname> items to
<literal>parsetree-&gt;targetList</literal>, containing expressions for the
extra values to be fetched. Each such entry must be marked
<structfield>resjunk</structfield> = <literal>true</literal>, and must have a distinct
<structfield>resname</structfield> that will identify it at execution time.
Avoid using names matching <literal>ctid<replaceable>N</replaceable></literal>,
<literal>wholerow</literal>, or
<literal>wholerow<replaceable>N</replaceable></literal>, as the core system can
generate junk columns of these names.
If the extra expressions are more complex than simple Vars, they
must be run through <function>eval_const_expressions</function>
before adding them to the target list.
</para>

<para>
Although this function is called during planning, the
information provided is a bit different from that available to other
planning routines.
<literal>parsetree</literal> is the parse tree for the <command>UPDATE</command> or
<command>DELETE</command> command, while <literal>target_rte</literal> and
<literal>target_relation</literal> describe the target foreign table.
To do that, construct a <structname>Var</structname> representing
an extra value you need, and pass it
to <function>add_row_identity_var</function>, along with a name for
the junk column. (You can do this more than once if several columns
are needed.) You must choose a distinct junk column name for each
different <structname>Var</structname> you need, except
that <structname>Var</structname>s that are identical except for
the <structfield>varno</structfield> field can and should share a
column name.
The core system uses the junk column names
<literal>tableoid</literal> for a
table's <structfield>tableoid</structfield> column,
<literal>ctid</literal>
or <literal>ctid<replaceable>N</replaceable></literal>
for <structfield>ctid</structfield>,
<literal>wholerow</literal>
for a whole-row <structname>Var</structname> marked with
<structfield>vartype</structfield> = <type>RECORD</type>,
and <literal>wholerow<replaceable>N</replaceable></literal>
for a whole-row <structname>Var</structname> with
<structfield>vartype</structfield> equal to the table's declared rowtype.
Re-use these names when you can (the planner will combine duplicate
requests for identical junk columns). If you need another kind of
junk column besides these, it might be wise to choose a name prefixed
with your extension name, to avoid conflicts against other FDWs.
</para>

<para>
Expand Down Expand Up @@ -495,8 +500,8 @@ PlanForeignModify(PlannerInfo *root,
<literal>resultRelation</literal> identifies the target foreign table by its
range table index. <literal>subplan_index</literal> identifies which target of
the <structname>ModifyTable</structname> plan node this is, counting from zero;
use this if you want to index into <literal>plan-&gt;plans</literal> or other
substructure of the <literal>plan</literal> node.
use this if you want to index into per-target-relation substructures of the
<literal>plan</literal> node.
</para>

<para>
Expand Down Expand Up @@ -703,10 +708,14 @@ ExecForeignUpdate(EState *estate,
<literal>slot</literal> contains the new data for the tuple; it will match the
row-type definition of the foreign table.
<literal>planSlot</literal> contains the tuple that was generated by the
<structname>ModifyTable</structname> plan node's subplan; it differs from
<literal>slot</literal> in possibly containing additional <quote>junk</quote>
columns. In particular, any junk columns that were requested by
<function>AddForeignUpdateTargets</function> will be available from this slot.
<structname>ModifyTable</structname> plan node's subplan. Unlike
<literal>slot</literal>, this tuple contains only the new values for
columns changed by the query, so do not rely on attribute numbers of the
foreign table to index into <literal>planSlot</literal>.
Also, <literal>planSlot</literal> typically contains
additional <quote>junk</quote> columns. In particular, any junk columns
that were requested by <function>AddForeignUpdateTargets</function> will
be available from this slot.
</para>

<para>
Expand Down

0 comments on commit 86dc900

Please sign in to comment.