Skip to content

Commit

Permalink
Recognize functional dependency on primary keys. This allows a table's
Browse files Browse the repository at this point in the history
other columns to be referenced without listing them in GROUP BY, so long as
the primary key column(s) are listed in GROUP BY.

Eventually we should also allow functional dependency on a UNIQUE constraint
when the columns are marked NOT NULL, but that has to wait until NOT NULL
constraints are represented in pg_constraint, because we need to have
pg_constraint OIDs for all the conditions needed to ensure functional
dependency.

Peter Eisentraut, reviewed by Alex Hunsaker and Tom Lane
  • Loading branch information
tglsfdc committed Aug 7, 2010
1 parent ce6ce1a commit e49ae8d
Show file tree
Hide file tree
Showing 16 changed files with 684 additions and 35 deletions.
19 changes: 14 additions & 5 deletions doc/src/sgml/queries.sgml
@@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.58 2010/07/03 02:57:46 rhaas Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.59 2010/08/07 02:44:05 tgl Exp $ -->

<chapter id="queries">
<title>Queries</title>
Expand Down Expand Up @@ -886,17 +886,26 @@ SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
In this example, the columns <literal>product_id</literal>,
<literal>p.name</literal>, and <literal>p.price</literal> must be
in the <literal>GROUP BY</> clause since they are referenced in
the query select list. (Depending on how the products
table is set up, name and price might be fully dependent on the
product ID, so the additional groupings could theoretically be
unnecessary, though this is not implemented.) The column
the query select list (but see below). The column
<literal>s.units</> does not have to be in the <literal>GROUP
BY</> list since it is only used in an aggregate expression
(<literal>sum(...)</literal>), which represents the sales
of a product. For each product, the query returns a summary row about
all sales of the product.
</para>

<para>
If the products table is set up so that,
say, <literal>product_id</literal> is the primary key, then it
would be enough to group by <literal>product_id</literal> in the
above example, since name and price would
be <firstterm>functionally
dependent</firstterm><indexterm><primary>functional
dependency</primary></indexterm> on the product ID, and so there
would be no ambiguity about which name and price value to return
for each product ID group.
</para>

<para>
In strict SQL, <literal>GROUP BY</> can only group by columns of
the source table but <productname>PostgreSQL</productname> extends
Expand Down
23 changes: 19 additions & 4 deletions doc/src/sgml/ref/select.sgml
@@ -1,5 +1,5 @@
<!--
$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.131 2010/06/15 20:04:53 momjian Exp $
$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.132 2010/08/07 02:44:05 tgl Exp $
PostgreSQL documentation
-->

Expand Down Expand Up @@ -520,9 +520,12 @@ GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
produces a single value computed across all the selected rows).
When <literal>GROUP BY</literal> is present, it is not valid for
the <command>SELECT</command> list expressions to refer to
ungrouped columns except within aggregate functions, since there
would be more than one possible value to return for an ungrouped
column.
ungrouped columns except within aggregate functions or if the
ungrouped column is functionally dependent on the grouped columns,
since there would otherwise be more than one possible value to
return for an ungrouped column. A functional dependency exists if
the grouped columns (or a subset thereof) are the primary key of
the table containing the ungrouped column.
</para>
</refsect2>

Expand Down Expand Up @@ -1590,6 +1593,18 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
</para>
</refsect2>

<refsect2>
<title>Functional Dependencies</title>

<para>
<productname>PostgreSQL</productname> recognizes functional dependency
(allowing columns to be omitted from <literal>GROUP BY</>) only when
a table's primary key is included in the <literal>GROUP BY</> list.
The SQL standard specifies additional conditions that should be
recognized.
</para>
</refsect2>

<refsect2>
<title><literal>WINDOW</literal> Clause Restrictions</title>

Expand Down
17 changes: 13 additions & 4 deletions src/backend/catalog/dependency.c
Expand Up @@ -8,7 +8,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/catalog/dependency.c,v 1.96 2010/02/26 02:00:36 momjian Exp $
* $PostgreSQL: pgsql/src/backend/catalog/dependency.c,v 1.97 2010/08/07 02:44:06 tgl Exp $
*
*-------------------------------------------------------------------------
*/
Expand Down Expand Up @@ -1594,7 +1594,7 @@ find_expr_references_walker(Node *node,
{
/* Recurse into RTE subquery or not-yet-planned sublink subquery */
Query *query = (Query *) node;
ListCell *rtable;
ListCell *lc;
bool result;

/*
Expand All @@ -1604,9 +1604,9 @@ find_expr_references_walker(Node *node,
* of recursing into RTE_FUNCTION RTEs, subqueries, etc, so no need to
* do that here. But keep it from looking at join alias lists.)
*/
foreach(rtable, query->rtable)
foreach(lc, query->rtable)
{
RangeTblEntry *rte = (RangeTblEntry *) lfirst(rtable);
RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc);
ListCell *ct;

switch (rte->rtekind)
Expand All @@ -1627,6 +1627,15 @@ find_expr_references_walker(Node *node,
}
}

/*
* Add dependencies on constraints listed in query's constraintDeps
*/
foreach(lc, query->constraintDeps)
{
add_object_address(OCLASS_CONSTRAINT, lfirst_oid(lc), 0,
context->addrs);
}

/* query_tree_walker ignores ORDER BY etc, but we need those opers */
find_expr_references_walker((Node *) query->sortClause, context);
find_expr_references_walker((Node *) query->groupClause, context);
Expand Down
111 changes: 110 additions & 1 deletion src/backend/catalog/pg_constraint.c
Expand Up @@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/catalog/pg_constraint.c,v 1.54 2010/08/05 15:25:35 rhaas Exp $
* $PostgreSQL: pgsql/src/backend/catalog/pg_constraint.c,v 1.55 2010/08/07 02:44:06 tgl Exp $
*
*-------------------------------------------------------------------------
*/
Expand Down Expand Up @@ -783,3 +783,112 @@ get_constraint_oid(Oid relid, const char *conname, bool missing_ok)

return conOid;
}

/*
* Determine whether a relation can be proven functionally dependent on
* a set of grouping columns. If so, return TRUE and add the pg_constraint
* OIDs of the constraints needed for the proof to the *constraintDeps list.
*
* grouping_columns is a list of grouping expressions, in which columns of
* the rel of interest are Vars with the indicated varno/varlevelsup.
*
* Currently we only check to see if the rel has a primary key that is a
* subset of the grouping_columns. We could also use plain unique constraints
* if all their columns are known not null, but there's a problem: we need
* to be able to represent the not-null-ness as part of the constraints added
* to *constraintDeps. FIXME whenever not-null constraints get represented
* in pg_constraint.
*/
bool
check_functional_grouping(Oid relid,
Index varno, Index varlevelsup,
List *grouping_columns,
List **constraintDeps)
{
bool result = false;
Relation pg_constraint;
HeapTuple tuple;
SysScanDesc scan;
ScanKeyData skey[1];

/* Scan pg_constraint for constraints of the target rel */
pg_constraint = heap_open(ConstraintRelationId, AccessShareLock);

ScanKeyInit(&skey[0],
Anum_pg_constraint_conrelid,
BTEqualStrategyNumber, F_OIDEQ,
ObjectIdGetDatum(relid));

scan = systable_beginscan(pg_constraint, ConstraintRelidIndexId, true,
SnapshotNow, 1, skey);

while (HeapTupleIsValid(tuple = systable_getnext(scan)))
{
Form_pg_constraint con = (Form_pg_constraint) GETSTRUCT(tuple);
Datum adatum;
bool isNull;
ArrayType *arr;
int16 *attnums;
int numkeys;
int i;
bool found_col;

/* Only PK constraints are of interest for now, see comment above */
if (con->contype != CONSTRAINT_PRIMARY)
continue;

/* Extract the conkey array, ie, attnums of PK's columns */
adatum = heap_getattr(tuple, Anum_pg_constraint_conkey,
RelationGetDescr(pg_constraint), &isNull);
if (isNull)
elog(ERROR, "null conkey for constraint %u",
HeapTupleGetOid(tuple));
arr = DatumGetArrayTypeP(adatum); /* ensure not toasted */
numkeys = ARR_DIMS(arr)[0];
if (ARR_NDIM(arr) != 1 ||
numkeys < 0 ||
ARR_HASNULL(arr) ||
ARR_ELEMTYPE(arr) != INT2OID)
elog(ERROR, "conkey is not a 1-D smallint array");
attnums = (int16 *) ARR_DATA_PTR(arr);

found_col = false;
for (i = 0; i < numkeys; i++)
{
AttrNumber attnum = attnums[i];
ListCell *gl;

found_col = false;
foreach(gl, grouping_columns)
{
Var *gvar = (Var *) lfirst(gl);

if (IsA(gvar, Var) &&
gvar->varno == varno &&
gvar->varlevelsup == varlevelsup &&
gvar->varattno == attnum)
{
found_col = true;
break;
}
}
if (!found_col)
break;
}

if (found_col)
{
/* The PK is a subset of grouping_columns, so we win */
*constraintDeps = lappend_oid(*constraintDeps,
HeapTupleGetOid(tuple));
result = true;
break;
}
}

systable_endscan(scan);

heap_close(pg_constraint, AccessShareLock);

return result;
}
3 changes: 2 additions & 1 deletion src/backend/nodes/copyfuncs.c
Expand Up @@ -15,7 +15,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/copyfuncs.c,v 1.466 2010/07/25 23:21:21 rhaas Exp $
* $PostgreSQL: pgsql/src/backend/nodes/copyfuncs.c,v 1.467 2010/08/07 02:44:06 tgl Exp $
*
*-------------------------------------------------------------------------
*/
Expand Down Expand Up @@ -2272,6 +2272,7 @@ _copyQuery(Query *from)
COPY_NODE_FIELD(limitCount);
COPY_NODE_FIELD(rowMarks);
COPY_NODE_FIELD(setOperations);
COPY_NODE_FIELD(constraintDeps);

return newnode;
}
Expand Down
3 changes: 2 additions & 1 deletion src/backend/nodes/equalfuncs.c
Expand Up @@ -22,7 +22,7 @@
* Portions Copyright (c) 1994, Regents of the University of California
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/equalfuncs.c,v 1.386 2010/07/25 23:21:21 rhaas Exp $
* $PostgreSQL: pgsql/src/backend/nodes/equalfuncs.c,v 1.387 2010/08/07 02:44:06 tgl Exp $
*
*-------------------------------------------------------------------------
*/
Expand Down Expand Up @@ -877,6 +877,7 @@ _equalQuery(Query *a, Query *b)
COMPARE_NODE_FIELD(limitCount);
COMPARE_NODE_FIELD(rowMarks);
COMPARE_NODE_FIELD(setOperations);
COMPARE_NODE_FIELD(constraintDeps);

return true;
}
Expand Down
3 changes: 2 additions & 1 deletion src/backend/nodes/outfuncs.c
Expand Up @@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.387 2010/07/25 23:21:21 rhaas Exp $
* $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.388 2010/08/07 02:44:06 tgl Exp $
*
* NOTES
* Every node type that can appear in stored rules' parsetrees *must*
Expand Down Expand Up @@ -2020,6 +2020,7 @@ _outQuery(StringInfo str, Query *node)
WRITE_NODE_FIELD(limitCount);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(setOperations);
WRITE_NODE_FIELD(constraintDeps);
}

static void
Expand Down
3 changes: 2 additions & 1 deletion src/backend/nodes/readfuncs.c
Expand Up @@ -8,7 +8,7 @@
*
*
* IDENTIFICATION
* $PostgreSQL: pgsql/src/backend/nodes/readfuncs.c,v 1.232 2010/02/16 22:34:43 tgl Exp $
* $PostgreSQL: pgsql/src/backend/nodes/readfuncs.c,v 1.233 2010/08/07 02:44:07 tgl Exp $
*
* NOTES
* Path and Plan nodes do not have any readfuncs support, because we
Expand Down Expand Up @@ -218,6 +218,7 @@ _readQuery(void)
READ_NODE_FIELD(limitCount);
READ_NODE_FIELD(rowMarks);
READ_NODE_FIELD(setOperations);
READ_NODE_FIELD(constraintDeps);

READ_DONE();
}
Expand Down

0 comments on commit e49ae8d

Please sign in to comment.