Support "expanded" objects, particularly arrays, for better performance.
This patch introduces the ability for complex datatypes to have an
in-memory representation that is different from their on-disk format.
On-disk formats are typically optimized for minimal size, and in any case
they can't contain pointers, so they are often not well-suited for
computation.  Now a datatype can invent an "expanded" in-memory format
that is better suited for its operations, and then pass that around among
the C functions that operate on the datatype.  There are also provisions
(rudimentary as yet) to allow an expanded object to be modified in-place
under suitable conditions, so that operations like assignment to an element
of an array need not involve copying the entire array.
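To make the flat-versus-expanded distinction concrete, here is a minimal, self-contained sketch. It is a toy and does not use PostgreSQL's actual array layout or the API in expandeddatum.h: the byte layout and the names read_u32, toy_build_flat, flat_get, ToyExpanded, toy_expand and toy_free are all assumptions invented for illustration. The flat form is a pointer-free byte string whose N'th element can only be found by walking past the preceding ones; the expanded form indexes the element start positions once, after which access is direct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned position in a byte buffer. */
static uint32_t read_u32(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/* Build a toy flat value: [count][len0][bytes0][len1][bytes1]... (hypothetical layout) */
static unsigned char *toy_build_flat(const char **strs, uint32_t n)
{
    size_t total = sizeof(uint32_t);

    for (uint32_t i = 0; i < n; i++)
        total += sizeof(uint32_t) + strlen(strs[i]);

    unsigned char *flat = malloc(total);

    if (flat == NULL)
        abort();

    unsigned char *p = flat;

    memcpy(p, &n, sizeof n);
    p += sizeof n;
    for (uint32_t i = 0; i < n; i++)
    {
        uint32_t len = (uint32_t) strlen(strs[i]);

        memcpy(p, &len, sizeof len);
        p += sizeof len;
        memcpy(p, strs[i], len);
        p += len;
    }
    return flat;
}

/* Flat access: elements are variable-length, so skip the first i of them. */
static const unsigned char *flat_get(const unsigned char *flat, uint32_t i,
                                     uint32_t *len)
{
    const unsigned char *p = flat + sizeof(uint32_t);

    assert(i < read_u32(flat));
    for (uint32_t k = 0; k < i; k++)
        p += sizeof(uint32_t) + read_u32(p);
    *len = read_u32(p);
    return p + sizeof(uint32_t);
}

/* Toy expanded form: element start locations and lengths are precomputed. */
typedef struct ToyExpanded
{
    uint32_t nelems;
    const unsigned char **elems;    /* point into the flat buffer */
    uint32_t *lens;
} ToyExpanded;

/* One pass over the flat value; the flat buffer must outlive the result. */
static ToyExpanded *toy_expand(const unsigned char *flat)
{
    ToyExpanded *ex = malloc(sizeof *ex);

    if (ex == NULL)
        abort();
    ex->nelems = read_u32(flat);

    size_t slots = ex->nelems ? ex->nelems : 1;     /* avoid malloc(0) */

    ex->elems = malloc(slots * sizeof *ex->elems);
    ex->lens = malloc(slots * sizeof *ex->lens);
    if (ex->elems == NULL || ex->lens == NULL)
        abort();

    const unsigned char *p = flat + sizeof(uint32_t);

    for (uint32_t i = 0; i < ex->nelems; i++)
    {
        ex->lens[i] = read_u32(p);
        p += sizeof(uint32_t);
        ex->elems[i] = p;
        p += ex->lens[i];
    }
    return ex;
}

static void toy_free(ToyExpanded *ex)
{
    free(ex->elems);
    free(ex->lens);
    free(ex);
}
```

In this sketch flat_get costs time proportional to the element index on every call, while toy_expand pays that cost once and later lookups are plain array indexing. The trade-off is the same one the commit message describes: expansion only pays off if the value is accessed more than once or twice.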

The initial application for this feature is arrays, but it is not hard
to foresee using it for other container types like JSON, XML and hstore.
I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired
into plpgsql to improve performance for arrays that are stored in
plpgsql variables.  We would like to generalize those hacks so that
other datatypes can obtain similar improvements, but figuring out some
appropriate APIs is left as a task for future work.  (The heuristics
themselves are probably not optimal yet, either, as they sometimes
force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql
functions that do element-by-element access or update of large arrays.
There are other cases that get a little slower, as a result of added array
format conversions; but we can hope to improve anything that's annoyingly
bad.  In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund
tglsfdc committed May 14, 2015
1 parent 8a2e1ed commit 1dc5ebc
Showing 27 changed files with 2,362 additions and 526 deletions.
42 changes: 40 additions & 2 deletions doc/src/sgml/storage.sgml
@@ -503,8 +503,9 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
<acronym>TOAST</> pointers can point to data that is not on disk, but is
elsewhere in the memory of the current server process. Such pointers
obviously cannot be long-lived, but they are nonetheless useful. There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
+are currently two sub-cases:
+pointers to <firstterm>indirect</> data and
+pointers to <firstterm>expanded</> data.
</para>

<para>
@@ -518,6 +519,43 @@ that the referenced data survives for as long as the pointer could exist,
and there is no infrastructure to help with this.
</para>

<para>
Expanded <acronym>TOAST</> pointers are useful for complex data types
whose on-disk representation is not especially suited for computational
purposes. As an example, the standard varlena representation of a
<productname>PostgreSQL</> array includes dimensionality information, a
nulls bitmap if there are any null elements, then the values of all the
elements in order. When the element type itself is variable-length, the
only way to find the <replaceable>N</>'th element is to scan through all the
preceding elements. This representation is appropriate for on-disk storage
because of its compactness, but for computations with the array it's much
nicer to have an <quote>expanded</> or <quote>deconstructed</>
representation in which all the element starting locations have been
identified. The <acronym>TOAST</> pointer mechanism supports this need by
allowing a pass-by-reference Datum to point to either a standard varlena
value (the on-disk representation) or a <acronym>TOAST</> pointer that
points to an expanded representation somewhere in memory. The details of
this expanded representation are up to the data type, though it must have
a standard header and meet the other API requirements given
in <filename>src/include/utils/expandeddatum.h</>. C-level functions
working with the data type can choose to handle either representation.
Functions that do not know about the expanded representation, but simply
apply <function>PG_DETOAST_DATUM</> to their inputs, will automatically
receive the traditional varlena representation; so support for an expanded
representation can be introduced incrementally, one function at a time.
</para>

<para>
<acronym>TOAST</> pointers to expanded values are further broken down
into <firstterm>read-write</> and <firstterm>read-only</> pointers.
The pointed-to representation is the same either way, but a function that
receives a read-write pointer is allowed to modify the referenced value
in-place, whereas one that receives a read-only pointer must not; it must
first create a copy if it wants to make a modified version of the value.
This distinction and some associated conventions make it possible to avoid
unnecessary copying of expanded values during query execution.
</para>

<para>
For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
management code ensures that no such pointer datum can accidentally get
71 changes: 71 additions & 0 deletions doc/src/sgml/xtypes.sgml
@@ -300,6 +300,77 @@ CREATE TYPE complex (
</para>
</note>

<para>
Another feature that's enabled by <acronym>TOAST</> support is the
possibility of having an <firstterm>expanded</> in-memory data
representation that is more convenient to work with than the format that
is stored on disk. The regular or <quote>flat</> varlena storage format
is ultimately just a blob of bytes; it cannot for example contain
pointers, since it may get copied to other locations in memory.
For complex data types, the flat format may be quite expensive to work
with, so <productname>PostgreSQL</> provides a way to <quote>expand</>
the flat format into a representation that is more suited to computation,
and then pass that format in-memory between functions of the data type.
</para>

<para>
To use expanded storage, a data type must define an expanded format that
follows the rules given in <filename>src/include/utils/expandeddatum.h</>,
and provide functions to <quote>expand</> a flat varlena value into
expanded format and <quote>flatten</> the expanded format back to the
regular varlena representation. Then ensure that all C functions for
the data type can accept either representation, possibly by converting
one into the other immediately upon receipt. This does not require fixing
all existing functions for the data type at once, because the standard
<function>PG_DETOAST_DATUM</> macro is defined to convert expanded inputs
into regular flat format. Therefore, existing functions that work with
the flat varlena format will continue to work, though slightly
inefficiently, with expanded inputs; they need not be converted until and
unless better performance is important.
</para>

<para>
C functions that know how to work with an expanded representation
typically fall into two categories: those that can only handle expanded
format, and those that can handle either expanded or flat varlena inputs.
The former are easier to write but may be less efficient overall, because
converting a flat input to expanded form for use by a single function may
cost more than is saved by operating on the expanded format.
When only expanded format need be handled, conversion of flat inputs to
expanded form can be hidden inside an argument-fetching macro, so that
the function appears no more complex than one working with traditional
varlena input.
To handle both types of input, write an argument-fetching function that
will detoast external, short-header, and compressed varlena inputs, but
not expanded inputs. Such a function can be defined as returning a
pointer to a union of the flat varlena format and the expanded format.
Callers can use the <function>VARATT_IS_EXPANDED_HEADER()</> macro to
determine which format they received.
</para>

<para>
The <acronym>TOAST</> infrastructure not only allows regular varlena
values to be distinguished from expanded values, but also
distinguishes <quote>read-write</> and <quote>read-only</> pointers to
expanded values. C functions that only need to examine an expanded
value, or will only change it in safe and non-semantically-visible ways,
need not care which type of pointer they receive. C functions that
produce a modified version of an input value are allowed to modify an
expanded input value in-place if they receive a read-write pointer, but
must not modify the input if they receive a read-only pointer; in that
case they have to copy the value first, producing a new value to modify.
A C function that has constructed a new expanded value should always
return a read-write pointer to it. Also, a C function that is modifying
a read-write expanded value in-place should take care to leave the value
in a sane state if it fails partway through.
</para>

<para>
For examples of working with expanded values, see the standard array
infrastructure, particularly
<filename>src/backend/utils/adt/array_expanded.c</>.
</para>

</sect2>

</sect1>
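The read-write versus read-only convention described in the xtypes.sgml text above can be sketched with a toy model. This is not the PostgreSQL API: ToyIntArray, ToyRef, toy_new, toy_set_element and toy_make_read_only are hypothetical names, and a plain boolean flag stands in for the two kinds of TOAST pointer. The sketch only shows the rule itself: a function may modify its input in place when it holds a read-write reference, must copy first when it holds a read-only one, and returns a read-write reference to whatever it produced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy "expanded" value: a heap-allocated int array (hypothetical). */
typedef struct ToyIntArray
{
    size_t n;
    int *vals;
} ToyIntArray;

/* Toy reference: the flag stands in for a read-write vs read-only pointer. */
typedef struct ToyRef
{
    ToyIntArray *obj;
    bool read_write;
} ToyRef;

/* Allocate a new toy array holding a copy of the n given values. */
static ToyIntArray *toy_new(const int *vals, size_t n)
{
    ToyIntArray *a = malloc(sizeof *a);

    if (a == NULL)
        abort();
    a->n = n;
    a->vals = malloc((n ? n : 1) * sizeof *a->vals);    /* avoid malloc(0) */
    if (a->vals == NULL)
        abort();
    if (n > 0)
        memcpy(a->vals, vals, n * sizeof *a->vals);
    return a;
}

/* Downgrade a reference so that callees may not modify the value in place. */
static ToyRef toy_make_read_only(ToyRef in)
{
    in.read_write = false;
    return in;
}

/*
 * Produce a version of the input with element i set to v.
 * Read-write input: modify in place.  Read-only input: copy, then modify.
 * Either way the result is returned as a read-write reference.
 */
static ToyRef toy_set_element(ToyRef in, size_t i, int v)
{
    ToyIntArray *target;
    ToyRef out;

    assert(i < in.obj->n);
    if (in.read_write)
        target = in.obj;
    else
        target = toy_new(in.obj->vals, in.obj->n);
    target->vals[i] = v;

    out.obj = target;
    out.read_write = true;
    return out;
}
```

With a read-write reference the update touches only one element; with a read-only reference the whole array is copied first, leaving the original untouched for any other holder of the reference. Downgrading with toy_make_read_only before sharing a value is loosely analogous to what this commit's ExecMakeSlotContentsReadOnly does for slot contents.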
45 changes: 37 additions & 8 deletions src/backend/access/common/heaptuple.c
@@ -60,6 +60,7 @@
#include "access/sysattr.h"
#include "access/tuptoaster.h"
#include "executor/tuptable.h"
#include "utils/expandeddatum.h"


/* Does att's datatype allow packing into the 1-byte-header varlena format? */
@@ -93,13 +94,15 @@ heap_compute_data_size(TupleDesc tupleDesc,
for (i = 0; i < numberOfAttributes; i++)
{
Datum val;
Form_pg_attribute atti;

if (isnull[i])
continue;

val = values[i];
atti = att[i];

-if (ATT_IS_PACKABLE(att[i]) &&
+if (ATT_IS_PACKABLE(atti) &&
VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
{
/*
@@ -108,11 +111,21 @@ heap_compute_data_size(TupleDesc tupleDesc,
*/
data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
}
else if (atti->attlen == -1 &&
VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
{
/*
* we want to flatten the expanded value so that the constructed
* tuple doesn't depend on it
*/
data_length = att_align_nominal(data_length, atti->attalign);
data_length += EOH_get_flat_size(DatumGetEOHP(val));
}
else
{
-data_length = att_align_datum(data_length, att[i]->attalign,
-att[i]->attlen, val);
-data_length = att_addlength_datum(data_length, att[i]->attlen,
+data_length = att_align_datum(data_length, atti->attalign,
+atti->attlen, val);
+data_length = att_addlength_datum(data_length, atti->attlen,
val);
}
}
@@ -203,10 +216,26 @@ heap_fill_tuple(TupleDesc tupleDesc,
*infomask |= HEAP_HASVARWIDTH;
if (VARATT_IS_EXTERNAL(val))
{
-*infomask |= HEAP_HASEXTERNAL;
-/* no alignment, since it's short by definition */
-data_length = VARSIZE_EXTERNAL(val);
-memcpy(data, val, data_length);
+if (VARATT_IS_EXTERNAL_EXPANDED(val))
+{
+/*
+* we want to flatten the expanded value so that the
+* constructed tuple doesn't depend on it
+*/
+ExpandedObjectHeader *eoh = DatumGetEOHP(values[i]);
+
+data = (char *) att_align_nominal(data,
+att[i]->attalign);
+data_length = EOH_get_flat_size(eoh);
+EOH_flatten_into(eoh, data, data_length);
+}
+else
+{
+*infomask |= HEAP_HASEXTERNAL;
+/* no alignment, since it's short by definition */
+data_length = VARSIZE_EXTERNAL(val);
+memcpy(data, val, data_length);
+}
}
else if (VARATT_IS_SHORT(val))
{
36 changes: 36 additions & 0 deletions src/backend/access/heap/tuptoaster.c
@@ -37,6 +37,7 @@
#include "catalog/catalog.h"
#include "common/pg_lzcompress.h"
#include "miscadmin.h"
#include "utils/expandeddatum.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -130,6 +131,19 @@ heap_tuple_fetch_attr(struct varlena * attr)
result = (struct varlena *) palloc(VARSIZE_ANY(attr));
memcpy(result, attr, VARSIZE_ANY(attr));
}
else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
{
/*
* This is an expanded-object pointer --- get flat format
*/
ExpandedObjectHeader *eoh;
Size resultsize;

eoh = DatumGetEOHP(PointerGetDatum(attr));
resultsize = EOH_get_flat_size(eoh);
result = (struct varlena *) palloc(resultsize);
EOH_flatten_into(eoh, (void *) result, resultsize);
}
else
{
/*
@@ -196,6 +210,15 @@ heap_tuple_untoast_attr(struct varlena * attr)
attr = result;
}
}
else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
{
/*
* This is an expanded-object pointer --- get flat format
*/
attr = heap_tuple_fetch_attr(attr);
/* flatteners are not allowed to produce compressed/short output */
Assert(!VARATT_IS_EXTENDED(attr));
}
else if (VARATT_IS_COMPRESSED(attr))
{
/*
@@ -263,6 +286,11 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
return heap_tuple_untoast_attr_slice(redirect.pointer,
sliceoffset, slicelength);
}
else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
{
/* pass it off to heap_tuple_fetch_attr to flatten */
preslice = heap_tuple_fetch_attr(attr);
}
else
preslice = attr;

@@ -344,6 +372,10 @@ toast_raw_datum_size(Datum value)

return toast_raw_datum_size(PointerGetDatum(toast_pointer.pointer));
}
else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
{
result = EOH_get_flat_size(DatumGetEOHP(value));
}
else if (VARATT_IS_COMPRESSED(attr))
{
/* here, va_rawsize is just the payload size */
@@ -400,6 +432,10 @@ toast_datum_size(Datum value)

return toast_datum_size(PointerGetDatum(toast_pointer.pointer));
}
else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
{
result = EOH_get_flat_size(DatumGetEOHP(value));
}
else if (VARATT_IS_SHORT(attr))
{
result = VARSIZE_SHORT(attr);
12 changes: 4 additions & 8 deletions src/backend/executor/execQual.c
@@ -4248,7 +4248,6 @@ ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
{
ArrayCoerceExpr *acoerce = (ArrayCoerceExpr *) astate->xprstate.expr;
Datum result;
-ArrayType *array;
FunctionCallInfoData locfcinfo;

result = ExecEvalExpr(astate->arg, econtext, isNull, isDone);
@@ -4265,14 +4264,12 @@ ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
if (!OidIsValid(acoerce->elemfuncid))
{
/* Detoast input array if necessary, and copy in any case */
-array = DatumGetArrayTypePCopy(result);
+ArrayType *array = DatumGetArrayTypePCopy(result);

ARR_ELEMTYPE(array) = astate->resultelemtype;
PG_RETURN_ARRAYTYPE_P(array);
}

-/* Detoast input array if necessary, but don't make a useless copy */
-array = DatumGetArrayTypeP(result);

/* Initialize function cache if first time through */
if (astate->elemfunc.fn_oid == InvalidOid)
{
@@ -4302,15 +4299,14 @@ ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
*/
InitFunctionCallInfoData(locfcinfo, &(astate->elemfunc), 3,
InvalidOid, NULL, NULL);
-locfcinfo.arg[0] = PointerGetDatum(array);
+locfcinfo.arg[0] = result;
locfcinfo.arg[1] = Int32GetDatum(acoerce->resulttypmod);
locfcinfo.arg[2] = BoolGetDatum(acoerce->isExplicit);
locfcinfo.argnull[0] = false;
locfcinfo.argnull[1] = false;
locfcinfo.argnull[2] = false;

-return array_map(&locfcinfo, ARR_ELEMTYPE(array), astate->resultelemtype,
-astate->amstate);
+return array_map(&locfcinfo, astate->resultelemtype, astate->amstate);
}

/* ----------------------------------------------------------------
47 changes: 47 additions & 0 deletions src/backend/executor/execTuples.c
@@ -88,6 +88,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"

@@ -812,6 +813,52 @@ ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
return ExecStoreTuple(newTuple, dstslot, InvalidBuffer, true);
}

/* --------------------------------
* ExecMakeSlotContentsReadOnly
* Mark any R/W expanded datums in the slot as read-only.
*
* This is needed when a slot that might contain R/W datum references is to be
* used as input for general expression evaluation. Since the expression(s)
* might contain more than one Var referencing the same R/W datum, we could
* get wrong answers if functions acting on those Vars thought they could
* modify the expanded value in-place.
*
* For notational reasons, we return the same slot passed in.
* --------------------------------
*/
TupleTableSlot *
ExecMakeSlotContentsReadOnly(TupleTableSlot *slot)
{
/*
* sanity checks
*/
Assert(slot != NULL);
Assert(slot->tts_tupleDescriptor != NULL);
Assert(!slot->tts_isempty);

/*
* If the slot contains a physical tuple, it can't contain any expanded
* datums, because we flatten those when making a physical tuple. This
* might change later; but for now, we need do nothing unless the slot is
* virtual.
*/
if (slot->tts_tuple == NULL)
{
Form_pg_attribute *att = slot->tts_tupleDescriptor->attrs;
int attnum;

for (attnum = 0; attnum < slot->tts_nvalid; attnum++)
{
slot->tts_values[attnum] =
MakeExpandedObjectReadOnly(slot->tts_values[attnum],
slot->tts_isnull[attnum],
att[attnum]->attlen);
}
}

return slot;
}


/* ----------------------------------------------------------------
* convenience initialization routines