Rethink definition of pg_attribute.attcompression.
Redefine '\0' (InvalidCompressionMethod) as meaning "if we need to
compress, use the current setting of default_toast_compression".
This allows '\0' to be a suitable default choice regardless of
datatype, greatly simplifying code paths that initialize tupledescs
and the like.  It seems like a more user-friendly approach as well,
because now the default compression choice doesn't migrate into table
definitions, meaning that changing default_toast_compression is
usually sufficient to flip an installation's behavior; one needn't
tediously issue per-column ALTER SET COMPRESSION commands.

Along the way, fix a few minor bugs and documentation issues
with the per-column-compression feature.  Adopt more robust
APIs for SetIndexStorageProperties and GetAttributeCompression.

Bump catversion because typical contents of attcompression will now
be different.  We could get away without doing that, but it seems
better to ensure v14 installations all agree on this.  (We already
forced initdb for beta2, anyway.)

Discussion: https://postgr.es/m/626613.1621787110@sss.pgh.pa.us
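
The redefinition of '\0' described above can be sketched in isolation. This is a minimal standalone illustration, not PostgreSQL source: the macros and the default_toast_compression variable are local stand-ins named after identifiers that appear in the diff, the 'p' and 'l' values follow the catalog documentation, and resolve_compression_method is a hypothetical helper name.

```c
#include <assert.h>

/* Local stand-ins (assumptions) for the PostgreSQL definitions. */
#define InvalidCompressionMethod      '\0'
#define TOAST_PGLZ_COMPRESSION        'p'
#define TOAST_LZ4_COMPRESSION         'l'
#define CompressionMethodIsValid(cm)  ((cm) != InvalidCompressionMethod)

/* Stand-in for the default_toast_compression setting; pglz is the default. */
char default_toast_compression = TOAST_PGLZ_COMPRESSION;

/*
 * Hypothetical helper: map a column's attcompression to the method that
 * would actually be used.  '\0' means "consult the current default now",
 * so the answer can change whenever the setting changes.
 */
char
resolve_compression_method(char attcompression)
{
    if (!CompressionMethodIsValid(attcompression))
        return default_toast_compression;
    return attcompression;
}

/* Small self-check of the fallback rule; call it from any test driver. */
void
resolve_compression_selfcheck(void)
{
    assert(resolve_compression_method(TOAST_LZ4_COMPRESSION) == 'l');
    assert(resolve_compression_method(InvalidCompressionMethod) ==
           default_toast_compression);
}
```

Because the column stores '\0' rather than a concrete method, changing the setting changes the outcome for later insertions without touching any table definition, which is the behavior the commit message argues for.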
tglsfdc committed May 27, 2021
1 parent a717e5c commit e6241d8
Showing 29 changed files with 257 additions and 380 deletions.
12 changes: 8 additions & 4 deletions doc/src/sgml/catalogs.sgml
@@ -1261,10 +1261,14 @@
<structfield>attcompression</structfield> <type>char</type>
</para>
<para>
-The current compression method of the column. If it is an invalid
-compression method (<literal>'\0'</literal>) then column data will not
-be compressed. Otherwise, <literal>'p'</literal> = pglz compression or
-<literal>'l'</literal> = <productname>LZ4</productname> compression.
+The current compression method of the column. Typically this is
+<literal>'\0'</literal> to specify use of the current default setting
+(see <xref linkend="guc-default-toast-compression"/>). Otherwise,
+<literal>'p'</literal> selects pglz compression, while
+<literal>'l'</literal> selects <productname>LZ4</productname>
+compression. However, this field is ignored
+whenever <structfield>attstorage</structfield> does not allow
+compression.
</para></entry>
</row>

15 changes: 8 additions & 7 deletions doc/src/sgml/config.sgml
@@ -8256,13 +8256,14 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
<para>
This variable sets the default
<link linkend="storage-toast">TOAST</link>
-compression method for columns of newly-created tables. The
-<command>CREATE TABLE</command> statement can override this default
-by specifying the <literal>COMPRESSION</literal> column option.
-
-The supported compression methods are <literal>pglz</literal> and,
-if <productname>PostgreSQL</productname> was compiled with
-<literal>--with-lz4</literal>, <literal>lz4</literal>.
+compression method for values of compressible columns.
+(This can be overridden for individual columns by setting
+the <literal>COMPRESSION</literal> column option in
+<command>CREATE TABLE</command> or
+<command>ALTER TABLE</command>.)
+The supported compression methods are <literal>pglz</literal> and
+(if <productname>PostgreSQL</productname> was compiled with
+<option>--with-lz4</option>) <literal>lz4</literal>.
The default is <literal>pglz</literal>.
</para>
</listitem>
4 changes: 2 additions & 2 deletions doc/src/sgml/func.sgml
@@ -26253,10 +26253,10 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
<primary>pg_column_compression</primary>
</indexterm>
<function>pg_column_compression</function> ( <type>"any"</type> )
-<returnvalue>integer</returnvalue>
+<returnvalue>text</returnvalue>
</para>
<para>
-Shows the compression algorithm that was used to compress a
+Shows the compression algorithm that was used to compress
an individual variable-length value. Returns <literal>NULL</literal>
if the value is not compressed.
</para></entry>
30 changes: 16 additions & 14 deletions doc/src/sgml/ref/alter_table.sgml
@@ -104,7 +104,6 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( <replaceable>sequence_options</replaceable> ) ] |
UNIQUE <replaceable class="parameter">index_parameters</replaceable> |
PRIMARY KEY <replaceable class="parameter">index_parameters</replaceable> |
-COMPRESSION <replaceable class="parameter">compression_method</replaceable> |
REFERENCES <replaceable class="parameter">reftable</replaceable> [ ( <replaceable class="parameter">refcolumn</replaceable> ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
[ ON DELETE <replaceable class="parameter">referential_action</replaceable> ] [ ON UPDATE <replaceable class="parameter">referential_action</replaceable> ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
@@ -391,24 +390,27 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
</term>
<listitem>
<para>
-This sets the compression method to be used for data inserted into a column.
-
+This form sets the compression method for a column, determining how
+values inserted in future will be compressed (if the storage mode
+permits compression at all).
This does not cause the table to be rewritten, so existing data may still
be compressed with other compression methods. If the table is rewritten with
<command>VACUUM FULL</command> or <command>CLUSTER</command>, or restored
-with <application>pg_restore</application>, then all tuples are rewritten
-with the configured compression methods.
-
-Also, note that when data is inserted from another relation (for example,
-by <command>INSERT ... SELECT</command>), tuples from the source data are
-not necessarily detoasted, and any previously compressed data is retained
-with its existing compression method, rather than recompressing with the
-compression methods of the target columns.
-
+with <application>pg_restore</application>, then all values are rewritten
+with the configured compression method.
+However, when data is inserted from another relation (for example,
+by <command>INSERT ... SELECT</command>), values from the source table are
+not necessarily detoasted, so any previously compressed data may retain
+its existing compression method, rather than being recompressed with the
+compression method of the target column.
The supported compression
methods are <literal>pglz</literal> and <literal>lz4</literal>.
-<literal>lz4</literal> is available only if <literal>--with-lz4</literal>
-was used when building <productname>PostgreSQL</productname>.
+(<literal>lz4</literal> is available only if <option>--with-lz4</option>
+was used when building <productname>PostgreSQL</productname>.) In
+addition, <replaceable class="parameter">compression_method</replaceable>
+can be <literal>default</literal>, which selects the default behavior of
+consulting the <xref linkend="guc-default-toast-compression"/> setting
+at the time of data insertion to determine the method to use.
</para>
</listitem>
</varlistentry>
25 changes: 15 additions & 10 deletions doc/src/sgml/ref/create_table.sgml
@@ -22,7 +22,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable class="parameter">table_name</replaceable> ( [
-{ <replaceable class="parameter">column_name</replaceable> <replaceable class="parameter">data_type</replaceable> [ COLLATE <replaceable>collation</replaceable> ] [ COMPRESSION <replaceable>compression_method</replaceable> ] [ <replaceable class="parameter">column_constraint</replaceable> [ ... ] ]
+{ <replaceable class="parameter">column_name</replaceable> <replaceable class="parameter">data_type</replaceable> [ COMPRESSION <replaceable>compression_method</replaceable> ] [ COLLATE <replaceable>collation</replaceable> ] [ <replaceable class="parameter">column_constraint</replaceable> [ ... ] ]
| <replaceable>table_constraint</replaceable>
| LIKE <replaceable>source_table</replaceable> [ <replaceable>like_option</replaceable> ... ] }
[, ... ]
@@ -293,17 +293,22 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
<listitem>
<para>
The <literal>COMPRESSION</literal> clause sets the compression method
-for a column. Compression is supported only for variable-width data
-types, and is used only for columns whose storage type is main or
-extended. (See <xref linkend="sql-altertable"/> for information on
-column storage types.) Setting this property for a partitioned table
+for the column. Compression is supported only for variable-width data
+types, and is used only when the column's storage mode
+is <literal>main</literal> or <literal>extended</literal>.
+(See <xref linkend="sql-altertable"/> for information on
+column storage modes.) Setting this property for a partitioned table
has no direct effect, because such tables have no storage of their own,
-but the configured value is inherited by newly-created partitions.
+but the configured value will be inherited by newly-created partitions.
The supported compression methods are <literal>pglz</literal> and
-<literal>lz4</literal>. <literal>lz4</literal> is available only if
-<literal>--with-lz4</literal> was used when building
-<productname>PostgreSQL</productname>. The default
-is <literal>pglz</literal>.
+<literal>lz4</literal>. (<literal>lz4</literal> is available only if
+<option>--with-lz4</option> was used when building
+<productname>PostgreSQL</productname>.) In addition,
+<replaceable class="parameter">compression_method</replaceable>
+can be <literal>default</literal> to explicitly specify the default
+behavior, which is to consult the
+<xref linkend="guc-default-toast-compression"/> setting at the time of
+data insertion to determine the method to use.
</para>
</listitem>
</varlistentry>
4 changes: 2 additions & 2 deletions doc/src/sgml/ref/pg_dump.sgml
@@ -975,8 +975,8 @@ PostgreSQL documentation
<para>
Do not output commands to set <acronym>TOAST</acronym> compression
methods.
-With this option, all objects will be created using whichever
-compression method is the default during restore.
+With this option, all columns will be restored with the default
+compression setting.
</para>
</listitem>
</varlistentry>
6 changes: 3 additions & 3 deletions doc/src/sgml/ref/pg_dumpall.sgml
@@ -464,12 +464,12 @@ PostgreSQL documentation
<para>
Do not output commands to set <acronym>TOAST</acronym> compression
methods.
-With this option, all objects will be created using whichever
-compression method is the default during restore.
+With this option, all columns will be restored with the default
+compression setting.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term><option>--no-unlogged-table-data</option></term>
<listitem>
17 changes: 10 additions & 7 deletions doc/src/sgml/storage.sgml
@@ -376,6 +376,16 @@ but the varlena header does not tell whether it has occurred &mdash;
the content of the <acronym>TOAST</acronym> pointer tells that, instead.
</para>

+<para>
+The compression technique used for either in-line or out-of-line compressed
+data can be selected for each column by setting
+the <literal>COMPRESSION</literal> column option in <command>CREATE
+TABLE</command> or <command>ALTER TABLE</command>. The default for columns
+with no explicit setting is to consult the
+<xref linkend="guc-default-toast-compression"/> parameter at the time data is
+inserted.
+</para>
+
<para>
As mentioned, there are multiple types of <acronym>TOAST</acronym> pointer datums.
The oldest and most common type is a pointer to out-of-line data stored in
@@ -392,13 +402,6 @@ useful for avoiding copying and redundant processing of large data values.
Further details appear in <xref linkend="storage-toast-inmemory"/>.
</para>

-<para>
-The compression technique used for either in-line or out-of-line compressed
-data can be selected using the <literal>COMPRESSION</literal> option on a per-column
-basis when creating a table. The default for columns with no explicit setting
-is taken from the value of <xref linkend="guc-default-toast-compression" />.
-</para>
-
<sect2 id="storage-toast-ondisk">
<title>Out-of-Line, On-Disk TOAST Storage</title>

5 changes: 2 additions & 3 deletions src/backend/access/brin/brin_tuple.c
@@ -232,11 +232,10 @@ brin_form_tuple(BrinDesc *brdesc, BlockNumber blkno, BrinMemTuple *tuple,
* same compression method. Otherwise we have to use the
* default method.
*/
-if (att->atttypid == atttype->type_id &&
-CompressionMethodIsValid(att->attcompression))
+if (att->atttypid == atttype->type_id)
compression = att->attcompression;
else
-compression = GetDefaultToastCompression();
+compression = InvalidCompressionMethod;

cvalue = toast_compress_datum(value, compression);

13 changes: 2 additions & 11 deletions src/backend/access/common/indextuple.c
@@ -104,18 +104,9 @@ index_form_tuple(TupleDesc tupleDescriptor,
att->attstorage == TYPSTORAGE_MAIN))
{
Datum cvalue;
-char compression = att->attcompression;

-/*
-* If the compression method is not valid, use the default. We
-* don't expect this to happen for regular index columns, which
-* inherit the setting from the corresponding table column, but we
-* do expect it to happen whenever an expression is indexed.
-*/
-if (!CompressionMethodIsValid(compression))
-compression = GetDefaultToastCompression();
-
-cvalue = toast_compress_datum(untoasted_values[i], compression);
+cvalue = toast_compress_datum(untoasted_values[i],
+att->attcompression);

if (DatumGetPointer(cvalue) != NULL)
{
6 changes: 4 additions & 2 deletions src/backend/access/common/toast_internals.c
@@ -53,10 +53,12 @@ toast_compress_datum(Datum value, char cmethod)
Assert(!VARATT_IS_EXTERNAL(DatumGetPointer(value)));
Assert(!VARATT_IS_COMPRESSED(DatumGetPointer(value)));

-Assert(CompressionMethodIsValid(cmethod));
-
valsize = VARSIZE_ANY_EXHDR(DatumGetPointer(value));

+/* If the compression method is not valid, use the current default */
+if (!CompressionMethodIsValid(cmethod))
+cmethod = default_toast_compression;
+
/*
* Call appropriate compression routine for the compression method.
*/
7 changes: 2 additions & 5 deletions src/backend/access/common/tupdesc.c
@@ -642,10 +642,7 @@ TupleDescInitEntry(TupleDesc desc,
att->attbyval = typeForm->typbyval;
att->attalign = typeForm->typalign;
att->attstorage = typeForm->typstorage;
-if (IsStorageCompressible(typeForm->typstorage))
-att->attcompression = GetDefaultToastCompression();
-else
-att->attcompression = InvalidCompressionMethod;
+att->attcompression = InvalidCompressionMethod;
att->attcollation = typeForm->typcollation;

ReleaseSysCache(tuple);
@@ -711,7 +708,7 @@ TupleDescInitBuiltinEntry(TupleDesc desc,
att->attbyval = false;
att->attalign = TYPALIGN_INT;
att->attstorage = TYPSTORAGE_EXTENDED;
-att->attcompression = GetDefaultToastCompression();
+att->attcompression = InvalidCompressionMethod;
att->attcollation = DEFAULT_COLLATION_OID;
break;

12 changes: 9 additions & 3 deletions src/backend/access/heap/heapam_handler.c
@@ -2483,10 +2483,10 @@ reform_and_rewrite_tuple(HeapTuple tuple,
* perform the compression here; we just need to decompress. That
* will trigger recompression later on.
*/
-
struct varlena *new_value;
ToastCompressionId cmid;
char cmethod;
+char targetmethod;

new_value = (struct varlena *) DatumGetPointer(values[i]);
cmid = toast_get_compression_id(new_value);
@@ -2495,7 +2495,7 @@ reform_and_rewrite_tuple(HeapTuple tuple,
if (cmid == TOAST_INVALID_COMPRESSION_ID)
continue;

-/* convert compression id to compression method */
+/* convert existing compression id to compression method */
switch (cmid)
{
case TOAST_PGLZ_COMPRESSION_ID:
@@ -2506,10 +2506,16 @@ reform_and_rewrite_tuple(HeapTuple tuple,
break;
default:
elog(ERROR, "invalid compression method id %d", cmid);
+cmethod = '\0'; /* keep compiler quiet */
}

+/* figure out what the target method is */
+targetmethod = TupleDescAttr(newTupDesc, i)->attcompression;
+if (!CompressionMethodIsValid(targetmethod))
+targetmethod = default_toast_compression;
+
/* if compression method doesn't match then detoast the value */
-if (TupleDescAttr(newTupDesc, i)->attcompression != cmethod)
+if (targetmethod != cmethod)
{
values[i] = PointerGetDatum(detoast_attr(new_value));
values_free[i] = true;
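
The recompression decision in the hunk above can be sketched on its own. This is a simplified standalone illustration under stated assumptions, not the actual reform_and_rewrite_tuple code: the macros and default_toast_compression are local stand-ins, and needs_recompression is a hypothetical helper. It assumes the stored value is already known to be compressed, since the real code skips uncompressed values before reaching this point.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins (assumptions) for the PostgreSQL definitions. */
#define InvalidCompressionMethod      '\0'
#define CompressionMethodIsValid(cm)  ((cm) != InvalidCompressionMethod)

/* Stand-in for the default_toast_compression setting; pglz is the default. */
char default_toast_compression = 'p';

/*
 * Hypothetical helper: decide whether a compressed value must be detoasted
 * so that it gets recompressed during a table rewrite.
 *
 * existing_method:        method the stored value was compressed with
 * column_attcompression:  the target column's attcompression ('\0' = default)
 */
bool
needs_recompression(char existing_method, char column_attcompression)
{
    char targetmethod = column_attcompression;

    /* '\0' is resolved to the current default before comparing */
    if (!CompressionMethodIsValid(targetmethod))
        targetmethod = default_toast_compression;

    return targetmethod != existing_method;
}

/* Small self-check of the comparison rule; call it from any test driver. */
void
needs_recompression_selfcheck(void)
{
    assert(!needs_recompression('p', 'p'));
    assert(needs_recompression('p', 'l'));
}
```

Resolving '\0' first matters: comparing the raw attcompression against the existing method, as the removed line did, would treat every compressed value in a default-compression column as a mismatch.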
7 changes: 2 additions & 5 deletions src/backend/bootstrap/bootstrap.c
@@ -701,6 +701,7 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
attrtypes[attnum]->attbyval = Ap->am_typ.typbyval;
attrtypes[attnum]->attalign = Ap->am_typ.typalign;
attrtypes[attnum]->attstorage = Ap->am_typ.typstorage;
+attrtypes[attnum]->attcompression = InvalidCompressionMethod;
attrtypes[attnum]->attcollation = Ap->am_typ.typcollation;
/* if an array type, assume 1-dimensional attribute */
if (Ap->am_typ.typelem != InvalidOid && Ap->am_typ.typlen < 0)
Expand All @@ -715,6 +716,7 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
attrtypes[attnum]->attbyval = TypInfo[typeoid].byval;
attrtypes[attnum]->attalign = TypInfo[typeoid].align;
attrtypes[attnum]->attstorage = TypInfo[typeoid].storage;
+attrtypes[attnum]->attcompression = InvalidCompressionMethod;
attrtypes[attnum]->attcollation = TypInfo[typeoid].collation;
/* if an array type, assume 1-dimensional attribute */
if (TypInfo[typeoid].elem != InvalidOid &&
@@ -724,11 +726,6 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
attrtypes[attnum]->attndims = 0;
}

-if (IsStorageCompressible(attrtypes[attnum]->attstorage))
-attrtypes[attnum]->attcompression = GetDefaultToastCompression();
-else
-attrtypes[attnum]->attcompression = InvalidCompressionMethod;
-
/*
* If a system catalog column is collation-aware, force it to use C
* collation, so that its behavior is independent of the database's
4 changes: 1 addition & 3 deletions src/backend/catalog/genbki.pl
@@ -899,9 +899,7 @@ sub morph_row_for_pgattr
$row->{attbyval} = $type->{typbyval};
$row->{attalign} = $type->{typalign};
$row->{attstorage} = $type->{typstorage};
-
-$row->{attcompression} =
-$type->{typstorage} ne 'p' && $type->{typstorage} ne 'e' ? 'p' : '\0';
+$row->{attcompression} = '\0';

# set attndims if it's an array type
$row->{attndims} = $type->{typcategory} eq 'A' ? '1' : '0';
2 changes: 0 additions & 2 deletions src/backend/catalog/heap.c
@@ -1719,8 +1719,6 @@ RemoveAttributeById(Oid relid, AttrNumber attnum)
/* Unset this so no one tries to look up the generation expression */
attStruct->attgenerated = '\0';

-attStruct->attcompression = InvalidCompressionMethod;
-
/*
* Change the column name to something that isn't likely to conflict
*/