Commit ee50ba7

Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
This commit provides a set of macros that should be used when inserting
or updating catalog tuples. Some macros declare the state needed to track
updates and form new tuples, some set fields to new values or to NULL,
and some insert or update the resulting tuples.

There are a few reasons to standardize catalog modification code and move
it away from direct Form/GETSTRUCT or values/nulls/replaces-style access:

  1. Those models are error prone and require attention to a variety of
     details that can easily be overlooked.
  2. At present, although we know which fields are modified, we don't
     preserve that information and pass it on to the heap_update() code.
  3. At some point we'll need in-memory representations completely
     decoupled from the disk to support upgradeable catalogs.

Prior to this patch there were two methods for accomplishing this task:
Form/GETSTRUCT and values/nulls/replaces. The new method provides a more
intuitive and less error-prone approach without changing the fundamentals
of the process, so it should remain backward compatible.

It is now possible to retain knowledge of the set of mutated attributes
when working with catalog tuples. A follow-on patch will use this to avoid
the overhead of HeapDetermineColumnsInfo() in heap_update(), where (while
holding a lock) we re-discover the set of modified attributes by comparing
old/new HeapTuple Datums in order to identify indexed attributes that have
new values and should prevent HOT updates.

The "Form/GETSTRUCT" model allows for direct access to the tuple data,
which is then modified, copied, and updated via CatalogTupleUpdate().

Old:
    Form_pg_index form = (Form_pg_index) GETSTRUCT(tuple);
    form->indisclustered = false;
    CatalogTupleUpdate(relation, &tuple->t_self, tuple);

New:
    CatalogUpdateFieldContext(pg_index, ctx);
    CatalogSetForm(pg_index, ctx, tuple);
    CatalogTupleUpdateField(ctx, pg_index, indisclustered, false);
    ModifyCatalogTupleField(relation, tuple, ctx);

The "values/nulls/replaces" model collects the information needed to
either update or create a heap tuple using heap_modify_tuple(),
heap_form_tuple(), or sometimes heap_modify_tuple_by_cols(). All of those
functions remain unchanged and can still be used, but there is now a new
model.

Old:
    bool nulls[Natts_pg_type];
    bool replaces[Natts_pg_type];
    Datum values[Natts_pg_type];

    values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType);
    nulls[Anum_pg_type_typdefaultbin - 1] = true;
    replaces[Anum_pg_type_oid - 1] = false;

    tup = heap_modify_tuple(tuple, desc, values, nulls, replaces);
    CatalogTupleUpdate(relation, &tuple->t_self, tuple);

New:
    CatalogUpdateValuesContext(pg_type, ctx);
    CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType));
    ModifyCatalogTupleValues(relation, tuple, ctx);

The heap_update_tuple() function is functionally equivalent to
heap_modify_tuple(), but takes a Bitmapset called "updated" rather than an
array of bool (generally called "replaces") to indicate what was modified.
Additionally, the new function tries to balance the tradeoffs of calling
heap_getattr() versus heap_deform_tuple() based on the ratio of attributes
updated and their known runtime complexities. Both paths produce the same
result.

The changes also initialize the values/nulls arrays at declaration rather
than with loops or memset(). There is no impact on non-catalog paths.
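The idea of recording which attributes were touched while the new values are being set can be illustrated outside PostgreSQL. The following is a minimal standalone sketch, not the patch's macros: `DemoUpdateCtx`, `demo_set_value()`, `demo_set_null()`, `demo_ctx_is_updated()` and the three-column `demo` catalog are hypothetical names, and a plain 64-bit mask stands in for a Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 1-based attribute numbers for a three-column demo catalog. */
enum
{
	Anum_demo_oid = 1,
	Anum_demo_name = 2,
	Anum_demo_kind = 3,
	Natts_demo = 3
};

/*
 * Stand-in for an update context: new values, null flags, and the set of
 * modified attributes (a plain bitmask here, not a PostgreSQL Bitmapset).
 */
typedef struct DemoUpdateCtx
{
	intptr_t	values[Natts_demo];
	bool		nulls[Natts_demo];
	uint64_t	updated;		/* bit (attnum - 1) set => attribute modified */
} DemoUpdateCtx;

/* Record a new value and remember that the attribute changed. */
static inline void
demo_set_value(DemoUpdateCtx *ctx, int attnum, intptr_t value)
{
	assert(attnum >= 1 && attnum <= Natts_demo);
	ctx->values[attnum - 1] = value;
	ctx->nulls[attnum - 1] = false;
	ctx->updated |= UINT64_C(1) << (attnum - 1);
}

/* Set an attribute to NULL and remember that the attribute changed. */
static inline void
demo_set_null(DemoUpdateCtx *ctx, int attnum)
{
	assert(attnum >= 1 && attnum <= Natts_demo);
	ctx->values[attnum - 1] = 0;
	ctx->nulls[attnum - 1] = true;
	ctx->updated |= UINT64_C(1) << (attnum - 1);
}

/* Report whether an attribute was modified through this context. */
static inline bool
demo_ctx_is_updated(const DemoUpdateCtx *ctx, int attnum)
{
	assert(attnum >= 1 && attnum <= Natts_demo);
	return (ctx->updated >> (attnum - 1)) & 1;
}
```

Because every setter also flips a bit in `updated`, the caller never has to maintain a separate replaces-style array, and the set of changed columns is available to whatever consumes the context afterwards.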
1 parent b9b780a commit ee50ba7

10 files changed: 704 additions, 395 deletions
src/backend/access/common/heaptuple.c

Lines changed: 119 additions & 0 deletions
@@ -1325,6 +1325,125 @@ heap_modify_tuple_by_cols(HeapTuple tuple,
 	return newTuple;
 }
 
+/*
+ * heap_update_tuple
+ *		Form a new tuple from an old tuple and a set of replacement values.
+ *
+ * Creates a new HeapTuple by selectively replacing attributes from the original
+ * tuple with new values. The 'updated' Bitmapset specifies which attributes
+ * (by attribute number, 1-based adjusted by FirstLowInvalidHeapAttributeNumber)
+ * should be replaced with corresponding entries from new_values and new_isnull
+ * arrays (0-based indices).
+ *
+ * Performance strategy:
+ * - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
+ *   only the few non-updated attributes. This is O(k*n) where k is the number
+ *   of non-updated attributes, which is small when updating many.
+ * - If updating few attributes (<= 2*natts/3), use heap_deform_tuple() to
+ *   extract all attributes at once (O(n)), then replace the updated ones.
+ *   This avoids the O(n^2) cost of many heap_getattr() calls.
+ *
+ * The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
+ * against the variable O(k*n) cost of heap_getattr, where k = natts - num_updated.
+ */
+HeapTuple
+heap_update_tuple(HeapTuple tuple,
+				  TupleDesc desc,
+				  const Datum *new_values,
+				  const bool *new_nulls,
+				  const Bitmapset *updated)
+{
+	int			natts = desc->natts;
+	int			num_updated;
+	Datum	   *values;
+	bool	   *nulls;
+	HeapTuple	new_tuple;
+
+	Assert(!bms_is_empty(updated));
+
+	num_updated = bms_num_members(updated);
+	Assert(num_updated <= natts);
+
+	values = (Datum *) palloc0(natts * sizeof(Datum));
+	nulls = (bool *) palloc0(natts * sizeof(bool));
+
+	/*
+	 * Choose strategy based on update density. When updating most attributes,
+	 * it's cheaper to extract the few unchanged ones individually.
+	 */
+	if (num_updated > (2 * natts) / 3)
+	{
+		/* Updating many: use heap_getattr for the few non-updated attributes */
+		for (int i = 0; i < natts; i++)
+		{
+			/*
+			 * Convert array index to attribute number, then to bitmapset
+			 * member. Array index i (0-based) -> attnum (1-based) -> bms
+			 * member.
+			 */
+			AttrNumber	attnum = i + 1;
+			int			member = attnum - FirstLowInvalidHeapAttributeNumber;
+
+			if (bms_is_member(member, updated))
+			{
+				/* Use replacement value */
+				if (unlikely(!new_values || !new_nulls))
+					values[i] = heap_getattr(tuple, attnum, desc, &nulls[i]);
+
+				if (likely(new_values))
+					values[i] = new_values[i];
+
+				if (likely(new_nulls))
+					nulls[i] = new_nulls[i];
+			}
+			else
+			{
+				/* Extract original value using heap_getattr (1-based) */
+				values[i] = heap_getattr(tuple, attnum, desc, &nulls[i]);
+			}
+		}
+	}
+	else
+	{
+		int			member = -1;
+
+		/* Updating few: deform entire tuple, then replace updated attributes */
+		heap_deform_tuple(tuple, desc, values, nulls);
+
+		while ((member = bms_next_member(updated, member)) >= 0)
+		{
+			/*
+			 * Convert bitmapset member to attribute number, then to array
+			 * index. bms_member -> attnum (1-based) -> array index i
+			 * (0-based).
+			 */
+			AttrNumber	attnum = member + FirstLowInvalidHeapAttributeNumber;
+			int			i = attnum - 1;
+
+			Assert(i >= 0 && i < natts);
+
+			if (likely(new_values))
+				values[i] = new_values[i];
+
+			if (likely(new_nulls))
+				nulls[i] = new_nulls[i];
+		}
+	}
+
+	/* Create the new tuple */
+	new_tuple = heap_form_tuple(desc, values, nulls);
+
+	pfree(values);
+	pfree(nulls);
+
+	/* Preserve tuple identity and location information from the original */
+	new_tuple->t_data->t_ctid = tuple->t_data->t_ctid;
+	new_tuple->t_self = tuple->t_self;
+	new_tuple->t_tableOid = tuple->t_tableOid;
+
+	return new_tuple;
+}
+
 /*
  * heap_deform_tuple
  *		Given a tuple, extract data into values/isnull arrays; this is
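The index-to-member conversion and the 2*natts/3 threshold used by heap_update_tuple() can be tried in isolation. The sketch below is a simplified model under stated assumptions, not PostgreSQL code: rows are plain `int` arrays, a 64-bit mask stands in for the Bitmapset, all `demo_*` names are hypothetical, and `DEMO_FIRST_LOW_INVALID_ATTNUM` is assumed to be -7 (the real constant lives in access/sysattr.h and should be checked there).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed value; PostgreSQL defines the real constant in access/sysattr.h. */
#define DEMO_FIRST_LOW_INVALID_ATTNUM (-7)

/* Array index (0-based) -> attribute number (1-based) -> bitmap member. */
static int
demo_index_to_member(int i)
{
	int			attnum = i + 1;

	return attnum - DEMO_FIRST_LOW_INVALID_ATTNUM;
}

/* True if column i is flagged in the 'updated' mask. */
static bool
demo_is_updated(uint64_t updated, int i)
{
	return (updated >> demo_index_to_member(i)) & 1;
}

/* Mirrors the patch's test: num_updated > (2 * natts) / 3, integer division. */
static bool
demo_use_per_attribute_path(int num_updated, int natts)
{
	return num_updated > (2 * natts) / 3;
}

/*
 * Build 'out' from old_values, taking new_values[i] for flagged columns.
 * natts must be small enough that every member fits in the 64-bit mask.
 */
static void
demo_merge(const int *old_values, const int *new_values,
		   uint64_t updated, int natts, int *out)
{
	int			num_updated = 0;

	assert(natts > 0 && demo_index_to_member(natts - 1) < 64);

	for (int i = 0; i < natts; i++)
		num_updated += demo_is_updated(updated, i) ? 1 : 0;

	if (demo_use_per_attribute_path(num_updated, natts))
	{
		/* Many updated: visit each column, fetching unchanged ones singly. */
		for (int i = 0; i < natts; i++)
			out[i] = demo_is_updated(updated, i) ? new_values[i] : old_values[i];
	}
	else
	{
		/* Few updated: copy the whole row once, then overwrite. */
		memcpy(out, old_values, (size_t) natts * sizeof(int));
		for (int i = 0; i < natts; i++)
			if (demo_is_updated(updated, i))
				out[i] = new_values[i];
	}
}
```

With six columns the threshold is (2 * 6) / 3 = 4, so five updated columns take the per-attribute path and four or fewer take the copy-then-overwrite path. Both branches produce the same output row; only the access pattern differs.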

src/backend/catalog/indexing.c

Lines changed: 71 additions & 66 deletions
@@ -225,58 +225,67 @@ CatalogTupleCheckConstraints(Relation heapRel, HeapTuple tup)
  *
  * This is a convenience routine for the common case of inserting a single
  * tuple in a system catalog; it inserts a new heap tuple, keeping indexes
- * current. Avoid using it for multiple tuples, since opening the indexes
- * and building the index info structures is moderately expensive.
- * (Use CatalogTupleInsertWithInfo in such cases.)
+ * current.
+ *
+ * If 'indstate' is NULL, the function opens and closes the indexes internally.
+ * This is convenient for single-tuple updates but has overhead from opening the
+ * indexes and building index info structures.
+ *
+ * If 'indstate' is provided (non-NULL), it is used directly without opening or
+ * closing indexes. This allows callers to amortize index management costs across
+ * multiple tuple updates. Callers must use CatalogOpenIndexes() before the first
+ * update and CatalogCloseIndexes() after the last update.
+ *
+ * XXX: At some point we might cache the CatalogIndexState data somewhere (perhaps
+ * in the relcache) so that callers needn't trouble over this.
  */
 void
-CatalogTupleInsert(Relation heapRel, HeapTuple tup)
+CatalogTupleInsert(Relation heapRel, HeapTuple tup,
+				   CatalogIndexState indstate)
 {
-	CatalogIndexState indstate;
-
-	CatalogTupleCheckConstraints(heapRel, tup);
-
-	indstate = CatalogOpenIndexes(heapRel);
+	bool		close_indexes = false;
 
-	simple_heap_insert(heapRel, tup);
-
-	CatalogIndexInsert(indstate, tup, TU_All);
-	CatalogCloseIndexes(indstate);
-}
+	/* Open indexes if not provided by caller */
+	if (indstate == NULL)
+	{
+		indstate = CatalogOpenIndexes(heapRel);
+		close_indexes = true;
+	}
 
-/*
- * CatalogTupleInsertWithInfo - as above, but with caller-supplied index info
- *
- * This should be used when it's important to amortize CatalogOpenIndexes/
- * CatalogCloseIndexes work across multiple insertions. At some point we
- * might cache the CatalogIndexState data somewhere (perhaps in the relcache)
- * so that callers needn't trouble over this ... but we don't do so today.
- */
-void
-CatalogTupleInsertWithInfo(Relation heapRel, HeapTuple tup,
-						   CatalogIndexState indstate)
-{
 	CatalogTupleCheckConstraints(heapRel, tup);
 
 	simple_heap_insert(heapRel, tup);
 
 	CatalogIndexInsert(indstate, tup, TU_All);
+
+	/* Close indexes only if we opened them ourselves */
+	if (close_indexes)
+		CatalogCloseIndexes(indstate);
 }
 
 /*
- * CatalogTuplesMultiInsertWithInfo - as above, but for multiple tuples
+ * CatalogTuplesMultiInsert - as above, but for multiple tuples
  *
  * Insert multiple tuples into the given catalog relation at once, with an
  * amortized cost of CatalogOpenIndexes.
  */
 void
-CatalogTuplesMultiInsertWithInfo(Relation heapRel, TupleTableSlot **slot,
-								 int ntuples, CatalogIndexState indstate)
+CatalogTuplesMultiInsert(Relation heapRel, TupleTableSlot **slot,
+						 int ntuples, CatalogIndexState indstate)
 {
+	bool		close_indexes = false;
+
 	/* Nothing to do */
 	if (ntuples <= 0)
 		return;
 
+	/* Open indexes if not provided by caller */
+	if (indstate == NULL)
+	{
+		indstate = CatalogOpenIndexes(heapRel);
+		close_indexes = true;
+	}
+
 	heap_multi_insert(heapRel, slot, ntuples,
 					  GetCurrentCommandId(true), 0, NULL);
 
@@ -296,54 +305,55 @@ CatalogTuplesMultiInsertWithInfo(Relation heapRel, TupleTableSlot **slot,
 		if (should_free)
 			heap_freetuple(tuple);
 	}
+
+	/* Close indexes only if we opened them ourselves */
+	if (close_indexes)
+		CatalogCloseIndexes(indstate);
 }
 
 /*
  * CatalogTupleUpdate - do heap and indexing work for updating a catalog tuple
  *
  * Update the tuple identified by "otid", replacing it with the data in "tup".
  *
- * This is a convenience routine for the common case of updating a single
- * tuple in a system catalog; it updates one heap tuple, keeping indexes
- * current. Avoid using it for multiple tuples, since opening the indexes
- * and building the index info structures is moderately expensive.
- * (Use CatalogTupleUpdateWithInfo in such cases.)
+ * This function updates a heap tuple in a system catalog and keeps its indexes
+ * current. The 'updated' bitmapset specifies which columns were modified.
+ *
+ * If 'indstate' is NULL, the function opens and closes the indexes internally.
+ * This is convenient for single-tuple updates but has overhead from opening the
+ * indexes and building index info structures.
+ *
+ * If 'indstate' is provided (non-NULL), it is used directly without opening or
+ * closing indexes. This allows callers to amortize index management costs across
+ * multiple tuple updates. Callers must use CatalogOpenIndexes() before the first
+ * update and CatalogCloseIndexes() after the last update.
+ *
+ * XXX: At some point we might cache the CatalogIndexState data somewhere (perhaps
+ * in the relcache) so that callers needn't trouble over this.
  */
 void
-CatalogTupleUpdate(Relation heapRel, const ItemPointerData *otid, HeapTuple tup)
+CatalogTupleUpdate(Relation heapRel, const ItemPointerData *otid, HeapTuple tuple,
+				   const Bitmapset *updated, CatalogIndexState indstate)
 {
-	CatalogIndexState indstate;
 	TU_UpdateIndexes updateIndexes = TU_All;
+	bool		close_indexes = false;
 
-	CatalogTupleCheckConstraints(heapRel, tup);
-
-	indstate = CatalogOpenIndexes(heapRel);
-
-	simple_heap_update(heapRel, otid, tup, &updateIndexes);
-
-	CatalogIndexInsert(indstate, tup, updateIndexes);
-	CatalogCloseIndexes(indstate);
-}
+	CatalogTupleCheckConstraints(heapRel, tuple);
 
-/*
- * CatalogTupleUpdateWithInfo - as above, but with caller-supplied index info
- *
- * This should be used when it's important to amortize CatalogOpenIndexes/
- * CatalogCloseIndexes work across multiple updates. At some point we
- * might cache the CatalogIndexState data somewhere (perhaps in the relcache)
- * so that callers needn't trouble over this ... but we don't do so today.
- */
-void
-CatalogTupleUpdateWithInfo(Relation heapRel, const ItemPointerData *otid, HeapTuple tup,
-						   CatalogIndexState indstate)
-{
-	TU_UpdateIndexes updateIndexes = TU_All;
+	/* Open indexes if not provided by caller */
+	if (indstate == NULL)
+	{
+		indstate = CatalogOpenIndexes(heapRel);
+		close_indexes = true;
+	}
 
-	CatalogTupleCheckConstraints(heapRel, tup);
+	simple_heap_update(heapRel, otid, tuple, &updateIndexes);
 
-	simple_heap_update(heapRel, otid, tup, &updateIndexes);
+	CatalogIndexInsert(indstate, tuple, updateIndexes);
 
-	CatalogIndexInsert(indstate, tup, updateIndexes);
+	/* Close indexes only if we opened them ourselves */
+	if (close_indexes)
+		CatalogCloseIndexes(indstate);
 }
 
 /*
@@ -355,11 +365,6 @@ CatalogTupleUpdateWithInfo(Relation heapRel, const ItemPointerData *otid, HeapTu
  * cleanup will be done later by VACUUM. However, callers of this function
  * shouldn't have to know that; we'd like a uniform abstraction for all
  * catalog tuple changes. Hence, provide this currently-trivial wrapper.
- *
- * The abstraction is a bit leaky in that we don't provide an optimized
- * CatalogTupleDeleteWithInfo version, because there is currently nothing to
- * optimize. If we ever need that, rather than touching a lot of call sites,
- * it might be better to do something about caching CatalogIndexState.
  */
 void
 CatalogTupleDelete(Relation heapRel, const ItemPointerData *tid)
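The indexing.c changes fold each "WithInfo" variant into its base function by treating a NULL 'indstate' as "open and close the indexes for me". A minimal standalone sketch of that ownership pattern follows; `DemoIndexState`, `demo_open_indexes()`, `demo_close_indexes()`, `demo_catalog_insert()` and the counters are hypothetical stand-ins, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CatalogIndexState. */
typedef struct DemoIndexState
{
	bool		is_open;
} DemoIndexState;

static DemoIndexState demo_state;
static int	demo_open_calls = 0;
static int	demo_close_calls = 0;
static int	demo_index_inserts = 0;

/* Pretend to open the catalog's indexes (the expensive step). */
static DemoIndexState *
demo_open_indexes(void)
{
	demo_open_calls++;
	demo_state.is_open = true;
	return &demo_state;
}

/* Pretend to close the catalog's indexes. */
static void
demo_close_indexes(DemoIndexState *indstate)
{
	demo_close_calls++;
	indstate->is_open = false;
}

/*
 * Insert one tuple. If the caller passes NULL, open and close the indexes
 * here; otherwise use the caller's state and leave it open.
 */
static void
demo_catalog_insert(DemoIndexState *indstate)
{
	bool		close_indexes = false;

	if (indstate == NULL)
	{
		indstate = demo_open_indexes();
		close_indexes = true;
	}

	assert(indstate->is_open);
	demo_index_inserts++;

	/* Close only what this call opened. */
	if (close_indexes)
		demo_close_indexes(indstate);
}
```

Calling `demo_catalog_insert(NULL)` twice costs two open/close pairs, while opening once, passing the state to several calls, and closing once afterwards costs a single pair regardless of how many tuples are inserted.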
