4 changes: 2 additions & 2 deletions .travis.yml
@@ -23,6 +23,8 @@ notifications:
on_failure: always

env:
- PG_VERSION=18
- PG_VERSION=18 LEVEL=hardcore
- PG_VERSION=17
- PG_VERSION=17 LEVEL=hardcore
- PG_VERSION=16
@@ -32,6 +34,4 @@ env:
- PG_VERSION=14
- PG_VERSION=14 LEVEL=hardcore
- PG_VERSION=13
- PG_VERSION=13 LEVEL=hardcore
- PG_VERSION=12
- PG_VERSION=12 LEVEL=hardcore
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
RUM is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses.

Portions Copyright (c) 2015-2024, Postgres Professional
Portions Copyright (c) 2015-2025, Postgres Professional
Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
Portions Copyright (c) 1994, The Regents of the University of California

36 changes: 20 additions & 16 deletions README.md
@@ -8,29 +8,33 @@

## Introduction

The **rum** module provides an access method to work with a `RUM` index. It is based
on the `GIN` access method's code.
The **rum** module provides access method to work with the `RUM` indexes. It is based
on the `GIN` access method code.

A `GIN` index allows performing fast full-text search using `tsvector` and
`tsquery` types. But full-text search with a GIN index has several problems:
`GIN` index allows you to perform fast full-text search using `tsvector` and
`tsquery` types. However, full-text search with `GIN` index has some performance
issues because positional and other additional information is not stored.

- Slow ranking. It needs positional information about lexemes to do ranking. A `GIN`
index doesn't store positions of lexemes. So after index scanning, we need an
additional heap scan to retrieve lexeme positions.
- Slow phrase search with a `GIN` index. This problem relates to the previous
problem. It needs positional information to perform phrase search.
- Slow ordering by timestamp. A `GIN` index can't store some related information
in the index with lexemes. So it is necessary to perform an additional heap scan.
`RUM` solves these issues by storing additional information in a posting tree.
As compared to `GIN`, `RUM` index has the following benefits:

`RUM` solves these problems by storing additional information in a posting tree.
For example, positional information of lexemes or timestamps. You can get an
idea of `RUM` with the following diagram:
- Faster ranking. Ranking requires positional information. And after the
index scan we do not need an additional heap scan to retrieve lexeme positions
because `RUM` index stores them.
- Faster phrase search. This improvement is related to the previous one as
phrase search also needs positional information.
- Faster ordering by timestamp. `RUM` index stores additional information together
with lexemes, so it is not necessary to perform a heap scan.
- A possibility to perform depth-first search and therefore return first
results immediately.

You can get an idea of `RUM` with the following diagram:

[![How RUM stores additional information](img/gin_rum.svg)](https://postgrespro.ru/docs/enterprise/current/rum?lang=en)

A drawback of `RUM` is that it has slower build and insert times than `GIN`.
The drawback of `RUM` is that it has slower build and insert time as compared to `GIN`
This is because we need to store additional information besides keys and because
`RUM` uses generic Write-Ahead Log (WAL) records.
because `RUM` stores additional information together with keys and uses generic WAL records.

## License

2 changes: 1 addition & 1 deletion TODO
@@ -1,6 +1,6 @@
1. with naturalOrder=true make scan the rest to be consistent with seqscan [done]
2. add leftlink to data page to privide backward scan on index (<=| op) [done]
3. Compression of ItemPointer for use_alternative_order
3. ItemPointer compression for indexes with order_by_attach
4. Compression addInfo
5. Remove FROM_STRATEGY ugly magick [done]

2 changes: 1 addition & 1 deletion src/disable_core_macro.h
@@ -3,7 +3,7 @@
* disable_core_macro.h
* Support including tuplesort.c from postgresql core code.
*
* Copyright (c) 2022-2024, Postgres Professional
* Copyright (c) 2022-2025, Postgres Professional
*
*-------------------------------------------------------------------------
*/
32 changes: 16 additions & 16 deletions src/rum.h
@@ -3,7 +3,7 @@
* rum.h
* Exported definitions for RUM index.
*
* Portions Copyright (c) 2015-2024, Postgres Professional
* Portions Copyright (c) 2015-2025, Postgres Professional
* Portions Copyright (c) 2006-2022, PostgreSQL Global Development Group
*
*-------------------------------------------------------------------------
@@ -46,7 +46,7 @@ typedef struct RumPageOpaqueData
BlockNumber rightlink; /* next page if any */
OffsetNumber maxoff; /* number entries on RUM_DATA page: number of
* heap ItemPointers on RUM_DATA|RUM_LEAF page
* or number of PostingItems on RUM_DATA &
* or number of RumPostingItems on RUM_DATA &
* ~RUM_LEAF page. On RUM_LIST page, number of
* heap tuples. */
OffsetNumber freespace;
@@ -150,19 +150,19 @@ typedef struct RumMetaPageData
* (which is InvalidBlockNumber/0) as well as from all normal item
* pointers (which have item numbers in the range 1..MaxHeapTuplesPerPage).
*/
#define ItemPointerSetMin(p) \
#define RumItemPointerSetMin(p) \
ItemPointerSet((p), (BlockNumber)0, (OffsetNumber)0)
#define ItemPointerIsMin(p) \
#define RumItemPointerIsMin(p) \
(RumItemPointerGetOffsetNumber(p) == (OffsetNumber)0 && \
RumItemPointerGetBlockNumber(p) == (BlockNumber)0)
#define ItemPointerSetMax(p) \
#define RumItemPointerSetMax(p) \
ItemPointerSet((p), InvalidBlockNumber, (OffsetNumber)0xfffe)
#define ItemPointerIsMax(p) \
#define RumItemPointerIsMax(p) \
(RumItemPointerGetOffsetNumber(p) == (OffsetNumber)0xfffe && \
RumItemPointerGetBlockNumber(p) == InvalidBlockNumber)
#define ItemPointerSetLossyPage(p, b) \
ItemPointerSet((p), (b), (OffsetNumber)0xffff)
#define ItemPointerIsLossyPage(p) \
#define RumItemPointerIsLossyPage(p) \
(RumItemPointerGetOffsetNumber(p) == (OffsetNumber)0xffff && \
RumItemPointerGetBlockNumber(p) != InvalidBlockNumber)
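
The renamed macros above encode three sentinel item pointers: a minimum, a maximum, and a "lossy page" marker. A minimal standalone sketch of that encoding follows, assuming a flattened pointer layout; the `Demo*` types and `demo_*` functions are hypothetical stand-ins for illustration and are not part of RUM or PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for BlockNumber, OffsetNumber and ItemPointerData.
 * The real ItemPointerData stores the block number as two 16-bit halves;
 * this sketch flattens it into a single field.
 */
typedef uint32_t DemoBlockNumber;
typedef uint16_t DemoOffsetNumber;

#define DEMO_INVALID_BLOCK ((DemoBlockNumber) 0xFFFFFFFF)

typedef struct
{
	DemoBlockNumber blkno;
	DemoOffsetNumber offset;
} DemoItemPointer;

/* Minimum sentinel: block 0, offset 0 (normal offsets start at 1). */
static inline void
demo_set_min(DemoItemPointer *p)
{
	p->blkno = 0;
	p->offset = 0;
}

static inline bool
demo_is_min(const DemoItemPointer *p)
{
	return p->offset == 0 && p->blkno == 0;
}

/* Maximum sentinel: invalid block number, offset 0xfffe. */
static inline void
demo_set_max(DemoItemPointer *p)
{
	p->blkno = DEMO_INVALID_BLOCK;
	p->offset = 0xfffe;
}

static inline bool
demo_is_max(const DemoItemPointer *p)
{
	return p->offset == 0xfffe && p->blkno == DEMO_INVALID_BLOCK;
}

/* Lossy-page marker: a valid block number with offset 0xffff. */
static inline void
demo_set_lossy_page(DemoItemPointer *p, DemoBlockNumber b)
{
	p->blkno = b;
	p->offset = 0xffff;
}

static inline bool
demo_is_lossy_page(const DemoItemPointer *p)
{
	return p->offset == 0xffff && p->blkno != DEMO_INVALID_BLOCK;
}

/* Order by block number first, then by offset within the block. */
static inline int
demo_compare(const DemoItemPointer *a, const DemoItemPointer *b)
{
	if (a->blkno != b->blkno)
		return a->blkno < b->blkno ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Usage example: the sentinels bracket an ordinary pointer. */
static inline void
demo_self_check(void)
{
	DemoItemPointer lo, hi, normal = {7, 3};

	demo_set_min(&lo);
	demo_set_max(&hi);
	assert(demo_is_min(&lo) && demo_is_max(&hi));
	assert(demo_compare(&lo, &normal) < 0);
	assert(demo_compare(&normal, &hi) < 0);
}
```

Under this ordering, the minimum sorts before every normal pointer and the maximum after every one, which is presumably why the offsets 0, 0xfffe and 0xffff are reserved.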

@@ -175,7 +175,7 @@ typedef struct RumItem

#define RumItemSetMin(item) \
do { \
ItemPointerSetMin(&((item)->iptr)); \
RumItemPointerSetMin(&((item)->iptr)); \
(item)->addInfoIsNull = true; \
(item)->addInfo = (Datum) 0; \
} while (0)
@@ -188,12 +188,12 @@ typedef struct
/* We use BlockIdData not BlockNumber to avoid padding space wastage */
BlockIdData child_blkno;
RumItem item;
} PostingItem;
} RumPostingItem;

#define PostingItemGetBlockNumber(pointer) \
#define RumPostingItemGetBlockNumber(pointer) \
BlockIdGetBlockNumber(&(pointer)->child_blkno)

#define PostingItemSetBlockNumber(pointer, blockNumber) \
#define RumPostingItemSetBlockNumber(pointer, blockNumber) \
BlockIdSet(&((pointer)->child_blkno), (blockNumber))

/*
@@ -265,8 +265,8 @@ typedef signed char RumNullCategory;
* Data (posting tree) pages
*/
/*
* FIXME -- Currently RumItem is placed as a pages right bound and PostingItem
* is placed as a non-leaf pages item. Both RumItem and PostingItem stores
* FIXME -- Currently RumItem is placed as a pages right bound and RumPostingItem
* is placed as a non-leaf pages item. Both RumItem and RumPostingItem stores
* AddInfo as a raw Datum, which is bogus. It is fine for pass-by-value
* attributes, but it isn't for pass-by-reference, which may have variable
* length of data. This AddInfo is used only by order_by_attach indexes, so it
@@ -278,12 +278,12 @@ typedef signed char RumNullCategory;
#define RumDataPageGetData(page) \
(PageGetContents(page) + MAXALIGN(sizeof(RumItem)))
#define RumDataPageGetItem(page,i) \
(RumDataPageGetData(page) + ((i)-1) * sizeof(PostingItem))
(RumDataPageGetData(page) + ((i)-1) * sizeof(RumPostingItem))

#define RumDataPageGetFreeSpace(page) \
(BLCKSZ - MAXALIGN(SizeOfPageHeaderData) \
- MAXALIGN(sizeof(RumItem)) /* right bound */ \
- RumPageGetOpaque(page)->maxoff * sizeof(PostingItem) \
- RumPageGetOpaque(page)->maxoff * sizeof(RumPostingItem) \
- MAXALIGN(sizeof(RumPageOpaqueData)))
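
The two macros above do plain offset arithmetic over a non-leaf data page: a right-bound `RumItem` comes first, followed by an array of `RumPostingItem` entries. A rough sketch of the same arithmetic is below. The struct sizes are passed in as parameters because the real ones depend on the build, and `LayoutSizes`, `layout_item_offset`, `layout_free_space`, the 8192-byte block size and the 8-byte alignment are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LAYOUT_BLCKSZ ((size_t) 8192)	/* assumed default block size */
/* Round up to the next multiple of 8 (assumed maximum alignment). */
#define LAYOUT_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)

/* Hypothetical container for the sizes the real macros take from sizeof(). */
typedef struct
{
	size_t page_header;		/* stands in for SizeOfPageHeaderData */
	size_t right_bound;		/* stands in for sizeof(RumItem) */
	size_t posting_item;	/* stands in for sizeof(RumPostingItem) */
	size_t opaque;			/* stands in for sizeof(RumPageOpaqueData) */
} LayoutSizes;

/*
 * Byte offset of the i-th posting item (1-based) from the start of the
 * page contents, mirroring RumDataPageGetItem.
 */
static inline size_t
layout_item_offset(const LayoutSizes *s, size_t i)
{
	return LAYOUT_MAXALIGN(s->right_bound) + (i - 1) * s->posting_item;
}

/* Free bytes left with maxoff posting items, mirroring RumDataPageGetFreeSpace. */
static inline size_t
layout_free_space(const LayoutSizes *s, size_t maxoff)
{
	return LAYOUT_BLCKSZ
		- LAYOUT_MAXALIGN(s->page_header)
		- LAYOUT_MAXALIGN(s->right_bound)
		- maxoff * s->posting_item
		- LAYOUT_MAXALIGN(s->opaque);
}

/* Usage example with made-up sizes; real values come from the compiler. */
static inline void
layout_self_check(void)
{
	LayoutSizes s = {24, 24, 32, 12};

	assert(layout_item_offset(&s, 1) == 24);
	assert(layout_free_space(&s, 0) == 8128);
}
```

Because the item size enters both formulas only through `sizeof`, renaming `PostingItem` to `RumPostingItem` leaves the on-page layout unchanged as long as the struct definition itself is untouched.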

#define RumMaxLeafDataItems \
@@ -513,7 +513,7 @@ typedef struct RumBtreeData
uint32 nitem;
uint32 curitem;

PostingItem pitem;
RumPostingItem pitem;
} RumBtreeData;

extern RumBtreeStack *rumPrepareFindLeafPage(RumBtree btree, BlockNumber blkno);
2 changes: 1 addition & 1 deletion src/rum_arr_utils.c
@@ -3,7 +3,7 @@
* rum_arr_utils.c
* various anyarray-search functions
*
* Portions Copyright (c) 2015-2024, Postgres Professional
* Portions Copyright (c) 2015-2025, Postgres Professional
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
*
*-------------------------------------------------------------------------
2 changes: 1 addition & 1 deletion src/rum_ts_utils.c
@@ -3,7 +3,7 @@
* rum_ts_utils.c
* various text-search functions
*
* Portions Copyright (c) 2015-2024, Postgres Professional
* Portions Copyright (c) 2015-2025, Postgres Professional
* Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
*
*-------------------------------------------------------------------------
4 changes: 2 additions & 2 deletions src/rumbtree.c
@@ -4,7 +4,7 @@
* page utilities routines for the postgres inverted index access method.
*
*
* Portions Copyright (c) 2015-2024, Postgres Professional
* Portions Copyright (c) 2015-2025, Postgres Professional
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
@@ -102,7 +102,7 @@ rumReFindLeafPage(RumBtree btree, RumBtreeStack * stack)
* item pointer is less than item pointer previous to rightmost.
*/
if (compareRumItem(btree->rumstate, btree->entryAttnum,
&(((PostingItem *) RumDataPageGetItem(page, maxoff - 1))->item),
&(((RumPostingItem *) RumDataPageGetItem(page, maxoff - 1))->item),
&btree->items[btree->curitem]) >= 0)
{
break;
2 changes: 1 addition & 1 deletion src/rumbulk.c
@@ -4,7 +4,7 @@
* routines for fast build of inverted index
*
*
* Portions Copyright (c) 2015-2024, Postgres Professional
* Portions Copyright (c) 2015-2025, Postgres Professional
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*