
postgres-vacuum-pg_repack-cloudsql pg_squeeze


pg_repack

pg_repack works by creating a new copy of the table being processed and setting up triggers that capture changes made to the original table while the existing data is copied over. An exclusive full-table lock is required both at the start and at the end of the process, when the old and new tables are swapped.

pg_repack is very effective in reclaiming space and can work with all types of bloat. It's available out of the box both on Amazon RDS and Google Cloud SQL.

The drawback you might experience when terminating the process (which may be necessary for various reasons, such as its impact on the running environment) is that it won't clean up all of its leftover objects, because it does not stay connected to the target database.

psql -U postgres -d testdb   # connect as a cloudsqlsuperuser (typically postgres)
GRANT testuser TO csuper1;   -- let csuper1 act as the owner (testuser) of the target table
CREATE EXTENSION pg_repack;
\q
$> pg_repack -h <hostname> -d testdb -U csuper1 -k -t t1   # -k skips the superuser check
REVOKE testuser FROM csuper1;   -- drop the temporary role membership when done
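
If a run was interrupted and leftover objects remain (the drawback mentioned above), the approach documented by pg_repack for interrupted runs is to drop and recreate the extension in the affected database. A minimal sketch, using the testdb example from above:

psql -U postgres -d testdb
DROP EXTENSION pg_repack CASCADE;   -- removes the temporary objects pg_repack left behind
CREATE EXTENSION pg_repack;         -- reinstall so the extension is usable again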

pg_squeeze

pg_squeeze is built differently than pg_repack, relying on logical decoding instead of triggers. Its main benefit is a lower impact on the host system while the table is being rebuilt, which improves availability and stability.

Like pg_repack, pg_squeeze creates a new table and copies the existing data from the bloated table into it. Logical replication streams changes from the original table to the newly created one in real time, so the new table stays up to date during the process without unnecessary impact on regular operations against the bloated table. Thus, pg_squeeze significantly reduces the need for locking: an exclusive lock is needed only during the final phase of the operation, when the old table is swapped out for the new, optimized one. The duration of that exclusive lock can also be configured, as sketched below.
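
As a sketch of that last point, pg_squeeze exposes the squeeze.max_xlock_time setting, which caps how long the exclusive lock may be held during the swap (the value below is illustrative, in milliseconds):

SET squeeze.max_xlock_time TO 100;   -- do not hold the exclusive lock for more than 0.1 s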

While pg_squeeze is also available as an extension, its deployment necessitates configuration changes involving wal_level, max_replication_slots, and shared_preload_libraries. Due to this, a restart of the cluster is required.
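
A minimal postgresql.conf sketch of those prerequisites (the values are illustrative; on managed platforms such as Cloud SQL they are set through database flags rather than the file itself):

wal_level = logical                       # required for logical decoding
max_replication_slots = 10                # pg_squeeze needs a free slot per table being squeezed
shared_preload_libraries = 'pg_squeeze'   # loads the background worker; changing this needs a restart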

On the other hand, pg_squeeze is designed for regular rather than purely ad-hoc processing. You can register a table for regular processing, and whenever the table meets the criteria to be "squeezed," a task is added to a queue and processed sequentially, in the order the tasks were created.
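
For illustration, registration is done by inserting a row into squeeze.tables; the table name below is hypothetical, and the schedule type lists minutes, hours, days of month, months and days of week:

CREATE EXTENSION pg_squeeze;
INSERT INTO squeeze.tables (tabschema, tabname, schedule)
VALUES ('public', 't1', ('{30}', '{22}', NULL, NULL, '{3, 5}'));
-- i.e. consider t1 for squeezing at 22:30 on Wednesdays and Fridays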

While pg_squeeze might be considered superior to pg_repack in terms of maintenance operations and impact, it comes with a significant caveat when reclaiming space: unlike pg_repack, it copies the full rows as they are. This behaviour renders it ineffective at removing bloat created by dropped columns.

As already mentioned, pg_squeeze can use a specified index for clustering when needed. The limitation you might run into is that clustering cannot be performed on a partial index. Compared to pg_repack, pg_squeeze always seems to clean up all of its artefacts properly (thanks to the always-running background worker).
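
For example, a registered table can be told which index to cluster by via the clustering_index column of squeeze.tables (the index name below is hypothetical, and it must not be a partial index):

UPDATE squeeze.tables
SET clustering_index = 't1_created_at_idx'
WHERE tabschema = 'public' AND tabname = 't1';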

Unfortunately, at the time of writing, pg_squeeze is not available on Amazon RDS, only on Google Cloud SQL.

