Long transaction in PostgreSQL when marking large number of splits for deletion #5923

@earlbread

Description

When marking a large number of splits for deletion, the PostgreSQL metastore experiences long-running transactions that cause database lock contention and performance degradation.

The mark_splits_for_deletion operation processes all splits in a single transaction without batching, which can lock hundreds of thousands of rows simultaneously.

    // retention_policy_execution.rs
    let mark_splits_for_deletion_request =
        MarkSplitsForDeletionRequest::new(index_uid, expired_split_ids);
    ctx.protect_future(metastore.mark_splits_for_deletion(mark_splits_for_deletion_request))
        .await?;
Example from production logs (the retention policy executor attempts to mark 245,742 splits in a single request, and the client request times out about 30 seconds later):

    2025-09-30 09:00:02.393 | 2025-09-30T00:00:02.393Z  INFO quickwit_janitor::retention_policy_execution: Marking 245742 splits for deletion based on retention policy. index_id=log.common.application_log_v1_quickwit split_ids=["01K3W82WGQVVPQBX67JPA51715", "01K3YMY9VYP55QZPX01VHNXJBE", "01K3W6G2FZCEPD6QBM6KHEHE2X", "01K3WDBKTNC3K09KPNV9SVCM4M", "01K3W82VYEN7Q96WFEK6RE5P3Z", and 245737 more]
    2025-09-30 09:00:34.807 | 2025-09-30T00:00:34.807Z ERROR quickwit_janitor::actors::retention_policy_executor: Failed to execute the retention policy on the index. index_id=log.common.application_log_v1_quickwit error=request timed out: client

This problem occurs periodically in the morning hours when retention policies are evaluated and large numbers of splits need to be marked for deletion.

To avoid long-running transactions and database lock contention, these operations should be processed in smaller batches.
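A minimal sketch of what batching could look like on the caller side: instead of passing all expired split IDs in one request, split them into fixed-size chunks and issue one request per chunk. The BATCH_SIZE value and the split_into_batches helper are assumptions for illustration, not Quickwit's actual implementation; the real fix could equally live inside the metastore.

```rust
/// Hypothetical batch size; the right value would need tuning against
/// PostgreSQL transaction duration and lock contention.
const BATCH_SIZE: usize = 1000;

/// Split a list of expired split IDs into fixed-size batches, so each
/// mark-for-deletion request touches a bounded number of rows.
fn split_into_batches(split_ids: Vec<String>, batch_size: usize) -> Vec<Vec<String>> {
    split_ids
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    // Simulate a large retention run, e.g. 2,500 expired splits.
    let expired_split_ids: Vec<String> =
        (0..2500).map(|i| format!("split-{i:05}")).collect();

    let batches = split_into_batches(expired_split_ids, BATCH_SIZE);
    // 2,500 IDs with a batch size of 1,000 yield 3 batches: 1000, 1000, 500.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[2].len(), 500);

    for batch in &batches {
        // In the real executor this would build one
        // MarkSplitsForDeletionRequest per batch and await it, keeping
        // each metastore transaction short.
        println!("would mark {} splits for deletion", batch.len());
    }
}
```

Each batch then commits its own short transaction, so a failure or timeout only affects one chunk and locks are released promptly between chunks.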
