Merged
36 commits
7bd513a
Add files_cleanup.max_committed_ledger_chunks config option
achamayou Mar 27, 2026
1c3a0de
Use incremental SHA-256 hashing for ledger chunk digest comparison
achamayou Mar 27, 2026
83deb39
Address review: add missing includes, raise log level
achamayou Mar 27, 2026
ba1ea79
Merge branch 'main' into cleanup_ledger_files
achamayou Mar 30, 2026
3c8a6d3
Fix clang-tidy warnings in files_cleanup_timer.h
achamayou Mar 30, 2026
80d714d
fix
achamayou Mar 30, 2026
a570315
fix
achamayou Mar 30, 2026
9664dca
Merge branch 'main' into cleanup_ledger_files
achamayou Mar 30, 2026
5eba1af
Update src/host/files_cleanup_timer.h
achamayou Mar 30, 2026
598373a
Merge branch 'main' into cleanup_ledger_files
achamayou Mar 30, 2026
d1aa8fd
Review improvements: extract helpers for testability, add unit tests
achamayou Mar 30, 2026
8500556
Apply clang-format to new files
achamayou Mar 30, 2026
a6e9281
Remove accidentally committed binary
achamayou Mar 30, 2026
fa1b745
Add atomic guard to prevent overlapping cleanup tasks
achamayou Mar 31, 2026
2edc08f
Document periodic file cleanup cycle and concurrency guard
achamayou Mar 31, 2026
59ec2f4
Fix e2e cleanup tests: ignore expected FAIL log patterns
achamayou Mar 31, 2026
453948b
Merge branch 'main' into cleanup_ledger_files
achamayou Mar 31, 2026
99240ab
Apply suggestion from @achamayou
achamayou Mar 31, 2026
95e45ac
Remove information-free comments from unit tests
achamayou Mar 31, 2026
8a0a4de
Add snapshot watermark to protect recent ledger chunks
achamayou Mar 31, 2026
56f361f
Tighten suffix rules
achamayou Mar 31, 2026
bd954cb
fmt
achamayou Mar 31, 2026
2c17f13
Address PR review comments: 5 fixes
achamayou Mar 31, 2026
d0ea0e9
.
achamayou Mar 31, 2026
9d431b5
Fix error_code handling: distinguish I/O errors from missing files
achamayou Mar 31, 2026
67d10f8
Apply suggestion from @achamayou
achamayou Mar 31, 2026
4d03ada
Add e2e tests for ledger chunk cleanup edge cases
achamayou Mar 31, 2026
d359a83
Fix e2e tests: check stderr for startup failure, handle shutdown errors
achamayou Mar 31, 2026
86e8348
Simplify startup failure test: direct start with bad config
achamayou Mar 31, 2026
2049c4d
Merge branch 'main' into cleanup_ledger_files
achamayou Mar 31, 2026
349805c
Refactor digest check to tri-state enum for clearer concurrent deleti…
achamayou Mar 31, 2026
96ee7c3
Fix clang-tidy performance-enum-size warning on DigestCheckResult
achamayou Mar 31, 2026
28fb3ba
Merge branch 'main' into cleanup_ledger_files
achamayou Apr 1, 2026
83da928
Fix snapshot watermark test to re-query latest snapshot after chunk g…
achamayou Apr 1, 2026
17c30db
Increase schema_test timeout to 900s
achamayou Apr 1, 2026
112e0b2
Merge branch 'main' into cleanup_ledger_files
eddyashton Apr 1, 2026
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added

- Added `files_cleanup.max_committed_ledger_chunks` configuration option to limit the number of committed ledger chunk files retained in the main ledger directory. When the number of committed chunks exceeds this value, the oldest chunks (by sequence number) are automatically deleted, but only after verifying that an identical copy (by SHA-256 digest) exists in at least one `ledger.read_only_directories` entry. Committed ledger chunks that contain entries at or beyond the sequence number of the newest committed snapshot are never deleted, ensuring a complete ledger history from that snapshot for disaster recovery. At least one read-only ledger directory must be configured; the node will refuse to start otherwise.
- Added `files_cleanup.max_snapshots` configuration option to limit the number of committed snapshot files retained on disk. When the number of committed snapshots exceeds this value, the oldest snapshots (by sequence number) are automatically deleted. The value must be at least 1 if set.
- Added `files_cleanup.interval` configuration option (default `"30s"`) to periodically scan the snapshot directory and delete old committed snapshots exceeding `max_snapshots`. This ensures backup nodes (which receive snapshots via `backup_fetch`) also prune old snapshots. Only effective when `max_snapshots` is set.
- Added `POST /node/snapshot:create`, gated by the `SnapshotCreate` RPC interface operator feature, to create a snapshot via an operator endpoint rather than a governance action.
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -643,6 +643,11 @@ if(BUILD_TESTS)
${CMAKE_CURRENT_SOURCE_DIR}/src/host/test/ledger.cpp
)

add_unit_test(
files_cleanup_test
${CMAKE_CURRENT_SOURCE_DIR}/src/host/test/files_cleanup_test.cpp
)

add_unit_test(
raft_test
${CMAKE_CURRENT_SOURCE_DIR}/src/consensus/aft/test/main.cpp
@@ -1231,6 +1236,7 @@ if(BUILD_TESTS)
--historical-testdata
${CMAKE_SOURCE_DIR}/tests/testdata
)
set_tests_properties(schema_test PROPERTIES TIMEOUT 900)

add_e2e_test(
NAME snp_platform_tests
7 changes: 6 additions & 1 deletion doc/host_config_schema/cchost_config.json
@@ -555,10 +555,15 @@
"description": "Maximum number of committed snapshot files to retain. When the number of committed snapshots exceeds this value, the oldest snapshots are deleted. Must be at least 1 if set. If null or unset, no automated snapshot garbage collection is performed.",
"minimum": 1
},
"max_committed_ledger_chunks": {
"type": ["integer", "null"],
"default": null,
"description": "Maximum number of committed ledger chunk files to retain in the main ledger directory. When the number of committed chunks exceeds this value, the oldest chunks are deleted, but only after verifying that an identical copy (by SHA-256 digest) exists in at least one read-only ledger directory. Chunks whose entries extend to or beyond the sequence number of the newest committed snapshot are never deleted, ensuring a complete ledger history from that snapshot for disaster recovery. Requires at least one ledger.read_only_directories entry; the node will refuse to start otherwise. If null or unset, no automated ledger chunk garbage collection is performed."
},
"interval": {
"type": "string",
"default": "30s",
"description": "Time interval at which to scan the snapshot directory and delete old committed snapshots in excess of max_snapshots. This periodic cleanup executes regardless of the node's status (primary or backup)."
"description": "Time interval at which to scan and delete old committed files (snapshots and ledger chunks) that exceed the configured retention limits. This periodic cleanup executes regardless of the node's status (primary or backup)."
}
},
"description": "This section includes configuration for periodic cleanup of old files (snapshots, ledger chunks)",
17 changes: 16 additions & 1 deletion doc/operations/ledger_snapshot.rst
@@ -27,6 +27,8 @@ Ledger files that still contain some uncommitted entries are named ``ledger_<sta

.. warning:: Removing `uncommitted` ledger files from the ``ledger.directory`` directory may cause a node to crash. It is however safe to move `committed` ledger files to another directory, accessible to a CCF node via the ``ledger.read_only_directories`` configuration entry.

.. note:: The ``files_cleanup.max_committed_ledger_chunks`` configuration entry can be used to limit the number of committed ledger chunk files retained in the main ledger directory. When the number of committed chunks exceeds this value, the oldest chunks (by sequence number) are automatically deleted, but only after verifying that an identical copy (by SHA-256 digest) exists in at least one ``ledger.read_only_directories`` entry. At least one read-only ledger directory must be configured when this option is set; the node will refuse to start otherwise. Committed ledger chunks that contain entries at or beyond the sequence number of the newest committed snapshot are never deleted, regardless of the retention limit. This guarantees that a complete ledger history exists from the newest snapshot onwards, which is required for disaster recovery. Ledger chunk cleanup runs as part of the same periodic cleanup cycle as snapshot cleanup (see :ref:`operations/ledger_snapshot:Periodic File Cleanup`).

It is important to note that while all entries stored in ledger files ending in ``.committed`` are committed, not all committed entries are stored in such a file at any given time. A number of them are typically in the in-progress files, waiting to be flushed to a ``.committed`` file once the size threshold (``ledger.chunk_size``) is met.

The listing below is an example of what a ledger directory may look like:
@@ -178,7 +180,7 @@ Committed snapshot files are named ``snapshot_<seqno>_<evidence_seqno>.committed

Uncommitted snapshot files, i.e. those whose evidence has not yet been committed, are named ``snapshot_<seqno>_<evidence_seqno>``. These files are ignored by CCF when joining or recovering a service, as no evidence can attest to their validity.
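
The naming rule above can be sketched as a small parser. This is an illustration only: `SnapshotNameSketch` and `parse_snapshot_name` are hypothetical names, and the strictness of the real suffix rules in this PR is not shown in the diff, so the rejection of anything after ``.committed`` is an assumption.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical result type; not part of CCF.
struct SnapshotNameSketch
{
  std::uint64_t seqno;
  std::uint64_t evidence_seqno;
  bool committed;
};

// Parses a decimal seqno, rejecting empty input and trailing characters.
static std::optional<std::uint64_t> parse_seqno(std::string_view text)
{
  std::uint64_t value = 0;
  const char* first = text.data();
  const char* last = text.data() + text.size();
  const auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc() || ptr != last)
  {
    return std::nullopt;
  }
  return value;
}

// Accepts snapshot_<seqno>_<evidence_seqno> with an optional .committed
// suffix; anything else yields std::nullopt (assumed strictness).
std::optional<SnapshotNameSketch> parse_snapshot_name(std::string_view name)
{
  constexpr std::string_view prefix = "snapshot_";
  constexpr std::string_view suffix = ".committed";

  if (name.substr(0, prefix.size()) != prefix)
  {
    return std::nullopt;
  }
  name.remove_prefix(prefix.size());

  bool committed = false;
  if (
    name.size() >= suffix.size() &&
    name.substr(name.size() - suffix.size()) == suffix)
  {
    committed = true;
    name.remove_suffix(suffix.size());
  }

  const auto sep = name.find('_');
  if (sep == std::string_view::npos)
  {
    return std::nullopt;
  }

  const auto seqno = parse_seqno(name.substr(0, sep));
  const auto evidence = parse_seqno(name.substr(sep + 1));
  if (!seqno.has_value() || !evidence.has_value())
  {
    return std::nullopt;
  }
  return SnapshotNameSketch{*seqno, *evidence, committed};
}
```

A cleanup pass would only consider entries for which `committed` is true, since uncommitted snapshots are ignored.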

.. note:: The ``files_cleanup.max_snapshots`` configuration entry can be used to limit the number of committed snapshot files retained on disk. When the number of committed snapshots exceeds this value, the oldest snapshots (by sequence number) are automatically deleted. This is useful to control the local persistent storage footprint of a node. The value must be at least 1 if set.
.. note:: The ``files_cleanup.max_snapshots`` configuration entry can be used to limit the number of committed snapshot files retained on disk. When the number of committed snapshots exceeds this value, the oldest snapshots (by sequence number) are automatically deleted. This is useful to control the local persistent storage footprint of a node. The value must be at least 1 if set. Snapshot cleanup runs as part of the same periodic cleanup cycle as ledger chunk cleanup (see :ref:`operations/ledger_snapshot:Periodic File Cleanup`).

Join or Recover From Snapshot
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -301,3 +303,16 @@ Invariants
3. Snapshots are always generated for the ``seqno`` of a signature transaction (but not all signature transactions trigger the generation of a snapshot).

4. When a snapshot is generated, it must coincide with the end of a ledger file. Since a node can join using solely a snapshot, the first ledger file on that node will start just after the ``seqno`` of the snapshot. By 2., all nodes must have the same ledger files, so the generation of that snapshot on the primary must trigger the creation of a new ledger file starting at the next ``seqno`` to ensure the primary's ledger files are consistent with the joining node's files.

Periodic File Cleanup
---------------------

Both snapshot and committed ledger chunk retention are managed by a single periodic cleanup cycle, controlled by the ``files_cleanup`` configuration section. The cleanup interval is set by ``files_cleanup.interval`` (default: ``30s``). On each cycle, the node checks whether committed snapshots or committed ledger chunks exceed their configured retention limits (``files_cleanup.max_snapshots`` and ``files_cleanup.max_committed_ledger_chunks`` respectively) and deletes the oldest files that qualify for removal.

Snapshots qualify for removal if their number is in excess of the limit, starting from the ones with the lowest sequence numbers.
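
As a rough illustration of the snapshot rule, the selection can be written as a pure function over sequence numbers. `snapshots_to_delete` is a hypothetical helper, not the function added by this PR, and it assumes the seqnos have already been extracted from the committed snapshot file names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the seqnos of all committed snapshots, returns the ones in excess of
// max_snapshots, lowest seqno first. The newest max_snapshots are kept.
std::vector<std::uint64_t> snapshots_to_delete(
  std::vector<std::uint64_t> committed_seqnos, std::size_t max_snapshots)
{
  std::sort(committed_seqnos.begin(), committed_seqnos.end());
  if (committed_seqnos.size() <= max_snapshots)
  {
    return {};
  }
  const std::size_t excess = committed_seqnos.size() - max_snapshots;
  committed_seqnos.resize(excess); // keep only the oldest `excess` entries
  return committed_seqnos;
}
```

With four committed snapshots at seqnos 10, 20, 30 and 40 and a limit of 2, this sketch selects 10 and 20 for deletion.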

Ledger chunks qualify for removal if their number is in excess of the limit, and if two other conditions apply. First, there must be at least one identical file in a read-only ledger directory (the SHA-256 digests of the two files are compared). Second, as a safety measure, ledger chunks whose entries extend to or beyond the sequence number of the newest committed snapshot never qualify. This ensures that a complete ledger history is always available from the newest snapshot onwards, which is required for disaster recovery.

If no committed snapshots exist, no ledger chunks are protected by this rule, but the existing backup-verification requirement still applies.
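
The three conditions for ledger chunks can be sketched in the same style. Everything here is hypothetical: `ChunkSketch` and `chunks_to_delete` are illustrative names, the digest comparison is abstracted behind a caller-supplied `has_identical_backup` predicate, and the choice to consider only the oldest excess chunks (skipping, rather than stopping at, a chunk that fails a check) is an assumption that the diff does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical description of a committed chunk; not a CCF type.
struct ChunkSketch
{
  std::uint64_t start_seqno;
  std::uint64_t end_seqno;
};

std::vector<ChunkSketch> chunks_to_delete(
  std::vector<ChunkSketch> committed,
  std::size_t max_chunks,
  std::optional<std::uint64_t> newest_snapshot_seqno,
  const std::function<bool(const ChunkSketch&)>& has_identical_backup)
{
  std::sort(
    committed.begin(),
    committed.end(),
    [](const ChunkSketch& a, const ChunkSketch& b) {
      return a.start_seqno < b.start_seqno;
    });

  std::vector<ChunkSketch> to_delete;
  if (committed.size() <= max_chunks)
  {
    return to_delete;
  }

  // Assumption: only the oldest `excess` chunks are candidates.
  const std::size_t excess = committed.size() - max_chunks;
  for (std::size_t i = 0; i < excess; ++i)
  {
    const ChunkSketch& chunk = committed[i];

    // Snapshot watermark: chunks reaching the newest snapshot are protected.
    if (
      newest_snapshot_seqno.has_value() &&
      chunk.end_seqno >= *newest_snapshot_seqno)
    {
      continue;
    }

    // Backup verification: stands in for the SHA-256 digest comparison.
    if (!has_identical_backup(chunk))
    {
      continue;
    }

    to_delete.push_back(chunk);
  }
  return to_delete;
}
```

Passing `std::nullopt` as the snapshot seqno models the case where no committed snapshot exists: the watermark protects nothing, but the backup check still applies.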

Only one cleanup cycle can run at a time. If a cleanup task is still in progress when the next timer fires, the new cycle is skipped and a failure-level log message is emitted. This prevents overlapping cleanup operations, which could be wasteful, cause contention on the filesystem and produce spurious failures in the log. Under normal conditions each cleanup cycle completes well within the configured interval, so skipped cycles indicate that the interval may be too short or the node has an unusually large number of files to process.
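
One way to get this skip-if-busy behaviour is an atomic flag claimed with a compare-and-exchange. The sketch below is not the code in `src/host/files_cleanup_timer.h`, which this diff does not show; `CleanupGuardSketch` and `try_run` are hypothetical names, and logging is reduced to a comment.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical guard: lets at most one cleanup task run at a time.
class CleanupGuardSketch
{
  std::atomic<bool> in_progress{false};

public:
  // Runs `task` and returns true, unless a previous task is still running,
  // in which case the task is skipped and false is returned.
  bool try_run(const std::function<void()>& task)
  {
    bool expected = false;
    if (!in_progress.compare_exchange_strong(expected, true))
    {
      // The node described above would emit a failure-level log here.
      return false;
    }

    // Clear the flag on every exit path, including exceptions.
    struct Reset
    {
      std::atomic<bool>& flag;
      ~Reset()
      {
        flag.store(false);
      }
    } reset{in_progress};

    task();
    return true;
  }
};
```

A timer callback would call `try_run` on each tick; a `false` return corresponds to the skipped cycle described above.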
1 change: 1 addition & 0 deletions include/ccf/node/startup_config.h
@@ -120,6 +120,7 @@ namespace ccf
struct FilesCleanup
{
std::optional<size_t> max_snapshots = std::nullopt;
std::optional<size_t> max_committed_ledger_chunks = std::nullopt;
ccf::ds::TimeString interval = {"30s"};

bool operator==(const FilesCleanup&) const = default;
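
The startup rules documented for these fields (``max_snapshots`` must be at least 1 if set, and ``max_committed_ledger_chunks`` requires at least one ``ledger.read_only_directories`` entry) could be checked along the following lines. This is a sketch under assumptions: `FilesCleanupSketch` mirrors only the two optional fields, `validate_files_cleanup` is a hypothetical helper, and the actual validation code and error messages are not part of this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of the two retention fields; not CCF's real type.
struct FilesCleanupSketch
{
  std::optional<std::size_t> max_snapshots = std::nullopt;
  std::optional<std::size_t> max_committed_ledger_chunks = std::nullopt;
};

// Returns an error message if a documented rule is violated, or std::nullopt
// if the configuration is acceptable.
std::optional<std::string> validate_files_cleanup(
  const FilesCleanupSketch& cfg,
  const std::vector<std::string>& read_only_ledger_dirs)
{
  if (cfg.max_snapshots.has_value() && *cfg.max_snapshots < 1)
  {
    return std::string("files_cleanup.max_snapshots must be at least 1");
  }
  if (
    cfg.max_committed_ledger_chunks.has_value() &&
    read_only_ledger_dirs.empty())
  {
    return std::string(
      "files_cleanup.max_committed_ledger_chunks requires at least one "
      "ledger.read_only_directories entry");
  }
  return std::nullopt;
}
```

Leaving both fields unset passes validation, which matches the documented default of no automated garbage collection.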
5 changes: 4 additions & 1 deletion src/common/configuration.h
@@ -117,7 +117,10 @@ namespace ccf
DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(CCFConfig::FilesCleanup);
DECLARE_JSON_REQUIRED_FIELDS(CCFConfig::FilesCleanup);
DECLARE_JSON_OPTIONAL_FIELDS(
CCFConfig::FilesCleanup, max_snapshots, interval);
CCFConfig::FilesCleanup,
max_snapshots,
max_committed_ledger_chunks,
interval);

DECLARE_JSON_TYPE_WITH_OPTIONAL_FIELDS(CCFConfig);
DECLARE_JSON_REQUIRED_FIELDS(CCFConfig, network);