work on database internals section

renetapopova · renetapopova · commit 4983600af616 · 2023-06-08T11:26:32.000+01:00
diff --git a/modules/ROOT/pages/database-internals/checkpointing.adoc b/modules/ROOT/pages/database-internals/checkpointing.adoc
@@ -1,21 +1,24 @@
 [[checkpointing-log-pruning]]
 = Checkpointing and log pruning
 
-Checkpointing refers to the procedure of transferring all pending updates of pages from the page cache to the storage files.
-This action is crucial to limit the number of transactions that need to be replayed during the recovery process, particularly in order to minimize the time required for recovery after an improper shutdown.
+Checkpointing is the process of flushing all pending updates of pages from the page cache to the storage files.
+This action is crucial to limit the number of transactions that need to be replayed during the recovery process, particularly to minimize the time required for recovery after an improper shutdown.
 
 Despite the presence of checkpoints, database operations remain secure, as any transactions that have not been confirmed to have their modifications persisted to storage will be replayed upon the next database startup.
-However, this assurance is contingent upon the availability of the collection of changes comprising these transactions, which is maintained in the transaction logs.
+However, this assurance is contingent upon the availability of the collection of changes comprising these transactions, which is maintained in the xref:database-internals/transaction-logs.adoc[transaction logs].
 
 Maintaining a long list of unapplied transactions (due to infrequent checkpoints) leads to the accumulation of transaction logs, as they are essential for recovery purposes.
-Checkpointing involves the inclusion of a special "Checkpointing" entry in the transaction log, marking the last transaction at which checkpointing occurred.
+Checkpointing involves the inclusion of a special _Checkpointing_ entry in the transaction log, marking the last transaction at which checkpointing occurred.
 This entry serves the purpose of identifying transaction logs that are no longer necessary, as all the transactions they contain have been securely stored in the storage files.
 
-The process of eliminating transaction logs that are no longer required for recovery is known as pruning. From the aforementioned explanation, it becomes evident that pruning is reliant on checkpointing.
-In other words, checkpointing determines which logs can be pruned and determines the occurrence of pruning, as the absence of a checkpoint implies that the set of transaction log files available for pruning cannot have changed.
+The process of eliminating transaction logs that are no longer required for recovery is known as _pruning_.
+Pruning depends on checkpointing.
+Checkpointing determines both which logs can be pruned and when pruning occurs, because without a new checkpoint the set of transaction log files available for pruning cannot have changed.
 Consequently, pruning is triggered whenever checkpointing takes place, with or without a specific verification of their existence.
 
-== Triggering of checkpointing (and pruning) events
+== Configure the checkpointing and pruning events
+
+Depending on your needs, you can configure when checkpointing is triggered.
+Checkpointing is done periodically and limits the work needed to recover the database in case of a crash.
+The checkpoint settings control the frequency of checkpoints and the amount of data that is written to disk in each checkpoint.
 
 The checkpointing policy, which is the driving event for pruning, is configured by xref:configuration/configuration-settings.adoc#config_db.checkpoint[`db.checkpoint`] and can be triggered in a few different ways:
 
@@ -27,21 +30,22 @@ Note that no checkpointing is being performed implying no pruning happens.
 This is the default behavior and the only one available in Community Edition.
 
 * `CONTINUOUS` label:enterprise[Enterprise Edition]
-This policy constantly checks if a checkpoint is possible (i.e if any transactions committed since the last successful checkpoint) and if so, it performs it.
-* Pruning is triggered immediately after it completes, just like in the periodic policy.
+This policy constantly checks for transactions committed after the last successful checkpoint and, when there are any, performs a checkpoint.
+Log pruning is triggered immediately after the checkpoint completes, just like in the periodic policy.
+
+* `VOLUME` label:enterprise[Enterprise Edition]
 
 * `VOLUMETRIC` label:enterprise[Enterprise Edition]
-This checkpointing policy checks every 10 seconds if any logs are available for pruning and, if so, it triggers a checkpoint and subsequently, it prunes the logs.
+This policy checks every 10 seconds whether there is a large enough volume of logs available for pruning and, if so, triggers a checkpoint and subsequently prunes the logs.
+By default, the volume is set to 250MiB, but it can be configured using the setting xref:configuration/configuration-settings.adoc#config_db.checkpoint.tx_log.volume_threshold[`db.checkpoint.tx_log.volume_threshold`].
 This policy appears to invert the control between checkpointing and pruning, but in reality, it only changes the criteria for when checkpointing must happen.
-Instead of relying on a time trigger, as in the previous two, it relies on a pruning check.
-Pruning will still happen after checkpointing has occurred, as with the other two policies.
-Nevertheless, since the check depends on the existence of prunable transaction log files, this policy depends on pruning configuration.
+The pruning is still triggered by the checkpointing event.
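+
+As an illustration only, a `neo4j.conf` fragment that selects the volumetric policy might look like the following sketch.
+The setting names are the ones referenced above, and the `500MiB` threshold is an arbitrary example value rather than a recommendation, so verify both against the configuration reference before use.
+
+[source, properties]
+----
+# Enterprise Edition only: trigger a checkpoint once enough log volume is prunable.
+db.checkpoint=VOLUMETRIC
+# Example value (assumption); the default stated above is 250MiB.
+db.checkpoint.tx_log.volume_threshold=500MiB
+----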
 
 [[transaction-logging-log-pruning]]
 == Configure log pruning
 
 Transaction log pruning refers to the safe and automatic removal of old, unnecessary transaction log files.
-The transaction log can be pruned when o=ne or more files fall outside of the configured retention policy.
+The transaction log can be pruned when one or more files fall outside of the configured retention policy.
 
 Two things are necessary for a file to be removed:
 
@@ -70,7 +74,7 @@ The interval between checkpoints can be configured using:
 
 == Controlling transaction log pruning
 
-Transaction log pruning configuration primarily deals with specifing the number of transaction logs that should remain available. The primary reason for leaving more than the absolute minimum amount required for recovery comes from requirements of clustered deployments and online backup. Since database updates are communicated between cluster members and backup clients through the transaction logs, keeping more than the minimum amount necessary allows for transferring just the incremental changes (in the form of transactions) instead of the whole store files, which can lead to substantial savings in time and network bandwidth. This is true for HA deployments, backups and Read Replicas in Causal Clusters. However, in the case of Core members in Causal Clustering it is not the transaction logs that matter, but rather the Raft log contents. That scenario is covered in a separate KB article.
+Transaction log pruning configuration primarily deals with specifying the number of transaction logs that should remain available. The primary reason for leaving more than the absolute minimum amount required for recovery comes from the requirements of clustered deployments and online backup. Since database updates are communicated between cluster members and backup clients through the transaction logs, keeping more than the minimum amount necessary allows for transferring just the incremental changes (in the form of transactions) instead of the whole store files, which can lead to substantial savings in time and network bandwidth. This is true for HA deployments, backups and Read Replicas in Causal Clusters. However, in the case of Core members in Causal Clustering it is not the transaction logs that matter, but rather the Raft log contents. That scenario is covered in a separate KB article.
 
 The amount of transaction logs left after a pruning operation is controlled by the setting `dbms.tx_log.rotation.retention_policy` and it can take a variety of values. They are of the form `<numerical value> <measurement>`.