diff --git a/modules/ROOT/content-nav.adoc b/modules/ROOT/content-nav.adoc index af6a41404..5f08328a0 100644 --- a/modules/ROOT/content-nav.adoc +++ b/modules/ROOT/content-nav.adoc @@ -76,7 +76,6 @@ ** xref:configuration/set-initial-password.adoc[] ** xref:configuration/password-and-user-recovery.adoc[] ** xref:configuration/dynamic-settings.adoc[] -** xref:configuration/transaction-logs.adoc[] ** xref:configuration/configuration-settings.adoc[] *** xref:configuration/configuration-settings.adoc#_checkpoint_settings[Checkpoint settings] *** xref:configuration/configuration-settings.adoc#_cluster_settings[Cluster settings] @@ -108,6 +107,12 @@ *** xref:composite-databases/sharding-with-copy.adoc[] *** xref:composite-databases/queries.adoc[] +* xref:database-internals/index.adoc[] +** xref:database-internals/transaction-management.adoc[] +** xref:database-internals/locks-deadlocks.adoc[] +** xref:database-internals/transaction-logs.adoc[] +** xref:database-internals/checkpointing.adoc[] + * xref:clustering/index.adoc[] ** xref:clustering/introduction.adoc[] ** Set up a cluster @@ -161,7 +166,6 @@ ** xref:performance/gc-tuning.adoc[] ** xref:performance/bolt-thread-pool-configuration.adoc[] ** xref:performance/linux-file-system-tuning.adoc[] -** xref:performance/locks-deadlocks.adoc[] ** xref:performance/disks-ram-and-other-tips.adoc[] ** xref:performance/statistics-execution-plans.adoc[] ** xref:performance/space-reuse.adoc[] @@ -173,8 +177,6 @@ *** xref:monitoring/metrics/enable.adoc[] *** xref:monitoring/metrics/expose.adoc[] *** xref:monitoring/metrics/reference.adoc[] -** xref:monitoring/query-management.adoc[] -** xref:monitoring/transaction-management.adoc[] ** xref:monitoring/connection-management.adoc[] ** xref:monitoring/background-jobs.adoc[] // ** xref:monitoring/cluster/index.adoc[] diff --git a/modules/ROOT/pages/backup-restore/online-backup.adoc b/modules/ROOT/pages/backup-restore/online-backup.adoc index 4138f4be8..dcdf91002 100644 --- 
a/modules/ROOT/pages/backup-restore/online-backup.adoc +++ b/modules/ROOT/pages/backup-restore/online-backup.adoc @@ -227,7 +227,7 @@ For example, if your current database has a `Total mapped size` of `128GB` as pe === Computational resources configurations Transaction log files:: -The xref:configuration/transaction-logs.adoc[transaction log files], which keep track of recent changes, are rotated and pruned based on a provided configuration. +The xref:database-internals/transaction-logs.adoc[transaction log files], which keep track of recent changes, are rotated and pruned based on a provided configuration. For example, setting `db.tx_log.rotation.retention_policy=3` files keeps 3 transaction log files in the backup. Because recovered servers do not need all of the transaction log files that have already been applied, it is possible to further reduce storage size by reducing the size of the files to the bare minimum. This can be done by setting `db.tx_log.rotation.size=1M` and `db.tx_log.rotation.retention_policy=3` files. diff --git a/modules/ROOT/pages/configuration/configuration-settings.adoc b/modules/ROOT/pages/configuration/configuration-settings.adoc index eab340a0e..e9cb01fc5 100644 --- a/modules/ROOT/pages/configuration/configuration-settings.adoc +++ b/modules/ROOT/pages/configuration/configuration-settings.adoc @@ -4391,8 +4391,8 @@ m|+++neo4j+++ == Transaction settings -The transaction settings help you manage the transactions in your database, for example, the transaction timeout, the lock acquisition timeout, the maximum number of concurrently running transactions, etc. -For more information, see xref:monitoring/transaction-management.adoc[Manage transactions] and xref:performance/locks-deadlocks.adoc[Locks and deadlocks]. +The transaction settings help you manage the transactions in your database, for example, the transaction timeout, the lock acquisition timeout, the maximum number of concurrently running transactions, etc. 
+For more information, see xref:/database-internals/transaction-management.adoc#_manage-transactions[Manage transactions] and xref:/database-internals/locks-deadlocks.adoc[Locks and deadlocks]. [[config_db.lock.acquisition.timeout]] === `db.lock.acquisition.timeout` diff --git a/modules/ROOT/pages/configuration/index.adoc b/modules/ROOT/pages/configuration/index.adoc index 897f778a9..3b13a96be 100644 --- a/modules/ROOT/pages/configuration/index.adoc +++ b/modules/ROOT/pages/configuration/index.adoc @@ -12,7 +12,6 @@ The topics described are: * xref:configuration/set-initial-password.adoc[Set initial password] -- How to set an initial password. * xref:configuration/password-and-user-recovery.adoc[Password and user recovery] -- How to recover after a lost admin password. * xref:configuration/dynamic-settings.adoc[Update dynamic settings] -- How to configure certain Neo4j parameters while Neo4j is running. -* xref:configuration/transaction-logs.adoc[Transaction logs] -- The transaction logs record all write operations in the database. * xref:configuration/configuration-settings.adoc[Configuration settings] -- A complete reference of all configuration settings. For a complete reference of Neo4j configuration settings, see xref:configuration/configuration-settings.adoc[All configuration settings]. diff --git a/modules/ROOT/pages/configuration/transaction-logs.adoc b/modules/ROOT/pages/configuration/transaction-logs.adoc deleted file mode 100644 index fd5e42a36..000000000 --- a/modules/ROOT/pages/configuration/transaction-logs.adoc +++ /dev/null @@ -1,234 +0,0 @@ -:description: The transaction logs record all write operations in the database. -[[transaction-logs]] -= Transaction log - -The transaction logs record all write operations in the database. -They are the "source of truth" in scenarios where the database needs to be recovered. -The transaction logs can be used to provide differential backups, as well as for cluster operations. 
-For any given configuration, at least the latest non-empty transaction log is kept. - -Each database keeps its own directory with _transaction logs_. -The root directory where the transaction log folders are located is configured by xref:configuration/configuration-settings.adoc#config_server.directories.transaction.logs.root[`server.directories.transaction.logs.root`]. - -[NOTE] -==== -The transaction log has nothing to do with log monitoring. -==== - -[[transaction-logging]] -== Transaction logging - -The transaction logs record all write operations in the database. -This includes additions or modifications to data, as well as the addition or modification of any indexes or constraints. - -* The transaction logs are the "source of truth" in scenarios where the database needs to be recovered. - -* The transaction logs are used for providing differential backups, as well as for cluster operations. - -* For any given configuration, at least the latest non-empty transaction log will be kept. - -An overview of configuration settings for transaction logging: - -[cols="3", options="header"] -|=== -| The _transaction log_ configuration -| Default value -| Description - -| xref:configuration/configuration-settings.adoc#config_server.directories.transaction.logs.root[`server.directories.transaction.logs.root`] -| `transactions` -| Root location where Neo4j will store transaction logs for configured databases. - -| xref:configuration/configuration-settings.adoc#config_db.tx_log.preallocate[`db.tx_log.preallocate`] -| `true` -| Specify if Neo4j should try to preallocate logical log file in advance. - -| xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.retention_policy[`db.tx_log.rotation.retention_policy`] -| `2 days` -a| -Make Neo4j keep the logical transaction logs for being able to back up the database. -Can be used for specifying the threshold to prune logical logs after. 
- -| xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.size[`db.tx_log.rotation.size`] -| `250M` -a| -Specifies at which file size the logical log will auto-rotate. -Minimum accepted value is `128K` (128 KiB). - -|=== - - -The retention and rotation policies for the Neo4j transaction logs, and how to configure them. - - -[[transaction-logging-log-location]] -== Log location - -By default, transaction logs for a database are located at _/data/transactions/_. -Each database keeps its own directory with transaction logs. - -The root directory where those folders are located is configured by xref:configuration/configuration-settings.adoc#config_server.directories.transaction.logs.root[`server.directories.transaction.logs.root`]. -For maximum performance, it is recommended to configure transaction logs to be stored on a dedicated device. - - -[[transaction-logging-log-rotation]] -== Log rotation - -Log rotation is configured using the parameter xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.size[`db.tx_log.rotation.size`]. -By default, log switches happen when log sizes surpass 250 MB. - - -[[transaction-logging-log-retention]] -== Log retention - -[WARNING] -==== -Manually deleting transaction log files is not supported. -==== - -You can control the number of transaction logs that Neo4j keeps using the parameter xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.retention_policy[`db.tx_log.rotation.retention_policy`]. -It is set to `2 days` by default, which means Neo4j keeps logical logs that contain any transaction committed within 2 days. -The configuration is dynamic, so if you need to update it, you do not have to restart Neo4j for the change to take effect. - -Other possible values are: - -* `true` or `keep_all` -- keep transaction logs indefinitely. -+ -[NOTE] -==== -This option is not recommended due to the effectively unbounded storage usage. 
-Old transaction logs cannot be safely archived or removed by external jobs since safe log pruning requires knowledge about the most recent successful checkpoint. -==== - -* `false` or `keep_none` -- keep only the most recent non-empty log. -+ -Log pruning is called only after checkpoint completion to ensure at least one checkpoint and points to a valid place in the transaction log data. -In reality, this means that all transaction logs created between checkpoints will be kept for some time, and only after a checkpoint, the pruning strategy will remove them. -For more details on how to speed up checkpointing, see xref:configuration/transaction-logs.adoc#transaction-logging-log-pruning[Log pruning]. -To force a checkpoint, run the procedure xref:reference/procedures.adoc#procedure_db_checkpoint[`call db.checkpoint()`]. -+ -[NOTE] -==== -This option is not recommended in production Enterprise Edition environments, as <> rely on the presence of the transaction logs since the last backup. -==== - -* ` ` where valid units are `k`, `M`, and `G`, and valid types are `files`, `size`, `txs`, `entries`, `hours`, and `days`. -+ -.Types that can be used to control log retention -[options="header",cols="<15,<60,<25"] -|============================================ - -| Type -| Description -| Example - -| files -| The number of the most recent logical log files to keep. -| "10 files" - -| size -| Max disk size to allow log files to occupy. -| "300M size" or "1G size". - -| txs -| The number of transactions to keep. -| "250k txs" or "5M txs". - -| hours -| Keep logs that contain any transaction committed within N hours from the current time. -| "10 hours" - -| days -| Keep logs that contain any transaction committed within N days from the current time. -| "50 days" - -|============================================ -+ -.Configure log retention policy -==== -This example shows some different ways to configure the log retention policy. 
- -* Keep transaction logs indefinitely: -+ -[source, properties, role="noheader"] ----- -db.tx_log.rotation.retention_policy=true ----- -+ -or -+ -[source, properties, role="noheader"] ----- -db.tx_log.rotation.retention_policy=keep_all ----- - -* Keep only the most recent non-empty log: -+ -[source, properties, role="noheader"] ----- -db.tx_log.rotation.retention_policy=false ----- -+ -or -+ -[source, properties, role="noheader"] ----- -db.tx_log.rotation.retention_policy=keep_none ----- - -* Keep logical logs that contain any transaction committed within 30 days: -+ -[source, properties, role="noheader"] ----- -db.tx_log.rotation.retention_policy=30 days ----- - -* Keep logical logs that contain any of the most recent 500 000 transactions: -+ -[source, properties, role="noheader"] ----- -db.tx_log.rotation.retention_policy=500k txs ----- -==== - - -[[transaction-logging-log-pruning]] -== Log pruning - -Transaction log pruning refers to the safe and automatic removal of old, unnecessary transaction log files. -The transaction log can be pruned when one or more files fall outside of the configured retention policy. - -Two things are necessary for a file to be removed: - -* The file must have been rotated. -* At least one checkpoint must have happened in a more recent log file. - -Observing that you have more transaction log files than you expected is likely due to checkpoints either not happening frequently enough, or taking too long. -This is a temporary condition and the gap between the expected and the observed number of log files will be closed on the next successful checkpoint. -The interval between checkpoints can be configured using: - -[cols="3", options="header"] -|=== -| Checkpoint configuration -| Default value -| Description - -| xref:configuration/configuration-settings.adoc#config_db.checkpoint.interval.time[`db.checkpoint.interval.time`] -| `15m` -| Configures the time interval between checkpoints. 
- -| xref:configuration/configuration-settings.adoc#config_db.checkpoint.interval.tx[`db.checkpoint.interval.tx`] -| `100000` -| Configures the transaction interval between checkpoints. -|=== - - -If your goal is to have the least amount of transaction log data, it can also help to speed up the checkpoint process itself. -The configuration parameter xref:configuration/configuration-settings.adoc#config_db.checkpoint.iops.limit[`db.checkpoint.iops.limit`] controls the number of IOs per second the checkpoint process is allowed to use. -Setting the value of this parameter to `-1` allows unlimited IOPS, which can speed up checkpointing. - -[NOTE] -==== -Disabling the IOPS limit can cause transaction processing to slow down a bit. -For more information, see xref:performance/disks-ram-and-other-tips.adoc#performance-checkpoint-iops-limit[Checkpoint IOPS limit]. -==== diff --git a/modules/ROOT/pages/database-internals/checkpointing.adoc b/modules/ROOT/pages/database-internals/checkpointing.adoc new file mode 100644 index 000000000..b51e56d5d --- /dev/null +++ b/modules/ROOT/pages/database-internals/checkpointing.adoc @@ -0,0 +1,155 @@ +[[checkpointing-log-pruning]] += Checkpointing and log pruning + +Checkpointing is the process of flushing all pending updates from volatile memory to non-volatile data storage. +This action is crucial to limit the number of transactions that need to be replayed during the recovery process, particularly to minimize the time required for recovery after an improper shutdown of the database or a crash. + +Independent of the presence of checkpoints, database operations remain secure, as any transactions that have not been confirmed to have their modifications persisted to storage will be replayed upon the next database startup. +However, this assurance is contingent upon the availability of the collection of changes comprising these transactions, which is maintained in the xref:database-internals/transaction-logs.adoc[transaction logs]. 
+ +Maintaining a long list of unapplied transactions (due to infrequent checkpoints) leads to the accumulation of transaction logs, as they are essential for recovery purposes. +Checkpointing involves the inclusion of a special _Checkpointing_ entry in the transaction log, marking the last transaction at which checkpointing occurred. +This entry serves the purpose of identifying transaction logs that are no longer necessary, as all the transactions they contain have been securely stored in the storage files. + +The process of eliminating transaction logs that are no longer required for recovery is known as _pruning_. +Pruning is reliant on checkpointing. +Checkpointing determines which logs can be pruned and determines the occurrence of pruning, as the absence of a checkpoint implies that the set of transaction log files available for pruning cannot have changed. +Consequently, pruning is triggered whenever checkpointing occurs. + +[[checkpointing-policy]] +== Configure the checkpointing policy + +The checkpointing policy, which is the driving event for log pruning, is configured by xref:configuration/configuration-settings.adoc#config_db.checkpoint[`db.checkpoint`]. +Depending on your needs, checkpointing can run periodically (the default), when a certain amount of data has been written to the transaction log, or continuously. + +.Available checkpointing policies +[options="header", cols="1m,3a"] +|=== +|Policy +|Description + +|PERIODIC +|label:default[Default] +This policy checks every 10 minutes whether there are changes pending flushing and, if so, it performs a checkpoint and subsequently triggers a log prune. +The periodic policy is specified by the `<>` and `<>` settings, and checkpointing is triggered when either of them is reached. +See <> for more details. + +|VOLUME +|This policy runs a checkpoint when the size of the transaction logs reaches the value specified by the `<>` setting. +By default, it is set to `250.00MiB`. 
+ +|CONTINUOUS +|label:enterprise[Enterprise Edition] +This policy ignores `<>` and `<>` settings and runs the checkpoint process all the time. +The log pruning is triggered immediately after the checkpointing completes, just like in the periodic policy. + +|VOLUMETRIC +|label:enterprise[Enterprise Edition] +This policy checks every 10 seconds if there is enough volume of logs available for pruning and, if so, it triggers a checkpoint and subsequently, it prunes the logs. +By default, the volume is set to 256MiB, but it can be configured using the setting xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.retention_policy[`db.tx_log.rotation.retention_policy`] and xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.size[`db.tx_log.rotation.size`]. +For more information, see xref:database-internals/transaction-logs.adoc#transaction-logging-log-rotation[Configure transaction log rotation size]. +|=== + +[[checkpoint-interval]] +== Configure the checkpoint interval + +Observing that you have more transaction log files than you expected is likely due to checkpoints either not happening frequently enough, or taking too long. +This is a temporary condition and the gap between the expected and the observed number of log files will be closed on the next successful checkpoint. +The interval between checkpoints can be configured using: + +.Checkpoint interval configuration +[options="header", cols="2a,1a,3a"] +|=== +| Checkpoint configuration +| Default value +| Description + +| xref:configuration/configuration-settings.adoc#config_db.checkpoint.interval.time[`db.checkpoint.interval.time`] +| `15m` +| Configures the time interval between checkpoints. + +| xref:configuration/configuration-settings.adoc#config_db.checkpoint.interval.tx[`db.checkpoint.interval.tx`] +| `100000` +| Configures the transaction interval between checkpoints. 
+|=== + +[[control-log-pruning]] +== Control transaction log pruning + +Transaction log pruning refers to the safe and automatic removal of old, unnecessary transaction log files. +Two things are necessary for a file to be removed: + +* The file must have been rotated. +* At least one checkpoint must have happened in a more recent log file. + +Transaction log pruning configuration primarily deals with specifying the number of transaction logs that should remain available. +The primary reason for leaving more than the absolute minimum amount required for recovery comes from the requirements of clustered deployments and online backup. +Since database updates are communicated between cluster members and backup clients through the transaction logs, keeping more than the minimum amount necessary allows for transferring just the incremental changes (in the form of transactions) instead of the whole store files, which can lead to substantial savings in time and network bandwidth. + +The number of transaction logs left after a pruning operation is controlled by the setting `db.tx_log.rotation.retention_policy`. +The default value is `2 days`, which means that Neo4j keeps logical logs that contain any transaction committed within 2 days and prunes the ones that only contain transactions older than 2 days. +For more information, see xref:database-internals/transaction-logs.adoc#transaction-logging-log-retention[Configure transaction log retention policy]. + +If your goal is to keep the least amount of transaction log data, it also helps to speed up the checkpoint process itself. +To configure the number of IOs per second the checkpoint process is allowed to use, use the configuration parameter xref:configuration/configuration-settings.adoc#config_db.checkpoint.iops.limit[`db.checkpoint.iops.limit`]; setting it to `-1` allows unlimited IOPS, which can speed up checkpointing. + +[NOTE] +==== +Disabling the IOPS limit can cause transaction processing to slow down a bit. 
+For more information, see xref:performance/disks-ram-and-other-tips.adoc#performance-checkpoint-iops-limit[Checkpoint IOPS limit] and xref:configuration/configuration-settings.adoc#_transaction_log_settings[Transaction log settings]. +==== + +[[checkpoint-logging-and-metrics]] +== Checkpoint logging and metrics + +The following messages are expected to appear in _logs/debug.log_ upon a checkpoint event: + +* Checkpoint based upon `db.checkpoint.interval.time`: ++ +.... +2023-05-28 12:55:05.174+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 49 checkpoint started... +2023-05-28 12:55:05.253+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for time threshold" @ txId: 49 checkpoint completed in 79ms +.... + +* Checkpoint based upon `db.checkpoint.interval.tx`: ++ +.... +2023-05-28 13:08:51.603+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for tx count threshold" @ txId: 118 checkpoint started... +2023-05-28 13:08:51.669+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for tx count threshold" @ txId: 118 checkpoint completed in 66ms +.... + +* Checkpoint when `db.checkpoint=continuous`: ++ +.... +2023-05-28 13:17:21.927+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for continuous threshold" @ txId: 171 checkpoint started... +2023-05-28 13:17:21.941+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Scheduled checkpoint for continuous threshold" @ txId: 171 checkpoint completed in 13ms +.... + +* Checkpoint as a result of database shutdown: ++ +.... +2023-05-28 12:35:56.272+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Database shutdown" @ txId: 47 checkpoint started... 
+2023-05-28 12:35:56.306+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Database shutdown" @ txId: 47 checkpoint completed in 34ms +.... + +* Checkpoint as a result of `CALL db.checkpoint()`: ++ +.... +2023-05-28 12:31:56.463+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Call to db.checkpoint() procedure" @ txId: 47 checkpoint started... +2023-05-28 12:31:56.490+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Call to db.checkpoint() procedure" @ txId: 47 checkpoint completed in 27ms +.... + +* Checkpoint as a result of a backup run: ++ +.... +2023-05-28 12:33:30.489+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Full backup" @ txId: 47 checkpoint started... +2023-05-28 12:33:30.509+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by "Full backup" @ txId: 47 checkpoint completed in 20ms +.... + +https://neo4j.com/docs/operations-manual/current/monitoring/metrics/reference/#metrics-general-purpose[Checkpoint Metrics] are also available and are detailed in the following files, in the _metrics/_ directory: + +.... +neo4j.check_point.duration.csv +neo4j.check_point.total_time.csv +neo4j.check_point.events.csv +.... \ No newline at end of file diff --git a/modules/ROOT/pages/database-internals/index.adoc b/modules/ROOT/pages/database-internals/index.adoc new file mode 100644 index 000000000..8d26b59fe --- /dev/null +++ b/modules/ROOT/pages/database-internals/index.adoc @@ -0,0 +1,26 @@ += Database internals and transactional behavior +:description: Database internals and transactional behavior + +To maintain data integrity and ensure reliable transactional behavior, Neo4j DBMS supports transactions with full ACID properties, and it uses a write-ahead transaction log to ensure durability. + +* **Atomicity** -- If a part of a transaction fails, the database state is left unchanged. +* **Consistency** -- Every transaction leaves the database in a consistent state. 
+* **Isolation** -- During a transaction, modified data cannot be accessed by other operations. +* **Durability** -- The DBMS can always recover the results of a committed transaction. + +Neo4j DBMS supports the following transactional behavior: + +* All database operations that access the graph, indexes, or schema must be performed in a transaction. +* The default isolation level is _read-committed_ isolation level. +* Write locks are acquired automatically at the node and relationship levels. +However, you can also manually acquire write locks if you want to achieve a higher level of isolation -- _serializable_ isolation level. +* Data retrieved by traversals is not protected from modification by other transactions. +* Non-repeatable reads may occur (i.e., only write locks are acquired and held until the end of the transaction). +* Deadlock detection is built into the core transaction management. + +The following sections describe the transactional behavior in detail and how to control it: + +* xref:database-internals/transaction-management.adoc[] +* xref:database-internals/locks-deadlocks.adoc[] +* xref:database-internals/transaction-logs.adoc[] +* xref:database-internals/checkpointing.adoc[] \ No newline at end of file diff --git a/modules/ROOT/pages/database-internals/locks-deadlocks.adoc b/modules/ROOT/pages/database-internals/locks-deadlocks.adoc new file mode 100644 index 000000000..cc12cf362 --- /dev/null +++ b/modules/ROOT/pages/database-internals/locks-deadlocks.adoc @@ -0,0 +1,392 @@ += Locks and deadlocks +:description: This page discusses how locks are used in Neo4j, isolation levels, default locking behavior, deadlocks and strategies to avoid deadlocks, delete semantics, creating unique nodes, and transaction events. + +When a write transaction occurs, Neo4j takes locks to preserve data consistency while updating. + +== Locks + +Locks are taken automatically by the queries that users run. 
+They ensure that a node/relationship is locked to one particular transaction until that transaction is completed. +In other words, a lock held by one transaction on a node or a relationship blocks other transactions from concurrently modifying the same node or relationship. +As such, locks prevent concurrent modifications of shared resources between transactions. + +=== Isolation levels + +Locks are used in Neo4j to ensure data consistency and to enforce isolation levels. +They not only protect logical entities (such as nodes and relationships) but also the integrity of internal data structures. + +Neo4j supports the following isolation levels: + +_read-committed isolation level_:: label:default[] A transaction that reads a node/relationship does not block another transaction from writing to that node/relationship before the first transaction finishes. +This type of isolation is weaker than _serializable isolation level_ but offers significant performance advantages while being sufficient for the overwhelming majority of cases. + +_serializable isolation level_:: Explicit locking of nodes and relationships. +Using locks allows for simulating the effects of higher levels of isolation by obtaining and releasing locks explicitly. +For example, if a write lock is taken on a common node or relationship, then all transactions are serialized on that lock -- giving the effect of a _serializable isolation level_. +For more information on how to manually acquire write locks, see <>. + +[[transactions-isolation-lostupdates]] +=== Lost updates in Cypher + +In Cypher, it is possible to acquire write locks to simulate improved isolation in some cases. +Consider the case where multiple concurrent Cypher queries increment the value of a property. +Due to the limitations of the _read-committed isolation level_, the increments might not result in a deterministic final value. +If there is a direct dependency, Cypher automatically acquires a write lock before reading. 
+A direct dependency exists when the right-hand side of `SET` reads the dependent property, either in the expression itself or in the value of a key-value pair in a literal map. + +For example, if one hundred concurrent clients run the following query, it is very likely not to increment the property `n.prop` to 100, unless a write lock is acquired before reading the property value. +This is because all queries read the value of `n.prop` within their own transaction, and cannot see the incremented value from any other transaction that has not yet been committed. +In the worst-case scenario, the final value would be as low as 1 if all threads perform the read before any has committed their transaction. + +.Cypher can acquire a write lock +==== +The following example requires a write lock, and Cypher automatically acquires one: + +[source, cypher, role="noheader"] +---- +MATCH (n:Example {id: 42}) +SET n.prop = n.prop + 1 +---- +==== + +.Cypher can acquire a write lock +==== +This example also requires a write lock, and Cypher automatically acquires one: + +[source, cypher, role="noheader"] +---- +MATCH (n) +SET n += {prop: n.prop + 1} +---- +==== + +Due to the complexity of determining such a dependency in the general case, Cypher does not cover any of the following example cases: + +.Complex Cypher +==== +Variable depending on results from reading the property in an earlier statement: + +[source, cypher, role="noheader"] +---- +MATCH (n) +WITH n, n.prop AS p +// ... operations depending on p, producing k +SET n.prop = k + 1 +---- +==== + +.Complex Cypher +==== +Circular dependency between properties read and written in the same query: + +[source, cypher, role="noheader"] +---- +MATCH (n) +SET n += {propA: n.propB + 1, propB: n.propA + 1} +---- +==== + +To ensure deterministic behavior also in the more complex cases, it is necessary to explicitly acquire a write lock on the node in question. 
+In Cypher there is no explicit support for this, but it is possible to work around this limitation by writing to a temporary property. + +.Explicitly acquire a write lock +==== +This example acquires a write lock for the node by writing to a dummy property before reading the requested value: + +[source, cypher, role="noheader"] +---- +MATCH (n:Example {id: 42}) +SET n._LOCK_ = true +WITH n, n.prop AS p +// ... operations depending on p, producing k +SET n.prop = k + 1 +REMOVE n._LOCK_ +---- +==== + +Placing the `+SET n._LOCK_+` statement before the read of `n.prop` ensures that the lock is acquired before the read, so no updates are lost because all concurrent queries on that specific node are serialized. + + +[[transactions-locking]] +=== Default locking behavior + +The locks are added to the transaction and released when the transaction finishes. +If the transaction is rolled back, the locks are released immediately. + +The following is the default locking behavior for different operations: + +* When adding, changing, or removing a property on a node or relationship, a write lock is taken on the specific node or relationship. +* When creating or deleting a node, a write lock is taken for the specific node. +* When creating or deleting a relationship, a write lock is taken on the specific relationship and both its nodes. + +To view all active locks held by the transaction executing a query with the `queryId`, use the `CALL dbms.listActiveLocks(queryId)` procedure. +You need to be an administrator to be able to run this procedure. + +.Procedure output +[options="header", cols="1m,1m,2"] +|=== +| Name | Type | Description +| mode | String | Lock mode corresponding to the transaction. +| resourceType | String | Resource type of the locked resource. +| resourceId | Integer | Resource ID of the locked resource. 
+|=== + +.Viewing active locks for a query +==== + +The following example shows the active locks held by the transaction executing a given query. + +. To get the IDs of the currently executing queries, yield the `currentQueryId` from the `SHOW TRANSACTIONS` command: ++ +[source, cypher, role=nocopy noplay] +---- +SHOW TRANSACTIONS YIELD currentQueryId, currentQuery +---- + +. Run `CALL dbms.listActiveLocks` passing the `currentQueryId` of interest (`query-614` in this example): ++ +[source, cypher, role=nocopy noplay] +---- +CALL dbms.listActiveLocks( "query-614" ) +---- + +[queryresult] +---- +╒════════╤══════════════╤════════════╕ +│"mode" │"resourceType"│"resourceId"│ +╞════════╪══════════════╪════════════╡ +│"SHARED"│"SCHEMA" │0 │ +└────────┴──────────────┴────────────┘ +1 row +---- + +==== + +[[lock-contention]] +=== Lock contention + +Lock contention may arise if an application needs to perform concurrent updates on the same nodes/relationships. +In such a scenario, transactions must wait for locks held by other transactions to be released before they can complete. +If two or more transactions attempt to modify the same data concurrently, the likelihood of a <<deadlocks,deadlock>> increases. +In larger graphs, it is less likely that two transactions modify the same data concurrently, and so the likelihood of a deadlock is reduced. +That said, even in large graphs, a deadlock can occur if two or more transactions are attempting to modify the same data concurrently. + +=== Types of acquired locks + +The following table shows the type of lock acquired depending on the graph modification: + +.Obtained locks for graph modifications +[cols="1,3a"] +|=== +| Modification | Acquired lock + +| Creating a node | No lock +| Updating a node label |`NODE` lock +| Updating a node property | `NODE` lock +| Deleting a node | `NODE` lock +| Creating a relationship* | If the node is sparse: `NODE` lock. + +If a node is dense: `NODE DELETE` prevention lock. 
+| Updating a relationship property | `RELATIONSHIP` lock +| Deleting a relationship* | If the node is sparse: `NODE` lock. + +If a node is dense: `NODE DELETE` prevention lock. + +`RELATIONSHIP` lock for both sparse and dense nodes. +|=== +*_Applies to both source nodes and target nodes._ + +Additional locks are often taken to maintain indexes and other internal structures depending on how other data in the graph is affected by a transaction. +For these additional locks, no assumptions or guarantees can be made concerning which lock will or will not be taken. + +=== Locks for dense nodes + +A node is considered dense if it has had 50 or more relationships at any point (i.e. it is still considered dense even if it later comes to have fewer than 50 relationships). +A node is considered sparse if it has never had 50 or more relationships. +You can configure the relationship count threshold for when a node is considered dense by setting the xref:configuration/configuration-settings.adoc#config_db.relationship_grouping_threshold[`db.relationship_grouping_threshold`] configuration parameter. + +When creating or deleting relationships in Neo4j, dense nodes are not exclusively locked during a transaction. +Rather, internally shared locks prevent the deletion of nodes, and shared degree locks are acquired for synchronizing with concurrent label changes for those nodes to ensure correct count updates. + +At commit time, relationships are inserted into their relationship chains at places that are currently uncontested (i.e. not currently modified by another transaction), and the surrounding relationships are exclusively locked. + +In other words, relationship modifications acquire coarse-grained shared node locks when doing the operation in the transaction, and then acquire precise exclusive relationship locks during commit. + +The locking is very similar for sparse and dense nodes. 
+The biggest contention for sparse nodes is the update of the degree (i.e. number of relationships) for the node. +Dense nodes store this data in a concurrent data structure, and so can avoid exclusive node locks in almost all cases for relationship modifications. + +[[transaction-management-lock-acquisition-timeout]] +=== Configure lock acquisition timeout + +An executing transaction may get stuck while waiting for some lock to be released by another transaction. +To kill that transaction and remove the lock, set xref:configuration/configuration-settings.adoc#config_db.lock.acquisition.timeout[`db.lock.acquisition.timeout`] to some positive time interval value (e.g., `10s`) denoting the maximum time interval within which any particular lock should be acquired, before failing the transaction. +Setting `db.lock.acquisition.timeout` to `0` -- which is the default value -- disables the lock acquisition timeout. + +This feature cannot be set dynamically. + +.Configure lock acquisition timeout +==== +Set the timeout to ten seconds. +[source, parameters] +---- +db.lock.acquisition.timeout=10s +---- +==== + +[[deadlocks]] +== Deadlocks + +Since locks are used, deadlocks can happen. +A deadlock occurs when two transactions are blocked by each other because they are attempting to concurrently modify a node or a relationship that is locked by the other transaction. +In such a scenario, neither of the transactions will be able to proceed. +When Neo4j detects a deadlock, the transaction is terminated with the transient error message code `Neo.TransientError.Transaction.DeadlockDetected`. + +All locks acquired by the transaction are still held but will be released when the transaction finishes. +Once the locks are released, other transactions that were waiting for locks held by the transaction causing the deadlock can proceed. +You can then retry the work performed by the transaction causing the deadlock if needed. 
+ +Experiencing frequent deadlocks is an indication of concurrent write requests happening in such a way that it is not possible to execute them while at the same time living up to the intended isolation and consistency. +The solution is to make sure that concurrent updates happen in a predictable, non-conflicting way. +For example, given two specific nodes (A and B), adding or deleting relationships to both these nodes in random order for each transaction can result in deadlocks when two or more transactions do that concurrently. +One option is to make sure that updates always happen in the same order (first A then B). +Another option is to make sure that each thread/transaction does not write to the same node or relationship as any other concurrent transaction. +This can, for example, be achieved by letting a single thread do all updates of a specific type. + +[IMPORTANT] +==== +Deadlocks caused by synchronization mechanisms other than the locks managed by Neo4j can still happen. +Other code that requires synchronization should be synchronized in such a way that it never performs any Neo4j operation in the synchronized block. +==== + +=== Deadlock detection + +For example, running the following two queries in https://neo4j.com/docs/operations-manual/current/tools/cypher-shell/[Cypher-shell] at the same time will result in a deadlock because they are attempting to modify the same node properties concurrently: + +.Transaction A +[source, cypher, indent=0, role=nocopy noplay] +---- +:begin +MATCH (n:Test) SET n.prop = 1 +WITH collect(n) as nodes +CALL apoc.util.sleep(5000) +MATCH (m:Test2) SET m.prop = 1; +---- + +.Transaction B +[source, cypher, indent=0, role=nocopy noplay] +---- +:begin +MATCH (n:Test2) SET n.prop = 1 +WITH collect(n) as nodes +CALL apoc.util.sleep(5000) +MATCH (m:Test) SET m.prop = 1; +---- + +The following error message is thrown: + +[source, output, role="noheader", indent=0] +---- +The transaction will be rolled back and terminated. 
 Error: ForsetiClient[transactionId=6698, clientId=1] can't acquire ExclusiveLock{owner=ForsetiClient[transactionId=6697, clientId=3]} on NODE(27), because holders of that lock are waiting for ForsetiClient[transactionId=6698, clientId=1]. + Wait list:ExclusiveLock[ +Client[6697] waits for [ForsetiClient[transactionId=6698, clientId=1]]] +---- + +[NOTE] +==== +The Cypher clause `MERGE` takes locks out of order to ensure the uniqueness of the data, and this may prevent Neo4j's internal sorting operations from ordering transactions in a way that avoids deadlocks. +When possible, you are, therefore, encouraged to use the Cypher clause `CREATE` instead, which does not take locks out of order. +==== + +[[transactions-deadlocks-code]] +=== Deadlock handling in code + +When dealing with deadlocks in code, there are several issues you may want to address: + +* Only do a limited number of retries, and fail if a threshold is reached. +* Pause between each attempt to allow the other transaction to finish before trying again. +* A retry loop can be useful not only for deadlocks but for other types of transient errors as well. + +For an example of how deadlocks can be handled in procedures, server extensions, or when using Neo4j embedded, see link:{neo4j-docs-base-uri}/java-reference/{page-version}/transaction-management/[Transaction management in the Neo4j Java Reference]. + +=== Avoiding deadlocks + +Most likely, a deadlock will be resolved by retrying the transaction. +This will, however, negatively impact the total transactional throughput of the database, so it is useful to know about strategies to avoid deadlocks. + +Neo4j assists transactions by internally sorting operations (see below for more information about internal locks). +However, this internal sorting only applies to the locks taken when creating or deleting relationships. 
+Users are, therefore, encouraged to sort their operations in cases where Neo4j does not internally assist, such as when locks are taken for property updates. +This is done by ensuring that updates occur in the same order. +For example, if the three locks `A`, `B`, and `C` are always taken in the same order (e.g. `A->B->C`), then a transaction will never hold lock `B` while waiting for lock `A` to be released, and so a deadlock will not occur. + +Another option is to avoid lock contention by not modifying the same entities concurrently. + +To avoid deadlocks, internal locks should be taken in the following order: + +[WARNING] +==== +The internal lock types may change without any notification between different Neo4j versions. +The lock types are only listed here to give an idea of the internal locking mechanism. +==== + +[cols="2,1,3a"] +|=== +| Lock type | Locked entity | Description + +| `LABEL` or `RELATIONSHIP_TYPE` +| Token id +| Schema locks, which lock indexes and constraints on the particular label or relationship type. + +| `SCHEMA_NAME` +| Schema name +| Lock a schema name to avoid duplicates. +[NOTE] +Collisions are possible because the hash is stringed. +This only affects concurrency and not correctness. + +| `NODE_RELATIONSHIP_GROUP_DELETE` +| Node id +| Lock taken on a node during the transaction creation phase to prevent deletion of that node and/or relationship group. +This is different from the `NODE` lock in order to allow concurrent label and property changes together with relationship modifications. + +| `NODE` +| Node id +| Lock on a node, used to prevent concurrent updates to the node records (i.e. add/remove label, set property, add/remove relationship). +Note that updating relationships will only require a lock on the node if the head of the relationship chain/relationship group chain must be updated since that is the only data part of the node record. 
+ +| `DEGREES` +| Node id +| Used to lock nodes to avoid concurrent label changes when a relationship is added or deleted. +Such an update would otherwise lead to an inconsistent count store. + +| `RELATIONSHIP_DELETE` +| Relationship id +| Lock a relationship for exclusive access during deletion. + +| `RELATIONSHIP_GROUP` +| Node id +| Lock the full relationship group chain for a given dense node. +This will not lock the node, in contrast to the lock `NODE_RELATIONSHIP_GROUP_DELETE`. + +| `RELATIONSHIP` +| Relationship +| Lock on a relationship, or more specifically a relationship record, to prevent concurrent updates. +|=== + +[[transactions-delete]] +== Delete semantics + +When deleting a node or a relationship, all properties for that entity will be automatically removed but the relationships of a node will not be removed. +Neo4j enforces a constraint (upon commit) that all relationships must have a valid start node and end node. +In effect, this means that trying to delete a node that still has relationships attached to it will throw an exception upon commit. +It is, however, possible to choose in which order to delete the node and the attached relationships as long as no relationships exist when the transaction is committed. + +The delete semantics can be summarized as follows: + +* All properties of a node or relationship will be removed when it is deleted. +* A deleted node cannot have any attached relationships when the transaction commits. +* It is possible to acquire a reference to a deleted relationship or node that has not yet been committed. +* Any write operation on a node or relationship after it has been deleted (but not yet committed) will throw an exception. +* Trying to acquire a new or work with an old reference to a deleted node or relationship after commit, will throw an exception. 
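+
+As a minimal sketch of the ordering rule above, the following query deletes a node together with any relationships still attached to it, so that no relationship is left without a start or end node at commit time.
+The `Example` label and the `id` value are illustrative assumptions:
+
+.Delete a node together with its relationships
+====
+[source, cypher, role="noheader"]
+----
+// Illustrative only: the Example label and the id value are assumptions.
+MATCH (n:Example {id: 42})
+OPTIONAL MATCH (n)-[r]-()
+DELETE r, n
+----
+====
+
+The Cypher clause `DETACH DELETE n` expresses the same intent more compactly, removing the relationships of the node before the node itself.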
diff --git a/modules/ROOT/pages/database-internals/transaction-logs.adoc b/modules/ROOT/pages/database-internals/transaction-logs.adoc new file mode 100644 index 000000000..493edcfb8 --- /dev/null +++ b/modules/ROOT/pages/database-internals/transaction-logs.adoc @@ -0,0 +1,120 @@ +[[transaction-logging]] += Transaction logging + +:description: Transaction logs, checkpointing, and log pruning. The retention and rotation policies for the Neo4j transaction logs, and how to configure them. + +Neo4j keeps track of all write operations to each database to ensure data consistency and enable recovery. + +[[transaction-log-files]] +== Transaction log files + +A transaction log file contains a sequence of records with all changes made to a particular database as part of each transaction, including data, indexes, and constraints. + +The transaction log serves multiple purposes, including providing differential backups and supporting cluster operations. At a minimum, the most recent non-empty transaction log is retained for any given configuration. +It is important to note that transaction logs are unrelated to log monitoring. + +The transaction logging configuration is set per database and can be configured using the following configuration settings: + +[[transaction-logging-log-location]] +== Configure transaction log location + +By default, transaction logs for a database are located at _/data/transactions/_. + +The root directory where those folders are located is configured by xref:configuration/configuration-settings.adoc#config_server.directories.transaction.logs.root[`server.directories.transaction.logs.root`]. +The value is a path. +If relative, it is resolved from `server.directories.data`. +For maximum performance, it is recommended to configure transaction logs to be stored on a dedicated device. 
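+
+As an illustration, the following line in _neo4j.conf_ points the transaction log root at a separate mount.
+The path `/mnt/txlogs` is a hypothetical example, not a default or recommended location:
+
+.Configure transaction log location
+====
+[source, parameters]
+----
+# Hypothetical path on a dedicated device; adjust to your environment.
+server.directories.transaction.logs.root=/mnt/txlogs
+----
+====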
+ +[[transaction-logging-log-preallocation]] +== Configure transaction log preallocation + +You can specify if Neo4j should try to preallocate logical log files in advance using the parameter xref:configuration/configuration-settings.adoc#config_db.tx_log.preallocate[`db.tx_log.preallocate`]. +By default, it is `true`. +Log preallocation optimizes the filesystem by ensuring there is room to accommodate newly generated files and avoid file-level fragmentation. +This configuration setting is dynamic and can be changed at runtime. + +[[transaction-logging-log-rotation]] +== Configure transaction log rotation size + +You can specify how much space a single transaction log file can roughly occupy using xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.size[`db.tx_log.rotation.size`]. +By default, it is set to `256 MiB`, which means that after a transaction log file reaches this size, it is rotated and a new one is created. +The minimum accepted value is `128K` (128 KiB). +This configuration setting is dynamic and can be changed at runtime. + +This setting influences how much space can be reclaimed by all checkpoint strategies under the following: + +To reclaim a given file, the newest checkpoint for the transaction log must exist in another file. +So if you have a huge transaction log, then it is likely that your latest checkpoint is in the same file, making it impossible to reclaim said file. +For information about checkpointing, see xref:database-internals/checkpointing.adoc#control-log-pruning[Control transaction log pruning]. + + +[[transaction-logging-log-retention]] +== Configure transaction log retention policy + +[WARNING] +==== +Manually deleting transaction log files is not supported. +==== + +You can control the number of transaction logs that Neo4j keeps to back up the database using the parameter xref:configuration/configuration-settings.adoc#config_db.tx_log.rotation.retention_policy[`db.tx_log.rotation.retention_policy`]. 
+This configuration setting is dynamic and can be changed at runtime. + +By default, it is set to `2 days`, which means Neo4j keeps logical logs that contain any transaction committed within 2 days and prunes the ones that only contain transactions older than 2 days. + +Other possible ways to configure the log retention policy are: + +* `db.tx_log.rotation.retention_policy=true|keep_all` -- keep transaction logs indefinitely. ++ +[NOTE] +==== +This option is not recommended due to the effectively unbounded storage usage. +Old transaction logs cannot be safely archived or removed by external jobs since safe log pruning requires knowledge about the most recent successful checkpoint. +==== + +* `db.tx_log.rotation.retention_policy=false|keep_none` -- keep only the most recent non-empty log. ++ +Log pruning is called only after checkpoint completion to ensure that at least one checkpoint exists and points to a valid place in the transaction log data. +In reality, this means that all transaction logs created between checkpoints are kept for some time, and the pruning strategy removes them only after a checkpoint. +For more details on how to speed up checkpointing, see xref:database-internals/checkpointing.adoc#transaction-logging-log-pruning[Configure log pruning]. +To force a checkpoint, run the procedure xref:reference/procedures.adoc#procedure_db_checkpoint[`CALL db.checkpoint()`]. ++ +[NOTE] +==== +This option is not recommended in production Enterprise Edition environments, as differential backups rely on the presence of the transaction logs since the last backup. +==== + +* `<number><optional unit> <type>` where valid units are `k`, `M`, and `G`, and valid types are `files`, `size`, `txs`, `entries`, `hours`, and `days`. ++ +.Types that can be used to control log retention +[options="header",cols="1m,3a,2m"] +|=== + +| Type +| Description +| Example + +| files +| The number of the most recent transaction log files to keep after pruning. 
+| db.tx_log.rotation.retention_policy=10 files + +| size +| The max disk size of the transaction log files to keep after pruning. +For example, `500M size` leaves at least 500M worth of files behind. +| db.tx_log.rotation.retention_policy=300M size + +| txs or entries +| The number of transactions (in the files) to keep after pruning, regardless of file count or size. +`txs` and `entries` are synonymous. +If set, the policy keeps the 500k latest transactions from each database and prunes any older transactions. +| db.tx_log.rotation.retention_policy=500k txs + + +| hours +| Keep logs that contain any transaction committed within the specified number of hours from the current time. +The value of `10 hours` ensures that at least 10 hours' worth of transactions is present in the logs. +m| db.tx_log.rotation.retention_policy=10 hours + +| days +| Keep logs that contain any transaction committed within the specified number of days from the current time. +m| db.tx_log.rotation.retention_policy=30 days +|=== diff --git a/modules/ROOT/pages/database-internals/transaction-management.adoc b/modules/ROOT/pages/database-internals/transaction-management.adoc new file mode 100644 index 000000000..15428fcf8 --- /dev/null +++ b/modules/ROOT/pages/database-internals/transaction-management.adoc @@ -0,0 +1,65 @@ +[[transaction-management]] += Transaction management + +== Transactions + +Database operations that access the graph, indexes, or schema are performed in a transaction to ensure the ACID properties. +Transactions are single-threaded, confined, and independent. +Multiple transactions can be started in a single thread and they are independent of each other. + +The interaction cycle of working with transactions follows the steps: + +. Begin a transaction. +. Perform database operations. +. Commit or roll back the transaction. 
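+
+As a sketch of this cycle in Cypher Shell, the following session begins an explicit transaction, performs one write, and commits it.
+The `Example` label and the property names are illustrative assumptions; use `:rollback` in place of `:commit` to discard the change:
+
+.Transaction cycle in Cypher Shell
+====
+[source, cypher, role=nocopy noplay]
+----
+:begin
+MATCH (n:Example {id: 42}) SET n.prop = 1;
+:commit
+----
+====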
+ +It is crucial to finish each transaction because the xref:/database-internals/locks-deadlocks.adoc#_locks[locks] or memory acquired by a transaction are only released upon completion. +All non-committed transactions are rolled back as part of resource cleanup at the end of the statement. +No resource cleanup is required for a transaction that is explicitly committed or rolled back, and the transaction closure is an empty operation. + +[NOTE] +==== +All modifications performed in a transaction are kept in memory. +This means that very large updates must be split into several transactions to avoid running out of memory. +==== + +== Configure transactional behavior + +The transaction settings help you manage the transactions in your database, for example, the transaction timeout, the maximum number of concurrently running transactions, how much time to allow Neo4j to wait for running transactions to complete before allowing initiated database shutdown to continue, and so on. +For all available settings, see xref:/configuration/configuration-settings.adoc#_transaction_settings[Transaction settings]. + +=== Configure the maximum number of concurrently running transactions + +By default, Neo4j can run a maximum of 1000 concurrent transactions. +To change this value, use the xref:configuration/configuration-settings.adoc#config_db.transaction.concurrent.maximum[`db.transaction.concurrent.maximum`] setting. +If set to `0`, the limit is disabled. + +[[transaction-management-transaction-timeout]] +=== Configure transaction timeout + +It is recommended to configure Neo4j to terminate transactions whose execution time has exceeded the configured timeout. + +* Set `xref:configuration/configuration-settings.adoc#config_db.transaction.timeout[db.transaction.timeout]` to some positive time interval value (e.g.,`10s`) denoting the default transaction timeout. +Setting `db.transaction.timeout` to `0` -- which is the default value -- disables the feature. 
+ +* You can also set this dynamically on each primary server using the procedure `dbms.setConfigValue('db.transaction.timeout','10s')`. + +.Configure transaction timeout +==== +Set the timeout to ten seconds. +[source, parameters] +---- +db.transaction.timeout=10s +---- +==== + +Configuring transaction timeout does not affect transactions executed with custom timeouts (e.g., via the Java API or Neo4j Drivers), as the custom timeout overrides the value set for `db.transaction.timeout`. +Note that the timeout value can only be overridden to a value that is smaller than that configured by `db.transaction.timeout`. + + +== Manage transactions + +Transactions can be managed using the Cypher commands `SHOW TRANSACTIONS` and `TERMINATE TRANSACTIONS`. +The `TERMINATE TRANSACTIONS` command can be combined with multiple `SHOW TRANSACTIONS` and `TERMINATE TRANSACTIONS` commands in the same query. + +For more information, see link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/clauses/transaction-clauses/[Cypher manual -> Transaction commands]. \ No newline at end of file diff --git a/modules/ROOT/pages/monitoring/index.adoc b/modules/ROOT/pages/monitoring/index.adoc index 10c059523..c94f5840e 100644 --- a/modules/ROOT/pages/monitoring/index.adoc +++ b/modules/ROOT/pages/monitoring/index.adoc @@ -1,6 +1,6 @@ [[monitoring]] = Monitoring -:description: This chapter describes the tools that are available for monitoring Neo4j. +:description: This chapter describes the tools that are available for monitoring Neo4j. Neo4j provides mechanisms for continuous analysis through the output of metrics as well as the inspection and management of currently-executing queries. 
@@ -18,14 +18,6 @@ This chapter describes the following: ** xref:monitoring/metrics/enable.adoc[Enable metrics logging] ** xref:monitoring/metrics/expose.adoc[Connect monitoring tools] ** xref:monitoring/metrics/reference.adoc[Metrics reference] -* xref:monitoring/query-management.adoc[Manage queries] -** xref:monitoring/query-management.adoc#query-management-list-queries[List all running queries] -** xref:monitoring/query-management.adoc#query-management-list-active-locks[List all active locks for a query] -** xref:monitoring/query-management.adoc#query-management-terminate-queries[Terminate queries] -* xref:monitoring/transaction-management.adoc[Manage transactions] -** xref:monitoring/transaction-management.adoc#transaction-management-transaction-timeout[Configure transaction timeout] -** xref:monitoring/transaction-management.adoc#transaction-management-lock-acquisition-timeout[Configure lock acquisition timeout] -** xref:monitoring/transaction-management.adoc#transaction-management-list-transactions[List all running transactions] * xref:monitoring/connection-management.adoc[Manage connections] ** xref:monitoring/connection-management.adoc#connection-management-list-connections[List all network connections] ** xref:monitoring/connection-management.adoc#connection-management-terminate-multiple-connections[Terminate multiple network connections] diff --git a/modules/ROOT/pages/monitoring/metrics/essential.adoc b/modules/ROOT/pages/monitoring/metrics/essential.adoc index 370c277a7..06712d8ec 100644 --- a/modules/ROOT/pages/monitoring/metrics/essential.adoc +++ b/modules/ROOT/pages/monitoring/metrics/essential.adoc @@ -82,7 +82,7 @@ If this happens, consider the following steps to improve checkpointing performan * Raise the xref:configuration/configuration-settings.adoc#config_db.checkpoint.iops.limit[`db.checkpoint.iops.limit`] to make checkpoints faster, but only if there is enough IOPS budget available to avoid slowing the commit process. 
* xref:configuration/configuration-settings.adoc#config_server.memory.pagecache.flush.buffer.enabled[`server.memory.pagecache.flush.buffer.enabled`] / xref:configuration/configuration-settings.adoc#config_server.memory.pagecache.flush.buffer.size_in_pages[`server.memory.pagecache.flush.buffer.size_in_pages`] make checkpoints faster by writing batches of data in a way that plays well with the underlying disk (with a nice multiple to the block size). * Change the checkpointing policy (xref:configuration/configuration-settings.adoc#config_db.checkpoint[`db.checkpoint.*`], xref:configuration/configuration-settings.adoc#config_db.checkpoint.interval.time[`db.checkpoint.interval.time`]) to more frequent/smaller checkpoints, continuous checkpointing, or checkpoint by volume or `tx` count. -For more information, see xref:performance/disks-ram-and-other-tips.adoc#performance-checkpoint-iops-limit[Checkpoint IOPS limit] and xref:configuration/transaction-logs.adoc#transaction-logging-log-pruning[Log pruning]. +For more information, see xref:performance/disks-ram-and-other-tips.adoc#performance-checkpoint-iops-limit[Checkpoint IOPS limit] and xref:database-internals/checkpointing.adoc[Checkpointing and log pruning]. |=== == Neo4j cluster health metrics diff --git a/modules/ROOT/pages/monitoring/query-management.adoc b/modules/ROOT/pages/monitoring/query-management.adoc deleted file mode 100644 index 7e2746a0b..000000000 --- a/modules/ROOT/pages/monitoring/query-management.adoc +++ /dev/null @@ -1,88 +0,0 @@ -:description: This section describes facilities for query management. - -[[query-management]] -= Manage queries - -[[query-management-list-queries]] -== List all running queries - -The procedure for listing queries, `dbms.listQueries()`, is replaced by the command for listing transactions, `SHOW TRANSACTIONS`. -This command returns information about the currently executing query in the transaction. 
-For more information on the command, see the link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/clauses/transaction-clauses#query-listing-transactions[Cypher manual -> `SHOW TRANSACTIONS` command]. - - -[[query-management-list-active-locks]] -== List all active locks for a query - -An xref:authentication-authorization/terminology.adoc#term-administrator[administrator] is able to view all active locks held by the transaction executing the query with the `queryId`. - -*Syntax:* - -`CALL dbms.listActiveLocks(queryId)` - -*Returns:* - -[options="header"] -|=== -| Name | Type | Description -| `mode` | String | Lock mode corresponding to the transaction. -| `resourceType` | String | Resource type of the locked resource -| `resourceId` | Integer | Resource id of the locked resource . -|=== - -.Viewing active locks for a query -==== - -The following example demonstrates how to show the active locks held by the transaction executing a given query. - -Firstly, to get the IDs of the currently executing queries, yield the `currentQueryId` from the `SHOW TRANSACTIONS` command: - -[source, cypher] ----- -SHOW TRANSACTIONS YIELD currentQueryId, currentQuery ----- - -Then, call `dbms.listActiveLocks` passing the `currentQueryId` of interest (`query-614` in this example): - -[source, cypher] ----- -CALL dbms.listActiveLocks( "query-614" ) ----- - -[queryresult] ----- -╒════════╤══════════════╤════════════╕ -│"mode" │"resourceType"│"resourceId"│ -╞════════╪══════════════╪════════════╡ -│"SHARED"│"SCHEMA" │0 │ -└────────┴──────────────┴────────────┘ -1 row ----- - -==== - - -[[query-management-terminate-queries]] -== Terminate queries - -Queries are terminated by terminating the transaction on which they are running. This is done using the `TERMINATE TRANSACTIONS transactionIds` command. -The `transactionIds` can be found using the link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/clauses/transaction-clauses#query-listing-transactions[`SHOW TRANSACTIONS` command]. 
- -The link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/administration/access-control/database-administration#access-control-database-administration-transaction[`TERMINATE TRANSACTION` privilege] determines what transactions can be terminated. -However, the xref:authentication-authorization/terminology.adoc#term-current-user[current user] can always terminate all of their own transactions. - -*Syntax:* - -`TERMINATE TRANSACTIONS transactionIds` - -*Argument:* - -[options="header"] -|=== -| Name | Type | Description -| `transactionIds` | Comma-separated strings | The IDs of all the transactions to be terminated. -| `transactionIds` | Single string parameter | The ID of the transaction to be terminated. -| `transactionIds` | List parameter | The IDs of all the transactions to be terminated. -|=== - -For more information on the command, see the link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/clauses/transaction-clauses#query-terminate-transactions[Cypher manual -> `TERMINATE TRANSACTIONS` command]. diff --git a/modules/ROOT/pages/monitoring/transaction-management.adoc b/modules/ROOT/pages/monitoring/transaction-management.adoc deleted file mode 100644 index f83cb7607..000000000 --- a/modules/ROOT/pages/monitoring/transaction-management.adoc +++ /dev/null @@ -1,61 +0,0 @@ -:description: This section describes facilities for transaction management. -[[transaction-management]] -= Manage transactions - -[[transaction-management-transaction-timeout]] -== Configure transaction timeout - -It is recommended to configure Neo4j to terminate transactions whose execution time has exceeded the configured timeout. - -* Set `xref:configuration/configuration-settings.adoc#config_db.transaction.timeout[db.transaction.timeout]` to some positive time interval value (e.g.,`10s`) denoting the default transaction timeout. -Setting `db.transaction.timeout` to `0` -- which is the default value -- disables the feature. 
- -* You can also set this dynamically on each instance (Read Replicas only if required) using the procedure `dbms.setConfigValue('db.transaction.timeout','10s')`. - -.Configure transaction timeout -==== -Set the timeout to ten seconds. -[source, parameters] ----- -db.transaction.timeout=10s ----- -==== - -Configuring transaction timeout has no effect on transactions executed with custom timeouts (e.g., via the Java API or Neo4j Drivers), as the custom timeout overrides the value set for `db.transaction.timeout`. -Note that the timeout value can only be overridden to a value that is smaller than that configured by `db.transaction.timeout`. - -The _transaction timeout_ feature is also known as the _transaction guard_. - - -[[transaction-management-lock-acquisition-timeout]] -== Configure lock acquisition timeout - -An executing transaction may get stuck while waiting for some lock to be released by another transaction. -To kill that transaction and remove the lock, set `xref:configuration/configuration-settings.adoc#config_db.lock.acquisition.timeout[db.lock.acquisition.timeout]` to some positive time interval value (e.g., `10s`) denoting the maximum time interval within which any particular lock should be acquired, before failing the transaction. -Setting `db.lock.acquisition.timeout` to `0` -- which is the default value -- disables the lock acquisition timeout. - -This feature cannot be set dynamically. - -.Configure lock acquisition timeout -==== -Set the timeout to ten seconds. -[source, parameters] ----- -db.lock.acquisition.timeout=10s ----- -==== - - -[[transaction-management-list-transactions]] -== List all running transactions - -To list the currently running transactions within an instance, use the `SHOW TRANSACTIONS` command.
- -The link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/administration/access-control/database-administration#access-control-database-administration-transaction[`SHOW TRANSACTION` privilege] determines what transactions are returned by the command. -However, the xref:authentication-authorization/terminology.adoc#term-current-user[current user] can always view all of their own currently executing transactions. - -*Syntax:* - -`SHOW TRANSACTIONS` - -For more information on this command, see the link:{neo4j-docs-base-uri}/cypher-manual/{page-version}/clauses/transaction-clauses#query-listing-transactions[Cypher manual -> `SHOW TRANSACTIONS` command]. diff --git a/modules/ROOT/pages/performance/disks-ram-and-other-tips.adoc b/modules/ROOT/pages/performance/disks-ram-and-other-tips.adoc index 1615f4793..64c81cc76 100644 --- a/modules/ROOT/pages/performance/disks-ram-and-other-tips.adoc +++ b/modules/ROOT/pages/performance/disks-ram-and-other-tips.adoc @@ -121,5 +121,5 @@ Each IO is, in the case of the checkpoint process, an 8 KiB write. An IOPS limit of 600, for instance, would thus only allow the checkpoint process to write at a rate of roughly 5 MiB per second. This will, on the other hand, make checkpoints take longer to complete. A longer time between checkpoints can cause more transaction log data to accumulate, and can lengthen recovery times. -See the xref:configuration/transaction-logs.adoc[transaction logs] section for more details on the relationship between checkpoints and log pruning. +See the xref:database-internals/checkpointing.adoc[Checkpointing and log pruning] section for more details on the relationship between checkpoints and log pruning. The IOPS limit can be xref:configuration/dynamic-settings.adoc[changed at runtime], making it possible to tune it until you have the right balance between IO usage and checkpoint time. 
diff --git a/modules/ROOT/pages/performance/index.adoc b/modules/ROOT/pages/performance/index.adoc index 52ab598aa..db2995480 100644 --- a/modules/ROOT/pages/performance/index.adoc +++ b/modules/ROOT/pages/performance/index.adoc @@ -1,6 +1,6 @@ [[performance]] = Performance -:description: This chapter describes factors that affect operational performance, and how to tune Neo4j for optimal throughput. +:description: This chapter describes factors that affect operational performance, and how to tune Neo4j for optimal throughput. This section describes factors that affect operational performance and how to tune Neo4j for optimal throughput. The following topics are covered: @@ -10,7 +10,6 @@ The following topics are covered: * xref:performance/gc-tuning.adoc[Garbage collector] -- How to configure the Java Virtual Machine's garbage collector. * xref:performance/bolt-thread-pool-configuration.adoc[Bolt thread pool configuration] -- How to configure the Bolt thread pool. * xref:performance/linux-file-system-tuning.adoc[Linux file system tuning] -- How to configure the Linux file system. -* xref:performance/locks-deadlocks.adoc[Locks and deadlocks] -- Information about locks and deadlocks in Neo4j. * xref:performance/disks-ram-and-other-tips.adoc[Disks, RAM and other tips] -- Disks, RAM and other tips. * xref:performance/statistics-execution-plans.adoc[Statistics and execution plans] -- How schema statistics and execution plans affect Cypher query performance. * xref:performance/space-reuse.adoc[Space reuse] -- Data deletion and storage space reuse. diff --git a/modules/ROOT/pages/performance/locks-deadlocks.adoc b/modules/ROOT/pages/performance/locks-deadlocks.adoc deleted file mode 100644 index 1532df628..000000000 --- a/modules/ROOT/pages/performance/locks-deadlocks.adoc +++ /dev/null @@ -1,180 +0,0 @@ -= Locks and deadlocks -:description: This page discusses how locks are used in Neo4j, and strategies to avoid deadlocks. 
- -Neo4j is fully https://neo4j.com/docs/java-reference/current/transaction-management/[ACID compliant]. -This means that all database operations which access graphs, indexes, or schemas must be performed in a transaction. -When a write transaction occurs, Neo4j takes locks to preserve data consistency while updating. - -== Locks - -Locks are taken automatically by the queries that users run. -They ensure that a node/relationship is locked to one particular transaction until that transaction is completed. -In other words, a lock on a node or a relationship by one transaction will pause additional transactions which seek to concurrently modify the same node or relationship. -As such, locks prevent concurrent modifications of shared resources between transactions. - -Locks are used in Neo4j to ensure data consistency and isolation levels. -They not only protect logical entities (such as nodes and relationships), but also the integrity of internal data structures. - -The default isolation is read-committed isolation level. -It is, however, possible to manually acquire write locks on nodes and relationships. -For more information on how to manually acquire write locks, see https://neo4j.com/docs/java-reference/current/transaction-management/#transactions-isolation[Neo4j Java Reference Manual -> Transaction management]. - -== Lock contention - -Lock contention may arise if an application needs to perform concurrent updates on the same nodes/relationships. -In such a scenario, transactions must wait for locks held by other transactions to be released in order to be completed. -If two or more transactions attempt to modify the same data concurrently, it will increase the likelihood of a deadlock (explained in more detail below). -In larger graphs, it is less likely that two transactions modify the same data concurrently, and so the likelihood of a deadlock is reduced. 
-That said, even in large graphs, a deadlock can occur if two or more transactions are attempting to modify the same data concurrently. - -== Locks in practice - -.Locks taken for specific graph modifications -[cols="1,3a"] -|=== -| Modification | Lock taken - -| Creating a node | No lock -| Updating a node label | Node lock -| Updating a node property | Node lock -| Deleting a node | Node lock -| Creating a relationship* | If node is sparse: node lock. - -If node is dense: node delete prevention lock.** -| Updating a relationship property | Relationship lock -| Deleting a relationship* | If node is sparse: node lock. - -If node is dense: node delete prevention lock. - -Relationship lock for both sparse and dense nodes. -|=== -*_Applies for both source nodes and target nodes._ - -**_A node is considered dense if it at any point has had 50 or more relationships (i.e. it will still be considered dense even if it comes to have fewer than 50 relationships at any point in the future)._ -_A node is considered sparse if it has never had more than 50 relationships._ - -Additional locks are often taken to maintain indexes and other internal structures, depending on how other data in the graph is affected by a transaction. -For these additional locks, no assumptions or guarantees can be made with regard to which lock will or will not be taken. - -== Locks and dense nodes - -When creating or deleting relationships in Neo4j, dense nodes are not exclusively locked during a transaction (a node is considered dense if, at any point in time, it has had more than 50 relationships). -Rather, internally shared locks prevent the deletion of nodes, and shared degree locks are acquired for synchronizing with concurrent label changes for those nodes (to ensure correct count updates). - -At commit time, relationships are inserted into their relationship chains at places that are currently uncontested (i.e.
not currently modified by another transaction), and the surrounding relationships are exclusively locked. - -In other words, relationship modifications acquire coarse-grained shared node locks when doing the operation in the transaction, and then acquire precise exclusive relationship locks during commit. - -The locking is very similar for sparse and dense nodes. -The biggest contention for sparse nodes is the update of the degree (i.e. number of relationships) for the node. -Dense nodes store this data in a concurrent data structure, and so can avoid exclusive node locks in almost all cases for relationship modifications. - -== Deadlocks - -A deadlock occurs when two transactions are blocked by each other because they are attempting to concurrently modify a node or a relationship that is locked by the other transaction. -In such a scenario, neither of the transactions will be able to proceed. - -When Neo4j detects a deadlock, the transaction is terminated (with the transient error message code `Neo.TransientError.Transaction.DeadlockDetected`). - -For example, running the following two queries in https://neo4j.com/docs/operations-manual/current/tools/cypher-shell/[Cypher-shell] at the same time will result in a deadlock because they are attempting to modify the same node properties concurrently: - -.Transaction A -[source, cypher, indent=0] ----- -:begin -MATCH (n:Test) SET n.prop = 1 -WITH collect(n) as nodes -CALL apoc.util.sleep(5000) -MATCH (m:Test2) SET m.prop = 1; ----- - -.Transaction B -[source, cypher, indent=0] ----- -:begin -MATCH (n:Test2) SET n.prop = 1 -WITH collect(n) as nodes -CALL apoc.util.sleep(5000) -MATCH (m:Test) SET m.prop = 1; ----- - -The following error message is thrown: - -[source, output, role="noheader", indent=0] ----- -The transaction will be rolled back and terminated.
Error: ForsetiClient[transactionId=6698, clientId=1] can't acquire ExclusiveLock{owner=ForsetiClient[transactionId=6697, clientId=3]} on NODE(27), because holders of that lock are waiting for ForsetiClient[transactionId=6698, clientId=1]. - Wait list:ExclusiveLock[ -Client[6697] waits for [ForsetiClient[transactionId=6698, clientId=1]]] ----- - -[NOTE] -==== -The Cypher clause `MERGE` takes locks out of order to ensure uniqueness of the data, and this may prevent Neo4j's internal sorting operations from ordering transactions in a way which avoids deadlocks. -When possible, users are, therefore, encouraged to use the Cypher clause `CREATE` instead, which does not take locks out of order. -==== - -== Avoiding deadlocks - -Most likely, a deadlock will be resolved by retrying the transaction. -This will, however, negatively impact the total transactional throughput of the database, so it is useful to know about strategies to avoid deadlocks. - -Neo4j assists transactions by internally sorting operations (see below for more information about internal locks). -However, this internal sorting only applies for the locks taken when creating or deleting relationships. -Users are, therefore, encouraged to sort their operations in cases where Neo4j does not internally assist, such as when locks are taken for property updates. -This is done by ensuring that updates occur in the same order. -For example, if the three locks `A`, `B`, and `C` are always taken in the same order (e.g. `A->B->C`), then a transaction will never hold lock `B` while waiting for lock `A` to be released, and so a deadlock will not occur. - -Another option is to avoid lock contention by not modifying the same entities concurrently. - -For more information about deadlocks, see https://neo4j.com/docs/java-reference/5/transaction-management/#transactions-deadlocks[Neo4j Java Reference Manual -> Transaction management]. 
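The lock-ordering advice in the "Avoiding deadlocks" section can be illustrated outside Neo4j. The following is a minimal sketch in Python, assuming three hypothetical in-process locks named `A`, `B`, and `C`; the names `LOCKS`, `update_in_order`, and `run_concurrently` are invented for illustration and are not part of any Neo4j API. Two threads ask for the same locks in opposite orders, but because each one sorts its requests first, neither can hold a later lock while waiting for an earlier one.

```python
import threading

# Hypothetical stand-ins for three lockable entities; these are plain
# Python locks, not Neo4j locks.
LOCKS = {name: threading.Lock() for name in ("A", "B", "C")}


def update_in_order(names, results):
    """Acquire the named locks in one global (sorted) order, then release."""
    ordered = sorted(names)  # every caller uses the same order: A -> B -> C
    for name in ordered:
        LOCKS[name].acquire()
    try:
        # Placeholder for the real updates; record the acquisition order.
        results.append(ordered)
    finally:
        for name in reversed(ordered):
            LOCKS[name].release()


def run_concurrently():
    """Run two workers that request the same locks in opposite orders."""
    results = []
    t1 = threading.Thread(target=update_in_order, args=(["A", "B", "C"], results))
    t2 = threading.Thread(target=update_in_order, args=(["C", "B", "A"], results))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    # If either thread is still alive after the timeout, it is stuck.
    stuck = t1.is_alive() or t2.is_alive()
    return results, stuck
```

In an application that talks to Neo4j, the analogous step would be to arrange property updates so that every transaction touches the affected entities in the same order, for instance by sorting them on some stable key before updating; which key is suitable depends on the data model and is left as an assumption here.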
- -== Internal lock types - -To avoid deadlocks, internal locks should be taken in the following order: - -[cols="2,1,3a"] -|=== -| Lock type | Locked entity | Description - - -| `LABEL` or `RELATIONSHIP_TYPE` -| Token id -| Schema locks, which lock indexes and constraints on the particular label or relationship type. - -| `SCHEMA_NAME` -| Schema name -| Lock a schema name to avoid duplicates. -Note, collisions are possible because the hash is stringed (this only affects concurrency and not correctness). - -| `NODE_RELATIONSHIP_GROUP_DELETE` -| Node id -| Lock taken on a node during the transaction creation phase to prevent deletion of said node and/or relationship group. -This is different from the `NODE` lock to allow concurrent label and property changes together with relationship modifications. - -| `NODE` -| Node id -| Lock on a node, used to prevent concurrent updates to the node records (i.e. add/remove label, set property, add/remove relationship). -Note that updating relationships will only require a lock on the node if the head of the relationship chain/relationship group chain must be updated, since that is the only data part of the node record. - -| `DEGREES` -| Node id -| Used to lock nodes to avoid concurrent label changes when a relationship is added or deleted. -Such an update would otherwise lead to an inconsistent count store. - -| `RELATIONSHIP_DELETE` -| Relationship id -| Lock a relationship for exclusive access during deletion. - -| `RELATIONSHIP_GROUP` -| Node id -| Lock the full relationship group chain for a given dense node.* -This will not lock the node, in contrast to the lock `NODE_RELATIONSHIP_GROUP_DELETE`. - -| `RELATIONSHIP` -| Relationship -| Lock on a relationship, or more specifically a relationship record, to prevent concurrent updates. -|=== - -*_A node is considered dense if it at any point has had 50 or more relationships (i.e. 
it will still be considered dense even if it comes to have fewer than 50 relationships at any point in the future)._ - -Note that these lock types may change without any notification between different Neo4j versions. \ No newline at end of file diff --git a/modules/ROOT/pages/performance/space-reuse.adoc b/modules/ROOT/pages/performance/space-reuse.adoc index 6185ce660..860fd0c56 100644 --- a/modules/ROOT/pages/performance/space-reuse.adoc +++ b/modules/ROOT/pages/performance/space-reuse.adoc @@ -1,12 +1,12 @@ [[space-reuse]] = Space reuse -:description: This page describes how Neo4j handles data deletion and storage space. +:description: This page describes how Neo4j handles data deletion and storage space. Neo4j uses logical deletes to remove data from the database to achieve maximum performance and scalability. A logical delete means that all relevant records are marked as deleted, but the space they occupy is not immediately returned to the operating system. Instead, it is subsequently reused by the transactions _creating_ data. -Marking a record as deleted requires writing a record update command to the xref:configuration/transaction-logs.adoc[transaction log], as when something is created or updated. +Marking a record as deleted requires writing a record update command to the xref:database-internals/transaction-logs.adoc[transaction log files], as when something is created or updated. Therefore, when deleting large amounts of data, this leads to a storage usage growth of that particular database, because Neo4j writes records for all deleted nodes, their properties, and relationships to the transaction log. @@ -15,7 +15,7 @@ all deleted nodes, their properties, and relationships to the transaction log. Keep in mind that when doing `DETACH DELETE` on many nodes, those deletes can take up more space in the in-memory transaction state and the transaction log than you might expect.
==== -Transactions are eventually pruned out of the xref:configuration/transaction-logs.adoc[transaction log], bringing the storage usage of the log back down to the expected level. +Transactions are eventually pruned out of the xref:database-internals/transaction-logs.adoc[transaction log files], bringing the storage usage of the log back down to the expected level. The store files, on the other hand, do not shrink when data is deleted. The space that the deleted records take up is kept in the store files. Until the space is reused, the store files are sparse and fragmented, but the performance impact of this is usually minimal.