2 changes: 1 addition & 1 deletion modules/ROOT/pages/index.adoc
@@ -10,7 +10,7 @@ It covers the following topics:
* xref:extending-neo4j/index.adoc[] -- How to build unmanaged extensions and procedures.
* xref:java-embedded/index.adoc[] -- Instructions on embedding Neo4j in .
* xref:traversal-framework/index.adoc[] -- A walkthrough of the traversal framework.
* xref:transaction-management.adoc[] -- Details on transaction semantics in Neo4j.
* xref:transaction-management.adoc[] -- Examples on transaction management in Neo4j.
* xref:jmx-metrics.adoc[] -- How to monitor Neo4j with JMX and a reference of available metrics.

[TIP]
24 changes: 23 additions & 1 deletion modules/ROOT/pages/java-embedded/unique-nodes.adoc
@@ -6,7 +6,27 @@

This describes how to ensure the uniqueness of a property when creating nodes.

For an overview of unique nodes, see xref:transaction-management.adoc#transactions-unique-nodes[Transaction management -> Creating unique nodes].
[[transactions-unique-nodes]]
== Creating unique nodes

In many use cases, a certain level of uniqueness is desired among entities.
For example, only one user with a certain email address may exist in a system.
If multiple concurrent threads naively try to create the user, duplicates will be created.

The following are the main strategies for ensuring uniqueness, and they all work across cluster and single-instance deployments.


[[transactions-unique-nodes-singlethread]]
=== Single thread

By using a single thread, no two threads even try to create a particular entity simultaneously.
In a cluster, an external single-threaded client can perform the operations.
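
The single-thread strategy can be sketched in plain Java as follows.
This is only an illustration under stated assumptions: `SingleWriter` and its methods are hypothetical names, not part of the Neo4j API, and a `HashMap` stands in for the database.
The point it shows is that when one worker thread performs every check-then-create step, no two requests can race to create the same entity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: funnels all create requests through one thread.
final class SingleWriter implements AutoCloseable {
    // The only thread that ever reads or writes the stand-in store.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the database (assumption): email -> user id.
    private final Map<String, Long> userIdByEmail = new HashMap<>();
    private long nextId = 0;

    // Returns a future for the id of the existing or newly created user.
    Future<Long> getOrCreateUser(String email) {
        Callable<Long> task = () -> {
            Long id = userIdByEmail.get(email);
            if (id == null) {
                id = nextId++;
                userIdByEmail.put(email, id);
            }
            return id;
        };
        return writer.submit(task);
    }

    // Counts the users created so far, evaluated on the writer thread.
    int userCount() throws Exception {
        Callable<Integer> task = userIdByEmail::size;
        return writer.submit(task).get();
    }

    @Override
    public void close() {
        writer.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (SingleWriter users = new SingleWriter()) {
            long first = users.getOrCreateUser("alice@example.com").get();
            long second = users.getOrCreateUser("alice@example.com").get();
            System.out.println(first == second);   // both calls resolve to one user
            System.out.println(users.userCount()); // number of users created
        }
    }
}
```

Callers on any thread may invoke `getOrCreateUser`; only the execution of the request is serialized, which is also the main cost of this strategy.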


[[transactions-get-or-create]]
=== Get or create

Defining a uniqueness constraint and using the Cypher `MERGE` clause is the most efficient way to _get or create_ a unique node.

[TIP]
====
@@ -62,3 +82,5 @@ You might also be tempted to use Java synchronization for pessimistic locking, b
By mixing locks in Neo4j and the Java runtime, it is possible to produce deadlocks that are not detectable by Neo4j.
As long as all locking is done by Neo4j, all deadlocks will be detected and avoided.

For more information on locks and deadlocks, see link:{neo4j-docs-base-uri}/operations-manual/{page-version}/database-internals/locks-deadlocks.adoc#_locks[Operations Manual -> Locks and deadlocks^].

202 changes: 13 additions & 189 deletions modules/ROOT/pages/transaction-management.adoc
@@ -1,4 +1,4 @@
:description: Neo4j transaction management, including interaction cycle, isolation levels, default locking behavior, deadlocks, delete semantics, creating unique nodes, and transaction events.
:description: Neo4j transaction management, creating unique nodes, and transaction events.

:org-neo4j-graphdb-event-TransactionEventListener: {neo4j-javadocs-base-uri}/org/neo4j/graphdb/event/TransactionEventListener.html
:org-neo4j-graphdb-event-TransactionData: {neo4j-javadocs-base-uri}/org/neo4j/graphdb/event/TransactionData.html
@@ -8,31 +8,17 @@
[[transaction-management]]
= Transaction management

This topic describes transactional management and behavior.

To fully maintain data integrity and ensure good transactional behavior, Neo4j DBMS supports the **ACID** properties:

* **A**tomicity -- If a part of a transaction fails, the database state is left unchanged.
* **C**onsistency -- Every transaction leaves the database in a consistent state.
* **I**solation -- During a transaction, modified data cannot be accessed by other operations.
* **D**urability -- The DBMS can always recover the results of a committed transaction.

Specifically:

* All database operations that access the graph, indexes, or schema must be performed in a transaction.
* The default isolation level is _read-committed isolation level_.
* Data retrieved by traversals is not protected from modification by other transactions.
* Non-repeatable reads may occur (i.e., only write locks are acquired and held until the end of the transaction).
* One can manually acquire write locks on nodes and relationships to achieve a higher level of isolation -- _serializable isolation level_.
* Locks are acquired at the Node and Relationship levels.
* Deadlock detection is built into the core transaction management.
[IMPORTANT]
====
This page covers only selected aspects of transaction management with the Neo4j Java API.
It includes examples of how to avoid deadlocks, how to register a transaction event listener for a specific database, and how to perform basic operations on the transaction change set.

Therefore, it is highly recommended that you read link:{neo4j-docs-base-uri}/operations-manual/{page-version}/database-internals/[Operations Manual -> Database internals and transactional behavior] before you continue reading this page.
====

[[transactions-interaction]]
== Interaction cycle
[[transactions-overview]]
== Overview

There are database operations that must be performed in a transaction to ensure the ACID properties.
Specifically, operations that access the graph, indexes, or schema are such operations.
Database operations that access the graph, indexes, or schema are performed in a transaction to ensure the ACID properties.
Transactions are single-threaded, confined, and independent.
Multiple transactions can be started in a single thread and they are independent of each other.

@@ -44,8 +30,8 @@ The interaction cycle of working with transactions follows the steps:

[NOTE]
====
It is crucial to finish each transaction.
The locks or memory acquired by a transaction are only released upon completion.
It is crucial to finish each transaction because the link:{neo4j-docs-base-uri}/operations-manual/{page-version}/database-internals/locks-deadlocks.adoc#_locks[locks^] or memory acquired by a transaction are only released upon completion.
For more information on locks and deadlocks, see link:{neo4j-docs-base-uri}/operations-manual/{page-version}/database-internals/locks-deadlocks.adoc#_locks[Operations Manual -> Locks and deadlocks^].
====

The idiomatic use of transactions in Neo4j is to use a `try-with-resources` statement and declare `transaction` as one of the resources.
@@ -61,125 +47,8 @@ All modifications performed in a transaction are kept in memory.
This means that very large updates must be split into several transactions to avoid running out of memory.
====
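
Splitting a large update into several transactions can be sketched as follows.
This is a minimal illustration in plain Java, not Neo4j API usage: `BatchedUpdates`, `batches`, and `applyInBatches` are hypothetical names, and the `commitBatch` callback is assumed to open one transaction, apply the batch, and commit it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for splitting one large update into smaller units of work.
final class BatchedUpdates {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(from, to)));
        }
        return result;
    }

    // Hands each batch to commitBatch and returns the number of batches processed.
    // Assumption: commitBatch runs one transaction per call.
    static <T> int applyInBatches(List<T> items, int batchSize, Consumer<List<T>> commitBatch) {
        int committed = 0;
        for (List<T> batch : batches(items, batchSize)) {
            commitBatch.accept(batch);
            committed++;
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> updates = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            updates.add(i);
        }
        // Printing stands in for "begin transaction, apply batch, commit".
        int transactions = applyInBatches(updates, 4, batch -> System.out.println("commit " + batch));
        System.out.println(transactions + " transactions");
    }
}
```

Note that batching gives up atomicity across the whole update: if a later batch fails, earlier batches remain committed.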


[[transactions-isolation]]
== Isolation levels

Transactions in Neo4j use a _read-committed isolation level_, which means they see data as soon as it has been committed but cannot see data in other transactions that have not yet been committed.
This type of isolation is weaker than serializable isolation but offers significant performance advantages while being sufficient for the overwhelming majority of cases.

In addition, the Neo4j Java API enables explicit locking of nodes and relationships.
Using locks allows simulating the effects of higher levels of isolation by obtaining and releasing locks explicitly.
For example, if a write lock is taken on a common node or relationship, then all transactions are serialized on that lock -- giving the effect of a _serializable isolation level_.


[[transactions-isolation-lostupdates]]
=== Lost updates in Cypher

In Cypher it is possible to acquire write locks to simulate improved isolation in some cases.
Consider the case where multiple concurrent Cypher queries increment the value of a property.
Due to the limitations of the _read-committed isolation level_, the increments might not result in a deterministic final value.
If there is a direct dependency, Cypher automatically acquires a write lock before reading.
A direct dependency is when the right-hand side of `SET` has a dependent property read in the expression or the value of a key-value pair in a literal map.

For example, if one hundred concurrent clients each run one of the following queries, the property `n.prop` is incremented to 100 because Cypher automatically acquires a write lock.

.Cypher can acquire a write lock
====
The following example requires a write lock, and Cypher automatically acquires one:

[source, cypher, role="noheader"]
----
MATCH (n:Example {id: 42})
SET n.prop = n.prop + 1
----
====

.Cypher can acquire a write lock
====
This example also requires a write lock, and Cypher automatically acquires one:

[source, cypher, role="noheader"]
----
MATCH (n)
SET n += {prop: n.prop + 1}
----
====

Due to the complexity of determining such a dependency in the general case, Cypher does not cover all cases.
If you run one of the following queries concurrently 100 times, the final value of `n.prop` will most probably be less than 100.

.Complex Cypher
====
Variable depending on results from reading the property in an earlier statement:

[source, cypher, role="noheader"]
----
MATCH (n)
WITH n.prop AS p
// ... operations depending on p, producing k
SET n.prop = k + 1
----
====

.Complex Cypher
====
Circular dependency between properties read and written in the same query:

[source, cypher, role="noheader"]
----
MATCH (n)
SET n += {propA: n.propB + 1, propB: n.propA + 1}
----
====

To ensure deterministic behavior also in the more complex cases, it is necessary to explicitly acquire a write lock on the node in question.
In Cypher there is no explicit support for this, but it is possible to work around this limitation by writing to a temporary property.

.Explicitly acquire a write lock
====
This example acquires a write lock for the node by writing to a dummy property before reading the requested value:

[source, cypher, role="noheader"]
----
MATCH (n:Example {id: 42})
SET n._LOCK_ = true
WITH n.prop AS p
// ... operations depending on p, producing k
SET n.prop = k + 1
REMOVE n._LOCK_
----
====

The `+SET n._LOCK_+` statement placed before the read of `n.prop` ensures that the lock is acquired before the read, so no updates are lost because all concurrent queries on that specific node are serialized.


[[transactions-locking]]
== Default locking behavior

* When adding, changing, or removing a property on a node or relationship, a write lock is taken on the specific node or relationship.
* When creating or deleting a node, a write lock is taken on the specific node.
* When creating or deleting a relationship, a write lock is taken on the specific relationship and both its nodes.

The locks are added to the transaction and released when the transaction finishes.


[[transactions-deadlocks]]
== Deadlocks

Since locks are used, deadlocks can happen.
Neo4j, however, detects any deadlock (caused by acquiring a lock) before it happens and throws an exception.
The transaction is marked for rollback before the exception is thrown.
All locks acquired by the transaction are still held but will be released when the transaction finishes.
Once the locks are released, other transactions that were waiting for locks held by the transaction causing the deadlock can proceed.
You can then retry the work performed by the transaction causing the deadlock if needed.

Experiencing frequent deadlocks is an indication of concurrent write requests happening in such a way that it is not possible to execute them while at the same time living up to the intended isolation and consistency.
The solution is to coordinate concurrent updates so that they do not conflict.
For example, given two specific nodes (A and B), adding or deleting relationships to both these nodes in random order for each transaction results in deadlocks when two or more transactions do that concurrently.
One option is to make sure that updates always happen in the same order (first A then B).
Another option is to make sure that no thread or transaction writes to the same node or relationship as another concurrent transaction.
This can, for example, be achieved by letting a single thread do all updates of a specific type.
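
As a minimal sketch of the first option, the following Java snippet derives one fixed update order from a set of node identifiers.
`OrderedUpdates` and `updateOrder` are hypothetical names, and the sort key is assumed to be any stable, comparable identifier chosen by the application; the actual graph updates are left out.

```java
import java.util.Arrays;

// Hypothetical helper: gives every transaction the same order for shared nodes.
final class OrderedUpdates {
    // Returns the given ids in ascending order without modifying the input.
    static long[] updateOrder(long... nodeIds) {
        long[] ordered = nodeIds.clone();
        Arrays.sort(ordered);
        return ordered;
    }

    public static void main(String[] args) {
        // Two transactions receive the same pair of nodes in opposite order...
        long[] tx1 = updateOrder(17L, 4L);
        long[] tx2 = updateOrder(4L, 17L);
        // ...but both would update the node with the smaller id first.
        System.out.println(Arrays.toString(tx1));
        System.out.println(Arrays.toString(tx2));
    }
}
```

Because both transactions request their write locks in the same sequence, neither can hold the second node while waiting for the first.
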
[[transactions-deadlocks-code]]
== Deadlock handling: an example

[IMPORTANT]
====
@@ -188,10 +57,6 @@ Since all operations in the Neo4j API are thread-safe unless specified otherwise
Other code that requires synchronization should be synchronized in such a way that it never performs any Neo4j operation in the synchronized block.
====


[[transactions-deadlocks-code]]
=== Deadlock handling: an example

The following is an example of how deadlocks can be handled in procedures, server extensions, or when using Neo4j embedded.

[TIP]
@@ -270,47 +135,6 @@ else
----
====

[[transactions-delete]]
== Delete semantics

When deleting a node or a relationship, all properties of that entity are automatically removed, but the relationships of a node are not removed.
Neo4j enforces a constraint (upon commit) that all relationships must have a valid start node and end node.
In effect, this means that trying to delete a node that still has relationships attached to it will throw an exception upon commit.
It is, however, possible to choose in which order to delete the node and the attached relationships as long as no relationships exist when the transaction is committed.

The delete semantics can be summarized as follows:

* All properties of a node or relationship will be removed when it is deleted.
* A deleted node cannot have any attached relationships when the transaction commits.
* It is possible to acquire a reference to a deleted relationship or node that has not yet been committed.
* Any write operation on a node or relationship after it has been deleted (but not yet committed) will throw an exception.
* Trying to acquire a new reference, or to work with an old reference, to a deleted node or relationship after commit throws an exception.


[[transactions-unique-nodes]]
== Creating unique nodes

In many use cases, a certain level of uniqueness is desired among entities.
For example, only one user with a certain email address may exist in a system.
If multiple concurrent threads naively try to create the user, duplicates will be created.

The following are the main strategies for ensuring uniqueness, and they all work across cluster and single-instance deployments.


[[transactions-unique-nodes-singlethread]]
=== Single thread

By using a single thread, no two threads even try to create a particular entity simultaneously.
In a cluster, an external single-threaded client can perform the operations.


[[transactions-get-or-create]]
=== Get or create

Defining a uniqueness constraint and using the Cypher `MERGE` clause is the most efficient way to _get or create_ a unique node.
See xref:java-embedded/unique-nodes.adoc[] for more information.


[[transactions-events]]
== Transaction events
