From 7296e3bb17bdff145c3b03f87183c6f45b9082bc Mon Sep 17 00:00:00 2001
From: tselmeg
Date: Fri, 20 Sep 2024 16:47:09 +0200
Subject: [PATCH 01/14] Create a page for the rafted status check.

---
 .../clustering/monitoring/status-check.adoc | 46 +++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 modules/ROOT/pages/clustering/monitoring/status-check.adoc

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
new file mode 100644
index 000000000..7a753f01a
--- /dev/null
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -0,0 +1,46 @@
+:description: This section describes how to monitor a database's availability with the help of the rafted status check
+[role=label--new-5.24]
+== Rafted Status Check
+
+Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in rafted databases, which in most cases means being able to write to the database. It can also
+be used to check which members are up-to-date and can participate in a successful replication. Therefore, it is useful in determining the fault-tolerance of a rafted database as well. A third and final function is to determine the leader of the raft group.
+
+[NOTE]
+====
+The member on which the procedure is called replicates a `status check entry` in the same raft group as the transactions, and verifies that the entry can be replicated and applied.
+
+Since the entry is not applied to the transaction state machine, it's not guaranteed that the database is write available even though the status check reports that
+it can replicate. However, it tells that the raft group is healthy and in most cases that means that the database is write available.
+====
+
+=== Syntax
+
+[source, shell]
+----
+CALL dbms.cluster.statusCheck(databases :: LIST, timeoutMilliseconds = null :: INTEGER)
+----
+
+* *databases:* the list of databases for which the status check should run. Providing an empty list will run the
+status check for all *rafted* databases on that server.
+* *timeoutMilliseconds:* specifies how long the replication may take. Default value is 1000 milliseconds. If replication takes longer than this timeout, it will return that
+replication is unsuccessful.
+
+
+The procedure returns a row for all raft group members of all the requested databases where each row consists of:
+
+* *database:* the database for which the `status check entry` was replicated.
+* *serverId:* the server id of each raft group member, which did or did not participate in a successful replication of the `status check entry`.
+* *serverName:* the server name of each raft group member.
+* *address:* the bolt address of each raft group member.
+* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate an entry in raft. Is `TRUE` if this server managed to replicate the `status check entry` to a majority of raft members within the given timeout. `FALSE`
+if it failed to replicate within the timeout. The value is the same column-wise. A failed replication
+can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in raft, and can't therefore replicate.
+* *memberStatus:* shows the status of each raft group member. It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`. `APPLYING` means that the raft group member has raft running and is actively applying entries, including transactions.
+`REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions.
+* *recognisedLeader:* shows the server id of the perceived leader of each raft group member.
+* *recognisedLeaderTerm:* shows the term of the perceived leader of each raft group member. If the raft group members report different leaders, the one with the highest term should be trusted.
+* *requester:* is `TRUE` for the server on which the procedure is run, and `FALSE` on the remaining servers.
+* *error:* contains the error message if there is one. An example of an error is that one of more of the requested databases doesn't exist on the requester.
+
+In general the `replicationSuccessful` field can be used to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not.
+

From 9c978eebbeff78714264a5cac2274286d0ecc4a3 Mon Sep 17 00:00:00 2001
From: tselmeg
Date: Mon, 23 Sep 2024 08:10:57 +0200
Subject: [PATCH 02/14] Add detailed information about fault-tolerance.

---
 .../pages/clustering/monitoring/status-check.adoc | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index 7a753f01a..f9d59a419 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -44,3 +44,14 @@ can either mean that there is a real issue in the cluster (e.g. no leader) or it
 
 In general the `replicationSuccessful` field can be used to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not.
 
+[NOTE]
+====
+Members that are `REPLICATING` are good from a data safety point of view. They can participate in replication and keep the data durably until application. They are also up-to-date and therefore eligible leaders. So they add to the fault-tolerance.
+
+Members that are `APPLYING` have all the qualities of `REPLICATING` members, so they too add to the fault-tolerance. But they are also applying to the database, which is a requirement for writing transactions and reading with bookmarks in a timely manner.
+
+Lastly, `UNAVAILABLE` members are either too far behind or unreachable. They are unhealthy and cannot add to the fault-tolerance.
+
+====
+
+

From d93280bd77c2c210acf3f80358a9abced72ce225 Mon Sep 17 00:00:00 2001
From: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
Date: Mon, 23 Sep 2024 09:02:10 +0200
Subject: [PATCH 03/14] Update the TOC to include the new page

---
 modules/ROOT/content-nav.adoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/modules/ROOT/content-nav.adoc b/modules/ROOT/content-nav.adoc
index 4b8b5493d..2f45de7f9 100644
--- a/modules/ROOT/content-nav.adoc
+++ b/modules/ROOT/content-nav.adoc
@@ -148,6 +148,7 @@
 *** xref:clustering/monitoring/show-servers-monitoring.adoc[]
 *** xref:clustering/monitoring/show-databases-monitoring.adoc[]
 *** xref:clustering/monitoring/endpoints.adoc[]
+*** xref:clustering/monitoring/status-check.adoc[]
 ** xref:clustering/disaster-recovery.adoc[]
 //** xref:clustering/internals.adoc[]
 ** xref:clustering/settings.adoc[]

From b07a7b54ddab4d9812868694b8b351147bb53422 Mon Sep 17 00:00:00 2001
From: tselmeg
Date: Mon, 23 Sep 2024 12:20:43 +0200
Subject: [PATCH 04/14] Addressing review comments

---
 .../clustering/monitoring/status-check.adoc | 42 ++++++++-----------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index f9d59a419..f409840e5 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -1,16 +1,15 @@
-:description: This section describes how to monitor a database's availability with the help of the rafted status check
-[role=label--new-5.24]
-== Rafted Status Check
+:description: This section describes how to monitor a database's availability with the help of the cluster status check
+[role=label--new-5.24 label--enterprise-edition]
+[[database-status-check]]
+== Cluster Status Check
 
-Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in rafted databases, which in most cases means being able to write to the database. It can also
-be used to check which members are up-to-date and can participate in a successful replication. Therefore, it is useful in determining the fault-tolerance of a rafted database as well. A third and final function is to determine the leader of the raft group.
+Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in clustered databases, which in most cases means being able to write to the database. You can also use the procedure to check which members are up-to-date and can participate in a successful replication. Therefore, it is useful in determining the fault-tolerance of a clustered database as well. A third and final function is to determine the leader of the cluster.
 
 [NOTE]
 ====
-The member on which the procedure is called replicates a `status check entry` in the same raft group as the transactions, and verifies that the entry can be replicated and applied.
+The member on which the procedure is called replicates a dummy transaction in the same cluster as the real transactions, and verifies that it can be replicated and applied.
 
-Since the entry is not applied to the transaction state machine, it's not guaranteed that the database is write available even though the status check reports that
-it can replicate. However, it tells that the raft group is healthy and in most cases that means that the database is write available.
+Since the status check doesn't replicate an actual transaction, it's not guaranteed that the database is write available even though the status check reports that it can replicate. Apart from replication there are other stops in the write path that can potentially block a transaction from being applied, e.g. issues in the database. However, it tells that the cluster is healthy and in most cases that means that the database is write available.
 ====
 
 === Syntax
@@ -20,27 +19,22 @@ it can replicate. However, it tells that the raft group is healthy and in most c
 CALL dbms.cluster.statusCheck(databases :: LIST, timeoutMilliseconds = null :: INTEGER)
 ----
 
-* *databases:* the list of databases for which the status check should run. Providing an empty list will run the
-status check for all *rafted* databases on that server.
-* *timeoutMilliseconds:* specifies how long the replication may take. Default value is 1000 milliseconds. If replication takes longer than this timeout, it will return that
-replication is unsuccessful.
+* *databases:* the list of databases for which the status check should run. Providing an empty list will run the status check for all *clustered* databases on that server, i.e. the status check won't run on singles or secondaries.
+* *timeoutMilliseconds:* specifies how long the replication may take. Default value is 1000 milliseconds. If replication takes longer than this timeout, it will return that replication is unsuccessful.
 
 
-The procedure returns a row for all raft group members of all the requested databases where each row consists of:
+The procedure returns a row for all primary members of all the requested databases where each row consists of:
 
 * *database:* the database for which the `status check entry` was replicated.
-* *serverId:* the server id of each raft group member, which did or did not participate in a successful replication of the `status check entry`.
-* *serverName:* the server name of each raft group member.
-* *address:* the bolt address of each raft group member.
-* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate an entry in raft. Is `TRUE` if this server managed to replicate the `status check entry` to a majority of raft members within the given timeout. `FALSE`
-if it failed to replicate within the timeout. The value is the same column-wise. A failed replication
-can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in raft, and can't therefore replicate.
-* *memberStatus:* shows the status of each raft group member. It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`. `APPLYING` means that the raft group member has raft running and is actively applying entries, including transactions.
-`REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions.
-* *recognisedLeader:* shows the server id of the perceived leader of each raft group member.
-* *recognisedLeaderTerm:* shows the term of the perceived leader of each raft group member. If the raft group members report different leaders, the one with the highest term should be trusted.
+* *serverId:* the server id of each primary member, which did or did not participate in a successful replication of the `status check entry`.
+* *serverName:* the server name of each primary member.
+* *address:* the bolt address of each primary member.
+* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction. Is `TRUE` if this server managed to replicate the dummy transaction to a majority of raft members within the given timeout. `FALSE` if it failed to replicate within the timeout. The value is the same column-wise. A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.
+* *memberStatus:* shows the status of each primary member. It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`. `APPLYING` means that the member can replicate and is actively applying transactions. `REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions.
+* *recognisedLeader:* shows the server id of the perceived leader of each primary member.
+* *recognisedLeaderTerm:* shows the term of the perceived leader of each primary member. If the members report different leaders, the one with the highest term should be trusted.
 * *requester:* is `TRUE` for the server on which the procedure is run, and `FALSE` on the remaining servers.
-* *error:* contains the error message if there is one. An example of an error is that one of more of the requested databases doesn't exist on the requester.
+* *error:* contains the error message if there is one. An example of an error is that one or more of the requested databases doesn't exist on the requester.
 
 In general the `replicationSuccessful` field can be used to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not.
 
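
The closing sentence of the page as of this patch (use `replicationSuccessful` for write-availability and `memberStatus` for fault-tolerance) can be illustrated with a small sketch. It is not part of the patch. It assumes the procedure's rows have already been fetched by some client and are available as plain Python dicts keyed by the documented column names; the `summarize` helper, the sample rows, and the simple-majority arithmetic for "tolerated failures" are illustrative assumptions, not Neo4j API.

```python
from typing import Any

# memberStatus values that the page describes as adding to fault-tolerance.
HEALTHY = {"APPLYING", "REPLICATING"}


def summarize(rows: list[dict[str, Any]], database: str) -> dict[str, Any]:
    """Summarize status check rows (assumed to be plain dicts) for one database."""
    members = [r for r in rows if r["database"] == database]
    if not members:
        raise ValueError(f"no rows for database {database!r}")
    healthy = sum(1 for r in members if r["memberStatus"] in HEALTHY)
    # Assumption: a simple majority of the primaries is needed to replicate.
    majority = len(members) // 2 + 1
    return {
        # The page states the value is the same in every row of a database.
        "write_available": all(r["replicationSuccessful"] for r in members),
        "healthy_members": healthy,
        # Healthy members that could still be lost while keeping a majority.
        "tolerated_failures": max(healthy - majority, 0),
    }


# Hypothetical rows for a three-primary database with one member down.
rows = [
    {"database": "neo4j", "serverName": "s1", "replicationSuccessful": True, "memberStatus": "APPLYING"},
    {"database": "neo4j", "serverName": "s2", "replicationSuccessful": True, "memberStatus": "REPLICATING"},
    {"database": "neo4j", "serverName": "s3", "replicationSuccessful": True, "memberStatus": "UNAVAILABLE"},
]
summary = summarize(rows, "neo4j")
```

In this sample the database can still replicate, but losing one more member would drop it below a majority, which is the situation the `memberStatus` column is meant to reveal.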
From af39c66509412243017674ceb347179494986564 Mon Sep 17 00:00:00 2001
From: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
Date: Mon, 23 Sep 2024 14:24:38 +0200
Subject: [PATCH 05/14] Fix headings and their levels

---
 .../clustering/monitoring/status-check.adoc | 56 +++++++++++++------
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index f409840e5..be2c92461 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -1,26 +1,36 @@
-:description: This section describes how to monitor a database's availability with the help of the cluster status check
+:description: This section describes how to monitor a database's availability with the help of the cluster status check procedure.
+
 [role=label--new-5.24 label--enterprise-edition]
-[[database-status-check]]
-== Cluster Status Check
+[[cluster-status-check]]
+= Cluster status check
 
-Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in clustered databases, which in most cases means being able to write to the database. You can also use the procedure to check which members are up-to-date and can participate in a successful replication. Therefore, it is useful in determining the fault-tolerance of a clustered database as well. A third and final function is to determine the leader of the cluster.
+Neo4j 5.24 introduces the xref:reference/procedures.adoc#procedure_dbms_cluster_statusCheck[`dbms.cluster.statusCheck()`] procedure, which can be used to monitor the ability to replicate in clustered databases, which in most cases means being able to write to the database.
+You can also use the procedure to check which members are up-to-date and can participate in a successful replication.
+Therefore, it is useful in determining the fault-tolerance of a clustered database as well.
+A third and final function is to determine the leader of the cluster.
 
 [NOTE]
 ====
 The member on which the procedure is called replicates a dummy transaction in the same cluster as the real transactions, and verifies that it can be replicated and applied.
 
-Since the status check doesn't replicate an actual transaction, it's not guaranteed that the database is write available even though the status check reports that it can replicate. Apart from replication there are other stops in the write path that can potentially block a transaction from being applied, e.g. issues in the database. However, it tells that the cluster is healthy and in most cases that means that the database is write available.
+Since the status check doesn't replicate an actual transaction, it's not guaranteed that the database is write available even though the status check reports that it can replicate.
+Apart from replication there are other stops in the write path that can potentially block a transaction from being applied, e.g. issues in the database.
+However, it tells that the cluster is healthy and in most cases that means that the database is write available.
 ====
 
-=== Syntax
+[[procedure-syntax]]
+== Syntax
 
 [source, shell]
 ----
 CALL dbms.cluster.statusCheck(databases :: LIST, timeoutMilliseconds = null :: INTEGER)
 ----
 
-* *databases:* the list of databases for which the status check should run. Providing an empty list will run the status check for all *clustered* databases on that server, i.e. the status check won't run on singles or secondaries.
-* *timeoutMilliseconds:* specifies how long the replication may take. Default value is 1000 milliseconds. If replication takes longer than this timeout, it will return that replication is unsuccessful.
+* *databases:* the list of databases for which the status check should run.
+Providing an empty list runs the status check for all *clustered* databases on that server, i.e. the status check won't run on singles or secondaries.
+* *timeoutMilliseconds:* specifies how long the replication may take.
+Default value is 1000 milliseconds.
+If replication takes longer than this timeout, it will return that replication is unsuccessful.
 
 
 The procedure returns a row for all primary members of all the requested databases where each row consists of:
@@ -29,23 +39,37 @@ The procedure returns a row for all primary members of all the requested databas
 * *serverId:* the server id of each primary member, which did or did not participate in a successful replication of the `status check entry`.
 * *serverName:* the server name of each primary member.
 * *address:* the bolt address of each primary member.
-* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction. Is `TRUE` if this server managed to replicate the dummy transaction to a majority of raft members within the given timeout. `FALSE` if it failed to replicate within the timeout. The value is the same column-wise. A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.
-* *memberStatus:* shows the status of each primary member. It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`. `APPLYING` means that the member can replicate and is actively applying transactions. `REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions.
+* *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction.
+Is `TRUE` if this server managed to replicate the dummy transaction to a majority of raft members within the given timeout.
+`FALSE` if it failed to replicate within the timeout.
+The value is the same column-wise.
+A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.
+* *memberStatus:* shows the status of each primary member.
+It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`.
+`APPLYING` means that the member can replicate and is actively applying transactions.
+`REPLICATING` means that the member can participate in replicating, but can't apply.
+This state is uncommon, but may happen while waiting for the database to start and accept transactions.
 * *recognisedLeader:* shows the server id of the perceived leader of each primary member.
-* *recognisedLeaderTerm:* shows the term of the perceived leader of each primary member. If the members report different leaders, the one with the highest term should be trusted.
+* *recognisedLeaderTerm:* shows the term of the perceived leader of each primary member.
+If the members report different leaders, the one with the highest term should be trusted.
 * *requester:* is `TRUE` for the server on which the procedure is run, and `FALSE` on the remaining servers.
-* *error:* contains the error message if there is one. An example of an error is that one or more of the requested databases doesn't exist on the requester.
+* *error:* contains the error message if there is one.
+An example of an error is that one or more of the requested databases doesn't exist on the requester.
 
 In general the `replicationSuccessful` field can be used to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not.
 
 [NOTE]
 ====
-Members that are `REPLICATING` are good from a data safety point of view. They can participate in replication and keep the data durably until application. They are also up-to-date and therefore eligible leaders. So they add to the fault-tolerance.
-
-Members that are `APPLYING` have all the qualities of `REPLICATING` members, so they too add to the fault-tolerance. But they are also applying to the database, which is a requirement for writing transactions and reading with bookmarks in a timely manner.
+Members that are `REPLICATING` are good from a data safety point of view.
+They can participate in replication and keep the data durably until application.
+They are also up-to-date and therefore eligible leaders.
+So they add to the fault-tolerance.
 
-Lastly, `UNAVAILABLE` members are either too far behind or unreachable. They are unhealthy and cannot add to the fault-tolerance.
+Members that are `APPLYING` have all the qualities of `REPLICATING` members, so they too add to the fault-tolerance.
+But they are also applying to the database, which is a requirement for writing transactions and reading with bookmarks in a timely manner.
+Lastly, `UNAVAILABLE` members are either too far behind or unreachable.
+They are unhealthy and cannot add to the fault-tolerance.
 
 ====
 
 

From cc15defdf94647711a5b9c8791e4dd28ba269631 Mon Sep 17 00:00:00 2001
From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com>
Date: Wed, 25 Sep 2024 10:43:50 +0200
Subject: [PATCH 06/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc

Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
---
 modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index be2c92461..bcf2036a5 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -38,7 +38,7 @@ The procedure returns a row for all primary members of all the requested databas
 * *database:* the database for which the `status check entry` was replicated.
 * *serverId:* the server id of each primary member, which did or did not participate in a successful replication of the `status check entry`.
 * *serverName:* the server name of each primary member.
-* *address:* the bolt address of each primary member.
+* *address:* the Bolt address of each primary member.
 * *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction.
 Is `TRUE` if this server managed to replicate the dummy transaction to a majority of raft members within the given timeout.
 `FALSE` if it failed to replicate within the timeout.

From f86df3b03eeb36caa92b8952281653f8cdd8d89c Mon Sep 17 00:00:00 2001
From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com>
Date: Wed, 25 Sep 2024 10:43:57 +0200
Subject: [PATCH 07/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc

Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
---
 modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index bcf2036a5..6c75112ea 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -40,7 +40,7 @@ The procedure returns a row for all primary members of all the requested databas
 * *serverName:* the server name of each primary member.
 * *address:* the Bolt address of each primary member.
 * *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction.
-Is `TRUE` if this server managed to replicate the dummy transaction to a majority of raft members within the given timeout.
+** `TRUE` -- if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout.
 `FALSE` if it failed to replicate within the timeout.
 The value is the same column-wise.
 A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.

From b13f78c8d72704161b1c9eb08c854089c8017693 Mon Sep 17 00:00:00 2001
From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com>
Date: Wed, 25 Sep 2024 10:44:16 +0200
Subject: [PATCH 08/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc

Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
---
 modules/ROOT/pages/clustering/monitoring/status-check.adoc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index 6c75112ea..8bd5e31db 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -40,6 +40,7 @@ The procedure returns a row for all primary members of all the requested databas
 * *serverName:* the server name of each primary member.
 * *address:* the Bolt address of each primary member.
 * *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction.
++
 ** `TRUE` -- if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout.
 `FALSE` if it failed to replicate within the timeout.
 The value is the same column-wise.

From 7d1333383a3188b41dc845259a16a92e1b83f161 Mon Sep 17 00:00:00 2001
From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com>
Date: Wed, 25 Sep 2024 10:44:23 +0200
Subject: [PATCH 09/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc

Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
---
 modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index 8bd5e31db..a31659cb9 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -42,7 +42,7 @@ The procedure returns a row for all primary members of all the requested databas
 * *replicationSuccessful:* indicates if the server (on which the procedure is run) can replicate a transaction.
 +
 ** `TRUE` -- if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout.
-`FALSE` if it failed to replicate within the timeout.
+** `FALSE` -- if it failed to replicate within the timeout.
 The value is the same column-wise.
 A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.
 * *memberStatus:* shows the status of each primary member.

From 671e6cf9c867ac978abc979b91963d1251f97987 Mon Sep 17 00:00:00 2001
From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com>
Date: Wed, 25 Sep 2024 10:45:20 +0200
Subject: [PATCH 10/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc

Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
---
 modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index a31659cb9..64946044f 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -44,7 +44,7 @@ The procedure returns a row for all primary members of all the requested databas
 ** `TRUE` -- if this server managed to replicate the dummy transaction to a majority of cluster members within the given timeout.
 ** `FALSE` -- if it failed to replicate within the timeout.
 The value is the same column-wise.
-A failed replication can either mean that there is a real issue in the cluster (e.g. no leader) or it may simply mean that this server is too far behind in apply, and can't therefore replicate.
+A failed replication can either mean a real issue in the cluster (e.g., no leader) or that this server is too far behind in apply and can't replicate.
 * *memberStatus:* shows the status of each primary member.
 It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`.
 `APPLYING` means that the member can replicate and is actively applying transactions.

From 32618f5ca4260f52f58564bcfb8a2aca17fdb06c Mon Sep 17 00:00:00 2001
From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com>
Date: Wed, 25 Sep 2024 10:45:42 +0200
Subject: [PATCH 11/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc

Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com>
---
 modules/ROOT/pages/clustering/monitoring/status-check.adoc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
index 64946044f..8c798050e 100644
--- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc
+++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc
@@ -46,7 +46,8 @@ The procedure returns a row for all primary members of all the requested databas
 The value is the same column-wise.
 A failed replication can either mean a real issue in the cluster (e.g., no leader) or that this server is too far behind in apply and can't replicate.
 * *memberStatus:* shows the status of each primary member.
-It can either be `APPLYING`, `REPLICATING` or `UNAVAILABLE`.
+It can be `APPLYING`, `REPLICATING`, or `UNAVAILABLE`.
++
 `APPLYING` means that the member can replicate and is actively applying transactions.
 `REPLICATING` means that the member can participate in replicating, but can't apply.
 This state is uncommon, but may happen while waiting for the database to start and accept transactions.

From 367035f87ec389a01c15f36c97d84f5ab3cb60f4 Mon Sep 17 00:00:00 2001 From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com> Date: Wed, 25 Sep 2024 10:45:53 +0200 Subject: [PATCH 12/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com> --- modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc index 8c798050e..3248cd7c0 100644 --- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc +++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc @@ -58,7 +58,7 @@ If the members report different leaders, the one with the highest term should be * *error:* contains the error message if there is one. An example of an error is that one or more of the requested databases doesn't exist on the requester. -In general the `replicationSuccessful` field can be used to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not. +In general, you can use the `replicationSuccessful` field to determine overall write-availability, whereas the `memberStatus` field can be checked in order to see whether the database is fault-tolerant or not. 
[NOTE] ==== From b9ef49abd7822baf7435a1eca794cf555121e3bf Mon Sep 17 00:00:00 2001 From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com> Date: Wed, 25 Sep 2024 10:46:11 +0200 Subject: [PATCH 13/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com> --- modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc index 3248cd7c0..ca02b81f7 100644 --- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc +++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc @@ -48,7 +48,7 @@ A failed replication can either mean a real issue in the cluster (e.g., no leade * *memberStatus:* shows the status of each primary member. It can be `APPLYING`, `REPLICATING`, or `UNAVAILABLE`. + -`APPLYING` means that the member can replicate and is actively applying transactions. +** `APPLYING` means that the member can replicate and is actively applying transactions. `REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions. * *recognisedLeader:* shows the server id of the perceived leader of each primary member. 
From d45422c8c2dc31b280358369d4c2d50f59a72ba3 Mon Sep 17 00:00:00 2001 From: Tselmeg Baasan <37698237+tselmegbaasan@users.noreply.github.com> Date: Wed, 25 Sep 2024 10:46:18 +0200 Subject: [PATCH 14/14] Update modules/ROOT/pages/clustering/monitoring/status-check.adoc Co-authored-by: NataliaIvakina <82437520+NataliaIvakina@users.noreply.github.com> --- modules/ROOT/pages/clustering/monitoring/status-check.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/modules/ROOT/pages/clustering/monitoring/status-check.adoc b/modules/ROOT/pages/clustering/monitoring/status-check.adoc index ca02b81f7..036d4d0d0 100644 --- a/modules/ROOT/pages/clustering/monitoring/status-check.adoc +++ b/modules/ROOT/pages/clustering/monitoring/status-check.adoc @@ -49,7 +49,7 @@ A failed replication can either mean a real issue in the cluster (e.g., no leade It can be `APPLYING`, `REPLICATING`, or `UNAVAILABLE`. + ** `APPLYING` means that the member can replicate and is actively applying transactions. -`REPLICATING` means that the member can participate in replicating, but can't apply. +** `REPLICATING` means that the member can participate in replicating, but can't apply. This state is uncommon, but may happen while waiting for the database to start and accept transactions. * *recognisedLeader:* shows the server id of the perceived leader of each primary member. * *recognisedLeaderTerm:* shows the term of the perceived leader of each primary member.