
Review the documentation for the replication API (#756)

* Update replication.md

* Update replication.md
davecramer committed Feb 21, 2017
1 parent 773ee67 commit 3e1eb34d2500eae9fdf4938ebbced4cff886edef
Showing with 23 additions and 35 deletions.
  1. +23 −35 docs/documentation/head/replication.md
@@ -19,10 +19,10 @@ next: thread.html
# Overview
Postgres 9.4 (released in December 2014) introduced a new feature called logical replication. Logical replication allows
changes from a database to be streamed in real time to an external system. The difference between physical replication and
logical replication is that logical replication sends data over in a logical format, whereas physical replication sends data over in a binary format. Additionally, logical replication can send over a single table or database. Physical (binary) replication replicates the entire cluster in an all-or-nothing fashion; that is to say, there is no way to get a specific table or database using binary replication.
Prior to logical replication, keeping an external system synchronized in real time was problematic. The application would have to update/invalidate the appropriate cache entries, reindex the data in your search engine, send it to your analytics system, and so on.
This approach suffers from race conditions and reliability problems. For example, if slightly different data gets written to two different datastores (perhaps due to a bug or a race condition), the contents of the datastores will gradually drift apart: they will become more and more inconsistent over time. Recovering from such gradual data corruption is difficult.
Logical decoding takes the database’s write-ahead log (WAL), and gives us access to row-level change events:
@@ -32,8 +32,7 @@ do not appear in the stream. Thus, if you apply the change events in the same or
transactionally consistent copy of the database. It looks like the Event Sourcing pattern that you previously implemented
in your application, but now it is available out of the box from the PostgreSQL database.
For access to real-time changes, PostgreSQL provides the streaming replication protocol. The replication protocol can be physical or logical. The physical replication protocol is used for Master/Slave replication. The logical replication protocol can be used
to stream changes to an external system.
@@ -85,11 +84,10 @@ host replication all ::1/128 md5
<a name="logical-replication"></a>
# Logical replication
Logical replication uses a replication slot to reserve WAL logs on the server and also defines which decoding plugin to use to decode the WAL logs to the required format; for example, you can decode changes as JSON, protobuf, etc. To demonstrate how to use the pgjdbc replication API, we will use the `test_decoding` plugin that is included in the `postgresql-contrib` package, but you can use your own decoding plugin. There are a few on GitHub which can be used as examples.
In order to use the replication API, the Connection has to be created in replication mode. In this mode the connection cannot
execute SQL commands and can only be used with the replication API. This is a restriction imposed by PostgreSQL.
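Replication mode is requested through ordinary connection properties. The class below is a minimal sketch that builds those properties with plain string keys. It assumes the pgjdbc parameter names `replication`, `assumeMinServerVersion` and `preferQueryMode`, which are the keys behind `PGProperty.REPLICATION`, `PGProperty.ASSUME_MIN_SERVER_VERSION` and `PGProperty.PREFER_QUERY_MODE`. The class name and credentials are placeholders. Opening the connection needs the driver and a running server, so that step is left as a comment.

```java
import java.util.Properties;

final class ReplicationConnectionProps {

    /**
     * Builds properties for a replication-mode connection (hypothetical helper).
     * "database" requests a logical replication connection, "true" a physical one.
     */
    static Properties build(String user, String password, boolean logical) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Walsender mode: "database" for logical replication, "true" for physical.
        props.setProperty("replication", logical ? "database" : "true");
        // Assume at least 9.4, the first version with logical decoding.
        props.setProperty("assumeMinServerVersion", "9.4");
        // In replication mode only the simple query protocol can be used.
        props.setProperty("preferQueryMode", "simple");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("postgres", "postgres", true);
        // With pgjdbc on the classpath and a running server (assumption):
        // Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/test", props);
        // PGConnection replConnection = con.unwrap(PGConnection.class);
        System.out.println(props.getProperty("replication"));
    }
}
```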
**Example 9.4. Create replication connection.**
@@ -105,7 +103,7 @@ execute any kinds of sql, this connection can work only with replication API. It
PGConnection replConnection = con.unwrap(PGConnection.class);
```
The entire replication API is grouped in `org.postgresql.replication.PGReplicationConnection` and is available
via `org.postgresql.PGConnection#getReplicationAPI`.
Before you can start the replication protocol, you need to have a replication slot, which can also be created via the pgjdbc API.
@@ -121,7 +119,7 @@ Before you can start replication protocol, you need to have replication slot, wh
.make();
```
Once we have the replication slot, we can create a ReplicationStream.
**Example 9.6. Create logical replication stream.**
@@ -137,8 +135,7 @@ Once we have the replication slot, we can create ReplicationStream.
```
The replication stream will send all changes since the creation of the replication slot, or from the replication slot's
restart LSN if the slot was already used for replication. You can also start streaming changes from a particular LSN position; in that case the LSN position should be specified when you create the replication stream.
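An LSN is a 64-bit position in the WAL. PostgreSQL prints it as two hexadecimal halves separated by a slash, for example `16/B374D848`. pgjdbc wraps this value in `org.postgresql.replication.LogSequenceNumber` (`valueOf`, `asLong`, `asString`), and that class is what you would pass when creating the stream. The standalone helper below only illustrates the conversion between the two forms, and its class name is hypothetical.

```java
final class LsnFormat {

    /** Parses "X/Y" (high and low 32 bits in hex) into a 64-bit WAL position. */
    static long parse(String text) {
        int slash = text.indexOf('/');
        if (slash <= 0 || slash == text.length() - 1) {
            throw new IllegalArgumentException("Not an LSN: " + text);
        }
        long high = Long.parseLong(text.substring(0, slash), 16);
        long low = Long.parseLong(text.substring(slash + 1), 16);
        if (high > 0xFFFFFFFFL || low > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("LSN half out of range: " + text);
        }
        return (high << 32) | low;
    }

    /** Formats a 64-bit WAL position the way PostgreSQL prints it. */
    static String format(long lsn) {
        return String.format("%X/%X", lsn >>> 32, lsn & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long lsn = parse("16/B374D848");
        System.out.println(Long.toHexString(lsn)); // 16b374d848
        System.out.println(format(lsn));           // 16/B374D848
    }
}
```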
**Example 9.7. Create logical replication stream from particular position.**
@@ -176,12 +173,10 @@ table public.test_logic_table: INSERT: pk[integer]:1 name[character varying]:'pr
COMMIT
```
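Each `test_decoding` message is a single line of text. The helper below is hypothetical and splits a row-change line into its table, operation and column data. It assumes the `table <name>: <OPERATION>: <columns>` layout shown in the output above. `BEGIN`/`COMMIT` lines yield no change. The column value in the usage is illustrative only.

```java
import java.util.Optional;

final class TestDecodingLine {

    /** One row-level change parsed from a test_decoding message (hypothetical type). */
    static final class Change {
        final String table;
        final String operation;
        final String columns;

        Change(String table, String operation, String columns) {
            this.table = table;
            this.operation = operation;
            this.columns = columns;
        }
    }

    /** Returns the parsed change, or empty for BEGIN/COMMIT and other non-row lines. */
    static Optional<Change> parse(String line) {
        if (!line.startsWith("table ")) {
            return Optional.empty();
        }
        String rest = line.substring("table ".length());
        int tableEnd = rest.indexOf(": ");
        if (tableEnd < 0) {
            return Optional.empty();
        }
        String table = rest.substring(0, tableEnd);
        String afterTable = rest.substring(tableEnd + 2);
        int opEnd = afterTable.indexOf(':');
        if (opEnd < 0) {
            return Optional.empty();
        }
        String operation = afterTable.substring(0, opEnd);
        String columns = afterTable.substring(opEnd + 1).trim();
        return Optional.of(new Change(table, operation, columns));
    }

    public static void main(String[] args) {
        String[] messages = {
            "BEGIN",
            "table public.test_logic_table: INSERT: pk[integer]:1 name[character varying]:'first value'",
            "COMMIT"
        };
        for (String m : messages) {
            parse(m).ifPresent(c -> System.out.println(c.operation + " on " + c.table));
        }
    }
}
```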
During replication, the database and the consumer periodically exchange ping messages. When the database or the client does not receive
a ping message within the configured timeout, replication is deemed to have stopped: an exception is thrown and the database frees its resources. In PostgreSQL the ping timeout is configured by the property `wal_sender_timeout` (default = 60 seconds).
The replication stream in pgjdbc can be configured to send feedback (ping) when required or at a time interval.
It is recommended to send feedback (ping) to the database more often than the configured `wal_sender_timeout`. In production I use a value equal to `wal_sender_timeout / 3`. This avoids potential problems with the network and allows changes to be streamed without disconnects due to timeouts. To specify the feedback interval, use the `withStatusInterval` method.
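The `wal_sender_timeout / 3` rule of thumb is simple arithmetic. The helper below is hypothetical and derives the interval from the server's timeout. With the default 60 seconds it yields the 20 seconds used in the next example. A timeout of 0 disables the server-side check, so the sketch rejects it rather than guessing an interval.

```java
import java.time.Duration;

final class StatusInterval {

    /** Feedback interval of one third of wal_sender_timeout (hypothetical helper). */
    static Duration fromWalSenderTimeout(Duration walSenderTimeout) {
        if (walSenderTimeout.isZero() || walSenderTimeout.isNegative()) {
            // wal_sender_timeout = 0 disables the timeout, so the rule does not apply.
            throw new IllegalArgumentException("wal_sender_timeout must be positive");
        }
        return walSenderTimeout.dividedBy(3);
    }

    public static void main(String[] args) {
        Duration interval = fromWalSenderTimeout(Duration.ofSeconds(60));
        // Would then be passed to the stream builder, e.g.
        // .withStatusInterval((int) interval.getSeconds(), TimeUnit.SECONDS)
        System.out.println(interval.getSeconds()); // 20
    }
}
```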
**Example 9.10. Replication stream with configured feedback interval equal to 20 sec**
@@ -200,10 +195,8 @@ disconnects by timeout. To specify the feedback interval use `withStatusInterval
After creating the `PGReplicationStream`, it's time to start receiving changes in real time. Changes can be received from the
stream as blocking (`org.postgresql.replication.PGReplicationStream#read`)
or as non-blocking (`org.postgresql.replication.PGReplicationStream#readPending`).
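With the non-blocking variant, `readPending` returns `null` when no message is available, so callers typically poll with a short sleep. The sketch below abstracts the stream behind a `java.util.function.Supplier<ByteBuffer>` so that it runs without a server. In real code the supplier would be a lambda wrapping `stream.readPending()`, which throws `SQLException` and would need wrapping. The class name and the back-off value are illustrative assumptions.

```java
import java.nio.ByteBuffer;
import java.util.function.Supplier;

final class PendingPoller {

    /**
     * Polls a non-blocking source until it yields a message or maxAttempts is used up.
     * Returns null if nothing arrived or the thread was interrupted.
     */
    static ByteBuffer poll(Supplier<ByteBuffer> readPending, long sleepMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            ByteBuffer msg = readPending.get();
            if (msg != null) {
                return msg;
            }
            try {
                Thread.sleep(sleepMillis); // nothing pending: back off instead of spinning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Fake source: empty twice, then one message (stands in for stream.readPending()).
        int[] calls = {0};
        Supplier<ByteBuffer> fake = () -> ++calls[0] < 3 ? null : ByteBuffer.wrap("BEGIN".getBytes());
        ByteBuffer msg = poll(fake, 10, 5);
        System.out.println(msg != null ? "received after " + calls[0] + " calls" : "nothing");
    }
}
```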
Both methods receive changes as a `java.nio.ByteBuffer` with the payload from the output plugin. We can't receive
part of a message, only the full message that was sent by the output plugin. The ByteBuffer contains the message in the format defined by the decoding output plugin; it can be a simple String, JSON, or whatever the plugin determines. That is why pgjdbc returns the raw ByteBuffer instead of making assumptions.
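Because the payload is raw bytes, turning it into text is the consumer's job. The sketch below does this for a text-producing plugin such as `test_decoding`. It assumes the server encoding is UTF-8. It reads the buffer's remaining bytes through a duplicate, so it does not depend on the buffer being array-backed and leaves the caller's position untouched. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class PayloadDecoder {

    /** Copies the remaining bytes of a message and decodes them as UTF-8 text. */
    static String decodeUtf8(ByteBuffer message) {
        // duplicate() shares the content but has its own position, so the caller's buffer is not consumed
        ByteBuffer view = message.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer msg = ByteBuffer.wrap("BEGIN".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeUtf8(msg)); // BEGIN
    }
}
```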
**Example 9.11. Example of sending a message from an output plugin.**
@@ -232,17 +225,15 @@ OutputPluginWrite(ctx, true);
}
```
As mentioned previously, the replication stream should periodically send feedback to the database to prevent disconnection due to
timeout. Feedback is automatically sent when `read` or `readPending` is called, if it's time to send feedback. Feedback can also be sent via `org.postgresql.replication.PGReplicationStream#forceUpdateStatus()` regardless of the timeout. Another important duty of feedback is to provide the server with the Log Sequence Number (LSN) that has been successfully received and applied by the consumer; this is necessary for monitoring and for truncating/archiving WAL files that are no longer needed. In the event that replication is restarted, it will start from the last successfully processed LSN that was sent via feedback to the database.
The API provides the following feedback mechanism to indicate the LSN successfully applied by the current consumer; WAL before this position can be truncated or archived:
`org.postgresql.replication.PGReplicationStream#setFlushedLSN` and
`org.postgresql.replication.PGReplicationStream#setAppliedLSN`. You can always get the last received LSN via
`org.postgresql.replication.PGReplicationStream#getLastReceiveLSN`.
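The bookkeeping behind this feedback can be modelled in a few lines. The class below is purely hypothetical and does not talk to a server. It records the last received LSN and advances the confirmed position only once the consumer reports a change as processed. That confirmed position is where a restarted stream would resume, under the behaviour described above. LSNs are held as plain `long` values, as `LogSequenceNumber#asLong` would return them.

```java
final class FeedbackState {

    private long lastReceived; // stands in for getLastReceiveLSN()
    private long confirmed;    // what setAppliedLSN/setFlushedLSN would report

    /** Records an LSN as received from the stream. */
    void received(long lsn) {
        lastReceived = Math.max(lastReceived, lsn);
    }

    /** Confirms that everything up to lsn has been processed by the consumer. */
    void processed(long lsn) {
        if (lsn > lastReceived) {
            throw new IllegalArgumentException("LSN " + lsn + " has not been received yet");
        }
        confirmed = Math.max(confirmed, lsn);
    }

    /** Position a restarted stream would resume from in this model. */
    long restartPosition() {
        return confirmed;
    }

    public static void main(String[] args) {
        FeedbackState state = new FeedbackState();
        state.received(100);
        state.received(200);
        state.processed(100); // 200 has arrived but is not yet applied
        System.out.println(state.restartPosition()); // 100
    }
}
```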
**Example 9.13. Add feedback indicating a successfully processed LSN**
```java
while (true) {
@@ -265,7 +256,7 @@ For say database which LSN successfully applied on current consumer and can be t
}
```
**Example 9.14. Full example of logical replication**
```java
String url = "jdbc:postgresql://localhost:5432/test";
@@ -330,7 +321,7 @@ For say database which LSN successfully applied on current consumer and can be t
}
```
The output looks like this, where each line is a separate message.
```
BEGIN
@@ -347,11 +338,8 @@ COMMIT
<a name="physical-replication"></a>
# Physical replication
The API for physical replication looks like the API for logical replication. Physical replication does not require a replication
slot, and the ByteBuffer will contain the binary form of the WAL logs. The binary WAL format is a very low-level API and can change from version to version. That is why replication between different major PostgreSQL versions is not possible. But physical replication can contain a lot of important data that is not available via logical replication. That is why pgjdbc contains an implementation for both.
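The binary WAL format is tied to the major version, so a consumer that interprets raw WAL bytes would typically key its parsing on the server's major version. The sketch below derives the major version from `server_version_num`, for example `90605` for 9.6.5 and `100001` for 10.1. How that number is obtained is left out, and the class and method names are hypothetical.

```java
final class MajorVersion {

    /** Major version from server_version_num: "9.6" for 90605, "10" for 100001. */
    static String of(int serverVersionNum) {
        if (serverVersionNum >= 100000) {
            // PostgreSQL 10 and later: the major version is a single number.
            return String.valueOf(serverVersionNum / 10000);
        }
        // Before 10: the first two components form the major version (9.4, 9.6, ...).
        return (serverVersionNum / 10000) + "." + (serverVersionNum / 100 % 100);
    }

    /** True when both servers share a major version, i.e. the same binary WAL format. */
    static boolean sameMajor(int a, int b) {
        return of(a).equals(of(b));
    }

    public static void main(String[] args) {
        System.out.println(of(90400));                // 9.4
        System.out.println(sameMajor(90605, 100001)); // false
    }
}
```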
**Example 9.15. Use physical replication**
