feat: add replication protocol API #550
Conversation
Awesome! Thanks for this. I'll try to find some time to review soon
The problem scenario for the logical replication protocol looks like this:
PGConnection pgConnection = (PGConnection) replConnection;
LogSequenceNumber startLSN = getCurrentLSN();
Statement st = sqlConnection.createStatement();
st.execute("insert into test_logic_table(name) values('message to repeat')");
st.close();
PGReplicationStream stream =
pgConnection
.replicationStream()
.logical()
.withSlotName(SLOT_NAME)
.withStartPosition(startLSN)
.withSlotOption("include-xids", false)
.withSlotOption("skip-empty-xacts", true)
.start();
List<String> result = new ArrayList<String>();
result.addAll(receiveMessage(stream, 3));
stream.close();
Logs:
As you can see, after the driver receives CopyDone from the backend, the backend still sends KeepAlive messages. But the same scenario on physical replication works fine.
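For reference, the snippet above relies on two helpers that the comment does not show. A minimal, hypothetical reconstruction follows (the WAL-position function name is the 9.4-9.6 one; the real helpers may differ):

private LogSequenceNumber getCurrentLSN() throws SQLException {
  Statement st = sqlConnection.createStatement();
  try {
    ResultSet rs = st.executeQuery("SELECT pg_current_xlog_location()");
    rs.next();
    return LogSequenceNumber.valueOf(rs.getString(1));
  } finally {
    st.close();
  }
}

private List<String> receiveMessage(PGReplicationStream stream, int count) throws SQLException {
  List<String> result = new ArrayList<String>(count);
  for (int i = 0; i < count; i++) {
    ByteBuffer buffer = stream.read();       // blocks until the next message arrives
    byte[] payload = new byte[buffer.remaining()];
    buffer.get(payload);
    result.add(new String(payload));         // test_decoding emits textual changes
  }
  return result;
}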
@davecramer do you think the problem described above is a PostgreSQL server bug or a bug in the driver implementation?
@Gordiychuk I'd suggest posting to hackers... I don't know enough right now to comment
@Gordiychuk I suspect this is a bug with the driver implementation. Replication is fairly well tested
After some code analysis, I found the problem in PostgreSQL, and I'm preparing a patch right now. Inside the walsender, WalSndLoop contains the following check:
/*
* If we don't have any pending data in the output buffer, try to send
* some more. If there is some, we don't bother to call send_data
* again until we've flushed it ... but we'd better assume we are not
* caught up.
*/
if (!pq_is_send_pending())
send_data();
else
WalSndCaughtUp = false;
send_data() here executes a callback that transforms WAL records and sends them. For logical replication this ends up executing:
/*
* read_page callback for logical decoding contexts, as a walsender process.
*
* Inside the walsender we can do better than logical_read_local_xlog_page,
* which has to do a plain sleep/busy loop, because the walsender's latch gets
* set everytime WAL is flushed.
*/
static int
logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)
and this code starts a long loop that waits for WAL to become available to send:
/*
* Wait till WAL < loc is flushed to disk so it can be safely read.
*/
static XLogRecPtr
WalSndWaitForWal(XLogRecPtr loc)
{
int wakeEvents;
static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
/*
* Fast path to avoid acquiring the spinlock in case we already know we
* have enough WAL available. This is particularly interesting if we're
* far behind.
*/
if (RecentFlushPtr != InvalidXLogRecPtr &&
loc <= RecentFlushPtr)
return RecentFlushPtr;
/* Get a more recent flush pointer. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr();
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
for (;;)
{
long sleeptime;
TimestampTz now;
/*
* Emergency bailout if postmaster has died. This is to avoid the
* necessity for manual cleanup of all postmaster children.
*/
if (!PostmasterIsAlive())
exit(1);
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
CHECK_FOR_INTERRUPTS();
/* Process any requests or signals received recently */
if (got_SIGHUP)
{
got_SIGHUP = false;
ProcessConfigFile(PGC_SIGHUP);
SyncRepInitConfig();
}
/* Check for input from the client */
ProcessRepliesIfAny();
/* Update our idea of the currently flushed position. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr();
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
/*
* If postmaster asked us to stop, don't wait here anymore. This will
* cause the xlogreader to return without reading a full record, which
* is the fastest way to reach the mainloop which then can quit.
*
* It's important to do this check after the recomputation of
* RecentFlushPtr, so we can send all remaining data before shutting
* down.
*/
if (walsender_ready_to_stop)
break;
/*
* We only send regular messages to the client for full decoded
* transactions, but a synchronous replication and walsender shutdown
* possibly are waiting for a later location. So we send pings
* containing the flush location every now and then.
*/
if (MyWalSnd->flush < sentPtr &&
MyWalSnd->write < sentPtr &&
!waiting_for_ping_response)
{
WalSndKeepalive(false);
waiting_for_ping_response = true;
}
/* check whether we're done */
if (loc <= RecentFlushPtr)
break;
/* Waiting for new WAL. Since we need to wait, we're now caught up. */
WalSndCaughtUp = true;
/*
* Try to flush pending output to the client. Also wait for the socket
* becoming writable, if there's still pending output after an attempt
* to flush. Otherwise we might just sit on output data while waiting
* for new WAL being generated.
*/
if (pq_flush_if_writable() != 0)
WalSndShutdown();
now = GetCurrentTimestamp();
/* die if timeout was reached */
WalSndCheckTimeOut(now);
/* Send keepalive if the time has come */
WalSndKeepaliveIfNecessary(now);
sleeptime = WalSndComputeSleeptime(now);
wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
WL_SOCKET_READABLE | WL_TIMEOUT;
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
/* Sleep until something happens or we time out */
WaitLatchOrSocket(MyLatch, wakeEvents,
MyProcPort->sock, sleeptime);
}
/* reactivate latch so WalSndLoop knows to continue */
SetLatch(MyLatch);
return RecentFlushPtr;
}
In this cycle, after ProcessRepliesIfAny() executes, we receive the CopyDone command and reply with CopyDone, but then we keep waiting for WAL, sending WAL, and sending keepalives, ignoring the streamingDoneReceiving and streamingDoneSending flags. That's why the same test works fine for physical replication but fails with a timeout for logical replication.
The second problem is that after PostgreSQL replies to CopyDone, it still sends CopyData messages; this problem is present for both logical and physical replication, and I also want to fix it.
/* Check for input from the client */
ProcessRepliesIfAny();
/*
* If we have received CopyDone from the client, sent CopyDone
* ourselves, and the output buffer is empty, it's time to exit
* streaming.
*/
if (!pq_is_send_pending() && streamingDoneSending && streamingDoneReceiving)
break;
/*
* If we don't have any pending data in the output buffer, try to send
* some more. If there is some, we don't bother to call send_data
* again until we've flushed it ... but we'd better assume we are not
* caught up.
*/
if (!pq_is_send_pending())
send_data();
else
WalSndCaughtUp = false;
/* Try to flush pending output to the client */
if (pq_flush_if_writable() != 0)
WalSndShutdown();
/* If nothing remains to be sent right now ... */
if (WalSndCaughtUp && !pq_is_send_pending())
{
/*
* If we're in catchup state, move to streaming. This is an
* important state change for users to know about, since before
* this point data loss might occur if the primary dies and we
* need to failover to the standby. The state change is also
* important for synchronous replication, since commits that
* started to wait at that point might wait for some time.
*/
if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
{
ereport(DEBUG1,
(errmsg("standby \"%s\" has now caught up with primary",
application_name)));
WalSndSetState(WALSNDSTATE_STREAMING);
}
/*
* When SIGUSR2 arrives, we send any outstanding logs up to the
* shutdown checkpoint record (i.e., the latest record), wait for
* them to be replicated to the standby, and exit. This may be a
* normal termination at shutdown, or a promotion, the walsender
* is not sure which.
*/
if (walsender_ready_to_stop)
WalSndDone(send_data);
}
now = GetCurrentTimestamp();
/* Check for replication timeout. */
WalSndCheckTimeOut(now);
/* Send keepalive if the time has come */
WalSndKeepaliveIfNecessary(now);
In ProcessRepliesIfAny() we receive CopyDone and send CopyDone as a response. That message is still present in the output buffer, so the condition
if (!pq_is_send_pending() && streamingDoneSending && streamingDoneReceiving)
break;
evaluates to false, and we execute the following functions even when streamingDoneSending and streamingDoneReceiving are both true:
/* Check for replication timeout. */
WalSndCheckTimeOut(now);
/* Send keepalive if the time has come */
WalSndKeepaliveIfNecessary(now);
I would suggest posting this to hackers to get their take first
Logical decoding ignores messages that the receiver may send in response to XLogData while it is decoding WAL. So during a big transaction, for example one that changes 1 million records, this can lead to two problems:
1. The receiver can disconnect the server because the server does not respond to a keepalive message that carries the required-response marker.
2. The receiver can't stop replication until the whole transaction has been sent to it.

Not being able to stop replication is the main problem, because the receiver will fail with a timeout while stopping replication, and the backend will also generate a lot of unnecessary network traffic. This problem was found while implementing the physical/logical replication protocol in the pgjdbc driver (pgjdbc/pgjdbc#550). It breaks the scenario where a WAL consumer receives decoded WAL and pushes it to an external system asynchronously: if a problem occurs, a callback reports which LSN failed, so we can roll back to the last successfully processed LSN and restart logical replication from that place.

I measured stopping replication with and without the fix using this test.

For physical replication:

LogSequenceNumber startLSN = getCurrentLSN();
Statement st = sqlConnection.createStatement();
st.execute("insert into test_logic_table\n"
    + " select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
st.close();
long start = System.nanoTime();
PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .physical()
        .withStartPosition(startLSN)
        .start();
//read single message
stream.read();
long startStopping = System.nanoTime();
stream.close();
long now = System.nanoTime();
long startAndStopTime = now - start;
long stopTime = now - startStopping;
System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

For logical replication:

LogSequenceNumber startLSN = getCurrentLSN();
Statement st = sqlConnection.createStatement();
st.execute("insert into test_logic_table\n"
    + " select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
st.close();
long start = System.nanoTime();
PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .logical()
        .withSlotName(SLOT_NAME)
        .withStartPosition(startLSN)
        .withSlotOption("include-xids", false)
        .withSlotOption("skip-empty-xacts", true)
        .start();
//read single message
stream.read();
long startStopping = System.nanoTime();
stream.close();
long now = System.nanoTime();
long startAndStopTime = now - start;
long stopTime = now - startStopping;
System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

And got the following timings:

Before
-----
logical start and stopping: 15446ms
logical stopping: 13820ms
physical start and stopping: 462ms
physical stopping: 348ms

After
-----
logical start and stopping: 2424ms
logical stopping: 26ms
physical start and stopping: 458ms
physical stopping: 329ms

As you can see, it now allows stopping logical replication very quickly. To achieve this, we now check replies first and only after that send decoded data. After receiving CopyDone from the frontend, we stop decoding as soon as possible. The second part of the fix disables sending keepalive messages to the frontend once CopyDone has already been received.
Awesome. I'm really excited to get this in. I'd like to get some clarification around the applied/flushed LSN handling. In this code snippet, there is no handling of flushed or applied LSN:
while (true) {
ByteBuffer read = stream.read();
//process binary WAL logs
}
I'm trying to grok how manually handling the LSN data would work. Something like this:
while (true) {
ByteBuffer read = stream.read();
// this is the current LSN of the copy data read above?
LogSequenceNumber lsn = stream.getLastReceiveLSN();
// do some external async processing
processor.doAsyncProcessing(lsn, read);
// for status updates to the server
// get LSN apply/flush data from the async processor
stream.setAppliedLSN(processor.getLastAppliedLSN());
stream.setFlushedLSN(processor.getLastFlushedLSN());
}
Is that about right? Let me know if you'd like any help. I'm really interested in this feature. It would really help simplify the current C-based system we use for this today.
@jarreds Yes. And for logical decoding it can be skipped, because that parameter is used only for physical replication. I also think we need a method to check whether the stream is active and whether pending messages are present in the stream, because I think iterating in a loop forever is not always convenient.
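A minimal sketch of the resulting loop under that answer (the API methods are those shown in the snippets above; processor and its methods remain jarreds' hypothetical async consumer):

while (true) {
  ByteBuffer read = stream.read();
  LogSequenceNumber lsn = stream.getLastReceiveLSN(); // LSN of the message just read
  processor.doAsyncProcessing(lsn, read);             // hand off to the external system
  // report progress back to the server; setAppliedLSN is skipped for logical decoding
  stream.setFlushedLSN(processor.getLastFlushedLSN());
}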
@davecramer I've posted a patch describing the problem to pgsql-hackers: http://www.postgresql.org/message-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w@mail.gmail.com
Current coverage is 64.08% (diff: 80.72%)

@@           master     #550    diff @@
==========================================
  Files         151      165     +14
  Lines       14787    15129    +342
  Methods         0        0
  Messages        0        0
  Branches     2934     2978     +44
==========================================
+ Hits         9366     9695    +329
+ Misses       4213     4200     -13
- Partials     1208     1234     +26
@Gordiychuk So given that the server won't fix their end until 9.7, where does that leave us? On another note, Travis is failing on a few things: checkstyle, and the test user does not have replication privileges. Checkstyle should be relatively easy to fix; you can run mvn checkstyle:check to find the errors. Your tests may have to run as user postgres to get them to work...
@davecramer No, we can deliver the solution with the assumption that org.postgresql.replication.PGReplicationStream#close can take a long time; as a workaround, the Connection#close method can be used until the fixes are included in PostgreSQL.
Sorry, I don't have much time this week. I am planning to finish the API and fix the CI problems this weekend. After that I'll start the implementation for issue #558. I think they should be merged to master together.
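A sketch of that workaround, assuming stream.close() surfaces the hang as an exception (e.g. a socket timeout) rather than blocking forever:

void stopReplication(PGReplicationStream stream, Connection replConnection) throws SQLException {
  try {
    stream.close();           // may take a long time on unpatched servers
  } catch (SQLException e) {
    // fall back: closing the physical connection aborts the stream immediately
    replConnection.close();
  }
}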
@Gordiychuk Thanks! No rush
addons:
  postgresql: "9.4"
sudo: required
dist: trusty
-1 for converting lots of jobs to sudo: required. It will make Travis a lot slower.
The user in the tests now needs the replication privilege, and postgres must be configured to accept replication connections; this is necessary to test the replication protocol.
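For context, opening such a walsender-mode connection from the driver looks roughly like this; the property names follow later pgjdbc documentation and should be treated as assumptions at this point in the review:

Properties props = new Properties();
props.setProperty("user", "postgres");               // a role with the REPLICATION attribute
props.setProperty("replication", "database");        // request a walsender-mode session
props.setProperty("assumeMinServerVersion", "9.4");  // replication support needs 9.4+
props.setProperty("preferQueryMode", "simple");      // replication uses the simple protocol
Connection replConnection =
    DriverManager.getConnection("jdbc:postgresql://localhost/test", props);
PGConnection pgConnection = (PGConnection) replConnection;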
The replication protocol uses the CopyBothResponse packet to initiate the bidirectional copy protocol. WAL data is then sent as a series of CopyData messages, with periodic exchange of KeepAlive packets.
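A sketch of dispatching the two CopyData payload kinds named above, following the documented message layout ('w' = XLogData, 'k' = primary keepalive); the buffer is assumed to be positioned at the tag byte:

void dispatch(ByteBuffer msg) {
  switch (msg.get()) {
    case 'w': {                          // XLogData
      long startLsn = msg.getLong();     // WAL position where this payload starts
      long serverWalEnd = msg.getLong(); // current end of WAL on the server
      long serverTime = msg.getLong();   // microseconds since 2000-01-01
      // the remaining bytes are the WAL payload itself
      break;
    }
    case 'k': {                          // primary keepalive
      long serverWalEnd = msg.getLong();
      long serverTime = msg.getLong();
      boolean replyRequested = msg.get() != 0; // 1 = standby must answer now
      break;
    }
  }
}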
Replication for protocol version 3 works via the Copy API; the replication protocol is not supported for protocol version 2. The main class for working with the replication protocol is PGReplicationStream. This class hides the low-level replication protocol and the periodic status update messages, and lets you work only with the payload. The current implementation runs into a logical replication protocol bug: after the ReplicationStream is closed, the backend does not send the CommandComplete and ReadyForQuery packets. As a result, this bug breaks the scenario where WAL records are fetched from the replication stream and sent to another system (for example Elasticsearch) asynchronously: after the first problem during an asynchronous send, the replication stream is closed and replication restarts from the last successfully sent WAL record.

Example logical API:

PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .logical()
        .withSlotName("test_decoding")
        .withSlotOption("include-xids", false)
        .withSlotOption("skip-empty-xacts", true)
        .start();
while (true) {
  ByteBuffer buffer = stream.read();
  //process logical changes
}

Example physical API:

LogSequenceNumber lsn = getCurrentLSN();
PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .physical()
        .withStartPosition(lsn)
        .start();
while (true) {
  ByteBuffer read = stream.read();
  //process binary WAL logs
}
…cking
Periodically checking the replication stream allows, for example, processing asynchronous feedback from the external system that accepts changes from the logical replication stream much faster. Waiting for new changes may take a long time, which prevents receiving feedback messages from the external system; such a message can report a failure, after which we should restart replication from the failure position, but we can't while new changes are absent in the replication stream. Example use:

PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .logical()
        .withSlotName(SLOT_NAME)
        .withStartPosition(getCurrentLSN())
        .start();
while (true) {
  ByteBuffer result = stream.readPending();
  if (result == null) {
    TimeUnit.MILLISECONDS.sleep(10);
    continue;
  }
  //process message
}
org.postgresql.replication.PGReplication is necessary for extending the replication commands in the future. Example: create a replication slot and start logical replication:

pgConnection
    .getReplicationAPI()
    .createReplicationSlot()
    .logical()
    .withSlotName("mySlot")
    .withOutputPlugin("test_decoding")
    .make();

PGReplicationStream stream =
    pgConnection
        .getReplicationAPI()
        .replicationStream()
        .logical()
        .withSlotName("mySlot")
        .withSlotOption("include-xids", false)
        .withSlotOption("skip-empty-xacts", true)
        .start();
When the first message received from the server is a keepalive, we were setting the last receive LSN to the last LSN position on the server; as a result, when replying to the keepalive we could lose reserved WAL. Also, while processing XLogData we can't use the last server LSN from the message.
The official documentation says that the status update should send back the LSN plus the offset that was written or applied to disk: "The location of the last WAL byte + 1 received and written to disk in the standby." In our case it means that the last receive LSN should also include the payload offset.
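A sketch of the rule in that quote, assuming an XLogData buffer positioned right after the tag byte: the position reported back must cover the payload that was just consumed.

long startLsn = msg.getLong();       // WAL position where this payload starts
long serverWalEnd = msg.getLong();   // not used for feedback on XLogData
long serverTime = msg.getLong();
int payloadLength = msg.remaining();
// "last WAL byte + 1 received and written to disk": include the payload offset
long lastReceivedLsn = startLsn + payloadLength;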
Checking max_wal_senders is not enough to decide whether to run the replication tests, because replication can be configured on the server while the test user doesn't have grants to use replication. To deal with this scenario, we now check not only that replication is configured but also that the user has replication grants.
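One way such a check could look (a sketch; the query and method name are assumptions, not the PR's actual code):

boolean canRunReplicationTests(Connection con) throws SQLException {
  Statement st = con.createStatement();
  try {
    ResultSet rs = st.executeQuery(
        "SELECT current_setting('max_wal_senders')::int > 0 AS configured, "
      + "       rolreplication AS privileged "
      + "  FROM pg_roles WHERE rolname = current_user");
    rs.next();
    // run the tests only if replication is configured AND the user may use it
    return rs.getBoolean("configured") && rs.getBoolean("privileged");
  } finally {
    st.close();
  }
}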
import java.util.Queue;

public class CopyDualImpl extends CopyOperationImpl implements CopyDual {
  private Queue<byte[]> received = new LinkedList<byte[]>();
Just for the history: ArrayDeque should be preferred over LinkedList.
@Gordiychuk, thanks for pushing it forward.
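For the record, the suggested swap is a one-liner; ArrayDeque offers the same FIFO Queue behavior without per-node allocation:

private Queue<byte[]> received = new ArrayDeque<byte[]>();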
The replication protocol is managed by PGReplicationStream. It hides the low-level replication protocol details and lets the end user deal with just the payload data. The entry point is `PGConnection#getReplicationAPI`. The current PostgreSQL backend has issues with terminating a replication connection (e.g. the "stop decode" message might be ignored for a while, so termination can take some time). The relevant hackers thread is https://www.postgresql.org/message-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w%40mail.gmail.com

Logical replication API:

PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .logical()
        .withSlotName("test_decoding")
        .withSlotOption("include-xids", false)
        .withSlotOption("skip-empty-xacts", true)
        .start();
while (true) {
  ByteBuffer buffer = stream.read();
  //process logical changes
}

Physical replication API:

LogSequenceNumber lsn = getCurrentLSN();
PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .physical()
        .withStartPosition(lsn)
        .start();
while (true) {
  ByteBuffer read = stream.read();
  //process binary WAL logs
}

The main purpose of supporting the replication protocol at the driver level is to provide the ability to create realtime integration with external systems (e.g. Kafka+ElasticSearch).
I know this is already merged, but is V2ReplicationProtocol.java necessary? The driver already dropped support for the v2 protocol, and it looks like V2ReplicationProtocol.java is not used anywhere.
@jorsol, it seems it can be dropped. I just did not catch that.
* docs: Replication protocol. The PR contains a sample of using the replication protocol implemented in PR pgjdbc/pgjdbc#550
* Update replication.md. @Gordyichuck, hopefully I haven't changed the essence?
* Move replication to extension sub group
@Gordiychuk I'd like to add AutoCloseable to both PGReplicationStream and PGReplicationConnection. Thoughts?
@davecramer good idea
AutoCloseable is a Java 7 feature; will support for Java 6 be dropped in the near future?
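If the project does move to Java 7, the stream could then participate in try-with-resources; a sketch assuming PGReplicationStream implements AutoCloseable:

try (PGReplicationStream stream = pgConnection
        .getReplicationAPI()
        .replicationStream()
        .logical()
        .withSlotName("mySlot")
        .withSlotOption("include-xids", false)
        .start()) {
  ByteBuffer buffer = stream.read();
  // process the payload; stream.close() runs automatically on scope exit
}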
Replication for protocol version 3 works via the Copy API. The replication protocol is not supported for protocol version 2.
The main class used is PGReplicationStream. This class encapsulates the low-level replication protocol and the periodic status update messages. The output of this is the decode plugin payload.
The current implementation runs into a logical replication protocol bug:
after the ReplicationStream is closed, the backend does not send the CommandComplete and ReadyForQuery packets.
As a result, this bug breaks the scenario where WAL records are fetched from the replication stream and sent to another system (for example Elasticsearch) asynchronously: after the first problem during an asynchronous send, the replication stream is closed and replication restarts from the last successfully sent WAL record.
Example logical API:

PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .logical()
        .withSlotName("test_decoding")
        .withSlotOption("include-xids", false)
        .withSlotOption("skip-empty-xacts", true)
        .start();
while (true) {
  ByteBuffer buffer = stream.read();
  //process logical changes
}
Example physical API:

LogSequenceNumber lsn = getCurrentLSN();
PGReplicationStream stream =
    pgConnection
        .replicationStream()
        .physical()
        .withStartPosition(lsn)
        .start();
while (true) {
  ByteBuffer read = stream.read();
  //process binary WAL logs
}
The main purpose of adding the replication protocol to the driver is logical replication and the ability to create realtime integration with external systems (for me it's Kafka+Elasticsearch).