feat: add replication protocol API #550

Merged: 14 commits from support_replication_protocol into pgjdbc:master on Nov 25, 2016

Conversation

@Gordiychuk (Contributor) commented May 3, 2016

Replication for protocol version 3 works via the Copy API. The replication protocol is not supported for protocol version 2.
The main class is PGReplicationStream. It encapsulates the low-level replication protocol and the periodic status-update messages; its output is the decoding plugin's payload.

The current implementation runs into a logical replication protocol bug:
after the replication stream is closed, the backend does not send the CommandComplete and ReadyForQuery packets.
As a result, this bug breaks the scenario where WAL is fetched from the replication stream and sent asynchronously to another system (for example Elasticsearch): after the first failure of an asynchronous send, the replication stream is closed and replication is restarted from the last successfully sent WAL record.

Example logical API:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes
    }
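
For reference, a minimal sketch of how the periodic status updates could be fed back from the consumer, assuming the getLastReceiveLSN/setFlushedLSN/setAppliedLSN accessors discussed later in this thread (a sketch, not the final API):

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .start();

    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes

      //report how far the consumer has durably handled the stream so that
      //the server can advance the slot and recycle old WAL
      LogSequenceNumber lsn = stream.getLastReceiveLSN();
      stream.setFlushedLSN(lsn);
      stream.setAppliedLSN(lsn);
    }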

Example physical API:

    LogSequenceNumber lsn = getCurrentLSN();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(lsn)
            .start();

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }

The main purpose of adding the replication protocol to the driver is logical replication and the ability to create realtime integration with external systems (in my case Kafka + Elasticsearch).

@davecramer (Member)

Awesome! Thanks for this. I'll try to find some time to review soon

@Gordiychuk (Contributor, Author)

The problem scenario for the logical replication protocol looks like this:

    PGConnection pgConnection = (PGConnection) replConnection;

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table(name) values('message to repeat')");
    st.close();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(startLSN)
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();

    List<String> result = new ArrayList<String>();
    result.addAll(receiveMessage(stream, 3));

    stream.close();

Logs:

22:13:14.087 (2)  FE=> StartReplication(query: START_REPLICATION SLOT pgjdbc_logical_replication_slot LOGICAL 0/18FCFD0 ("include-xids" 'false', "skip-empty-xacts" 'true'))
22:13:14.087 (2)  FE=> Query(CopyStart)
22:13:14.088 (2)  <=BE CopyBothResponse
22:13:14.093 (2)  FE=> StandbyStatusUpdate(received: 0/18FCFD0, flushed: 0/0, applied: 0/0, clock: Tue May 03 22:13:14 MSK 2016)
22:13:14.094 (2)  FE=> CopyData(34)
22:13:14.094 (2)  <=BE CopyData
22:13:14.094 (2) k    ���� ���`�� 
22:13:14.094 (2)  <=BE CopyData
22:13:14.094 (2) w                        BEGIN
22:13:14.095 (2)   <=BE Keepalive(lastServerWal: 0/18FCFD0, clock: Tue May 03 22:13:14 MSK 2016 needReply: false)
22:13:14.095 (2)   <=BE XLogData(currWal: 0/0, lastServerWal: 0/0, clock: 0)
22:13:14.095 (2)  <=BE CopyData
22:13:14.095 (2) w    ����Рtable public.test_logic_table: INSERT: pk[integer]:1 name[character varying]:'message to repeat'
22:13:14.096 (2)   <=BE XLogData(currWal: 0/18FD0A0, lastServerWal: 0/18FD0A0, clock: 0)
22:13:14.096 (2)  <=BE CopyData
22:13:14.096 (2) w    ��Ѱ    ��Ѱ        COMMIT
22:13:14.096 (2)   <=BE XLogData(currWal: 0/18FD1B0, lastServerWal: 0/18FD1B0, clock: 0)
22:13:14.096 (2)  FE=> StopReplication
22:13:14.096 (2)  <=BE CopyData
22:13:14.096 (2) k    ��Ѱ ���`�' 
22:13:14.096 (2)  FE=> CopyDone
22:13:14.097 (2)  <=BE CopyDone
22:13:14.097 (2)  <=BE CopyData
22:13:14.097 (2) k    ��Ѱ ���`�� 

org.postgresql.util.PSQLException: Database connection failed when ending copy

    at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:834)
    at org.postgresql.core.v3.CopyDualImpl.endCopy(CopyDualImpl.java:23)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.close(V3PGReplicationStream.java:244)
    at org.postgresql.replication.LogicalReplicationTest.testRepeatWalPositionTwice(LogicalReplicationTest.java:281)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:70)
    at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:283)
    at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:947)
    at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:830)
    ... 33 more

As you can see, even after the backend replies with CopyDone it still sends KeepAlive messages. The same scenario works fine with physical replication.

@Gordiychuk (Contributor, Author)

@davecramer do you think the problem described above is a PostgreSQL server bug or a bug in the driver implementation?

@davecramer (Member)

@Gordiychuk I'd suggest posting to hackers... I don't know enough right now to comment

@davecramer (Member)

@Gordiychuk I suspect this is a bug with the driver implementation. Replication is fairly well tested

@Gordiychuk (Contributor, Author)

After some code analysis I found the problem in PostgreSQL, and I am preparing a patch right now. Inside the walsender, WalSndLoop contains the following check:

        /*
         * If we don't have any pending data in the output buffer, try to send
         * some more.  If there is some, we don't bother to call send_data
         * again until we've flushed it ... but we'd better assume we are not
         * caught up.
         */
        if (!pq_is_send_pending())
            send_data();
        else
            WalSndCaughtUp = false;

send_data() executes a callback that transforms the WAL record and sends it. For logical replication it then calls

/*
 * read_page callback for logical decoding contexts, as a walsender process.
 *
 * Inside the walsender we can do better than logical_read_local_xlog_page,
 * which has to do a plain sleep/busy loop, because the walsender's latch gets
 * set everytime WAL is flushed.
 */
static int
logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
                XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)

and this code enters a long loop, waiting for WAL that can be sent:

/*
 * Wait till WAL < loc is flushed to disk so it can be safely read.
 */
static XLogRecPtr
WalSndWaitForWal(XLogRecPtr loc)
{
    int         wakeEvents;
    static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;


    /*
     * Fast path to avoid acquiring the spinlock in the we already know we
     * have enough WAL available. This is particularly interesting if we're
     * far behind.
     */
    if (RecentFlushPtr != InvalidXLogRecPtr &&
        loc <= RecentFlushPtr)
        return RecentFlushPtr;

    /* Get a more recent flush pointer. */
    if (!RecoveryInProgress())
        RecentFlushPtr = GetFlushRecPtr();
    else
        RecentFlushPtr = GetXLogReplayRecPtr(NULL);

    for (;;)
    {
        long        sleeptime;
        TimestampTz now;

        /*
         * Emergency bailout if postmaster has died.  This is to avoid the
         * necessity for manual cleanup of all postmaster children.
         */
        if (!PostmasterIsAlive())
            exit(1);

        /* Clear any already-pending wakeups */
        ResetLatch(MyLatch);

        CHECK_FOR_INTERRUPTS();

        /* Process any requests or signals received recently */
        if (got_SIGHUP)
        {
            got_SIGHUP = false;
            ProcessConfigFile(PGC_SIGHUP);
            SyncRepInitConfig();
        }

        /* Check for input from the client */
        ProcessRepliesIfAny();

        /* Update our idea of the currently flushed position. */
        if (!RecoveryInProgress())
            RecentFlushPtr = GetFlushRecPtr();
        else
            RecentFlushPtr = GetXLogReplayRecPtr(NULL);

        /*
         * If postmaster asked us to stop, don't wait here anymore. This will
         * cause the xlogreader to return without reading a full record, which
         * is the fastest way to reach the mainloop which then can quit.
         *
         * It's important to do this check after the recomputation of
         * RecentFlushPtr, so we can send all remaining data before shutting
         * down.
         */
        if (walsender_ready_to_stop)
            break;

        /*
         * We only send regular messages to the client for full decoded
         * transactions, but a synchronous replication and walsender shutdown
         * possibly are waiting for a later location. So we send pings
         * containing the flush location every now and then.
         */
        if (MyWalSnd->flush < sentPtr &&
            MyWalSnd->write < sentPtr &&
            !waiting_for_ping_response)
        {
            WalSndKeepalive(false);
            waiting_for_ping_response = true;
        }

        /* check whether we're done */
        if (loc <= RecentFlushPtr)
            break;

        /* Waiting for new WAL. Since we need to wait, we're now caught up. */
        WalSndCaughtUp = true;

        /*
         * Try to flush pending output to the client. Also wait for the socket
         * becoming writable, if there's still pending output after an attempt
         * to flush. Otherwise we might just sit on output data while waiting
         * for new WAL being generated.
         */
        if (pq_flush_if_writable() != 0)
            WalSndShutdown();

        now = GetCurrentTimestamp();

        /* die if timeout was reached */
        WalSndCheckTimeOut(now);

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary(now);

        sleeptime = WalSndComputeSleeptime(now);

        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
            WL_SOCKET_READABLE | WL_TIMEOUT;

        if (pq_is_send_pending())
            wakeEvents |= WL_SOCKET_WRITEABLE;

        /* Sleep until something happens or we time out */
        WaitLatchOrSocket(MyLatch, wakeEvents,
                          MyProcPort->sock, sleeptime);
    }

    /* reactivate latch so WalSndLoop knows to continue */
    SetLatch(MyLatch);
    return RecentFlushPtr;
}

In this cycle, after executing ProcessRepliesIfAny() we receive the CopyDone command and reply with CopyDone, but we keep waiting for WAL, sending WAL and sending keepalives, ignoring the streamingDoneReceiving and streamingDoneSending flags. That is why the same test works for physical replication but fails with a timeout for logical replication.

@Gordiychuk (Contributor, Author)

The second problem is that after replying with CopyDone, PostgreSQL still sends CopyData messages; this problem is present for both logical and physical replication, and I also want to fix it.

        /* Check for input from the client */
        ProcessRepliesIfAny();

        /*
         * If we have received CopyDone from the client, sent CopyDone
         * ourselves, and the output buffer is empty, it's time to exit
         * streaming.
         */
        if (!pq_is_send_pending() && streamingDoneSending && streamingDoneReceiving)
            break;

        /*
         * If we don't have any pending data in the output buffer, try to send
         * some more.  If there is some, we don't bother to call send_data
         * again until we've flushed it ... but we'd better assume we are not
         * caught up.
         */
        if (!pq_is_send_pending())
            send_data();
        else
            WalSndCaughtUp = false;

        /* Try to flush pending output to the client */
        if (pq_flush_if_writable() != 0)
            WalSndShutdown();

        /* If nothing remains to be sent right now ... */
        if (WalSndCaughtUp && !pq_is_send_pending())
        {
            /*
             * If we're in catchup state, move to streaming.  This is an
             * important state change for users to know about, since before
             * this point data loss might occur if the primary dies and we
             * need to failover to the standby. The state change is also
             * important for synchronous replication, since commits that
             * started to wait at that point might wait for some time.
             */
            if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
            {
                ereport(DEBUG1,
                     (errmsg("standby \"%s\" has now caught up with primary",
                             application_name)));
                WalSndSetState(WALSNDSTATE_STREAMING);
            }

            /*
             * When SIGUSR2 arrives, we send any outstanding logs up to the
             * shutdown checkpoint record (i.e., the latest record), wait for
             * them to be replicated to the standby, and exit. This may be a
             * normal termination at shutdown, or a promotion, the walsender
             * is not sure which.
             */
            if (walsender_ready_to_stop)
                WalSndDone(send_data);
        }

        now = GetCurrentTimestamp();

        /* Check for replication timeout. */
        WalSndCheckTimeOut(now);

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary(now);

In ProcessRepliesIfAny() we receive CopyDone and send CopyDone as a response. That message is still present in the output buffer, so the condition

        if (!pq_is_send_pending() && streamingDoneSending && streamingDoneReceiving)
            break;

returns false, and we go on to execute the following functions even though streamingDoneSending and streamingDoneReceiving are both true:

        /* Check for replication timeout. */
        WalSndCheckTimeOut(now);

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary(now);

@davecramer (Member)

I would suggest posting this to hackers to get their take first

Gordiychuk pushed a commit to Gordiychuk/postgres that referenced this pull request May 6, 2016
While decoding WAL, logical decoding ignores messages that the receiver can send in response to XLogData.
So during a big transaction, for example one that changes 1 million records, this can lead to two problems:

1. The receiver can be disconnected by the server because it does not respond to a keepalive message that carries the reply-required marker.
2. The receiver cannot stop replication until the whole transaction has been sent to it.

Not being able to stop replication is the main problem, because the receiver will fail during stop replication
with a timeout, and the backend will also generate a lot of unnecessary network traffic. This problem
was found while implementing the physical/logical replication protocol in the pgjdbc driver
(pgjdbc/pgjdbc#550). It breaks the scenario where a WAL consumer
receives decoded WAL and pushes it to an external system asynchronously: if a problem occurs,
the callback reports which LSN failed, so we can roll back to the last successfully processed LSN and
start logical replication again from that place.

I measured stopping replication with and without the fix using this test:

For physical replication:

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table\n"
        + "  select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
    st.close();

    long start = System.nanoTime();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(startLSN)
            .start();

    //read single message
    stream.read();
    long startStopping = System.nanoTime();

    stream.close();

    long now = System.nanoTime();

    long startAndStopTime = now - start;
    long stopTime = now - startStopping;

    System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
    System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

For logical replication:

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table\n"
        + "  select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
    st.close();

    long start = System.nanoTime();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(startLSN)
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();

    //read single message
    stream.read();
    long startStopping = System.nanoTime();

    stream.close();

    long now = System.nanoTime();

    long startAndStopTime = now - start;
    long stopTime = now - startStopping;

    System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
    System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

And got the following timings:

Before
-----
logical start and stopping: 15446ms
logical stopping: 13820ms

physical start and stopping: 462ms
physical stopping: 348ms

After
-----
logical start and stopping: 2424ms
logical stopping: 26ms

physical start and stopping: 458ms
physical stopping: 329ms

As you can see, now it is possible to stop logical replication very quickly. To achieve that, we now check
replies first and only after that send decoded data. After receiving CopyDone from the frontend we
stop decoding as soon as possible.

The second part of the fix disables sending keepalive messages to the frontend once CopyDone has been received.
@Gordiychuk force-pushed the support_replication_protocol branch from 03b9f72 to 86474f3 on May 6, 2016 16:46
@jarreds commented May 7, 2016

Awesome. I'm really excited to get this in. I'd like to get some clarification around the applied/flushed LSN handling.

In this code snippet, there is no handling of flushed or applied LSN:

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }

I'm trying to grok how manually handling the LSN data would work. Something like this:

    while (true) {
      ByteBuffer read = stream.read();

      // this is the current LSN of the copy data read above?
      LogSequenceNumber lsn = stream.getLastReceiveLSN();

      // do some external async processing
      processor.doAsyncProcessing(lsn, read);

      // for status updates to the server
      // get LSN apply/flush data from the async processor
      stream.setAppliedLSN(processor.getLastAppliedLSN());
      stream.setFlushedLSN(processor.getLastFlushedLSN());
    }

?

Let me know if you'd like any help. I'm really interested in this feature. It would really help simplify the current C based system we use for this today.

@Gordiychuk (Contributor, Author)

@jarreds Yes. And for logical decoding

stream.setAppliedLSN(processor.getLastAppliedLSN());

can be skipped, because that parameter is only used for physical replication.

I also think we need a method to check whether the stream is still active and whether it has pending messages, because iterating in an endless loop is not always convenient.
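
For illustration, such a loop might look like this sketch, assuming a non-blocking readPending() that returns null when nothing is buffered and an isClosed() check (both discussed in this PR; a sketch, not the final API):

    while (!stream.isClosed()) {
      ByteBuffer message = stream.readPending();
      if (message == null) {
        //nothing buffered yet: a good moment to poll the external system
        //for asynchronous feedback before checking the stream again
        TimeUnit.MILLISECONDS.sleep(10);
        continue;
      }
      //process message
    }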

@codecov-io commented May 12, 2016

Current coverage is 64.08% (diff: 80.72%)

Merging #550 into master will increase coverage by 0.74%

@@             master       #550   diff @@
==========================================
  Files           151        165    +14   
  Lines         14787      15129   +342   
  Methods           0          0          
  Messages          0          0          
  Branches       2934       2978    +44   
==========================================
+ Hits           9366       9695   +329   
+ Misses         4213       4200    -13   
- Partials       1208       1234    +26   

Powered by Codecov. Last update d32b077...518444e

@Gordiychuk force-pushed the support_replication_protocol branch 2 times, most recently from 7e20bc9 to 9236c1a on May 12, 2016 07:05
@davecramer (Member)

@Gordiychuk So given that the server won't fix their end until 9.7 where does that leave us?

On another note, Travis is failing on a few things: checkstyle, and the tests do not have replication privileges.

Checkstyle should be relatively easy to fix; you can run mvn checkstyle:check to find the errors.

Your tests may have to run as user postgres to get them to work...

@Gordiychuk (Contributor, Author)

@Gordiychuk So given that the server won't fix their end until 9.7 where does that leave us?

@davecramer No, we can deliver the solution with the assumption that org.postgresql.replication.PGReplicationStream#close can take a long time; as a workaround, Connection#close can be used until the fixes are included in PostgreSQL.
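
For illustration, that workaround might look like the following sketch (variable names as in the earlier snippets; the exception handling is only an assumption):

    try {
      //may block until the backend reacts to CopyDone, and can fail with
      //"Database connection failed when ending copy" as in the log above
      stream.close();
    } catch (SQLException e) {
      //ignore and fall through to closing the connection
    } finally {
      //closing the underlying connection always terminates streaming
      replConnection.close();
    }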

On another note Travis is failing on a few things. checkstyle and test does not have replication privileges.
Checkstyle should be relatively easy to fix you can run mvn checkstyle:check to find the errors.
Your tests may have to run as user postgres to get them to work...

Sorry, I do not have much time this week. I am planning to finish the API and fix the CI problems this weekend. After that I will start the implementation for issue #558. I think they should be merged to master together.

@davecramer (Member)

@Gordiychuk Thanks! No rush

@Gordiychuk force-pushed the support_replication_protocol branch 3 times, most recently from 97a0b20 to b6dff39 on May 14, 2016 20:17
addons:
postgresql: "9.4"
sudo: required
dist: trusty
Review comment (Member):

-1 for converting lots of jobs to sudo: required. It will make Travis a lot slower

@Gordiychuk force-pushed the support_replication_protocol branch 2 times, most recently from 8c84236 to 167a696 on May 15, 2016 09:29
Now the user in tests must have the replication privilege, and postgres must be configured to accept replication connections; this is necessary for testing the replication protocol.
The replication protocol uses the CopyBothResponse packet to initiate the bidirectional copy protocol.
WAL data is then sent as a series of CopyData messages, with a periodic exchange of KeepAlive packets.
Replication for protocol version 3 works via the Copy API. The replication protocol is not supported for protocol version 2.
The main class for working with the replication protocol is PGReplicationStream. This class hides the low-level
replication protocol and the periodic status-update messages, and lets the caller work only with the payload.

The current implementation runs into a logical replication protocol bug:
after the replication stream is closed, the backend does not send the CommandComplete and ReadyForQuery packets.
As a result, this bug breaks the scenario where WAL is fetched from the replication stream and sent asynchronously to another system, for example Elasticsearch:
after the first failure of an asynchronous send, the replication stream is closed and replication is restarted from the last successfully sent WAL record.

Example logical API:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes
    }

Example physical API:
    LogSequenceNumber lsn = getCurrentLSN();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(lsn)
            .start();

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }
…cking

Periodically checking the replication stream makes it possible, for example, to process asynchronous
feedback from the external system that accepts changes from the logical replication stream
much faster. Waiting for new changes can take a long time, during which feedback messages
from the external system cannot be handled; such a message can report a failure, in which case
we should restart replication from the failing position, but we cannot while no new changes
arrive on the replication stream.

Example use:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(getCurrentLSN())
            .start();

    while (true) {
      ByteBuffer result = stream.readPending();
      if(result == null) {
        TimeUnit.MILLISECONDS.sleep(10);
        continue;
      }

      //process message
    }
org.postgresql.replication.PGReplication is necessary so that replication
commands can be extended in the future.

Example of starting logical replication:

    pgConnection
        .getReplicationAPI()
        .createReplicationSlot()
        .logical()
        .withSlotName("mySlot")
        .withOutputPlugin("test_decoding")
        .make();

    PGReplicationStream stream =
        pgConnection
            .getReplicationAPI()
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotName("mySlot")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
When the first message received from the server is a keepalive, we were setting the last receive
LSN to the last LSN position on the server; as a result, when replying to the keepalive we could
lose reserved WAL. Also, while processing XLogData we cannot use the last server LSN
from the message.
The official documentation says that the status update should send back the
LSN + the offset that was written or applied to disk:
"The location of the last WAL byte + 1 received
and written to disk in the standby." In our case it means that the
last receive LSN should also include the payload offset.
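
A sketch of that rule, assuming LogSequenceNumber#asLong/#valueOf and hypothetical xLogDataStartLSN/payload variables for the current XLogData message:

    //the reported position is the start LSN of the XLogData message plus
    //the number of payload bytes already handed to the consumer
    LogSequenceNumber lastReceiveLSN =
        LogSequenceNumber.valueOf(xLogDataStartLSN.asLong() + payload.position());
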
Checking max_wal_senders is not enough to decide whether to run the replication tests,
because replication can be configured on the server while the test user does not have
the grants to use replication. To deal with this scenario we now check not
only that replication is configured but also that the user has the replication grant.
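
A minimal sketch of such a grant check, assuming a plain JDBC connection for the catalog query (superusers can always replicate, hence the OR):

    Statement st = sqlConnection.createStatement();
    ResultSet rs = st.executeQuery(
        "SELECT rolreplication OR rolsuper FROM pg_roles WHERE rolname = current_user");
    boolean userCanReplicate = rs.next() && rs.getBoolean(1);
    rs.close();
    st.close();
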
@vlsi changed the title from "feat: Add Support replication protocol" to "feat: add replication protocol API" on Nov 25, 2016
@vlsi merged this pull request into pgjdbc:master on Nov 25, 2016
import java.util.Queue;

public class CopyDualImpl extends CopyOperationImpl implements CopyDual {
private Queue<byte[]> received = new LinkedList<byte[]>();
Review comment (Member):

Just for the history: ArrayDeque should be preferred over LinkedList
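
For reference, the suggested change would be a one-line swap (java.util.ArrayDeque also implements Queue):

    private Queue<byte[]> received = new ArrayDeque<byte[]>();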

@vlsi (Member) commented Nov 25, 2016

@Gordiychuk , thanks for pushing it forward.

vlsi pushed a commit that referenced this pull request Nov 25, 2016
The replication protocol is managed by PGReplicationStream. It hides the low-level
replication protocol details and lets the end user deal with just the payload data.

The entry point is `PGConnection#getReplicationAPI`.

The current PostgreSQL backend has issues with terminating a replication connection (e.g. the "stop decoding" message might be ignored for a while, so termination can take some time).
Relevant hacker's thread is https://www.postgresql.org/message-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w%40mail.gmail.com

Logical replication API:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes
    }

Physical replication API:

    LogSequenceNumber lsn = getCurrentLSN();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(lsn)
            .start();

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }

The main purpose of supporting the replication protocol at the driver level is to provide the ability to create realtime integration with external systems (e.g. Kafka + Elasticsearch).
@jorsol (Member) commented Dec 1, 2016

I know this is already merged, but is V2ReplicationProtocol.java necessary?

The driver already dropped support for the v2 protocol, and it looks like V2ReplicationProtocol.java is not used anywhere.

@vlsi (Member) commented Dec 1, 2016

@jorsol , seems it can be dropped. I just did not catch that.

davecramer pushed a commit to pgjdbc/www that referenced this pull request Feb 3, 2017
* docs: Replication protocol

The PR contains a sample of using the replication protocol implemented in PR
pgjdbc/pgjdbc#550

* Update replication.md

* @Gordiychuk, hopefully I haven't changed the essence?

* Move replication to extension sub group
@davecramer (Member)

@Gordiychuk I'd like to add AutoCloseable to both PGReplicationStream and PGReplicationConnection.

Thoughts?

@Gordiychuk (Contributor, Author)

@davecramer good idea
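
If both types implemented AutoCloseable, usage could look like this sketch (Java 7+ try-with-resources; hypothetical, since the interface is not implemented in this PR):

    try (PGReplicationStream stream =
             pgConnection
                 .getReplicationAPI()
                 .replicationStream()
                 .logical()
                 .withSlotName("mySlot")
                 .start()) {
      while (!stream.isClosed()) {
        ByteBuffer buffer = stream.read();
        //process logical changes
      }
    }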

@jorsol (Member) commented Feb 20, 2017

AutoCloseable is a Java 7 feature; will support for Java 6 be dropped in the near future?
