feat: add replication protocol API #550

Merged
merged 14 commits on Nov 25, 2016

Conversation

8 participants
@Gordiychuk
Contributor

Gordiychuk commented May 3, 2016

Replication for protocol version 3 works via the Copy API. The replication protocol is not supported for protocol version 2.
The main class is PGReplicationStream. It encapsulates the low-level replication protocol and the periodic status-update messages. Its output is the decoding plugin's payload.

The current implementation runs into a logical replication protocol bug:
after the replication stream is closed, the backend does not send the CommandComplete and ReadyForQuery packets.
As a result, this bug breaks the scenario where WAL is fetched from the replication stream and sent asynchronously to another system (for example Elasticsearch): when the first send problem occurs, the replication stream is closed and replication is restarted from the last successfully sent WAL record.

Example logical API:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes
    }

Example physical API:

    LogSequenceNumber lsn = getCurrentLSN();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(lsn)
            .start();

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }

The main purpose of adding the replication protocol to the driver is logical replication and the ability to build real-time integration with external systems (in my case Kafka + Elasticsearch).
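
A rough sketch of that consume-and-restart scenario. It assumes the feedback methods discussed later in this thread (getLastReceiveLSN, setFlushedLSN) and the hypothetical helpers getCurrentLSN and sendToExternalSystem; it is an illustration only, not part of this PR:

    LogSequenceNumber lastProcessed = getCurrentLSN(); // hypothetical helper, as in the examples above

    while (true) {
      PGReplicationStream stream =
          pgConnection
              .replicationStream()
              .logical()
              .withSlotName("test_decoding")
              .withStartPosition(lastProcessed)        // resume from the last successfully processed record
              .withSlotOption("include-xids", false)
              .withSlotOption("skip-empty-xacts", true)
              .start();
      try {
        while (true) {
          ByteBuffer buffer = stream.read();
          LogSequenceNumber lsn = stream.getLastReceiveLSN();
          sendToExternalSystem(buffer);                // hypothetical: push the change to Kafka/Elasticsearch
          stream.setFlushedLSN(lsn);                   // acknowledge progress to the server
          lastProcessed = lsn;
        }
      } catch (Exception sendFailure) {
        // the asynchronous send failed: stop this stream and restart from lastProcessed
        stream.close();
      }
    }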

@davecramer

Member

davecramer commented May 3, 2016

Awesome! Thanks for this. I'll try to find some time to review soon

@Gordiychuk

Contributor

Gordiychuk commented May 3, 2016

The problem scenario for the logical replication protocol looks like this:

    PGConnection pgConnection = (PGConnection) replConnection;

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table(name) values('message to repeat')");
    st.close();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(startLSN)
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();

    List<String> result = new ArrayList<String>();
    result.addAll(receiveMessage(stream, 3));

    stream.close();

Logs:

22:13:14.087 (2)  FE=> StartReplication(query: START_REPLICATION SLOT pgjdbc_logical_replication_slot LOGICAL 0/18FCFD0 ("include-xids" 'false', "skip-empty-xacts" 'true'))
22:13:14.087 (2)  FE=> Query(CopyStart)
22:13:14.088 (2)  <=BE CopyBothResponse
22:13:14.093 (2)  FE=> StandbyStatusUpdate(received: 0/18FCFD0, flushed: 0/0, applied: 0/0, clock: Tue May 03 22:13:14 MSK 2016)
22:13:14.094 (2)  FE=> CopyData(34)
22:13:14.094 (2)  <=BE CopyData
22:13:14.094 (2) k    ���� ���`�� 
22:13:14.094 (2)  <=BE CopyData
22:13:14.094 (2) w                        BEGIN
22:13:14.095 (2)   <=BE Keepalive(lastServerWal: 0/18FCFD0, clock: Tue May 03 22:13:14 MSK 2016 needReply: false)
22:13:14.095 (2)   <=BE XLogData(currWal: 0/0, lastServerWal: 0/0, clock: 0)
22:13:14.095 (2)  <=BE CopyData
22:13:14.095 (2) w    ����Рtable public.test_logic_table: INSERT: pk[integer]:1 name[character varying]:'message to repeat'
22:13:14.096 (2)   <=BE XLogData(currWal: 0/18FD0A0, lastServerWal: 0/18FD0A0, clock: 0)
22:13:14.096 (2)  <=BE CopyData
22:13:14.096 (2) w    ��Ѱ    ��Ѱ        COMMIT
22:13:14.096 (2)   <=BE XLogData(currWal: 0/18FD1B0, lastServerWal: 0/18FD1B0, clock: 0)
22:13:14.096 (2)  FE=> StopReplication
22:13:14.096 (2)  <=BE CopyData
22:13:14.096 (2) k    ��Ѱ ���`�' 
22:13:14.096 (2)  FE=> CopyDone
22:13:14.097 (2)  <=BE CopyDone
22:13:14.097 (2)  <=BE CopyData
22:13:14.097 (2) k    ��Ѱ ���`�� 

org.postgresql.util.PSQLException: Database connection failed when ending copy

    at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:834)
    at org.postgresql.core.v3.CopyDualImpl.endCopy(CopyDualImpl.java:23)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.close(V3PGReplicationStream.java:244)
    at org.postgresql.replication.LogicalReplicationTest.testRepeatWalPositionTwice(LogicalReplicationTest.java:281)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:70)
    at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:283)
    at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:947)
    at org.postgresql.core.v3.QueryExecutorImpl.endCopy(QueryExecutorImpl.java:830)
    ... 33 more

As you can see, even after the backend has sent CopyDone, it still sends KeepAlive messages. The same scenario works fine with physical replication.

@Gordiychuk

Contributor

Gordiychuk commented May 3, 2016

@davecramer do you think the problem described above is a PostgreSQL server bug or a bug in the driver implementation?

@davecramer

Member

davecramer commented May 3, 2016

@Gordiychuk I'd suggest posting to hackers... I don't know enough right now to comment

@davecramer

Member

davecramer commented May 4, 2016

@Gordiychuk I suspect this is a bug with the driver implementation. Replication is fairly well tested

@Gordiychuk

Contributor

Gordiychuk commented May 4, 2016

After some code analysis, I found the problem in PostgreSQL and am preparing a patch right now. Inside the walsender, WalSndLoop contains the following check:

        /*
         * If we don't have any pending data in the output buffer, try to send
         * some more.  If there is some, we don't bother to call send_data
         * again until we've flushed it ... but we'd better assume we are not
         * caught up.
         */
        if (!pq_is_send_pending())
            send_data();
        else
            WalSndCaughtUp = false;

which executes the callback that transforms the WAL record and sends it. For logical replication this then executes

/*
 * read_page callback for logical decoding contexts, as a walsender process.
 *
 * Inside the walsender we can do better than logical_read_local_xlog_page,
 * which has to do a plain sleep/busy loop, because the walsender's latch gets
 * set everytime WAL is flushed.
 */
static int
logical_read_xlog_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
                XLogRecPtr targetRecPtr, char *cur_page, TimeLineID *pageTLI)

and that code enters a long loop waiting for WAL to become available to send:

/*
 * Wait till WAL < loc is flushed to disk so it can be safely read.
 */
static XLogRecPtr
WalSndWaitForWal(XLogRecPtr loc)
{
    int         wakeEvents;
    static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;


    /*
     * Fast path to avoid acquiring the spinlock in the we already know we
     * have enough WAL available. This is particularly interesting if we're
     * far behind.
     */
    if (RecentFlushPtr != InvalidXLogRecPtr &&
        loc <= RecentFlushPtr)
        return RecentFlushPtr;

    /* Get a more recent flush pointer. */
    if (!RecoveryInProgress())
        RecentFlushPtr = GetFlushRecPtr();
    else
        RecentFlushPtr = GetXLogReplayRecPtr(NULL);

    for (;;)
    {
        long        sleeptime;
        TimestampTz now;

        /*
         * Emergency bailout if postmaster has died.  This is to avoid the
         * necessity for manual cleanup of all postmaster children.
         */
        if (!PostmasterIsAlive())
            exit(1);

        /* Clear any already-pending wakeups */
        ResetLatch(MyLatch);

        CHECK_FOR_INTERRUPTS();

        /* Process any requests or signals received recently */
        if (got_SIGHUP)
        {
            got_SIGHUP = false;
            ProcessConfigFile(PGC_SIGHUP);
            SyncRepInitConfig();
        }

        /* Check for input from the client */
        ProcessRepliesIfAny();

        /* Update our idea of the currently flushed position. */
        if (!RecoveryInProgress())
            RecentFlushPtr = GetFlushRecPtr();
        else
            RecentFlushPtr = GetXLogReplayRecPtr(NULL);

        /*
         * If postmaster asked us to stop, don't wait here anymore. This will
         * cause the xlogreader to return without reading a full record, which
         * is the fastest way to reach the mainloop which then can quit.
         *
         * It's important to do this check after the recomputation of
         * RecentFlushPtr, so we can send all remaining data before shutting
         * down.
         */
        if (walsender_ready_to_stop)
            break;

        /*
         * We only send regular messages to the client for full decoded
         * transactions, but a synchronous replication and walsender shutdown
         * possibly are waiting for a later location. So we send pings
         * containing the flush location every now and then.
         */
        if (MyWalSnd->flush < sentPtr &&
            MyWalSnd->write < sentPtr &&
            !waiting_for_ping_response)
        {
            WalSndKeepalive(false);
            waiting_for_ping_response = true;
        }

        /* check whether we're done */
        if (loc <= RecentFlushPtr)
            break;

        /* Waiting for new WAL. Since we need to wait, we're now caught up. */
        WalSndCaughtUp = true;

        /*
         * Try to flush pending output to the client. Also wait for the socket
         * becoming writable, if there's still pending output after an attempt
         * to flush. Otherwise we might just sit on output data while waiting
         * for new WAL being generated.
         */
        if (pq_flush_if_writable() != 0)
            WalSndShutdown();

        now = GetCurrentTimestamp();

        /* die if timeout was reached */
        WalSndCheckTimeOut(now);

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary(now);

        sleeptime = WalSndComputeSleeptime(now);

        wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
            WL_SOCKET_READABLE | WL_TIMEOUT;

        if (pq_is_send_pending())
            wakeEvents |= WL_SOCKET_WRITEABLE;

        /* Sleep until something happens or we time out */
        WaitLatchOrSocket(MyLatch, wakeEvents,
                          MyProcPort->sock, sleeptime);
    }

    /* reactivate latch so WalSndLoop knows to continue */
    SetLatch(MyLatch);
    return RecentFlushPtr;
}

In this cycle, after ProcessRepliesIfAny() executes, we receive the CopyDone command, reply with CopyDone, and then continue waiting for WAL, sending WAL, and sending keepalives while ignoring the streamingDoneReceiving and streamingDoneSending flags. That is why the same test works fine for physical replication but fails with a timeout for logical replication.

@Gordiychuk

Contributor

Gordiychuk commented May 4, 2016

The second problem is that, after replying to CopyDone, PostgreSQL still sends CopyData messages; this problem is present for both logical and physical replication, and I also want to fix it.

        /* Check for input from the client */
        ProcessRepliesIfAny();

        /*
         * If we have received CopyDone from the client, sent CopyDone
         * ourselves, and the output buffer is empty, it's time to exit
         * streaming.
         */
        if (!pq_is_send_pending() && streamingDoneSending && streamingDoneReceiving)
            break;

        /*
         * If we don't have any pending data in the output buffer, try to send
         * some more.  If there is some, we don't bother to call send_data
         * again until we've flushed it ... but we'd better assume we are not
         * caught up.
         */
        if (!pq_is_send_pending())
            send_data();
        else
            WalSndCaughtUp = false;

        /* Try to flush pending output to the client */
        if (pq_flush_if_writable() != 0)
            WalSndShutdown();

        /* If nothing remains to be sent right now ... */
        if (WalSndCaughtUp && !pq_is_send_pending())
        {
            /*
             * If we're in catchup state, move to streaming.  This is an
             * important state change for users to know about, since before
             * this point data loss might occur if the primary dies and we
             * need to failover to the standby. The state change is also
             * important for synchronous replication, since commits that
             * started to wait at that point might wait for some time.
             */
            if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
            {
                ereport(DEBUG1,
                     (errmsg("standby \"%s\" has now caught up with primary",
                             application_name)));
                WalSndSetState(WALSNDSTATE_STREAMING);
            }

            /*
             * When SIGUSR2 arrives, we send any outstanding logs up to the
             * shutdown checkpoint record (i.e., the latest record), wait for
             * them to be replicated to the standby, and exit. This may be a
             * normal termination at shutdown, or a promotion, the walsender
             * is not sure which.
             */
            if (walsender_ready_to_stop)
                WalSndDone(send_data);
        }

        now = GetCurrentTimestamp();

        /* Check for replication timeout. */
        WalSndCheckTimeOut(now);

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary(now);

In ProcessRepliesIfAny() we receive CopyDone and send CopyDone as the response; that message is still present in the output buffer, so the condition

        if (!pq_is_send_pending() && streamingDoneSending && streamingDoneReceiving)
            break;

will evaluate to false and we will execute the following functions even though streamingDoneSending and streamingDoneReceiving are both true:

        /* Check for replication timeout. */
        WalSndCheckTimeOut(now);

        /* Send keepalive if the time has come */
        WalSndKeepaliveIfNecessary(now);
@davecramer

Member

davecramer commented May 4, 2016

I would suggest posting this to hackers to get their take first

Gordiychuk pushed a commit to Gordiychuk/postgres that referenced this pull request May 6, 2016

Vladimir Gordiychuk
Stop logical decoding on receiving CopyDone

While decoding WAL, logical decoding ignores the messages the receiver can send in response to XLogData.
So during a big transaction, for example one that changes 1 million records, this can lead to two problems:

1. The receiver may disconnect the server because it does not respond to a keepalive message carrying the required-reply marker.
2. The receiver cannot stop replication until the whole transaction has been sent to it.

Not being able to stop replication is the main problem, because the receiver will fail during stop replication
with a timeout, and the backend will also generate a lot of unnecessary network traffic. This problem
was found while implementing the physical/logical replication protocol in the pgjdbc driver
(pgjdbc/pgjdbc#550). It breaks the scenario where a WAL consumer
receives decoded WAL and pushes it to an external system asynchronously: if a problem occurs,
a callback reports which LSN failed, so we can roll back to the last successfully processed LSN and
start logical replication again from that position.

I measured stopping replication with and without the fix using the following test:

For physical replication:

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table\n"
        + "  select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
    st.close();

    long start = System.nanoTime();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(startLSN)
            .start();

    //read single message
    stream.read();
    long startStopping = System.nanoTime();

    stream.close();

    long now = System.nanoTime();

    long startAndStopTime = now - start;
    long stopTime = now - startStopping;

    System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
    System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

For logical replication:

    LogSequenceNumber startLSN = getCurrentLSN();

    Statement st = sqlConnection.createStatement();
    st.execute("insert into test_logic_table\n"
        + "  select id, md5(random()::text) as name from generate_series(1, 1000000) as id;");
    st.close();

    long start = System.nanoTime();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(startLSN)
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();

    //read single message
    stream.read();
    long startStopping = System.nanoTime();

    stream.close();

    long now = System.nanoTime();

    long startAndStopTime = now - start;
    long stopTime = now - startStopping;

    System.out.println(TimeUnit.NANOSECONDS.toMillis(startAndStopTime));
    System.out.println(TimeUnit.NANOSECONDS.toMillis(stopTime));

And got the following timings:

Before
-----
logical start and stopping: 15446ms
logical stopping: 13820ms

physical start and stopping: 462ms
physical stopping: 348ms

After
-----
logical start and stopping: 2424ms
logical stopping: 26ms

physical start and stopping: 458ms
physical stopping: 329ms

As you can see, it is now possible to stop logical replication very quickly. To achieve this, we now check
for replies first and only after that send decoded data. After getting CopyDone from the frontend we
stop decoding as soon as possible.

The second part of the fix disables sending keepalive messages to the frontend once CopyDone has been received.

@jarreds

jarreds commented May 7, 2016

Awesome. I'm really excited to get this in. I'd like to get some clarification around the applied/flushed LSN handling.

In this code snippet, there is no handling of flushed or applied LSN:

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }

I'm trying to grok how manually handling the LSN data would work. Something like this:

    while (true) {
      ByteBuffer read = stream.read();

      // this is the current LSN of the copy data read above?
      LogSequenceNumber lsn = stream.getLastReceiveLSN();

      // do some external async processing
      processor.doAsyncProcessing(lsn, read);

      // for status updates to the server
      // get LSN apply/flush data from the async processor
      stream.setAppliedLSN(processor.getLastAppliedLSN());
      stream.setFlushedLSN(processor.getLastFlushedLSN());
    }

?

Let me know if you'd like any help. I'm really interested in this feature. It would really help simplify the current C based system we use for this today.

@Gordiychuk

Contributor

Gordiychuk commented May 7, 2016

@jarreds Yes. And for logical decoding

stream.setAppliedLSN(processor.getLastAppliedLSN());

can be skipped, because that parameter is only used for physical replication.

I also think we need a method to check whether the stream is still active and whether it has pending messages, because iterating in an endless loop is not always convenient.
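
A minimal sketch of what such a non-blocking loop could look like. It is purely illustrative: isClosed() and readPending() are hypothetical names for the "is the stream active" and "is a message pending" checks proposed above, and doOtherWork() is a placeholder:

    while (!stream.isClosed()) {                      // hypothetical: stop once the stream is no longer active
      ByteBuffer buffer = stream.readPending();       // hypothetical: returns null when nothing is pending
      if (buffer == null) {
        doOtherWork();                                // placeholder for other work the caller wants to do
        continue;
      }
      //process logical changes
      stream.setFlushedLSN(stream.getLastReceiveLSN());
    }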


@codecov-io

codecov-io commented May 12, 2016

Current coverage is 64.08% (diff: 80.72%)

Merging #550 into master will increase coverage by 0.74%

@@             master       #550   diff @@
==========================================
  Files           151        165    +14   
  Lines         14787      15129   +342   
  Methods           0          0          
  Messages          0          0          
  Branches       2934       2978    +44   
==========================================
+ Hits           9366       9695   +329   
+ Misses         4213       4200    -13   
- Partials       1208       1234    +26   

Powered by Codecov. Last update d32b077...518444e

@davecramer

Member

davecramer commented May 12, 2016

@Gordiychuk So given that the server won't fix their end until 9.7 where does that leave us?

On another note, Travis is failing on a few things: checkstyle, and the test user does not have replication privileges.

Checkstyle should be relatively easy to fix you can run mvn checkstyle:check to find the errors.

Your tests may have to run as user postgres to get them to work...

@Gordiychuk


Contributor

Gordiychuk commented May 12, 2016

@Gordiychuk So given that the server won't fix their end until 9.7 where does that leave us?

@davecramer We can still deliver the solution with the assumption that org.postgresql.replication.PGReplicationStream#close can take a long time; as a workaround, the Connection#close method can be used until the fix is included in PostgreSQL.

On another note Travis is failing on a few things. checkstyle and test does not have replication privileges.
Checkstyle should be relatively easy to fix you can run mvn checkstyle:check to find the errors.
Your tests may have to run as user postgres to get them to work...

Sorry, I don't have much time this week. I plan to finish the API and fix the CI problems this weekend, and after that start the implementation for issue #558. I think they should be merged to master together.
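For reference, a minimal sketch of that workaround (names are illustrative; the point is simply to drop the connection rather than wait on the stream):

      // On affected server versions PGReplicationStream#close() may hang,
      // so terminate replication by closing the whole connection instead.
      try {
        ByteBuffer msg = stream.read();
        // process messages ...
      } finally {
        replicationConnection.close();   // ends the replication session immediately
      }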

@davecramer


Member

davecramer commented May 12, 2016

@Gordiychuk Thanks! No rush

.travis.yml
addons:
postgresql: "9.4"
sudo: required
dist: trusty


@vlsi

vlsi May 14, 2016

Member

-1 for converting lots of jobs to sudo: required. It will make Travis a lot slower


.travis.yml
@@ -4,6 +4,7 @@ language: java
before_script:
- test "x$XA" == 'x' || ./travis_install_dependencies.sh
- psql -U postgres -c "create user test with password 'test';"
- test "x$PG_VERSION" == 'x' || test $PG_VERSION == '8.4' || test $PG_VERSION == '9.0' || psql -U postgres -c "alter user test with replication;"


@vlsi

vlsi May 15, 2016

Member

test "x$REPLICATION" == 'x' || psql -U postgres -c "alter user test with replication;", please


@Gordiychuk


Contributor

Gordiychuk commented May 15, 2016

I have completed the changes I wanted; it can be reviewed now.

@Gordiychuk


Contributor

Gordiychuk commented May 15, 2016

How can I trigger the codecov/project check again? The details show coverage equal to 71%, but the PR status shows only 68%. The 71% was also from before the rebase, which is strange.

@Gordiychuk Gordiychuk changed the title from [WIP] feat: Add Support replication protocol to feat: Add Support replication protocol May 15, 2016

@Gordiychuk


Contributor

Gordiychuk commented May 15, 2016

Oh, I found where the problem is. Coverage is calculated for PostgreSQL 9.4, 9.5 and 9.6, but the tests from the current PR do not work on 9.4, and the resulting coverage is taken from the last completed job, which is sometimes 9.4 and sometimes 9.5.

@Gordiychuk


Contributor

Gordiychuk commented May 15, 2016

Failed tests: 
  LogicalReplicationStatusTest.testApplyLocationDoNotDependOnFlushLocation:323 Last applied LSN and last flushed LSN it two not depends parameters and they can be not equal between
Expected: not <LSN{0/58B3540}>
     but: was <LSN{0/58B3540}>
  LogicalReplicationStatusTest.testReceivedLSNDependentOnProcessMessage:117 After receive each new message current LSN updates in stream
Expected: not <LSN{0/58AC338}>
     but: was <LSN{0/58AC338}>

Oh, a bug in 9.4, again... The test shows that the last applied and last flushed LSN behave as the same parameter, in contrast to newer versions. The bug reproduces only on version 9.4.5; with PostgreSQL 9.4.8 installed, all tests pass.

@colinmorelli


colinmorelli commented Jul 30, 2016

Is anything else needed here? Would be great to get this support merged in. If there's something that's still missing I may be able to help get it added.

@vlsi


Member

vlsi commented Jul 30, 2016

Is anything else needed here?

At least a review is required. E.g. API names, etc, etc.
AFAIK, @ahachete had some comments.

@Gordiychuk


Contributor

Gordiychuk commented Jul 31, 2016

@ahachete As I remember, the main problem with the API was the non-obvious method name PGReplicationStream#read; maybe a name like readMessagePayload would be more obvious and acceptable?

@Gordiychuk


Contributor

Gordiychuk commented Jul 31, 2016

Is anything else needed here? Would be great to get this support merged in. If there's something that's still missing I may be able to help get it added.

The main issue blocking the current PR is #558, because without it a replication connection cannot be used like a regular connection, for example for pooling, validity checks, etc.

I still can't find time to fix #558 because it requires a big refactoring.

@colinmorelli


colinmorelli commented Jul 31, 2016

Is that necessarily a strict dependency here? While it would be nice to support pooling for replication connections, I think the overwhelming majority of use cases for this functionality would a) not require, and b) likely not work with multiple connections in a pool anyway.

As it stands this seems to very clearly be separate, opt-in functionality. As long as this doesn't break isValid for standard connections, it seems like it could still be useful (with comments in the readme/javadocs that replication can't currently work with pooled connections)

Thoughts?

@colinmorelli


colinmorelli commented Aug 1, 2016

I also have a suggestion here -

This might benefit from a lower level replication stream interface that provides access to all of the message types from the server, and performs no status updates unless explicitly asked to. That way the application can coordinate potentially multiple threads handling stream data.

Imagine the case that I have a single thread streaming the events, and a pool of 10 threads that are sending data from the stream into message queues. Now assume I've received 10 events, and pushed them all to the thread pool. With the driver handling checkpoints for me, it may end up sending a checkpoint to the server that I've read up to LSN 10, when really I've only received the ack up to LSN 1 from the message broker, and the other 9 threads are still waiting. Providing checkpoint control in the application would allow me to tradeoff exactly-once message delivery guarantees for at-least-once guarantees with theoretically higher throughput.

@Gordiychuk


Contributor

Gordiychuk commented Aug 2, 2016

I also have a suggestion here -

This might benefit from a lower level replication stream interface that provides access to all of the message types from the server, and performs no status updates unless explicitly asked to. That way the application can coordinate potentially multiple threads handling stream data.

Imagine the case that I have a single thread streaming the events, and a pool of 10 threads that are sending data from the stream into message queues. Now assume I've received 10 events, and pushed them all to the thread pool. With the driver handling checkpoints for me, it may end up sending a checkpoint to the server that I've read up to LSN 10, when really I've only received the ack up to LSN 1 from the message broker, and the other 9 threads are still waiting. Providing checkpoint control in the application would allow me to tradeoff exactly-once message delivery guarantees for at-least-once guarantees with theoretically higher throughput.

That scenario is already available. For example, this is how I use the API in my project:

      while (active) {
        //some code for calculate flush LSN base on queue feedback

        ByteBuffer buffer = stream.readPending();

        if (buffer == null) {
          TimeUnit.MILLISECONDS.sleep(deleayTimeMillis);
          continue;
        }

        LogSequenceNumber currentLSN = stream.getLastReceiveLSN();

        String topic = protocol.getTopic(buffer);
        ByteBuffer key = protocol.getKey(buffer);
        ByteBuffer value = protocol.getValue(buffer);

        queueProducer.message()
            .withTopic(topic)
            .withKey(key)
            .withValue(value)
            .withCallback(new QueueCallback(currentLSN, startTime))
            .send();


        stream.setFlushedLSN(flushed);
        stream.setAppliedLSN(flushed);
      }

To simplify the logic: the callback function executes when a message from the queue is processed successfully, and that function modifies the flushed variable (in fact, there is a bit of magic in calculating the current LSN position).

@colinmorelli


colinmorelli commented Aug 2, 2016

@Gordiychuk Perfect, looks like I misread the code there. Is your fork of the driver published anywhere accessible? Would be great to play around with this.

@Gordiychuk


Contributor

Gordiychuk commented Aug 2, 2016

@colinmorelli Only in an internal Nexus repository, but you can check out my branch and build it yourself.

@vlsi


Member

vlsi commented Aug 6, 2016

@Gordiychuk , I've implemented QueryExecutor.QUERY_EXECUTE_AS_SIMPLE in #618 (and org.postgresql.PGProperty#SIMPLE_PROTOCOL_ONLY).

Even though not all the cases supported, it might be enough to support replication protocol.
Can you check that?

@Gordiychuk


Contributor

Gordiychuk commented Aug 8, 2016

@vlsi Good. Yes, it will help us resolve issue #558. After #618 is merged, I will rebase my branch and add a few new tests to ensure that everything is fine.

pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java
@Test(timeout = 1000)
public void testNotAvailableStartNotExistReplicationSlot() throws Exception {
exception.expect(PSQLException.class);
exception.expectMessage(CoreMatchers.containsString("does not exist"));


@vlsi

vlsi Aug 20, 2016

Member

Can this be refactored into PSQLState comparison?
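A rough sketch of what that refactoring could look like, assuming the missing replication slot is reported with SQLSTATE 42704 (undefined_object); the exact state value, and whether a matching PSQLState constant exists, would need to be verified:

      try {
        // start replication on a slot that does not exist ...
        fail("Expected PSQLException for a non-existent replication slot");
      } catch (PSQLException e) {
        // compare the SQLSTATE instead of matching on the message text
        assertEquals("42704" /* undefined_object, assumed */, e.getSQLState());
      }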


@colinmorelli


colinmorelli commented Oct 11, 2016

Looks like it has been a little while since there was movement on this - is this still slated for 9.4.1212? Would be great to get this in.

@hchiorean


hchiorean commented Oct 18, 2016

@vlsi, @Gordiychuk quick question: according to the PG streaming replication docs Standby status update (F) messages should send back the LSN offsets +1.

Atm. the current code seems to be sending back the same values that were received from the server (I think this is also the case with the pg_recvlogical code). Can this present any problems with regard to the server's offset tracking for replication slots (i.e. progress tracking) ? thanks

@Gordiychuk


Contributor

Gordiychuk commented Oct 18, 2016

@hchiorean Good remark. It's a bug in the current implementation and in pg_recvlogical. I also rechecked the logic of walreceiver.c:
it reads the start LSN from XLogData https://github.com/postgres/postgres/blob/6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa/src/backend/replication/walreceiver.c#L891
then writes the payload to a file https://github.com/postgres/postgres/blob/6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa/src/backend/replication/walreceiver.c#L1006 and saves it as the value for feedback https://github.com/postgres/postgres/blob/6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa/src/backend/replication/walreceiver.c#L1027

So the replay LSN should look like startLSN + payloadSize.

Can this present any problems with regard to the server's offset tracking for replication slots (i.e. progress tracking) ? thanks

Yes, it can. For example, WALs will not be truncated, because the slot keeps them reserved even though the payload for that LSN has already been read.
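A minimal sketch of the status-update calculation described above (variable names are illustrative; LogSequenceNumber.valueOf(long) is assumed to be available):

      // Feedback should point past the WAL payload that was just consumed:
      // startLsn comes from the XLogData header, payload is its body.
      long startLsn = xLogDataStartLsn;
      int payloadSize = payload.remaining();
      LogSequenceNumber lastReceived = LogSequenceNumber.valueOf(startLsn + payloadSize);
      stream.setFlushedLSN(lastReceived);
      stream.setAppliedLSN(lastReceived);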

@Gordiychuk


Contributor

Gordiychuk commented Oct 19, 2016

@hchiorean I fixed the problem you found in the last commit, and also added a few tests for restarting from the replication slot's restart LSN.

@hchiorean


hchiorean commented Oct 20, 2016

@Gordiychuk great, thanks

@vlsi vlsi modified the milestones: 9.4.1213, 9.4.1212 Nov 2, 2016

Gordiychuk added a commit to Gordiychuk/www that referenced this pull request Nov 13, 2016

docs: Replication protocol
The PR contains a sample of using the replication protocol implemented in PR
pgjdbc/pgjdbc#550

@vlsi vlsi referenced this pull request Nov 25, 2016

Closed

refactor: Remove copy streams #694

Gordiychuk added some commits Aug 13, 2016

feat: Add support for the CopyBothResponse ('W') message in the Copy API
The replication protocol uses the CopyBothResponse message to initiate the bidirectional copy protocol.
WAL data is then sent as a series of CopyData messages, with periodic exchange of KeepAlive messages.
feat: Add support for the logical/physical replication protocol
Replication for protocol version 3 works via the Copy API; the replication protocol is not supported for protocol version 2.
The main class for working with the replication protocol is PGReplicationStream. This class hides the low-level
replication protocol and the periodic status update messages, so callers work only with the payload.

The current implementation runs into a logical replication protocol bug:
after closing the ReplicationStream, the backend does not send the CommandComplete and ReadyForQuery packets.
As a result, this bug breaks the scenario where WAL is fetched from the replication stream and sent asynchronously to another system, for example Elasticsearch
- after the first problem during asynchronous message sending, the replication connection is closed and replication restarts from the last successfully sent WAL record.

Example logical API:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes
    }

Example physical API:
    LogSequenceNumber lsn = getCurrentLSN();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(lsn)
            .start();

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }
feat: Add replication settings for Travis CI
The test user now needs the replication privilege, and postgres must be configured to accept replication connections; this is necessary for testing the replication protocol.
feat: Add ability to receive changes from the replication stream without blocking

Periodically checking the replication stream makes it possible, for example, to process asynchronous
feedback from an external system that accepts changes from the logical replication
stream much faster. Waiting for new changes may take a long time, which prevents us from
getting feedback messages from the external system; such a message can report a
failure, after which we should restart replication from the failure position, but we can't while
no new changes arrive in the replication stream.

Example use:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName(SLOT_NAME)
            .withStartPosition(getCurrentLSN())
            .start();

    while (true) {
      ByteBuffer result = stream.readPending();
      if(result == null) {
        TimeUnit.MILLISECONDS.sleep(10);
        continue;
      }

      //process message
    }
test: Add ability to ignore a test if the server version is too old
Add a HaveMinimalServerVersion annotation with ServerVersionRule that allows
ignoring a test via annotations when the current server version is too old.

Example
@HaveMinimalServerVersion("8.4")
public class CopyAPITest {
    @rule
    private ServerVersionRule versionRule = new ServerVersionRule();

    @test
    public void testCopyFromFile() throws Exception {
        // test copy api introduce in 8.4 version
    }
}

public class LogicalReplicationTest {
    @rule
    private ServerVersionRule versionRule = new ServerVersionRule();

    @test
    @HaveMinimalServerVersion("9.4")
    public void testStartLogicalReplication() throws Exception {
        // test logical replication introduced in 9.4
    }
}
feat: Add an intermediate interface for the Replication API
org.postgresql.replication.PGReplication is necessary to allow extending the replication
commands in the future.

Example start logical replication:

    pgConnection
        .getReplicationAPI()
        .createReplicationSlot()
        .logical()
        .withSlotName("mySlot")
        .withOutputPlugin("test_decoding")
        .make();

    PGReplicationStream stream =
        pgConnection
            .getReplicationAPI()
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotName("mySlot")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
bug: lastServerLSN != lastReceiveLSN
When the first message received from the server is a keepalive, we set the last received
LSN to the last LSN position on the server; as a result, when replying to the keepalive we can
lose reserved WAL. Also, while processing XLogData we can't use the last server LSN
from the message.
LastReceivedLSN should include the payload offset
The official documentation says that a status update should send back
the LSN plus the offset that was written or applied to disk:
"The location of the last WAL byte + 1 received
and written to disk in the standby." In our case this means that
the last received LSN should also include the payload offset.
bug: Run replication tests only if the user has the privilege for them
Checking max_wal_senders is not enough to decide whether to run the replication tests,
because replication can be configured on the server while the test user doesn't have
the privilege to use replication. To handle this scenario we now check not
only that replication is configured but also that the user has the replication privilege.
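A minimal sketch of the kind of privilege check described in that commit (the exact query in the test support code may differ; pg_roles.rolreplication is the catalog column for the REPLICATION attribute):

      // Skip the replication tests unless the current user has the REPLICATION attribute.
      Statement st = connection.createStatement();
      ResultSet rs = st.executeQuery(
          "SELECT rolreplication FROM pg_roles WHERE rolname = current_user");
      boolean hasReplicationGrant = rs.next() && rs.getBoolean(1);
      rs.close();
      st.close();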

@vlsi vlsi changed the title from feat: Add Support replication protocol to feat: add replication protocol API Nov 25, 2016

@vlsi vlsi merged commit c4e84f6 into pgjdbc:master Nov 25, 2016

2 checks passed

codecov/project 64.08% (+0.74%) compared to d32b077
continuous-integration/travis-ci/pr The Travis CI build passed
import java.util.Queue;
public class CopyDualImpl extends CopyOperationImpl implements CopyDual {
private Queue<byte[]> received = new LinkedList<byte[]>();


@vlsi

vlsi Nov 25, 2016

Member

Just for the history: ArrayDeque should be preferred over LinkedList
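For reference, the suggested change would look roughly like this (ArrayDeque implements Queue, avoids LinkedList's per-node allocations, and disallows null elements, which is fine for this buffer of byte arrays):

      import java.util.ArrayDeque;
      import java.util.Queue;

      private Queue<byte[]> received = new ArrayDeque<byte[]>();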


@vlsi


Member

vlsi commented Nov 25, 2016

@Gordiychuk , thanks for pushing it forward.

vlsi added a commit that referenced this pull request Nov 25, 2016

feat: add replication protocol API (#550)
The replication protocol is managed by PGReplicationStream. It hides low-level
replication protocol details and lets the end user deal with just the payload data.

The entry point is `PGConnection#getReplicationAPI`.

The current PostgreSQL backend has issues with terminating a replication connection (e.g. the "stop decode" message might be ignored for a while, so termination can take some time).
Relevant hacker's thread is https://www.postgresql.org/message-id/CAFgjRd3hdYOa33m69TbeOfNNer2BZbwa8FFjt2V5VFzTBvUU3w%40mail.gmail.com

Logical replication API:

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .logical()
            .withSlotName("test_decoding")
            .withSlotOption("include-xids", false)
            .withSlotOption("skip-empty-xacts", true)
            .start();
    while (true) {
      ByteBuffer buffer = stream.read();
      //process logical changes
    }

Physical replication API:

    LogSequenceNumber lsn = getCurrentLSN();

    PGReplicationStream stream =
        pgConnection
            .replicationStream()
            .physical()
            .withStartPosition(lsn)
            .start();

    while (true) {
      ByteBuffer read = stream.read();
      //process binary WAL logs
    }

The main purpose of supporting the replication protocol at the driver level is to provide the ability to create real-time integration with external systems (e.g. Kafka+ElasticSearch)
@jorsol


Contributor

jorsol commented Dec 1, 2016

I know this is already merged, but is V2ReplicationProtocol.java necessary?

The driver already dropped support for v2 protocol and it looks that V2ReplicationProtocol.java is not used anywhere.

@vlsi


Member

vlsi commented Dec 1, 2016

@jorsol , seems it can be dropped. I just did not catch that.

davecramer added a commit to pgjdbc/www that referenced this pull request Feb 3, 2017

docs: Replication protocol (#41)
* docs: Replication protocol

The PR contains a sample of using the replication protocol implemented in PR
pgjdbc/pgjdbc#550

* Update replication.md

@Gordyichuck, hopefully I haven't changed the essence ?

* Move replication to extension sub group
@davecramer


Member

davecramer commented Feb 19, 2017

@Gordiychuk I'd like to add AutoCloseable to both PGReplicationStream and PGReplicationConnection.

Thoughts?
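If both types implemented AutoCloseable, usage could look roughly like the sketch below (builder chain as shown earlier in this thread; illustrative only, not the final API):

      // Hypothetical try-with-resources once PGReplicationStream implements AutoCloseable.
      try (PGReplicationStream stream =
               pgConnection
                   .getReplicationAPI()
                   .replicationStream()
                   .logical()
                   .withSlotName("mySlot")
                   .start()) {
        while (true) {
          ByteBuffer buffer = stream.read();
          // process logical changes
        }
      }  // stream.close() runs automatically when the block exits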

@Gordiychuk


Contributor

Gordiychuk commented Feb 20, 2017

@davecramer good idea

@jorsol


Contributor

jorsol commented Feb 20, 2017

AutoCloseable is a Java 7 feature; will support for Java 6 be dropped in the near future?
