Smart replication feedback #913

CyberDem0n · 2019-05-06T09:16:09Z

This commit makes psycopg2 responsible for sending the status update (feedback) messages to the server regardless of whether a synchronous or asynchronous connection is used.

Feedback is sent every status_interval (default value is 10) seconds, which could be configured by passing a corresponding parameter to the start_replication() or start_replication_expert() methods.
The actual feedback message is sent by the pq_read_replication_message() when the status_interval timeout is reached.

The default behavior of the send_feedback() method is changed. It doesn't send a feedback message on every call anymore but just updates internal structures. There is still a way to force sending a message if force or reply parameters are set.

The new approach has certain advantages:

The client can simply call the send_feedback() for every processed message and the library will take care of not overwhelming the server. Actually, in the synchronous mode it is even mandatory to confirm every processed message.
The library tracks internally the pointer of the last received message which is not keepalive. If the client confirmed the last message and after that server sends only keepalives with increasing wal_end, the library can safely move forward flush position to the wal_end and later automatically report it to the server.

Reporting of the wal_end received from keepalive messages is very important. Not doing so casing:

Excessive disk usage, because the replication slot prevents from WAL being cleaned up.
The smart and fast shutdown of the server could last indefinitely because walsender waits until the client report flush position equal to the wal_end.

The new way of sending status update (feedback) messages is inspired by reading the source code of the pg_recvlogical and pgjdbc.

This implementation is only extending the existing API and therefore should not break any of the existing code.

This commit makes psycopg2 responsible for sending the status update (feedback) messages to the server regardless of whether a synchronous or asynchronous connection is used. Feedback is sent every *status_update* (default value is 10) seconds, which could be configured by passing a corresponding parameter to the `start_replication()` or `start_replication_expert()` methods. The actual feedback message is sent by the `pq_read_replication_message()` when the *status_update* timeout is reached. The default behavior of the `send_feedback()` method is changed. It doesn't send a feedback message on every call anymore but just updates internal structures. There is still a way to *force* sending a message if *force* or *reply* parameters are set. The new approach has certain advantages: 1. The client can simply call the `send_feedback()` for every processed message and the library will take care of not overwhelming the server. Actually, in the synchronous mode it is even mandatory to confirm every processed message. 2. The library tracks internally the pointer of the last received message which is not keepalive. If the client confirmed the last message and after that server sends only keepalives with increasing *wal_end*, the library can safely move forward *flush* position to the *wal_end* and later automatically report it to the server. Reporting of the *wal_end* received from keepalive messages is very important. Not doing so casing: 1. Excessive disk usage, because the replication slot prevents from WAL being cleaned up. 2. The smart and fast shutdown of the server could last indefinitely because walsender waits until the client report *flush* position equal to the *wal_end*. This implementation is only extending the existing API and therefore should not break any of the existing code.

dvarrazzo · 2019-05-06T10:20:38Z

Thank you for the MR. It seems important enough that we could release the change in a bugfix release, if we are sure no code is broken by the change.

I would like to ask for feedback to @a1exsh and @ringerc, who are the authors of the original streaming replication support (#322)

psycopg/replication_cursor_type.c

a1exsh · 2019-05-06T10:24:11Z

tests/test_replication.py


        def consume(msg):
-            pass
-        # we don't see the error from the server before we try to read the data


This was the behavior with older server versions, which is corrected in newer ones, e.g. v11. No reason to test specific server behavior if we care about the client being able to recover from error.

a1exsh · 2019-05-06T10:26:47Z

@dvarrazzo we've discussed the need for the change and specific approach taken (maintaining backwards compatibility was a priority) with @CyberDem0n before he has implemented it.

dvarrazzo

The new parameters would require a versionchanged in the function docs, and the new attributes a versionadded.

dvarrazzo · 2019-05-06T10:40:57Z

@dvarrazzo we've discussed the need for the change and specific approach taken (maintaining backwards compatibility was a priority) with @CyberDem0n before he has implemented it.

Thank you @a1exsh: really appreciated!

The previous default value was 10 seconds, what might cause silent overwrite of the *status_interval* specified in the `start_replication()`

psycopg/replication_cursor_type.c

dvarrazzo · 2019-05-07T13:23:21Z

Looks good to me. I will add a NEWS entry myself on merging. Is this in a mergeable state for you, @CyberDem0n, @a1exsh?

a1exsh · 2019-05-07T13:27:57Z

Looks good to me!

Jan-M · 2019-05-13T11:42:56Z

Thanks for the quick help and merging.

167: Update psycopg2 requirement from ~=2.8.3 to ~=2.8.4 in /config r=MiklerGM a=dependabot-preview[bot] Updates the requirements on [psycopg2](https://github.com/psycopg/psycopg2) to permit the latest version. <details> <summary>Changelog</summary> *Sourced from [psycopg2's changelog](https://github.com/psycopg/psycopg2/blob/master/NEWS).* > Current release > --------------- > > What's new in psycopg 2.8.4 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > - Fixed building with Python 3.8 (:ticket:`[#854](psycopg/psycopg2#854). > - Don't swallow keyboard interrupts on connect when a password is specified > in the connection string (:ticket:`[#898](psycopg/psycopg2#898). > - Don't advance replication cursor when the message wasn't confirmed > (:ticket:`[#940](psycopg/psycopg2#940). > - Fixed inclusion of ``time.h`` on linux (:ticket:`[#951](psycopg/psycopg2#951). > - Fixed int overflow for large values in `~psycopg2.extensions.Column.table_oid` > and `~psycopg2.extensions.Column.type_code` (:ticket:`[#961](psycopg/psycopg2#961). > - `~psycopg2.errorcodes` map and `~psycopg2.errors` classes updated to > PostgreSQL 12. > - Wheel package compiled against OpenSSL 1.1.1d and PostgreSQL at least 11.4. > > > What's new in psycopg 2.8.3 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > - Added *interval_status* parameter to > `~psycopg2.extras.ReplicationCursor.start_replication()` method and other > facilities to send automatic replication keepalives at periodic intervals > (:ticket:`[#913](psycopg/psycopg2#913). > - Fixed namedtuples caching introduced in 2.8 (:ticket:`[#928](psycopg/psycopg2#928). > > > What's new in psycopg 2.8.2 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > - Fixed `~psycopg2.extras.RealDictCursor` when there are repeated columns > (:ticket:`[#884](psycopg/psycopg2#884). > - Binary packages built with openssl 1.1.1b. Should fix concurrency problems > (:tickets:`[#543](psycopg/psycopg2#543), [#836](psycopg/psycopg2#836). > > > What's new in psycopg 2.8.1 > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > - Fixed `~psycopg2.extras.RealDictRow` modifiability (:ticket:`[#886](psycopg/psycopg2#886). > - Fixed "there's no async cursor" error polling a connection with no cursor > (:ticket:`[#887](psycopg/psycopg2#887). > > > What's new in psycopg 2.8 > ------------------------- > > New features: ></tr></table> ... (truncated) </details> <details> <summary>Commits</summary> - See full diff in [compare view](https://github.com/psycopg/psycopg2/commits) </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language - `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com): - Update frequency (including time of day and day of week) - Pull request limits (per update run and/or open at any time) - Automerge options (never/patch/minor, and dev/runtime dependencies) - Out-of-range updates (receive only lockfile updates, if desired) - Security updates (receive only security updates, if desired) </details> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>

230: Update psycopg2 requirement from ~=2.8.4 to ~=2.8.5 in /config r=whirish a=dependabot-preview[bot] Updates the requirements on [psycopg2](https://github.com/psycopg/psycopg2) to permit the latest version. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/psycopg/psycopg2/blob/master/NEWS">psycopg2's changelog</a>.</em></p> <blockquote> <h2>Current release</h2> <p>What's new in psycopg 2.8.5 ^^^^^^^^^^^^^^^^^^^^^^^^^^^</p> <ul> <li>Fixed use of <code>!connection_factory</code> and <code>!cursor_factory</code> together (:ticket:<code>[#1019](https://github.com/psycopg/psycopg2/issues/1019)</code>).</li> <li>Added support for <code>~logging.LoggerAdapter</code> in <code>~psycopg2.extras.LoggingConnection</code> (:ticket:<code>[#1026](https://github.com/psycopg/psycopg2/issues/1026)</code>).</li> <li><code>~psycopg2.extensions.Column</code> objects in <code>cursor.description</code> can be sliced (:ticket:<code>[#1034](https://github.com/psycopg/psycopg2/issues/1034)</code>).</li> <li>Added AIX support (:ticket:<code>[#1061](https://github.com/psycopg/psycopg2/issues/1061)</code>).</li> <li>Fixed <code>~copy.copy()</code> of <code>~psycopg2.extras.DictCursor</code> rows (:ticket:<code>[#1073](https://github.com/psycopg/psycopg2/issues/1073)</code>).</li> </ul> <p>What's new in psycopg 2.8.4 ^^^^^^^^^^^^^^^^^^^^^^^^^^^</p> <ul> <li>Fixed building with Python 3.8 (:ticket:<code>[#854](https://github.com/psycopg/psycopg2/issues/854)</code>).</li> <li>Don't swallow keyboard interrupts on connect when a password is specified in the connection string (:ticket:<code>[#898](https://github.com/psycopg/psycopg2/issues/898)</code>).</li> <li>Don't advance replication cursor when the message wasn't confirmed (:ticket:<code>[#940](https://github.com/psycopg/psycopg2/issues/940)</code>).</li> <li>Fixed inclusion of <code>time.h</code> on linux (:ticket:<code>[#951](https://github.com/psycopg/psycopg2/issues/951)</code>).</li> <li>Fixed int overflow for large values in <code>~psycopg2.extensions.Column.table_oid</code> and <code>~psycopg2.extensions.Column.type_code</code> (:ticket:<code>[#961](https://github.com/psycopg/psycopg2/issues/961)</code>).</li> <li><code>~psycopg2.errorcodes</code> map and <code>~psycopg2.errors</code> classes updated to PostgreSQL 12.</li> <li>Wheel package compiled against OpenSSL 1.1.1d and PostgreSQL at least 11.4.</li> </ul> <p>What's new in psycopg 2.8.3 ^^^^^^^^^^^^^^^^^^^^^^^^^^^</p> <ul> <li>Added <em>interval_status</em> parameter to <code>~psycopg2.extras.ReplicationCursor.start_replication()</code> method and other facilities to send automatic replication keepalives at periodic intervals (:ticket:<code>[#913](https://github.com/psycopg/psycopg2/issues/913)</code>).</li> <li>Fixed namedtuples caching introduced in 2.8 (:ticket:<code>[#928](https://github.com/psycopg/psycopg2/issues/928)</code>).</li> </ul> <p>What's new in psycopg 2.8.2 ^^^^^^^^^^^^^^^^^^^^^^^^^^^</p> <ul> <li>Fixed <code>~psycopg2.extras.RealDictCursor</code> when there are repeated columns (:ticket:<code>[#884](https://github.com/psycopg/psycopg2/issues/884)</code>).</li> <li>Binary packages built with openssl 1.1.1b. Should fix concurrency problems (:tickets:<code>[#543](psycopg/psycopg2#543), [#836](https://github.com/psycopg/psycopg2/issues/836)</code>).</li> </ul> </tr></table> ... (truncated) </blockquote> </details> <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/psycopg/psycopg2/commits">compare view</a></li> </ul> </details> <br /> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language - `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com): - Update frequency (including time of day and day of week) - Pull request limits (per update run and/or open at any time) - Automerge options (never/patch/minor, and dev/runtime dependencies) - Out-of-range updates (receive only lockfile updates, if desired) - Security updates (receive only security updates, if desired) </details> Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>

a1exsh reviewed May 6, 2019

View reviewed changes

psycopg/replication_cursor_type.c Outdated Show resolved Hide resolved

a1exsh reviewed May 6, 2019

View reviewed changes

dvarrazzo requested changes May 6, 2019

View reviewed changes

Alexander Kukushkin added 2 commits May 6, 2019 15:26

Change the default value of keepalive_interval parameter to None

f827e49

The previous default value was 10 seconds, what might cause silent overwrite of the *status_interval* specified in the `start_replication()`

Improve docs

5eec11f

dvarrazzo reviewed May 7, 2019

View reviewed changes

psycopg/replication_cursor_type.c Outdated Show resolved Hide resolved

dvarrazzo approved these changes May 7, 2019

View reviewed changes

Address code-review

90755e6

dvarrazzo merged commit 90755e6 into psycopg:master May 12, 2019

tkaemming mentioned this pull request May 23, 2019

Correct behavior of keepalive task scheduling for PostgreSQL logical backend. getsentry/cdc#8

Merged

jbergknoff-rival mentioned this pull request Jul 2, 2019

Logical replication cursor advances when it should not #940

Closed

dependabot-preview bot mentioned this pull request Apr 29, 2020

Bump psycopg2-binary from 2.8.4 to 2.8.5 uwcirg/truenth-portal#3681

Closed

decmurphy mentioned this pull request Aug 8, 2023

Logical replication issue when client does not acknowledge keep alive message pgjdbc/pgjdbc#1490

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Smart replication feedback #913

Smart replication feedback #913

CyberDem0n commented May 6, 2019 •

edited

dvarrazzo commented May 6, 2019

a1exsh May 6, 2019

a1exsh commented May 6, 2019

dvarrazzo left a comment

dvarrazzo commented May 6, 2019

dvarrazzo commented May 7, 2019

a1exsh commented May 7, 2019

Jan-M commented May 13, 2019

Smart replication feedback #913

Smart replication feedback #913

Conversation

CyberDem0n commented May 6, 2019 • edited

dvarrazzo commented May 6, 2019

a1exsh May 6, 2019

Choose a reason for hiding this comment

a1exsh commented May 6, 2019

dvarrazzo left a comment

Choose a reason for hiding this comment

dvarrazzo commented May 6, 2019

dvarrazzo commented May 7, 2019

a1exsh commented May 7, 2019

Jan-M commented May 13, 2019

CyberDem0n commented May 6, 2019 •

edited