ERROR [task: 2] backend: Connection reset by peer (os error 104) - replication timeout #991

@vitabaks

PgDog version
v0.1.40

Description

After the data sync completed, logical replication started successfully and ran for about 3 hours before failing with: Connection reset by peer (os error 104).
PgDog appears to have been stuck at LSN 930CE/68936E98 for more than 1 minute, so PostgreSQL terminated the walsender because wal_sender_timeout was exceeded.

The database is under heavy write load with a large volume of incoming data, and REPLICA IDENTITY FULL is enabled on tables that have no primary key.
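To check how long the position actually stayed unchanged, the `replicated` progress lines can be scanned for the longest run with the same LSN. The following is a minimal sketch, not part of PgDog: the function names and the regular expression are assumptions based only on the log format shown in this issue, and the result is only as precise as the 5-second reporting interval.

```python
import re
from datetime import datetime, timedelta

# Matches PgDog progress lines such as:
# 2026-05-19T23:13:57.156708Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.315 MB/sec]
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+INFO replicated \S+ MB position (?P<lsn>[0-9A-F]+/[0-9A-F]+)"
)


def parse_ts(ts):
    # Timestamps in the log are UTC with microseconds and a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def longest_stall(lines):
    """Return (lsn, duration) for the longest run of lines reporting one position."""
    best = (None, timedelta(0))
    run_lsn, run_start = None, None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a progress line
        ts, lsn = parse_ts(m["ts"]), m["lsn"]
        if lsn != run_lsn:
            # Position advanced: start a new run at this line.
            run_lsn, run_start = lsn, ts
        elif ts - run_start > best[1]:
            # Same position as before: extend the current run.
            best = (run_lsn, ts - run_start)
    return best


# A few lines copied from the log in this issue.
SAMPLE = [
    "2026-05-19T23:13:52.155917Z  INFO replicated 840.132 MB position 930CE/677E8938 [0.032 MB/sec]",
    "2026-05-19T23:13:57.156708Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.315 MB/sec]",
    "2026-05-19T23:14:02.157855Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]",
    "2026-05-19T23:15:22.173803Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]",
]

if __name__ == "__main__":
    lsn, duration = longest_stall(SAMPLE)
    print(lsn, duration.total_seconds())
```

On the lines shown in this issue, the position 930CE/68936E98 is first reported at 23:13:57 and last reported at 23:15:22, roughly 85 seconds later. That is longer than the 60-second default of wal_sender_timeout, assuming the default is in use on the source.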

Logs

2026-05-19T20:25:29.297257Z  INFO table sync for 278 tables complete [dbname, shard: 0]
2026-05-19T20:25:29.449398Z DEBUG [cleanup] no cleanup needed, server in "idle" state [postgres@******:5434/dbname]
2026-05-19T20:25:29.470350Z DEBUG streaming replication on [postgres@******:5434/dbname]
2026-05-19T20:25:29.470371Z DEBUG replication from slot "__pgdog_repl_cs035kai5owx5kvazvd_0" started [postgres@******:5434/dbname]
...
2026-05-19T23:15:22.173803Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:15:24.713620Z DEBUG queries for table "public"."table_l0_a01_i1" already prepared
2026-05-19T23:15:24.718284Z DEBUG confirmed 2586937866284744 flushed [postgres@******:5434/dbname]
2026-05-19T23:15:24.718797Z DEBUG confirmed 2586937867074832 flushed [postgres@******:5434/dbname]
2026-05-19T23:15:24.718819Z  INFO closing server connection [postgres@******:5434/dbname, state: error, reason: other]
2026-05-19T23:15:24.718854Z  INFO closing server connection [postgres@******:5434/dbname, state: idle, reason: other]
2026-05-19T23:15:24.719614Z  INFO closing server connection [postgres@*******:5440/dbname, state: idle, reason: other]
2026-05-19T23:15:24.719658Z  INFO closing server connection [postgres@*******:5441/dbname, state: idle, reason: other]
2026-05-19T23:15:24.719678Z  INFO closing server connection [postgres@*******:5442/dbname, state: idle, reason: other]
2026-05-19T23:15:24.719715Z ERROR [task: 2] backend: Connection reset by peer (os error 104)
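
The `confirmed ... flushed` lines print a decimal number while the progress lines print the usual `X/Y` form. Assuming those decimal values are 64-bit LSNs (an assumption, not confirmed from the PgDog source), a small helper makes the two comparable. The function names below are illustrative only.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as "930CE/68936E98" to a 64-bit integer."""
    hi, lo = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def int_to_lsn(value: int) -> str:
    """Convert a 64-bit integer LSN to "X/Y" hex form (uppercase, no zero padding)."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


if __name__ == "__main__":
    # First "confirmed ... flushed" value from the log above.
    print(int_to_lsn(2586937866284744))
    # Stalled position from the progress lines.
    print(lsn_to_int("930CE/68936E98"))
```

Under that assumption, 2586937866284744 converts to 930CE/68936EC8, which is 48 bytes past the stalled position 930CE/68936E98.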

Source:

2026-05-19 23:14:53 UTC [postgres] [34589]: [22] LOG:  terminating walsender process due to replication timeout
2026-05-19 23:14:53 UTC [postgres] [34589]: [23] CONTEXT:  slot "__pgdog_repl_cs035kai5owx5kvazvd_0", output plugin "pgoutput", in the change callback, associated LSN 930CE/735BEF08
2026-05-19 23:14:53 UTC [postgres] [34589]: [24] STATEMENT:  START_REPLICATION SLOT "__pgdog_repl_cs035kai5owx5kvazvd_0" LOGICAL 930CD/52015998 ("proto_version" '4', origin 'any', "publication_names" '"pgdog"')
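
For context, the LSN the walsender was decoding when it was terminated (930CE/735BEF08) can be compared with the last position PgDog reported (930CE/68936E98). The sketch below only does the subtraction; the helper names are illustrative, and the two values are not necessarily measuring the same thing, so the result is a rough indication rather than a diagnosis.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as "930CE/735BEF08" to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_gap_bytes(ahead: str, behind: str) -> int:
    """Number of WAL bytes between two LSNs (positive if `ahead` is later)."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


if __name__ == "__main__":
    # Walsender's associated LSN vs. PgDog's last reported position.
    gap = lsn_gap_bytes("930CE/735BEF08", "930CE/68936E98")
    print(f"{gap} bytes ({gap / 2**20:.1f} MiB)")
```

The gap is 180912240 bytes, about 172.5 MiB of WAL between the two positions. That would be consistent with a large transaction being decoded, but the logs alone do not prove it.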

sudo docker logs pgdog-proxy 2>&1 | grep replicated
2026-05-19T20:25:59.524340Z  INFO replicated 249.332 MB position 930CD/9BA87B38 [0.000 MB/sec]
2026-05-19T20:26:04.525434Z  INFO replicated 249.383 MB position 930CD/9BAF3860 [0.010 MB/sec]
2026-05-19T20:26:09.526918Z  INFO replicated 249.383 MB position 930CD/9BAF3860 [0.000 MB/sec]
2026-05-19T20:26:14.527978Z  INFO replicated 249.401 MB position 930CD/9BB4B270 [0.004 MB/sec]
2026-05-19T20:26:19.529168Z  INFO replicated 249.401 MB position 930CD/9BB4B270 [0.000 MB/sec]
2026-05-19T20:26:24.530748Z  INFO replicated 249.401 MB position 930CD/9BB4B270 [0.000 MB/sec]
2026-05-19T20:26:29.531714Z  INFO replicated 249.401 MB position 930CD/9BB4B270 [0.000 MB/sec]
2026-05-19T20:26:34.532759Z  INFO replicated 249.442 MB position 930CD/9BC0A998 [0.008 MB/sec]
2026-05-19T20:26:39.533062Z  INFO replicated 249.573 MB position 930CD/9BD28E38 [0.026 MB/sec]
2026-05-19T20:26:44.541778Z  INFO replicated 249.673 MB position 930CD/9BF51D60 [0.020 MB/sec]
2026-05-19T20:26:49.542615Z  INFO replicated 249.880 MB position 930CD/9C0E1E20 [0.041 MB/sec]
2026-05-19T20:26:54.543744Z  INFO replicated 249.956 MB position 930CD/9C1BFC18 [0.015 MB/sec]
...
2026-05-19T23:13:12.147654Z  INFO replicated 838.616 MB position 930CE/671D9A78 [0.037 MB/sec]
2026-05-19T23:13:17.148549Z  INFO replicated 838.733 MB position 930CE/67290048 [0.023 MB/sec]
2026-05-19T23:13:22.149153Z  INFO replicated 838.812 MB position 930CE/67337848 [0.016 MB/sec]
2026-05-19T23:13:27.150124Z  INFO replicated 839.141 MB position 930CE/6740C158 [0.066 MB/sec]
2026-05-19T23:13:32.151730Z  INFO replicated 839.312 MB position 930CE/6752F7A0 [0.034 MB/sec]
2026-05-19T23:13:37.152347Z  INFO replicated 839.529 MB position 930CE/675AF7F8 [0.043 MB/sec]
2026-05-19T23:13:42.153812Z  INFO replicated 839.693 MB position 930CE/67666130 [0.033 MB/sec]
2026-05-19T23:13:47.154265Z  INFO replicated 839.972 MB position 930CE/67734068 [0.056 MB/sec]
2026-05-19T23:13:52.155917Z  INFO replicated 840.132 MB position 930CE/677E8938 [0.032 MB/sec]
2026-05-19T23:13:57.156708Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.315 MB/sec]
2026-05-19T23:14:02.157855Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:07.158474Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:12.159049Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:17.160414Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:22.161115Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:27.162193Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:32.163720Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:37.164311Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:42.165553Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:47.166515Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:52.167401Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:14:57.168699Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:15:02.169150Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:15:07.170438Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:15:12.171431Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:15:17.172108Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]
2026-05-19T23:15:22.173803Z  INFO replicated 841.709 MB position 930CE/68936E98 [0.000 MB/sec]

sudo docker logs pgdog-proxy 2>&1 | grep -E "WARN|ERROR"
2026-05-19T20:25:29.494458Z  WARN unknown xlog message: Y
2026-05-19T20:25:29.496028Z  WARN table "public"."table1_l0_a01_d" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.795201Z  WARN table "public"."table2_l0_a01_d" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.955121Z  WARN table "public"."table1_l0_a01_i1" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.956267Z  WARN table "public"."table2_l0_a01_i1" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.964584Z  WARN table "public"."table1_l0_a01_ib" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.965460Z  WARN table "public"."table2_l0_a01_ib" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.966674Z  WARN table "public"."table1_l0_a01_i5" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:25:29.967332Z  WARN table "public"."table2_l0_a01_i5" has REPLICA IDENTITY FULL and no primary key; replication performance will be degraded without an index on the destination table.
2026-05-19T20:40:05.442308Z  WARN unknown xlog message: Y
2026-05-19T20:57:13.727257Z  WARN unknown xlog message: Y
2026-05-19T21:12:17.525976Z  WARN unknown xlog message: Y
2026-05-19T21:15:34.140599Z  WARN unknown xlog message: Y
2026-05-19T21:28:00.154738Z  WARN unknown xlog message: Y
2026-05-19T21:30:39.242300Z  WARN unknown xlog message: Y
2026-05-19T21:40:23.537104Z  WARN unknown xlog message: Y
2026-05-19T21:41:34.816622Z  WARN unknown xlog message: Y
2026-05-19T21:55:25.838761Z  WARN unknown xlog message: Y
2026-05-19T22:03:09.008919Z  WARN unknown xlog message: Y
2026-05-19T22:04:03.524010Z  WARN unknown xlog message: Y
2026-05-19T22:08:53.784770Z  WARN unknown xlog message: Y
2026-05-19T22:14:15.349172Z  WARN unknown xlog message: Y
2026-05-19T22:20:09.509660Z  WARN unknown xlog message: Y
2026-05-19T22:23:17.306118Z  WARN unknown xlog message: Y
2026-05-19T22:40:31.504293Z  WARN unknown xlog message: Y
2026-05-19T22:56:06.796354Z  WARN unknown xlog message: Y
2026-05-19T22:56:06.796670Z  WARN unknown xlog message: Y
2026-05-19T23:10:28.941311Z  WARN unknown xlog message: Y
2026-05-19T23:15:24.719715Z ERROR [task: 2] backend: Connection reset by peer (os error 104)
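
The repeated `unknown xlog message: Y` warnings may be worth a separate look. Assuming the letter PgDog prints is the pgoutput message tag (an assumption about PgDog's logging, not confirmed here), `Y` is the tag of the Type message in PostgreSQL's logical replication protocol. The sketch below tallies such warnings and labels them with the tag names from the PostgreSQL documentation; the function name and regular expression are illustrative.

```python
import re
from collections import Counter

# Message tags defined by PostgreSQL's logical replication (pgoutput) protocol.
PGOUTPUT_TAGS = {
    "B": "Begin", "M": "Message", "C": "Commit", "O": "Origin",
    "R": "Relation", "Y": "Type", "I": "Insert", "U": "Update",
    "D": "Delete", "T": "Truncate",
    "S": "Stream Start", "E": "Stream Stop",
    "c": "Stream Commit", "A": "Stream Abort",
    "b": "Begin Prepare", "P": "Prepare", "K": "Commit Prepared",
    "r": "Rollback Prepared", "p": "Stream Prepare",
}

# Assumed warning format, taken from the log lines in this issue.
WARN_RE = re.compile(r"WARN unknown xlog message: (?P<tag>\S)")


def unknown_tags(lines):
    """Map each tag seen in an 'unknown xlog message' warning to (name, count)."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            counts[m["tag"]] += 1
    return {
        tag: (PGOUTPUT_TAGS.get(tag, "not a pgoutput tag"), n)
        for tag, n in counts.items()
    }


if __name__ == "__main__":
    sample = [
        "2026-05-19T20:25:29.494458Z  WARN unknown xlog message: Y",
        "2026-05-19T20:40:05.442308Z  WARN unknown xlog message: Y",
        "2026-05-19T23:15:24.719715Z ERROR [task: 2] backend: Connection reset by peer (os error 104)",
    ]
    print(unknown_tags(sample))
```

If the assumption holds, these warnings mean Type messages are being skipped rather than handled. Whether that is related to the timeout is not clear from the logs.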

Metrics

Metrics from the test server containing all 3 shards:

[5 images: metrics screenshots]
