Bug: lastReceiveLSN is calculated incorrectly for logical replication #801
lastReceiveLSN is calculated incorrectly for logical replication. Right now it is calculated like
Thanks to Stefan Smith for reporting this problem.
 #550 (comment)
@@             Coverage Diff              @@
##             master     #801      +/-   ##
============================================
- Coverage     65.25%   65.25%   -0.01%
  Complexity     3517     3517
============================================
  Files           165      166       +1
  Lines         15217    15226       +9
  Branches       2465     2466       +1
============================================
+ Hits           9930     9935       +5
- Misses         4100     4101       +1
- Partials       1187     1190       +3
The algorithm for calculating lastReceiveLSN differs between logical and physical replication. For physical replication it is the startLsn from XLogData plus the payload size. For logical replication, lastReceiveLSN should be the startLSN from XLogData on its own. The payload size cannot be added to the result, because logical decoding does not return raw WAL data: the size of a decoded message can differ from the size of the WAL it was produced from, so adding it yields an LSN that may belong to some random future transaction.
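The rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code changed in this PR: `LastReceiveLsnSketch`, `ReplicationType`, `lastReceiveLsn` and `format` are hypothetical names, and an LSN is modelled as a plain `long`.

```java
// Hypothetical sketch; none of these names are taken from pgjdbc.
class LastReceiveLsnSketch {
    enum ReplicationType { PHYSICAL, LOGICAL }

    /**
     * @param startLsn    start LSN carried in the XLogData message
     * @param payloadSize number of payload bytes in that message
     */
    static long lastReceiveLsn(ReplicationType type, long startLsn, int payloadSize) {
        if (type == ReplicationType.PHYSICAL) {
            // The payload is raw WAL, so its length is a real WAL offset.
            return startLsn + payloadSize;
        }
        // Logical: the payload is decoder output, so its length says
        // nothing about the position in the WAL. Use startLsn as is.
        return startLsn;
    }

    /** Formats an LSN in the usual "hi/lo" hexadecimal notation. */
    static String format(long lsn) {
        return String.format("%X/%X", lsn >>> 32, lsn & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long start = (1L << 32) | 0xABCDL; // 1/ABCD
        // prints 1/ABDD
        System.out.println(format(lastReceiveLsn(ReplicationType.PHYSICAL, start, 16)));
        // prints 1/ABCD
        System.out.println(format(lastReceiveLsn(ReplicationType.LOGICAL, start, 16)));
    }
}
```

With the same XLogData header and a 16-byte payload, the physical branch advances past the payload while the logical branch stays at the start LSN.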
Something to be aware of when keeping track of logical replication progress is that START_REPLICATION treats the LSN as *inclusive*, so if you START_REPLICATION 1/ABCD and there's a COMMIT record starting at exactly 1/ABCD, it'll get replayed. This matters for progress recording; it means you need to track progress with the end of the commit record of interest, or at least its start + 1, to avoid risking the same data being sent to you twice. It won't happen easily, because at least in low-concurrency systems standby status updates usually advance the server's confirmed_flush_lsn quite promptly past the last confirmed commit LSN. The start point for replay is max(confirmed_flush_lsn, specified_start_lsn), so that usually has the same effect as specifying a start point > the start of the last confirmed commit record. But you'll get bitten if the server crashes in the meantime, since we don't flush confirmed_flush_lsn advances from standby status updates to disk eagerly.
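A small Java model of the behaviour described in this comment may make the off-by-one easier to see. It is a simplified sketch of the rules as stated here, not server or driver code: `ReplayStartSketch` and its methods are hypothetical names, and LSNs are treated as unsigned 64-bit values stored in a `long`.

```java
// Hypothetical model of the start-point rules described in the comment above.
class ReplayStartSketch {
    /** Replay starts at max(confirmed_flush_lsn, specified_start_lsn). */
    static long effectiveStart(long confirmedFlushLsn, long specifiedStartLsn) {
        return Long.compareUnsigned(confirmedFlushLsn, specifiedStartLsn) >= 0
                ? confirmedFlushLsn
                : specifiedStartLsn;
    }

    /** The start point is inclusive: a commit starting exactly there is replayed. */
    static boolean isReplayed(long commitStartLsn, long effectiveStartLsn) {
        return Long.compareUnsigned(commitStartLsn, effectiveStartLsn) >= 0;
    }

    /** Minimal safe restart point after a processed commit: its start + 1. */
    static long restartAfter(long commitStartLsn) {
        return commitStartLsn + 1;
    }

    public static void main(String[] args) {
        long commit = (1L << 32) | 0xABCDL; // commit record starting at 1/ABCD
        long staleFlush = 1L << 32;         // server-side value lost in a crash

        // Restarting at the commit's start LSN sends the commit again.
        System.out.println(isReplayed(commit, effectiveStart(staleFlush, commit)));               // true
        // Restarting at start + 1 skips it.
        System.out.println(isReplayed(commit, effectiveStart(staleFlush, restartAfter(commit)))); // false
    }
}
```

The second call shows why recording start + 1 (or the end of the commit record) protects the client even when the server's confirmed_flush_lsn has fallen back after a crash.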
…bc#801)
* Bug: Not valid receiveLSN that leads to lost parallel transactions

Add a test that reproduces the issue from https://www.postgresql.org/message-id/CAHHbV7V4XvdHGw_jpR9Xyq3fz%3Df%2BO4oa%2B73sbizGTv_AvmDXhQ%40mail.gmail.com

* bug: lastReceiveLSN not valid for logical replication

Logical and physical replication use different algorithms to calculate the lastReceiveLSN. For physical replication the calculation is startLsn from XLogData plus the payload size; this is correct because we have the raw data. For logical replication the lastReceiveLSN uses startLSN from XLogData without the payload size: the raw data is not available, so the logical decoding message size can change, and adding it would give an LSN from some random future transaction.