source-postgres-batch: Correctly handle XID wraparound #1623

willdonnelly · 2024-06-04T06:24:24Z

Description:

This PR modifies the source-postgres-batch connector in two related ways:

The polling query is tweaked so that it should work correctly even when the XID epoch wraps around (provided that the source table is updated more frequently than that wraparound happens -- if that's an issue I have ideas and believe it's fixable, but it felt out of scope for the current goal).
The resource config now leaves the template empty and expects the connector to supply the default behavior. Since the template used to hard-code in a table name, there are now two new resource config fields for the schema/table name, which simply get plumbed into the template. This means that if there's some minor issue with the polling query behavior we can fix it in the future by rolling out a new connector build, without having to go and update old captures which have "baked in" the query template initially provided by discovery.

I would have liked to have tested this PR more than I did, but exercising XID wraparound behavior in small-scale local testing is damnably hard. I have substituted a small amount of semi-manual testing involving pg_resetwal plus a whole bunch of thinking about the problem, so hopefully I didn't overlook anything important.

This change is

This commit changes the behavior of the PostgreSQL Batch source connector in two closely related ways. Firstly, it adds two new resource config properties `schema` and `table` which hold a schema name and table name, respectively, identifying the table from which a given resource is capturing. These properties are not required, however they are plumbed through and made available in the query template, and either the schema+table or the query template must be specified for a given resource config to be valid. The upshot of this is that the default query template can now just plug in those names instead of having them "hard coded" into the discovered template text, and that in turn means that we can leave the template unspecified in discovered resources and let the default behavior be updated in future connector releases instead of "baked in" at initial discovery time.

I believe that the new form of this query correctly handles XID wraparound, by the simple expedient of doing a wrapping 32-bit subtraction and using the result for ordering/filtering. This is one of those "I had to think about this and experiment for three days to confidently write what looks like a trivial three lines of code" situations.

mdibaiee · 2024-06-04T15:46:13Z

source-postgres-batch/main.go

+// an orderable count of "how far past the previous cursor" a given row is.
+//
+// We can emulate wrapping uint32 subtraction using PostgreSQL int64 values by simply writing
+// the expression `((x::bigint - y::bigint)<<32)>>32`.


I don't really understand how this is a wrapping subtraction, can you give a numerical example of a wrapped xid value and a cursor value and show how this calculation unfolds in practice please?

Suppose we have a table with four rows:

Row A has xmin = 0xFFFFC000

Row B has xmin = 0xFFFFF000. This is our cursor value, in this example.

Row C has xmin = 0xFFFFFFFF

Row D has xmin = 0x00001000

The correct comparison operation on XIDs is the u32 comparison-via-subtraction-and-sign-check function x - y > 0. In normal non-wrapping cases it should be obvious that this works because it's equivalent to x > y, and in wrapping cases we get the following results:

Row A 0xFFFFC000 - 0xFFFFF000 = -0x3000

Row B 0xFFFFF000 - 0xFFFFF000 = 0

Row C 0xFFFFFFFF - 0xFFFFF000 = 0xFFF

Row D 0x00001000 - 0xFFFFF000 = 0x2000

The (x<<32)>>32 operation takes the bottom 32 bits of an int64 quantity, shifts them up to the top of the int64 so that the 32-bit MSB becomes the int64's MSB, and then shifts them back down to the normal position. This is a fairly common idiom for "sign-extend an N-bit quantity into an M-bit one (where M > N)".

Working out the math on that final one in more detail, we have something like:

(((0x00001000::bigint - 0xFFFFF000::bigint)<<32)>>32)::int (((0x0000000000001000 - 0x00000000FFFFF000)<<32)>>32)::int ((0xFFFFFFFF00002000<<32)>>32)::int (0x0000200000000000>>32)::int 0x0000000000002000::int 0x00002000

But the final cast back to ::int is irrelevant and unnecessary for our purposes (we just want to filter for positive results and order by the result value) so we skip it.

I am assured that there is no longer any need to bound the size of an individual Flow transaction (if there ever was, it's possible that I misunderstood the issue from the start), and that every part of the system including all materializations should be capable of handling arbitrarily large transactions, even into the hundreds-of-millions-of-rows range. This is convenient, because it means that we can guarantee correct handling of the initial "backfill" query which might capture bazillions of rows with frozen XIDs. A task restart will result in restarting that backfill, but this is probably an acceptable tradeoff for now.

willdonnelly · 2024-06-04T19:24:54Z

Merging without LGTM because we've got a user eager to try this out. I'm reasonably confident that this PR is correct, and thanks to the changes around default template handling it should be less of an issue if we have to issue a followup to fix something.

willdonnelly added 2 commits June 3, 2024 17:55

willdonnelly requested a review from a team June 4, 2024 06:24

mdibaiee reviewed Jun 4, 2024

View reviewed changes

willdonnelly force-pushed the wgd/xmin-wraparound-20240529 branch from c8b498b to 47d2332 Compare June 4, 2024 19:17

willdonnelly merged commit ff661d5 into main Jun 4, 2024
45 of 46 checks passed

willdonnelly deleted the wgd/xmin-wraparound-20240529 branch June 4, 2024 19:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

source-postgres-batch: Correctly handle XID wraparound #1623

source-postgres-batch: Correctly handle XID wraparound #1623

willdonnelly commented Jun 4, 2024 •

edited by jgraettinger

Loading

mdibaiee Jun 4, 2024

willdonnelly Jun 4, 2024

willdonnelly Jun 4, 2024

willdonnelly commented Jun 4, 2024 •

edited

Loading

source-postgres-batch: Correctly handle XID wraparound #1623

source-postgres-batch: Correctly handle XID wraparound #1623

Conversation

willdonnelly commented Jun 4, 2024 • edited by jgraettinger Loading

mdibaiee Jun 4, 2024

Choose a reason for hiding this comment

willdonnelly Jun 4, 2024

Choose a reason for hiding this comment

willdonnelly Jun 4, 2024

Choose a reason for hiding this comment

willdonnelly commented Jun 4, 2024 • edited Loading

willdonnelly commented Jun 4, 2024 •

edited by jgraettinger

Loading

willdonnelly commented Jun 4, 2024 •

edited

Loading