feat(electric): Use two replication slots to allow clients to resume replication after Electric restarts #1043
Conversation
Force-pushed from 79ee783 to f6de6d9
Great work, exactly what I had in mind myself, and additional thanks for cleaning and testing the LSN module. One big question from me
```elixir
# TODO: make sure we're not removing transactions that are about to be requested by a newly
# connected client. See VAX-1552.
#
# TODO(optimization): do not run this after every consumed transaction.
```
So I have two questions here:

- Why are we running this whenever we see a `Commit`? We have `ack_fn` set to `ack`, the previous function. I believe it makes more sense to advance the slot as we do right now, at least once the transaction is stored in memory (although I'll agree that it matters a bit less given that we're advancing a secondary slot here).
- I see the "optimization" note you've left here, but I'm not sure I'm comfortable leaving this as a TODO. Advancing a slot on every transaction is fine-ish; however, we're opening a fully separate new connection to PG for every received transaction. This is going to be extremely slow and costly for us, and PG won't be able to keep up with the number of connections we're opening for one-shot queries. I'd be against even a pooling approach here - I believe we need to just open and hold a stable connection to PG to perform these tasks.
The optimization note wasn't meant to be left in the code for long. I split the code into multiple PRs to help myself draw boundaries between changes to different components and to get to a working version faster to then be able to focus on the edge cases and inefficiencies.
So, point taken. Holding a persistent connection might be the way to go if the current approach of advancing the slot often remains in place. Though I'm hoping to implement more of a batched approach, e.g. advancing every N transactions or in size-on-disk increments, at which point relying on a connection pool may be sufficient. I'll circle back to this discussion when I have more concrete ideas to share but I'm expecting those to be implemented in a follow-up PR in any case.
N.B.: I had a connection pool implemented as part of the early implementation work but had to take that out for now just to keep the scope of changes that are already spread over multiple PRs manageable.
With or without batching, holding a single open connection just for slot advancement is fine, since only one process is ever likely to consume the logical replication stream. I think we should just spawn a sibling GenServer that holds the advancing connection - to split the responsibility a little - and cast to that server on ack. This keeps the whole approach simple and doesn't clog up the logical replication consumer with ack message handling, since the casts are sent to a different process. And if that GenServer goes down and we lose a couple of casts in the middle - no problem; worst case we'll have slightly more WAL stored at a point in time, and it'll resolve on the next cast.

I'm not sure how you want to structure the PR merge sequence, but I would insist on adding this functionality in this PR. It's not much code, but the difference in performance implications is huge.
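A minimal sketch of the shape being proposed, assuming a supervised process owning one dedicated PG connection (module, function, and helper names are illustrative, not from the PR):

```elixir
defmodule Electric.Replication.SlotAdvancer do
  @moduledoc """
  Sketch only: holds one long-lived PG connection and advances the main
  slot on cast, so the replication consumer never blocks on acknowledgement.
  """
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  # Fire-and-forget: safe to call from the consumer's ack path.
  def advance(lsn), do: GenServer.cast(__MODULE__, {:advance, lsn})

  @impl true
  def init(opts) do
    # Assumption: real code would open a dedicated connection to PG here.
    {:ok, conn} = connect(opts)
    {:ok, %{conn: conn}}
  end

  @impl true
  def handle_cast({:advance, lsn}, %{conn: conn} = state) do
    # If this crashes, the supervisor restarts the server; a lost cast only
    # means slightly more retained WAL until the next ack arrives.
    :ok = advance_slot(conn, lsn)
    {:noreply, state}
  end

  defp connect(_opts), do: {:ok, :placeholder_conn}

  defp advance_slot(_conn, _lsn) do
    # Assumption: real code would advance the slot's starting point here.
    :ok
  end
end
```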
Force-pushed from 9a369aa to a8e903e

Force-pushed from a8e903e to bce37fa
```elixir
end

defimpl List.Chars do
  def to_charlist(lsn), do: ~c'#{Electric.Postgres.Lsn.to_iolist(lsn)}'
```
nitpick: do you need to wrap this with the `~c` sigil?
Mix formatter has been adding those since Elixir v1.15.0:

> [Code] `Code.format_string!/2` now converts `'charlists'` into `~c"charlists"` by default
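For illustration, here is the conversion the formatter performs on Elixir >= 1.15, using the line under discussion (a sketch only):

```elixir
# Input accepted by `mix format`:
def to_charlist(lsn), do: '#{Electric.Postgres.Lsn.to_iolist(lsn)}'

# Output produced by the formatter, with the ~c sigil added:
def to_charlist(lsn), do: ~c"#{Electric.Postgres.Lsn.to_iolist(lsn)}"
```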
Super
```diff
 origin = state.origin

 %Transaction{
   transaction
   | lsn: end_lsn,
     # Make sure not to pass state.field into ack function, as this
     # will create a copy of the whole state in memory when sending a message
-    ack_fn: fn -> ack(conn, origin, end_lsn) end
+    ack_fn: fn -> ack(repl_conn, origin, end_lsn) end
```
I think this `ack` should not operate on `repl_conn`. The `ack_fn` function is going to be called in a different process later down the line - in particular, in the WAL cache consumer stage. This means that if this replication connection is closed for any reason, old transactions will call it with a stale value and probably crash - I don't know how `:epgsql.standby_status_update/3` acts when it's passed a nonexistent pid.

To decouple that a little and make sure that the process calling the `ack_fn` doesn't crash, I'd rather have `ack_fn` send a `GenServer.cast/2` to a process that actually holds the replication connection and does the acknowledgement. If the cast fails because the holding GenServer is temporarily dead - no problem: the main code continues to operate, and worst case we acknowledge on the next magic write anyway. So I'd rather put this "advancing" replication connection into a separate GenServer that can restart and reconnect separately, and just `GenServer.cast` into it - maybe make it a `:gproc` via-tuple using the origin so that we don't even care where the replication process currently lives.
What do you think?
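For reference, a `:gproc` via-tuple registration keyed by the origin might look like this (a sketch; the module name and key shape are made up for illustration):

```elixir
defmodule SlotAckViaGproc do
  use GenServer

  # Register the connection-holding GenServer under the connector origin:
  def start_link(origin) do
    GenServer.start_link(__MODULE__, origin, name: via(origin))
  end

  # Callers cast without knowing where the process currently lives:
  def ack(origin, lsn), do: GenServer.cast(via(origin), {:ack, lsn})

  # {:n, :l, key} is gproc's "local unique name" key class.
  defp via(origin), do: {:via, :gproc, {:n, :l, {__MODULE__, origin}}}

  @impl true
  def init(origin), do: {:ok, %{origin: origin}}

  @impl true
  def handle_cast({:ack, _lsn}, state) do
    # Assumption: real code would acknowledge/advance the slot here.
    {:noreply, state}
  end
end
```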
As we've discussed elsewhere, it is fine to acknowledge transactions inside `LogicalReplicationProducer` and rely on Elixir's supervision tree to restart things if anything fails. Since we now have a replication slot with a persistent starting point, we can replay already-seen transactions when a new replication connection opens.
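A sketch of what the ack inside the producer can look like, assuming `repl_conn` is the epgsql replication connection; the `Lsn.to_integer/1` helper name is an assumption, while `:epgsql.standby_status_update/3` is the real epgsql call:

```elixir
# Assumes `require Logger` in the enclosing module.
defp ack(repl_conn, origin, lsn) do
  Logger.debug("Acknowledging transactions up to #{lsn} for origin #{origin}")

  # standby_status_update/3 reports the flushed and applied LSNs back to
  # Postgres, which lets it discard WAL that is behind the slot.
  decimal_lsn = Electric.Postgres.Lsn.to_integer(lsn)
  :ok = :epgsql.standby_status_update(repl_conn, decimal_lsn, decimal_lsn)
end
```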
Force-pushed from 8141723 to e9fcd3d
Nice design, I like it! A couple of nitpicks, but nothing more. Let's coordinate a release before merging this, though.
```diff
-|> Stream.map(fn %Transaction{lsn: lsn} = tx ->
-  {lsn_to_position(lsn), %{tx | ack_fn: nil}}
-end)
+|> Stream.map(fn %Transaction{} = tx -> {lsn_to_position(tx.lsn), tx} end)
```
nitpick: I don't understand why you don't like the destructuring assignment of `lsn`, given that it's faster than dynamic dispatch of the `.` operator. Doesn't really matter in this case, just caught my eye.
It's not that I don't like it. As a rule of thumb, I use pattern matching in function heads when the same field is used multiple times in the function body. Because we have instances in the code where some fields are pattern-matched while others are accessed via the dot operator, the split usually looks arbitrary.
I believe that BEAM is smart enough to optimize uses of the dot operator if there's a pattern match or guard in the function head. Can't find the source now, but here's at least a partial confirmation of that: https://elixirforum.com/t/way-too-much-detail-about-matching-in-the-head-vs-accessing-maps-in-the-body/49167/2
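For illustration, the two styles under discussion side by side (the function name is made up):

```elixir
# Destructuring in the function head:
defp position_of(%Transaction{lsn: lsn}), do: lsn_to_position(lsn)

# Struct match in the head, dot access in the body. Per the discussion above,
# the %Transaction{} match already establishes the term's shape, so `tx.lsn`
# can avoid the slower dynamic-dispatch path the nitpick refers to:
defp position_of(%Transaction{} = tx), do: lsn_to_position(tx.lsn)
```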
```elixir
defp emit_events(state, events) do
  {:noreply, Enum.reverse(events), state}

state = %{state | queue: queue_remaining, queue_Len: queue_len - demand, demand: 0}
```
Typo that should lead to a runtime error; it possibly means that this code path is not covered by tests (although it is caught by Dialyzer).
```diff
-state = %{state | queue: queue_remaining, queue_Len: queue_len - demand, demand: 0}
+state = %{state | queue: queue_remaining, queue_len: queue_len - demand, demand: 0}
```
Yeah, I had to stop working on this abruptly yesterday and pushed what I had ready by then. This condition requires a specific test setup to cover. I think having Dialyzer complain about it is good enough.
Force-pushed from fba560a to b2cc4d5
Now that we're using two replication slots, a magic write is no longer sufficient for releasing obsolete WAL records. We have to advance the main slot's starting point instead.
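At the SQL level, advancing the slot's starting point can be done with `pg_replication_slot_advance/2`, a Postgres function available since version 11. A sketch using epgsql's simple-query API, with `conn`, the slot name, and the LSN as placeholders:

```elixir
{:ok, _cols, _rows} =
  :epgsql.squery(
    conn,
    "SELECT pg_replication_slot_advance('electric_main', '0/5A1F2D0')"
  )
```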
With a persistent main slot now in use, it is possible that, after a restart, Electric consumes a transaction it has already seen before.
Control flow is simplified and it's easier to see opportunities for batching in the refactored code.
No need to spread the acknowledgement logic across multiple processes. Once a transaction has been emitted to the GenStage consumer, it doesn't need to be replayed by Postgres until Electric restarts.
Force-pushed from b2cc4d5 to 314550e
…ig (#1044) Follow-up to #1043. This PR makes the Elixir process run by `EtsBacked` scoped under the connector origin, similar to how other Postgres connector processes behave. It also updates telemetry metrics to use more precise names for what the `EtsBacked` cache actually stores: instead of "cache count" it's "transaction count".
Continuing on from #1099, this PR introduces the concept of a "persistent replication slot". Now, instead of having an auto-advancing replication slot, Electric can keep around WAL records that contain transactions it has already seen and acknowledged previously. As a result, when Electric restarts, it can repopulate its in-memory cache of transactions and be able to resume clients' replication streams.
New configuration options `ELECTRIC_TXN_CACHE_SIZE` and `ELECTRIC_RESUMABLE_WAL_WINDOW` have been added. The relevant docs are updated in #1050.