Skip to content

Conversation

iambriccardo
Copy link
Contributor

@iambriccardo iambriccardo commented Sep 15, 2025

This PR introduces two key improvements to the apply loop:

  1. Delayed shutdown when a transaction is in progress

    • On receiving a shutdown signal, the system now checks whether it is currently in a transaction.
      • Not in a transaction → the apply loop terminates immediately.
      • In a transaction → the shutdown is deferred until the next transaction boundary element is processed.
    • This ensures that in-flight transactions are applied fully before shutdown. Since we do not use streamed transactions, each transaction is always received in full from Postgres, so the system can still terminate within a short timeframe.
  2. Distinction between stopping and completing the apply loop

    • The system can now differentiate whether the loop was:
      • Stopped (interrupted by shutdown), or
      • Completed (all work finished).
    • This distinction allows for correct resource cleanup and more predictable shutdown semantics.

@iambriccardo iambriccardo changed the title riccardobusetti/etl 288 implement mechanism to defer shutdown when in a transaction feat(apply): Add delayed shutdown mechanism Sep 15, 2025
@coveralls
Copy link

coveralls commented Sep 15, 2025

Pull Request Test Coverage Report for Build 17762933433

Details

  • 103 of 121 (85.12%) changed or added relevant lines in 3 files are covered.
  • 16 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.07%) to 83.1%

Changes Missing Coverage Covered Lines Changed/Added Lines %
etl/src/workers/apply.rs 17 18 94.44%
etl/src/workers/table_sync.rs 20 27 74.07%
etl/src/replication/apply.rs 66 76 86.84%
Files with Coverage Reduction New Missed Lines %
etl/src/workers/table_sync.rs 3 75.87%
etl/src/replication/client.rs 13 83.78%
Totals Coverage Status
Change from base Build 17733201470: 0.07%
Covered Lines: 13527
Relevant Lines: 16278

💛 - Coveralls

@iambriccardo
Copy link
Contributor Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines 43 to 46
impl Future for DeferredTimer {
type Output = ();

fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P0] Import Future trait for DeferredTimer

The new DeferredTimer implements impl Future for DeferredTimer but Future is never brought into scope, so the module does not compile (cannot find trait Future in this scope). Add use std::future::Future; (or fully qualify the trait) before implementing it so the delayed shutdown code can build.

Useful? React with 👍 / 👎.

Comment on lines 1095 to 1101
let mut continue_loop = hook.process_syncing_tables(end_lsn, false).await?;

// If we are in delayed shutdown, and we are processing a `COMMIT` event it means that the transaction
// was finished, so we want to end the apply loop.
if state.delayed_shutdown {
continue_loop = false;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Delayed shutdown never exits on transaction boundary

Setting continue_loop = false in the COMMIT handler when state.delayed_shutdown is true is intended to stop the apply loop once the in‑flight transaction completes, but this value is discarded by the caller. handle_replication_message_with_timeout builds its own continue_loop from batching and synchronization results and never consults the flag from handle_commit_message, so the loop keeps processing new events until the force‑shutdown timer expires. As a result, shutdowns requested mid‑transaction are always forced after the timeout and the promised graceful exit after the next boundary never occurs.

Useful? React with 👍 / 👎.

@iambriccardo iambriccardo marked this pull request as ready for review September 16, 2025 06:19
@iambriccardo iambriccardo requested a review from a team as a code owner September 16, 2025 06:19
@iambriccardo
Copy link
Contributor Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex Review: Here are some suggestions.

Reply with @codex fix comments to fix any unresolved comments.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

Comment on lines 1048 to 1051
// If discarded the shutdown, and we are processing a `COMMIT` event it means that the transaction
// was finished, so we want to end the apply loop.
if state.shutdown_discarded {
continue_loop = false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1] Propagate delayed shutdown after commit

When a shutdown signal is received mid-transaction, handle_commit_message sets continue_loop = false based on state.shutdown_discarded, but that flag is only used to choose between return_boundary_event and finish_batch_and_return_boundary_event. handle_replication_message_with_timeout ignores the reason the batch was flushed and always returns true unless the hook reports a stop, so the apply loop continues to process new messages even though a shutdown was requested and the transaction finished. As a result the delayed shutdown never actually terminates the loop, defeating the feature’s intent.

Useful? React with 👍 / 👎.

@iambriccardo
Copy link
Contributor Author

@codex review

Copy link

Codex Review: Didn't find any major issues. What shall we delve into next?

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

@iambriccardo
Copy link
Contributor Author

@codex review

Copy link

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review". If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex fix this CI failure" or "@codex address that feedback".

@iambriccardo iambriccardo changed the title feat(apply): Add delayed shutdown mechanism feat(apply): Add delayed shutdown mechanism and fix termination Sep 16, 2025
@iambriccardo iambriccardo merged commit 5d14c2b into main Sep 16, 2025
9 checks passed
@iambriccardo iambriccardo deleted the riccardobusetti/etl-288-implement-mechanism-to-defer-shutdown-when-in-a-transaction branch September 16, 2025 11:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants