
feat(core): Implement memory based backpressure mechanism#605

Draft
iambriccardo wants to merge 18 commits into main from riccardobusetti/etl-525-implement-memory-aware-processing

Conversation

iambriccardo (Contributor) commented Feb 17, 2026

This PR introduces a new memory backpressure mechanism that monitors memory usage using the sysinfo crate, taking cgroup limits into account. Memory measurements are taken every 100 ms. When memory usage exceeds defined thresholds, a watch channel emits the current backpressure state. This state is consumed by the two main data ingestion streams: table copy and table streaming.

As part of this change, the existing stream abstractions have been improved as follows:

  • A new BackpressureStream has been added. It listens for memory backpressure signals and pauses consumption of elements until memory pressure is relieved.
  • A BatchBackpressureStream variant has been introduced, combining the previous timed batch behavior with support for memory-based backpressure alerts.
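As a rough illustration of how the batch variant combines the two behaviors, the per-poll decision can be sketched as a pure function. The names (`PollAction`, `next_action`) are assumptions for illustration, not the crate's actual API; the "flush buffered items even when paused" behavior is described in the review comments below.

```rust
/// Simplified decision table for a batch-aware backpressure stream.
/// Names are illustrative, not the crate's real types.
#[derive(Debug, PartialEq)]
enum PollAction {
    /// Memory is blocked but a batch is buffered: emit it rather than grow it.
    FlushBatch,
    /// Memory is blocked and nothing is buffered: park until pressure is relieved.
    WaitPending,
    /// Memory is fine: poll the wrapped stream as usual.
    PollInner,
}

fn next_action(paused_for_memory: bool, buffered_items: usize) -> PollAction {
    match (paused_for_memory, buffered_items) {
        (true, 0) => PollAction::WaitPending,
        (true, _) => PollAction::FlushBatch,
        (false, _) => PollAction::PollInner,
    }
}

fn main() {
    assert_eq!(next_action(true, 0), PollAction::WaitPending);
    assert_eq!(next_action(true, 3), PollAction::FlushBatch);
    assert_eq!(next_action(false, 0), PollAction::PollInner);
}
```

Flushing an already-buffered batch under pressure frees memory, so the stream only parks when it holds nothing to emit.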

One important consequence of this design is that, while the system is paused due to high memory usage, it may delay detection of connection closure messages or errors. To avoid forcing streams to be aware of connection status or to poll unnecessarily during backpressure periods (just to detect termination signals), a cleaner solution was chosen:

The connection task now broadcasts its status changes (via a watch channel or similar). In the select! branches of both the table copy and table streaming tasks, we immediately terminate processing upon receiving a connection error or closure signal. This approach is preferred over the previous reliance on errors propagating through tokio-postgres stream/table-copy layers, which often led to semantically confusing error handling.
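A minimal sketch of the status-driven termination check described above; the enum and function names here are hypothetical, and the `select!` shape is shown only in comments since the real branch layout lives in the table copy and streaming tasks:

```rust
/// Status broadcast by the connection task (variant names are assumed).
#[derive(Clone, Copy, PartialEq, Debug)]
enum ConnectionStatus {
    Open,
    Closed,
    Errored,
}

/// Whether an ingestion task should stop processing when it observes
/// a status update from the connection task.
fn should_terminate(status: ConnectionStatus) -> bool {
    matches!(status, ConnectionStatus::Closed | ConnectionStatus::Errored)
}

// In the ingestion tasks, roughly (hypothetical shape):
// tokio::select! {
//     _ = status_rx.changed() => {
//         if should_terminate(*status_rx.borrow()) { break; }
//     }
//     item = stream.next() => { /* process item */ }
// }

fn main() {
    assert!(!should_terminate(ConnectionStatus::Open));
    assert!(should_terminate(ConnectionStatus::Closed));
    assert!(should_terminate(ConnectionStatus::Errored));
}
```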

For more context on the underlying behavior, see the tokio-postgres internals, particularly:
connection.rs, copy_out.rs, and copy_both.rs.

Another improvement: The batch stream no longer waits for shutdown signals internally (which was unclean). Instead, shutdown handling is now consistently managed via select! in the table copy logic as well.


coderabbitai bot commented Feb 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR implements memory-aware processing by adding a MemoryMonitor that samples system memory and broadcasts a hysteresis-based blocked/unblocked signal. Two stream wrappers, BackpressureStream and BatchBackpressureStream, observe the memory signal and pause/resume or flush streams and batches accordingly. MemoryMonitor is created at pipeline startup and threaded through ApplyWorker/ApplyLoop/TableSyncWorker and table_copy call paths so table-copy and streaming flows respond to memory pressure. Several configuration fields and plumbing (config, workers, replication client updates) were added to propagate the monitor.
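The hysteresis-based blocked/unblocked signal can be sketched as a pure transition function. The function name and parameter shape are assumptions for illustration (the thresholds stand in for the activate/resume percentage settings referenced elsewhere in this review), not the crate's actual code:

```rust
/// One hysteresis step for the memory backpressure signal.
/// Illustrative sketch; not the crate's real `compute_next_blocked`.
fn compute_next_blocked(
    currently_blocked: bool,
    used_percent: f64,
    activate_pct: f64,
    resume_pct: f64,
) -> bool {
    if currently_blocked {
        // Stay blocked until usage drops below the resume threshold.
        used_percent >= resume_pct
    } else {
        // Only block once usage crosses the activation threshold.
        used_percent >= activate_pct
    }
}

fn main() {
    // Example thresholds: activate at 80%, resume at 70%.
    assert!(!compute_next_blocked(false, 75.0, 80.0, 70.0)); // below activate
    assert!(compute_next_blocked(false, 85.0, 80.0, 70.0));  // crosses activate
    assert!(compute_next_blocked(true, 75.0, 80.0, 70.0));   // still above resume
    assert!(!compute_next_blocked(true, 65.0, 80.0, 70.0));  // drops below resume
}
```

The gap between the two thresholds prevents the signal from flapping when usage hovers near a single cutoff.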

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Monitor as MemoryMonitor (bg task)
    participant System as sysinfo::System
    participant Watch as watch::Sender<bool>
    participant Subscriber as MemoryMonitorSubscription

    loop Every MEMORY_REFRESH_INTERVAL
        Monitor->>System: sample memory stats
        System-->>Monitor: MemorySnapshot
        Monitor->>Monitor: compute_next_blocked(used_percent)
        Monitor->>Watch: send blocked state (if changed)
        Watch-->>Subscriber: broadcast update
    end

    Subscriber->>Subscriber: poll_update()/current_blocked()
```
```mermaid
sequenceDiagram
    participant Consumer as Consumer
    participant Stream as BackpressureStream
    participant Inner as EventsStream
    participant Memory as MemoryMonitorSubscription

    Consumer->>Stream: poll_next()
    Stream->>Memory: poll_update(cx)
    alt memory blocked
        Memory-->>Stream: Some(true)
        Stream->>Consumer: Pending
    else memory not blocked
        Memory-->>Stream: None/false
        Stream->>Inner: poll_next()
        Inner-->>Stream: Item / Pending / Done
        Stream-->>Consumer: Item / Pending / Done
    end
```

Assessment against linked issues

| Objective | Addressed | Explanation |
|---|---|---|
| Stop streaming data from table copies and streaming when memory exceeds threshold [ETL-525] | | |

Out-of-scope changes

| Code Change | Explanation |
|---|---|
| Added multiple top-level Cargo dependencies (Cargo.toml) | These new deps (tracing-appender, tracing-log, tracing-subscriber, utoipa, utoipa-swagger-ui, uuid, x509-cert, etc.) are unrelated to the single objective of memory-aware processing. |
| Added workspace dependency/features and tokio-stream and expanded tokio-postgres features (etl/Cargo.toml) | Adding tokio-stream and expanding tokio-postgres features is not required by ETL-525 and appears outside the stated objective. |

Comment @coderabbitai help to get the list of available commands and usage tips.

coveralls commented Feb 17, 2026

Pull Request Test Coverage Report for Build 22140054258

Details

  • 593 of 838 (70.76%) changed or added relevant lines in 16 files are covered.
  • 1884 unchanged lines in 23 files lost coverage.
  • Overall coverage decreased (-5.1%) to 69.115%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| etl-examples/src/main.rs | 0 | 1 | 0.0% |
| etl/src/pipeline.rs | 5 | 6 | 83.33% |
| etl/src/workers/apply.rs | 4 | 7 | 57.14% |
| etl/src/replication/stream.rs | 2 | 7 | 28.57% |
| etl/src/workers/table_sync.rs | 2 | 7 | 28.57% |
| etl-api/src/configs/pipeline.rs | 21 | 33 | 63.64% |
| etl/src/replication/apply.rs | 26 | 52 | 50.0% |
| etl/src/replication/client.rs | 18 | 47 | 38.3% |
| etl/src/concurrency/memory_monitor.rs | 165 | 201 | 82.09% |
| etl/src/concurrency/stream.rs | 267 | 303 | 88.12% |
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| etl/src/workers/table_sync_copy.rs | 1 | 69.13% |
| etl-config/src/shared/destination.rs | 14 | 23.21% |
| etl/src/metrics.rs | 15 | 60.53% |
| etl/src/concurrency/stream.rs | 18 | 84.44% |
| etl-config/src/shared/pipeline.rs | 19 | 34.93% |
| etl-postgres/src/replication/state.rs | 20 | 71.12% |
| etl-config/src/shared/connection.rs | 21 | 83.14% |
| etl-api/src/configs/pipeline.rs | 23 | 72.85% |
| etl/src/store/both/memory.rs | 24 | 0.0% |
| etl-postgres/src/replication/slots.rs | 31 | 74.02% |
Totals:

  • Change from base Build 22071808522: -5.1%
  • Covered Lines: 18370
  • Relevant Lines: 26579

💛 - Coveralls

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@etl/src/concurrency/stream.rs`:
- Around line 47-70: The task stalls because when memory_monitor.poll_update(cx)
yields Ready(...) we set this.paused_for_memory but do not register the waker
for future updates; before returning Poll::Pending you must continue polling
memory_monitor.poll_update(cx) (or loop until it returns Poll::Pending) so the
watch channel registers the current waker. Modify the poll logic around
memory_monitor.poll_update(cx) in the stream's poll method (the match that sets
*this.paused_for_memory and calls this.memory_monitor.current_blocked()) to
consume Ready variants and only stop when poll_update returns Poll::Pending,
updating *this.paused_for_memory on each Ready, and then return Poll::Pending
(so the waker is registered for the next change).
- Around line 190-202: In BatchBackpressureStream's poll implementation, when
*this.paused_for_memory is true and this.items is empty you currently return
Poll::Pending without registering the waker; change the branch so you capture
and store the current task waker (e.g. this.waker = Some(cx.waker().clone()) or
equivalent) before returning Poll::Pending so the stream can be woken when
memory state changes, keeping the existing behavior of flushing when items exist
(the symbols to modify are this.paused_for_memory, this.items, this.reset_timer
and the Poll::Pending return).

Comment on lines 47 to 70
```rust
match this.memory_monitor.poll_update(cx) {
    Poll::Ready(Some(blocked)) => {
        *this.paused_for_memory = blocked;
    }
    Poll::Ready(None) => {
        *this.paused_for_memory = false;
    }
    Poll::Pending => {
        let currently_blocked = this.memory_monitor.current_blocked();
        if *this.paused_for_memory != currently_blocked {
            *this.paused_for_memory = currently_blocked;
        }
    }
}

if !was_paused && *this.paused_for_memory {
    info!("backpressure active, stream paused");
} else if was_paused && !*this.paused_for_memory {
    info!("backpressure released, stream resumed");
}

if *this.paused_for_memory {
    return Poll::Pending;
}
```

⚠️ Potential issue | 🔴 Critical

Critical: Missing waker registration causes indefinite task stall when backpressure activates.

When poll_update(cx) returns Ready(Some(true)), the waker is not registered with the watch channel because a Ready result was obtained. Returning Poll::Pending on line 69 without a registered waker means the task will never be woken when memory pressure is released.

After receiving a Ready from the watch stream, you must poll again to register for the next update before returning Pending.

Proposed fix

```diff
         if *this.paused_for_memory {
+            // Ensure waker is registered for the next state change.
+            let _ = this.memory_monitor.poll_update(cx);
             return Poll::Pending;
         }
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```rust
match this.memory_monitor.poll_update(cx) {
    Poll::Ready(Some(blocked)) => {
        *this.paused_for_memory = blocked;
    }
    Poll::Ready(None) => {
        *this.paused_for_memory = false;
    }
    Poll::Pending => {
        let currently_blocked = this.memory_monitor.current_blocked();
        if *this.paused_for_memory != currently_blocked {
            *this.paused_for_memory = currently_blocked;
        }
    }
}
if !was_paused && *this.paused_for_memory {
    info!("backpressure active, stream paused");
} else if was_paused && !*this.paused_for_memory {
    info!("backpressure released, stream resumed");
}
if *this.paused_for_memory {
    // Ensure waker is registered for the next state change.
    let _ = this.memory_monitor.poll_update(cx);
    return Poll::Pending;
}
```

coderabbitai bot left a comment

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@etl/src/concurrency/stream.rs`:
- Around line 95-121: The code sets paused_for_memory on Ready from
memory_subscription.poll_update but doesn't register the current waker for
future updates, so returning Poll::Pending can sleep forever; fix by repeatedly
calling memory_subscription.poll_update(cx) in a loop (or otherwise re-invoking
it) until it returns Poll::Pending so the waker is registered for the next
change, updating *this.paused_for_memory on each Ready(Some/None) result; apply
this change to the same polling logic in both BackpressureStream and
BatchBackpressureStream (use the existing memory_subscription.poll_update,
paused_for_memory and was_paused symbols to locate and update the code).

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@etl/src/lib.rs`:
- Around line 98-99: Remove the inline code example from the Rust doc comment
that contains the settings memory_backpressure_activate_percentage and
memory_backpressure_resume_percentage; locate the doc comment block in the
module/lib doc (the block that lists those two example lines) and delete the
example lines (or move them to external docs) so the doc comment no longer
contains runnable code while preserving any plain-text descriptions.

---

Duplicate comments:
In `@etl/src/concurrency/stream.rs`:
- Around line 42-72: The stream's poll_next currently can return Poll::Pending
while pausing without ensuring the watch has registered the current task waker;
fix poll_next (and the analogous wrapper at the other location) so that before
returning Poll::Pending when *paused_for_memory is true you repeatedly call
this.memory_subscription.poll_update(cx) until it returns Poll::Pending (or
Ready(None)/unblocked) to allow the watch to register the waker, updating
*paused_for_memory from poll_update results (use
memory_subscription.current_blocked() as fallback) and only then return
Poll::Pending if still blocked.

bnjjj previously approved these changes Feb 18, 2026
```rust
#[derive(Clone, Debug, Serialize, Deserialize)]
#[cfg_attr(feature = "utoipa", derive(ToSchema))]
#[serde(rename_all = "snake_case")]
pub struct BatchConfig {
```
iambriccardo (Contributor, Author) commented:
Just moved it to align with how/where backpressure is written.
