Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,11 @@

- To prevent false positives, non-public email addresses (e.g. `user@localhost`) are no longer scrubbed by default. ([#5737](https://github.com/getsentry/relay/pull/5737))

+**Features**:
+
+- Envelope buffer: Add option to disable flush-to-disk on shutdown. ([#5751](https://github.com/getsentry/relay/pull/5751))
+
+
**Internal**:

- Calculate and track accepted bytes per individual trace metric item via `TraceMetricByte` data category. ([#5744](https://github.com/getsentry/relay/pull/5744))
14 changes: 14 additions & 0 deletions relay-config/src/config.rs
@@ -1024,6 +1024,14 @@ pub struct EnvelopeSpool {
/// Defaults to 1.
#[serde(default = "spool_envelopes_partitions")]
pub partitions: NonZeroU8,
+/// Whether the database defined in `path` is on an ephemeral storage disk.
+///
+/// With `ephemeral: true`, Relay does not spool in-flight data to disk
+/// during graceful shutdown. Instead, it attempts to process all data before it terminates.
+///
+/// Defaults to `false`.
+#[serde(default)]
+pub ephemeral: bool,
}

impl Default for EnvelopeSpool {
@@ -1036,6 +1044,7 @@ impl Default for EnvelopeSpool {
disk_usage_refresh_frequency_ms: spool_disk_usage_refresh_frequency_ms(),
max_backpressure_memory_percent: spool_max_backpressure_memory_percent(),
partitions: spool_envelopes_partitions(),
+ephemeral: false,
}
}
}
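
A minimal sketch of why the serde attribute and the `Default` impl above agree: `#[serde(default)]` falls back to `Default::default()` for the field's type, and `bool::default()` is `false`. The struct below is a hypothetical, std-only stand-in (`EnvelopeSpoolSketch` and `ephemeral_spool` are not Relay names) that mirrors only the two fields relevant to this change. Treating a missing `path` as the in-memory case is an assumption taken from the integration test in this PR.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down stand-in for `EnvelopeSpool`; not Relay's actual struct.
#[derive(Debug, Default, PartialEq)]
struct EnvelopeSpoolSketch {
    /// Location of the spool database; `None` is assumed to model the in-memory buffer.
    path: Option<PathBuf>,
    /// Mirrors the new `ephemeral` flag; `bool::default()` is `false`.
    ephemeral: bool,
}

/// Builds the settings for a spool that lives on a scratch volume.
/// The path is supplied by the caller; nothing here touches the file system.
fn ephemeral_spool(path: PathBuf) -> EnvelopeSpoolSketch {
    EnvelopeSpoolSketch {
        path: Some(path),
        ephemeral: true,
    }
}
```

With this model, `EnvelopeSpoolSketch::default()` leaves `ephemeral` unset, so existing deployments that never mention the option would keep the flush-on-shutdown behavior.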
@@ -2347,6 +2356,11 @@ impl Config {
self.values.spool.envelopes.partitions
}

+/// Returns `true` if the data is stored on ephemeral disks.
+pub fn spool_ephemeral(&self) -> bool {
+self.values.spool.envelopes.ephemeral
+}
+
/// Returns the maximum size of an event payload in bytes.
pub fn max_event_size(&self) -> usize {
self.values.limits.max_event_size.as_bytes()
21 changes: 12 additions & 9 deletions relay-server/src/services/buffer/envelope_buffer/mod.rs
@@ -43,8 +43,8 @@ impl PolymorphicEnvelopeBuffer {
/// Returns true if the implementation stores all envelopes in RAM.
pub fn is_memory(&self) -> bool {
match self {
-PolymorphicEnvelopeBuffer::InMemory(_) => true,
-PolymorphicEnvelopeBuffer::Sqlite(_) => false,
+Self::InMemory(_) => true,
+Self::Sqlite(_) => false,
}
}

@@ -183,13 +183,16 @@ impl PolymorphicEnvelopeBuffer {
// Currently, we want to flush the buffer only for disk, since the in memory implementation
// tries to not do anything and pop as many elements as possible within the shutdown
// timeout.
-let Self::Sqlite(buffer) = self else {
-    relay_log::trace!("PolymorphicEnvelopeBuffer: shutdown procedure not needed");
-    return false;
-};
-buffer.flush().await;
-
-true
+match self {
+    Self::Sqlite(buffer) if !buffer.stack_provider.ephemeral() => {
+        buffer.flush().await;
+        true
+    }
+    _ => {
+        relay_log::trace!("shutdown procedure not needed");
+        false
+    }
+}

Ephemeral SQLite blocked by memory backpressure during shutdown

Medium Severity

When an ephemeral SQLite buffer returns `false` from `shutdown()`, the service loop continues to drain envelopes, matching the in-memory behavior. However, `system_ready` still gates unspooling on `buffer.is_memory() || self.memory_ready()`, and `is_memory()` returns `false` for all SQLite variants, including ephemeral ones. Under high memory pressure (>80%), the ephemeral buffer waits indefinitely in `system_ready`, unlike the in-memory buffer, which always bypasses that check. Data on ephemeral storage would then be lost when the process is eventually killed.


Member Author

This is intended behavior. The memory implementation bypasses the memory check because it would otherwise get stuck -- it cannot lower memory pressure without unspooling & processing envelopes. The disk spooler, even the ephemeral one, can lower memory pressure by pausing the dequeue.
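
To make the two decisions in this thread concrete, here is a small std-only model. It is a sketch under stated assumptions, not Relay's code: `BufferKind`, `flushes_on_shutdown`, and `may_unspool` are hypothetical names, and `memory_ready` is reduced to a plain boolean. The first method mirrors the new `match` in `shutdown()`, and the free function mirrors the `buffer.is_memory() || self.memory_ready()` gate quoted in the review comment.

```rust
/// Hypothetical stand-in for `PolymorphicEnvelopeBuffer`; not Relay's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BufferKind {
    InMemory,
    Sqlite { ephemeral: bool },
}

impl BufferKind {
    /// Mirrors `is_memory()`: only the RAM-backed variant returns `true`.
    fn is_memory(self) -> bool {
        matches!(self, BufferKind::InMemory)
    }

    /// Mirrors the new `shutdown()` match: only a persistent SQLite buffer flushes to disk.
    fn flushes_on_shutdown(self) -> bool {
        matches!(self, BufferKind::Sqlite { ephemeral: false })
    }
}

/// Models the gate from the review comment: the in-memory buffer always
/// bypasses the memory check, every SQLite buffer has to wait for it.
fn may_unspool(buffer: BufferKind, memory_ready: bool) -> bool {
    buffer.is_memory() || memory_ready
}
```

In this model an ephemeral SQLite buffer skips the flush like the in-memory buffer, but `may_unspool` still returns `false` for it while memory is not ready, which is the behavior the author describes as intended.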

}

/// Returns the partition tag for this [`PolymorphicEnvelopeBuffer`].
7 changes: 7 additions & 0 deletions relay-server/src/services/buffer/stack_provider/sqlite.rs
@@ -19,6 +19,7 @@ pub struct SqliteStackProvider {
batch_size_bytes: usize,
max_disk_size: usize,
partition_id: u8,
+ephemeral: bool,
}

#[warn(dead_code)]
@@ -31,9 +32,15 @@ impl SqliteStackProvider {
batch_size_bytes: config.spool_envelopes_batch_size_bytes(),
max_disk_size: config.spool_envelopes_max_disk_size(),
partition_id,
+ephemeral: config.spool_ephemeral(),
})
}

+/// Returns `true` if data is stored on non-persistent disks.
+pub fn ephemeral(&self) -> bool {
+self.ephemeral
+}
+
/// Returns `true` when there might be data residing on disk, `false` otherwise.
fn assume_data_on_disk(stack_creation_type: StackCreationType) -> bool {
matches!(stack_creation_type, StackCreationType::Initialization)
22 changes: 15 additions & 7 deletions tests/integration/test_basic.py
@@ -11,7 +11,8 @@
from requests import HTTPError


-def test_graceful_shutdown_with_in_memory_buffer(mini_sentry, relay):
+@pytest.mark.parametrize("backend", ["memory", "disk"])
+def test_graceful_shutdown_with_ephemeral_buffer(mini_sentry, relay, backend):
from time import sleep

get_project_config_original = mini_sentry.app.view_functions["get_project_config"]
@@ -24,14 +25,21 @@ def get_project_config():
project_id = 42
mini_sentry.add_basic_project_config(project_id)

-relay = relay(
-    mini_sentry,
-    {"limits": {"shutdown_timeout": 2}},
-)
+with tempfile.TemporaryDirectory() as db_dir:
+    db_file_path = (
+        os.path.join(db_dir, "database.db") if backend == "disk" else None
+    )
+    relay = relay(
+        mini_sentry,
+        {
+            "limits": {"shutdown_timeout": 5},
+            "spool": {"envelopes": {"path": db_file_path, "ephemeral": True}},
+        },
+    )

-relay.send_event(project_id)
+    relay.send_event(project_id)

-relay.shutdown(sig=signal.SIGTERM)
+    relay.shutdown(sig=signal.SIGTERM)

# When using the memory envelope buffer, we optimistically do not do anything on shutdown, which means that the
# buffer will try and pop as always as long as it can (within the shutdown timeout).
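
The drain-within-the-timeout behavior that the comment above describes can be pictured with a small std-only Rust model. This is an illustrative sketch, not Relay's service loop (which is asynchronous and more involved); `drain_until_timeout` is a hypothetical helper name, and a `VecDeque` stands in for the envelope buffer.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Pops items from the front of `queue` and hands them to `process` until the
/// queue is empty or `timeout` has elapsed. Returns how many items were processed.
/// Hypothetical helper for illustration only.
fn drain_until_timeout<T>(
    queue: &mut VecDeque<T>,
    timeout: Duration,
    mut process: impl FnMut(T),
) -> usize {
    let deadline = Instant::now() + timeout;
    let mut processed = 0;
    // The deadline is checked before each pop, so items left in the queue
    // after the deadline are never handed to `process`.
    while Instant::now() < deadline {
        match queue.pop_front() {
            Some(item) => {
                process(item);
                processed += 1;
            }
            None => break,
        }
    }
    processed
}
```

Anything still queued when the deadline passes stays in the queue, which corresponds to the data that would be lost on ephemeral storage once the process exits.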