ref(service): Add task spawning with panic isolation to StorageService#322

Merged

jan-auer merged 11 commits into main from ref/task-spawning

Feb 20, 2026

jan-auer commented Feb 20, 2026

Each StorageService operation now runs in a spawned tokio task, communicating results via a oneshot channel. This gives two guarantees:

Run-to-completion: operations finish even if the caller is cancelled, which is critical for tombstone consistency.
Panic isolation: a backend panic yields Error::Panic instead of propagating to the caller.

Business logic moved from StorageService to a private StorageServiceInner struct. StorageService is now a thin public API that takes owned values (ObjectId, Metadata), spawns each operation in a separate task, and delegates to the inner type. Existing tests target StorageServiceInner methods directly (same-module access), so they required no changes. New tests exercise the public spawn-based API including panic recovery and receiver-drop completion.
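The spawn-per-operation pattern and the thin-wrapper split can be sketched with the standard library alone. This is a hedged analogue, not the PR's implementation: `std::thread::spawn` stands in for `tokio::spawn`, a `std::sync::mpsc` channel stands in for the `tokio::sync::oneshot` channel, and the simplified `ObjectId`, `get_object`, `spawn_isolated`, and `Error` shapes are hypothetical. Because the caller blocks on the channel here, the sketch demonstrates panic isolation but not async caller cancellation.

```rust
// Hypothetical std-only sketch: threads and mpsc stand in for tokio tasks and oneshot.
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Simplified error type modeled on the `Panic` and `Dropped` variants named in the PR.
#[derive(Debug, PartialEq)]
pub enum Error {
    /// The operation panicked; the panic was caught rather than propagated.
    Panic,
    /// The worker exited without sending a result.
    Dropped,
}

/// Hypothetical stand-in for the real object identifier.
pub type ObjectId = String;

/// Holds the business logic; its methods borrow their arguments.
pub struct StorageServiceInner;

impl StorageServiceInner {
    fn get_object(&self, id: &ObjectId) -> String {
        // Simulate a backend bug for one specific key.
        if id == "boom" {
            panic!("simulated backend panic");
        }
        format!("contents of {id}")
    }
}

/// Thin public handle: cheap to clone, takes owned arguments so they
/// can be moved into the spawned worker.
#[derive(Clone)]
pub struct StorageService(Arc<StorageServiceInner>);

impl StorageService {
    pub fn new() -> Self {
        StorageService(Arc::new(StorageServiceInner))
    }

    pub fn get_object(&self, id: ObjectId) -> Result<String, Error> {
        let inner = Arc::clone(&self.0);
        spawn_isolated(move || inner.get_object(&id))
    }
}

/// Runs `f` on its own thread and waits for the result.
/// A panic inside `f` is caught and reported as `Error::Panic`.
fn spawn_isolated<T, F>(f: F) -> Result<T, Error>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let result = panic::catch_unwind(AssertUnwindSafe(f)).map_err(|_| Error::Panic);
        // Ignore send errors: the work has finished even if nobody is waiting.
        let _ = tx.send(result);
    });
    // If the worker exits without sending, `recv` fails and we report `Dropped`.
    rx.recv().unwrap_or(Err(Error::Dropped))
}
```

Owned arguments are what make the wrapper work: the spawned closure must be `'static`, so it cannot borrow from the caller. That is why the public method takes `ObjectId` by value and clones the `Arc` before spawning, while the inner method can keep borrowing.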

Next steps

Two-tier concurrency limiting: Add a high web-request limit (middleware) and a lower service-operation limit (semaphore in ServiceRunner), replacing the current InFlightRequestsLayer with purpose-specific controls.
Bounded task queue: Replace direct tokio::spawn with a bounded channel dispatching to a fixed worker pool, enabling backpressure, fire-and-forget operations, and priority scheduling.
Cancellation: Use CancellationToken to actively cancel backend operations on client disconnect, with safety invariants protecting tombstone writes from cancellation.
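The bounded task queue idea can be sketched with a fixed pool of threads reading from a bounded `std::sync::mpsc::sync_channel`. This illustrates the plan under stated assumptions and is not code from the repository: `WorkerPool`, `submit`, and `SubmitError` are hypothetical names, and threads stand in for tokio tasks. `try_send` rejects work when the queue is full instead of waiting, which is the backpressure signal.

```rust
// Hypothetical sketch of the planned bounded queue; threads stand in for tokio tasks.
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc::{self, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// A boxed unit of work, standing in for a service operation.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Why a job was not accepted.
#[derive(Debug, PartialEq)]
pub enum SubmitError {
    /// The queue is at capacity: the backpressure signal.
    Full,
    /// No workers are left to receive jobs.
    Closed,
}

pub struct WorkerPool {
    sender: Option<SyncSender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl WorkerPool {
    /// Starts `workers` threads sharing a queue of at most `capacity` pending jobs.
    pub fn new(workers: usize, capacity: usize) -> Self {
        let (sender, receiver) = mpsc::sync_channel::<Job>(capacity);
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..workers)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is a temporary, so it is released before the job runs.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        // Catch panics so one failing job does not kill the worker.
                        Ok(job) => {
                            let _ = panic::catch_unwind(AssertUnwindSafe(job));
                        }
                        // Sender dropped and queue drained: shut down.
                        Err(_) => break,
                    }
                })
            })
            .collect();
        WorkerPool { sender: Some(sender), workers }
    }

    /// Enqueues a job without blocking; rejects it if the queue is full.
    pub fn submit<F: FnOnce() + Send + 'static>(&self, f: F) -> Result<(), SubmitError> {
        let sender = self.sender.as_ref().expect("sender is present until drop");
        match sender.try_send(Box::new(f)) {
            Ok(()) => Ok(()),
            Err(TrySendError::Full(_)) => Err(SubmitError::Full),
            Err(TrySendError::Disconnected(_)) => Err(SubmitError::Closed),
        }
    }
}

impl Drop for WorkerPool {
    fn drop(&mut self) {
        // Dropping the sender lets workers finish queued jobs, then exit.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

Fire-and-forget falls out of this shape naturally: `submit` returns as soon as the job is queued, and priority scheduling would replace the single channel with one queue per priority level.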

Ref FS-171


          ref(service): Add task spawning with panic isolation to StorageService

e644e79

Public methods now take owned values and spawn each operation in an
isolated task. This ensures run-to-completion on caller cancellation
(critical for tombstone consistency) and prevents backend panics from
propagating.

jan-auer requested a review from a team as a code owner

February 20, 2026 14:51

jan-auer added 4 commits

February 20, 2026 15:52


          Merge branch 'main' into ref/task-spawning

7eee488

* main:
  meta(ai): Improve AGENTS.md development workflow (#321)


          ref(service): Split TaskFailed into Panic and Cancelled error variants

f6ea54d


          ref(service): Simplify catch_unwind match to unwrap_or_else

f11d1be


          ref(service): Move business logic to StorageServiceInner

62e04ab

jan-auer mentioned this pull request

feat: Add backpressure management config for concurrent requests #315

Closed

jan-auer added 2 commits

February 20, 2026 16:32


          ref(service): Use consistent API in tests

54a159a

Integration tests (real backends) now use the public StorageService API.
Unit tests use StorageServiceInner directly.


          ref(service): Rename Cancelled error variant to Dropped

a0ae5b9

jan-auer left a comment

I am considering splitting the public and inner types into two separate modules eventually. For now, they can remain in one.

objectstore-service/src/service.rs

+              /// isolated from panics in backend code — a failure in one operation does not
+              /// bring down other in-flight work. See [`Error::Panic`].
+              #[derive(Clone, Debug)]
+              pub struct StorageService(Arc<StorageServiceInner>);

jan-auer Feb 20, 2026

This was merely moved so that the struct definition and impl block are no longer separated, and a new section on task isolation was added.

objectstore-service/src/service.rs

+                  {
+                      let (tx, rx) = tokio::sync::oneshot::channel();
+                      tokio::spawn(async move {
+                          let result = std::panic::AssertUnwindSafe(f)

jan-auer Feb 20, 2026

We also have a panic handler at the request level. This one lets the request handler catch panics per operation, for example in the batch endpoint.

objectstore-service/src/service.rs

+                  long_term_backend: BoxedBackend,
+              }
+              impl StorageServiceInner {

Member Author

jan-auer Feb 20, 2026

All actual business logic has been moved to StorageServiceInner now. This avoids an indirection via self.0 in the implementation.

objectstore-service/src/service.rs

                           .unwrap();
-                      let (_metadata, stream) = service.get_object(&key).await.unwrap().unwrap();
+                      let (_metadata, stream) = service.get_object(key).await.unwrap().unwrap();

jan-auer Feb 20, 2026

Integration tests use the public API while all lower-level logic tests use the inner service.

lcian reviewed

objectstore-server/src/state.rs

lcian reviewed

objectstore-service/src/service.rs (outdated)

lcian approved these changes

cursor bot reviewed

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

objectstore-service/src/service.rs

jan-auer added 4 commits

February 20, 2026 18:54


          ref: Restore original comment

54653f0


          fix(service): Test run-to-completion through the public API

0b77563


          ref: Shorten assertions

85460aa


          meta: Remove outdated comment

b7eb72a

jan-auer merged commit 4f5938f into main

20 checks passed

jan-auer deleted the ref/task-spawning branch

February 20, 2026 18:52

jan-auer mentioned this pull request

feat(service): Add concurrency limit to StorageService #324

Merged

claude bot added the claude-code-assisted label

jan-auer added a commit that referenced this pull request


          feat(service): Add concurrency limit to StorageService (#324)

033cf40

Builds on [#322](#322)
(task spawning with panic isolation).

Adds a semaphore-based concurrency limit to `StorageService` that caps
the total number of in-flight backend operations. When the limit is
reached, new operations are rejected immediately with HTTP 429 rather
than queueing, preventing backend overload during traffic bursts.

The limit is acquired before spawning each operation task and the permit
is held until the task completes (including after panics). Configured
via `service.max_concurrency` (default: 500, effectively unlimited
without configuration).

Includes tests for rejection at capacity and permit release after
panics.
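The acquire-before-spawn, release-on-completion rule described in #324 can be sketched with an atomic counter and an RAII permit. This is a std-only stand-in, not the commit's code (in async code, tokio's `Semaphore::try_acquire_owned` could fill the same role); `ConcurrencyLimit`, `Permit`, `AtCapacity`, and `spawn_limited` are hypothetical names, and mapping `AtCapacity` to HTTP 429 is left to the caller.

```rust
// Hypothetical std-only sketch of a non-queueing concurrency limit.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Returned when the limit is reached; a server would map this to HTTP 429.
#[derive(Debug, PartialEq)]
pub struct AtCapacity;

/// Caps the number of in-flight operations without queueing.
pub struct ConcurrencyLimit {
    max: usize,
    in_flight: AtomicUsize,
}

/// RAII permit: frees its slot when dropped, including during unwinding.
pub struct Permit(Arc<ConcurrencyLimit>);

impl Drop for Permit {
    fn drop(&mut self) {
        self.0.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl ConcurrencyLimit {
    pub fn new(max: usize) -> Arc<Self> {
        Arc::new(ConcurrencyLimit { max, in_flight: AtomicUsize::new(0) })
    }

    /// Takes a slot if one is free; never waits.
    pub fn try_acquire(self: &Arc<Self>) -> Result<Permit, AtCapacity> {
        let mut current = self.in_flight.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return Err(AtCapacity);
            }
            match self.in_flight.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Ok(Permit(Arc::clone(self))),
                Err(actual) => current = actual,
            }
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

/// Acquires a permit before spawning and moves it into the worker,
/// so the slot stays taken until the work finishes or panics.
pub fn spawn_limited<F>(limit: &Arc<ConcurrencyLimit>, f: F) -> Result<JoinHandle<()>, AtCapacity>
where
    F: FnOnce() + Send + 'static,
{
    let permit = limit.try_acquire()?;
    Ok(thread::spawn(move || {
        // Binding the permit forces the closure to capture it; it drops at the end or on panic.
        let _permit = permit;
        f();
    }))
}
```

Because the permit lives inside the worker and is dropped during unwinding, a panicking operation still releases its slot, and rejection happens before any work is spawned.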

linear bot commented Feb 23, 2026

FS-171 Implement backpressure

jan-auer mentioned this pull request

feat(service): Add stream-level cancellation for inserts #331

Draft
