
fix: use CancellationToken instead of JoinHandle for long running provider task to support cloning #882

Merged

ayushjain17 merged 3 commits into main from fix/provider-cloning on Feb 18, 2026

Conversation

@ayushjain17 (Collaborator) commented Feb 18, 2026

Problem

tokio's JoinHandle is not cloneable, so a struct that holds one cannot derive Clone.

Solution

Use a CancellationToken, which is cloneable, to stop the polling task instead of aborting it through the JoinHandle.

Summary by CodeRabbit

  • Chores

    • Updated project dependencies and aligned with workspace configurations.
  • Refactor

    • Improved internal polling task management for more efficient resource cleanup.
    • Enhanced error handling in configuration polling operations.
  • Tests

    • Added integration test to verify provider behavior consistency.

@ayushjain17 ayushjain17 requested a review from a team as a code owner February 18, 2026 07:40
semanticdiff-com bot commented Feb 18, 2026

Review changes with SemanticDiff

Changed Files

File                                                     Status
crates/superposition_provider/src/client.rs              66% smaller
crates/superposition_provider/src/provider.rs            50% smaller
crates/superposition_provider/tests/integration_test.rs  7% smaller
Cargo.lock                                               Unsupported file format
crates/superposition_provider/Cargo.toml                 Unsupported file format

coderabbitai bot (Contributor) commented Feb 18, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The PR refactors polling task management to use CancellationToken-based graceful shutdown instead of JoinHandle abortion. Key structs (CacConfig, ExperimentationConfig, SuperpositionProvider) now derive Clone and become shareable in async contexts. Public re-exports for OpenFeature types are introduced, with integration tests verifying cloned provider functionality.

Changes

Cohort / File(s) Summary
Dependency Management
crates/superposition_provider/Cargo.toml
Reorganized dependencies by adding aws-smithy-types, chrono, tokio-util, uuid and adjusting workspace-level references. Removed async-trait and SDK scaffolding blocks.
Polling Task Lifecycle Refactoring
crates/superposition_provider/src/client.rs
Replaced JoinHandle-based polling task abortion with CancellationToken-based graceful cancellation. Modified start_polling to return CancellationToken instead of JoinHandle. Updated close() methods across CacConfig and ExperimentationConfig to cancel tokens. Added Clone derives to config structs and introduced tokio::select! for cancellation handling.
Public Type Re-exports & Cloneable Provider
crates/superposition_provider/src/provider.rs
Wrapped status field in Arc to enable safe sharing across clones. Added Clone derive to SuperpositionProvider struct. Updated initialization to use Arc::new(RwLock::new(...)).
Provider Cloning Integration Test
crates/superposition_provider/tests/integration_test.rs
Added new integration test verifying that cloned SuperpositionProvider instances preserve state and behavior through resolve_full_config calls.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • knutties

Poem

🐰 Hops with delight
Tokens now cancel with grace, no more joins to abort,
Clones share the burden, distributed across async courts,
Configuration flows smoothly through Arc-wrapped gates,
A refactored dream where cancellation awaits! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: replacing JoinHandle with CancellationToken to enable provider cloning, which is the core objective addressed across the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@knutties changed the title from "fix: Support provider cloning in rust" to "fix: use CancellationToken instead of JoinHandle for long running provider task to support cloning" on Feb 18, 2026
&Value::from("Rupee"),
"Currency should be default Rupee"
);
println!(" ✓ Test passed\n");
@knutties (Collaborator) commented Feb 18, 2026

@ayushjain17 - Do we want to clone within a scope, drop the clone, and then check whether we can still resolve using the original object, or are we just testing the implementation details of CancellationToken?

@Datron had this concern.

coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/superposition_provider/src/client.rs (2)

407-450: ⚠️ Potential issue | 🔴 Critical

Same pre-existing deadlock in ExperimentationConfig::on_demand_config

Line 413 acquires self.last_updated.read().await and holds it for the lifetime of the function. Line 433 then calls self.last_updated.write().await on the same RwLock in the same task — same deadlock as in CacConfig::on_demand_config above.

🐛 Proposed fix
-    let last_updated = self.last_updated.read().await;
-
-    let should_refresh = match *last_updated {
-        Some(last) => (now - last).num_seconds() > ttl as i64,
-        None => true,
-    };
+    let should_refresh = {
+        let last_updated = self.last_updated.read().await;
+        match *last_updated {
+            Some(last) => (now - last).num_seconds() > ttl as i64,
+            None => true,
+        }
+    }; // guard dropped here
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/superposition_provider/src/client.rs` around lines 407 - 450, The
function on_demand_config currently holds self.last_updated.read().await across
await points which later try to acquire self.last_updated.write().await, causing
a deadlock; change the code so you only hold the read lock long enough to
compute should_refresh (e.g. create a short-lived scope or explicitly
drop(last_updated) after computing should_refresh) before calling
join!(self.get_experiments(...), self.get_experiment_groups(...)); keep the
subsequent write lock use when updating self.cached_experiments,
self.cached_experiment_groups, and setting self.last_updated as-is but ensure
those writes happen after the read lock has been released so no read->write lock
escalation occurs in on_demand_config.

138-178: ⚠️ Potential issue | 🔴 Critical

Pre-existing deadlock in on_demand_config: read lock held while acquiring write lock

The RwLockReadGuard stored in last_updated is declared outside the inner {} block, so it is not dropped when that block ends — it lives until the end of the function. When should_refresh is true, self.last_updated.write().await is called (Line 155) while the read lock is still held by the same task. tokio::sync::RwLock does not allow a read-to-write upgrade: the writer waits for all readers to release, but this task holds a read lock it will never release → deadlock.

This affects any OnDemandStrategy user once the TTL first expires.

🐛 Proposed fix — drop the guard before the write path
-    let last_updated;
-    {
-        last_updated = self.last_updated.read().await;
-    }
-    let should_refresh = match *last_updated {
-        Some(last) => (now - last).num_seconds() > ttl as i64,
-        None => true,
-    };
+    let should_refresh = {
+        let last_updated = self.last_updated.read().await;
+        match *last_updated {
+            Some(last) => (now - last).num_seconds() > ttl as i64,
+            None => true,
+        }
+    }; // guard dropped here
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/superposition_provider/src/client.rs` around lines 138 - 178, The read
guard for self.last_updated is accidentally held for the entire function causing
a deadlock when later calling self.last_updated.write().await; to fix, ensure
the RwLockReadGuard is dropped before any write by narrowing its scope or by
copying the inner value into a local (e.g., read the Option<DateTime<Utc>> into
a local like last_updated_val inside the small block) and then use that local to
compute should_refresh; keep the existing logic in on_demand_config (including
the get_config call, updating self.cached_config, and writing self.last_updated)
but remove any lingering read guard before calling
self.last_updated.write().await so no read lock is held during the write.
🧹 Nitpick comments (3)
crates/superposition_provider/Cargo.toml (2)

14-14: aws-smithy-types is only used in tests — move it to [dev-dependencies]

aws-smithy-types is only used in tests/integration_test.rs (for Document construction). Keeping it in [dependencies] bloats the transitive dependency graph for downstream consumers of this library.

♻️ Proposed fix
-aws-smithy-types = { version = "1.3.0" }

Add a [dev-dependencies] section (or extend the existing one):

[dev-dependencies]
aws-smithy-types = { version = "1.3.0" }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/superposition_provider/Cargo.toml` at line 14, The dependency
aws-smithy-types is only used by tests (tests/integration_test.rs) but is
declared in [dependencies], increasing downstream bloat; remove the
aws-smithy-types = { version = "1.3.0" } entry from the [dependencies] section
in Cargo.toml and add it under [dev-dependencies] instead (or create a
[dev-dependencies] section if missing) so the crate is only pulled in for test
builds.

25-25: Avoid features = ["full"] in a library crate

features = ["full"] enables ~15 tokio features (multi-thread scheduler, net, signal, fs, process, …) that this library doesn't use, unnecessarily inflating compile times and the binary footprint of every downstream consumer. Library crates should declare only the features they actually need.

♻️ Proposed minimal feature set
-tokio = { version = "1.0", features = ["full"] }
+tokio = { version = "1.0", features = ["rt", "sync", "time", "macros"] }

Adjust based on the actual tokio APIs used (e.g., add "rt-multi-thread" if #[tokio::test] / #[tokio::main] with multi-thread is needed in tests/examples).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/superposition_provider/Cargo.toml` at line 25, The tokio dependency
currently enables the broad featureset via the line tokio = { version = "1.0",
features = ["full"] } in Cargo.toml; replace this with a minimal feature list
that matches only the tokio APIs your crate actually uses (e.g., "rt" or
"rt-multi-thread" for runtime in tests/examples, "macros" if using
#[tokio::test]/#[tokio::main], plus "sync", "net", "time", etc. as required).
Edit the tokio entry in Cargo.toml to remove "full" and enumerate the exact
features needed by this crate and its tests/examples to reduce compile time and
downstream footprint.
crates/superposition_provider/src/client.rs (1)

5-8: Consider moving these re-exports to lib.rs instead of client.rs

Placing pub use open_feature::{...} inside an implementation module leaks the location of these symbols (crate::client::ProviderStatus, etc.) in diagnostic output and makes the crate's public API harder to navigate. Re-exports meant to be part of the crate's surface should live in lib.rs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/superposition_provider/src/client.rs` around lines 5 - 8, Move the
public re-exports of ProviderMetadata, ProviderStatus, ResolutionDetails and
EvaluationContext out of client.rs and into your crate root (lib.rs) so they are
exposed as crate-level symbols instead of crate::client::...; remove the pub use
open_feature::{ provider::{ProviderMetadata, ProviderStatus, ResolutionDetails},
EvaluationContext,}; from client.rs and add an equivalent pub use ... block in
lib.rs to make these types part of the crate's public surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/superposition_provider/src/client.rs`:
- Around line 569-582: In close(), after clearing cached_experiments, also clear
cached_experiment_groups to avoid leaving stale data: acquire a write lock on
self.cached_experiment_groups (similar to how self.cached_experiments is
handled) and set it to None so that get_cached_experiment_groups() and
get_applicable_variants() see a consistent cleared state; ensure you follow the
same async locking pattern as used for polling_task_cancellation_token and
cached_experiments within the close() method.

In `@crates/superposition_provider/src/provider.rs`:
- Around line 22-28: The struct currently derives Clone causing all
SuperpositionProvider clones to share the same Arc<RwLock<ProviderStatus>>, so
calling initialize() on any clone mutates the shared status; to make status
independent implement Clone manually for SuperpositionProvider: remove or
override the derived Clone and write impl Clone for SuperpositionProvider that
clones metadata, cac_config, and exp_config but creates a new status with
Arc::new(RwLock::new(ProviderStatus::NotReady)) (i.e., fresh RwLock per clone)
so each clone has its own readiness state when initialize() runs.

In `@crates/superposition_provider/tests/integration_test.rs`:
- Around line 587-591: The assertion compares a float Value::from(1.0) against a
JSON integer created earlier with Document::from(1), which will fail because
serde_json Number variants are sensitive; change the expected value to match the
integer variant (e.g., use Value::from(1) or json!(1)) so the assertion in the
test comparing all_fields.get("price") with the expected Value uses the same
numeric variant as the override (update the assertion that currently references
Value::from(1.0) to Value::from(1)).


Comment on lines 569 to 582

     pub async fn close(&self) -> Result<()> {
         // Stop polling task
-        let mut polling_task = self.polling_task.write().await;
-        if let Some(task) = polling_task.take() {
-            task.abort();
+        let mut polling_task_cancellation_token =
+            self.polling_task_cancellation_token.write().await;
+        if let Some(token) = polling_task_cancellation_token.take() {
+            token.cancel();
         }

         // Clear caches
         let mut cached_experiments = self.cached_experiments.write().await;
         *cached_experiments = None;
         let mut evaluation_cache = self.evaluation_cache.write().await;
         evaluation_cache.clear();

         Ok(())
     }
coderabbitai bot commented Feb 18, 2026
⚠️ Potential issue | 🟡 Minor

close() leaves cached_experiment_groups populated — inconsistent cleanup

cached_experiments is set to None (Line 578–579) but cached_experiment_groups is never cleared. After close(), any reader calling get_cached_experiment_groups() will still receive stale data, and get_applicable_variants() will see a None/Some mismatch that routes to the error branch.

🐛 Proposed fix
     // Clear caches
     let mut cached_experiments = self.cached_experiments.write().await;
     *cached_experiments = None;
+    let mut cached_experiment_groups = self.cached_experiment_groups.write().await;
+    *cached_experiment_groups = None;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    pub async fn close(&self) -> Result<()> {
        // Stop polling task
        let mut polling_task_cancellation_token =
            self.polling_task_cancellation_token.write().await;
        if let Some(token) = polling_task_cancellation_token.take() {
            token.cancel();
        }

        // Clear caches
        let mut cached_experiments = self.cached_experiments.write().await;
        *cached_experiments = None;
        let mut cached_experiment_groups = self.cached_experiment_groups.write().await;
        *cached_experiment_groups = None;

        Ok(())
    }

@ayushjain17 (Collaborator, Author) replied:
@coderabbitai ignore here
supposed to be fixed in #881

coderabbitai bot replied:
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if PR `#881` exists and get its details
gh pr view 881 --json title,url,state,body

Length of output: 1123


@ayushjain17 Understood! I can see that PR #881 is open and will handle the provider implementation changes. I'll skip this comment for now since it'll be addressed there.

@ayushjain17 ayushjain17 added this pull request to the merge queue Feb 18, 2026
Merged via the queue into main with commit 93ce1b3 Feb 18, 2026
27 of 28 checks passed
@ayushjain17 ayushjain17 deleted the fix/provider-cloning branch February 18, 2026 13:19