fix: use CancellationToken instead of JoinHandle for long-running provider task to support cloning (#882)
Walkthrough

The PR refactors polling task management to use CancellationToken-based graceful shutdown instead of JoinHandle abortion. Key structs (CacConfig, ExperimentationConfig, SuperpositionProvider) now derive Clone and become shareable in async contexts. Public re-exports for OpenFeature types are introduced, with integration tests verifying cloned provider functionality.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
```rust
        &Value::from("Rupee"),
        "Currency should be default Rupee"
    );
    println!("  ✓ Test passed\n");
```
@ayushjain17 - Do we want to clone in scope, drop the clone, and then check whether we can still resolve using the original object, or do we just trust the implementation details of CancellationToken?
@Datron had this concern.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/superposition_provider/src/client.rs (2)
407-450: ⚠️ Potential issue | 🔴 Critical

Same pre-existing deadlock in ExperimentationConfig::on_demand_config

Line 413 acquires self.last_updated.read().await and holds it for the lifetime of the function. Line 433 then calls self.last_updated.write().await on the same RwLock in the same task, the same deadlock as in CacConfig::on_demand_config above.

🐛 Proposed fix
```diff
-        let last_updated = self.last_updated.read().await;
-
-        let should_refresh = match *last_updated {
-            Some(last) => (now - last).num_seconds() > ttl as i64,
-            None => true,
-        };
+        let should_refresh = {
+            let last_updated = self.last_updated.read().await;
+            match *last_updated {
+                Some(last) => (now - last).num_seconds() > ttl as i64,
+                None => true,
+            }
+        }; // guard dropped here
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/superposition_provider/src/client.rs` around lines 407 - 450, The function on_demand_config currently holds self.last_updated.read().await across await points which later try to acquire self.last_updated.write().await, causing a deadlock; change the code so you only hold the read lock long enough to compute should_refresh (e.g. create a short-lived scope or explicitly drop(last_updated) after computing should_refresh) before calling join!(self.get_experiments(...), self.get_experiment_groups(...)); keep the subsequent write lock use when updating self.cached_experiments, self.cached_experiment_groups, and setting self.last_updated as-is but ensure those writes happen after the read lock has been released so no read->write lock escalation occurs in on_demand_config.
138-178: ⚠️ Potential issue | 🔴 Critical

Pre-existing deadlock in on_demand_config: read lock held while acquiring write lock

The RwLockReadGuard stored in last_updated is declared outside the inner {} block, so it is not dropped when that block ends; it lives until the end of the function. When should_refresh is true, self.last_updated.write().await is called (Line 155) while the read lock is still held by the same task. tokio::sync::RwLock does not allow a read-to-write upgrade: the writer waits for all readers to release, but this task holds a read lock it will never release → deadlock.

This affects any OnDemandStrategy user once the TTL first expires.

🐛 Proposed fix: drop the guard before the write path
```diff
-        let last_updated;
-        {
-            last_updated = self.last_updated.read().await;
-        }
-        let should_refresh = match *last_updated {
-            Some(last) => (now - last).num_seconds() > ttl as i64,
-            None => true,
-        };
+        let should_refresh = {
+            let last_updated = self.last_updated.read().await;
+            match *last_updated {
+                Some(last) => (now - last).num_seconds() > ttl as i64,
+                None => true,
+            }
+        }; // guard dropped here
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/superposition_provider/src/client.rs` around lines 138 - 178, The read guard for self.last_updated is accidentally held for the entire function causing a deadlock when later calling self.last_updated.write().await; to fix, ensure the RwLockReadGuard is dropped before any write by narrowing its scope or by copying the inner value into a local (e.g., read the Option<DateTime<Utc>> into a local like last_updated_val inside the small block) and then use that local to compute should_refresh; keep the existing logic in on_demand_config (including the get_config call, updating self.cached_config, and writing self.last_updated) but remove any lingering read guard before calling self.last_updated.write().await so no read lock is held during the write.
🧹 Nitpick comments (3)
crates/superposition_provider/Cargo.toml (2)
14-14: aws-smithy-types is only used in tests; move it to [dev-dependencies]

aws-smithy-types is only used in tests/integration_test.rs (for Document construction). Keeping it in [dependencies] bloats the transitive dependency graph for downstream consumers of this library.

♻️ Proposed fix

```diff
-aws-smithy-types = { version = "1.3.0" }
```

Add a [dev-dependencies] section (or extend the existing one):

```toml
[dev-dependencies]
aws-smithy-types = { version = "1.3.0" }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/superposition_provider/Cargo.toml` at line 14, The dependency aws-smithy-types is only used by tests (tests/integration_test.rs) but is declared in [dependencies], increasing downstream bloat; remove the aws-smithy-types = { version = "1.3.0" } entry from the [dependencies] section in Cargo.toml and add it under [dev-dependencies] instead (or create a [dev-dependencies] section if missing) so the crate is only pulled in for test builds.
25-25: Avoid features = ["full"] in a library crate

features = ["full"] enables ~15 tokio features (multi-thread scheduler, net, signal, fs, process, …) that this library doesn't use, unnecessarily inflating compile times and the binary footprint of every downstream consumer. Library crates should declare only the features they actually need.

♻️ Proposed minimal feature set

```diff
-tokio = { version = "1.0", features = ["full"] }
+tokio = { version = "1.0", features = ["rt", "sync", "time", "macros"] }
```

Adjust based on the actual tokio APIs used (e.g., add "rt-multi-thread" if #[tokio::test]/#[tokio::main] with multi-thread is needed in tests/examples).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/superposition_provider/Cargo.toml` at line 25, The tokio dependency currently enables the broad feature set via the line tokio = { version = "1.0", features = ["full"] } in Cargo.toml; replace this with a minimal feature list that matches only the tokio APIs your crate actually uses (e.g., "rt" or "rt-multi-thread" for the runtime in tests/examples, "macros" if using #[tokio::test]/#[tokio::main], plus "sync", "net", "time", etc. as required). Edit the tokio entry in Cargo.toml to remove "full" and enumerate the exact features needed by this crate and its tests/examples to reduce compile time and downstream footprint.

crates/superposition_provider/src/client.rs (1)
5-8: Consider moving these re-exports to lib.rs instead of client.rs

Placing pub use open_feature::{...} inside an implementation module leaks the location of these symbols (crate::client::ProviderStatus, etc.) in diagnostic output and makes the crate's public API harder to navigate. Re-exports meant to be part of the crate's surface should live in lib.rs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/superposition_provider/src/client.rs` around lines 5 - 8, Move the public re-exports of ProviderMetadata, ProviderStatus, ResolutionDetails and EvaluationContext out of client.rs and into your crate root (lib.rs) so they are exposed as crate-level symbols instead of crate::client::...; remove the pub use open_feature::{ provider::{ProviderMetadata, ProviderStatus, ResolutionDetails}, EvaluationContext,}; from client.rs and add an equivalent pub use ... block in lib.rs to make these types part of the crate's public surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/superposition_provider/src/client.rs`:
- Around line 569-582: In close(), after clearing cached_experiments, also clear
cached_experiment_groups to avoid leaving stale data: acquire a write lock on
self.cached_experiment_groups (similar to how self.cached_experiments is
handled) and set it to None so that get_cached_experiment_groups() and
get_applicable_variants() see a consistent cleared state; ensure you follow the
same async locking pattern as used for polling_task_cancellation_token and
cached_experiments within the close() method.
In `@crates/superposition_provider/src/provider.rs`:
- Around line 22-28: The struct currently derives Clone causing all
SuperpositionProvider clones to share the same Arc<RwLock<ProviderStatus>>, so
calling initialize() on any clone mutates the shared status; to make status
independent implement Clone manually for SuperpositionProvider: remove or
override the derived Clone and write impl Clone for SuperpositionProvider that
clones metadata, cac_config, and exp_config but creates a new status with
Arc::new(RwLock::new(ProviderStatus::NotReady)) (i.e., fresh RwLock per clone)
so each clone has its own readiness state when initialize() runs.
In `@crates/superposition_provider/tests/integration_test.rs`:
- Around line 587-591: The assertion compares a float Value::from(1.0) against a
JSON integer created earlier with Document::from(1), which will fail because
serde_json Number variants are sensitive; change the expected value to match the
integer variant (e.g., use Value::from(1) or json!(1)) so the assertion in the
test comparing all_fields.get("price") with the expected Value uses the same
numeric variant as the override (update the assertion that currently references
Value::from(1.0) to Value::from(1)).
```diff
 pub async fn close(&self) -> Result<()> {
     // Stop polling task
-    let mut polling_task = self.polling_task.write().await;
-    if let Some(task) = polling_task.take() {
-        task.abort();
+    let mut polling_task_cancellation_token =
+        self.polling_task_cancellation_token.write().await;
+    if let Some(token) = polling_task_cancellation_token.take() {
+        token.cancel();
     }

     // Clear caches
     let mut cached_experiments = self.cached_experiments.write().await;
     *cached_experiments = None;
     let mut evaluation_cache = self.evaluation_cache.write().await;
     evaluation_cache.clear();

     Ok(())
 }
```
close() leaves cached_experiment_groups populated — inconsistent cleanup
cached_experiments is set to None (Line 578–579) but cached_experiment_groups is never cleared. After close(), any reader calling get_cached_experiment_groups() will still receive stale data, and get_applicable_variants() will see a None/Some mismatch that routes to the error branch.
🐛 Proposed fix
```diff
     // Clear caches
     let mut cached_experiments = self.cached_experiments.write().await;
     *cached_experiments = None;
+    let mut cached_experiment_groups = self.cached_experiment_groups.write().await;
+    *cached_experiment_groups = None;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
pub async fn close(&self) -> Result<()> {
    // Stop polling task
    let mut polling_task_cancellation_token =
        self.polling_task_cancellation_token.write().await;
    if let Some(token) = polling_task_cancellation_token.take() {
        token.cancel();
    }
    // Clear caches
    let mut cached_experiments = self.cached_experiments.write().await;
    *cached_experiments = None;
    let mut cached_experiment_groups = self.cached_experiment_groups.write().await;
    *cached_experiment_groups = None;
    Ok(())
}
```
@coderabbitai ignore here
supposed to be fixed in #881
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if PR #881 exists and get its details
gh pr view 881 --json title,url,state,body
```

Length of output: 1123
@ayushjain17 Understood! I can see that PR #881 is open and will handle the provider implementation changes. I'll skip this comment for now since it'll be addressed there.
Problem

tokio's JoinHandle is not cloneable.

Solution

Use CancellationToken for closing the polling task, which is cloneable.

Summary by CodeRabbit
Chores
Refactor
Tests