Conversation

@thlorenz
Collaborator

@thlorenz thlorenz commented Oct 28, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved account cleanup to include accounts stuck in intermediate or pending states so eligible accounts are no longer skipped during removal, increasing reliability.
  • Tests

    • Adjusted integration timing to ensure subscriptions reflect consecutive airdrops and processing delays for consistent balance updates.
    • Increased retry limit in deployment tests to reduce transient failures and improve test stability.

This PR fixes an edge case where accounts in the process of undelegation can get stuck if the validator misses the update of the undelegation completion.
The fix ensures that such accounts are properly cleaned up on restart.

Details

The issue occurred when an account was being undelegated but the process did not complete while the validator was active, or the validator missed the completion update.

@github-actions

github-actions bot commented Oct 28, 2025

Manual Deploy Available

You can trigger a manual deploy of this PR branch to testnet:

Deploy to Testnet 🚀

Alternative: Comment /deploy on this PR to trigger deployment directly.

⚠️ Note: Manual deploy requires authorization. Only authorized users can trigger deployments.

Comment updated automatically when the PR is synchronized.

@coderabbitai
Contributor

coderabbitai bot commented Oct 28, 2025

Walkthrough

Extended account-removal predicate to also allow removing accounts owned by dlp::id() even when not marked delegated; adjusted an integration test to remove the initial airdrop and instead perform two successive airdrops with sleeps before asserting lamports; increased retry limit in a cloning test from 20 to 50.
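
The extended removal predicate described above can be sketched roughly as follows. This is a minimal illustration, not the code from `reset_accounts_bank`: the `Pubkey` alias, the `Account` struct, the `DLP_ID` constant (standing in for `dlp::id()`), and the `should_remove` helper are all hypothetical, and the sketch assumes the blacklist holds accounts that must never be removed.

```rust
use std::collections::HashSet;

/// Stand-in for a 32-byte public key (hypothetical; not the real key type).
type Pubkey = [u8; 32];

/// Hypothetical placeholder for the value returned by `dlp::id()`.
const DLP_ID: Pubkey = [7u8; 32];

/// Minimal account model with only the two fields the predicate reads.
struct Account {
    owner: Pubkey,
    delegated: bool,
}

/// Returns true when the account should be dropped from the bank on reset.
fn should_remove(pubkey: &Pubkey, account: &Account, blacklist: &HashSet<Pubkey>) -> bool {
    // Assumption: blacklisted accounts are always kept.
    if blacklist.contains(pubkey) {
        return false;
    }
    // Not delegated: removable, as before the change.
    // Owned by the delegation program: removable even if the delegation flag
    // is still set, which covers undelegations that never completed.
    !account.delegated || account.owner == DLP_ID
}
```

Under these assumptions, an account with `delegated: true` and `owner: DLP_ID` is now removed on reset, while a delegated account owned by any other program is still kept.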

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Account removal filter update | magicblock-chainlink/src/chainlink/mod.rs | Added ReadableAccount import and extended reset_accounts_bank removal filter to treat accounts owned by dlp::id() as eligible for removal regardless of the delegation flag, while keeping blacklist checks. |
| Integration test timing change | test-integration/test-chainlink/tests/ix_remote_account_provider.rs | Removed the initial airdrop before first fetch; test now fetches the account, issues two successive airdrops with sleeps to allow subscription updates, and then asserts the updated lamports. |
| Retry limit increase in cloning test | test-integration/test-cloning/tests/01_program-deploy.rs | Increased MAX_RETRIES from 20 to 50 in two locations; added parentheses in a bump arithmetic expression (no behavioral change). |

Sequence Diagram(s)

(Skipped — changes are localized and do not introduce new cross-component control flow requiring a sequence diagram.)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Review reset_accounts_bank predicate for any unintended removals and ensure blacklist logic remains intact.
  • Run integration tests that rely on subscriptions/timing to confirm sleeps and successive airdrops are reliable.
  • Check the increased MAX_RETRIES impact on test duration and whether the change masks flakiness; verify the arithmetic parenthesis change is harmless.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "fix: resolve stuck accounts in process of undelegation" is directly aligned with the primary change in the codebase. The main modification in magicblock-chainlink/src/chainlink/mod.rs extends the removal filter in reset_accounts_bank to properly handle accounts owned by dlp::id() that are in the process of undelegation but not yet completed, which is exactly what the title describes. The title is concise, specific, and uses the conventional commit format with a clear "fix:" prefix. It accurately conveys the core purpose of the changeset and provides sufficient context for a developer scanning the repository history to understand the primary issue being addressed. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
Contributor

@bmuddha bmuddha left a comment

LGTM

Collaborator

@GabrielePicco GabrielePicco left a comment

LGTM!

(!account.delegated()
// This fixes the edge-case of accounts that were in the process of
// being undelegated but never completed while the validator was running
|| account.owner().eq(&dlp::id()))
Contributor

Would this be fixed if we set the delegation flag to false in magic-program as soon as undelegation is requested? @thlorenz @GabrielePicco

Collaborator Author

That would be an improper fix, since at that point we don't yet know whether the account was properly undelegated.
There is a reason for this two-step process.

We will provide a more proper fix soon; this is just an unblocker.

Contributor

@taco-paco taco-paco Oct 28, 2025

Why is it improper? In the context of the ER it was undelegated. Everything else is a matter of properly committing it. We should not bring network or Intent service related issues into Account state.
In the context of the ER, an account is either delegated or not.

Collaborator

I think setting the delegation flag to false is necessary but not sufficient to properly handle all edge cases. We also need additional context to determine when an account is awaiting_undelegation, so we can correctly handle transactions that attempt to use that account while it is in the process of being undelegated versus after it has already been undelegated. In the first case, transactions in the ER should fail. In the second case, the cloning pipeline should "re-fetch" the account.
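
One way to picture the distinction raised in this thread is a three-state model instead of a single boolean flag. The sketch below is purely illustrative: `DelegationState`, `TxHandling`, and `handle_tx` are hypothetical names that do not come from the codebase, and the assumption that a fully delegated account simply executes is implied by the discussion rather than stated in it.

```rust
/// Hypothetical three-state model of an account's delegation status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DelegationState {
    /// Account is delegated to the ER.
    Delegated,
    /// Undelegation was requested but is not yet confirmed as complete.
    AwaitingUndelegation,
    /// Undelegation has completed.
    Undelegated,
}

/// Hypothetical outcomes for a transaction that touches the account.
#[derive(Debug, PartialEq, Eq)]
enum TxHandling {
    Execute,
    Reject,
    RefetchAccount,
}

/// Maps the account state to how an ER transaction using it would be treated.
fn handle_tx(state: DelegationState) -> TxHandling {
    match state {
        // Assumption: delegated accounts are usable as normal.
        DelegationState::Delegated => TxHandling::Execute,
        // While undelegation is in flight, transactions in the ER should fail.
        DelegationState::AwaitingUndelegation => TxHandling::Reject,
        // After undelegation, the cloning pipeline should re-fetch the account.
        DelegationState::Undelegated => TxHandling::RefetchAccount,
    }
}
```

A plain boolean cannot separate the last two cases, which is the gap the comment above points at.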

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da97db6 and 4df513e.

📒 Files selected for processing (1)
  • test-integration/test-chainlink/tests/ix_remote_account_provider.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
test-integration/test-chainlink/tests/ix_remote_account_provider.rs (1)
test-integration/test-tools/src/integration_test_context.rs (1)
  • airdrop (609-634)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: run_make_ci_test
  • GitHub Check: run_make_ci_test
🔇 Additional comments (2)
test-integration/test-chainlink/tests/ix_remote_account_provider.rs (2)

110-117: Consider timing flakiness with fixed sleep durations.

The test relies on fixed sleep durations (500ms and 1500ms) to allow subscription updates to propagate. These hardcoded values could lead to flaky test behavior on slower systems or under load.

Consider whether these durations are sufficient for all environments, or if there's a more robust way to verify subscription updates have been processed (e.g., polling with a timeout, or waiting for specific events).

Have you observed this test running reliably across different environments (CI, local development, various system loads)?
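
As a rough illustration of the "polling with a timeout" alternative mentioned above, a helper along these lines could replace a fixed sleep. This is a sketch under assumptions: `poll_until` is a hypothetical name, not an existing helper in the test tools, and it uses blocking `std::thread::sleep` for self-containment, whereas an async test would need its runtime's non-blocking sleep instead.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `condition` every `interval` until it returns true or `timeout`
/// elapses. Returns true if the condition was met in time.
fn poll_until(
    timeout: Duration,
    interval: Duration,
    mut condition: impl FnMut() -> bool,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so an already-satisfied condition returns immediately.
        if condition() {
            return true;
        }
        // Give up once the deadline has passed.
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

The test would then pass a closure that checks whether the subscribed account shows the expected lamports, returning as soon as the update arrives instead of always waiting the full sleep duration.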


94-128: Clarify connection to PR objective.

The PR objective describes fixing stuck accounts during undelegation, and the related changes in magicblock-chainlink/src/chainlink/mod.rs update the removal filter for accounts owned by dlp::id(). However, this test modification focuses on airdrop subscription updates rather than undelegation scenarios.

Could you clarify how these test changes relate to the undelegation fix? Is this testing a side effect of the chainlink changes, or validating that the subscription system properly handles state changes that could affect undelegation processing?

Adding a comment in the test explaining this connection would help reviewers and future maintainers understand the broader context.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4df513e and 8e9f2f2.

📒 Files selected for processing (1)
  • test-integration/test-cloning/tests/01_program-deploy.rs (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: run_make_ci_lint
  • GitHub Check: run_make_ci_test
  • GitHub Check: run_make_ci_test
  • GitHub Check: run_make_ci_format
🔇 Additional comments (1)
test-integration/test-cloning/tests/01_program-deploy.rs (1)

181-181: Verify the rationale for increasing MAX_RETRIES.

The retry limit increased from 20 to 50 (2.5x). While this may address timing changes from the stuck account fix, please verify that this change is intentional and not masking a deeper issue with upgrade propagation timing.

Consider documenting why the increase was necessary, especially if the stuck account fix impacts upgrade timing characteristics.

ctx.wait_for_delta_slot_ephem(5).unwrap();

- let bump = remaining_retries - MAX_RETRIES + 1;
+ let bump = (remaining_retries - MAX_RETRIES) + 1;
Contributor

⚠️ Potential issue | 🔴 Critical

Fix the incorrect bump calculation formula.

The current formula produces zero and then negative values after the first iteration:

  • Iteration 1: remaining_retries=50, bump = (50-50)+1 = 1
  • Iteration 2: remaining_retries=49, bump = (49-50)+1 = 0
  • Iteration 3: remaining_retries=48, bump = (48-50)+1 = -1

This results in messages like "Hola Mundo 0", "Hola Mundo -1", which is incorrect for a retry counter.

Apply this diff to fix the formula:

-            let bump = (remaining_retries - MAX_RETRIES) + 1;
+            let bump = (MAX_RETRIES - remaining_retries) + 1;

This will produce incrementing values: 1, 2, 3, 4, ... as expected for a retry counter.
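
The arithmetic in this comment can be checked with a small sketch. It rests on two assumptions the excerpt does not confirm: that `remaining_retries` starts at `MAX_RETRIES` and decreases by one per iteration, and that the values are signed integers. The helper names are hypothetical. With unsigned integers, the as-written subtraction would overflow (a panic in debug builds) rather than produce a negative number.

```rust
/// Formula as written in the diff: (remaining_retries - MAX_RETRIES) + 1.
fn bump_as_written(remaining_retries: i64, max_retries: i64) -> i64 {
    (remaining_retries - max_retries) + 1
}

/// Formula proposed in the review: (MAX_RETRIES - remaining_retries) + 1.
fn bump_as_suggested(remaining_retries: i64, max_retries: i64) -> i64 {
    (max_retries - remaining_retries) + 1
}

/// Collects the bump for the first `iterations` iterations, assuming
/// `remaining_retries` starts at `max_retries` and drops by one each time.
fn bumps(f: fn(i64, i64) -> i64, max_retries: i64, iterations: i64) -> Vec<i64> {
    (0..iterations)
        .map(|i| f(max_retries - i, max_retries))
        .collect()
}
```

Calling `bumps(bump_as_written, 50, 3)` reproduces the sequence listed in the comment, and `bumps(bump_as_suggested, 50, 3)` yields the incrementing sequence.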

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- let bump = (remaining_retries - MAX_RETRIES) + 1;
+ let bump = (MAX_RETRIES - remaining_retries) + 1;
🤖 Prompt for AI Agents
In test-integration/test-cloning/tests/01_program-deploy.rs around line 186, the
bump calculation uses (remaining_retries - MAX_RETRIES) + 1 which yields
negative or zero values after the first iteration; replace that formula with
(MAX_RETRIES - remaining_retries) + 1 so bump increments 1,2,3,... as retries
decrease.

@thlorenz thlorenz merged commit 9e2d817 into master Oct 28, 2025
7 checks passed
@thlorenz thlorenz deleted the thlorenz/hot-fix-stuck-accounts branch October 28, 2025 15:51
Dodecahedr0x pushed a commit that referenced this pull request Nov 18, 2025
5 participants