Skip to content

Add TestInstanceOperationEvacuationCancel with independent tests#153

Merged
proud-parselmouth merged 3 commits into
devfrom
anubagar/TestInstanceOperationEvacuationCancel
Apr 16, 2026
Merged

Add TestInstanceOperationEvacuationCancel with independent tests#153
proud-parselmouth merged 3 commits into
devfrom
anubagar/TestInstanceOperationEvacuationCancel

Conversation

@proud-parselmouth
Copy link
Copy Markdown
Collaborator

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

This PR is part of the TestInstanceOperation dependency chain reduction effort. See TestInstanceOperation Dependency Chain Reduction for full analysis. Depends on #152

Description

  • Here are some details about my PR, including screenshots of any UI changes:

What: Extracts 3 evacuation cancellation tests from the monolithic TestInstanceOperation into a new independent test class TestInstanceOperationEvacuationCancel, extending TestInstanceOperationBase.

How:

  • Created TestInstanceOperationEvacuationCancel.java with 3 tests extracted from TestInstanceOperation:
    • testEvacuateAndCancelBeforeBootstrapFinish — set EVACUATE on an instance, cancel it before bootstrap transitions finish, verify the instance remains in the cluster and partitions rebalance back
    • testEvacuateAndCancelBeforeDropFinish — set EVACUATE on an instance with delayed state model, cancel it before drop transitions finish, verify the instance remains and partitions recover
    • testMarkEvacuationAfterEMM — enter maintenance mode, mark an instance for evacuation, exit maintenance mode, verify evacuation completes and the instance is ready for decommission
  • Each test explicitly calls enabledTopologyAwareRebalance() at its start to be fully self-contained (the base class @BeforeMethod disables it by default).
  • Removed all dependsOnMethods chains — every test runs independently.
  • For testEvacuateAndCancelBeforeDropFinish, explicitly adds two extra MasterSlave resources (TEST_DB3_DELAYED_CRUSHED, TEST_DB4_DELAYED_WAGED) to match the original test environment and ensure evacuation has enough work to properly test delayed drop transitions.
  • For testMarkEvacuationAfterEMM, wraps the isReadyForPreparingJoiningCluster check in a verifier() call with increased timeout (60s) to allow delayed state transitions to complete.
  • TestInstanceOperation.java is not modified in this PR.

Tests

  • The following tests are written for this issue:
  1. TestInstanceOperationEvacuationCancel.testEvacuateAndCancelBeforeBootstrapFinish
  2. TestInstanceOperationEvacuationCancel.testEvacuateAndCancelBeforeDropFinish
  3. TestInstanceOperationEvacuationCancel.testMarkEvacuationAfterEMM
  • The following is the result of the mvn test command on the appropriate module:
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 220.87 s - in org.apache.helix.integration.rebalancer.TestInstanceOperationEvacuationCancel [INFO] [INFO] Results: [INFO] [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] ------------------------------------------------------------------------ [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------------------------ [INFO] Total time: 03:47 min [INFO] Finished at: 2026-03-27T08:42:19+05:30 [INFO] ------------------------------------------------------------------------

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

_gSetupTool.getClusterManagementTool()
.manuallyEnableMaintenanceMode(CLUSTER_NAME, true, null, null);
String newParticipantName = PARTICIPANT_PREFIX + "_" + _nextStartPort.get();
addParticipant(newParticipantName);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new Participants needs to be cleaned up. The base class before method just restores it if particiapnt are less but never trims them so this test can cause flakiness for others

// 5. Restore participant count to START_NUM_NODE
    while (_participants.size() < START_NUM_NODE) {
      addParticipant(PARTICIPANT_PREFIX + "_" + _nextStartPort.get());
    }

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the fix at the end of method to syncstop the participant.
Before method removes all offline or inactive participants.

Copy link
Copy Markdown
Collaborator

@ngngwr ngngwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment. Rest looks good to me.

@proud-parselmouth proud-parselmouth force-pushed the anubagar/TestInstanceOperationSwapDisabledPartitions branch from 8445cd8 to 4712b3b Compare April 15, 2026 15:25
Base automatically changed from anubagar/TestInstanceOperationSwapDisabledPartitions to dev April 16, 2026 06:49
Extract 3 evacuation-cancel tests from TestInstanceOperation into
TestInstanceOperationEvacuationCancel extending TestInstanceOperationBase:
- testEvacuateAndCancelBeforeBootstrapFinish (cancel during slow bootstrap)
- testEvacuateAndCancelBeforeDropFinish (cancel during slow drop)
- testMarkEvacuationAfterEMM (evacuate during/after maintenance mode)

All tests run independently with no dependsOnMethods chains. Tests call
enabledTopologyAwareRebalance() explicitly. Both bootstrap and drop tests
create extra MasterSlave resources to match the original test environment.
testMarkEvacuationAfterEMM uses a 60s verifier for isReadyForPreparing
JoiningCluster since delayed drop transitions need time to complete.
TestInstanceOperation is not modified.

Made-with: Cursor
Extract duplicated resource creation into createAdditionalMasterSlaveResources()
helper method. Add comments explaining _stateModelDelay sign convention
(negative = slow upward transitions, positive = slow downward transitions)
and why cluster convergence succeeds only because state transition
cancellation is triggered by the ENABLE operation.

Made-with: Cursor
@proud-parselmouth proud-parselmouth force-pushed the anubagar/TestInstanceOperationEvacuationCancel branch from 19b1e52 to deb4797 Compare April 16, 2026 06:52
Copy link
Copy Markdown
Collaborator

@ngngwr ngngwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@proud-parselmouth proud-parselmouth merged commit 5400cb0 into dev Apr 16, 2026
1 of 2 checks passed
@proud-parselmouth proud-parselmouth deleted the anubagar/TestInstanceOperationEvacuationCancel branch April 16, 2026 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants