atakavci
Contributor

@atakavci atakavci commented Oct 6, 2025

Circuit Breaker Two-Threshold Failover and MultiDbClient Introduction

This PR introduces a new dual-threshold circuit breaker mechanism and a simplified MultiDbClient API for Redis multi-endpoint deployments. The changes enhance failover precision by requiring both minimum failure count AND failure rate thresholds to be exceeded before triggering failover, preventing false positives from small sample sizes.

🚀 New Features Added

1. MultiDbClient - Simplified Multi-Endpoint Redis Client

  • NEW: MultiDbClient class extending UnifiedJedis for high-availability Redis connectivity
  • NEW: MultiDbClientBuilder abstract builder for creating multi-db Redis clients
  • NEW: Simplified API for managing multiple weighted Redis endpoints
  • NEW: Built-in automatic failover and failback capabilities
  • NEW: Event-driven database switch notifications via databaseSwitchListener
  • NEW: Dynamic endpoint management (add/remove endpoints at runtime)
  • NEW: Force active endpoint functionality for maintenance scenarios
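
This description does not spell out how endpoint weights drive selection. A minimal sketch, assuming the client prefers the healthy endpoint with the highest weight; `WeightedEndpoint`, `EndpointSelector` and `pickActive` are hypothetical names used for illustration and are not part of the Jedis API:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a weighted endpoint; not a Jedis type.
final class WeightedEndpoint {
    final String name;
    final float weight;
    final boolean healthy;

    WeightedEndpoint(String name, float weight, boolean healthy) {
        this.name = name;
        this.weight = weight;
        this.healthy = healthy;
    }
}

final class EndpointSelector {
    // Assumption: the active endpoint is the healthy one with the highest weight.
    static Optional<WeightedEndpoint> pickActive(List<WeightedEndpoint> endpoints) {
        return endpoints.stream()
            .filter(e -> e.healthy)
            .max(Comparator.comparingDouble(e -> e.weight));
    }
}
```

Under this assumption, an unhealthy endpoint is skipped regardless of its weight, and an empty `Optional` signals that no endpoint is currently usable.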

2. Dual-Threshold Circuit Breaker System

  • NEW: circuitBreakerMinNumOfFailures configuration - Minimum number of failures required before circuit breaker can trip
  • NEW: CircuitBreakerThresholdsAdapter - Disables Resilience4j's built-in evaluation to use Jedis custom logic
  • NEW: evaluateThresholds() method in Cluster class for custom threshold evaluation
  • NEW: Both minimum failures AND failure rate thresholds must be exceeded to trigger failover
  • NEW: Prevents false positives from small sample sizes in low-traffic scenarios
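
The dual-threshold rule above can be sketched as a single predicate. This is an illustration only, not the actual `evaluateThresholds()` code: `DualThreshold` and `shouldTrip` are hypothetical names, and treating "exceeded" as "greater than or equal to" is an assumption.

```java
// Hypothetical helper illustrating the dual-threshold rule; not the Jedis implementation.
final class DualThreshold {
    /**
     * Returns true only when BOTH conditions hold:
     * failures >= minFailures AND failure rate (percent) >= rateThresholdPercent.
     * Assumption: reaching a threshold counts as exceeding it.
     */
    static boolean shouldTrip(int failures, int totalCalls,
                              int minFailures, float rateThresholdPercent) {
        if (totalCalls <= 0) {
            return false; // nothing recorded yet, so there is no rate to evaluate
        }
        double ratePercent = 100.0 * failures / totalCalls;
        return failures >= minFailures && ratePercent >= rateThresholdPercent;
    }
}
```

With thresholds of 1000 failures and 10%, 5 failures out of 10 calls (50%) does not trip because the count is too low, and 1200 failures out of 20000 calls (6%) does not trip because the rate is too low; 1500 failures out of 10000 calls (15%) satisfies both.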

3. Enhanced Configuration Flexibility

  • NEW: MultiClusterClientConfig.builder() - No-argument builder for dynamic endpoint addition
  • NEW: endpoint(Endpoint, float, JedisClientConfig) - Simplified endpoint addition method
  • NEW: endpoint(ClusterConfig) - Pre-configured cluster addition method
  • NEW: Validation to prevent both thresholds being zero simultaneously
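
The both-zero validation could look roughly like the following. This is a sketch under assumptions: `ThresholdValidation` and `validate` are hypothetical names, and the exception type and message are illustrative rather than taken from the Jedis source.

```java
// Hypothetical validation helper; names and exception type are assumptions.
final class ThresholdValidation {
    static void validate(int minNumOfFailures, float failureRateThresholdPercent) {
        // With both thresholds at zero, every recorded failure would satisfy
        // the dual-threshold rule, so this combination is rejected.
        if (minNumOfFailures == 0 && failureRateThresholdPercent == 0.0f) {
            throw new IllegalArgumentException(
                "minNumOfFailures and failureRateThreshold cannot both be 0");
        }
    }
}
```

Setting only one of the two to zero remains valid in this sketch and effectively reduces the check to a single threshold.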

🔧 Core Improvements

1. Circuit Breaker Logic Overhaul

  • CHANGED: Default failure rate threshold reduced from 50% to 10% for faster failover detection
  • CHANGED: Default sliding window changed from 100 calls (count-based) to 2 seconds (time-based) for quicker response
  • CHANGED: Default minimum failures set to 1000 to prevent premature failover in low-traffic scenarios
  • CHANGED: Resilience4j circuit breaker evaluation disabled in favor of custom dual-threshold logic
  • IMPROVED: Circuit breaker now evaluates thresholds on every error event for immediate response
  • IMPROVED: Forced circuit breaker state transitions when both thresholds are exceeded
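
To illustrate evaluation on every error event over a time-based window, here is a simplified, self-contained sketch. It stores individual events in a deque rather than using Resilience4j metrics, takes the current time as a parameter so it stays deterministic, and uses hypothetical names (`TimeWindowBreaker`, `recordFailure`, `recordSuccess`); it is not how Jedis implements the feature.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified illustration of a time-based window with dual thresholds (hypothetical class).
final class TimeWindowBreaker {
    private final long windowMillis;
    private final int minFailures;
    private final float rateThresholdPercent;
    // Each entry is {timestampMillis, 1 if failed else 0}.
    private final Deque<long[]> events = new ArrayDeque<>();
    private int failures;
    private boolean open;

    TimeWindowBreaker(long windowMillis, int minFailures, float rateThresholdPercent) {
        this.windowMillis = windowMillis;
        this.minFailures = minFailures;
        this.rateThresholdPercent = rateThresholdPercent;
    }

    void recordSuccess(long nowMillis) {
        add(nowMillis, false);
    }

    // Thresholds are evaluated only when a failure is recorded.
    void recordFailure(long nowMillis) {
        add(nowMillis, true);
        double ratePercent = 100.0 * failures / events.size();
        if (failures >= minFailures && ratePercent >= rateThresholdPercent) {
            open = true; // stands in for the forced state transition
        }
    }

    boolean isOpen() {
        return open;
    }

    private void add(long nowMillis, boolean failed) {
        // Evict events that have fallen out of the window before adding the new one.
        while (!events.isEmpty() && events.peekFirst()[0] <= nowMillis - windowMillis) {
            if (events.pollFirst()[1] == 1) {
                failures--;
            }
        }
        events.addLast(new long[] {nowMillis, failed ? 1 : 0});
        if (failed) {
            failures++;
        }
    }
}
```

Because old events are evicted before each evaluation, failures that are far apart in time never accumulate toward the minimum count.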

2. Configuration Defaults Optimization

  • CHANGED: Default failback check interval increased from 5 seconds to 2 minutes (120000ms)
  • CHANGED: Default grace period increased from 10 seconds to 1 minute (60000ms)
  • CHANGED: Default health check interval increased from 1 second to 5 seconds (5000ms)
  • CHANGED: Default lag tolerance in LagAwareStrategy increased from 100ms to 5 seconds (5000ms)
  • CHANGED: Default delay between probes increased from 100ms to 500ms
  • IMPROVED: More conservative defaults to reduce unnecessary failover churn in production
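
For reference, the new defaults listed above can be written as `java.time.Duration` values, which makes the millisecond figures easy to verify. `FailoverDefaults` and its field names are hypothetical and only mirror the bullets; the real constants live in the respective Jedis configuration classes.

```java
import java.time.Duration;

// Hypothetical constants holder mirroring the defaults listed above; names are illustrative.
final class FailoverDefaults {
    static final Duration FAILBACK_CHECK_INTERVAL = Duration.ofMinutes(2); // 120000 ms
    static final Duration GRACE_PERIOD = Duration.ofMinutes(1);            // 60000 ms
    static final Duration HEALTH_CHECK_INTERVAL = Duration.ofSeconds(5);   // 5000 ms
    static final Duration LAG_TOLERANCE = Duration.ofSeconds(5);           // 5000 ms
    static final Duration DELAY_BETWEEN_PROBES = Duration.ofMillis(500);   // 500 ms

    private FailoverDefaults() {
    }
}
```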

3. API Modernization and Consistency

  • CHANGED: ClusterConfig constructor now uses Endpoint interface instead of HostAndPort
  • CHANGED: getHostAndPort() method renamed to getEndpoint() for consistency
  • CHANGED: Builder pattern enhanced with fluent endpoint configuration methods
  • IMPROVED: Type safety with Endpoint interface usage throughout the codebase
  • IMPROVED: Consistent naming conventions across all configuration classes

4. Health Check System Enhancements

  • CHANGED: EchoStrategy now uses connection pool with maximum 2 connections for health checks
  • IMPROVED: Health check strategies now have more conservative default intervals
  • IMPROVED: Better resource management for health check connections

📦 Package and Structure Changes

1. Test Infrastructure Improvements

  • NEW: CircuitBreakerThresholdsTest - Comprehensive tests for dual-threshold behavior
  • NEW: ClusterEvaluateThresholdsTest - Unit tests for threshold evaluation logic
  • NEW: DefaultValuesTest - Validation of all default configuration values
  • NEW: MultiDbClientTest - Integration tests for the new MultiDbClient API
  • NEW: ReflectionTestUtil - Utility class for test reflection operations
  • UPDATED: All existing tests updated to use new dual-threshold configuration

2. Maven Configuration Updates

  • NEW: excludedGroupsForUnitTests property for flexible test group exclusion
  • IMPROVED: Test execution configuration with parameterized excluded groups
  • IMPROVED: Coverage configuration includes new MultiDbClient and test utilities

🔄 API Changes

1. MultiClusterClientConfig Builder

// NEW: No-argument builder for dynamic configuration
MultiClusterClientConfig config = MultiClusterClientConfig.builder()
    .endpoint(endpoint1, 100.0f, clientConfig1)
    .endpoint(endpoint2, 50.0f, clientConfig2)
    .circuitBreakerMinNumOfFailures(1000)        // NEW: Minimum failures threshold
    .circuitBreakerFailureRateThreshold(10.0f)   // CHANGED: Default reduced to 10%
    .build();

2. MultiDbClient Usage

// NEW: Simplified multi-endpoint client
MultiDbClient client = MultiDbClient.builder()
    .multiDbConfig(config)
    .databaseSwitchListener(event -> 
        System.out.println("Switched to: " + event.getEndpoint()))
    .build();

// NEW: Dynamic endpoint management
client.addEndpoint(newEndpoint, 25.0f, clientConfig);
client.removeEndpoint(oldEndpoint);
client.forceActiveEndpoint(endpoint, Duration.ofMinutes(5).toMillis());

3. ClusterConfig Constructor Changes

// OLD: Using HostAndPort
ClusterConfig config = new ClusterConfig(hostAndPort, clientConfig);

// NEW: Using Endpoint interface
ClusterConfig config = new ClusterConfig(endpoint, clientConfig);

🐛 Bug Fixes

1. Circuit Breaker State Management

  • FIXED: Circuit breaker now properly transitions to FORCED_OPEN state when thresholds are exceeded
  • FIXED: Immediate failover trigger when circuit breaker opens during command execution
  • SOLUTION: Added circuit breaker state checking in command executors with automatic failover

2. Test Configuration Consistency

  • FIXED: Test configurations updated to use new dual-threshold parameters
  • FIXED: Default value assertions updated to match new configuration defaults
  • SOLUTION: Comprehensive test updates across all failover and circuit breaker tests

🎯 Behavioral Changes

1. Circuit Breaker Failover Logic

Before:

  • Single threshold (failure rate OR minimum calls) could trigger failover
  • Small sample sizes could cause false positive failovers
  • Resilience4j handled all circuit breaker state transitions

After:

  • Dual threshold system requires BOTH minimum failures AND failure rate to be exceeded
  • Prevents false positives in low-traffic scenarios
  • Custom Jedis logic controls circuit breaker state transitions
  • More precise failover decisions based on actual failure patterns

2. Configuration Defaults

Before:

  • Aggressive defaults optimized for quick failover detection
  • 50% failure rate threshold, 100-call sliding window
  • 5-second failback checks, 10-second grace periods

After:

  • Conservative defaults optimized for production stability
  • 10% failure rate threshold, 2-second sliding window, 1000 minimum failures
  • 2-minute failback checks, 1-minute grace periods
  • Reduced unnecessary failover churn while maintaining responsiveness

3. API Usage Patterns

Before:

  • Required pre-configured ClusterConfig arrays
  • HostAndPort-based endpoint specification
  • Limited dynamic endpoint management

After:

  • Flexible builder pattern with dynamic endpoint addition
  • Endpoint interface for consistent type usage
  • Full runtime endpoint management capabilities

📊 Configuration Examples

Basic Dual-Threshold Configuration

MultiClusterClientConfig config = MultiClusterClientConfig.builder()
    .endpoint(primary, 100.0f, primaryConfig)
    .endpoint(secondary, 50.0f, secondaryConfig)
    .circuitBreakerMinNumOfFailures(500)         // Require 500+ failures
    .circuitBreakerFailureRateThreshold(15.0f)   // AND 15%+ failure rate
    .circuitBreakerSlidingWindowSize(3)          // Over 3-second window
    .build();

MultiDbClient with Event Handling

MultiDbClient client = MultiDbClient.builder()
    .multiDbConfig(config)
    .databaseSwitchListener(event -> {
        logger.info("Database switched: {} -> {} (reason: {})",
            event.getPreviousEndpoint(), event.getEndpoint(), event.getReason());
        metrics.incrementCounter("db.switch", "reason", event.getReason().name());
    })
    .build();

Dynamic Endpoint Management

// Add new endpoint at runtime
client.addEndpoint(newEndpoint, 75.0f, newClientConfig);

// Force specific endpoint for maintenance
client.forceActiveEndpoint(maintenanceEndpoint, Duration.ofMinutes(10).toMillis());

// Check endpoint health
if (client.isHealthy(endpoint)) {
    client.setActiveDatabase(endpoint);
}

⚠️ Breaking Changes

API Changes

  1. ClusterConfig constructor:

    • Changed: ClusterConfig(HostAndPort, JedisClientConfig) → ClusterConfig(Endpoint, JedisClientConfig)
    • Impact: Direct constructor usage needs to be updated
  2. ClusterConfig getter method:

    • Changed: getHostAndPort() → getEndpoint()
    • Impact: Code accessing endpoint information needs updates
  3. Circuit breaker configuration:

    • Removed: Several Resilience4j-specific configuration methods
    • Added: circuitBreakerMinNumOfFailures() method
    • Impact: Circuit breaker configuration needs dual-threshold setup

Default Value Changes

  • Failure rate threshold: 50% → 10%
  • Sliding window size: 100 calls → 2 seconds
  • Failback check interval: 5s → 120s
  • Grace period: 10s → 60s
  • Health check interval: 1s → 5s

Migration Guide

// OLD: Single threshold configuration
builder.circuitBreakerFailureRateThreshold(50.0f)
       .circuitBreakerSlidingWindowMinCalls(100);

// NEW: Dual threshold configuration
builder.circuitBreakerFailureRateThreshold(10.0f)
       .circuitBreakerMinNumOfFailures(1000)
       .circuitBreakerSlidingWindowSize(2);

🧪 Testing

All changes validated with:

  • ✅ Comprehensive dual-threshold circuit breaker tests
  • ✅ MultiDbClient integration tests with dynamic endpoint management
  • ✅ Default configuration validation tests
  • ✅ Backward compatibility tests for existing configurations
  • ✅ Performance tests for new threshold evaluation logic
  • ✅ Edge case testing for threshold boundary conditions

📝 Additional Notes

  • The new dual-threshold system gives finer control over the failover process in both low- and high-traffic scenarios
  • MultiDbClient provides a simplified API for common multi-endpoint use cases
  • Endpoint interface usage improves type safety and API consistency
  • All existing functionality remains available through the enhanced configuration system
  • Test infrastructure improvements provide better coverage and reliability validation

atakavci and others added 4 commits October 3, 2025 12:03
…components (#4298)

* - set & test default values

* - format

* - fix tests failing due to changing defaults
…ure rate) capability to circuit breaker (#4295)

* [automatic failover] Remove the check for 'GenericObjectPool.getNumWaiters()' in 'TrackingConnectionPool' (#4270)

- remove the check for number of waiters in TrackingConnectionPool

* [automatic failover] Configure max total connections for EchoStrategy (#4268)

- set maxtotal connections for echoStrategy

* [automatic failover] Replace 'CircuitBreaker' with 'Cluster' for 'CircuitBreakerFailoverBase.clusterFailover' (#4275)

* - replace CircuitBreaker with Cluster for CircuitBreakerFailoverBase.clusterFailover
- improve thread safety with provider initialization

* - formatting

* [automatic failover] Minor optimizations on fast failover (#4277)

* - minor optimizations on fail fast

* -  volatile failfast

* [automatic failover] Implement health check retries (#4273)

* - replace minConsecutiveSuccessCount with numberOfRetries
- add retries into healthCheckImpl
- apply changes to strategy implementations config classes
- fix unit tests

* - fix typo

* - fix failing tests

* - add tests for retry logic

* - formatting

* - format

* - revisit numRetries for healthCheck ,replace with numProbes and implement built in policies
- new types probecontext, ProbePolicy, HealthProbeContext
- add delayer executor pool to healthCheckImpl
-  adjustments on  worker pool of healthCheckImpl for shared use of workers

* - format

* - expand comment with example case

* - drop pooled executor for delays

* - polish

* - fix tests

* - formatting

* - checking failing tests

* - fix test

* - fix flaky tests

* - fix flaky test

* - add tests for builtin probing policies

* - fix flaky test

* [automatic failover] Move failover provider to mcf (#4294)

* - move failover provider to mcf

* - make iterateActiveCluster package private

* [automatic failover]  Add SSL configuration support to LagAwareStrategy  (#4291)

* User-provided ssl config for lag-aware health check

* ssl scenario test for lag-aware healthcheck

* format

* format

* address review comments

  - use getters instead of fields

* [automatic failover] Implement max number of failover attempts (#4293)

* - implement max failover attempt
- add tests

* - fix user receive the intended exception

* -clean+format

* - java doc for exceptions

* format

* - more tests on exception types in max failover attempts mechanism

* format

* fix failing timing in test

* disable health checks

* rename to switchToHealthyCluster

* format

* - Add dual-threshold (min failures + failure rate) failover to circuit breaker executor
- Map config to resilience4j via CircuitBreakerThresholdsAdapter
- clean up/simplify config: drop slow-call and window type
- Add thresholdMinNumOfFailures; update some of the defaults
- Update provider to use thresholds adapter
- Update docs; align examples with new defaults
- Add tests for 0% rate, edge thresholds

* polish

* Update src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* - fix typo

* - fix min total calls calculation

* format

* - merge issues fixed

* fix javadoc ref

* - move threshold evaluations to failoverbase
- simplify executor and cbfailoverconnprovider
- adjust config getters
- fix failing tests due to COUNT_BASED -> TIME_BASED
- new tests for thresholds calculations and impact on circuit state transitions

* - avoid facilitating actual CBConfig type in tests

* Update src/test/java/redis/clients/jedis/failover/FailoverIntegrationTest.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Trigger workflows

* - evaluate only in failure recorded and failover immediately
- add more test on threshold calculations
- enable command line arg for overwriting surefire.excludedGroups

* format

* check pom

* - fix error prone test

* [automatic failover] Set and test default values for failover config&components (#4298)

* - set & test default values

* - format

* - fix tests failing due to changing defaults

* - fix flaky test

* - remove unnecessary checks for failover attempt

* - clean and trim adapter class
- add docs and more explanation

* fix javadoc issue

* - switch to all_succes to fix flaky timing

* - fix issue in CircuitBreakerFailoverConnectionProvider

* introduce ReflectionTestUtil

---------

Co-authored-by: Ivo Gaydazhiev <ivo.gaydazhiev@redis.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…over and circuit breaker support (#4300)

* feat: introduce ResilientRedisClient with multi-endpoint failover support

Add ResilientRedisClient extending UnifiedJedis with automatic failover
capabilities across multiple weighted Redis endpoints. Includes circuit
breaker pattern, health monitoring, and configurable retry logic for
high-availability Redis deployments.

* format

* mark ResilientRedisClientTest as integration one

* fix test
  - make sure endpoint is healthy before activating it

* Rename ResilientClient to align with design

 - ResilientClient -> MultiDbClient (builder, tests, etc)

* Rename setActiveEndpoint to setActiveDatabaseEndpoint

* Rename clusterSwitchListener to databaseSwitchListener

* Rename multiClusterConfig to multiDbConfig

* fix api doc's error

* fix compilation error after rebase

* format

* fix example in javadoc

* Update ActiveActiveFailoverTest scenario test to use builder's

# Conflicts:
#	src/test/java/redis/clients/jedis/scenario/ActiveActiveFailoverTest.java

* rename setActiveDatabaseEndpoint -> setActiveDatabase

* isHealthy throws exception if cluster does not exist

* format
…ti db (#4302)

[clean up] Use Endpoint interface where possible
@atakavci atakavci requested review from uglide, ggivo and Copilot October 6, 2025 13:19
@atakavci atakavci self-assigned this Oct 6, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a dual-threshold circuit breaker mechanism and a new MultiDbClient for enhanced Redis multi-endpoint failover. The changes replace single-threshold circuit breaking with a system requiring both minimum failure count AND failure rate thresholds to be exceeded, preventing false positives in low-traffic scenarios.

  • Implementation of dual-threshold circuit breaker system requiring both minimum failures and failure rate thresholds
  • Introduction of MultiDbClient and MultiDbClientBuilder for simplified multi-endpoint Redis management
  • Updated default configuration values for more production-ready behavior (conservative timeouts, intervals)

Reviewed Changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 6 comments.

Summary per file (file: description):

  • src/test/java/redis/clients/jedis/util/ReflectionTestUtil.java: Utility for accessing private fields in tests using reflection
  • src/test/java/redis/clients/jedis/util/ClientTestUtil.java: Test utility for extracting connection providers from UnifiedJedis
  • src/main/java/redis/clients/jedis/MultiDbClient.java: New high-availability Redis client with multi-endpoint support
  • src/main/java/redis/clients/jedis/builders/MultiDbClientBuilder.java: Abstract builder for creating multi-database Redis clients
  • src/main/java/redis/clients/jedis/mcf/CircuitBreakerThresholdsAdapter.java: Adapter to disable Resilience4j evaluation in favor of custom logic
  • Multiple test files: Updated to use new dual-threshold configuration and MultiDbClient
  • src/main/java/redis/clients/jedis/MultiClusterClientConfig.java: Major refactoring to support dual-threshold circuit breaker and new builder pattern

AtomicReference<String> interruptedThreadName = new AtomicReference<>();
AtomicReference<Throwable> thrownException = new AtomicReference<>();
AtomicReference<Boolean> isInterrupted = new AtomicReference<>();
// When: Interrupt thse waiting thread

Copilot AI Oct 6, 2025

Corrected spelling of 'thse' to 'the'.

Suggested change
- // When: Interrupt thse waiting thread
+ // When: Interrupt the waiting thread


public ClusterSwitchEventArgs(SwitchReason reason, Endpoint endpoint, Cluster cluster) {
this.reason = reason;
// TODO: @ggivo do we need cluster name?

Copilot AI Oct 6, 2025

TODO comment should be resolved or converted to a proper issue tracker item before merging to main branch.

Suggested change
- // TODO: @ggivo do we need cluster name?

github-actions bot commented Oct 6, 2025

Test Results

   283 files  + 4    283 suites  +4   11m 17s ⏱️ +21s
10 076 tests +56  9 620 ✅ +56  456 💤 ±0  0 ❌ ±0 
 2 714 runs  +56  2 714 ✅ +56    0 💤 ±0  0 ❌ ±0 

Results for commit d851ca0. ± Comparison against base commit f8de2fe.

♻️ This comment has been updated with latest results.

uglide
uglide previously approved these changes Oct 6, 2025
ggivo
ggivo previously approved these changes Oct 6, 2025
Collaborator

@ggivo ggivo left a comment

LGTM

@atakavci atakavci dismissed stale reviews from ggivo and uglide via d851ca0 October 6, 2025 15:02
Collaborator

@ggivo ggivo left a comment

LGTM

@atakavci atakavci merged commit 303ed10 into master Oct 6, 2025
24 of 27 checks passed
@atakavci atakavci added this to the 7.0.0 milestone Oct 7, 2025