
fix(kafka): support combined mode with brokers.replicaCount=0 #87

Merged: mberlofa merged 1 commit into main from fix/kafka-combined-mode on Apr 13, 2026


@mberlofa
Contributor

Summary

Enable combined mode where each KRaft controller also acts as a broker (process.roles=broker,controller) by setting cluster.brokers.replicaCount: 0.

This allows 3-node HA deployments without a separate broker StatefulSet, which suits cost-optimized production environments with moderate throughput requirements.

Fixes

Template Corrections:

  • _helpers.tpl: internalReplicationFactor now uses controllers.replicaCount when brokers=0
  • _helpers.tpl: Validation allows brokers=0 or ≥3 and rejects 1 or 2
  • _helpers.tpl: minInSyncReplicas validation for both dedicated and combined modes
  • configmap-scripts.yaml: Heredoc YAML indentation fixed with nindent
  • configmap-scripts.yaml: Controller generates process.roles=broker,controller when brokers=0
  • configmap-scripts.yaml: Controller exposes CLIENT listener in combined mode
  • service-client.yaml: Selector routes to controller when brokers=0
  • statefulset-cluster.yaml: Controller exposes port 9092 in combined mode
  • statefulset-cluster.yaml: Broker StatefulSet skipped when brokers=0
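The _helpers.tpl rules above can be modeled as a small shell sketch to make the combined-mode arithmetic concrete. It is not the chart's Go template code: the function names (`data_nodes`, `internal_rf`, `validate`) are hypothetical, and two rules are assumptions, namely that the internal replication factor is the data-node count capped at 3, and that minInSyncReplicas may not exceed the data-node count.

```bash
#!/usr/bin/env bash
# Hypothetical model of the _helpers.tpl rules described above.
# The chart implements these in Go templates; names here are illustrative.

# data_nodes CONTROLLERS BROKERS: number of pods that host partition replicas.
data_nodes() {
  if [ "$2" -eq 0 ]; then
    echo "$1"   # combined mode: every controller is also a broker
  else
    echo "$2"   # dedicated mode: only broker pods hold data
  fi
}

# internal_rf CONTROLLERS BROKERS: replication factor for internal topics.
# Assumption: data-node count capped at 3.
internal_rf() {
  local n
  n=$(data_nodes "$1" "$2")
  if [ "$n" -gt 3 ]; then n=3; fi
  echo "$n"
}

# validate CONTROLLERS BROKERS MIN_ISR: returns non-zero on rejected values.
validate() {
  local controllers=$1 brokers=$2 min_isr=$3 nodes
  if [ "$brokers" -ne 0 ] && [ "$brokers" -lt 3 ]; then
    echo "cluster.brokers.replicaCount must be 0 (combined) or >= 3" >&2
    return 1
  fi
  nodes=$(data_nodes "$controllers" "$brokers")
  # Assumed rule: min ISR cannot exceed the number of pods holding replicas.
  if [ "$min_isr" -gt "$nodes" ]; then
    echo "cluster.minInSyncReplicas ($min_isr) > data nodes ($nodes)" >&2
    return 1
  fi
}

# Example: the configuration from this PR (3 controllers, 0 brokers, min ISR 2).
# validate 3 0 2 && echo "values accepted"
```

Counting data nodes once and deriving both checks from it mirrors the point of the fix: with brokers=0, the controllers are the pods that carry replicas.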

Testing

  • helm lint --strict passed
  • helm template with default values passed
  • helm template with all ci/*.yaml scenarios passed
  • Combined mode renders only controller StatefulSet (3 replicas)
  • Normal cluster mode still renders both StatefulSets (controllers + brokers)
  • Service selector correctly routes to controller in combined mode
  • Controller pods expose both ports (9092 + 9093) in combined mode
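A minimal way to repeat the StatefulSet-count and container-port checks locally, assuming only `helm`, `grep` and a rendered manifest on stdin. The helpers `count_kind` and `has_container_port` are hypothetical and do plain text matching, so they assume `helm template` prints each `kind:` line at column 0.

```bash
#!/usr/bin/env bash
# Hypothetical text-matching helpers for `helm template` output on stdin.

# count_kind KIND < manifest: number of documents of that kind.
count_kind() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the "0".
  grep -c "^kind: $1\$" || true
}

# has_container_port PORT < manifest: succeeds if any container declares PORT.
has_container_port() {
  grep -q "containerPort: $1\$"
}

# Usage against the combined-mode CI values:
# manifest=$(helm template kafka charts/kafka -f charts/kafka/ci/combined-mode.yaml)
# [ "$(printf '%s\n' "$manifest" | count_kind StatefulSet)" -eq 1 ]
# printf '%s\n' "$manifest" | has_container_port 9092
```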

Documentation

  • README.md: Added combined mode to architecture table and quick start examples
  • docs/combined-mode.md: Architecture guide covering when to use combined mode, its trade-offs, and validation steps
  • examples/combined-mode/: Production-ready values.yaml with resources, PDB, and detailed README
  • ci/combined-mode.yaml: Test scenario for CI pipeline
  • values.yaml: Documented brokers.replicaCount=0 behavior
  • values.schema.json: Schema updated with combined mode documentation and minimum: 0

Configuration Example

```yaml
architecture: cluster

cluster:
  minInSyncReplicas: 2
  controllers:
    replicaCount: 3
    persistence:
      size: 50Gi  # Sized for broker data, not just metadata
  brokers:
    replicaCount: 0  # Combined mode

pdb:
  enabled: true
```
When to Use Combined Mode

Ideal for:

  • 3-node HA production deployments
  • Cost-optimized production with moderate throughput
  • Simpler operations while keeping a 3-voter KRaft quorum

Trade-offs:

  • ✅ Fewer pods (3 instead of 6)
  • ✅ Lower resource consumption
  • ❌ Cannot scale brokers independently
  • ❌ Controllers handle both metadata and client traffic

Files Changed

11 files changed, 320 insertions(+), 23 deletions(-)
- charts/kafka/README.md
- charts/kafka/templates/_helpers.tpl
- charts/kafka/templates/configmap-scripts.yaml
- charts/kafka/templates/service-client.yaml
- charts/kafka/templates/statefulset-cluster.yaml
- charts/kafka/values.schema.json
- charts/kafka/values.yaml
+ charts/kafka/ci/combined-mode.yaml
+ charts/kafka/docs/combined-mode.md
+ charts/kafka/examples/combined-mode/README.md
+ charts/kafka/examples/combined-mode/values.yaml

Checklist

  • Follows Conventional Commits
  • Branch created from updated main
  • helm lint --strict passes
  • helm template tested with all CI scenarios
  • values.schema.json updated
  • Chart README updated
  • Architecture documentation created
  • Example configuration provided
  • CI test scenario added

Enable combined mode where each KRaft controller also acts as a broker
(process.roles=broker,controller) by setting cluster.brokers.replicaCount=0.
This allows 3-node HA deployments without separate broker StatefulSets.

Fixes:
- _helpers.tpl: internalReplicationFactor now uses controllers when brokers=0
- _helpers.tpl: validation allows brokers=0 or >=3 (not 1-2)
- _helpers.tpl: minInSyncReplicas validation for both dedicated and combined modes
- configmap-scripts.yaml: heredoc YAML indentation fixed with nindent
- configmap-scripts.yaml: controller start.sh generates process.roles=broker,controller when brokers=0
- configmap-scripts.yaml: controller exposes CLIENT listener in combined mode
- service-client.yaml: selector routes to controller when brokers=0
- statefulset-cluster.yaml: controller exposes port 9092 in combined mode
- statefulset-cluster.yaml: broker StatefulSet skipped when brokers=0

Documentation:
- README.md: added combined mode to architecture table and quick start
- values.yaml: documented brokers.replicaCount=0 behavior
- values.schema.json: documented combined mode in schema
- docs/combined-mode.md: comprehensive architecture guide
- examples/combined-mode/: values.yaml and README.md
- ci/combined-mode.yaml: test scenario for combined mode

Tested with helm lint --strict and helm template across all ci/ scenarios.
mberlofa merged commit 51fe78b into main on Apr 13, 2026
7 checks passed
mberlofa deleted the fix/kafka-combined-mode branch on April 13, 2026 22:58
mberlofa added a commit that referenced this pull request Apr 14, 2026
… combined mode

When brokers.replicaCount=0 (combined mode), the controller headless service must
expose both ports 9093 (controller/quorum) and 9092 (client/inter-broker) to enable
proper inter-broker communication between controller pods acting as brokers.

Problem:
- In combined mode, controllers have process.roles=broker,controller
- Inter-broker replication uses inter.broker.listener.name=CLIENT (port 9092)
- Controller headless service only exposed port 9093
- Result: Inter-broker replication failed because pods couldn't reach each other
  via kafka-controller-X.kafka-controller-headless:9092

Impact:
- CRITICAL: Breaks topic replication in combined mode
- Affects: Direct pod-to-pod broker connections via headless DNS
- Symptom: Connection refused when brokers try to replicate data

Solution:
- Conditionally expose port 9092 on controller headless service when brokers=0
- Skip rendering broker headless service entirely when brokers=0 (no broker pods exist)
- Update docs/combined-mode.md to document headless service behavior

Changes:
- templates/service-headless.yaml: Add conditional client port to controller headless
- templates/service-headless.yaml: Wrap broker headless in {{ if gt brokers 0 }}
- docs/combined-mode.md: Document headless service port exposure

Testing:
✓ helm lint --strict
✓ Combined mode: controller headless exposes 9093+9092, broker headless not rendered
✓ Cluster mode: controller headless exposes 9093, broker headless exposes 9092+9094
✓ Single-broker mode: unchanged (exposes both ports as before)

Fixes: Inter-broker communication in combined mode
Related: PR #87 (combined mode support)
mberlofa added a commit that referenced this pull request Apr 14, 2026
… combined mode (#89)

## Summary

Fix critical bug in combined mode where the controller headless service
only exposed port 9093 (controller/quorum), breaking inter-broker
communication on port 9092 (client).

## Problem

In combined mode (`brokers.replicaCount=0`), controller pods run with
`process.roles=broker,controller` and handle both:
- **Controller traffic** on port 9093 (KRaft quorum)
- **Broker traffic** on port 9092 (client + inter-broker replication)

The controller headless service (`kafka-controller-headless`) was only
exposing port 9093, causing inter-broker replication to fail when
brokers tried to connect via stable pod DNS like:
```
kafka-controller-0.kafka-controller-headless:9092
kafka-controller-1.kafka-controller-headless:9092
```

## Impact

| Severity | Area | Effect |
|----------|------|--------|
| 🔴 **CRITICAL** | Inter-broker replication | Connection refused; topics cannot replicate |
| 🔴 **CRITICAL** | Combined mode in production | Completely broken for any multi-replica workload |
| ✅ OK | Bootstrap via ClusterIP | Works (uses `kafka:9092` service) |
| ✅ OK | Controller quorum | Works (uses port 9093, which was already exposed) |

## Root Cause

```yaml
# BEFORE (broken)
apiVersion: v1
kind: Service
metadata:
  name: kafka-controller-headless
spec:
  ports:
    - name: controller
      port: 9093
      # ❌ Missing: port 9092 for inter-broker communication
```

When Kafka's inter-broker replication tries to connect:
```properties
# server.properties (generated in combined mode)
# CLIENT is the listener on port 9092
inter.broker.listener.name=CLIENT
```

Result: `kafka-controller-1` tries to replicate from
`kafka-controller-0.kafka-controller-headless:9092` → **Connection
refused**

## Solution

1. **Expose port 9092 on controller headless service** when
`brokers.replicaCount=0`
2. **Skip rendering broker headless service** when
`brokers.replicaCount=0` (no broker pods exist)

```yaml
# AFTER (fixed)
apiVersion: v1
kind: Service
metadata:
  name: kafka-controller-headless
spec:
  ports:
    - name: controller
      port: 9093
    {{- if eq (.Values.cluster.brokers.replicaCount | int) 0 }}
    - name: client
      port: 9092  # ✅ Now exposed in combined mode
    {{- end }}
```

## Testing

### Combined Mode (brokers=0)
```bash
$ helm template kafka charts/kafka -f charts/kafka/ci/combined-mode.yaml \
  --show-only templates/service-headless.yaml

# Result:
# ✅ Controller headless service ports: 9093 (controller) + 9092 (client)
# ✅ Broker headless service: NOT RENDERED (no broker pods)
```

### Normal Cluster Mode (brokers=3)
```bash
$ helm template kafka charts/kafka -f charts/kafka/ci/cluster.yaml \
  --show-only templates/service-headless.yaml

# Result:
# ✅ Controller headless service ports: 9093 only
# ✅ Broker headless service ports: 9092 (client) + 9094 (internal)
```

### All CI Scenarios
```
✓ helm lint --strict
✓ single-broker.yaml
✓ cluster.yaml
✓ combined-mode.yaml
✓ cluster-tuned.yaml
✓ metrics.yaml
```
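To check the rendered ports without reading the YAML by eye, a small awk filter can print the `port` values of one named Service. This `service_ports` helper is hypothetical and assumes each rendered document lists `kind:` before `metadata.name`, as the Service snippets in this PR do.

```bash
#!/usr/bin/env bash
# Hypothetical helper: service_ports NAME < manifest
# Prints the spec.ports[].port values of the Service called NAME, one per line.
service_ports() {
  awk -v name="$1" '
    /^---/            { is_svc = 0; in_svc = 0; next }   # new YAML document
    /^kind: Service$/ { is_svc = 1 }
    is_svc && $1 == "name:" && $2 == name { in_svc = 1 }
    # Handles both "port: 9092" and "- port: 9092" list-item forms.
    in_svc { for (i = 1; i < NF; i++) if ($i == "port:") { print $(i + 1); break } }
  '
}

# Usage:
# helm template kafka charts/kafka -f charts/kafka/ci/combined-mode.yaml \
#   --show-only templates/service-headless.yaml | service_ports kafka-controller-headless
```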

## Changes

```diff
2 files changed, 10 insertions(+), 1 deletion(-)

charts/kafka/templates/service-headless.yaml:
+ Expose port 9092 on controller headless when brokers=0
+ Wrap broker headless service in {{ if gt brokers 0 }}

charts/kafka/docs/combined-mode.md:
+ Document headless service port exposure behavior
```

## Verification in Production

After this fix is deployed, you can verify:

```bash
# 1. Check headless service endpoints
kubectl get endpoints kafka-controller-headless -o yaml

# Expected in combined mode:
# ports:
#   - name: controller
#     port: 9093
#   - name: client
#     port: 9092  # ✅ Now present

# 2. Test inter-broker connectivity
kubectl exec -it kafka-controller-1 -- \
  /opt/kafka/bin/kafka-broker-api-versions.sh \
  --bootstrap-server kafka-controller-0.kafka-controller-headless:9092

# Expected: Success (no connection refused)

# 3. Check topic replication
kubectl exec -it kafka-controller-0 -- \
  /opt/kafka/bin/kafka-topics.sh --describe --topic __consumer_offsets \
  --bootstrap-server localhost:9092

# Expected: for every partition, the Isr list matches the Replicas list
```
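Step 2 checks a single pod pair. It can be repeated for every controller pod against every peer (including itself) with a loop like the following sketch. It assumes the pod and Service names used above, the current kubectl namespace and the `/opt/kafka/bin` path from the original commands; `peer_addresses` and `check_mesh` are hypothetical names. `-it` is dropped because the loop runs non-interactively.

```bash
#!/usr/bin/env bash
# Hypothetical sketch extending verification step 2 to all pod pairs.

# peer_addresses REPLICAS STATEFULSET SERVICE PORT: one host:port per pod.
peer_addresses() {
  local replicas=$1 sts=$2 svc=$3 port=$4 i
  for ((i = 0; i < replicas; i++)); do
    echo "${sts}-${i}.${svc}:${port}"
  done
}

# check_mesh REPLICAS: query every peer's CLIENT listener from every pod.
check_mesh() {
  local replicas=$1 src dst
  for ((src = 0; src < replicas; src++)); do
    for dst in $(peer_addresses "$replicas" kafka-controller kafka-controller-headless 9092); do
      if kubectl exec "kafka-controller-${src}" -- \
          /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server "$dst" >/dev/null; then
        echo "ok   kafka-controller-${src} -> ${dst}"
      else
        echo "FAIL kafka-controller-${src} -> ${dst}"
      fi
    done
  done
}

# Usage against a live combined-mode release: check_mesh 3
```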

## Related

- Relates to: PR #87 (combined mode support) - **MERGED**
- Fixes: Critical bug blocking production use of combined mode

## User Impact

- **Breaking changes**: None
- **Upgrade path**: Seamless - just upgrade to new chart version
- **Backward compatibility**: Fully compatible with existing deployments
- **Performance**: No impact (just exposes additional port)

---

**This is a critical production bug fix for combined mode. Without this
fix, combined mode cannot be used for any workload requiring topic
replication.**