TCP KCC v1.0 (Kalman Congestion Control)

TCP KCC v1.0 (Kalman Congestion Control)

TCP congestion control module for shared-bandwidth VPS environments combining the BBRv1 state machine with a Kalman filter for propagation-delay estimation.

Design Principles

Congestion control algorithms must balance throughput, latency, fairness, and loss tolerance. KCC takes a pragmatic approach:

BBRv1 provides a proven foundation. State machine, pacing, cycle gains, STARTUP/DRAIN/PROBE_BW/PROBE_RTT — KCC adopts these mechanisms without modification.
The Kalman filter improves estimation accuracy. Separating true propagation delay from queuing delay and jitter yields a more accurate min_rtt estimate, enabling tighter BDP calculation, better-calibrated CWND, and more stable pacing.
Inter-algorithm dynamics follow standard TCP competitive equilibrium. KCC does not artificially limit its send rate in response to queue detected from external flows. Gain decay (queue-based probe reduction) is available as opt-in via kcc_cycle_decay_mask but disabled by default to preserve full probe intensity.
Intra-KCC fairness is actively maintained. Kalman convergence ensures KCC flows on the same host share a consistent min_rtt estimate, eliminating the winner-takes-all feedback loop that causes severe unfairness in pure BBR multi-flow deployments.

Algorithm Overview

TCP KCC implements a sender-side congestion control module for the Linux kernel as a loadable tcp_kcc.ko. The congestion control function kcc_main() is invoked on each ACK from tcp_ack(), receiving a rate_sample structure that contains kernel-level bandwidth and RTT samples along with delivery and loss counts. The algorithm operates in two temporal regimes: a per-ACK fast path that updates measurement state and computes instantaneous pacing and window targets, and a per-round slower path that evaluates state-transition conditions and recomputes gains.

The core measurement pipeline consists of two components:

Sliding-window maximum bandwidth filter (minmax_running_max from linux/win_minmax.h): window covering the last kcc_bw_rt_cycle_len (default 10) round trips. Provides the BBR-compatible max_bw estimate.
Kalman filter propagation-delay estimator: replaces BBRv1's sliding-window minimum RTT, and is the default source for the BDP RTT estimate (see Model RTT Strategy). A single-state Kalman filter (Kalman 1960) operating in kcc_kalman_scale × µs fixed-point units, modeling the true propagation delay as a random walk:
- State: x[k] = x[k−1] + w[k], w ~ N(0, Q)
- Observation: z[k] = x[k] + v[k], v ~ N(0, R)

Fixed-point conventions: BW_UNIT = 1 << 24 for bandwidth (segments * 2^24 / µs), BBR_UNIT = 1 << 8 = 256 as the dimensionless gain unit.

Model RTT Strategy

KCC introduces a configurable strategy for the RTT estimate used in BDP (Bandwidth-Delay Product) calculation, controlled by kcc_rtt_mode:

Mode	Value	Behavior	Use Case
FILTER	1 (default)	Use `x_est_us` directly — the raw Kalman/sliding-window filter estimate	Production WAN/VPS: resilient to route changes, zero-throughput-cliff
MIN	0	`min(x_est_us, min_rtt_us)` — clamp Kalman estimate against windowed minimum	Kernel-module stability verification; static-RTT links

Why FILTER is the default:

Route-change resilience: When a BGP reroute increases physical RTT (e.g., 50 ms → 100 ms), the Kalman gain K_k reacts within a few RTTs and pulls the estimate to the new path latency. MIN mode deadlocks on the old min_rtt_us until the window expires, cutting BDP in half.
Built-in defenses: Outlier gating rejects queue-spike samples before they enter the filter. Adaptive Q/R noise estimation reduces Kalman gain when the network is noisy, so the filter naturally distrusts transient queue-bloat and keeps the estimate near true propagation delay.
PROBE_RTT decoupling: FILTER mode enables the kcc_probe_rtt_decouple feature — the Kalman filter tracks the RTT floor without requiring the periodic 4-packet drain.

Runtime switch: echo 0 > /proc/sys/net/kcc/kcc_rtt_mode to revert to MIN mode.

State Machine

    ┌───> STARTUP ────┐
    │       │         │
    │       ▼         │
    │     DRAIN  ─────┤
    │       │         │
    │       ▼         │
    └─── PROBE_BW ────┘
    │      ^    │
    │      │    │
    │      └────┘
    │
    └─── PROBE_RTT <──┘

Four modes encoded as the 2-bit mode field in struct KCC:

STARTUP (0): Initial state. pacing_gain ≈ 2.885x (kcc_high_gain_val), cwnd_gain also 2.885x. Exponential bandwidth probing.
DRAIN (1): Entered after STARTUP exit. pacing_gain ≈ 0.347x (kcc_drain_gain_val), cwnd_gain remains 2.885x. Drains the queue accumulated during STARTUP.
PROBE_BW (2): Steady state. Cycles through a 256-slot gain table (default 8-phase pattern repeated: 1.25x/0.75x/8×1.0x).
PROBE_RTT (3): Periodically drains inflight to kcc_cwnd_min_target (default 4 segments) to obtain a fresh RTT sample.

STARTUP → DRAIN

Triggered when full_bw_reached is set — after kcc_full_bw_cnt (default 3) consecutive rounds where max_bw fails to grow by at least kcc_full_bw_thresh_val (default 1.25x) compared to the previously observed peak. The BDP at 1.0x gain is written to snd_ssthresh. qdelay_avg is reset to zero to prevent the STARTUP queue buildup from affecting PROBE_BW.

DRAIN → PROBE_BW

Triggered when estimated inflight-at-EDT ≤ target inflight at 1.0x BDP gain. Drain-skip optimization: when the Kalman filter is converged AND qdelay_avg is below kcc_drain_skip_qdelay_us (default 1000 µs), the DRAIN phase is skipped — converts early to PROBE_BW.

On PROBE_BW entry, the cycle phase index is randomized: cycle_idx = len − 1 − rand(kcc_probe_bw_cycle_rand) (default len − 1 − rand(8)), which decorrelates concurrent flows sharing a bottleneck link.

PROBE_BW → PROBE_RTT

Triggered when the PROBE_RTT filter interval expires — the timestamp min_rtt_stamp has not been updated within the computed interval. cwnd is saved in prior_cwnd, pacing set to drain.

PROBE_RTT → PROBE_BW

After inflight drops to kcc_cwnd_min_target or a round boundary is observed, persists for at least kcc_probe_rtt_mode_ms_val (default 200 ms) and at least one full round observed, then exits. cwnd is restored to at least prior_cwnd, pacing is temporarily overridden with kcc_high_gain_val for rapid pipe refill.

Recovery & Loss

On TCP_CA_Loss: full_bw and full_bw_cnt reset, round_start set to 1, packet_conservation cleared to 0. If LT BW is not active, injects a synthetic loss event to trigger LT sampling.
Recovery entry (TCP_CA_Recovery): packet_conservation enabled, cwnd = inflight + acked.
Recovery exit: restored to prior_cwnd, packet_conservation cleared.
kcc_undo_cwnd(): resets full_bw and full_bw_cnt (preserving full_bw_reached), clears LT BW state.

Round Detection (BBR Alignment)

next_rtt_delivered is initialized to 0 (matching stock BBR; Cardwell et al. 2016), so the first ACK immediately starts round 1 detection without a synthetic offset. Round boundaries are detected when prior_delivered >= next_rtt_delivered, with the interval_us <= 0 guard matching BBR's delivered < 0 || interval_us <= 0 — catching both zero and negative intervals that would otherwise corrupt the measurement pipeline.

Core Measurements

Bandwidth Estimation

Sliding-window max bandwidth filter (minmax_running_max from linux/win_minmax.h) over kcc_bw_rt_cycle_len (default 10) rounds. Instantaneous bw = delivered × BW_UNIT / interval_us computed per ACK. Fed into the sliding window only when not app-limited or when bw ≥ current max (BBR rule).

When lt_use_bw is active, the active bandwidth estimate switches to lt_bw (Long-Term bandwidth estimate).

Kalman Filter

Single-state scalar Kalman recursion (O(1) complexity):

Predict:
  x_pred = x_est          (identity state transition)
  p_pred = p_est + Q      (covariance prediction)

Update:
  innov   = z − x_pred    (innovation)
  K       = p_pred / (p_pred + R)   (Kalman gain [0,1])
  x_est   = x_pred + K × innov      (state update)
  p_est   = (1 − K) × p_pred        (posterior covariance)

Adaptive process noise Q:

Q_base   = kcc_kalman_q (default 100)
q_factor = max(kcc_kalman_q_min_factor_val, min_rtt_us / kcc_kalman_q_rtt_div)
Q        = min(Q_base × q_factor, Q_base × kcc_kalman_q_scale_cap)
Q        = min(Q, kcc_kalman_q_max)

Adaptive measurement noise R:

R = R_base + max(0, jitter_ewma − kcc_jitter_r_thresh_us) × R_base / kcc_jitter_r_scale
R = min(R, R_base × kcc_kalman_r_max_boost)

Q-Boost path-change detection: when |innovation| > kcc_kalman_q_boost_thresh_val (default ≈ 4 ms RTT shift) AND the filter has converged (p_est ≤ kcc_kalman_converged_p_est_val, default 500), p_est is reset to kcc_kalman_p_est_init_val, boosting Kalman gain toward 1.0 for rapid convergence. A cooldown of kcc_kalman_qboost_cdwn (default 15) samples between successive qboost events prevents runaway triggering on lossy paths with high RTT jitter.

Outlier gating: dynamic threshold dyn_thresh = max(outlier_ms × 1000 × scale, jitter_ewma × outlier_jitter_mult × scale). Applied only when p_pred ≤ kcc_kalman_converged_p_est_val. After kcc_kalman_max_consec_reject (default 25) consecutive rejections, the next sample is force-accepted to prevent self-reinforcing lock-in.

Covariance-matched noise estimation (BBR-S): q_est = (1−α) × q_est + α × (K × innov)², r_est = (1−β) × r_est + β × max(0, innov² − p_pred). Combination mode: mode 0 = heuristic only, mode 1 = max (default), mode 2 = weighted blend.

Kalman takeover: when x_est > 0 and sample_cnt ≥ kcc_kalman_min_samples (default 5), min_rtt_us is replaced by x_est / kcc_kalman_scale. min_rtt_stamp is not updated — PROBE_RTT interval trigger remains independent.

Model RTT strategy: The RTT estimate used for BDP calculation is controlled by kcc_rtt_mode. In FILTER mode (default), model_rtt = x_est_us directly — the Kalman/sliding-window estimate is used without clamping. In MIN mode, model_rtt = min(x_est_us, min_rtt_us) — the Kalman estimate is clamped against the windowed minimum to guarantee BDP never inflates. The FILTER default is recommended for production WAN/VPS deployments where path latency can change abruptly (BGP reroutes, LEO handovers, mobile cell switches). See Model RTT Strategy.

BBR Enhancements

Gain Decay

Enabled by the 256-bit bitmap kcc_cycle_decay_mask[] for specific PROBE_BW phases. Decay formula (on accepted Kalman sample):

max_red       = probe_gain − BBR_UNIT
conf_scale    = inverse scaling of p_est (BBR_UNIT at full)
qdelay_decay  = min(max(0, qdelay_avg − qthresh) × BBR_UNIT / qscale, max_red)
                    × conf_scale / BBR_UNIT
jitter_decay  = min(max(0, jitter_ewma − jthresh) × BBR_UNIT / jscale, remaining)
                    × conf_scale / BBR_UNIT
effective     = max(probe_gain − qdelay_decay − jitter_decay, BBR_UNIT)

Kalman confidence scaling: when p_est > kcc_kalman_converged_p_est, decay is proportionally reduced, avoiding excessive backoff when the filter is uncertain.

ECN Backoff

Activation conditions (all must hold):

kcc_ecn_enable_val != 0
Kalman converged (p_est < converged, sample_cnt >= min_samples)
ecn_ewma > 0 (CE marks observed)
qdelay_avg > kcc_ecn_qdelay_thresh_us_val (default 2000 µs)
Mode is NOT PROBE_BW (cwnd_gain is fixed at 2x in PROBE_BW)

During probing phases (pacing_gain > BBR_UNIT), ECN backoff is graduated by BBR_UNIT² / pacing_gain — ~80% of backoff at 1.25x probe, ~65% at 2.89x STARTUP gain.

ECN mark ratio EWMA: updated on round boundaries by kcc_ecn_ewma_retained / kcc_ecn_ewma_total (default 3/4), with gentle per-ACK decay of kcc_ecn_idle_decay_num / kcc_ecn_idle_decay_den (default 31/32) on each ACK with no new CE marks.

Single-Flow Detection

When KCC detects the flow is likely alone on the bottleneck (low queue delay, low jitter, no ECN marks, no ACK aggregation, no LT bandwidth), it automatically transitions to a BBR-pure mode:

kcc_get_model_rtt() returns min_rtt_us directly (bypassing the Kalman smoothed estimate, which has a small positive bias from one-sided measurement noise).
kcc_ecn_backoff() is skipped when kcc_alone_bypass_ecn = 1 (default), matching BBR's zero-ECN behavior. On a single-flow path there is no competing sender to share ECN marks with — any marks are false positives. Set kcc_alone_bypass_ecn = 0 to keep ECN backoff active even when alone (conservative).
LT BW (policer-detected rate limit) qualification is configurable via kcc_alone_bypass_lt_bw (default 1). A single-flow path has no policer, so LT BW cannot legitimately activate. Setting it to 1 prevents spurious alone-mode exit from false LT BW triggers. Set to 0 for original strict behavior.

This eliminates the single-flow throughput gap between KCC and BBR while preserving KCC's full protection loop (Kalman, ECN backoff, gain decay, LT bandwidth) for multi-flow scenarios.

Hysteresis: Entry requires kcc_alone_confirm_rounds (default 3) consecutive qualifying rounds — preventing oscillation during brief quiet periods in multi-flow competition ("conservative to accelerate"). Exit is immediate — any qualification failure clears the flag and resets the confirmation counter ("aggressive to brake"). The confirmation counter is a u8 bounded to KCC_ALONE_CONFIRM_CNT_MAX (255, compile-time).

Qualification conditions (all six must hold on a round boundary): 0. Kalman converged (sample_cnt >= kcc_kalman_min_samples) — trust qdelay/jitter as queue signals

qdelay_avg < kcc_alone_qdelay_thresh_us (default 1000 us) — near-empty queue
jitter_ewma < kcc_alone_jitter_thresh_us (default 2000 us) — ACK-clock micro-jitter only
ecn_ewma == 0 — no congestion marks from AQM
lt_use_bw == 0 — not in policer-detected rate-limited mode
agg_state <= max per kcc_alone_agg_state_level (default 1) — three-tier configurable:
- 0 = IDLE only (strictest), 1 = ≤ SUSPECTED (default), 2 = ≤ CONFIRMED (most permissive)

Dynamic PROBE_RTT Interval

Maps Kalman p_est to a per-connection PROBE_RTT interval:

p_est ≤ converged:              interval = dyn_max (default 30s)
p_est ≥ high (= mult × conv):   interval = base (default 10s)
converged < p_est < high:       linear interpolation

Reduces PROBE_RTT frequency when confidence is high (low p_est), lowering throughput jitter on stable paths. Reverts to classic 10-second interval when confidence is low.

Per-flow entry jitter: To prevent all co-existing flows from entering PROBE_RTT simultaneously (draining to 4 pkts aggregate ~1.8 Mbps then refilling at 2.89×), each flow adds a hash-derived jitter (0–845 ms spread) to its PROBE_RTT interval. At most ~1 flow is in PROBE_RTT at any instant, eliminating the RTO-inducing simultaneous drain/refill collapse.

PROBE_RTT Decoupling & Smart Recalibration

BBRv1's PROBE_RTT mechanism drains the pipe to 4 packets every ~10 seconds to measure min_rtt_us. This is necessary for a window-based min-RTT estimator — the window cannot distinguish propagation delay from queueing delay unless the pipe is empty. The cost is a periodic throughput cliff (the BBR "sawtooth").

In FILTER mode, the Kalman filter replaces the window entirely. It can separate queueing noise from true propagation delay through outlier gating and adaptive noise estimation — no pipe drain required. The parameter kcc_probe_rtt_decouple (default 1) controls this:

Mode	Value	Behavior
Decoupled	1 (default)	Kalman healthy (p_est ≤ `kcc_recal_p_est_thresh`): suppress PROBE_RTT entirely → zero throughput cliffs, zero sync collapses. Kalman diverged (p_est > threshold): auto-trigger traditional PROBE_RTT as a safety net → restores filter baseline, then decoupling resumes.
Traditional	0	Blind periodic PROBE_RTT every ~10s (BBR-compatible).

Smart recalibration heuristic (kcc_kalman_needs_recalibration()): In steady-state operation on a stable path, the Kalman error covariance p_est converges to p_est_floor (~4–10), far below the threshold kcc_recal_p_est_thresh (250,000 = 25% of p_est_max). A rising p_est signals that the filter's internal model no longer explains observations — typically because the path has materially changed. When p_est exceeds the threshold, a single traditional PROBE_RTT drain restores the filter baseline; the Kalman re-converges and decoupling resumes automatically.

This transforms PROBE_RTT from a blind periodic self-mutilation into an intelligent confidence-driven recalibration — the protocol only drains the pipe when it has empirical evidence that the filter has lost confidence.

Requires kcc_rtt_mode == 1. No-op in MIN mode (MIN mode depends on PROBE_RTT to refresh min_rtt_us).

Parameter	Default	Range	Description
`kcc_probe_rtt_decouple`	1	0–1	Enable PROBE_RTT decoupling (FILTER mode only)
`kcc_recal_p_est_thresh`	250,000	1–100,000,000	p_est threshold for triggered recalibration safety net

LT Bandwidth Estimation

Loss-triggered lower-bound estimator. Sampling interval spans [4, 16] RTTs. Valid when loss ratio ≥ 5.9% (kcc_lt_loss_thresh default 15/256). Bandwidth bw = delivered × BW_UNIT / interval_us.

Unlike BBR's simple average ((bw + lt_bw) >> 1), KCC uses a configurable EMA (kcc_lt_bw_ema_num / kcc_lt_bw_ema_den, default 1/2 = 0.5):

lt_bw = (bw_new × en + lt_bw × (ed − en)) / ed

Activation differs from BBR: KCC stores lt_bw on the first valid interval but does NOT set lt_use_bw; consistency with a previous interval is required — reduces false activation from measurement noise.

Dual-threshold congestion gate: Before setting lt_use_bw = 1, both a persistent EWMA queue check (qdelay_avg > kcc_ecn_qdelay_thresh_us_val) AND an instantaneous SRTT-based queue check (srtt_us − min_rtt_us > kcc_lt_bw_inst_qdelay_thresh_us, default 5000 µs) are evaluated. When congestion is detected, LT BW sampling is aborted. The SRTT check works without ext allocation, providing a safety net against allocation failure.

LT BW probe boost (kcc_lt_bw_probe_pct, default 10%): amplifies pacing_gain by 1 + probe_pct/100 across all PROBE_BW phases. Ramp component: +1% per 8 RTTs increase, capped at 2 × probe_pct.

LT BW auto-recovery (kcc_lt_restore_ratio_num/den, default 5/4 = 1.25x): when max_bw > lt_bw × ratio for kcc_lt_restore_consec_acks (default 3) consecutive ACKs, LT BW automatically exits and normal PROBE_BW probing resumes. The lt_restore_cnt counter is a 5-bit field bounded to KCC_LT_RESTORE_CNT_MAX (31, compile-time).

ACK Aggregation Confidence-Based Compensation (BBRplus-inspired)

Adds a confidence-gated second layer over the traditional dual-slot extra-acked estimator.

Four orthogonal factors (each contributes kcc_agg_factor_weight points, default 256):

Kalman converged (p_est < converged + sample_cnt >= min_samples)
Not in loss recovery (icsk_ca_state < TCP_CA_Recovery)
RTT within min_rtt_us + kcc_agg_factor3_qdelay_us (default 2ms) of true propagation delay
extra_acked within kcc_agg_factor4_ratio_num/den (default 1.5x) of windowed maximum

Four states: IDLE (< kcc_agg_thresh_suspected=256), SUSPECTED (≥256), CONFIRMED (≥512), TRUSTED (≥768).

Signal layer (always active): confidence linearly interpolates R scaling factor [r_min, r_max]. R rises instantly (fast response), decays at kcc_agg_r_hysteresis% (default 75% retained, ~4 RTTs to baseline) per RTT.

Control layer (agg_state ≥ CONFIRMED): five-layer safety-gated cwnd compensation:

Blocks if queue delay > kcc_agg_safety_qdelay_us (default 4ms)
Blocks during loss recovery
Blocks if cwnd > BDP × kcc_agg_safety_bdp_mult (default 3x)
Blocks if inflight > safe cwnd + TSO segs goal
Watchdog: demotes CONFIRMED→SUSPECTED after kcc_agg_max_comp_duration (default 8) consecutive RTTs

Drain qdelay_avg Reset

On transition to DRAIN, qdelay_avg is reset to zero, preventing the STARTUP queue estimate from persisting into PROBE_BW.

TSO Divisor Adaptation

kcc_min_tso_segs() adjusts the rate threshold divisor based on Kalman state:

Kalman converged + jitter_ewma < 1000 µs: divisor halved (8→4), larger TSO bursts
jitter_ewma > 4000 µs: divisor doubled (8→16), smaller TSO bursts to suppress jitter

Pacing Rate & Cwnd

Pacing Rate

rate = bw × mss × pacing_gain >> BBR_SCALE      // gain adjustment
rate = rate × USEC_PER_SEC >> BW_SCALE            // convert to bytes/s
rate = rate × margin_div / 100                    // pacing margin (default 1%, matching BBR)

Rate changes are applied immediately (no smoothing), matching BBR (Cardwell et al. 2016). After full_bw_reached: all rate changes written immediately. In STARTUP/DRAIN: only increases applied (rate > sk_pacing_rate).

Cwnd

target = BDP(bw, gain, ext)                       // base BDP
// inflight bounds (non-STARTUP: lo~hi clamp; STARTUP: lo floor only)
target = quantization_budget(target)              // TSO headroom + even-round + phase-0 bonus
target += ack_agg_bonus + agg_compensation        // ACK aggregation compensation

// cwnd progression
if full_bw_reached:
    cwnd = min(cwnd + acked, target)              // converge to target
else (STARTUP):
    cwnd = cwnd + acked                          // exponential growth

cwnd = max(cwnd, cwnd_min_target)                 // absolute floor 4
PROBE_RTT mode: cwnd = min(cwnd, cwnd_min_target) // minimum inflight

Module Parameters

Parameters are exposed under /proc/sys/net/kcc/. Writes trigger kcc_init_module_params() (validation + clamping + derived value computation). Array parameter writes trigger kcc_rebuild_gain_table().

PROBE_RTT Intervals

Parameter	Default	Min	Max	Unit	Description
`kcc_probe_rtt_base_sec`	10	1	86400	s	Base PROBE_RTT interval
`kcc_probe_rtt_max_sec`	15	1	86400	s	Upper cap for long-RTT paths
`kcc_probe_rtt_dyn_max_sec`	30	0	86400	s	Max dynamic interval; 0 disables

Gains

Parameter	Default	Min	Max	Description
`kcc_cwnd_gain_num` / `kcc_cwnd_gain_den`	2 / 1	0/1	100k	Baseline cwnd gain for PROBE_BW
`kcc_extra_acked_gain_num` / `kcc_extra_acked_gain_den`	1 / 1	0/1	100k/100k	ACK aggregation bonus multiplier
`kcc_high_gain_num` / `kcc_high_gain_den`	2885 / 1000	0/1	100k	STARTUP gain (≈2.885x)
`kcc_drain_gain_num` / `kcc_drain_gain_den`	347 / 1000	0/1	100k	DRAIN gain (≈0.347x)
`kcc_inflight_low_gain_num` / `kcc_inflight_low_gain_den`	125 / 100	0/1	100k	Inflight lower bound (1.25x BDP)
`kcc_inflight_high_gain_num` / `kcc_inflight_high_gain_den`	200 / 100	0/1	100k	Inflight upper bound (2.0x BDP)
`kcc_gain_num[i]` / `kcc_gain_den[i]`	BBRv1 pattern (256 slots)	0/1	—	Per-slot pacing gain
`kcc_cycle_decay_mask[8]`	0 (all zero)	0	0x7FFFFFFF	256-bit decay bitmap
`kcc_probe_bw_up_limit`	0	0	1	Limit PROBE_BW up-phase exit (0=off)

Kalman Base

Parameter	Default	Min	Max	Description
`kcc_kalman_q`	100	0	100k	Base process noise Q
`kcc_kalman_r`	400	0	100k	Base measurement noise R
`kcc_kalman_p_est_max`	1,000,000	1	100M	p_est absolute max
`kcc_kalman_converged_p_est`	500	1	1M	Convergence threshold
`kcc_kalman_p_est_init`	1000	1	10M	Initial p_est
`kcc_kalman_p_est_floor`	10	1	100k	p_est floor
`kcc_kalman_scale`	1024	64	1,048,576	Fixed-point scale (power of two)
`kcc_kalman_min_samples`	5	3	20	Min samples before takeover
`kcc_kalman_outlier_ms`	5	0	10000	ms
`kcc_kalman_q_boost_mult`	4	1	10000	Q-boost multiplier
`kcc_kalman_q_boost_ms`	1	0	5000	ms
`kcc_kalman_qboost_cdwn`	15	1	255	samples
`kcc_kalman_q_max`	2000	1	100k	Q ceiling
`kcc_kalman_q_scale_cap`	20	1	10000	Q scale cap
`kcc_kalman_max_consec_reject`	25	1	1000	Max consecutive rejections before force-accept
`kcc_rtt_sample_max_us`	500000	1	10M	µs
`kcc_kalman_r_max_boost`	8	1	1000	R max boost multiplier
`kcc_kalman_rtt_dyn_mult`	2	1	100	RTT dynamic ceiling multiplier
`kcc_kalman_q_rtt_div`	1000	1	1M	Q adaptation RTT divisor
`kcc_kalman_probe_band_mult`	4	1	32	PROBE_RTT transition band multiplier

Kalman Extras (num/den type)

Parameter	Default	Range	Description
`kcc_kalman_outlier_jitter_mult_num/den`	4 / 1	0-1000 / 1-100k	Outlier jitter multiplier
`kcc_kalman_q_min_factor_num/den`	10 / 1	0-1000 / 1-100k	Q min factor
`kcc_kalman_p_est_init_rtt_div_num/den`	10 / 1	1-100k / 1-100k	p_est init RTT divisor

BBR-S Noise Estimation

Parameter	Default	Range	Description
`kcc_kalman_noise_alpha_num/den`	1 / 10	0-100 / 1-100k	Q estimate learning rate
`kcc_kalman_noise_beta_num/den`	1 / 10	0-100 / 1-100k	R estimate learning rate
`kcc_kalman_noise_mode`	1	0-2	Combine mode (0=off, 1=max, 2=weighted avg)
`kcc_kalman_q_est_max`	1,000,000,000	1-2B	Q estimate upper bound
`kcc_kalman_r_est_max`	1,000,000,000	1-2B	R estimate upper bound
`kcc_kalman_q_est_floor` / `r_est_floor`	1	1-100k	Lower bound per estimate

Gain Decay (Probing)

Parameter	Default	Range	Unit	Description
`kcc_qdelay_probe_thresh_us`	5000	0-100k	µs	qdelay decay threshold
`kcc_qdelay_probe_scale_us`	20000	1-100k	µs	qdelay decay scale
`kcc_jitter_probe_thresh_us`	4000	0-100k	µs	Jitter decay threshold
`kcc_jitter_probe_scale_us`	16000	1-100k	µs	Jitter decay scale

Adaptive R (Jitter-Driven)

Parameter	Default	Range	Unit	Description
`kcc_jitter_r_thresh_us`	2000	0-100k	µs	Jitter threshold for R increase
`kcc_jitter_r_scale`	8000	1-100k	—	R increase scale divisor

ECN

Parameter	Default	Range	Description
`kcc_ecn_enable`	1	0-1	ECN master switch
`kcc_ecn_backoff_num` / `kcc_ecn_backoff_den`	20 / 100	0-100 / 1-100k	ECN backoff fraction
`kcc_ecn_qdelay_thresh_us`	2000	0-100k	µs
`kcc_ecn_ewma_retained` / `kcc_ecn_ewma_total`	3 / 4	0-100 / 1-100k	ECN EWMA weights
`kcc_ecn_idle_decay_num` / `kcc_ecn_idle_decay_den`	31 / 32	1-100k	Idle ECN decay

min_rtt

Parameter	Default	Range	Description
`kcc_minrtt_fast_fall_cnt`	3	0-3	Fast-fall count
`kcc_minrtt_fast_fall_div`	4	1-256	Fast-fall threshold divisor
`kcc_minrtt_sticky_num` / `kcc_minrtt_sticky_den`	75 / 100	0-1000 / 1-100k	Sticky fall ratio
`kcc_minrtt_srtt_guard_num` / `kcc_minrtt_srtt_guard_den`	90 / 100	0-1000 / 1-100k	SRTT guard ratio

LT Bandwidth

Parameter	Default	Range	Description
`kcc_lt_intvl_min_rtts`	4	1-127	RTTs
`kcc_lt_intvl_max_mult`	4	1-32	Interval timeout multiplier
`kcc_lt_loss_thresh`	15	1-65535	BBR_UNIT
`kcc_lt_bw_ratio_num` / `kcc_lt_bw_ratio_den`	1 / 8	0-100k / 1-100k	Relative tolerance
`kcc_lt_bw_diff`	500	0-100k	bytes/s
`kcc_lt_bw_max_rtts`	48	1-4094	RTTs
`kcc_lt_bw_probe_pct`	10	0-100	%

LT Auto-Recovery

Parameter	Default	Range	Description
`kcc_lt_restore_ratio_num` / `kcc_lt_restore_ratio_den`	5 / 4	0-100k / 1-100k	Recovery trigger ratio
`kcc_lt_restore_consec_acks`	3	1-31	Trigger consecutive ACK count

ACK Aggregation Confidence

Parameter	Default	Range	Description
`kcc_agg_enable`	1	0-1	Master switch
`kcc_agg_confidence_thresh`	512	0-10000	cwnd compensation confidence threshold
`kcc_agg_max_comp_ratio`	75	0-100	% of BDP
`kcc_agg_max_comp_duration`	8	1-128	RTTs
`kcc_agg_r_hysteresis`	75	0-100	%
`kcc_agg_r_multiplier_min` / `kcc_agg_r_multiplier_max`	256 / 2048	1-10000	R scaling range (256=1x)
`kcc_agg_factor3_qdelay_us`	2000	0-100k	µs
`kcc_agg_factor4_ratio_num` / `kcc_agg_factor4_ratio_den`	3 / 2	1-100k	Factor 4 ratio
`kcc_agg_safety_qdelay_us`	4000	0-100k	µs
`kcc_agg_safety_bdp_mult`	3	1-100	Safety guard BDP multiplier
`kcc_agg_max_window_ms`	100	1-10000	ms
`kcc_agg_max_decay_pct`	75	0-100	%
`kcc_agg_window_rotation_rtts`	5	1-65535	RTTs
`kcc_agg_factor_weight`	256	1-1024	Per-factor score
`kcc_agg_confidence_max`	1024	256-65535	Max confidence

EWMA Coefficients

Parameter	Default	Range	Description
`kcc_ewma_qdelay_num` / `kcc_ewma_qdelay_den`	7 / 8	0-100 / 1-100k	qdelay EWMA weight
`kcc_ewma_jitter_num` / `kcc_ewma_jitter_den`	7 / 8	0-100 / 1-100k	Jitter EWMA weight

Miscellaneous

Parameter	Default	Range	Description
`kcc_probe_bw_cycle_len`	8	2-256	PROBE_BW cycle phases (power-of-two)
`kcc_probe_bw_cycle_rand`	8	1-cycle_len	Cycle phase random offset
`kcc_full_bw_thresh_num` / `kcc_full_bw_thresh_den`	125 / 100	0-100k / 1-100k	STARTUP exit growth threshold
`kcc_full_bw_cnt`	3	1-3	Non-growth rounds to exit
`kcc_probe_rtt_mode_ms_num` / `kcc_probe_rtt_mode_ms_den`	200 / 1	1-100k	PROBE_RTT stay duration
`kcc_pacing_margin_num` / `kcc_pacing_margin_den`	1 / 100	0-50 / 1-100k	Pacing margin (1% = BBR parity, 0 = none)
`kcc_probe_cwnd_bonus`	2	0-100	segs
`kcc_bw_rt_cycle_len`	10	2-256	rounds
`kcc_cwnd_min_target`	4	1-1000	segs
`kcc_bdp_min_rtt_us`	1	0-100k	µs
`kcc_edt_near_now_ns`	1000	0-10M	ns
`kcc_min_tso_rate`	1,200,000	1-1B	bytes/s
`kcc_min_tso_rate_div`	8	1-256	TSO rate divisor (adaptive base)
`kcc_tso_max_segs`	127	1-65535	segs
`kcc_tso_headroom_mult`	3	0-1000	TSO headroom multiplier
`kcc_sndbuf_expand_factor`	3	2-100	Send buffer expansion factor
`kcc_ack_epoch_max`	0xFFFFF	64K-2G	bytes
`kcc_extra_acked_max_ms_num` / `kcc_extra_acked_max_ms_den`	150 / 1	0-100k / 1-100k	Max ACK agg window
`kcc_probe_rtt_long_rtt_us`	20000	0-10M	µs
`kcc_probe_rtt_long_interval_div`	1	1-1000	Long-RTT interval divisor
`kcc_drain_skip_qdelay_us`	1000	0-100k	µs
`kcc_alone_confirm_rounds`	3	1-32	rounds
`kcc_alone_qdelay_thresh_us`	1000	0-100k	µs
`kcc_alone_jitter_thresh_us`	2000	0-100k	µs
`kcc_alone_agg_state_level`	1	0-2	—
`kcc_alone_bypass_ecn`	1	0-1	—
`kcc_alone_bypass_lt_bw`	1	0-1	—

Data Path

ACK Arrives (rate_sample)
    │
    ▼
kcc_main()
    │
    ├──► ACK agg confidence pipeline (when kcc_agg_enable)
    │      measure → evaluate → state → watchdog
    │      ├── Signal layer: Kalman R scaling (always active)
    │      └── Control layer: cwnd compensation (CONFIRMED+)
    │
    ├──► kcc_update_model()
    │      ├── kcc_update_bw()              sliding-window max BW
    │      ├── kcc_update_ecn_ewma()        ECN-CE mark ratio
    │      ├── kcc_update_ack_aggregation()  dual-window extra_acked
    │      ├── kcc_update_cycle_phase()     PROBE_BW phase advance
    │      ├── kcc_check_full_bw_reached()  STARTUP exit detection
    │      ├── kcc_check_drain()            DRAIN entry/exit + drain skip
    │      ├── kcc_update_min_rtt()         Kalman + window min-RTT + PROBE_RTT
    │      ├── Mode-specific gain assignment
    │      └── kcc_alone_on_path_eval()     single-flow detection (round boundary)
    │
    ├──► kcc_apply_cwnd_constraints()
    │      └── kcc_ecn_backoff()            ECN backoff (cwnd_gain only)
    │
    ├──► kcc_set_pacing_rate()              immediate, BBR rule
    │
    └──► kcc_set_cwnd()                    BDP + bounds + agg compensation

Kalman Filter Internal Flow

RTT sample (rtt_us)
    │
    ├── Invalid (≥0 and < dynamic_max)? No → discard
    │
    ├── Cold start (sample_cnt==0)? Yes → init: x_est=z, p_est=max(p_init, rtt_us/div)
    │                                           (bypasses RTT max gate)
    │
    ├── Adaptive Q: Q_base × max(q_min_factor, min_rtt_us / q_rtt_div)
    │   Adaptive R: R_base + max(0, jitter − jr_thresh) × R_base / jr_scale
    │
    ├── Innovation: innov = z − x_est
    │
    ├── Q-Boost: |innov| > boost_thresh && p_est ≤ converged && cooldown expired?
    │   ├── Yes: p_est = p_est_init, cooldown = 15, mark qboost_fired
    │   └── No:  cooldown-- if active
    │
    ├── Predict: p_pred = p_est + Q
    │
    ├── Outlier gate: |innov| > dyn_thresh && p_pred ≤ converged?
    │   ├── Yes & reject_cnt < max → reject, ++consec_reject_cnt, return
    │   └── Yes & reject_cnt ≥ max → force-accept (anti-lock)
    │
    └── Kalman update:
         ├── K = p_pred / (p_pred + R)
         ├── x_est += K × innov (clamped non-negative)
         ├── p_est = max(p_floor, (1 − K) × p_pred)
         ├── Jitter EWMA update
         ├── qdelay EWMA update
         ├── BBR-S covariance-matched noise estimation
         └── sample_cnt++

Diagnostics

BBR-compatible diagnostic interface via ss -i (INET_DIAG_BBRINFO):

bbr_bw_lo/bbr_bw_hi: 64-bit bandwidth estimate (bytes/s)
bbr_min_rtt:         current min_rtt_us
bbr_pacing_gain:     current pacing gain (BBR_UNIT, 256=1.0x)
bbr_cwnd_gain:       current cwnd gain (BBR_UNIT)

Usage

# Compile kernel module
make

# Dev load (insmod, no dependency resolution)
sudo make load

# Install and formal load (modprobe)
sudo make install
sudo make modload

# Unload
sudo make unload

# Select KCC algorithm
echo KCC > /proc/sys/net/ipv4/tcp_congestion_control

Parameter configuration is via /proc/sys/net/kcc/. For example:

# Enable gain decay on specific PROBE_BW phases
echo 1 > /proc/sys/net/kcc/kcc_cycle_decay_mask

# Adjust ECN backoff sensitivity
echo 30 > /proc/sys/net/kcc/kcc_ecn_backoff_num

Concurrency & Safety Model

KCC deliberately does not use READ_ONCE/WRITE_ONCE or RCU for its own data structures. This design is consistent with all in-kernel CC modules such as BBR and CUBIC.

kcc_init() executes in process context (during socket creation), before the socket is exposed to any softirq. kcc_release() executes after the kernel guarantees no softirq is still processing this socket's ACKs. A transient stale value of a global module parameter affects at most one ACK, corrected at the next ACK.

The only exception: sk->sk_pacing_rate / sk->sk_pacing_shift are socket-layer fields that userspace can modify simultaneously via setsockopt, so BBR's WRITE_ONCE/READ_ONCE convention is preserved.

Performance Summary

Test environment: China → US LAX, 212ms RTT, 8 parallel flows, 26% packet loss, 1 Gbps shared VPS bottleneck.

Metric	KCC v1.0	BBR (control)	Delta
Average throughput	1,010 Mbps	937 Mbps	+7.8%
Intra-KCC unfairness	3.1×	6.2× (BBR)	−50%
Worst single flow	60.6 Mbps	30.8 Mbps	+97%
Retransmits	150K/10s	137K/10s	+9.5%
R3 stability	959 Mbps	883 Mbps	+8.6%

Retransmits are slightly higher — a trade-off consistent with maintaining high link utilisation under loss. KCC's Kalman-augmented min_rtt estimation provides a more accurate BDP baseline, allowing the algorithm to sustain higher throughput than BBRv1 on the same path.

Global Kalman BDP — Cross-Connection Bandwidth Injection

KCC v1.0 includes an optional cross-connection Global Kalman Filter that estimates the server's steady-state bottleneck bandwidth. This estimate is used to bootstrap new connections at a conservatively low "dessert speed" — fast enough to skip cold-start ramp-up, slow enough to avoid overshoot.

Design Principle

The filter is fed with bandwidth samples from the PROBE_BW cruise phase (gain = 1.0×) of all KCC connections. Cruise-phase samples are the cleanest signal of true available bandwidth — no 1.25× probe overshoot, no 0.75× drain undershoot. A one-dimensional random-walk Kalman filter (Kalman 1960) tracks the global steady state.

When a new connection is established, the filter's estimate is used to seed:

Injected value	Purpose
`minmax` (max_bw tracker)	Seed the sliding-window bandwidth history so the first few dirty ACK samples don't drag it to zero
`sk_pacing_rate`	Initial pacing rate at neutral gain (BBR_UNIT); STARTUP's 2.89× gain is applied on the first ACK
`tp->snd_cwnd`	Initial congestion window computed via `kcc_bdp()` at neutral gain

A defensive floor in kcc_update_bw prevents the first few RTTs of low delivery-rate samples from overwriting the injected estimate during STARTUP. A full-BW guard in kcc_check_full_bw_reached prevents the iperf3 control-message exchange from prematurely terminating STARTUP.

Dessert-Speed Discount Ratio

The effective injection speed is:

coeff = (discount_ratio) / high_gain
      = (num / den) / 2.89

where high_gain ≈ 2.89 is the BBR STARTUP pacing multiplier.

num	coeff	characteristic
35	12.1%	maximum safety, worst-path
50	17.3%	centre axis (default)
75	25.9%	mathematical dessert sweet spot
80	27.6%	mathematical rate ceiling (should not exceed)

Note: tcp_write_xmit enforces an initial CWND of TCP_INIT_CWND (10 segments, ≈15 KB) for every new connection. CWND only grows when remote ACKs arrive, so the dessert speed is an upper bound on pacing rate — actual throughput is CWND-limited until sufficient ACKs have been received to open the window.

Configuration

Enable via sysctl:

sysctl -w net.kcc.kcc_kf_enable=1           # master enable (default 0)
sysctl -w net.kcc.kcc_kf_discount_num=50   # dessert-speed numerator (default 50, range 0–100, recommended 35–75)

Key sysctl parameters (/proc/sys/net/kcc/):

Parameter	Default	Range	Description
`kcc_kf_enable`	0	0–1	Master enable for global Kalman BDP injection
`kcc_kf_discount_num`	50	0–100	Dessert-speed numerator (% of fair-share BW)
`kcc_kf_discount_den`	100	1–100000	Dessert-speed denominator
`kcc_kf_startup_r_pct`	20	1–100	Measurement noise R% during startup phase
`kcc_kf_steady_r_pct`	5	1–100	Measurement noise R% during steady-state
`kcc_kf_q_shift`	20	0–30	Process noise shift (Q = 1 << shift)
`kcc_kf_chi2_num`	384	1–100000	Chi-squared outlier gate numerator
`kcc_kf_chi2_den`	100	1–100000	Chi-squared outlier gate denominator

First-Second Performance (Trans-Pacific, 212 ms RTT)

Without KF:  2.8 Mbps  →  85 Mbps  →  622 Mbps  →  steady
With KF:     50 Mbps   →  530 Mbps  →  650 Mbps  →  steady

The first-second speed jumps from ~3 Mbps (cold-start) to ~50 Mbps (dessert-start), and convergence to steady-state is reached within 2–3 seconds. Retransmissions remain zero throughout.

How It Works

A running KCC connection enters PROBE_BW cruise phase → round-start boundary → feeds kcc_kf_update(bw, 5%) with the current delivery-rate sample.
The Kalman filter updates its estimate kcc_kf_x (a running average of steady-state bottleneck bandwidth).
When a new connection opens, kcc_init calls kcc_kf_get_init_bw(sk) which returns fair × discount / high_gain — a gain-compensated, fair-share initial bandwidth estimate.
This estimate seeds sk_pacing_rate, tp->snd_cwnd, and the minmax bandwidth tracker — the connection starts at the dessert speed rather than from zero.

Algorithm Source — On Kalman Estimation and Engineering Implementation of Global Steady-State Bandwidth in the Linux Kernel

The Global Kalman BDP filter is based on the author's article On Kalman Estimation and Engineering Implementation of Global Steady-State Bandwidth in the Linux Kernel (CC BY-SA 4.0): https://blog.csdn.net/liulilittle/article/details/161635652

KCC v1.0 — built on BBRv1 (Cardwell et al. 2016, ACM Queue) and the Kalman filter (Kalman 1960).

References

Tag	Citation / Link
BBR	Cardwell et al., "BBR: Congestion-Based Congestion Control", ACM Queue, Vol. 14 No. 5, 2016 — https://dl.acm.org/doi/10.1145/3009824
BBR-S	"BBR-S: A Low-Latency BBR Modification for Fast-Varying Connections", 2021 — https://ieeexplore.ieee.org/document/9438951
RBBR	"RBBR: A Receiver-Driven BBR in QUIC for Low-Latency in Cellular Networks", 2022 — https://ieeexplore.ieee.org/document/9703289
ERCC	"ERCC: Fine-grained RDMA Congestion Control via Kalman Filter-based Multi-bit ECN Feedback Reconstruction", 2025 — https://dl.acm.org/doi/10.1145/3769270.3770124
Linux BBR	Linux kernel BBR reference — https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_bbr.c
Google BBR	BBR project page — https://github.com/google/bbr
BBRplus	"BBRplus: Adaptive Cycle Randomization, Drain-to-Target, and ACK Aggregation Compensation for BBR Convergence and Stall Prevention" — https://blog.csdn.net/dog250/article/details/80629551
IETF 101	"BBR Congestion Control Work at Google IETF 101 Update" — https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
docs		docs
google		google
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
tcp_kcc.c		tcp_kcc.c

Folders and files

Latest commit

History

Repository files navigation

TCP KCC v1.0 (Kalman Congestion Control)

Design Principles

Algorithm Overview

Model RTT Strategy

State Machine

STARTUP → DRAIN

DRAIN → PROBE_BW

PROBE_BW → PROBE_RTT

PROBE_RTT → PROBE_BW

Recovery & Loss

Round Detection (BBR Alignment)

Core Measurements

Bandwidth Estimation

Kalman Filter

BBR Enhancements

Gain Decay

ECN Backoff

Single-Flow Detection

Dynamic PROBE_RTT Interval

PROBE_RTT Decoupling & Smart Recalibration

LT Bandwidth Estimation

ACK Aggregation Confidence-Based Compensation (BBRplus-inspired)

Drain qdelay_avg Reset

TSO Divisor Adaptation

Pacing Rate & Cwnd

Pacing Rate

Cwnd

Module Parameters

PROBE_RTT Intervals

Gains

Kalman Base

Kalman Extras (num/den type)

BBR-S Noise Estimation

Gain Decay (Probing)

Adaptive R (Jitter-Driven)

ECN

min_rtt

LT Bandwidth

LT Auto-Recovery

ACK Aggregation Confidence

EWMA Coefficients

Miscellaneous

Data Path

Kalman Filter Internal Flow

Diagnostics

Usage

Concurrency & Safety Model

Performance Summary

Global Kalman BDP — Cross-Connection Bandwidth Injection

Design Principle

Dessert-Speed Discount Ratio

Configuration

First-Second Performance (Trans-Pacific, 212 ms RTT)

How It Works

Algorithm Source — On Kalman Estimation and Engineering Implementation of Global Steady-State Bandwidth in the Linux Kernel

References

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages