AdaptiveMessageBatcher: oscillation from sensitive de-escalation + mis-classified transition reports #877

## Summary

While adding integration tests for `AdaptiveMessageBatcher` wrapping `RateAwareMessageBatcher`, two interacting issues with the adaptive controller surfaced. Neither is specific to the rate-aware batcher; both also occur with `SimpleMessageBatcher` as the inner batcher.

## Finding 1: De-escalation is easy to trigger

`DEESCALATION_HEADROOM_RATIO = 0.75` and `DEESCALATION_UNDERLOAD_THRESHOLD = 3` mean that any workload whose `processing_time_s < 0.75 * batch_length_s` for three batches in a row triggers a de-escalation step. A workload that comfortably fits its window (say, 50 % utilisation) will therefore continuously de-escalate until it overloads at a smaller window, escalates, and de-escalates again — steady oscillation instead of a stable level.

Writing `test_oscillation_preserves_messages` exposed this directly: the first draft used `processing_time=1.0` during the level-2 (2 s) run phase; the controller drifted back to level 0 within a few reports.

## Finding 2: Transition-batch classification is ambiguous

The moment `AdaptiveMessageBatcher` escalates or de-escalates, its own `batch_length_s` property returns the **new** value. But the inner batcher's currently-active batch was created with the **old** window (both `SimpleMessageBatcher` and `RateAwareMessageBatcher` document this: "the current active batch keeps its boundaries"). The next `report_batch(count, processing_time)` therefore compares a processing time that belongs to the old window against the new threshold.

Concrete consequence:

- **After escalation** (e.g., 1 s → 2 s): the first report is almost certainly `processing_time < 0.75 * 2 s`, so it's classified as "underloaded" and increments the de-escalation counter — through no fault of the workload.
- **After de-escalation** (e.g., 2 s → 1.43 s): the first report can be spuriously classified as "overloaded" by the new, smaller threshold.

Combined with finding 1, this means every escalation is *immediately followed* by a free step toward de-escalation, amplifying the oscillation tendency.
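The two bullet cases can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the `classify` function is hypothetical, and the overload criterion (`processing_time_s > batch_length_s`) is assumed here for illustration since this issue only quotes the underload rule.

```python
DEESCALATION_HEADROOM_RATIO = 0.75  # as quoted in this issue


def classify(processing_time_s: float, batch_length_s: float) -> str:
    """Hypothetical classifier; the overload rule is an assumption."""
    if processing_time_s > batch_length_s:
        return "overloaded"
    if processing_time_s < DEESCALATION_HEADROOM_RATIO * batch_length_s:
        return "underloaded"
    return "ok"


# After escalation 1 s -> 2 s: the active batch was still created with
# the old 1 s window, and processing it takes 1.2 s.
old_window, new_window = 1.0, 2.0
print(classify(1.2, old_window))  # judged against the window it belongs to
print(classify(1.2, new_window))  # judged against the new threshold

# After de-escalation 2 s -> 1.43 s: the active batch was created with
# the old 2 s window, and processing it takes 1.45 s.
old_window, new_window = 2.0, 1.43
print(classify(1.45, old_window))
print(classify(1.45, new_window))
```

In both cases the same report flips classification depending on which window it is compared against, which is the ambiguity described above.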

## Why both findings show up together

Finding 2 contributes one spurious underload report per escalation. Finding 1 means that only three consecutive underload reports are needed for a level change, so the spurious report already supplies a third of that budget. In a workload with some variability, the controller can therefore be nudged down by the transition itself rather than by a genuine change in the workload.

## Fix sketches (not decided)

Options to discuss:

1. **Skip classification for one cycle after a level change** — the adaptive wrapper tracks "just changed" and treats the next `report_batch` as neutral.
2. **Classify reports against the window the inner batch was actually created with** — the adaptive wrapper captures "the threshold at the time the batch started" and uses that for classification.
3. **Raise `DEESCALATION_HEADROOM_RATIO`** and/or `DEESCALATION_UNDERLOAD_THRESHOLD` so settling behavior becomes the norm. This only mitigates finding 1; finding 2 remains.
4. **Asymmetric counter reset**: after a level change, reset both consecutive counters to 0 (already done) *and* discard the first report. Covers both findings cheaply.

Option 2 is the most physically accurate but couples the adaptive wrapper more tightly to the inner's batch lifecycle. Option 4 is the smallest change and handles both findings.
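To make the "discard the first report" idea (options 1 and 4) concrete, here is a minimal toy controller. Everything in it is hypothetical: `ToyAdaptiveController` is not the real `AdaptiveMessageBatcher`, the level list is made up, and it assumes that a single overloaded report escalates, which may not match the actual escalation rule.

```python
class ToyAdaptiveController:
    """Hypothetical stand-in used only to illustrate the skip-one-report idea."""

    def __init__(self, levels, headroom_ratio=0.75, underload_threshold=3,
                 skip_after_change=True):
        self.levels = list(levels)  # window lengths in seconds, ascending
        self.level = 0
        self.headroom_ratio = headroom_ratio
        self.underload_threshold = underload_threshold
        self.skip_after_change = skip_after_change
        self._underload = 0
        self._skip_next = False

    @property
    def batch_length_s(self) -> float:
        return self.levels[self.level]

    def report_batch(self, processing_time_s: float) -> None:
        if self._skip_next:
            # The transition batch was created with the old window:
            # treat its report as neutral.
            self._skip_next = False
            return
        window = self.batch_length_s
        if processing_time_s > window:  # assumed overload rule
            self._underload = 0
            if self.level < len(self.levels) - 1:
                self._change_level(+1)
        elif processing_time_s < self.headroom_ratio * window:
            self._underload += 1
            if self._underload >= self.underload_threshold and self.level > 0:
                self._change_level(-1)
        else:
            self._underload = 0

    def _change_level(self, step: int) -> None:
        self.level += step
        self._underload = 0  # counter reset on every level change
        self._skip_next = self.skip_after_change


for skip in (False, True):
    controller = ToyAdaptiveController([1.0, 2.0], skip_after_change=skip)
    for _ in range(4):
        controller.report_batch(1.2)
    print(f"skip_after_change={skip}: level={controller.level}")
```

With a constant 1.2 s processing time, the variant without the skip is back at level 0 after four reports, while the variant with the skip is still at level 1. One more report would de-escalate it as well, which matches the observation that discarding the transition report removes the free step from finding 2 but does not by itself change the sensitivity described in finding 1.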

## Scope / applies to

- `src/ess/livedata/core/message_batcher.py` → `AdaptiveMessageBatcher`
- Affects any inner batcher (`SimpleMessageBatcher`, `RateAwareMessageBatcher`).
- The existing `tests/core/adaptive_batching_scenarios_test.py` does not exhibit these problems because its simulation model treats `batch_length_s` as always matching the active window. The divergence between the adaptive wrapper's view and the inner batcher's view is exactly what those tests do not model.

