Skip to content

EEBus: fix LPC/LPP CS failsafe-exit path#29705

Merged
andig merged 1 commit intomasterfrom
fix/eebus-failsafe-exit
May 6, 2026
Merged

EEBus: fix LPC/LPP CS failsafe-exit path#29705
andig merged 1 commit intomasterfrom
fix/eebus-failsafe-exit

Conversation

@andig
Copy link
Copy Markdown
Member

@andig andig commented May 6, 2026

Summary

Follow-up to the EEBus verification surfaced via #29701. The LPC/LPP CS state machine in hems/eebus had two regulatory-grade bugs on the failsafe-exit path:

  1. Failsafe was held for the full failsafeDurationMinimum (default 2 h) even when the heartbeat returned with an EG limit pending. Spec LPP-918/919/920 (and LPC equivalents) require leaving Failsafe immediately on HB+limit and applying that limit. The old code waited for the duration and heartbeat both to be satisfied, then dropped to a 0 W limit — ignoring whatever value the EG had sent.

  2. When failsafeDurationMinimum expired with the heartbeat still missing, the CS transitioned to StatusNormal with limit=0 — i.e. unlimited. After ≥2 h of EG outage the building became completely unprotected. Spec LPP-921 / LPC-921 say the CS shall apply a self-determined protective default in this Unlimited-autonomous state.

Restructure run() so heartbeat health (not duration) decides the exit:

  • HB returns → leave Failsafe immediately; the LPC-914/1 block below re-applies the EG limit currently stored in c.consumptionLimit / c.productionLimit (or releases if IsActive=false).
  • HB still missing → stay applying the failsafe limit. The failsafe limit is our self-determined protective default; we never silently unlimit on prolonged outage.

Also drop the c.statusUpdated = time.Now() writes in events.go. Limit updates aren't status transitions, so they shouldn't have been resetting the failsafe-duration clock — that overload would have trapped evcc in Failsafe indefinitely as long as the EG kept resending limits.

Add unit tests for the four state-machine transitions:

  • TestRun_HeartbeatLost_EntersFailsafe
  • TestRun_FailsafeStaysOnMissingHeartbeat
  • TestRun_HeartbeatReturned_AppliesFreshLimit
  • TestRun_HeartbeatReturned_NoFreshLimit

Out of scope (follow-up)

This PR does not enforce LPP-913 ("writes received in Init/Failsafe/Unlimited-autonomous shall only be evaluated within 60 s of a heartbeat"). On HB return evcc applies whatever EG limit is currently stored — including a value that may have been buffered during the outage. A separate change in the WriteApproval path is needed to filter writes by heartbeat age and CS state.

Test plan

  • go test ./hems/eebus/... passes (4 new tests)
  • go test ./hems/... passes
  • go vet ./hems/eebus/... clean
  • golangci-lint run ./hems/eebus/... 0 issues
  • gofmt -l clean
  • Manual smoke test against an SMGW to confirm immediate failsafe-exit on HB return with the latest stored EG limit

Per LPP §1.1.8 / LPC equivalents, the CS state machine must:
- on heartbeat return: leave Failsafe immediately and apply the freshly
  received EG limit (transitions 8/9/10, LPP-918/919/920)
- on prolonged heartbeat loss: keep applying a self-determined
  protective default (Unlimited-autonomous, LPP-921)

The previous run() loop:
- waited for failsafeDurationMinimum to elapse before leaving Failsafe,
  ignoring fresh limits from a healthy EG
- on duration expiry transitioned to StatusNormal with limit=0 even
  when heartbeat was still missing, dropping all grid protection

Restructure run() so heartbeat health, not duration, decides the exit:
- HB present in Failsafe -> leave immediately; the LPC-914/1 block
  applies whatever active limit the EG has sent
- HB still missing -> stay applying the failsafe limit indefinitely

Also drop the c.statusUpdated = time.Now() writes in events.go: limit
updates are not status transitions, so they should not reset the
failsafe-duration clock.

Add unit tests covering the four state-machine transitions.
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The tests initialise a real in-memory SQLite instance via db.NewInstance for each test case, which introduces global state and potential cross-test interference; consider abstracting the persistence dependency behind an interface or a test helper that reuses/cleans up the DB (e.g. via t.Cleanup) to keep these unit tests isolated and faster.
  • The failsafe exit path in run() still calls setConsumptionLimit(0) unconditionally before the LPC-914/1 block re-applies any fresh limit; if this zero-limit transition is not mandated by the spec, consider simplifying the state machine by collapsing these into a single, spec-driven limit update to reduce transient state changes and mock expectations.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The tests initialise a real in-memory SQLite instance via db.NewInstance for each test case, which introduces global state and potential cross-test interference; consider abstracting the persistence dependency behind an interface or a test helper that reuses/cleans up the DB (e.g. via t.Cleanup) to keep these unit tests isolated and faster.
- The failsafe exit path in run() still calls setConsumptionLimit(0) unconditionally before the LPC-914/1 block re-applies any fresh limit; if this zero-limit transition is not mandated by the spec, consider simplifying the state machine by collapsing these into a single, spec-driven limit update to reduce transient state changes and mock expectations.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@andig andig merged commit 3b90182 into master May 6, 2026
19 checks passed
@andig andig deleted the fix/eebus-failsafe-exit branch May 6, 2026 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant