Fix cache refresh timer with a wrapper service #22

Merged
jdoss merged 1 commit into master from fix/refresh-timer-wrapper
Apr 8, 2026
@jdoss jdoss commented Apr 8, 2026

Summary

  • Adds a small psi-{provider}-refresh.service wrapper whose only job is ExecStart=/usr/bin/systemctl restart psi-{provider}-setup.service.
  • Re-points psi-{provider}-refresh.timer at the wrapper instead of at the setup unit directly.
  • Renames the timer from psi-{provider}-setup.timer to psi-{provider}-refresh.timer; the old setup timer is no longer needed.
  • Tests and docs updated. Verified end-to-end on a test host.

Why

PR #20 generated psi-{provider}-setup.timer pointing at the existing psi-{provider}-setup.service. That never worked for the typical case. The setup unit uses Type=oneshot with RemainAfterExit=yes so other services can depend on "setup has successfully run" without re-triggering it on every workload restart. The side effect: ActiveEnterTimestamp is set once on the first run and never updates. The timer's OnUnitActiveSec=1h anchors against that frozen timestamp, so the "next fire" is always first_run + 1h — usually already in the past — and systemd sets NextElapseUSecMonotonic=infinity and gives up.

Even when the timer did fire an overdue run, the cache was not rewritten: systemctl start on a oneshot that is already in the active (exited) state is a no-op, so the setup unit's ExecStart never ran again.

Observed on the test host:

$ systemctl show psi-infisical-setup.timer -p NextElapseUSecMonotonic
NextElapseUSecMonotonic=infinity

$ systemctl show psi-infisical-setup.service -p ActiveEnterTimestamp -p InactiveEnterTimestamp
ActiveEnterTimestamp=Wed 2026-04-08 05:52:34 UTC     # 11h before the test
InactiveEnterTimestamp=                              # never — RemainAfterExit=yes

$ ls -la /var/lib/psi/cache.enc
-rw------- 1 root root 35894 Apr  8 05:52   # didn't update after the timer fired
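
The arithmetic behind the stuck timer can be sketched in a few lines of Python. This is a deliberately simplified model of the behavior described above, not systemd's actual scheduling code: `next_elapse` is a hypothetical helper, and "return None when the candidate is already in the past" stands in for NextElapseUSecMonotonic=infinity. The timestamps are taken from the output above, assuming the test ran exactly 11h after the first run.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_elapse(
    active_enter: datetime, interval: timedelta, now: datetime
) -> Optional[datetime]:
    """Simplified OnUnitActiveSec model: anchor + interval, or None if already past."""
    candidate = active_enter + interval
    return candidate if candidate > now else None


# ActiveEnterTimestamp of the setup unit, frozen by RemainAfterExit=yes.
first_run = datetime(2026, 4, 8, 5, 52, 34, tzinfo=timezone.utc)
# Assumption: the observation was made exactly 11h later.
test_time = first_run + timedelta(hours=11)

# Anchored to the frozen timestamp, the next fire is always in the past.
stuck = next_elapse(first_run, timedelta(hours=1), test_time)

# A unit whose timestamp advances on every run (hypothetically last run
# 10 minutes before the test) always yields a future elapse time.
last_wrapper_run = test_time - timedelta(minutes=10)
rearmed = next_elapse(last_wrapper_run, timedelta(hours=1), test_time)

print("setup-anchored next fire:", stuck)
print("wrapper-anchored next fire:", rearmed)
```

The only difference between the two calls is whether the anchor moves, which is the property the wrapper service restores.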

Fix

Generate two units per refreshable provider instead of one:

psi-{provider}-refresh.service — a plain oneshot, no RemainAfterExit:

[Unit]
Description=PSI infisical secret cache refresh
After=psi-infisical-setup.service

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl restart psi-infisical-setup.service

psi-{provider}-refresh.timer — points at the wrapper:

[Unit]
Description=Periodic PSI infisical secret cache refresh

[Timer]
Unit=psi-infisical-refresh.service
OnBootSec=1h
OnUnitActiveSec=1h
RandomizedDelaySec=5m
Persistent=true

[Install]
WantedBy=timers.target

The wrapper's ActiveEnterTimestamp moves forward on every run (because it is not RemainAfterExit=yes) so OnUnitActiveSec re-arms correctly. And systemctl restart on a RemainAfterExit=yes oneshot DOES re-run its ExecStart — unlike systemctl start which is a no-op for an already-active unit.
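
For readers who want to see the shape of the generators, here is a minimal sketch that renders the two units shown above. The function names match the ones this PR adds to psi/unitgen.py, but the bodies, the `interval` parameter, and the string-building approach are assumptions made for illustration; the real implementation may differ.

```python
def generate_provider_refresh_service(provider: str) -> str:
    """Render the wrapper: a plain oneshot (no RemainAfterExit) that restarts setup."""
    setup_unit = f"psi-{provider}-setup.service"
    return (
        "[Unit]\n"
        f"Description=PSI {provider} secret cache refresh\n"
        f"After={setup_unit}\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart=/usr/bin/systemctl restart {setup_unit}\n"
    )


def generate_provider_refresh_timer(provider: str, interval: str = "1h") -> str:
    """Render the timer; `interval` is a hypothetical parameter, not from the PR."""
    return (
        "[Unit]\n"
        f"Description=Periodic PSI {provider} secret cache refresh\n"
        "\n"
        "[Timer]\n"
        # Target the wrapper, never the setup unit directly.
        f"Unit=psi-{provider}-refresh.service\n"
        f"OnBootSec={interval}\n"
        f"OnUnitActiveSec={interval}\n"
        "RandomizedDelaySec=5m\n"
        "Persistent=true\n"
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )


if __name__ == "__main__":
    print(generate_provider_refresh_service("infisical"))
    print(generate_provider_refresh_timer("infisical"))
```

Passing `interval="2m"` to the sketch would reproduce the fast-iteration settings used for the manual test.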

Verified on test host

With OnBootSec=2m / OnUnitActiveSec=2m for fast iteration:

$ ls -la /var/lib/psi/cache.enc
-rw------- 1 root root 35894 Apr  8 16:53   # rewritten by each scheduled run

$ systemctl list-timers psi-infisical-refresh.timer
NEXT                        LEFT LAST                              PASSED UNIT
Wed 2026-04-08 16:54:07 UTC  20s Wed 2026-04-08 16:52:06 UTC 1min 39s ago

The second and third scheduled runs both rewrote cache.enc and the timer re-armed correctly each time.

What changes

psi/unitgen.py

  • New generate_provider_refresh_service(provider)
  • generate_provider_setup_timer renamed to generate_provider_refresh_timer; the generated timer now points at the wrapper

psi/installer.py

  • _write_refresh_timers now writes both the wrapper service AND the timer
  • Timer names changed from psi-{provider}-setup.timer to psi-{provider}-refresh.timer

Tests

  • tests/test_unitgen.py: TestProviderRefreshService (3 tests — Type=oneshot, no RemainAfterExit, correct ExecStart and After=) and TestProviderRefreshTimer (5 tests, including a regression test that asserts the timer does NOT target the setup unit directly)
  • tests/test_installer.py: updated TestWriteRefreshTimers to check both units are written, the wrapper has the right ExecStart, and the timer points at the wrapper
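
The regression checks listed above could look roughly like the sketch below. It is not the code in tests/test_unitgen.py: the test names and the `parse_unit` helper are hypothetical, the unit text is copied from the Fix section rather than produced by the real generators, and `configparser` is only an approximation of systemd's unit syntax (it rejects repeated keys, which these units do not use).

```python
import configparser

SERVICE_TEXT = """\
[Unit]
Description=PSI infisical secret cache refresh
After=psi-infisical-setup.service

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl restart psi-infisical-setup.service
"""

TIMER_TEXT = """\
[Unit]
Description=Periodic PSI infisical secret cache refresh

[Timer]
Unit=psi-infisical-refresh.service
OnBootSec=1h
OnUnitActiveSec=1h
RandomizedDelaySec=5m
Persistent=true

[Install]
WantedBy=timers.target
"""


def parse_unit(text: str) -> configparser.ConfigParser:
    """Parse unit text, keeping key case (systemd directives are case-sensitive)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    return parser


def test_wrapper_is_plain_oneshot() -> None:
    service = parse_unit(SERVICE_TEXT)["Service"]
    assert service["Type"] == "oneshot"
    # RemainAfterExit would freeze ActiveEnterTimestamp again.
    assert "RemainAfterExit" not in service
    assert service["ExecStart"].endswith("restart psi-infisical-setup.service")


def test_timer_does_not_target_setup_unit() -> None:
    target = parse_unit(TIMER_TEXT)["Timer"]["Unit"]
    assert target == "psi-infisical-refresh.service"
    assert target != "psi-infisical-setup.service"
```

Both functions follow pytest's naming convention, so they would be collected by `uv run pytest -q` if dropped into a test module.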

Docs

  • docs/secret-cache.md rotation section documents both units and the reasoning for the wrapper pattern
  • README.md FCOS generator list mentions both units
  • CLAUDE.md CLI reference note updated

Test plan

  • uv run ruff check psi/ tests/ — clean
  • uv run ruff format --check psi/ tests/ — clean
  • uv run ty check — clean
  • uv run pytest -q — 312 passed (3 new wrapper tests + renamed existing tests)
  • Manual end-to-end test on a test host with 2 minute interval — confirmed the wrapper re-runs the setup unit, cache.enc mtime updates every cycle, and the timer re-arms with a valid NEXT each time
  • Deploy the new image on the test server, confirm the timer re-arms on the default 1h cadence

PR #20 generated psi-{provider}-setup.timer pointing directly at the
existing setup unit, but that never fired more than once. The setup
service uses Type=oneshot + RemainAfterExit=yes so ActiveEnterTimestamp
is set once and never updates, and OnUnitActiveSec on the timer is
anchored to that frozen timestamp. Once the first fire happens,
'next fire' = ActiveEnterTimestamp + interval — already in the past,
so systemd sets next_elapse to infinity and the timer never re-arms.

Worse, even if it did fire again, systemctl start on a oneshot that is
currently active (exited) is a no-op, so the cache would not update.

Fix: generate a tiny wrapper psi-{provider}-refresh.service (plain
oneshot, no RemainAfterExit) that does:
    ExecStart=/usr/bin/systemctl restart psi-{provider}-setup.service

Point the timer at the wrapper. The wrapper's ActiveEnterTimestamp
moves forward every run, OnUnitActiveSec re-arms correctly, and
systemctl restart on the setup unit does re-run its ExecStart and
repopulate the cache.

Verified end-to-end on a test host with a 2 minute interval: the
second and third scheduled runs both rewrote cache.enc and the timer
showed NEXT/LEFT for the next cycle each time.
@jdoss jdoss merged commit be416d9 into master Apr 8, 2026
2 checks passed
@jdoss jdoss deleted the fix/refresh-timer-wrapper branch April 8, 2026 17:06