feat(cli): add SIGHUP config hot-reload #68

Open
abhinav-1504 wants to merge 3 commits into optiqor:main from abhinav-1504:feat/sighup-hot-reload

Conversation


@abhinav-1504 abhinav-1504 commented May 13, 2026

  • Add signal.Notify(SIGHUP) handler in internal/cli/start.go
  • Add Config.ReloadFrom() with diff in internal/config/reload.go
  • Add Engine.UpdateThresholds() for live threshold swap in internal/doctor/engine.go
  • Add ExecReload=/bin/kill -HUP in deploy/systemd/kerno.service
  • Add SIGHUP daemon phase test in scripts/verify.sh
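
A minimal sketch of what a live threshold swap guarded by `sync.RWMutex` could look like. The `Thresholds` fields and the `Snapshot` helper are hypothetical names chosen for illustration; only `Engine.UpdateThresholds()` and the use of `sync.RWMutex` come from this PR, and the real signature may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Thresholds is a hypothetical stand-in for the doctor engine's tunables.
type Thresholds struct {
	CPUPercent float64
	MemPercent float64
}

// Engine guards its thresholds with an RWMutex so that rule evaluation
// (many readers) never observes a half-written update (one writer).
type Engine struct {
	mu         sync.RWMutex
	thresholds Thresholds
}

// UpdateThresholds replaces the whole struct under the write lock.
func (e *Engine) UpdateThresholds(t Thresholds) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.thresholds = t
}

// Snapshot (hypothetical helper) returns a copy taken under the read lock.
func (e *Engine) Snapshot() Thresholds {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.thresholds
}

func main() {
	e := &Engine{thresholds: Thresholds{CPUPercent: 90, MemPercent: 85}}

	// Concurrent readers stand in for rule evaluation running during a reload.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = e.Snapshot()
		}()
	}

	e.UpdateThresholds(Thresholds{CPUPercent: 75, MemPercent: 80})
	wg.Wait()
	fmt.Println(e.Snapshot().CPUPercent)
}
```

Returning a copy rather than a pointer keeps readers from holding a reference to state that a later reload overwrites.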

What

Adds SIGHUP-based config hot-reload to the kerno daemon. Operators can now change config without restarting — no BPF programs are dropped.

Why

Fixes #34

How

  • start.go: goroutine listens on sighupCh, calls handleSIGHUP() on every signal
  • config/reload.go: ReloadFrom() re-reads config via the same Viper pipeline, diffs old vs new, classifies changes as "applied" or "restart-required"
  • doctor/engine.go: UpdateThresholds() with sync.RWMutex for goroutine-safe hot-swap
  • kerno.service: ExecReload=/bin/kill -HUP $MAINPID so systemctl reload kerno works
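
The diff-and-classify step can be sketched as follows, assuming the configuration is flattened to a `map[string]any`. The key names, the `hotReloadable` allow-list, and the `Result` field names are hypothetical; the actual `ReloadFrom()` in this PR may classify changes differently.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// hotReloadable lists keys assumed safe to apply live (hypothetical names).
var hotReloadable = map[string]bool{
	"log_level":            true,
	"doctor.cpu_threshold": true,
}

// Result mirrors the "applied" / "restart-required" split described above.
type Result struct {
	Applied         []string
	RestartRequired []string
}

// diff compares two flattened configs and classifies every changed key.
// A key missing on one side is treated as nil on that side.
func diff(oldCfg, newCfg map[string]any) Result {
	keys := make(map[string]struct{})
	for k := range oldCfg {
		keys[k] = struct{}{}
	}
	for k := range newCfg {
		keys[k] = struct{}{}
	}

	var res Result
	for k := range keys {
		ov, nv := oldCfg[k], newCfg[k]
		// DeepEqual avoids a panic on non-comparable values such as slices.
		if reflect.DeepEqual(ov, nv) {
			continue
		}
		change := fmt.Sprintf("%s: %#v → %#v", k, ov, nv)
		if hotReloadable[k] {
			res.Applied = append(res.Applied, change)
		} else {
			res.RestartRequired = append(res.RestartRequired, change)
		}
	}
	// Map iteration order is random; sort for stable log output.
	sort.Strings(res.Applied)
	sort.Strings(res.RestartRequired)
	return res
}

func main() {
	oldCfg := map[string]any{"log_level": "debug", "bpf.ring_size": 4096}
	newCfg := map[string]any{"log_level": "info", "bpf.ring_size": 8192}

	res := diff(oldCfg, newCfg)
	fmt.Printf("config reloaded; %d changes applied; %d changes require restart\n",
		len(res.Applied), len(res.RestartRequired))
	for _, c := range res.Applied {
		fmt.Println("applied:", c)
	}
	for _, c := range res.RestartRequired {
		fmt.Println("restart-required:", c)
	}
}
```

An explicit allow-list means a newly added config key defaults to restart-required until someone decides it is safe to apply live.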

Testing

  • go build ./... passes
  • go test ./... passes
  • go vet ./... passes
  • golangci-lint run ./... passes
  • Tested locally with: sudo ./bin/kerno start + sudo kill -HUP $(pidof kerno)

Verified on WSL2 Ubuntu 24.04, kernel 5.15:

{"msg":"SIGHUP received — reloading config"}
{"msg":"applying hot-reload change","change":"log_level: \"debug\" → \"info\""}
{"msg":"config reloaded; 3 changes applied; 0 changes require restart"}
{"msg":"HTTP server restarted","addr":":9090"}

  • sudo ./bin/bpf-verify --read 5s confirms 6/6 programs still load
  • ./scripts/verify.sh daemon phase updated for SIGHUP test

Checklist

  • PR title follows Conventional Commits (feat(scope): subject)
  • All commits are DCO-signed (git commit -s)
  • No unrelated changes pulled in
  • Documentation updated where user-visible behavior changed (kerno.service comments)
  • Added/updated tests for new code paths (scripts/verify.sh daemon phase)

@abhinav-1504 abhinav-1504 requested a review from btwshivam as a code owner May 13, 2026 20:14
@github-actions github-actions Bot added the testing (Tests and test coverage), area/doctor (Diagnostic engine and rules), and area/ops (Operations, deployment, runtime ergonomics) labels May 13, 2026
@github-actions

🚀 First PR — welcome aboard!

A few things to expect:

  1. CI: every PR runs build + race tests + lint + (eventually) the kernel matrix. If something fails, the log will tell you exactly which gate.
  2. DCO: every commit needs a Signed-off-by: trailer; git commit -s adds it automatically.
  3. Conventional Commits: PR titles like feat(doctor): add new rule or fix(bpf): handle X. We squash-merge by default.
  4. Review: a maintainer will review within 72 hours. Suggestions are conversations, not orders — push back if something doesn't fit your context.

If you get stuck, reply here or jump to Discussions. We want this PR to land.

@abhinav-1504 abhinav-1504 changed the title from "Feat/sighup hot reload" to "feat(cli): add SIGHUP config hot-reload" May 13, 2026
- Add signal.Notify(SIGHUP) handler in internal/cli/start.go
- Add Config.ReloadFrom() with diff in internal/config/reload.go
- Add Engine.UpdateThresholds() for live threshold swap in internal/doctor/engine.go
- Add ExecReload=/bin/kill -HUP in deploy/systemd/kerno.service
- Add SIGHUP daemon phase test in scripts/verify.sh

Closes optiqor#34

Signed-off-by: Abhinav <abhinavsinghc@gmail.com>
Signed-off-by: Abhinav Singh Chauhan <abhinavsinghc48@gmail.com>
@abhinav-1504 abhinav-1504 force-pushed the feat/sighup-hot-reload branch from d662590 to 0b0ff62 Compare May 13, 2026 20:36
Member

@btwshivam btwshivam left a comment

good shape overall, but the CI workflows (Build / Test / Lint / Docker) never actually ran on this branch; only label and conventional-title show up. that hides several real issues: a data race on the global cfg, healthz reporting 0/0 after a rebind, and an any → interface{} revert of PR #54. flagged below. push a fix and let CI verify.

Comment thread internal/cli/start.go Outdated
Comment thread internal/cli/start.go Outdated
Comment thread internal/cli/start.go Outdated
Comment thread internal/cli/start.go Outdated
Comment thread internal/config/reload.go Outdated
- use atomic.Pointer[config.Config] to eliminate cfg data race
- pass real loadedCount/totalLoaders to rebindPrometheus (fixes 0/0 healthz)
- use map[string]any instead of map[string]interface{} (PR optiqor#54 idiom)
- drive threshold update from result.Applied, not re-diffing struct
- add reload_test.go: 19 table-driven cases covering all diff() branches

Signed-off-by: Abhinav Singh Chauhan <abhinavsinghc48@gmail.com>
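
The first fix in the commit above, replacing a shared config variable with `atomic.Pointer[config.Config]`, could look roughly like this. The `Config` struct, the `current` variable, and the `load`/`store` helpers are hypothetical; `atomic.Pointer` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical stand-in for config.Config.
type Config struct {
	LogLevel string
}

// current holds the active config. Readers and the reload path go through
// Load/Store, so no goroutine reads the pointer while it is being replaced.
var current atomic.Pointer[Config]

// load returns the active config; callers treat it as read-only.
func load() *Config { return current.Load() }

// store publishes a fully built config in a single atomic step.
func store(c *Config) { current.Store(c) }

func main() {
	store(&Config{LogLevel: "debug"})

	// Readers run concurrently with the reload below; each sees either the
	// old or the new config, never a partially updated one.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = load().LogLevel
		}()
	}

	store(&Config{LogLevel: "info"}) // simulated SIGHUP reload
	wg.Wait()
	fmt.Println(load().LogLevel)
}
```

The pattern only removes the race if each reload builds a new `Config` value and nobody mutates a published one in place.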
@abhinav-1504
Author

@btwshivam, all issues fixed.
Ready for re-review.

@abhinav-1504 abhinav-1504 requested a review from btwshivam May 15, 2026 10:13

Labels

area/doctor (Diagnostic engine and rules), area/ops (Operations, deployment, runtime ergonomics), testing (Tests and test coverage)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Daemon: SIGHUP config hot-reload

2 participants