Skip to content

feat: add supervisor abstraction with native dev fallback for Docker#63

Merged
edgarriba merged 13 commits intokornia:mainfrom
lferraz:feat/docker-container
Apr 4, 2026
Merged

feat: add supervisor abstraction with native dev fallback for Docker#63
edgarriba merged 13 commits intokornia:mainfrom
lferraz:feat/docker-container

Conversation

@lferraz
Copy link
Copy Markdown
Member

@lferraz lferraz commented Mar 31, 2026

Summary

This PR introduces a supervisor dispatcher for node/service lifecycle management and adds a native fallback backend for non-systemd environments (primarily Docker development).

The goal is to keep Linux + systemd as the supported production path, while allowing daemon/node flows to work in Docker without crashing when D-Bus/systemd is unavailable.

Changes

  • Added Supervisor dispatcher with backend detection:
    • systemd backend when D-Bus is available
    • native backend fallback when systemd is unavailable
  • Added NativeSupervisor implementation for development fallback
    • install/uninstall/start/stop/restart/status/autostart support
    • PID/config persistence under ~/.bubbaloop/procs
  • Migrated NodeManager lifecycle/status/build paths to use Supervisor instead of direct systemd coupling
  • Clarified logs behavior:
    • journalctl flows are systemd-only
    • native fallback now returns explicit guidance instead of ambiguous behavior
  • Updated docs to reflect platform intent:
    • Linux/systemd remains supported production path
    • Docker/non-systemd fallback is development-oriented, not full parity

Why

  • Fix daemon startup failures in Docker/no-systemd environments
  • Keep architecture extensible for future backends (for example launchd) without reworking NodeManager again
  • Reduce user confusion around logs and service management outside systemd

Validation

  • cargo check --bin bubbaloop
  • cargo clippy --bin bubbaloop -- -D warnings

Notes

  • This PR intentionally does not add full macOS support.
  • Native supervisor is positioned as development fallback, not a production replacement for systemd.

lferraz added 5 commits March 31, 2026 09:37
- Add Supervisor dispatcher with systemd and native backends

- Implement NativeSupervisor for Docker/non-systemd development

- Migrate NodeManager lifecycle/build/status flows to Supervisor

- Clarify systemd-only journalctl behavior in CLI and docs

- Document Linux/systemd as supported production path
- native_supervisor: 5 tests (lifecycle, autostart, stale PID, errors, signals)
- supervisor dispatcher: 2 tests (is_native, full lifecycle delegation)
- node_manager: 2 tests (get_logs native message, start/stop/restart e2e)
- lifecycle CLI: regression guard for systemd error message
- daemon_client: regression guard for systemd error message
@lferraz lferraz marked this pull request as ready for review March 31, 2026 22:32
- fix(native_supervisor): replace drop(child) with child.wait().await in
  watcher task — drop sends SIGKILL immediately, killing the spawned node
- fix(native_supervisor): redirect child stdout/stderr to
  ~/.bubbaloop/procs/{name}.{stdout,stderr} to avoid polluting daemon logs
- fix(native_supervisor): remove misleading JobRemoved event emitted on
  start_unit; refresh_node handles state update
- fix(native_supervisor): document PID-race caveat in stop_unit doc comment
- fix(docker-compose): bind ports to 127.0.0.1 instead of 0.0.0.0
- fix(docker-compose): remove unused BUBBALOOP_ALLOW_NO_SYSTEMD env var
- test: remove tautological string tests in daemon_client and node lifecycle
  that verified hardcoded strings against themselves

docs: add code review notes to docs/plans/
@edgarriba edgarriba marked this pull request as draft April 2, 2026 08:23
@lferraz lferraz marked this pull request as ready for review April 2, 2026 21:45
Copilot AI review requested due to automatic review settings April 2, 2026 21:45
@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Add supervisor abstraction with native fallback for Docker development

✨ Enhancement 🧪 Tests

Grey Divider

Walkthroughs

Description
• Add supervisor abstraction layer with systemd and native backends
  - Supervisor::detect() auto-selects systemd (Linux) or native (Docker/dev)
  - NativeSupervisor manages processes via tokio::process with PID persistence
• Migrate NodeManager lifecycle/build/status to use Supervisor instead of direct systemd
  - Removes systemd coupling from node start/stop/restart/install/uninstall flows
  - Simplifies service name handling by delegating to backend
• Clarify systemd-only features in CLI and documentation
  - journalctl logs now explicitly marked as systemd backend only
  - Native fallback returns helpful message instead of cryptic errors
• Add comprehensive test coverage for native supervisor and dispatcher
  - 5 native supervisor tests (lifecycle, autostart, stale PID, errors, signals)
  - 2 supervisor dispatcher tests (backend detection, full lifecycle delegation)
  - 2 node manager integration tests (native logs message, e2e start/stop/restart)
• Add devcontainer with Docker Compose for development without systemd
  - Ubuntu 22.04 base with Rust, GStreamer, D-Bus, protobuf, Node.js
  - SSH key mounting and port forwarding for local development
Diagram
flowchart LR
  A["NodeManager"] -->|uses| B["Supervisor"]
  B -->|detect| C{"D-Bus available?"}
  C -->|yes| D["SystemdClient"]
  C -->|no| E["NativeSupervisor"]
  D -->|manages| F["systemd services"]
  E -->|manages| G["tokio::process + PID files"]
  F -->|provides| H["journalctl logs"]
  G -->|provides| I["stdout/stderr files"]
Loading

Grey Divider

File Changes

1. crates/bubbaloop/src/daemon/supervisor.rs ✨ Enhancement +208/-0

New supervisor dispatcher with backend detection

crates/bubbaloop/src/daemon/supervisor.rs


2. crates/bubbaloop/src/daemon/native_supervisor.rs ✨ Enhancement +546/-0

Native process supervisor for non-systemd environments

crates/bubbaloop/src/daemon/native_supervisor.rs


3. crates/bubbaloop/src/daemon/node_manager/mod.rs ✨ Enhancement +115/-26

Migrate NodeManager to use Supervisor abstraction

crates/bubbaloop/src/daemon/node_manager/mod.rs


View more (11)
4. crates/bubbaloop/src/daemon/node_manager/lifecycle.rs ✨ Enhancement +19/-24

Update node lifecycle to use Supervisor instead of systemd

crates/bubbaloop/src/daemon/node_manager/lifecycle.rs


5. crates/bubbaloop/src/daemon/node_manager/build.rs ✨ Enhancement +3/-4

Use Supervisor for build pre-flight stop checks

crates/bubbaloop/src/daemon/node_manager/build.rs


6. crates/bubbaloop/src/daemon/mod.rs ✨ Enhancement +4/-2

Export new supervisor and native_supervisor modules

crates/bubbaloop/src/daemon/mod.rs


7. crates/bubbaloop/src/cli/daemon.rs 📝 Documentation +2/-2

Clarify journalctl logs are systemd backend only

crates/bubbaloop/src/cli/daemon.rs


8. crates/bubbaloop/src/cli/daemon_client.rs ✨ Enhancement +4/-2

Improve journalctl error message for non-systemd environments

crates/bubbaloop/src/cli/daemon_client.rs


9. crates/bubbaloop/src/cli/node/lifecycle.rs ✨ Enhancement +2/-2

Update node logs error message for systemd-only feature

crates/bubbaloop/src/cli/node/lifecycle.rs


10. .devcontainer/Dockerfile ⚙️ Configuration changes +55/-0

Add Ubuntu dev container with Rust and build dependencies

.devcontainer/Dockerfile


11. .devcontainer/devcontainer.json ⚙️ Configuration changes +43/-0

Configure VS Code dev container with port forwarding

.devcontainer/devcontainer.json


12. .devcontainer/docker-compose.yml ⚙️ Configuration changes +34/-0

Docker Compose setup with dev container and Zenoh router

.devcontainer/docker-compose.yml


13. README.md 📝 Documentation +8/-0

Document systemd as production path, Docker as development fallback

README.md


14. docs/getting-started/installation.md 📝 Documentation +11/-1

Update platform support table and add native supervisor notes

docs/getting-started/installation.md


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Apr 2, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)

Grey Divider


Action required

1. NativeSupervisor tests write to home📘 Rule violation ☼ Reliability
Description
New filesystem tests exercise NativeSupervisor using the default ~/.bubbaloop/procs location
instead of an isolated tempfile::tempdir(), which can pollute developer machines and make tests
flaky. This violates the project convention for filesystem tests to be temporary and self-contained.
Code

crates/bubbaloop/src/daemon/native_supervisor.rs[R408-428]

+    #[tokio::test]
+    async fn install_start_stop_uninstall_cycle() {
+        let sup = NativeSupervisor::new();
+        let name = unique_name("native-cycle");
+        cleanup_node_files(&name);
+
+        sup.install_service("/tmp", &name, "rust", Some("sleep 30"))
+            .unwrap();
+        assert!(NativeSupervisor::is_installed(&name));
+
+        sup.start_unit(&name).await.unwrap();
+        wait_for_active_state(&sup, &name, ActiveState::Active, 2_000).await;
+
+        sup.stop_unit(&name).await.unwrap();
+        wait_for_active_state(&sup, &name, ActiveState::Inactive, 2_000).await;
+
+        sup.uninstall_service(&name).await.unwrap();
+        assert!(!NativeSupervisor::is_installed(&name));
+
+        cleanup_node_files(&name);
+    }
Evidence
Compliance ID 5 requires filesystem tests to use tempfile::tempdir() and avoid real user/system
paths. NativeSupervisor::procs_dir() targets the real home directory and write_config()
creates/writes there, while the new #[tokio::test] cases call install_service()/start_unit()
which triggers those writes under ~/.bubbaloop/procs.

CLAUDE.md
crates/bubbaloop/src/daemon/native_supervisor.rs[54-58]
crates/bubbaloop/src/daemon/native_supervisor.rs[81-86]
crates/bubbaloop/src/daemon/native_supervisor.rs[408-428]
crates/bubbaloop/src/daemon/supervisor.rs[175-207]
crates/bubbaloop/src/daemon/node_manager/mod.rs[935-983]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Filesystem tests added in this PR write configs/PID/log files under the real `~/.bubbaloop/procs` directory because `NativeSupervisor::procs_dir()` is hardcoded to `dirs::home_dir()`. This violates the project rule that filesystem tests must use `tempfile::tempdir()` and must not write to real user/system paths.
## Issue Context
`NativeSupervisor::install_service()` calls `write_config()`, which creates and writes under `procs_dir()`. Several new async tests call `install_service()` and lifecycle methods, so they perform real filesystem writes.
## Fix Focus Areas
- crates/bubbaloop/src/daemon/native_supervisor.rs[54-86]
- crates/bubbaloop/src/daemon/native_supervisor.rs[371-467]
- crates/bubbaloop/src/daemon/supervisor.rs[154-207]
- crates/bubbaloop/src/daemon/node_manager/mod.rs[935-983]
## Implementation notes
- Make the NativeSupervisor base directory configurable (e.g., `NativeSupervisor::new_with_root(PathBuf)` or an injected `procs_dir` field).
- Update tests to create a `tempfile::tempdir()` and construct the supervisor/dispatcher to use that directory.
- Ensure tests clean up by dropping the tempdir (no manual removal of `~/.bubbaloop/...`).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. stop_unit() can kill PID 1📘 Rule violation ⛨ Security
Description
The native supervisor reads a PID from a file and passes it directly to /bin/kill without blocking
dangerous PIDs like 0 or 1. A corrupted or tampered PID file could cause termination of the init
process or signal a process group.
Code

crates/bubbaloop/src/daemon/native_supervisor.rs[R265-297]

+    pub async fn stop_unit(&self, name: &str) -> Result<()> {
+        let pid = Self::read_pid(name)
+            .filter(|&pid| Self::is_pid_alive(pid))
+            .ok_or_else(|| SystemdError::ServiceNotFound(name.to_string()))?;
+
+        // Send SIGTERM — use absolute path per security conventions
+        let kill_bin = if std::path::Path::new("/bin/kill").exists() {
+            "/bin/kill"
+        } else {
+            "/usr/bin/kill"
+        };
+
+        tokio::process::Command::new(kill_bin)
+            .args(["-TERM", &pid.to_string()])
+            .status()
+            .await
+            .map_err(|e| SystemdError::OperationFailed(e.to_string()))?;
+
+        // Give it up to 3s to exit gracefully, then SIGKILL
+        for _ in 0..6 {
+            tokio::time::sleep(std::time::Duration::from_millis(500)).await;
+            if !Self::is_pid_alive(pid) {
+                break;
+            }
+        }
+
+        if Self::is_pid_alive(pid) {
+            tokio::process::Command::new(kill_bin)
+                .args(["-KILL", &pid.to_string()])
+                .status()
+                .await
+                .ok();
+        }
Evidence
Compliance ID 14 requires kill usage to be restricted to numeric PIDs and to block dangerous
variants like kill 0 and kill 1. read_pid() accepts any u32 (including 0 and 1), and
stop_unit() uses that value directly in kill arguments (.args(["-TERM", &pid.to_string()]) and
later -KILL) without rejecting 0/1.

CLAUDE.md
crates/bubbaloop/src/daemon/native_supervisor.rs[88-92]
crates/bubbaloop/src/daemon/native_supervisor.rs[265-297]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`NativeSupervisor::stop_unit()` can invoke `kill` with PID `0` or `1` because the PID is read from a PID file and not validated. The compliance rule requires blocking dangerous kill targets (`kill 0`, `kill 1`, etc.).
## Issue Context
- `read_pid()` parses any `u32`.
- `stop_unit()` filters only on `is_pid_alive(pid)` and then calls `/bin/kill` with `-TERM` and potentially `-KILL`.
## Fix Focus Areas
- crates/bubbaloop/src/daemon/native_supervisor.rs[88-92]
- crates/bubbaloop/src/daemon/native_supervisor.rs[265-297]
## Implementation notes
- Add an explicit guard like `if pid <= 1 { return Err(SystemdError::OperationFailed("refusing to kill pid 0/1".into())); }` before spawning `kill`.
- Consider also rejecting PID `0` in `read_pid()` (or in the `.filter(...)` chain) so it cannot reach kill execution.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Bad native default command🐞 Bug ≡ Correctness
Description
NativeSupervisor::install_service defaults to "./{name}" when a node has no manifest command, but
the codebase treats command as optional and expects Rust binaries under
target/{release,debug}/<name> (or python venv). This will cause installed nodes to fail to start in
native fallback (Docker/non-systemd).
Code

crates/bubbaloop/src/daemon/native_supervisor.rs[R146-149]

+        let cmd = command
+            .map(|c| c.to_string())
+            .unwrap_or_else(|| format!("./{name}"));
+
Evidence
NativeSupervisor persists a default command of "./{name}" when command is None, but the registry and
systemd backend both encode that "command" is optional and have a different default execution layout
(target/release|debug for Rust; venv python for Python). NodeManager can pass None through to
install_service when the manifest omits command, making the native default immediately user-visible.

crates/bubbaloop/src/daemon/native_supervisor.rs[139-156]
crates/bubbaloop/src/daemon/node_manager/lifecycle.rs[45-64]
crates/bubbaloop/src/daemon/registry.rs[428-445]
crates/bubbaloop/src/daemon/systemd.rs[599-615]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`NativeSupervisor::install_service()` chooses `./{name}` when `command` is `None`, but the rest of the codebase treats `command` as optional and defaults to `target/release/<name>` (or `target/debug/<name>`) for Rust and `venv/bin/python main.py` for Python. In native fallback this makes freshly installed nodes fail to start.
### Issue Context
- `NodeManager::install_node()` passes `command.as_deref()` which can be `None` when the manifest omits a command.
- The systemd backend already contains the canonical defaulting logic.
### Fix Focus Areas
- crates/bubbaloop/src/daemon/native_supervisor.rs[139-156]
- crates/bubbaloop/src/daemon/systemd.rs[599-615]
- crates/bubbaloop/src/daemon/registry.rs[428-445]
- crates/bubbaloop/src/daemon/node_manager/lifecycle.rs[45-64]
### Suggested fix
- When `command` is `None`, derive a default based on `node_type` similar to `systemd::generate_service_unit()`:
- Rust: prefer `target/release/{name}` if it exists, else `target/debug/{name}`; otherwise return an error that the node is not built.
- Python: use `venv/bin/python main.py`.
- Consider normalizing to a command vector (exe + args) to avoid whitespace parsing surprises later.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

4. Stop errors when inactive🐞 Bug ☼ Reliability
Description
NativeSupervisor::stop_unit returns ServiceNotFound when the PID file is missing or stale, so
NodeManager stop_node reports failure for already-stopped but installed nodes. This turns a
common/benign operation into an error in the CLI/API and logs false failures in watchdog kill paths.
Code

crates/bubbaloop/src/daemon/native_supervisor.rs[R265-269]

+    pub async fn stop_unit(&self, name: &str) -> Result<()> {
+        let pid = Self::read_pid(name)
+            .filter(|&pid| Self::is_pid_alive(pid))
+            .ok_or_else(|| SystemdError::ServiceNotFound(name.to_string()))?;
+
Evidence
stop_node propagates the supervisor stop error directly into the command result, and the watchdog
uses stop_node and logs errors. NativeSupervisor currently treats “no PID file / dead PID” as
ServiceNotFound rather than “inactive”, which makes stop non-idempotent in native mode.

crates/bubbaloop/src/daemon/native_supervisor.rs[265-269]
crates/bubbaloop/src/daemon/node_manager/lifecycle.rs[17-22]
crates/bubbaloop/src/daemon/telemetry/circuit_breaker.rs[271-308]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In native mode, `stop_unit()` errors with `ServiceNotFound` when the PID file is missing or the PID is not alive. This causes `NodeManager::stop_node()` to fail for already-stopped (but installed) services.
### Issue Context
Systemd stop is typically treated as idempotent by callers; the daemon and watchdog paths log errors when stop fails.
### Fix Focus Areas
- crates/bubbaloop/src/daemon/native_supervisor.rs[265-269]
- crates/bubbaloop/src/daemon/node_manager/lifecycle.rs[17-22]
- crates/bubbaloop/src/daemon/telemetry/circuit_breaker.rs[271-308]
### Suggested fix
- If the node is installed (config exists) but there is no live PID:
- remove any stale PID file
- return `Ok(())` (treat as already stopped)
- Reserve `ServiceNotFound` for “not installed / no config” rather than “not running”.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


5. Stop may misreport success🐞 Bug ☼ Reliability
Description
NativeSupervisor::stop_unit does not validate the SIGKILL attempt and does not re-check liveness
after it, yet it removes the PID file and returns Ok. If the process remains alive (e.g., kill fails
or uninterruptible sleep), the daemon will report the node as stopped while it’s still running.
Code

crates/bubbaloop/src/daemon/native_supervisor.rs[R291-303]

+        if Self::is_pid_alive(pid) {
+            tokio::process::Command::new(kill_bin)
+                .args(["-KILL", &pid.to_string()])
+                .status()
+                .await
+                .ok();
+        }
+
+        Self::remove_pid(name);
+        self.emit(Self::job_removed_event(name, "done"));
+        log::info!("[NativeSupervisor] Stopped {name} (pid={pid})");
+
+        Ok(())
Evidence
After the grace period, the code attempts SIGKILL but ignores its result, then unconditionally
removes the PID file and emits a ‘done’ event. Because native status is derived from PID file + PID
liveness, removing the PID file can desynchronize reported state from reality.

crates/bubbaloop/src/daemon/native_supervisor.rs[277-303]
crates/bubbaloop/src/daemon/native_supervisor.rs[313-320]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`NativeSupervisor::stop_unit()` removes the PID file and returns success even if the SIGKILL attempt fails and the process is still alive.
### Issue Context
Native active state is computed from PID file + PID liveness, so removing the PID file is effectively asserting the process is gone.
### Fix Focus Areas
- crates/bubbaloop/src/daemon/native_supervisor.rs[277-303]
- crates/bubbaloop/src/daemon/native_supervisor.rs[313-320]
### Suggested fix
- Capture and check the `ExitStatus` from both SIGTERM and SIGKILL.
- After SIGKILL, re-check `is_pid_alive(pid)`:
- if still alive: return `SystemdError::OperationFailed("failed to stop ...")` and **do not** remove the PID file
- if not alive: proceed with PID cleanup and event emission
- Optionally log a warning if SIGKILL returns non-success but the PID is already gone (race).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Native drops depends_on🐞 Bug ≡ Correctness
Description
Supervisor::install_service accepts depends_on, but the native backend ignores it, silently removing
dependency ordering that systemd enforces via After/Requires. Nodes that rely on depends_on may
start in the wrong order or without required dependencies in Docker/native mode.
Code

crates/bubbaloop/src/daemon/supervisor.rs[R119-132]

+    pub async fn install_service(
+        &self,
+        node_path: &str,
+        node_name: &str,
+        node_type: &str,
+        command: Option<&str>,
+        depends_on: &[String],
+    ) -> Result<()> {
+        match self {
+            Supervisor::Systemd(_) => {
+                systemd::install_service(node_path, node_name, node_type, command, depends_on).await
+            }
+            Supervisor::Native(n) => n.install_service(node_path, node_name, node_type, command),
+        }
Evidence
NodeManager passes manifest.depends_on into Supervisor::install_service, and the systemd backend
uses it to build After/Requires lines. In the native branch, depends_on is not forwarded or
persisted anywhere, so the information is lost.

crates/bubbaloop/src/daemon/supervisor.rs[119-132]
crates/bubbaloop/src/daemon/node_manager/lifecycle.rs[56-64]
crates/bubbaloop/src/daemon/systemd.rs[618-633]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In native fallback, `depends_on` is silently ignored during install, removing ordering/requirement semantics that exist in the systemd backend.
### Issue Context
- `NodeManager::install_node()` forwards `manifest.depends_on`.
- `systemd::generate_service_unit()` uses `depends_on` to generate After/Requires.
### Fix Focus Areas
- crates/bubbaloop/src/daemon/supervisor.rs[119-132]
- crates/bubbaloop/src/daemon/native_supervisor.rs[24-32]
- crates/bubbaloop/src/daemon/node_manager/lifecycle.rs[56-64]
- crates/bubbaloop/src/daemon/systemd.rs[618-633]
### Suggested fix
Choose one:
1) Implement basic parity:
- Add `depends_on: Vec<String>` to `ProcConfig`.
- On `start_unit(name)`, recursively `start_unit(dep)` first (with cycle protection).
2) If you want to keep it dev-only minimal:
- Persist `depends_on` and emit a clear `log::warn!` during install/start when non-empty saying it is not enforced in native mode.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
7. Native autostart not applied🐞 Bug ≡ Correctness
Description
NativeSupervisor persists an autostart flag, but NodeManager startup does not read it to start
enabled nodes, so EnableAutostart has no effect across daemon restarts in native fallback. This
makes autostart appear supported while not actually providing the expected behavior in Docker/native
mode.
Code

crates/bubbaloop/src/daemon/native_supervisor.rs[R340-354]

+    /// Enable autostart (persisted to config file).
+    pub fn enable_unit(&self, name: &str) -> Result<()> {
+        let mut config = Self::read_config(name)
+            .ok_or_else(|| SystemdError::ServiceNotFound(name.to_string()))?;
+        config.autostart = true;
+        Self::write_config(&config)
+    }
+
+    /// Disable autostart (persisted to config file).
+    pub fn disable_unit(&self, name: &str) -> Result<()> {
+        let mut config = Self::read_config(name)
+            .ok_or_else(|| SystemdError::ServiceNotFound(name.to_string()))?;
+        config.autostart = false;
+        Self::write_config(&config)
+    }
Evidence
NativeSupervisor enable/disable only flips a boolean in the stored config. NodeManager::new performs
an initial refresh but does not start any enabled units, which is fine for systemd (systemd starts
enabled units) but leaves native mode without any autostart mechanism.

crates/bubbaloop/src/daemon/native_supervisor.rs[333-354]
crates/bubbaloop/src/daemon/node_manager/mod.rs[177-213]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Native autostart is persisted but never acted upon at daemon startup, so enabling autostart does not actually start nodes after a restart in native mode.
### Issue Context
Systemd provides autostart externally. Native mode needs an explicit “start enabled units” step.
### Fix Focus Areas
- crates/bubbaloop/src/daemon/node_manager/mod.rs[177-213]
- crates/bubbaloop/src/daemon/native_supervisor.rs[333-354]
### Suggested fix
- When `Supervisor::is_native()`:
- enumerate installed native configs under `~/.bubbaloop/procs/*.json`
- for each with `autostart=true`, call `supervisor.start_unit(node_name)` on startup (ideally after initial `refresh_all()`)
- emit appropriate events / refresh node state after starting
- Ensure this is best-effort and doesn’t fail NodeManager creation if one node fails to start.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a unified supervisor abstraction to decouple node/service lifecycle management from systemd, adding a native (non-systemd) development fallback primarily for Docker environments.

Changes:

  • Added Supervisor dispatcher that selects a systemd backend when D-Bus is available and otherwise falls back to a native supervisor.
  • Implemented NativeSupervisor to manage node processes via tokio::process with on-disk PID/config persistence.
  • Refactored NodeManager lifecycle/status/logs paths to route through Supervisor, and updated docs/devcontainer assets to clarify systemd vs Docker behavior.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
README.md Clarifies that Linux+systemd is the supported production path and Docker uses a dev-only fallback.
docs/getting-started/installation.md Updates platform support matrix and explains non-systemd fallback limitations.
crates/bubbaloop/src/daemon/supervisor.rs Adds supervisor dispatcher and basic tests for native delegation.
crates/bubbaloop/src/daemon/node_manager/mod.rs Switches signal subscription/state queries to Supervisor and returns explicit logs guidance in native mode.
crates/bubbaloop/src/daemon/node_manager/lifecycle.rs Routes lifecycle install/start/stop/restart/autostart through Supervisor.
crates/bubbaloop/src/daemon/node_manager/build.rs Uses Supervisor for pre-build stop-if-running behavior.
crates/bubbaloop/src/daemon/native_supervisor.rs Implements native process supervisor with config/PID persistence and lifecycle signaling.
crates/bubbaloop/src/daemon/mod.rs Exposes new supervisor modules and generalizes signal-listener log messages.
crates/bubbaloop/src/cli/node/lifecycle.rs Clarifies that journalctl -f log following is systemd-only and improves error text.
crates/bubbaloop/src/cli/daemon.rs Documents daemon logs command as systemd-only.
crates/bubbaloop/src/cli/daemon_client.rs Improves journalctl failure messaging for daemon logs.
Cargo.lock Bumps bubbaloop version to 0.0.12-dev.
.devcontainer/Dockerfile Adds devcontainer image for building/running in Docker.
.devcontainer/docker-compose.yml Defines dev service plus a zenoh router dependency.
.devcontainer/devcontainer.json Configures VS Code devcontainer extensions, ports, and setup commands.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@lferraz
Copy link
Copy Markdown
Member Author

lferraz commented Apr 2, 2026

Summary comment fixes (#5, #6, #7)

#5 — Stop may misreport success: After SIGKILL, stop_unit now waits 300ms and re-checks liveness. If the process is still alive it returns OperationFailed without removing the PID file, so the node correctly reports as still running.

#6 — Native drops depends_on: ProcConfig now has a depends_on: Vec<String> field (with #[serde(default)] for backward compat). The native backend persists it but does not enforce ordering. A log::warn! is emitted at start time when dependencies are present so the gap is visible.

#7 — Native autostart not applied: Added NativeSupervisor::start_autostart_units() (enumerates *.json configs and starts those with autostart: true), exposed as Supervisor::start_native_autostart(), and called from NodeManager::new() after the initial refresh. Autostart now works across daemon restarts in native/Docker mode.

lferraz and others added 2 commits April 3, 2026 00:36
- pixi.toml: add --features dashboard to `pixi run dash` and `build-release`
  tasks (was missing after dashboard was moved from default features)

- native_supervisor: make procs_dir configurable via new_with_root(PathBuf)
  so tests use tempfile::tempdir() instead of writing to ~/.bubbaloop/procs

- native_supervisor: guard stop_unit against PID <= 1 (corrupted PID file
  could otherwise signal init or a process group)

- native_supervisor: derive default command from node_type when command is
  None, matching systemd backend (rust: target/release/<name>, python:
  venv/bin/python main.py)

- native_supervisor: make stop_unit idempotent — installed-but-stopped nodes
  now return Ok(()) instead of ServiceNotFound

- native_supervisor: check SIGTERM exit status; return error if kill fails

- native_supervisor: fix misleading comment about kill-on-drop behavior
  (tokio does not send SIGKILL on drop by default)

- native_supervisor: after SIGKILL, re-check liveness before claiming success;
  return OperationFailed without removing PID file if process survives

- native_supervisor: persist depends_on in ProcConfig and emit a warn! at
  start time — native backend does not enforce ordering (dev-only fallback)

- native_supervisor: add list_installed_names() and start_autostart_units()
  to apply autostart flag at daemon startup (systemd handles this externally,
  native mode needs it explicitly)

- supervisor: forward depends_on to native backend; add start_native_autostart()

- node_manager: call start_native_autostart() after initial refresh so
  autostart works across daemon restarts in Docker/native mode

- node_manager: fix get_logs() fallback message to point to actual log files
  (~/.bubbaloop/procs/{name}.stdout/stderr) instead of suggesting foreground

- node_manager: generalize start_signal_listener() doc comment to cover both
  backends; scope D-Bus deadlock warning to systemd only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- pixi.toml: add --features dashboard to pixi run dash and build-release
  tasks (was missing after dashboard was moved from default features)

- native_supervisor: make procs_dir configurable via new_with_root(PathBuf)
  so tests use tempfile::tempdir() instead of writing to ~/.bubbaloop/procs

- native_supervisor: guard stop_unit against PID <= 1 (corrupted PID file
  could otherwise signal init or a process group)

- native_supervisor: derive default command from node_type when command is
  None, matching systemd backend (rust: target/release/<name>, python:
  venv/bin/python main.py)

- native_supervisor: make stop_unit idempotent — installed-but-stopped nodes
  return Ok(()) instead of ServiceNotFound

- native_supervisor: check SIGTERM exit status; return error if kill fails

- native_supervisor: fix misleading comment about kill-on-drop behavior
  (tokio does not send SIGKILL on drop by default)

- native_supervisor: after SIGKILL, re-check liveness before claiming success;
  return OperationFailed without removing PID file if process survives

- native_supervisor: persist depends_on in ProcConfig and emit a warn! at
  start time; native backend does not enforce ordering (dev-only fallback)

- native_supervisor: add list_installed_names() and start_autostart_units()
  to apply autostart flag at daemon startup

- supervisor: forward depends_on to native backend; add start_native_autostart()

- node_manager: call start_native_autostart() after initial refresh so
  autostart works across daemon restarts in Docker/native mode

- node_manager: fix get_logs() fallback message to point to actual log files
  (~/.bubbaloop/procs/{name}.stdout/stderr)

- node_manager: generalize start_signal_listener() doc comment; scope D-Bus
  deadlock warning to systemd backend only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Suppress duplicate JobRemoved event from watcher task when stop_unit
  already removed the PID file (intentional stop). Watcher now only emits
  on unexpected exits (crashes), preventing a spurious 'failed' event
  racing with stop_unit's 'done' event.

- Expose NativeSupervisor::procs_dir_path() and Supervisor::native_procs_dir()
  so callers can use the actual storage path (including /tmp fallback) instead
  of reading $HOME from the environment.

- Fix get_logs() native fallback message to use supervisor.native_procs_dir()
  instead of $HOME, so the hint is accurate when NativeSupervisor falls back
  to /tmp.

- Refresh NodeManager cache after start_native_autostart() so newly started
  nodes appear as Running immediately rather than Stopped until next poll.

- Document split_whitespace limitation in start_unit (quoted/escaped args and
  paths with spaces are not supported in the native backend).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…tect()

Add SystemdClient::probe() which calls GetUnit("basic.target") via D-Bus.
Any systemd-level response (including NoSuchUnit) proves the service is
present; a transport/FDO error (ServiceUnknown etc.) means systemd is not
on this bus.

Supervisor::detect() now calls probe() after opening the session connection
so environments with D-Bus but without systemd (containers with a user
dbus-daemon but no systemd PID 1) correctly fall back to NativeSupervisor
instead of selecting a broken systemd backend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@edgarriba edgarriba merged commit 2cd7adf into kornia:main Apr 4, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants