smokemon

full-stack network + host monitoring for the edge — every signal on one timeline, in stdlib python and ~30 mb of ram. no cloud, no agent to pay for: collection/ship/hub are pure stdlib; you add only the probe tools (fping/mtr/iperf3/iw) and a plotting lib for graphs.

smokemon watches the network and the box it runs on (ping loss & latency spread, bandwidth, http breakdown, per-hop routes, wifi, throughput, cpu/mem/temp/psi/power) and lays it all on a single timeline, so you can see what else was happening the moment things went bad.

the core is pure-stdlib python: a raspberry pi or jetson runs it in ~30 mb of ram and well under 1% of one core, and it graphs its own footprint to prove it. point many nodes at one hub and watch the whole fleet from a terminal or a browser.

new here? copy-paste install & use in QUICKSTART.md. full reference below.

the code? the package is documented file-by-file in smokemon/README.md; each probe (what it measures, what it refuses to do, the footprint rules) in smokemon/probes/README.md.

view:    smoke
         smoke live 24h
         smoke kiosk 24h
         smoke png (or python -m smokemon.cli …)

run:     python -m smokemon.collect {fast|slow}
         (launchd/systemd do this; see deploy/)

macOS    cp deploy/launchd/*.plist ~/Library/LaunchAgents/, then launchctl bootstrap each

Linux    curl -fsSL https://raw.githubusercontent.com/oovets/smokemon/main/install.sh \
            | sudo bash -s -- --node NAME [--hub-url URL --secret S]

== analysis engine + dashboard + alerting ==

- smokemon/analyze.py (hub-side, read-only, stdlib): incident detection (isp-outage /
  link-down / packet-loss / latency-spike / dns-slow), multi-signal blame (what deviated
  during the window + new processes), time-of-day anomaly baseline, change-point detection,
  mtr path intelligence, bandwidth attribution.

- text surfaces (run on a node too): `smoke status` (sparkline health line), `smoke
  incidents` (incidents + blame), `smoke digest` (plain-english summary). `smoke replay`
  scrubs any past window. `--bell` rings on degraded health; `--notify` pushes incidents.

- hub now serves a live fleet dashboard at GET /, a prometheus /metrics endpoint, and a
  family of read-only /api/* json endpoints (nodes/latest/fleet/fleet-status/heatmap/risks/
  cost/services/logs/ports/network/inventory/ingest-rate/spark + render plot/png). push
  alerts via smokemon/notify.py (ntfy/slack/discord/webhook).

- node-side: a `self` panel graphs smokemon's own RSS/CPU; opt-in synthetic transactions
  (captive-portal + DoH) via probes/synthetic.py; opt-in lightweight external HTTP
  scrapes via probes/ext.py (bounded timeout/body/metric caps, no log streams);
  opt-in Redis stream health via probes/redisq.py uses stdlib socket reads only. Jetson
  GPU util/frequency is read from sysfs, not tegrastats.
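
the time-of-day anomaly baseline in the first bullet can be pictured like this. it is a minimal sketch, not the code in smokemon/analyze.py: the function names, the median/MAD scoring, the 3.5 threshold and the 1.0 floor are all assumptions picked for illustration.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timezone


def hourly_baseline(samples):
    """Group (unix_ts, value) samples by UTC hour of day.

    Returns {hour: (median, mad)}, where mad is the median absolute
    deviation, a spread estimate that a few outliers cannot drag around.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        buckets[hour].append(value)
    baseline = {}
    for hour, values in buckets.items():
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        baseline[hour] = (med, mad)
    return baseline


def is_anomalous(ts, value, baseline, threshold=3.5, floor=1.0):
    """True if value is more than `threshold` robust deviations above the
    median normally seen at this hour. `floor` stops a near-zero MAD from
    turning every small wobble into an alarm (both values are assumptions).
    """
    hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
    if hour not in baseline:
        return False  # no history for this hour yet: stay quiet
    med, mad = baseline[hour]
    return (value - med) / max(mad, floor) > threshold
```

the point of keying on hour of day: a 60 ms rtt that is normal during the evening peak gets flagged at 03:00, where the same link usually sits at 20 ms.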

== v0.11  rich host metrics + grid layout ==

- new tables: thermal_zones (all sensors, not just max), power_samples (jetson INA3221
  per-rail watts), tcp_samples (retransmits / RSTs / udp errors / conntrack fill),
  disk_health (SD wear-level, hourly). host_samples adds PSI cpu/mem/io, swap/cache,
  oom_kill_count, cpu_freq_mhz, cpu_throttle_count, pi_throttle_bits. wifi_samples adds
  bssid + retry/discard/beacon counters; render shows roam count across BSSIDs.

- renderer: 5 new panels (thermal, power, tcp, psi, freq). 2-col grid by default
  (PNG when >=3 panels, TUI when terminal >=140 cols). --cols N to force.

- perf: ping_rtt percentiles (p25/p75) pre-aggregated at insert -> load_ping_smoke skips
  the ping_rtts scan for new rows. hub ingest uses executemany. load_net uses SQL LAG()
  (sqlite >=3.25). SQLite stays on WAL + synchronous=NORMAL only; cache/mmap PRAGMAs were
  tried and reverted to keep node RSS low (smokemon reports its own RSS, so they'd skew it).
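
the three sqlite techniques named above (WAL + synchronous=NORMAL, executemany, LAG()) fit together roughly as follows. this is a sketch under assumptions: the `net_demo` table, its columns and the function names are hypothetical and are not smokemon's schema or loaders.

```python
import sqlite3


def open_db(path):
    """Open a connection with the two PRAGMAs the text mentions, nothing else."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    # hypothetical demo table: one cumulative byte counter per interface
    conn.execute(
        "CREATE TABLE IF NOT EXISTS net_demo "
        "(ts INTEGER, iface TEXT, rx_bytes INTEGER)"
    )
    return conn


def insert_counters(conn, rows):
    """Batch-insert (ts, iface, rx_bytes) rows in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO net_demo VALUES (?, ?, ?)", rows)


def rx_rates(conn, iface):
    """Turn cumulative counters into bytes/s using LAG() (needs sqlite >= 3.25).

    The first row has no predecessor, so its rate is NULL and is skipped.
    Counter resets (a negative delta) are not handled in this sketch.
    """
    sql = """
        SELECT ts,
               (rx_bytes - LAG(rx_bytes) OVER w) * 1.0
                   / (ts - LAG(ts) OVER w) AS bytes_per_s
        FROM net_demo
        WHERE iface = ?
        WINDOW w AS (ORDER BY ts)
        ORDER BY ts
    """
    return [(ts, rate) for ts, rate in conn.execute(sql, (iface,))
            if rate is not None]
```

doing the delta in SQL means the loader never pulls the previous row into python just to subtract it.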

== v0.10  package refactor ==

- flat scripts -> smokemon/ package: config (env/NODE/paths), core (log/connect/
  signals/run_scheduler), schema (single-source DDL -> node+hub + STD_TABLES + generic
  insert), adapters/{darwin,linux}, probes/{ping,net,http,mtr,wifi,iperf,host}, collect
  (one daemon, group fast|slow|all), ship, hub, query (shared loaders + --node),
  render/{tui,png}, cli (`smoke` subcommands).

- 3 collector daemons -> 2 (fast=ping/net; slow=http/mtr/wifi/host). live.sh/daily_graph.sh
  -> `smoke live|kiosk|daily`. dedup: schema, daemon loop, plot loaders, the duplicate
  wifi_probe (all gone). net caches the TS iface (5 min). hub: ThreadingHTTPServer + write
  lock. entrypoints: python -m smokemon.* (PYTHONPATH=repo, no install needed).
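
the "ThreadingHTTPServer + write lock" pairing can be sketched like this: requests are handled on separate threads, but only one of them writes to sqlite at a time. this is an illustration, not smokemon/hub.py. the `demo_samples` table, the json payload shape and the function names are assumptions, and the shared-secret check the real hub needs is left out.

```python
import json
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

WRITE_LOCK = threading.Lock()  # serialises every write across handler threads


def make_handler(db_path):
    class IngestHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                rows = json.loads(self.rfile.read(length))
                params = [(r["node"], r["ts"], r["value"]) for r in rows]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "bad payload")
                return
            # Parsing happened outside the lock; only the DB write is serialised.
            with WRITE_LOCK:
                conn = sqlite3.connect(db_path)
                try:
                    with conn:  # one transaction per request
                        conn.executemany(
                            "INSERT INTO demo_samples VALUES (?, ?, ?)", params
                        )
                finally:
                    conn.close()
            self.send_response(204)
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return IngestHandler


def serve(db_path, host="127.0.0.1", port=0):
    """Create the demo table and return an unstarted server (port 0 = any free port)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_samples (node TEXT, ts INTEGER, value REAL)"
    )
    conn.commit()
    conn.close()
    return ThreadingHTTPServer((host, port), make_handler(db_path))
```

call `serve(path).serve_forever()` to run it. the lock keeps concurrent shippers from colliding on sqlite's single-writer rule, while reads stay unblocked.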

earlier versions (v0.1 - v0.9) → CHANGELOG.md

smoke [tui]                 static TUI; 18 panel types: ping,net,http,mtr,wifi,iperf,host,
                            gpu,redis,docker,pipeline,disk,thermal,power,tcp,psi,freq,self|all
                            --cols N|0(auto). a panel only draws if the node has that data.
                            psi+freq are Linux-only; thermal/power/tcp also work on macOS
                            (cpu_speed_limit, battery rail, netstat -s parsing)

smoke live 24h | smoke kiosk 24h [--refresh N] [--bell]   live / clean wall display

smoke replay [DATE|Nh] [--frame MIN]   DVR scrubber (←/→ scrub, ↑/↓ step, q quits)

smoke fleet [live]         aggregated terminal view of every node reporting to the hub
                           (worst-first, colour-coded; the TUI twin of GET /). --ranked
                           for uptime/downtime over --hours; --heatmap [--metric loss|rtt]
                           for a node×hour sparkline grid; --hub-url URL reads the hub's
                           read-only /api over HTTP (no hub DB access needed); --bell.

smoke footprint            collector rows/day + SQLite and shipper gzip byte estimates
                           for a node DB (or --node on a hub DB). --ship-rtts includes
                           raw ping RTTs in the wire estimate.

smoke png [--width N --dpi N --cols N] | smoke daily   PNG -> Preview / dated 24h PNG

smoke status | smoke incidents | smoke digest [--notify]   text analysis (stdlib, node-ok)
common: --minutes N|--hours N|--since|--until --targets --panels --node (req. on hub DB)

analysis: smokemon/analyze.py (incident detection + multi-signal blame + anomaly/change-
          point/path/attribution stats, hub-side read-only). hub also serves a live fleet
          dashboard at GET / (grid/table/ranking/heatmap/risks/services/logs/cost tabs),
          plus GET /metrics (prometheus) and read-only GET /api/* (nodes, latest, fleet,
          fleet-status, heatmap, risks, cost, services, logs, ports, network, inventory,
          ingest-rate, spark; plot/png render a node's panels). see INSTALL.md for the list.
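
reading one of the read-only /api/* endpoints needs nothing beyond the stdlib. a sketch, with assumptions flagged: the hub address in the usage note and any query parameter are hypothetical, the response shape is not documented here, and whether the hub expects the secret on reads is not stated in this file.

```python
import json
import urllib.parse
import urllib.request


def api_url(hub_url, endpoint, **params):
    """Build <hub>/api/<endpoint>[?k=v...]. Endpoint names come from the list above."""
    url = f"{hub_url.rstrip('/')}/api/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(hub_url, endpoint, timeout=5.0, **params):
    """GET an endpoint and decode the JSON body; raises on HTTP or network errors."""
    with urllib.request.urlopen(api_url(hub_url, endpoint, **params),
                                timeout=timeout) as resp:
        return json.load(resp)
```

for example, `fetch_json("http://hub.example:8080", "fleet-status")` would return whatever json the hub sends for that endpoint. host and port here are placeholders.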

alerting: set SMOKEMON_NOTIFY_URL (ntfy/slack/discord/webhook) + `smoke digest --notify`
          or the smokemon-notify timer. synthetic checks: SMOKEMON_SYNTHETIC=1.
          external checks: SMOKEMON_EXT_HTTP='app=http://127.0.0.1:8080/health'.
          redis/docker/pipeline: auto-detected (default on, silent no-op when absent);
          name redis streams with SMOKEMON_REDIS_STREAMS=a,b,c; SMOKEMON_{REDIS,DOCKER,
          PIPELINE}=0 disables, =1 forces a down-row even when the service is unreachable.
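
the three-way switch described above (unset = auto-detect, 0 = off, 1 = force) and the two value formats can be read like this. a sketch only: the helper names are hypothetical, and how smokemon treats values other than 0/1, or more than one `name=url` pair, is not stated here, so this code falls back to auto and parses a single pair.

```python
import os


def tri_state(name, environ=os.environ):
    """Map SMOKEMON_{REDIS,DOCKER,PIPELINE} to 'off' | 'force' | 'auto'."""
    raw = environ.get(name, "").strip()
    if raw == "0":
        return "off"      # probe disabled
    if raw == "1":
        return "force"    # write a down-row even if the service is unreachable
    return "auto"         # default: detect, silent no-op when absent


def parse_streams(value):
    """'a,b,c' -> ['a', 'b', 'c'], ignoring blanks and stray spaces."""
    return [s.strip() for s in value.split(",") if s.strip()]


def parse_ext_http(value):
    """'app=http://host/health' -> ('app', 'http://host/health').

    Splits on the first '=' only, so a URL with its own '=' survives.
    """
    name, sep, url = value.partition("=")
    if not sep or not name or not url:
        raise ValueError(f"expected name=url, got {value!r}")
    return name.strip(), url.strip()
```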

daemons: python -m smokemon.collect {fast|slow} | .probes.iperf | .probes.synthetic
         | .ship | .hub | .notify  (PYTHONPATH=repo)

multi-node: nodes run collect + iperf + ship (push delta -> hub); hub runs
            python -m smokemon.hub (-> smokemon-hub.db). plot a single node with
            --node NAME; see the whole fleet at once with `smoke fleet` (or GET /).
            repoint a node with `smoke hub NEW-HUB` (writes SMOKEMON_HUB_URL).

deploy: macOS deploy/launchd/*.plist (collect-fast/slow, iperf, daily, shipper, hub);
        Linux sudo ./install.sh --node NAME --hub-url URL --secret S
        (hub: --hub --secret S). secret must match node<->hub.

deps:   node: fping,mtr,iperf3,iw + python3 stdlib + plotext(TUI);
        hub: +matplotlib/numpy(PNG) + iperf3 (runs iperf3 -s as the nodes' bandwidth target).

== what the metrics mean (the non-obvious ones) ==

rtt spread        the p25-p75 / p0-p100 band around median ping, not a single number - a
                  wide band = jitter even when the average looks fine.
bufferbloat grade A+..F from idle ping vs ping-under-load (iperf). F = the link buffers
                  badly under load (calls/games stutter while something downloads).
psi               linux pressure-stall info (/proc/pressure): % of time tasks stalled on
                  cpu/mem/io. rises *before* utilisation hits 100% - an early warning.
conntrack fill    how full the kernel's connection-tracking table is. near 100% = new
                  connections get dropped (looks like packet loss, isn't the link).
death clocks      linear extrapolation to a limit: disk-full eta, sd/emmc wear-out eta, and
                  headroom (degC) before the cpu thermally throttles.
roam count        how many times wifi jumped between bssids (access points) in the window;
                  frequent roams correlate with throughput dips.
throttle bits     raspberry pi vcgencmd flags (under-voltage / freq-capped / throttled),
                  past and currently-active - the usual cause of silent pi slowdowns.
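
two of the entries above have a concrete wire format worth seeing: the text the kernel writes under /proc/pressure, and the bitmask that `vcgencmd get_throttled` prints. a sketch of decoding both. the function names and the short flag labels are assumptions, and this is not how smokemon's probes are necessarily written.

```python
# Bits 0-3 are "active right now"; the same flags shifted up by 16 mean
# "has happened since boot".
THROTTLE_FLAGS = {
    0: "under-voltage",
    1: "freq-capped",
    2: "throttled",
    3: "soft-temp-limit",
}


def parse_psi(text):
    """Parse the contents of /proc/pressure/{cpu,memory,io}.

    Input lines look like:
        some avg10=1.50 avg60=0.20 avg300=0.00 total=12345
    Returns {"some": {"avg10": 1.5, ...}, "full": {...}}. avgN is the % of
    the last N seconds spent stalled; total is cumulative microseconds.
    """
    out = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        out[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return out


def decode_throttled(output):
    """Decode e.g. 'throttled=0x50005' into active and past flag names."""
    bits = int(output.strip().split("=", 1)[1], 16)
    now = [n for bit, n in THROTTLE_FLAGS.items() if bits & (1 << bit)]
    past = [n for bit, n in THROTTLE_FLAGS.items() if bits & (1 << (bit + 16))]
    return {"now": now, "past": past}
```

`0x50005` sets bits 0, 2, 16 and 18: the pi is under-voltage and throttled right now, and both have also been recorded since boot.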
