cyhy-commander

Job orchestrator for the Cyber Hygiene (CyHy) scanning system. Dispatches scan jobs to nmap and Nessus scanner hosts via SSH/rsync, processes results, and updates the database.

Requirements

Python 3.14+
MongoDB 8.0+ or AWS DocumentDB (MongoDB 8.0 compatible)
Scanner hosts running cyhy-runner

Installation

# Clone and install with uv
git clone https://github.com/cisagov/cyhy-commander.git
cd cyhy-commander
uv sync

Configuration

Configuration is loaded from a TOML file via cyhy-config. The following locations are searched in order:

Path in the CYHY_CONFIG_PATH environment variable
AWS SSM Parameter Store path in CYHY_CONFIG_SSM_PATH
./cyhy.toml (current working directory)
~/.cyhy/cyhy.toml (user home)
/etc/cyhy.toml (system-wide)

See extras/cyhy-example.toml for a fully commented example.
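
The search order above works as a first-match lookup. The sketch below uses a hypothetical helper, not the cyhy-config API. It assumes that a set environment variable wins outright, without checking that the file exists. Fetching the SSM parameter is left out of scope. File existence is injected so the order can be tested without touching the filesystem.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


# Hypothetical helper illustrating the documented search order; the real
# cyhy-config package may differ (e.g. in how it treats a missing file).
def resolve_config_source(
    env: Mapping[str, str] = os.environ,
    exists: Callable[[Path], bool] = Path.is_file,
    home: Optional[Path] = None,
) -> Optional[Tuple[str, str]]:
    """Return ("file", path) or ("ssm", parameter_path) for the first match."""
    # 1. Explicit file path from the environment (assumed to win outright).
    if env.get("CYHY_CONFIG_PATH"):
        return ("file", env["CYHY_CONFIG_PATH"])
    # 2. AWS SSM Parameter Store path (fetching the parameter is not shown).
    if env.get("CYHY_CONFIG_SSM_PATH"):
        return ("ssm", env["CYHY_CONFIG_SSM_PATH"])
    # 3-5. Well-known file locations, checked in order.
    home = home if home is not None else Path.home()
    candidates = (
        Path.cwd() / "cyhy.toml",
        home / ".cyhy" / "cyhy.toml",
        Path("/etc/cyhy.toml"),
    )
    for candidate in candidates:
        if exists(candidate):
            return ("file", str(candidate))
    return None
```

With no environment variables set, the helper falls through to the three file locations and returns `None` if none of them exists.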

Usage

uv run cyhy-commander <working_dir>

Docker

# Build the container image
docker build -t cyhy-commander .

# Build with custom UID/GID
docker build --build-arg CISA_UID=1001 --build-arg CISA_GID=1001 -t cyhy-commander .

# Verify the image works
docker run --rm cyhy-commander --help

# Run with a working directory (mount as volume)
docker run --rm -v ./work:/work cyhy-commander /work

The image runs as an unprivileged user (cisa, UID 1000 by default; override it with the CISA_UID and CISA_GID build arguments) and supports read-only root filesystems. Mount writable paths, such as the job working directory, as volumes.

Working Directory

The default working directory is /work: when the container is started without arguments, it runs cyhy-commander /work.

The Commander creates the subdirectories done/, pushed/, failed/, and drop/ within the working directory on startup if they do not already exist.
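
A minimal sketch of that startup step follows. It assumes the Commander creates each missing directory and leaves existing ones, and their contents, alone. `ensure_work_dirs` is a hypothetical name. On a read-only root filesystem, this only succeeds when the working directory is a writable volume mount.

```python
from pathlib import Path

# Subdirectory names come from the text above; the helper is hypothetical.
JOB_SUBDIRS = ("done", "pushed", "failed", "drop")


def ensure_work_dirs(working_dir: str) -> list[Path]:
    """Create any missing job subdirectories and return their paths."""
    root = Path(working_dir)
    paths = []
    for name in JOB_SUBDIRS:
        path = root / name
        # exist_ok=True leaves existing directories (and their files) untouched.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```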

Observability

cyhy-commander exposes Prometheus metrics and HTTP health check endpoints on a dedicated metrics port (default 9090). The metrics server runs as a daemon thread using a ThreadingWSGIServer, separate from the asyncio event loop.
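
The threading arrangement can be sketched with the standard library alone. The class and function names in this illustration are hypothetical, not the project's. A `ThreadingWSGIServer`, built from `socketserver.ThreadingMixIn` and `wsgiref`'s `WSGIServer`, handles each request on its own daemon thread. `serve_forever` itself also runs on a daemon thread. As a result, HTTP scrapes never block the asyncio work loop and do not keep the process alive at exit.

```python
import asyncio
import threading
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGI server that handles each request on its own daemon thread."""
    daemon_threads = True


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # silence wsgiref's per-request stderr logging


def start_metrics_server(app, port: int = 9090, addr: str = "0.0.0.0"):
    """Serve `app` from a daemon thread so it never blocks the asyncio loop."""
    httpd = make_server(addr, port, app, ThreadingWSGIServer, handler_class=QuietHandler)
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd, thread


def placeholder_app(environ, start_response):
    # Stand-in WSGI app; a real server would route /metrics, /livez, etc.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


async def work_loop(cycles: int) -> int:
    # Stand-in for the Commander's asyncio work cycles.
    for _ in range(cycles):
        await asyncio.sleep(0)
    return cycles


if __name__ == "__main__":
    httpd, _ = start_metrics_server(placeholder_app, port=9090)
    try:
        asyncio.run(work_loop(3))  # requests are served while the loop runs
    finally:
        httpd.shutdown()
```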

Prometheus Metrics

The following metrics are exposed at GET /metrics in Prometheus exposition format:

Metric Name	Type	Labels	Description
`cyhy_commander_work_cycle_duration_seconds`	Histogram	—	Wall-clock duration of each work cycle iteration
`cyhy_commander_jobs_pushed_total`	Counter	`stage`	Jobs successfully pushed to scanner hosts
`cyhy_commander_jobs_pulled_total`	Counter	`stage`	Completed jobs pulled from scanner hosts
`cyhy_commander_jobs_failed_total`	Counter	`stage`	Jobs that completed with non-zero exit code
`cyhy_commander_host_errors_total`	Counter	`host`	SSH/rsync exceptions per scanner host
`cyhy_commander_ips_pushed_total`	Counter	`stage`	IP addresses pushed in job bundles
`cyhy_commander_ips_pulled_total`	Counter	`stage`, `status`	IP addresses in pulled jobs (success/failure)
`cyhy_commander_last_cycle_completed_timestamp_seconds`	Gauge	—	Unix timestamp of last successful cycle completion
`cyhy_commander_last_db_success_timestamp_seconds`	Gauge	—	Unix timestamp of last successful DB operation
`cyhy_commander_scanner_connection_status`	Gauge	`host`, `workgroup`	1 if last SSH/rsync to host succeeded, 0 if failed

The stage label takes values: NETSCAN1, NETSCAN2, PORTSCAN, VULNSCAN. The status label takes values: success, failure.

Health Check Endpoints

All health endpoints are served on the metrics port alongside /metrics.

Endpoint	Evaluates	Threshold	200 Response	503 Response
`GET /livez`	`last_cycle_completed_timestamp_seconds`	`CYHY_LIVENESS_THRESHOLD_SECONDS` (default 300s)	`ok` — last cycle within threshold, or first cycle not yet completed (startup grace)	`work cycle stale` — elapsed time since last cycle ≥ threshold
`GET /readyz`	`last_db_success_timestamp_seconds`	`CYHY_READINESS_THRESHOLD_SECONDS` (default 120s)	`ok` — last DB operation within threshold	`database connection stale` — elapsed time since last DB op ≥ threshold, or no DB op yet
`GET /startupz`	`_first_cycle_completed` flag	—	`ok` — first work cycle has completed	`first cycle not completed` — still initializing

Requests to any other path on the metrics port return HTTP 404 with an empty body.
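
The rules in the table can be expressed as a small WSGI application. This is a hypothetical, stdlib-only sketch. `HealthState` and `make_health_app` are illustrative names, and the injected `clock` exists only to make the thresholds testable. Routing for `/metrics` is omitted, so in this sketch that path also falls through to 404.

```python
import time
from typing import Callable, Optional


class HealthState:
    """Hypothetical container for the values the health checks read."""

    def __init__(self) -> None:
        self.last_cycle_completed: Optional[float] = None  # Unix seconds
        self.last_db_success: Optional[float] = None       # Unix seconds
        self.first_cycle_completed = False


def make_health_app(state: HealthState, liveness_threshold: float = 300.0,
                    readiness_threshold: float = 120.0,
                    clock: Callable[[], float] = time.time):
    reasons = {200: "OK", 404: "Not Found", 503: "Service Unavailable"}

    def evaluate(path: str) -> tuple[int, bytes]:
        now = clock()
        if path == "/livez":
            # Startup grace: healthy until the first cycle has completed.
            if not state.first_cycle_completed or state.last_cycle_completed is None:
                return 200, b"ok"
            if now - state.last_cycle_completed >= liveness_threshold:
                return 503, b"work cycle stale"
            return 200, b"ok"
        if path == "/readyz":
            # No DB operation yet also counts as stale.
            if state.last_db_success is None or now - state.last_db_success >= readiness_threshold:
                return 503, b"database connection stale"
            return 200, b"ok"
        if path == "/startupz":
            if state.first_cycle_completed:
                return 200, b"ok"
            return 503, b"first cycle not completed"
        return 404, b""  # any other path: empty body

    def app(environ, start_response):
        code, body = evaluate(environ.get("PATH_INFO", ""))
        start_response(f"{code} {reasons[code]}", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

Any WSGI server, such as one created with `wsgiref.simple_server.make_server`, can host the returned `app`.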

Environment Variables

Variable	Purpose	Accepted Values	Default	Behavior When Unset
`CYHY_METRICS_PORT`	TCP port for the metrics/health server	Integer in range [1024, 65535]	`9090`	Uses default port 9090
`CYHY_LIVENESS_THRESHOLD_SECONDS`	Maximum seconds since last completed work cycle before `/livez` returns 503	Positive numeric value	`300`	Uses default of 300 seconds (5 minutes)
`CYHY_READINESS_THRESHOLD_SECONDS`	Maximum seconds since last successful DB operation before `/readyz` returns 503	Integer in range [1, 3600]	`120`	Uses default of 120 seconds (2 minutes)
`CYHY_METRICS_BEARER_TOKEN`	Optional bearer token for authenticating requests to `/metrics`	String (minimum 8 characters recommended)	Unset (no auth)	Metrics are served without authentication; access control relies on network-level restrictions such as a Kubernetes NetworkPolicy

If CYHY_METRICS_PORT, CYHY_LIVENESS_THRESHOLD_SECONDS, or CYHY_READINESS_THRESHOLD_SECONDS is set to an invalid value, the server logs a warning and falls back to the default.
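
The following sketch shows the validation and fallback behaviour described above. The helper names and the logger name are hypothetical; the ranges and defaults come from the table. Two choices are assumptions: an empty bearer token is treated as unset, and non-finite liveness thresholds are rejected.

```python
import logging
import math
import os
from typing import Mapping

log = logging.getLogger("cyhy_commander.metrics")  # hypothetical logger name


def _int_in_range(env: Mapping[str, str], name: str, default: int, lo: int, hi: int) -> int:
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        value = lo - 1  # force the fallback below
    if not lo <= value <= hi:
        log.warning("Invalid %s=%r; falling back to default %s", name, raw, default)
        return default
    return value


def _positive_float(env: Mapping[str, str], name: str, default: float) -> float:
    raw = env.get(name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        value = math.nan  # force the fallback below
    if not (math.isfinite(value) and value > 0):
        log.warning("Invalid %s=%r; falling back to default %s", name, raw, default)
        return default
    return value


def load_metrics_settings(env: Mapping[str, str] = os.environ) -> dict:
    # Empty string treated as unset (assumption).
    token = env.get("CYHY_METRICS_BEARER_TOKEN") or None
    if token is not None and len(token) < 8:
        log.warning("CYHY_METRICS_BEARER_TOKEN is shorter than 8 characters")
    return {
        "port": _int_in_range(env, "CYHY_METRICS_PORT", 9090, 1024, 65535),
        "liveness_threshold": _positive_float(env, "CYHY_LIVENESS_THRESHOLD_SECONDS", 300.0),
        "readiness_threshold": _int_in_range(env, "CYHY_READINESS_THRESHOLD_SECONDS", 120, 1, 3600),
        "bearer_token": token,
    }
```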

If CYHY_METRICS_BEARER_TOKEN is set, all requests to /metrics must include an Authorization: Bearer <token> header matching the configured value. Requests with a missing, malformed, or incorrect token receive HTTP 401. Health endpoints (/livez, /readyz, /startupz) are never authenticated. If the token is shorter than 8 characters, a warning is logged but authentication is still enforced.
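
The bearer-token rule might be implemented along these lines. This is a hypothetical helper, not the project's code. Two choices are specific to this sketch:
- It compares tokens with `hmac.compare_digest`, which avoids leaking through timing how much of the token matched.
- It accepts the `Bearer` scheme case-insensitively, as HTTP authentication schemes are.

```python
import hmac
from typing import Optional


def metrics_auth_status(path: str, authorization: Optional[str],
                        token: Optional[str]) -> Optional[int]:
    """Return 401 if the request must be rejected, otherwise None.

    Hypothetical helper: health endpoints and an unset token always pass.
    """
    if token is None or path != "/metrics":
        return None
    scheme, _, presented = (authorization or "").partition(" ")
    # Missing header or a non-Bearer scheme counts as malformed.
    if scheme.lower() != "bearer" or not presented:
        return 401
    # Constant-time comparison of the presented and configured tokens.
    if not hmac.compare_digest(presented.encode(), token.encode()):
        return 401
    return None
```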

Kubernetes Deployment Example

The following manifest demonstrates httpGet probes, the /work volume mount, the metrics container port, and a NetworkPolicy restricting access to the metrics port:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cyhy-commander
  labels:
    app.kubernetes.io/name: cyhy-commander
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: cyhy-commander
  template:
    metadata:
      labels:
        app.kubernetes.io/name: cyhy-commander
    spec:
      containers:
        - name: cyhy-commander
          image: cyhy-commander:latest
          ports:
            - name: metrics
              containerPort: 9090
              protocol: TCP
          env:
            - name: CYHY_METRICS_PORT
              value: "9090"
            - name: CYHY_LIVENESS_THRESHOLD_SECONDS
              value: "300"
            - name: CYHY_READINESS_THRESHOLD_SECONDS
              value: "120"
          volumeMounts:
            - name: work
              mountPath: /work
          startupProbe:
            httpGet:
              path: /startupz
              port: 9090
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 30
          livenessProbe:
            httpGet:
              path: /livez
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 15
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 9090
            initialDelaySeconds: 0
            periodSeconds: 10
            failureThreshold: 3
      volumes:
        - name: work
          emptyDir: {}
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cyhy-commander-metrics
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: cyhy-commander
  policyTypes:
    - Ingress
  ingress:
    - from:
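        # Allow Prometheus scraper pods (this selector matches pods in the
        # same namespace only; add a namespaceSelector if Prometheus runs
        # in another namespace)
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
        # Kubelet health probes originate from the pod's own node, and
        # NetworkPolicy cannot block traffic from a pod's resident node,
        # so no rule is needed for them. Avoid an ipBlock of 0.0.0.0/0
        # here: it would admit every IPv4 source and defeat the policy.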
      ports:
        - protocol: TCP
          port: 9090

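The startup probe gives the first work cycle roughly 5 minutes to complete: a 5-second initial delay, then up to 30 failed checks at 10-second intervals. If the probe still has not succeeded by then, the kubelet restarts the container. Until the startup probe succeeds, Kubernetes does not run the liveness or readiness probes.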
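
The probe settings in the manifest translate into concrete timings. This simplified model ignores `timeoutSeconds` and scheduling jitter, and makes three assumptions:
- Startup probes fire at `initialDelaySeconds` and then every `periodSeconds`.
- The container is restarted after `failureThreshold` consecutive failures.
- `/livez` only starts failing once the liveness threshold has elapsed since the last completed cycle.

```python
def startup_deadline(initial_delay: float, period: float, failure_threshold: int) -> float:
    """Seconds after container start at which the last allowed startup probe fails.

    Probes fire at initial_delay, initial_delay + period, ...; the container
    is restarted after failure_threshold consecutive failures.
    """
    return initial_delay + (failure_threshold - 1) * period


def liveness_restart_bound(stale_threshold: float, period: float, failure_threshold: int) -> float:
    """Upper bound on seconds from the last completed cycle to a liveness restart."""
    # /livez turns 503 once stale_threshold elapses; the next probe may come up
    # to one period later, and failure_threshold - 1 further probes must fail.
    return stale_threshold + failure_threshold * period
```

With the example values, the last allowed startup probe fires about 295 seconds after the container starts. A hung work loop is restarted at most about 345 seconds (300 + 3 × 15) after its last completed cycle.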

Docker HEALTHCHECK

For Docker Compose and non-Kubernetes environments, the image includes a lightweight healthcheck script at /home/cisa/scripts/healthcheck.py that queries the local metrics server using only the Python standard library:

# Liveness check (used by Docker HEALTHCHECK)
python3 /home/cisa/scripts/healthcheck.py liveness

# Readiness check
python3 /home/cisa/scripts/healthcheck.py readiness

Exit codes: 0 (healthy/ready), 1 (unhealthy/connection failure), 2 (invalid subcommand).

Development

# Install with test and dev extras
uv sync --extra test --extra dev

# Run all tests (unit, property, and integration)
uv run pytest

# Run pre-commit hooks
pre-commit run --all-files

License

This project is in the worldwide public domain (CC0 1.0).
