Skip to content

chore: add bandit Python source security audit#10

Merged
timzsu merged 12 commits into
mainfrom
zsu/bandit
May 1, 2026
Merged

chore: add bandit Python source security audit#10
timzsu merged 12 commits into
mainfrom
zsu/bandit

Conversation

@timzsu
Copy link
Copy Markdown
Collaborator

@timzsu timzsu commented May 1, 2026

Purpose

Adds bandit (Python source security audit) as the third leg of FlowMesh's security CI #6.

The job runs with no severity / confidence threshold: every finding either has a source-level fix, a documented skip in pyproject.toml's [tool.bandit] section with a written rationale, or a per-line # nosec BXXX paired with an inline TODO that names the planned fix. A bare # nosec (no rule code, no written reason) is disallowed.

Changes

  • src/... (16 files) — source-level fixes for every High/Medium bandit finding (commit d43ee03).
  • src/worker/power.py, src/worker/hw.py — replace the nvidia-smi subprocess shellouts with direct pynvml calls. power.py (commit d43ee03) was the original B607 site; hw.py (commit de9f63f, inlined into collect_hw in 21b5c91) was a B404/B603 site that also benefited from dropping the regex-based parsing of human-readable nvidia-smi output and the implicit nvidia-smi $PATH dependency.
  • pyproject.toml — adds [tool.bandit] with the skip list documented inline; promotes nvidia-ml-py from training-gpu to worker-core (the package is pure Python; the C lib only loads on nvmlInit, so importing on a CPU-only worker is harmless) and adds pynvml to the follow_untyped_imports mypy override since nvidia-ml-py ships no py.typed marker.
  • .github/workflows/security.yml — adds the bandit job pinned to bandit==1.9.4, run via uvx bandit -c pyproject.toml -r src/.
  • AGENTS.md — adds a "Security Rules (bandit-enforced)" subsection so new contributors know which patterns to follow without first triggering CI.
  • CONTRIBUTING.md — adds bandit to the Code Style tools table.

Design

Why no severity/confidence threshold

Threshold-based filtering hides findings without forcing a decision. We want every rule classified once — fix, skip-with-reason, or follow-up issue — and bandit clean from then on. New unclassified findings should fail CI until a contributor either fixes them or argues the skip.

Rule-by-rule walkthrough

For each rule bandit flagged on src/, here is what happened and why.

Fixed in source (this PR)

Rule What it flags Fix
B324 ×4 hashlib.md5(...) without explicit usedforsecurity Pass usedforsecurity=False. All four call sites are cache-key / fingerprint generators (worker selector jitter, tool cache key, file MD5 fingerprint), no security boundary.
B202 ×3 tarfile.extractall / zipfile.extractall on untrusted archives Tarfile: filter='data' (Python 3.12+, drops links/devices, blocks path traversal). Zipfile: iterate infolist(), validate each member resolves under destination, extract per-member.
B506 ×1 yaml.load(..., Loader=yaml.FullLoader) Switch to yaml.safe_load. The supervisor worker config has no need for arbitrary tag construction.
B614 ×1 torch.load(...) without weights_only=True Pass weights_only=True. The image-embedding loader receives tensors only; pickle deserialization would be RCE-on-untrusted-input.
B701 ×2 jinja2.Environment(...) without autoescape autoescape=jinja2.select_autoescape(). Both call sites currently render LLM prompts (non-HTML) where escaping is a no-op, but a future contributor copying these constructors for HTML output would be silently unsafe.
B310 ×1 urllib.request.urlopen(...) allows file:// and custom schemes Switch checkpoint downloader to requests and reject any URL scheme other than http / https. As a side effect, removes the only urllib.request dependency in checkpoints.py.
B113 ×5 requests.get/post/... with no timeout= Pass an explicit timeout= to every call. Hung connections are a denial-of-service: no implicit defaults.
B108 ×4 hardcoded /tmp/... literals Replace with tempfile.gettempdir() / os.path.join(tempfile.gettempdir(), ...). The one exception is ssh_executor._FINISH_SENTINEL_PATH, which is intentionally a path inside the SSH container and must remain /tmp/.flowmesh_finish (the contract with ssh-run.sh / ssh-session.sh); it is now constructed via PurePosixPath("/", "tmp", ".flowmesh_finish") so no string literal in the AST starts with /tmp/.
B607 ×1 + B404 ×1 + B603 ×1 nvidia-smi shellout in worker.power and worker.hw Both now call pynvml directly (nvmlInit, nvmlSystemGetDriverVersion, nvmlSystemGetCudaDriverVersion, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo, nvmlDeviceGetPowerUsage). nvidia-ml-py moves from training-gpu to worker-core so every worker has it; runtime NVMLError handling preserves the previous "no GPU stack → empty result" behavior.

Skipped with rationale ([tool.bandit])

Rule Count Rationale (also in pyproject.toml)
B101 106 assert is used for internal invariants. We don't run with -O, so asserts execute. Replacing with raise would clutter paths where the precondition cannot fail under normal control flow.
B102 2 exec lives in agent worker sandboxes (Python executor toolkit) where running user-supplied code is the feature. Sandbox isolation lives at the container boundary, not the call site.
B104 7 0.0.0.0 bind. Server / worker entrypoints intentionally listen on all interfaces inside their containers; network exposure is controlled by the container/orchestration layer.
B107 1 Hardcoded password default. The flagged default is a placeholder for a config field that must be overridden in any non-test deployment; not a real credential.
B110 98 try/except/pass. Used in best-effort cleanup, optional metric collection, and shutdown paths where failure is intentionally swallowed to avoid masking the original error.
B112 4 try/except/continue. Same reasoning as B110, in iteration contexts where one bad input must not abort the loop.
B311 1 Pseudo-random generators are used only for non-cryptographic purposes (sampling, jitter, tie-breaking). No security boundary depends on randomness.
B307 2 eval is confined to the agent Python executor toolkit. Same reasoning as B102 — the guarantee is the sandbox, not the call site.
B603 5 subprocess.run with a list argument. Allowed because each call site uses an argv list (no shell=True) with controlled arguments; we audit the individual call sites manually.
B615 23 huggingface_hub downloads without a pinned revision. Models / datasets are user-supplied workflow inputs; pinning is the user's call, not ours. The framework cannot meaningfully pin on their behalf.

Tagged with # nosec B404 + TODO (not skipped globally)

B404 (import subprocess) is not in [tool.bandit] skips. The four remaining importers each carry an inline # nosec B404 — TODO: ... that names the planned fix, so the audit stays loud about new subprocess imports while letting these specific sites land:

  • sft_executor, dpo_executor, ppo_executor — torchrun launchers; TODO is to replace the shellout with in-process torch.distributed.run.main. Out of scope here because it's a meaningful rewrite of the distributed-training entry path.
  • worker/executors/utils/checkpoints.py — the optional pigz/tar acceleration in archive_model_dir; TODO is to drop it in favor of pure python tarfile. Already falls back to python tarfile when the binaries are absent, so removal is mostly a perf decision.

Test Plan

  • uvx bandit -c pyproject.toml -r src/ — must report No issues identified.
  • uvx --from zizmor==1.24.1 zizmor --persona pedantic --format github .github/workflows
  • uv run pre-commit run --all-files
  • uv run pytest tests/ --ignore=tests/worker/test_mp_executor_cleanup_gpu.py
  • The new bandit CI job must pass on this PR.
  • E2E sanity-check the pynvml refactor (the riskiest change, since neither worker.hw.collect_hw nor worker.power.PowerMonitor has unit-test coverage) by running the built flowmesh_worker:bandit-gpu image with and without --gpus, confirming the outputs match what the previous nvidia-smi-based code produced and that the no-GPU fallback still returns an empty GpuPlatformInfo.

Test Result

$ uvx bandit -c pyproject.toml -r src/
[main]  INFO    Found project level configuration file: pyproject.toml
...
Test results:
        No issues identified.
Code scanned:
        Total lines of code: 38742
Run metrics:
        Total issues (by severity): Undefined: 0  Low: 0  Medium: 0  High: 0
$ uv run pytest tests/ --ignore=tests/worker/test_mp_executor_cleanup_gpu.py
... 530 passed in 28.40s
$ uv run pre-commit run --all-files
gitleaks ............ Passed
isort ............... Passed
black ............... Passed
ruff check .......... Passed
codespell ........... Passed
mypy ................ Passed
sync requirements ... Passed
$ uvx --from zizmor==1.24.1 zizmor --persona pedantic .github/workflows
🌈 completed all 7 workflows  (no findings)

E2E inside the freshly-built flowmesh_worker:bandit-gpu image on an H200 host (GPU 2):

$ docker run --rm --gpus '"device=2"' --entrypoint /bin/bash -w /app \
    ghcr.io/mlsys-io/flowmesh_worker:bandit-gpu \
    -c "python -m worker.main --collect-hw"
{"cpu":{"logical_cores":512,"model":"x86_64"},"memory":{"total_bytes":2434389880832},
 "gpu":{"driver_version":"580.126.09","cuda_version":"13.0",
        "gpus":[{"index":0,"name":"NVIDIA H200 NVL",
                 "uuid":"GPU-d2ff8dc7-...","memory_total_bytes":150754820096}]},
 "network":{"ip":"172.17.0.2","bandwidth_bytes_per_sec":null}}
$ docker run --rm --gpus '"device=2"' --entrypoint /bin/bash -w /app \
    ghcr.io/mlsys-io/flowmesh_worker:bandit-gpu \
    -c "python -c 'from worker.power import PowerMonitor; import json, time; \
                   m = PowerMonitor(); m.sample(); time.sleep(1); \
                   print(json.dumps(m.sample(), indent=2))'"
{
  "timestamp": "2026-05-01T03:31:44.812848+00:00",
  "cpu_watts": null,
  "gpu_watts": {"total": 464.109, "per_gpu": [{"index": 0, "power_w": 464.109}]}
}
$ docker run --rm --entrypoint /bin/bash -w /app \
    ghcr.io/mlsys-io/flowmesh_worker:bandit-gpu \
    -c "python -m worker.main --collect-hw"
... (no GPU passthrough) ...
{"cpu":{...},"memory":{...},
 "gpu":{"driver_version":null,"cuda_version":null,"gpus":[]},
 "network":{...}}

Driver version, CUDA version, GPU name, UUID, and memory total all match the host's nvidia-smi. The no-GPU run returns an empty GpuPlatformInfo, matching the previous "nvidia-smi missing → empty list" path. PowerMonitor.sample() returns a real per-GPU watt reading from the live H200 (~464 W).


Pre-submission Checklist
  • I have read the contribution guidelines.
  • I have run pre-commit run --all-files and fixed any issues.
  • I have added or updated tests covering my changes (if applicable). New CI job is itself the test; e2e validation of the pynvml refactor is in Test Result.
  • I have verified that uv run pytest tests/ passes locally.
  • If I changed shared schemas or proto definitions, I have checked downstream compatibility across Server and Worker. N/A.
  • If I changed the SDK or CLI, I have verified the affected packages work. N/A.
  • If this is a breaking change, I have prefixed the PR title with [BREAKING]. Not breaking.
  • I have updated documentation or config examples if user-facing behavior changed. AGENTS.md + CONTRIBUTING.md.

timzsu added 2 commits April 30, 2026 16:10
Source-level fixes for every High and Medium bandit finding the audit
will surface once it is enabled in CI.

- B324: pass usedforsecurity=False to non-crypto MD5 calls
- B202: extract tar archives with filter='data'; iterate zip members
  with destination-bound validation instead of bare extractall
- B506: replace yaml.load(FullLoader) with yaml.safe_load
- B614: pass weights_only=True to torch.load
- B701: enable jinja2 select_autoescape on Environment construction
- B310: switch checkpoint downloader from urllib.urlopen to requests
  and reject schemes other than http/https
- B113: pass timeout to every requests call
- B108: build /tmp paths from tempfile.gettempdir() or PurePosixPath
  segments rather than as hardcoded string constants
- B607: replace nvidia-smi subprocess in worker.power with pynvml

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Adds a third security job alongside zizmor (workflow audit) and gitleaks
(committed-secret scan): bandit, run with no severity / confidence
threshold against src/.

Every rule that bandit raises has either been fixed in the previous
commit or is explicitly skipped in pyproject.toml's [tool.bandit]
section with a written rationale. Per-line # nosec is disallowed —
silencing a finding without a written rationale defeats the audit.

The full rule-by-rule walkthrough lives in the PR description.
AGENTS.md gains a "Security Rules (bandit-enforced)" subsection so
future contributors know which patterns to follow without having to
trigger CI to discover the constraint.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu changed the title chore(ci): add bandit Python source security audit chore: add bandit Python source security audit May 1, 2026
timzsu added 4 commits May 1, 2026 01:48
Removes the last subprocess shellout outside the trainer torchrun
launchers and the optional pigz/tar archive accelerator. `collect_hw`
now queries pynvml directly for driver / CUDA version, GPU enumeration,
and per-device memory totals — dropping the implicit `nvidia-smi`
$PATH dependency, the regex-based parsing of human-readable output,
and a layer of subprocess error swallowing.

Behavior on hosts without a GPU stack is unchanged: ImportError on
pynvml or NVMLError on init returns an empty GpuPlatformInfo, matching
the previous code's "no nvidia-smi → empty list" path.

Tightens the [tool.bandit] B404 rationale: the remaining subprocess
importers are the SFT/DPO/PPO trainer executors (torchrun, deferred)
and the model-archive packer (optional pigz/tar acceleration with a
python-tarfile fallback) — not docker/git, which already use SDKs.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
pynvml is pure Python (the C lib only loads on nvmlInit), so importing
it on a GPU-less host is harmless. Promote nvidia-ml-py from
training-gpu to worker-core — every worker now has it available — and
import it at module top in worker.power and worker.hw. The runtime
NVMLError handling for hosts without an actual GPU stack is preserved.

Also inlines the small _safe_str / _decode / _format_cuda_version
helpers that were only ever used once each into _collect_gpu_info.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
B404 (subprocess module imports) is no longer in [tool.bandit] skips.
The four remaining importers each get a `# nosec B404` paired with an
inline TODO naming the planned fix:

- sft/dpo/ppo trainer executors: replace torchrun shellout with
  in-process torch.distributed.run.main
- model-archive packer: drop the optional pigz/tar acceleration in
  favor of pure python tarfile

The audit policy in AGENTS.md gains an explicit clause: per-line
`# nosec BXXX` is allowed when paired with a TODO; a bare `# nosec`
without a rule code and a written reason is not.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
@timzsu timzsu marked this pull request as ready for review May 1, 2026 03:45
@timzsu timzsu requested a review from kaiitunnz May 1, 2026 03:58
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. PTAL.

Comment thread src/worker/executors/agent/utu/tools/codesnip_toolkit.py Outdated
Comment thread src/worker/executors/utils/checkpoints.py Outdated
Comment thread src/worker/executors/ssh_executor.py Outdated
Comment thread src/worker/hw.py Outdated
Comment thread src/worker/hw.py Outdated
Comment thread pyproject.toml
timzsu added 5 commits May 1, 2026 06:09
- codesnip_toolkit: load timeout from config (default 600s) instead of
  hardcoding 60s — code execution can be long-running
- checkpoints.download_and_unpack: chunk_size=64 * 1024 to match the
  rest of the worker's iter_content sites
- ssh_executor._FINISH_SENTINEL_PATH: use PurePosixPath.as_posix()
  instead of str() for consistency with the rest of the file
- hw.collect_hw: nvmlSystemGetCudaDriverVersion already returns int;
  drop the int(...) cast and lift the memory_info try/except up one
  level so it sits as a sibling of the handle/name/uuid try/except
- pyproject.toml [tool.bandit].skips: collapse each rationale onto a
  single inline trailing comment

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Pulls request timeouts from the toolkit's `config.config["timeout"]`
(matching `bash_toolkit`'s existing pattern) for the three remaining
agent toolkits whose timeouts were hardcoded by the B113 fix:
github (30s), wikipedia (30s), image (15s). Each toolkit YAML gains
the corresponding `timeout:` knob so the default is discoverable.

`FileUtils.download_file` and `FileUtils.get_file_md5` are static
helpers with no `self.config`, so they take an optional `timeout`
argument with the same default the B113 fix introduced (60s).

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
A single try/except NVMLError now wraps every pynvml call in
collect_hw — init, driver/CUDA queries, device enumeration, and
per-device probes. Any NVMLError anywhere along the chain bails out
and returns empty defaults. nvmlShutdown is dropped to match
worker.power's precedent (init lazily, let process exit reclaim).

Behavior change: a mid-loop failure now drops subsequent GPUs from
the list instead of skipping just the failing one. In practice, the
NVML state is binary — either healthy or not — so a partial GPU
listing in a degraded state is no more useful than an empty one.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
…power

Same shape as the worker/hw simplification: a single try/except
NVMLError now wraps the device count, handle lookup, and per-device
power query. Any NVMLError mid-loop returns whatever was sampled
before the failure rather than skipping just the failing device.

Also drops the redundant `int(...)` cast on
`pynvml.nvmlDeviceGetMemoryInfo(handle).total` in worker/hw — `.total`
already returns a Python int.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a few minor comments.

Comment thread src/worker/hw.py
Comment thread src/worker/power.py Outdated
…ndle cache write

- worker/hw.py: cast `pynvml.nvmlDeviceGetMemoryInfo(handle).total` back to
  int. pynvml ships no type stubs, so Pylance infers
  `<subclass of bytes and str> | str | Any` without the cast and warns at
  the GpuInfo construction site.
- worker/power._read_gpu_power: only assign `_nvml_handles[idx]` when we
  actually fetched a new handle, instead of writing back unconditionally
  on every sample.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Copy link
Copy Markdown
Collaborator

@kaiitunnz kaiitunnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@timzsu timzsu merged commit 18b9d45 into main May 1, 2026
9 checks passed
@timzsu timzsu deleted the zsu/bandit branch May 1, 2026 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants