chore: add bandit Python source security audit #10
Merged
Conversation
Source-level fixes for every High and Medium bandit finding the audit will surface once it is enabled in CI.

- B324: pass `usedforsecurity=False` to non-crypto MD5 calls
- B202: extract tar archives with `filter='data'`; iterate zip members with destination-bound validation instead of bare `extractall`
- B506: replace `yaml.load(FullLoader)` with `yaml.safe_load`
- B614: pass `weights_only=True` to `torch.load`
- B701: enable jinja2 `select_autoescape` on `Environment` construction
- B310: switch the checkpoint downloader from `urllib.urlopen` to `requests` and reject schemes other than http/https
- B113: pass `timeout` to every `requests` call
- B108: build `/tmp` paths from `tempfile.gettempdir()` or `PurePosixPath` segments rather than as hardcoded string constants
- B607: replace the `nvidia-smi` subprocess in `worker.power` with pynvml

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Adds a third security job alongside zizmor (workflow audit) and gitleaks (committed-secret scan): bandit, run with no severity / confidence threshold against `src/`.

Every rule that bandit raises has either been fixed in the previous commit or is explicitly skipped in `pyproject.toml`'s `[tool.bandit]` section with a written rationale. Per-line `# nosec` is disallowed — silencing a finding without a written rationale defeats the audit. The full rule-by-rule walkthrough lives in the PR description.

AGENTS.md gains a "Security Rules (bandit-enforced)" subsection so future contributors know which patterns to follow without having to trigger CI to discover the constraint.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Removes the last subprocess shellout outside the trainer torchrun launchers and the optional pigz/tar archive accelerator. `collect_hw` now queries pynvml directly for driver / CUDA version, GPU enumeration, and per-device memory totals — dropping the implicit `nvidia-smi` $PATH dependency, the regex-based parsing of human-readable output, and a layer of subprocess error swallowing.

Behavior on hosts without a GPU stack is unchanged: ImportError on pynvml or NVMLError on init returns an empty GpuPlatformInfo, matching the previous code's "no nvidia-smi → empty list" path.

Tightens the `[tool.bandit]` B404 rationale: the remaining subprocess importers are the SFT/DPO/PPO trainer executors (torchrun, deferred) and the model-archive packer (optional pigz/tar acceleration with a python-tarfile fallback) — not docker/git, which already use SDKs.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
pynvml is pure Python (the C lib only loads on nvmlInit), so importing it on a GPU-less host is harmless. Promote nvidia-ml-py from training-gpu to worker-core — every worker now has it available — and import it at module top in worker.power and worker.hw. The runtime NVMLError handling for hosts without an actual GPU stack is preserved.

Also inlines the small `_safe_str` / `_decode` / `_format_cuda_version` helpers that were only ever used once each into `_collect_gpu_info`.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
B404 (subprocess module imports) is no longer in `[tool.bandit]` skips. The four remaining importers each get a `# nosec B404` paired with an inline TODO naming the planned fix:

- sft/dpo/ppo trainer executors: replace the torchrun shellout with in-process `torch.distributed.run.main`
- model-archive packer: drop the optional pigz/tar acceleration in favor of pure-Python `tarfile`

The audit policy in AGENTS.md gains an explicit clause: per-line `# nosec BXXX` is allowed when paired with a TODO; a bare `# nosec` without a rule code and a written reason is not.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
kaiitunnz requested changes on May 1, 2026
kaiitunnz (Collaborator) left a comment:
Minor comments. PTAL.
- codesnip_toolkit: load timeout from config (default 600 s) instead of hardcoding 60 s — code execution can be long-running
- checkpoints.download_and_unpack: `chunk_size=64 * 1024` to match the rest of the worker's `iter_content` sites
- ssh_executor._FINISH_SENTINEL_PATH: use `PurePosixPath.as_posix()` instead of `str()` for consistency with the rest of the file
- hw.collect_hw: `nvmlSystemGetCudaDriverVersion` already returns int; drop the `int(...)` cast and lift the memory_info try/except up one level so it sits as a sibling of the handle/name/uuid try/except
- pyproject.toml `[tool.bandit].skips`: collapse each rationale onto a single inline trailing comment

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
Pulls request timeouts from the toolkit's `config.config["timeout"]` (matching `bash_toolkit`'s existing pattern) for the three remaining agent toolkits whose timeouts were hardcoded by the B113 fix: github (30s), wikipedia (30s), image (15s). Each toolkit YAML gains the corresponding `timeout:` knob so the default is discoverable. `FileUtils.download_file` and `FileUtils.get_file_md5` are static helpers with no `self.config`, so they take an optional `timeout` argument with the same default the B113 fix introduced (60s). Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
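A hypothetical sketch of the pattern described above: the toolkit reads its request timeout from config, with the default that the toolkit YAML makes discoverable, instead of hardcoding it at each call site. The class name and config shape are illustrative:

```python
class GithubToolkit:  # illustrative; mirrors bash_toolkit's existing config pattern
    DEFAULT_TIMEOUT = 30  # seconds, matching the `timeout:` knob in the toolkit YAML

    def __init__(self, config: dict):
        # Fall back to the discoverable default when the YAML omits the knob
        self.timeout = config.get("timeout", self.DEFAULT_TIMEOUT)
```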
A single try/except NVMLError now wraps every pynvml call in collect_hw — init, driver/CUDA queries, device enumeration, and per-device probes. Any NVMLError anywhere along the chain bails out and returns empty defaults. nvmlShutdown is dropped to match worker.power's precedent (init lazily, let process exit reclaim).

Behavior change: a mid-loop failure now drops subsequent GPUs from the list instead of skipping just the failing one. In practice, the NVML state is binary — either healthy or not — so a partial GPU listing in a degraded state is no more useful than an empty one.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
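A hedged sketch of that final collect_hw shape: one try/except NVMLError around every pynvml call, with ImportError or any NVMLError returning the empty default. The `GpuInfo`/`GpuPlatformInfo` dataclasses here are stand-ins for the worker's real types:

```python
from dataclasses import dataclass, field

@dataclass
class GpuInfo:  # stand-in for the worker's real GpuInfo
    name: str
    memory_total: int

@dataclass
class GpuPlatformInfo:  # stand-in; empty defaults mirror the "no GPU stack" path
    driver_version: str = ""
    cuda_version: str = ""
    gpus: list = field(default_factory=list)

def collect_gpu_info() -> GpuPlatformInfo:
    try:
        import pynvml
    except ImportError:
        return GpuPlatformInfo()
    try:
        pynvml.nvmlInit()  # no nvmlShutdown: init lazily, let process exit reclaim
        info = GpuPlatformInfo(
            driver_version=str(pynvml.nvmlSystemGetDriverVersion()),
            cuda_version=str(pynvml.nvmlSystemGetCudaDriverVersion()),
        )
        for idx in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
            info.gpus.append(GpuInfo(
                name=str(pynvml.nvmlDeviceGetName(handle)),
                memory_total=int(pynvml.nvmlDeviceGetMemoryInfo(handle).total),
            ))
        return info
    except pynvml.NVMLError:
        return GpuPlatformInfo()  # any failure along the chain → empty defaults
```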
…power

Same shape as the worker/hw simplification: a single try/except NVMLError now wraps the device count, handle lookup, and per-device power query. Any NVMLError mid-loop returns whatever was sampled before the failure rather than skipping just the failing device.

Also drops the redundant `int(...)` cast on `pynvml.nvmlDeviceGetMemoryInfo(handle).total` in worker/hw — `.total` already returns a Python int.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
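The matching worker.power shape can be sketched as below; `nvml` is the pynvml module, passed in here only so the loop is demonstrable without a GPU (the real code imports it at module top). `nvmlDeviceGetPowerUsage` reports milliwatts, hence the division:

```python
def sample_power_watts(nvml) -> list[float]:
    """Per-GPU power draw in watts; hypothetical stand-in for PowerMonitor.sample()."""
    readings: list[float] = []
    try:
        for idx in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(idx)
            readings.append(nvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW → W
    except nvml.NVMLError:
        pass  # mid-loop failure: return whatever was sampled before it
    return readings
```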
kaiitunnz requested changes on May 1, 2026
kaiitunnz (Collaborator) left a comment:
Still a few minor comments.
…ndle cache write

- worker/hw.py: cast `pynvml.nvmlDeviceGetMemoryInfo(handle).total` back to int. pynvml ships no type stubs, so Pylance infers `<subclass of bytes and str> | str | Any` without the cast and warns at the GpuInfo construction site.
- worker/power._read_gpu_power: only assign `_nvml_handles[idx]` when we actually fetched a new handle, instead of writing back unconditionally on every sample.

Signed-off-by: Zhengyuan Su <su.zhengyuan@u.nus.edu>
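The handle-cache fix can be sketched as a look-up-first pattern: fetch and write back only on a miss, rather than reassigning the cache slot on every sample. Names are hypothetical; `fetch` stands in for the NVML handle lookup:

```python
_nvml_handles: dict[int, object] = {}

def get_handle(idx: int, fetch):
    handle = _nvml_handles.get(idx)
    if handle is None:
        handle = fetch(idx)          # pynvml.nvmlDeviceGetHandleByIndex in the real code
        _nvml_handles[idx] = handle  # assign only when we actually fetched a new one
    return handle
```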
Purpose
Adds bandit (Python source security audit) as the third leg of FlowMesh's security CI #6. The job runs with no severity / confidence threshold: every finding either has a source-level fix, a documented skip in `pyproject.toml`'s `[tool.bandit]` section with a written rationale, or a per-line `# nosec BXXX` paired with an inline TODO that names the planned fix. A bare `# nosec` (no rule code, no written reason) is disallowed.

Changes
- `src/...` (16 files) — source-level fixes for every High/Medium bandit finding (commit d43ee03).
- `src/worker/power.py`, `src/worker/hw.py` — replace the `nvidia-smi` subprocess shellouts with direct pynvml calls. `power.py` (commit d43ee03) was the original B607 site; `hw.py` (commit de9f63f, inlined into `collect_hw` in 21b5c91) was a B404/B603 site that also benefited from dropping the regex-based parsing of human-readable `nvidia-smi` output and the implicit `nvidia-smi` $PATH dependency.
- `pyproject.toml` — adds `[tool.bandit]` with the skip list documented inline; promotes `nvidia-ml-py` from `training-gpu` to `worker-core` (the package is pure Python; the C lib only loads on `nvmlInit`, so importing on a CPU-only worker is harmless) and adds `pynvml` to the `follow_untyped_imports` mypy override since `nvidia-ml-py` ships no `py.typed` marker.
- `.github/workflows/security.yml` — adds the `bandit` job pinned to `bandit==1.9.4`, run via `uvx bandit -c pyproject.toml -r src/`.
- `AGENTS.md` — adds a "Security Rules (bandit-enforced)" subsection so new contributors know which patterns to follow without first triggering CI.
- `CONTRIBUTING.md` — adds bandit to the Code Style tools table.

Design
Why no severity/confidence threshold
Threshold-based filtering hides findings without forcing a decision. We want every rule classified once — fix, skip-with-reason, or follow-up issue — and bandit clean from then on. New unclassified findings should fail CI until a contributor either fixes them or argues the skip.
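The skip-with-reason outcome lands in `pyproject.toml`. A hypothetical fragment of the shape (rule codes and rationales are taken from this PR's walkthrough; the single-inline-comment style follows the review request):

```toml
[tool.bandit]
# Each skip carries its rationale as one inline trailing comment.
skips = [
  "B101",  # assert: internal invariants only; we never run with -O
  "B110",  # try/except/pass: best-effort cleanup and shutdown paths
  "B603",  # subprocess.run with an argv list, no shell=True; call sites audited manually
]
```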
Rule-by-rule walkthrough
For each rule bandit flagged on `src/`, here is what happened and why.

Fixed in source (this PR)
- B324: `hashlib.md5(...)` without explicit `usedforsecurity` → pass `usedforsecurity=False`. All four call sites are cache-key / fingerprint generators (worker selector jitter, tool cache key, file MD5 fingerprint); no security boundary.
- B202: `tarfile.extractall` / `zipfile.extractall` on untrusted archives → tarfile: `filter='data'` (Python 3.12+, drops links/devices, blocks path traversal); zipfile: iterate `infolist()`, validate each member resolves under the destination, extract per-member.
- B506: `yaml.load(..., Loader=yaml.FullLoader)` → `yaml.safe_load`. The supervisor worker config has no need for arbitrary tag construction.
- B614: `torch.load(...)` without `weights_only=True` → pass `weights_only=True`. The image-embedding loader receives tensors only; pickle deserialization would be RCE-on-untrusted-input.
- B701: `jinja2.Environment(...)` without autoescape → `autoescape=jinja2.select_autoescape()`. Both call sites currently render LLM prompts (non-HTML) where escaping is a no-op, but a future contributor copying these constructors for HTML output would be silently unsafe.
- B310: `urllib.request.urlopen(...)` allows `file://` and custom schemes → switch to `requests` and reject any URL scheme other than http/https. As a side effect, this removes the only `urllib.request` dependency in `checkpoints.py`.
- B113: `requests.get`/`post`/... with no `timeout=` → pass `timeout=` to every call. Hung connections are a denial-of-service; no implicit defaults.
- B108: hardcoded `/tmp/...` literals → `tempfile.gettempdir()` / `os.path.join(tempfile.gettempdir(), ...)`. The one exception is `ssh_executor._FINISH_SENTINEL_PATH`, which is intentionally a path inside the SSH container and must remain `/tmp/.flowmesh_finish` (the contract with `ssh-run.sh` / `ssh-session.sh`); it is now constructed via `PurePosixPath("/", "tmp", ".flowmesh_finish")` so no string literal in the AST starts with `/tmp/`.
- B607: `nvidia-smi` shellout in `worker.power` and `worker.hw` → call pynvml directly (`nvmlInit`, `nvmlSystemGetDriverVersion`, `nvmlSystemGetCudaDriverVersion`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetMemoryInfo`, `nvmlDeviceGetPowerUsage`). `nvidia-ml-py` moves from `training-gpu` to `worker-core` so every worker has it; runtime `NVMLError` handling preserves the previous "no GPU stack → empty result" behavior.

Skipped with rationale (`[tool.bandit]` in `pyproject.toml`)

- B101: `assert` is used for internal invariants. We don't run with `-O`, so asserts execute. Replacing with `raise` would clutter paths where the precondition cannot fail under normal control flow.
- B102: `exec` lives in agent worker sandboxes (Python executor toolkit) where running user-supplied code is the feature. Sandbox isolation lives at the container boundary, not the call site.
- B104: `0.0.0.0` bind. Server / worker entrypoints intentionally listen on all interfaces inside their containers; network exposure is controlled by the container/orchestration layer.
- B110: `try/except/pass`. Used in best-effort cleanup, optional metric collection, and shutdown paths where failure is intentionally swallowed to avoid masking the original error.
- B112: `try/except/continue`. Same reasoning as B110, in iteration contexts where one bad input must not abort the loop.
- B307: `eval` is confined to the agent Python executor toolkit. Same reasoning as B102 — the guarantee is the sandbox, not the call site.
- B603: `subprocess.run` with a list argument. Allowed because each call site uses an argv list (no `shell=True`) with controlled arguments; we audit the individual call sites manually.
- `huggingface_hub` downloads without a pinned revision. Models / datasets are user-supplied workflow inputs; pinning is the user's call, not ours. The framework cannot meaningfully pin on their behalf.

Tagged with `# nosec B404` + TODO (not skipped globally)

B404 (`import subprocess`) is not in `[tool.bandit]` skips. The four remaining importers each carry an inline `# nosec B404 — TODO: ...` that names the planned fix, so the audit stays loud about new subprocess imports while letting these specific sites land:

- `sft_executor`, `dpo_executor`, `ppo_executor` — torchrun launchers; the TODO is to replace the shellout with in-process `torch.distributed.run.main`. Out of scope here because it's a meaningful rewrite of the distributed-training entry path.
- `worker/executors/utils/checkpoints.py` — the optional `pigz`/`tar` acceleration in `archive_model_dir`; the TODO is to drop it in favor of pure-Python `tarfile`. It already falls back to Python `tarfile` when the binaries are absent, so removal is mostly a perf decision.

Test Plan
- `uvx bandit -c pyproject.toml -r src/` — must report `No issues identified`.
- `uvx --from zizmor==1.24.1 zizmor --persona pedantic --format github .github/workflows`
- `uv run pre-commit run --all-files`
- `uv run pytest tests/ --ignore=tests/worker/test_mp_executor_cleanup_gpu.py`
- The `bandit` CI job must pass on this PR.
- Verified the pynvml refactor (the riskiest change, since neither `worker.hw.collect_hw` nor `worker.power.PowerMonitor` has unit-test coverage) by running the built `flowmesh_worker:bandit-gpu` image with and without `--gpus`, confirming the outputs match what the previous `nvidia-smi`-based code produced and that the no-GPU fallback still returns an empty `GpuPlatformInfo`.

Test Result
E2E inside the freshly-built `flowmesh_worker:bandit-gpu` image on an H200 host (GPU 2): driver version, CUDA version, GPU name, UUID, and memory total all match the host's `nvidia-smi`. The no-GPU run returns an empty `GpuPlatformInfo`, matching the previous "nvidia-smi missing → empty list" path. `PowerMonitor.sample()` returns a real per-GPU watt reading from the live H200 (~464 W).

Pre-submission Checklist
- Ran `pre-commit run --all-files` and fixed any issues.
- `uv run pytest tests/` passes locally.
- `[BREAKING]` tag: not needed; this PR is not breaking.