v2.6.4 - Improved fuzzing infrastructure
[2.6.4] - 2026-04-30
Summary
Blue-Tap 2.6.4 turns the fuzzer into a research-grade tool: a typed Python API (`run_campaign`, `benchmark`, `MockTransport`, `CampaignResult`, `BenchmarkResult`) for driving campaigns from notebooks and CI, byte-level reproducibility through a `ContextVar`-scoped random source, an in-process mock transport that lets the full mutation/strategy/state-tracker pipeline run with zero hardware, and a new `fuzz benchmark` subcommand that runs N independent trials and aggregates per-metric statistics. Three CLI bugs are fixed along the way (sub-help dispatcher under target subcommand groups, `run-playbook` exit code on failure, and a root-gate bypass for test runners).
Researchers / scripting users: the new programmatic surface lives at `blue_tap.modules.fuzzing` (`run_campaign`, `benchmark`, `compare_campaigns`, `compare_benchmarks`, `CampaignResult`, `BenchmarkResult`, `MockTransport`). It bypasses `RunContext` / `RunEnvelope`, so it's a thin wrapper for ablation studies and not a replacement for the CLI / playbook flow.
Added — Programmatic research API
- `run_campaign(target, protocols, *, strategy, duration, max_iterations, session_dir, cooldown, seed, dry_run, random_source, trajectory_interval_seconds, ...) -> CampaignResult`: drives a single `FuzzCampaign` and returns a typed result. Catches `KeyboardInterrupt` (records `aborted=True`) and any other exception (records `error=...`) so batch experiments survive single bad runs. Surfaces engine-side terminal failures (`{"result": "error", "reason": "..."}`) into `CampaignResult.error` rather than returning an all-zero "successful" result.
- `benchmark(target, protocols, *, strategy, trials, base_seed, ...) -> BenchmarkResult`: runs N independent trials with seeds `base_seed, base_seed+1, …` and aggregates `crashes`, `crashes_per_kpkt`, `iterations`, `packets_sent`, `runtime_seconds`, and `states_discovered` into per-metric `(n, mean, stdev, median, min, max)` dicts. Errored / aborted trials are kept in `BenchmarkResult.trials` and counted separately so callers can decide whether to discard them.
- `compare_campaigns(a, b) -> CampaignDelta`: per-metric delta (positive = `b > a`).
- `compare_benchmarks(a, b) -> BenchmarkComparison`: Cohen's d on the per-trial `crashes_per_kpkt` series with pooled stdev, plus a conventional effect-size label (`negligible` / `small` / `medium` / `large`).
- `CampaignResult.to_csv(path)`: atomically writes trajectory rows (`elapsed_seconds`, `iterations`, `packets_sent`, `crashes`, `errors`, `states`, `transitions`) as CSV. An empty trajectory still produces a header-only file. Column order is fixed, missing keys become empty cells, and extra keys are dropped.
- `CampaignResult.to_json()` / `from_json()` and `BenchmarkResult.to_json()` / `from_json()`: round-trip every field except `raw_summary` (which is omitted from `to_dict` by design). `BenchmarkResult.from_dict` rejects compact-form payloads (no `trials` key) because aggregate stats alone can't reconstruct per-trial state.
- `MockTransport`: in-process subclass of `BluetoothTransport` used under `dry_run=True`. Keeps sent payloads in a bounded `collections.deque` (default 64), validates `send_buffer_len ≥ 1`, rejects non-`bytes` sends, validates that response factories return `bytes`, and swallows factory exceptions, surfacing them as `recv() → None`. The thread-safety contract is documented.
- `list_strategies()` / `list_protocols()`: typed introspection of registered strategies and protocols for menu-driven UIs.
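The per-metric aggregation and the Cohen's d comparison described above can be sketched in plain Python. This is an illustrative sketch, not Blue-Tap's implementation: the function names `summarize`, `cohens_d`, and `effect_label` are hypothetical, the 0.2 / 0.5 / 0.8 cut-offs are the conventional thresholds and are assumed rather than confirmed for this release, and the sign convention (positive means `b > a`) is borrowed from `compare_campaigns`.

```python
import math
import statistics


def summarize(values):
    """Aggregate one metric across trials into an (n, mean, stdev, median, min, max) dict."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        # Sample stdev needs at least two points; report 0.0 for a single trial.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def cohens_d(a, b):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two trials on each side")
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    diff = statistics.fmean(b) - statistics.fmean(a)  # assumed sign: positive = b > a
    if pooled_var == 0.0:
        # Assumption: zero variance on both sides gives 0 or a signed infinity.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| to a conventional label (assumed thresholds: 0.2 / 0.5 / 0.8)."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"
    if magnitude < 0.5:
        return "small"
    if magnitude < 0.8:
        return "medium"
    return "large"
```

With two made-up `crashes_per_kpkt` series `[1.0, 2.0, 3.0]` and `[2.0, 3.0, 4.0]`, both sample variances are 1.0, the pooled stdev is 1.0, and the sketch yields d = 1.0, which falls under the `large` label.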
Added — Engine reproducibility plumbing
- `blue_tap.modules.fuzzing._random` module: canonical random-source mechanism for the fuzzing tree. Holds the active source in a `contextvars.ContextVar` so concurrent campaigns in different threads / async contexts don't cross-contaminate.
- `random_bytes(n)`: replaces every `os.urandom(n)` call site in `mutators.py`, `engine.py`, `protocols/{att,bnep,lmp,smp}.py`, and `strategies/{coverage_guided,random_walk,state_machine,targeted}.py`. Validates `n ≥ 0` at the boundary.
- `set_random_source(callable)` context manager: installs a pluggable `Callable[[int], bytes]` for the lifetime of the `with` block. Restores the previous source on exit (including exception paths). Validates that the source is callable.
- `derive_random_source_from_seed(seed)`: canonical seed → byte-source mapping shared by both `run_campaign(seed=…)` and the `fuzz campaign --seed` CLI flag, so the two entry points are guaranteed to produce identical streams for the same seed.
- Byte-level reproducibility contract. `seed=N` (or `BLUE_TAP_FUZZ_SEED=N`) seeds the global `random` module and installs a `random.Random(N).randbytes`-backed source through the `ContextVar`. Every strategy / mutator / protocol builder reads bytes through that single source, so two runs with the same seed produce byte-identical fuzz payloads, not just statistically similar ones. Wall-clock fields (`runtime_seconds`, `packets_per_second`) are explicitly not part of the contract.
- `l2cap_raw.py` two-layer split. `_STRUCTURAL_L2CAP_RAW_FUZZ_TESTS` is a module-level constant (built once at import); `_random_l2cap_raw_fuzz_tests()` is a function that re-evaluates the random-data entries against the currently active random source. `generate_all_l2cap_sig_fuzz_cases()` takes one coherent snapshot of the test list and threads it through a new `_frames_matching_in(tests, prefixes)` helper, so the dedup pass keeps exactly one `echo_oversized` variant per call instead of multiple independently randomised variants.
Added — fuzz campaign flags
- `--dry-run`: runs the full mutation / strategy / state-tracker pipeline against `MockTransport`. Bypasses the pre-loop `l2ping` reachability check, skips crash-recovery liveness probes, pins response latency to `0.0` (so the response analyzer's clustering is also deterministic), and accepts a placeholder target if none is supplied. Intended for CI smoke tests and reproducibility sweeps. Disables `--capture` automatically and is rejected in combination with `--resume`.
- `--seed N`: integer seed for byte-level reproducible mutations. Falls back to `BLUE_TAP_FUZZ_SEED` if unset. Rejected in combination with `--resume` (the seed isn't part of the persisted campaign state, and silently ignoring it would mislead the operator). Hex (`0x2a`), octal (`0o52`), and decimal literals are all accepted.
- `--trajectory-interval SECONDS`: samples `(elapsed_seconds, iterations, packets_sent, crashes, errors, states, transitions)` at most once per `N` seconds inside the main loop; results land in `CampaignResult.trajectory`. Required for non-empty `to_csv` output.
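The "at most once per interval" sampling behind `--trajectory-interval` can be sketched as a small rate-limited recorder. The class name `TrajectorySampler`, its method names, and the injectable `clock` parameter are hypothetical and exist only for illustration. How the engine actually hooks sampling into its main loop is not described in these notes.

```python
import time
from typing import Callable, Dict, List, Optional


class TrajectorySampler:
    """Record counter snapshots no more often than once per interval."""

    def __init__(
        self,
        interval_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be > 0")
        self._interval = interval_seconds
        self._clock = clock
        self._start = clock()
        self._last: Optional[float] = None
        self.rows: List[Dict[str, float]] = []

    def maybe_sample(self, counters: Dict[str, int]) -> bool:
        """Append a row if the interval has elapsed since the last sample."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False  # too soon, skip this iteration
        self._last = now
        self.rows.append({"elapsed_seconds": now - self._start, **counters})
        return True
```

Calling `maybe_sample` on every loop iteration keeps the cost low: most calls return immediately, and the number of rows is bounded by the campaign duration divided by the interval rather than by the iteration count. A monotonic clock is used so wall-clock adjustments cannot produce negative gaps.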
Added — fuzz benchmark subcommand
- `blue-tap fuzz benchmark TARGET [-p PROTO]... [-s STRATEGY] -t TRIALS (-d DURATION | -n ITERATIONS) [--base-seed N] [--label TEXT] [-o BENCH.json] [--csv-dir DIR] [--cooldown N] [--dry-run] [--trajectory-interval SECONDS]`: runs N independent trials of the same configuration and aggregates per-metric stats. Prints a Rich summary table (`crashes`, `crashes_per_kpkt`, `iterations`, `packets_sent`, `runtime_seconds`, `states_discovered`; one row each as `n / mean / stdev / min / max`), atomically writes the round-trippable `BenchmarkResult` JSON when `-o` is given, and writes per-trial `trial_{i}.csv` trajectory files into `--csv-dir` (paired with `--trajectory-interval` to make the rows non-empty). Aborted / errored trials still appear in the aggregate but are also surfaced as a separate operator warning. Logs to the active session under `category="fuzz"`.
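Atomic writes of the kind mentioned for the benchmark JSON and the trajectory CSVs are commonly done by writing to a temporary file in the destination directory and then renaming it over the target. The helper below is a sketch of that general technique. The name `atomic_write_text` is hypothetical, and nothing here is taken from Blue-Tap's source.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text: str) -> None:
    """Write text so readers see either the old file or the complete new one."""
    path = Path(path)
    # The temp file must live in the same directory as the target so that
    # os.replace stays on one filesystem. A missing parent directory makes
    # mkstemp raise FileNotFoundError.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)  # atomic rename over the destination
    except BaseException:
        # Leave no temp file behind if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

An interrupted run therefore never leaves a half-written `BENCH.json` behind: the destination is replaced only after the new contents are fully on disk.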
Added — Environment variables
- `BLUE_TAP_FUZZ_SEED`: default seed for `run_campaign`, `fuzz campaign --seed`, and `fuzz benchmark --base-seed` when no explicit value is passed. Accepts decimal, `0x`-hex, or `0o`-octal. Validated at the boundary in both Python (`ValueError`) and the CLI (`click.BadParameter`). For `benchmark`, the env var resolves to `base_seed` so successive trials still get distinct seeds (`base, base+1, base+2, …`) instead of the same seed being applied identically to every trial.
- `BLUE_TAP_SKIP_ROOT_CHECK=1`: bypasses the root and RTL8761B chipset gates in the CLI. Intended for test runners and CI smoke tests under `--dry-run`. Picked up by both `_check_privileges()` and `_check_rtl_dongle()` in `interfaces/cli/main.py` and pre-set in `tests/conftest.py` so the pytest suite can run without `sudo`.
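Seed resolution along these lines can be sketched with `int(text, 0)`, which lets Python infer the base from the literal prefix. The helper names `parse_seed`, `resolve_seed`, and `trial_seeds` are hypothetical. The rule that an explicit value wins over the environment is an assumption read from "when no explicit value is passed", and `int(..., 0)` is slightly more permissive than the documented set (it also accepts `0b` binary and underscore separators).

```python
import os
from typing import List, Mapping, Optional

ENV_VAR = "BLUE_TAP_FUZZ_SEED"


def parse_seed(text: str) -> int:
    """Parse a decimal, 0x-hex, or 0o-octal literal into an int."""
    try:
        # Base 0 infers the base from the prefix, as in Python source code.
        return int(text.strip(), 0)
    except ValueError as exc:
        raise ValueError(
            f"invalid seed {text!r}: expected decimal, 0x-hex, or 0o-octal"
        ) from exc


def resolve_seed(
    explicit: Optional[int] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[int]:
    """Prefer the explicit seed; otherwise fall back to the environment variable."""
    if explicit is not None:
        return explicit
    raw = environ.get(ENV_VAR)
    return parse_seed(raw) if raw is not None else None


def trial_seeds(base_seed: int, trials: int) -> List[int]:
    """Give each benchmark trial its own seed: base, base+1, base+2, ..."""
    return [base_seed + i for i in range(trials)]
```

Treating the environment value as the base seed rather than as a per-trial seed is what keeps trials independent: with `base_seed=7` and three trials, the sketch hands out seeds 7, 8, and 9.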
Fixed — CLI
- `TargetSubcommandGroup` no longer mis-parses target + subcommand combinations. `blue-tap recon AA:BB:CC:DD:EE:FF sdp --help` previously produced `No such command ''.` because the empty-`TARGET` placeholder was injected even after a real positional target had been seen, pushing the subcommand into the wrong slot. The peeker now tracks `seen_positional` and only injects the placeholder when the subcommand name is the first positional. Value-flag detection runs before the positional check so `--protocol sdp recon` no longer treats `sdp` as a positional. Verified: `recon` / `exploit` / `extract` sub-help dispatches now exit 0 with the correct help text.
- `run-playbook` exit code on partial failure. A 1-of-1 failed playbook used to print `✖ Failed: 1` and exit 0 because the `log_command` call swallowed the failure. The runner now raises `SystemExit(1)` after logging when any step fails, so CI can rely on the exit code.
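The `run-playbook` fix amounts to "log first, then fail the process". A minimal sketch of that ordering follows. The function name `finish_playbook` and its parameters are hypothetical, and the real runner's logging call (`log_command`) is replaced here by an injectable `log` callable.

```python
from typing import Callable, Sequence


def finish_playbook(
    step_ok: Sequence[bool],
    log: Callable[[str], None] = print,
) -> None:
    """Report the outcome, then exit non-zero if any step failed."""
    failed = sum(1 for ok in step_ok if not ok)
    # Logging happens before the exit so the failure is always recorded.
    log(f"Passed: {len(step_ok) - failed}")
    log(f"Failed: {failed}")
    if failed:
        # SystemExit(1) becomes process exit code 1 unless something catches it.
        raise SystemExit(1)
```

A CI job can then gate on the process exit status instead of scraping the printed summary.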
Changed — CLI defaults
- `fuzz campaign` and `fuzz benchmark` share the same default protocol set. Both default to `-p all` (all 16 registered protocols). Earlier benchmark drafts defaulted to `-p sdp` for variance-on-one-surface use cases; that choice is left to the operator (`-p sdp -p rfcomm`) rather than baked into the CLI default, matching the existing `fuzz campaign` behaviour.
Tests
- `tests/test_fuzzing_research_api.py` (new): 52 tests covering the research API surface: `CampaignResult` / `CampaignDelta` / `BenchmarkResult` / `BenchmarkComparison` shape and round-trip; `MockTransport` validation; `dry_run` end-to-end via `run_campaign` and `benchmark`; engine-error → `CampaignResult.error` propagation; `set_random_source` restore-on-exception; `random_bytes` non-negative-length validation; byte-level reproducibility (same seed → same protocol breakdown across runs, different seeds → different breakdowns); `BLUE_TAP_FUZZ_SEED` env-var resolution including precedence over and conflict with explicit seed kwargs; `derive_random_source_from_seed` determinism; `to_csv` empty/full trajectory, fixed column order, atomic write (no temp leftovers), missing-parent-dir error.
- `tests/test_fuzz_cli_v2_6_4.py` (new): CLI integration tests for the new flags: `--dry-run` + `--seed` reproducibility through the real Click command, `--resume` × `--dry-run` rejection, `--resume` × `--seed` rejection, `--dry-run` automatically disabling `--capture`, env-var seed (`0xdeadbeef`) flowing through to `Seed locked: ...`, env-var validation rejecting non-integer values, full `fuzz benchmark` round-trip (JSON + per-trial CSVs + summary table), `--csv-dir` warning when `--trajectory-interval` is missing, mutually exclusive `--duration` / `--iterations` validation, env-base-seed driving aggregate-stat reproducibility across two benchmark runs, `--label` flowing through to `BenchmarkResult.label`.
- Test counts: 442 → 456 passing, 1 skipped, 0 failing. Lint (`ruff`) clean.