v0.7.0

hexsecs released this 09 May 02:09

f257f71

Added

Added canarchy datasets stream for chunked streaming of downloaded dataset files to candump or JSONL, including JSONL provenance metadata for provider refs, frame offsets, and chunk positions so large datasets can be piped into analysis without buffering the full conversion in memory.
Added candid (CANdid) dataset to the built-in catalog: VehicleSec 2025 paper dataset with candump-format CAN logs from 10 passenger vehicles, plus annotations, GPS, metadata, and video. Ref: catalog:candid. Closes #225.
Added canarchy datasets replay for Netflix-style streaming playback from a direct candump URL or replayable dataset ref such as catalog:candid, with candump/JSONL stdout output, rate control, and JSON summary mode. Closes #233.
Added canarchy datasets replay --dry-run so operators and agents can resolve replay metadata for dataset refs or direct URLs without opening the remote stream. Closes #247.
Added canarchy datasets replay --max-seconds and JSONL replay provenance metadata so operators can time-bound remote replay and trace emitted frames back to their dataset ref or URL. Closes #245.
Added replay file manifests and canarchy datasets replay --list-files / --file <id-or-name> selection for replayable dataset refs such as catalog:candid. Closes #241.
Added candump as a public canarchy datasets stream --source-format option so downloaded candump logs can be streamed to candump or JSONL with provenance metadata. Closes #243.
Added stable machine-friendly dataset JSON fields (ref, is_replayable, is_index, default_replay_file, download_url_available, and source_type) for dataset search and inspect results. Closes #242.
Added MCP tools for dataset provider discovery, search, inspect, fetch, cache operations, and safe replay planning so agents can use dataset metadata without shelling out. Closes #239.
Added pivot-auto-datasets to the built-in catalog as a curated source index for CAN, CAN-FD, J1939, and broader automotive datasets listed by the PIVOT Project. Closes #235.
Improved human-readable dataset search and inspect output: an empty search shows "All datasets" instead of 'Datasets matching "all"', the search table includes a TYPE column (INDEX/PLAY), verbose output shows type labels and index notes, and inspect output is split into separate sections (Basic information, Format support, Source information) with clear replay URLs and index notes. Closes #244.
Clarified datasets fetch output for curated index entries: responses now include is_index field, index_instructions with guidance to visit the index page and discover datasets, and clearer human-readable messaging. Normal dataset fetch continues to use download_instructions. Closes #246.
Added stdin pipeline support for file-backed analysis commands: capture-info, stats, filter, and other commands now accept - as the file argument to read candump data from stdin. This enables piping datasets replay output directly into analysis commands without temporary files. Closes #238.
Exposed additional CLI commands as MCP tools: j1939 compare, j1939 faults, j1939 tp compare, re signals, datasets convert, datasets replay --list-files, and full skills provider/cache/search/fetch workflows. Closes #226.
Added canarchy datasets stream --max-frames so local downloaded dataset files can be streamed with a bounded frame count, matching safe replay workflows. Closes #270.
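
The streaming entries above share one idea: read candump lines lazily, stop after a bounded number of frames, and emit one JSON object per frame. The Python sketch below illustrates that pattern only; it is not CANarchy's implementation. It assumes the standard candump log line layout, "(timestamp) interface id#data", and the function names and JSONL field names (timestamp, interface, id, data, frame_offset) are hypothetical and do not reflect CANarchy's actual output schema.

```python
import itertools
import json
import re
from typing import Iterable, Iterator, Optional

# Classic candump log line, e.g. "(1436509052.249713) vcan0 044#2A366C2BBA".
_LINE = re.compile(r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#([0-9A-Fa-f]*)$")


def parse_candump(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily yield one dict per classic CAN frame found in `lines`."""
    frame_offset = 0
    for line in lines:
        match = _LINE.match(line.strip())
        if match is None:
            # Blank lines, RTR frames ("#R"), and CAN FD frames ("##")
            # are skipped in this simplified sketch.
            continue
        timestamp, interface, can_id, data = match.groups()
        yield {
            "timestamp": float(timestamp),
            "interface": interface,
            "id": int(can_id, 16),
            "data": data.lower(),
            "frame_offset": frame_offset,  # hypothetical provenance field
        }
        frame_offset += 1


def to_jsonl(lines: Iterable[str], max_frames: Optional[int] = None) -> Iterator[str]:
    """Convert candump lines to JSONL strings, stopping after `max_frames`."""
    frames = parse_candump(lines)
    # islice consumes the generator lazily, so the input is never buffered
    # in full; max_frames=None means "no limit".
    for frame in itertools.islice(frames, max_frames):
        yield json.dumps(frame, sort_keys=True)
```

Because both functions are generators, passing sys.stdin as `lines` processes a pipe one line at a time, which is the property that lets large inputs flow into analysis without holding the full conversion in memory.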

Fixed

Fixed the default datasets search human heading to show All datasets (N) instead of Datasets matching All datasets (N). Closes #267.
Made canarchy datasets provider list and canarchy datasets search <query> default to compact readable output instead of dumping raw Python result dictionaries, added --verbose for detailed search result blocks, and preserved explicit JSON output for automation.
Fixed MCP stats and filter tool argument mapping so they invoke the current CLI grammar (stats --file <file> and filter <expr> --file <file>) and return successful canonical envelopes. Closes #237.
Made canarchy datasets replay stop cleanly on closed stdout pipes instead of printing a Python BrokenPipeError traceback. Closes #240.
Made dataset replay return DATASET_INDEX_NOT_REPLAYABLE for curated dataset indexes so automation can distinguish index entries from other non-replayable datasets. Closes #242.
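
For readers unfamiliar with the closed-pipe case in the replay fix above, here is a minimal Python sketch of the general technique: catch BrokenPipeError while writing and stop quietly. It is an illustration under assumptions, not the code CANarchy uses; the function names emit_until_closed and silence_stdout are hypothetical.

```python
import os
import sys
from typing import Iterable, TextIO


def emit_until_closed(lines: Iterable[str], out: TextIO) -> int:
    """Write lines to `out`; stop quietly if the reader closes the pipe.

    Returns the number of lines written before the stream ended.
    """
    written = 0
    try:
        for line in lines:
            out.write(line + "\n")
            written += 1
        out.flush()
    except BrokenPipeError:
        # The downstream reader (for example `head`) went away.
        # Treat this as a normal end of output, not an error.
        pass
    return written


def silence_stdout() -> None:
    """Point stdout at os.devnull so the interpreter's flush at exit
    does not raise BrokenPipeError a second time."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
```

When the target is the real sys.stdout, calling silence_stdout() after a broken pipe follows the approach described in the Python documentation for SIGPIPE, since Python may otherwise report the error again while flushing at shutdown.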

Documentation

Polished the CANdid streaming tutorial to clarify replay output modes, source timestamps, optional shell helpers, and save-to-file examples. Closes #281.
Documented five security use cases for using CANarchy with coding agents, including a CAN/J1939 capture triage workflow and fixture-backed example. Closes #271.
Documented dataset stream and replay bounding semantics for agents, including the difference between frame limits and JSONL chunk metadata. Closes #268.
Audited and refreshed operator, agent, design, and test documentation for current dataset replay/fetch behavior, MCP dataset coverage, stdin pipeline semantics, and file-backed command grammar. Closes #259.
