v1.0.6 — Security hardening release
Security hardening release. Closes 8 actionable findings from the 2026-04-27 multi-tool audit (semgrep + trivy + gitleaks + bandit + pip-audit + manual review). No public API changes; behavior unchanged for legitimate inputs.
Highlights
- F1 (HIGH): new `any2md/_http.py` does manual redirect walking with per-hop host revalidation; defeats DNS rebind and redirect-based SSRF. Applied to URL fetching, HEAD-for-Last-Modified, and arxiv lookup.
- F5 (MED): `--meta` integrity. Reserved keys (`content_hash`, `extracted_via`, `source_file`, `lane`, `token_estimate`, `recommended_chunk_level`) are now filtered out, with a WARN on stderr.
- F3 (MED): TOML auto-discovery is now bounded at project markers (`.git`, `pyproject.toml`, etc.). New `--no-config` flag.
- F4 (MED): DOCX zip-bomb guard, a 1 MB cap on the declared uncompressed size of `core.xml`/`app.xml`.
- F-CVE: bumps `lxml>=6.1.0` (CVE-2026-41066), `pillow>=12.2.0` (CVE-2026-25990, CVE-2026-40192), `urllib3>=2.6.3` (CVE-2026-21441). All dependency ranges now have upper bounds.
- F6/F2/F11: atomic-write + pre-existing-symlink rejection across all converters; control-char sanitizer for forwarded Docling warnings; PDF stem hardening for image dir.
- XXE defense-in-depth via `defusedxml`.
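
To illustrate the kind of check F4 describes, here is a minimal sketch of a declared-size guard. It assumes the guarded members sit at the standard OOXML paths `docProps/core.xml` and `docProps/app.xml`; the names `oversized_members`, `GUARDED_MEMBERS`, and `MAX_DECLARED_BYTES` are hypothetical and not any2md's actual API.

```python
import io
import zipfile

# 1 MB cap, matching the figure quoted for F4.
MAX_DECLARED_BYTES = 1 * 1024 * 1024

# Assumption: standard OOXML locations of the metadata parts.
GUARDED_MEMBERS = ("docProps/core.xml", "docProps/app.xml")


def oversized_members(docx_bytes: bytes, cap: int = MAX_DECLARED_BYTES) -> list[str]:
    """Return guarded members whose declared uncompressed size exceeds `cap`.

    Only the zip central directory is read, so nothing is decompressed
    before the decision is made.
    """
    flagged = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for info in zf.infolist():
            # ZipInfo.file_size is the uncompressed size declared in the archive.
            if info.filename in GUARDED_MEMBERS and info.file_size > cap:
                flagged.append(info.filename)
    return flagged
```

A caller would typically refuse to parse metadata from any archive for which this returns a non-empty list. Because the size is only what the archive declares, a real guard might also cap the bytes actually read.
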
Audit summary
| ID | Title | Severity |
|---|---|---|
| F1 | SSRF: DNS rebind + redirect bypass | High |
| F5 | --meta clobbers reserved frontmatter | Medium |
| F3 | TOML discovery walks above project root | Medium |
| F4 | DOCX zip-bomb amplification | Medium |
| F-CVE | Transitive deps with known CVEs | Medium |
| F6 | Symlink-following on output writes | Low |
| F2 | Control-char passthrough in Docling logs | Low |
| F11 | PDF stem `..` corner case | Low |
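
For F6, the general pattern of an atomic write with pre-existing-symlink rejection can be sketched as follows. This is an illustration under stated assumptions, not the code shipped in the converters; `atomic_write_text` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write `text` to `path` atomically, refusing a pre-existing symlink."""
    # Reject the target if it is already a symlink (even a dangling one).
    if path.is_symlink():
        raise OSError(f"refusing to write through symlink: {path}")

    # Create the temp file in the same directory so os.replace stays on
    # one filesystem and is therefore atomic.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)  # readers see either the old or the new file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Writing to a sibling temp file and renaming means a crash mid-write never leaves a truncated output file, and the explicit symlink check turns a planted link into an error instead of a redirected write.
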
Verification
- 348 tests pass (was 290; added 58 new tests across 7 unit-test files).
- Bandit: 9 findings → 1 low (legitimate `try/except/pass`).
- Real-world 5-DOCX regression: word counts identical to v1.0.5 (978/6726/15566/7811/1477); fallback fires correctly with sanitized warnings.