v0.6.0 — Security, Performance & Quality
Security
- SSRF protection: URL fetching validates resolved IPs against private/reserved/loopback/link-local ranges
- File size limits: 100MB default, configurable via the --max-file-size CLI flag
- Hardened filename sanitization: Strips null bytes, control characters, and path separators
- YAML escape fix: Properly handles newlines and carriage returns in frontmatter values
- CI permissions: Added contents: read to the CI workflow permissions
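A minimal sketch of the resolved-IP check described above, using only the standard library. The function name is_safe_url and the exact set of rejected address categories are assumptions for illustration, not any2md's actual code.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Hypothetical helper: True only if every resolved address is publicly routable."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        infos = socket.getaddrinfo(parsed.hostname, port)
    except (ValueError, socket.gaierror):
        # Malformed port, bad hostname, or unresolvable host: refuse rather than guess.
        return False
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if (
            ip.is_private
            or ip.is_reserved
            or ip.is_loopback
            or ip.is_link_local
            or ip.is_multicast
            or ip.is_unspecified
        ):
            return False
    return True


# Numeric hosts resolve locally, so these calls need no DNS lookup.
print(is_safe_url("http://127.0.0.1/admin"))          # False (loopback)
print(is_safe_url("http://169.254.169.254/latest/"))  # False (link-local metadata address)
```

Because the hostname is resolved here and again when the request is actually made, a check-then-fetch approach like this can still be bypassed by DNS rebinding unless the fetch connects to the address that was validated.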
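The filename hardening and the size limit can be sketched as two small guards. The names sanitize_filename and check_file_size, and reading "100MB" as 100 MiB, are assumptions; the real any2md helpers and the value format accepted by --max-file-size may differ.

```python
import os
import re

# Assumption: "100MB" interpreted as 100 MiB; the real default may be defined differently.
DEFAULT_MAX_FILE_SIZE = 100 * 1024 * 1024

# Control characters (including the NUL byte \x00) plus "/" and "\" path separators.
_UNSAFE_FILENAME_CHARS = re.compile(r"[\x00-\x1f\x7f/\\]")


def sanitize_filename(name: str, fallback: str = "untitled") -> str:
    """Hypothetical helper: strip characters that could break or escape a path."""
    cleaned = _UNSAFE_FILENAME_CHARS.sub("", name).strip()
    # Drop leading dots so ".", ".." or hidden-file names cannot survive.
    cleaned = cleaned.lstrip(".")
    return cleaned or fallback


def check_file_size(path: str, max_bytes: int = DEFAULT_MAX_FILE_SIZE) -> None:
    """Hypothetical helper: refuse inputs larger than the configured limit."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes, over the {max_bytes}-byte limit")


print(sanitize_filename("../../etc/passwd"))  # etcpasswd
```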
Performance
- Lazy imports: Converter modules are loaded on demand (~300-800ms startup improvement)
- Pre-compiled regexes: All regex patterns compiled once at module level
- PDF single-parse: Opens the PDF once in a context manager and passes the open document to pymupdf4llm
- lxml parser: Faster HTML pre-cleaning using lxml instead of html.parser
- Trafilatura-first HTML: Calls trafilatura.extract(output_format="markdown") directly and falls back to markdownify only when needed
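The trafilatura-first flow, combined with the lazy-import idea, might look roughly like this. The trafilatura.extract(..., output_format="markdown") call comes from the notes above and markdownify.markdownify is that library's converter function; the wrapper names and the rule that an empty or whitespace-only extraction is what makes the fallback "needed" are assumptions. Both libraries are imported inside the functions that use them, so their import cost is paid only on first use.

```python
from typing import Callable, Optional


def _trafilatura_markdown(html: str) -> Optional[str]:
    import trafilatura  # deferred: cost paid only when HTML is actually converted

    return trafilatura.extract(html, output_format="markdown")


def _markdownify(html: str) -> str:
    from markdownify import markdownify  # deferred import of the fallback library

    return markdownify(html)


def html_to_markdown(
    html: str,
    extract: Callable[[str], Optional[str]] = _trafilatura_markdown,
    fallback: Callable[[str], str] = _markdownify,
) -> str:
    """Hypothetical wrapper: main-content extraction first, full-page conversion only if it yields nothing."""
    text = extract(html)
    if text and text.strip():
        return text
    return fallback(html)
```

Passing the extractor and fallback as parameters also makes the branch easy to test without either library installed.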
Fixes
- Fixed lettered list regex false positive on uppercase names (e.g. "A. Einstein")
- Fixed CLI skip counter logic (early-return before converter call)
- Moved SCRIPT_DIR inside main() to avoid a module-level side effect
- Narrowed exception handlers across all converters
Improvements
- Shared build_frontmatter() helper eliminates 4-way frontmatter duplication
- Shared read_text_with_fallback() helper for encoding detection
- convert_url() convenience wrapper simplifies URL processing
- Minimum version bounds on all dependencies
- Added lxml>=5.0.0 as an explicit dependency
- Updated README: Python 3.10+ requirement, security section, feature table updates
- GitHub metadata: CI workflow, CodeQL analysis, Dependabot, issue/PR templates
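A shared frontmatter builder with newline-safe escaping might look like this sketch. Only the name build_frontmatter() comes from the notes; its signature, the private quoting helper, and the field names in the example are hypothetical. Values are written as YAML double-quoted scalars, where \n, \r, \t, \\ and \" are valid escapes, so a title containing a line break cannot spill into the next key.

```python
def _yaml_double_quote(value: str) -> str:
    """Render a string as a YAML double-quoted scalar with control characters escaped."""
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def build_frontmatter(fields: dict[str, object]) -> str:
    """Hypothetical signature: one builder shared by every converter."""
    lines = ["---"]
    for key, value in fields.items():
        if value is None:
            continue  # omit missing fields instead of writing "null"
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = _yaml_double_quote(str(value))
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n"


print(build_frontmatter({"title": "Q3 report\r\nDraft", "pages": 12}))
# ---
# title: "Q3 report\r\nDraft"
# pages: 12
# ---
```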
Install / Upgrade
pip install --upgrade any2md