First public release with corrected package metadata.
vid2llm extracts frames from video for multimodal LLM workflows, with a
streaming Python API and a CLI. Frame extraction uses one of three decode
backends, selected automatically: OpenCV, PyAV, or ffmpeg.
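As a rough illustration of what automatic backend selection can look like, the sketch below probes for each backend and returns the first one found. It is not vid2llm's actual logic: the function name `pick_backend`, the returned labels, and the priority order (OpenCV, then PyAV, then ffmpeg) are all assumptions made for this example.

```python
import importlib.util
import shutil
from typing import Optional


def pick_backend() -> Optional[str]:
    """Return the first available backend label, or None if none is found.

    Hypothetical helper: the order below is an assumption, not the
    documented behavior of vid2llm.
    """
    # OpenCV's Python bindings are imported as the "cv2" module.
    if importlib.util.find_spec("cv2") is not None:
        return "opencv"
    # PyAV is imported as the "av" module.
    if importlib.util.find_spec("av") is not None:
        return "pyav"
    # The ffmpeg backend needs the ffmpeg executable on PATH.
    if shutil.which("ffmpeg") is not None:
        return "ffmpeg"
    return None


if __name__ == "__main__":
    print(pick_backend())
```

Probing with `find_spec` and `shutil.which` checks availability without importing the heavier libraries up front.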
Changed
- Corrected the published package description and project metadata on PyPI to
reflect the shipped functionality.
Available in this release
- Frame extraction with sampling by interval, count cap, and time window
- Three decode backends with automatic selection
- Streaming Python API and a typed CLI (probe and extract)
- Output to jpg, png, or webp
- Tested on Linux and Windows across Python 3.11, 3.12, and 3.13
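To make the three sampling controls concrete, here is a minimal sketch of how interval, count cap, and time window could combine to decide which timestamps to extract. It is independent of vid2llm: `sample_timestamps` and its parameter names are hypothetical, and treating the end of the window as exclusive is an assumption of this example.

```python
from typing import Iterator, Optional


def sample_timestamps(
    duration: float,
    interval: float,
    max_frames: Optional[int] = None,
    start: float = 0.0,
    end: Optional[float] = None,
) -> Iterator[float]:
    """Yield timestamps in seconds at which a frame would be extracted.

    Hypothetical helper for illustration; not part of the vid2llm API.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    # The time window cannot extend past the end of the video.
    stop = duration if end is None else min(end, duration)
    index = 0
    while True:
        # Count cap: stop once enough timestamps have been produced.
        if max_frames is not None and index >= max_frames:
            return
        # Multiply rather than accumulate to avoid floating-point drift.
        t = start + index * interval
        if t >= stop:
            return
        yield t
        index += 1


if __name__ == "__main__":
    # One frame every 2 s between 3 s and 8 s of a 10 s video.
    print(list(sample_timestamps(10.0, 2.0, start=3.0, end=8.0)))
```

Because the function is a generator, timestamps are produced lazily, which fits a streaming workflow where frames are consumed one at a time.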
On the roadmap
- Scene-aware and motion-based sampling
- OCR text extraction
- Direct adapters for multimodal provider SDKs
Full changelog: https://github.com/leozitogs/vid2llm/blob/main/CHANGELOG.md