
VLMaxxing through FrameMogging


Training-Free Anti-Recomputation for Video Vision-Language Models

Research code, artifacts, and manuscript tooling for training-free anti-recomputation in video vision-language models.

The repo is organized around a small set of claim-bearing regimes:

  • C-CEILING: component speedups survive to end-to-end latency only in proportion to the dense wall-clock share they own.
  • C-PERSIST: after ingest, same-video follow-up queries can be much cheaper inside a tested cache-reuse envelope.
  • C-VISION: bounded measured sparse-vision execution exists; broad sparse backends and sparse LM prefill remain open.
  • candidate C-STREAM: native-rate streaming state reuse has a checked mixed/boundary bundle, but it is not an earned headline until a native policy beats matched baselines under cache-correctness and stale-cache tests.
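C-CEILING is an Amdahl-style bound. A minimal sketch of the arithmetic, with hypothetical numbers rather than measured results:

```python
def end_to_end_speedup(dense_share: float, component_speedup: float) -> float:
    """Amdahl-style ceiling: a component that owns `dense_share` of the dense
    wall-clock time, accelerated by `component_speedup`, can improve
    end-to-end latency by at most this factor."""
    return 1.0 / ((1.0 - dense_share) + dense_share / component_speedup)

# A 10x component speedup on 30% of wall-clock time caps out far below 10x.
print(round(end_to_end_speedup(0.30, 10.0), 2))  # 1.37
```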

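C-PERSIST's follow-up savings come from paying the video ingest cost once and reusing the encoded state for later questions. A toy illustration of that shape (hypothetical names, not the repo's API):

```python
class IngestCache:
    """Toy sketch of cache reuse: encode each video once, answer many queries."""

    def __init__(self, encode_fn):
        self._encode = encode_fn   # expensive per-video encoder (hypothetical)
        self._state = {}           # video_id -> cached encoded state

    def answer(self, video_id, frames, question):
        if video_id not in self._state:   # ingest cost is paid only once
            self._state[video_id] = self._encode(frames)
        return self._state[video_id], question
```

Follow-up queries on the same video skip the encode entirely; the tested envelope is about when that reuse remains answer-stable.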
The durable imported-target summary is in docs/claim-register.md, local reproduction status is in docs/reproduction-status.md, and raw history remains in git.


Quick Start

uv sync --locked --group dev --group research
uv run ruff format --check .
uv run ruff check .
uv run mypy src tests
uv run pytest
uv run python scripts/audit_artifact_integrity.py

For local MLX / MLX-VLM research utilities:

uv sync --locked --group dev --group research --group vlm

For local corpus assets:

uv run python scripts/fetch_corpus.py --tier primary --encode
uv run python scripts/generate_synthetic_corpus.py

For benchmark-native TOMATO / MVBench / VideoMME assets, start with docs/benchmark-setup.md. VideoMME uses checked manifest subsets and a separate subset fetch path documented in docs/videomme-download-handoff.md.

For the paper:

uv sync --locked --group dev --group research --group benchmark --group paper
brew install tectonic  # macOS; any XeLaTeX/Tectonic install also works
make paper-doctor
make paper-sync
make paper-build

Where To Read First

For readers and reviewers:

For contributors and agents:

Repository Layout

.
├── docs/        stable methodology, setup, literature, and status
├── paper/       arXiv manuscript, generated assets, and paper claim ledgers
├── research/    dated experiment notes, registry, and checked artifacts
├── scripts/     reusable runners, analyzers, validators, and plotters
├── src/         importable codec_through package
└── tests/       unit tests for reusable code

Checked research artifacts remain in this repo when they directly support tables, figures, status claims, or review/publication previews, with provenance notes kept adjacent. Large future bundles should ship with a checksum manifest; current artifact evidence stays in-repo because deleting it would make the paper harder to audit.

Research Principles

  • label claims as reproduced here, imported target, or hypothesis
  • separate semantic answer stability from real skipped work
  • report denominators and setup costs explicitly
  • keep negative results when they change the claim boundary
  • use primary sources for literature and standards claims

Citation

Cite the paper as arXiv:2605.03351. CITATION.cff includes the preferred paper citation for GitHub's citation UI.

License

This repository is multi-licensed.

Code, scripts, tests, software configuration, and paper/build tooling are licensed under MIT.

Original documentation, research notes, manuscript source, generated paper figures/tables, and non-code research artifacts created solely by this repo are licensed under Creative Commons Attribution 4.0 International (CC-BY-4.0).

Benchmark-derived preview artifacts carry provenance notes in their artifact folders.
