v0.1.0-alpha — first public release
Pre-release
This is the first public release of Gemma Multimodal Fine-Tuner, and it's an alpha.
Why this exists
I wanted to fine-tune Gemma on audio + text on my Mac Studio, on data that didn't fit on my Mac — and discovered nothing did all three at once:
- MLX-LM / Unsloth / axolotl either don't do audio, don't run on Apple Silicon, or assume your dataset fits on local disk.
- Renting an H100 to LoRA a 2B model felt absurd. Copying a terabyte of GCS data to a laptop, more so.
So this toolkit does the thing I needed: text, image, and audio LoRA on Gemma 3n / Gemma 4, MPS-native, streaming from GCS / BigQuery. As far as I know it's the only Apple-Silicon-native path for Gemma audio fine-tuning.
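As a rough illustration of what "MPS-native" means in practice, here is a minimal sketch of device selection with PyTorch. It is not the toolkit's own system check; `pick_device` is a hypothetical helper name, and the only PyTorch calls it assumes are `torch.backends.mps.is_available()` and `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports.

    Hypothetical helper for illustration only, not part of gemma_tuner.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is no accelerator to select.
        return "cpu"

    # Older PyTorch builds have no torch.backends.mps module, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU via Metal Performance Shaders
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

On an Apple Silicon Mac with a recent PyTorch build this should report `mps`; on other machines it falls back to `cuda` or `cpu`.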
What works today
- ✅ Text-only LoRA (instruction or completion on local CSV) — the most-tested path.
- ✅ Image + text LoRA (captioning / VQA on local CSV) — works, with offline + gated smoke tests.
- ✅ Audio + text LoRA — works on Apple Silicon.
- ✅ GCS / BigQuery streaming for datasets that don't fit locally.
- ✅ Interactive wizard for system check, LoRA config, model, and dataset selection.
- ✅ Export to merged HF / SafeTensors via gemma_tuner/scripts/export.py.
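The idea behind streaming a dataset that doesn't fit locally can be sketched with the standard library alone. This is a generic illustration, not the toolkit's GCS / BigQuery code: `stream_batches` is a hypothetical name, the `prompt` / `completion` columns are made up for the example, and an in-memory `io.StringIO` stands in for a remote object.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def stream_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of CSV rows (as dicts) without loading the whole file.

    Hypothetical helper for illustration only, not part of gemma_tuner.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # DictReader pulls from the iterable lazily, one row at a time.
    reader = csv.DictReader(lines)
    batch: List[Dict[str, str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


if __name__ == "__main__":
    # Stand-in for a remote object; a real stream would come from a storage client.
    fake_remote = io.StringIO(
        "prompt,completion\nhi,hello\nbye,goodbye\nthanks,welcome\n"
    )
    for batch in stream_batches(fake_remote, batch_size=2):
        print(len(batch), [row["prompt"] for row in batch])
```

Because the function only needs an iterable of text lines, the same loop would work over any file-like object that yields lines, so at most `batch_size` rows are held in memory at once.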
Why it's alpha
- APIs and config schema will change. Profiles, config.ini keys, and CLI flags are not yet stable.
- Tested primarily on my own hardware (Apple Silicon, unified memory). Other configurations are likely to surface rough edges.
- Image + audio paths have lighter test coverage than text. The gated multimodal smoke workflow is manual, not on every PR.
- The wizard's vision memory estimator is a heuristic and is documented as unvalidated.
- Gemma 4 support requires a separate requirements-gemma4.txt stack, so expect dependency friction.
- Expect bugs in edge cases around long prompts, unusual CSV schemas, and very large streamed shards.
Install & try it
See the README for install, profile setup, and the wizard walkthrough.
Feedback
Issues and PRs welcome at github.com/mattmireles/gemma-tuner-multimodal. If you hit a crash, the bootstrap log + your profile is the most useful thing to attach. That said, this is a side quest for me, so hopefully this doesn't get too popular lol.