
v0.1.0-alpha — first public release

Pre-release


@mattmireles released this 07 Apr 18:36


This is the first public release of Gemma Multimodal Fine-Tuner, and it's an alpha.

Why this exists

I wanted to fine-tune Gemma on audio + text on my Mac Studio, using data that didn't fit on my Mac. I discovered that nothing did all three at once:

  • MLX-LM / Unsloth / axolotl either don't do audio, don't run on Apple Silicon, or assume your dataset fits on local disk.
  • Renting an H100 to LoRA a 2B model felt absurd. Copying a terabyte of GCS data to a laptop, more so.

So this toolkit does the thing I needed: text, image, and audio LoRA on Gemma 3n / Gemma 4, MPS-native, streaming from GCS / BigQuery. As far as I know, it's the only Apple Silicon-native path for Gemma audio fine-tuning.
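Why is LoRA on a 2B model a laptop-sized job? A back-of-envelope count of LoRA's trainable parameters makes it concrete. This is a minimal sketch that assumes the standard LoRA formulation, where a frozen d×k weight gets a rank-r update B·A (B is d×r, A is r×k). The 2048×2048 shape and rank 16 are illustrative assumptions only. They are not Gemma's real layer sizes or this toolkit's defaults.

```python
def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight: B (d x r) plus A (r x k)."""
    return r * (d + k)


def full_param_count(d: int, k: int) -> int:
    """Parameters in the frozen d x k weight matrix itself."""
    return d * k


# Illustrative shape and rank (assumed), not Gemma's actual dimensions.
d, k, r = 2048, 2048, 16
lora = lora_param_count(d, k, r)
full = full_param_count(d, k)
print(f"LoRA params: {lora:,}  full params: {full:,}  ratio: {lora / full:.2%}")
```

Only the small A and B matrices receive gradients and optimizer state, which is most of why this fits in unified memory.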

What works today

  • Text-only LoRA (instruction or completion on local CSV) — the most-tested path.
  • Image + text LoRA (captioning / VQA on local CSV) — works, with offline + gated smoke tests.
  • Audio + text LoRA — works on Apple Silicon.
  • GCS / BigQuery streaming for datasets that don't fit locally.
  • Interactive wizard for system check, LoRA config, model, and dataset selection.
  • ✅ Export to merged HF / SafeTensors via gemma_tuner/scripts/export.py.
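Streaming is what makes a dataset larger than local disk usable. The general idea is to read rows lazily and hand them to the trainer in fixed-size batches, so memory is bounded by the batch size rather than the dataset size. The sketch below shows this with the standard library only. It is a hypothetical illustration: `iter_batches` and the `text` column are assumptions, and the toolkit's actual GCS / BigQuery readers are not shown. In a real run, the stream would come from a GCS object or a BigQuery result instead of an in-memory `io.StringIO`.

```python
import csv
import io
from itertools import islice
from typing import Iterator, TextIO


def iter_batches(stream: TextIO, batch_size: int) -> Iterator[list[dict[str, str]]]:
    """Yield lists of up to batch_size CSV rows without reading the whole stream."""
    reader = csv.DictReader(stream)  # parses rows on demand, one at a time
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:  # stream exhausted
            return
        yield batch


# Stand-in for a remote object (hypothetical data).
sample = io.StringIO("text\nhello\nworld\nfoo\n")
for batch in iter_batches(sample, batch_size=2):
    print([row["text"] for row in batch])
```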

Why it's alpha

  • APIs and config schema will change. Profiles, config.ini keys, and CLI flags are not yet stable.
  • Tested primarily on my own hardware (Apple Silicon, unified memory). Other configurations are likely to surface rough edges.
  • Image + audio paths have lighter test coverage than text. The gated multimodal smoke workflow is manual, not on every PR.
  • The wizard's vision memory estimator is a heuristic and is documented as unvalidated.
  • Gemma 4 support requires a separate requirements-gemma4.txt stack — expect dependency friction.
  • Expect bugs in edge cases around long prompts, unusual CSV schemas, and very large streamed shards.
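Unusual CSV schemas are a known rough edge, so a cheap pre-flight check of the header before a long run can save time. The sketch below is hypothetical and is not part of the toolkit. The column names `prompt` and `response` are placeholder assumptions, and the README is the place to check which columns each mode actually expects. The check uses exact, case-sensitive matching, so a header like ` response` (with a leading space) is reported as missing.

```python
import csv
import io
from typing import Iterable, TextIO


def missing_columns(stream: TextIO, required: Iterable[str]) -> list[str]:
    """Return the required column names absent from the CSV header (empty list if all present)."""
    header = next(csv.reader(stream), [])  # an empty file has no header row
    present = set(header)
    return [col for col in required if col not in present]


# Hypothetical column names, for illustration only.
print(missing_columns(io.StringIO("prompt,response\nHi,Hello\n"), ["prompt", "response"]))  # []
print(missing_columns(io.StringIO("text\nhi\n"), ["prompt", "response"]))  # ['prompt', 'response']
```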

Install & try it

See the README for install, profile setup, and the wizard walkthrough.

Feedback

Issues and PRs welcome at github.com/mattmireles/gemma-tuner-multimodal. If you hit a crash, the bootstrap log and your profile are the most useful things to attach. That said, this is a side quest for me, so hopefully this doesn't get too popular lol.