Copyright 2025 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Convert Gemma 3 270M for on-device inference with MediaPipe

This notebook provides code for converting Gemma for use with the MediaPipe LLM Inference API, a library that enables inference on mobile devices or in web browsers. You can learn how to fine-tune Gemma 3 270M [here](https://).

Run each code snippet to:

1. Set up the Colab environment
2. Load and prepare the Gemma 3 model from Hugging Face
3. Convert the model with the AI Edge Torch converter
4. Package the model with MediaPipe
5. Test, evaluate, and save the model for further use

If you create your own dataset, you can customize the model so it "speaks" more like you by training it to output specific emojis when it recognizes a certain phrase.

## Set up development environment

The first step is to install the necessary libraries using the `pip` package installer, which handles dependencies.

In [1]:
!pip uninstall -y tensorflow
!pip install tf-nightly==2.21.0.dev20250819 ai-edge-torch==0.6.0 protobuf transformers
!pip install -U jax jaxlib


## Load the model

If you're using a fine-tuned model, make sure its vocabulary size matches that of the base Gemma 3 model (262144). The AI Edge Torch converter needs the two to match.

Log in to Hugging Face with your [Access Token](https://huggingface.co/settings/tokens) by storing it as a Colab secret in the left toolbar. Specify `HF_TOKEN` as the 'Name' and add your unique token as the 'Value'.

This makes it easy to load and resize the model from Hugging Face Hub in one step.

In [2]:
import os
from google.colab import userdata
from huggingface_hub import login
hf_token = userdata.get('HF_TOKEN')
login(hf_token)
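
If you run these steps outside Colab, `userdata` is not available. As a minimal sketch, assuming you export the token as an `HF_TOKEN` environment variable, the hypothetical helper `resolve_hf_token` below (not part of any library) reads it and fails early when it is missing:

```python
import os


def resolve_hf_token(env=None):
    """Return the Hugging Face token stored under HF_TOKEN.

    Hypothetical helper for non-Colab environments. `env` defaults to
    os.environ and only exists so the function is easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before logging in.")
    return token
```

You could then pass the result to the same `login` call used above, for example `login(resolve_hf_token())`.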

### Resize model vocabulary

The Google AI Edge Torch converter expects the model's vocabulary size to match that of the base Gemma 3 model, which is 262144.

The cell below downloads the model and tokenizer from Hugging Face Hub and saves them locally for the converter. It does not change the vocabulary size itself, so confirm that the size matches before converting.

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "kr15t3n/my-gemmoji"                               # Model to convert
save_path = "/content/my-gemmoji"                               # Replace with path to save resized model

model = AutoModelForCausalLM.from_pretrained(model_path)        # Load the model
tokenizer = AutoTokenizer.from_pretrained(model_path)           # Load the tokenizer

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)


('/content/my-gemmoji/tokenizer_config.json',
 '/content/my-gemmoji/special_tokens_map.json',
 '/content/my-gemmoji/chat_template.jinja',
 '/content/my-gemmoji/tokenizer.model',
 '/content/my-gemmoji/added_tokens.json',
 '/content/my-gemmoji/tokenizer.json')
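
To confirm that the saved model has the expected vocabulary size, you can read it from the saved `config.json`. This is a minimal sketch: `read_vocab_size` and `vocab_size_matches` are hypothetical helpers, and the lookup assumes `vocab_size` sits at the top level of the config (with a nested `text_config` as a fallback), which may not hold for every checkpoint.

```python
import json
from pathlib import Path

EXPECTED_VOCAB_SIZE = 262144  # Base Gemma 3 vocabulary size stated in this notebook


def read_vocab_size(model_dir):
    """Read vocab_size from the config.json inside a saved model directory."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # Assumption: text-only checkpoints keep vocab_size at the top level;
    # fall back to a nested text_config entry if it is not there.
    if "vocab_size" in config:
        return config["vocab_size"]
    return config.get("text_config", {}).get("vocab_size")


def vocab_size_matches(model_dir, expected=EXPECTED_VOCAB_SIZE):
    """Return True when the saved model reports the expected vocabulary size."""
    return read_vocab_size(model_dir) == expected
```

For the paths used in this notebook, the call would be `vocab_size_matches("/content/my-gemmoji")`.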

## Convert the model
Now we'll convert and quantize the model you just saved using the [AI Edge Torch](https://github.com/google-ai-edge/ai-edge-torch) converter. Conversion time varies with model size; for Gemma 3 270M it should take 5-10 minutes.

The conversion parameter values are set to maximize efficiency for one-shot emoji translations in the browser:

* `prefill_seq_len`: maximum length of the supported input, in tokens
* `kv_cache_max_len`: maximum combined length of the prefill (input) and decode (output) context
* `quantize`: the quantization scheme

You can increase the maximum lengths for other use cases, such as chat conversations.
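
To see how the two length limits interact, the sketch below works out how many tokens are left for decoding after a prompt of a given size. `decode_budget` is a hypothetical helper, and it simply applies the definitions in the list above using the values from this section's conversion cell (128 and 512); the runtime may handle over-long prompts differently.

```python
PREFILL_SEQ_LEN = 128   # Same value as prefill_seq_len in the conversion cell
KV_CACHE_MAX_LEN = 512  # Same value as kv_cache_max_len in the conversion cell


def decode_budget(prompt_tokens,
                  prefill_seq_len=PREFILL_SEQ_LEN,
                  kv_cache_max_len=KV_CACHE_MAX_LEN):
    """Return how many tokens remain for decoding after the prompt.

    Raises ValueError when the prompt is longer than the prefill limit.
    """
    if prompt_tokens > prefill_seq_len:
        raise ValueError(
            f"Prompt has {prompt_tokens} tokens; limit is {prefill_seq_len}."
        )
    return kv_cache_max_len - prompt_tokens


print(decode_budget(20))  # 492 tokens left for the response
```

A short emoji translation needs only a handful of output tokens, which is why small limits are enough here.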

Quantization is important for running models locally. It reduces the precision of the model's weights to save space. In this case, 8-bit integer quantization (INT8) converts each parameter from a 4-byte float to a 1-byte integer.
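
The arithmetic behind that saving can be sketched in a few lines. The parameter count is an approximation taken from the model name, and the quantize/dequantize functions illustrate generic symmetric INT8 rounding only; they are not the converter's actual `dynamic_int8` implementation, and the converted file size will differ from this estimate.

```python
def model_size_mb(num_params, bytes_per_param):
    """Estimate weight storage in megabytes (weights only, no metadata)."""
    return num_params * bytes_per_param / 1_000_000


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # Avoid a zero scale
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


NUM_PARAMS = 270_000_000  # Assumption: rough count implied by "270M"
print(model_size_mb(NUM_PARAMS, 4))  # 1080.0 MB at 4 bytes per parameter
print(model_size_mb(NUM_PARAMS, 1))  # 270.0 MB at 1 byte per parameter

weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored value differs from the original by at most half of `scale`, which is the precision traded away for the smaller size.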

In [9]:
from ai_edge_torch.generative.examples.gemma3 import gemma3
from ai_edge_torch.generative.utilities import converter
from ai_edge_torch.generative.utilities.export_config import ExportConfig
from ai_edge_torch.generative.layers import kv_cache

pytorch_model = gemma3.build_model_270m("/content/my-gemmoji")  # Path of the model to convert

# Set export settings and convert model to .tflite
export_config = ExportConfig()
export_config.kvcache_layout = kv_cache.KV_LAYOUT_TRANSPOSED
export_config.mask_as_input = True
converter.convert_to_tflite(
    pytorch_model,
    output_path="/content",
    output_name_prefix="my-gemmoji",
    prefill_seq_len=128,
    kv_cache_max_len=512,
    quantize="dynamic_int8",
    export_config=export_config,
)
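
The converter derives the output file name from the prefix and the conversion settings (this notebook later refers to `my-gemmoji_q8_ekv512.tflite`). Rather than hard-coding that name, you can look the file up. This is a minimal sketch; `find_converted_models` is a hypothetical helper, not part of AI Edge Torch.

```python
from pathlib import Path


def find_converted_models(output_dir, prefix):
    """Return the .tflite files in output_dir whose names start with prefix."""
    return sorted(Path(output_dir).glob(f"{prefix}*.tflite"))
```

With the settings above, `find_converted_models("/content", "my-gemmoji")` should list the converted model if the conversion succeeded, and an empty list otherwise.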

---
## Create a MediaPipe Task Bundle

A Task Bundle file (.task) contains the original model tokenizer, the LiteRT model (.tflite), and additional metadata needed to run end-to-end inference with the MediaPipe LLM Inference API.

To use the bundler, install the MediaPipe PyPI package (version >0.10.14). Install it at this step because it comes with its own set of dependencies.

In [10]:
!pip install mediapipe

Collecting mediapipe
  Downloading mediapipe-0.10.21-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (9.7 kB)
Collecting numpy<2 (from mediapipe)
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting protobuf<5,>=4.25.3 (from mediapipe)
  Downloading protobuf-4.25.8-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)
Collecting sounddevice>=0.4.4 (from mediapipe)
  Downloading sounddevice-0.5.2-py3-none-any.whl.metadata (1.6 kB)
INFO: pip is looking at multiple versions of opencv-contrib-python to determine which version is compatible with other requirements. This could take a while.
Collecting opencv-contrib-python (from mediapipe)
  Downloading opencv_contrib_python-4.11.0.86-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)
Downloading mediapipe-0.10.21-cp312-cp312-manylinux_2_28

The MediaPipe package pins an older version of `protobuf` (`protobuf<5`, as the install log above shows), so you may run into errors with the other libraries. Make sure you have a compatible version by doing a fresh reinstall.

In [3]:
!pip uninstall protobuf -y && pip install protobuf

Found existing installation: protobuf 4.25.8
Uninstalling protobuf-4.25.8:
  Successfully uninstalled protobuf-4.25.8
Collecting protobuf
  Using cached protobuf-6.32.1-cp39-abi3-manylinux2014_x86_64.whl.metadata (593 bytes)
Using cached protobuf-6.32.1-cp39-abi3-manylinux2014_x86_64.whl (322 kB)
Installing collected packages: protobuf
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mediapipe 0.10.21 requires protobuf<5,>=4.25.3, but you have protobuf 6.32.1 which is incompatible.
grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 6.32.1 which is incompatible.
google-ai-generativelanguage 0.6.15 requires protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<6.0.0dev,>=3.20.2, but you have protobuf 6.32.1 which is incompatible.
Successfully installed protobuf-6.32.1


If you haven't already, restart the runtime before running the `genai.bundler` code so that the newly installed libraries are loaded.

In [5]:
!pip install tensorflow --quiet

Collecting tensorflow
  Downloading tensorflow-2.20.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)
Collecting tensorboard~=2.20.0 (from tensorflow)
  Downloading tensorboard-2.20.0-py3-none-any.whl.metadata (1.8 kB)
Downloading tensorflow-2.20.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (620.7 MB)
Downloading tensorboard-2.20.0-py3-none-any.whl (5.5 MB)
Installing collected packages: tensorboard, tensorflow
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.19.0
    Uninstalling tensorboard-2.19.0:
      Successfully uninstalled tensorboard-2.19.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. Thi

TensorFlow was uninstalled during setup, so it needs to be installed again. Restart the session once the installation finishes.

In [2]:
from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model="/content/my-gemmoji_q8_ekv512.tflite",
    tokenizer_model="/content/my-gemmoji/tokenizer.model",
    start_token="<bos>",
    stop_tokens=["<eos>", "<end_of_turn>"],  # stop_tokens must be a list
    output_filename="/content/my-gemmoji.task",
    prompt_prefix="Translate the following text to emoji: ",
    prompt_suffix="\nEmoji: ",
)
bundler.create_bundle(config)

The `bundler.create_bundle` function creates a .task file that contains all the necessary information to run the model.
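
Because the bundler reads files produced by earlier cells, it can help to confirm they exist before you call `create_bundle`. This is a minimal sketch; `missing_files` is a hypothetical helper, and the paths in the usage note are the ones used in this notebook.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

For example, `missing_files(["/content/my-gemmoji_q8_ekv512.tflite", "/content/my-gemmoji/tokenizer.model"])` returns an empty list when both inputs are in place. Any path it returns points to a step that needs to be rerun.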

---

## Download your model

Your `.task` model bundle is now ready for on-device inference using the MediaPipe LLM Inference API!

Download it from your session storage so you can use it in your own projects.

In [22]:
from google.colab import files

files.download("/content/my-gemmoji.task")


## Troubleshooting installed libraries
This Colab installs a number of Python packages that bring their own dependencies and may require restarting the session after installation. If you run into errors when running the cells, check that your installed versions are compatible with the ones shown below.

In [2]:
from google import protobuf
print(protobuf.__version__)

4.25.8


In [6]:
import ai_edge_torch
print(ai_edge_torch.__version__)



0.6.0


In [4]:
import tensorflow as tf
print(tf.__version__)

2.21.0-dev20250818


In [5]:
import jaxlib
print(jaxlib.__version__)

0.7.1
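
The individual checks above can also be collected into one report. This sketch uses the standard library's `importlib.metadata`, so it works even when a package fails to import. `installed_versions` is a hypothetical helper, and the distribution names in `PACKAGES` are assumptions based on the `pip` commands in this notebook.

```python
from importlib import metadata

# Assumption: distribution names as they appear in this notebook's pip commands
PACKAGES = ("protobuf", "ai-edge-torch", "tf-nightly", "tensorflow",
            "jaxlib", "mediapipe")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # Not installed in this environment
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this report before and after a session restart shows which reinstall step changed a version.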
