# Gemma 3 4B QNN model conversion with Olive 
### Task: Text + Vision Generation 📝

In this notebook, you'll:
- Download the required datasets
- Convert LLM to QNN format
- Convert Vision to QNN format
- Convert Embedding layer with image to QNN format


### Platform requirements
This notebook is intended to run on a machine with:
  * **Operating System**: Linux Ubuntu 22.04 (automated setup script is Linux-only)
  * **Python**: 3.10
  * **NVIDIA driver**: version 525.60.13 or equivalent
  * **GPU**: NVIDIA A100
  * **Storage**: ~13GB for COCO train2017 dataset (downloaded automatically)
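
The operating system and Python requirements above can be checked before any environment is created. The following is a minimal sketch, not part of the original notebook; `check_platform` is a hypothetical helper name, and it deliberately checks only the OS family and Python version, not the GPU, driver, or disk space.

```python
import platform
import sys


def check_platform(system=None, version=None):
    """Return a list of mismatches against the stated OS and Python requirements."""
    # Fall back to the running interpreter when no explicit values are given.
    system = system or platform.system()
    version = tuple(version or sys.version_info[:2])
    problems = []
    if system != "Linux":
        problems.append(f"expected Linux, found {system}")
    if version != (3, 10):
        problems.append(f"expected Python 3.10, found {version[0]}.{version[1]}")
    return problems


for problem in check_platform():
    print("WARNING:", problem)
```

An empty result means the two checked requirements are met; anything else is reported as a warning rather than an error, so the notebook can still be run at the user's own risk.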

## 🐍 Python Virtual environments
Create the Olive and QNN Python virtual environments.

In [None]:
!git clone https://github.com/CodeLinaro/Olive.git -b dev/qti-kromero/gemma3

In [None]:
import os
import venv
from pathlib import Path
import subprocess
import json
import shutil
import urllib.request
import onnx
from onnx import helper, TensorProto
import glob

current_dir = os.getcwd()
MODEL="google/gemma-3-4b-it"
OLIVE_PYTHON_PATH = './olive_venv'
OLIVE_PYTHON_BIN = './olive_venv/bin/python'
olive_pip_path = Path(OLIVE_PYTHON_PATH) / "bin" / "pip"
OLIVE_REPO_PATH = Path("./Olive")
OLIVE_REQ = "./requirements.txt"
QNN_REQ = "./qnn_req.txt"

QNN_PYTHON_PATH = './qnn_venv'
QNN_PYTHON_BIN_PATH = './qnn_venv/bin'
qnn_pip_path = Path(QNN_PYTHON_PATH) / "bin" / "pip"
QNN_PYTHON_BIN_FULL_PATH = f"{current_dir}/{QNN_PYTHON_BIN_PATH}"

### Prepare Olive Python Environment

In [None]:
if not os.path.exists(OLIVE_PYTHON_PATH):
    print("Creating Olive Venv")
    builder = venv.EnvBuilder(with_pip=True)
    builder.create(Path(OLIVE_PYTHON_PATH))
my_env = os.environ.copy()
my_env["BUILD_CUDA_EXT"] = "0"
GPTQ="git+https://github.com/ModelCloud/GPTQModel.git@558449bed3ef2653c36041650d30da6bbbca440d"
subprocess.check_call([str(olive_pip_path), "install", "-U", "-r" , OLIVE_REQ], env=my_env)
subprocess.check_call([str(olive_pip_path), "install", "--no-build-isolation", GPTQ], env=my_env)
subprocess.check_call([str(olive_pip_path), "install", "-e", OLIVE_REPO_PATH])

### Prepare QNN Python Environment

In [None]:
if not os.path.exists(QNN_PYTHON_PATH):
    print("Creating QNN Venv")
    builder = venv.EnvBuilder(with_pip=True)
    builder.create(Path(QNN_PYTHON_PATH))
subprocess.check_call([str(qnn_pip_path), "install", "--no-build-isolation", "-r" , QNN_REQ], env=my_env)
subprocess.check_call([str(qnn_pip_path), "install", "-e", OLIVE_REPO_PATH])
subprocess.check_call([str(qnn_pip_path), "install", "-U", "--pre", "--extra-index-url",
                       "https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple",
                       "onnxruntime-qnn==1.23.0.dev20250815002", "--no-deps"])

### 🤗 Login to Hugging Face
To access the models, you need to log in to Hugging Face with a [user access token](https://huggingface.co/docs/hub/security-tokens). The following command logs you in; replace `<>` with your token:

In [None]:
!huggingface-cli login --token <>

### Apply a few patches to ONNX Runtime

These patches are needed to run the Olive recipes for this model.

In [None]:
base_url = "https://raw.githubusercontent.com/CodeLinaro/onnxruntime/326d9d30129bbad698e0306d24dcea0ec5a19e60"
urls = [
    base_url + "/onnxruntime/python/tools/quantization/execution_providers/qnn/quant_config.py",
    base_url + "/onnxruntime/python/tools/quantization/quant_utils.py"
]

destinations = [
    OLIVE_PYTHON_PATH+"/lib/python3.10/site-packages/onnxruntime/quantization/execution_providers/qnn/quant_config.py",
    OLIVE_PYTHON_PATH+"/lib/python3.10/site-packages/onnxruntime/quantization/quant_utils.py"
]

for url, dest in zip(urls, destinations):
    urllib.request.urlretrieve(url, dest)
    print(f"Downloaded and replaced: {dest}")
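
The destination paths above hardcode `lib/python3.10/site-packages`, which matches the stated Python 3.10 requirement. As a hedged alternative, the install location can be queried from the virtual environment's own interpreter. This is a sketch only: `site_packages_dir` is a hypothetical helper, and it assumes `onnxruntime` is installed under the interpreter's `platlib` directory, which is the usual location for packages with compiled extensions.

```python
import subprocess
import sys
from pathlib import Path


def site_packages_dir(python_bin):
    """Ask the given interpreter where platform-specific packages are installed."""
    out = subprocess.check_output(
        [str(python_bin), "-c",
         "import sysconfig; print(sysconfig.get_paths()['platlib'])"],
        text=True,
    )
    return Path(out.strip())


# Demonstrated with the current interpreter; in this notebook the call
# would be site_packages_dir(OLIVE_PYTHON_BIN) instead.
print(site_packages_dir(sys.executable))
```

The returned path could then be joined with `onnxruntime/quantization/...` to build each destination, so the patch step does not depend on the exact Python minor version.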

## Run Olive Recipes

**GPU memory usage observed during the run**

		a. Text GPTQModel quantization:        12 GB
		b. Text ONNX static quantization:      41 GB
		c. Vision ONNX static quantization:    68 GB
		d. Embedding ONNX static quantization: 3 GB

In [None]:
# Clean Context binary directories if they exist
def clean_directory(path):
    if os.path.exists(path):
        for file in glob.glob(os.path.join(path, '*')):
            if os.path.isfile(file):
                os.remove(file)
dirs_to_clean = [
    './models/gemma3_qnn/model/',
    './models/gemma-3-4b-it-vision/model/',
    './models/gemma-3-4b-it-embed/model/'
]

for dir_path in dirs_to_clean:
    clean_directory(dir_path)

### 1️⃣ LLM model generation

In [None]:
config_path = Path("./gemma3-4b-text-qnn-config.json")
with open(config_path, "r") as file:
    data = json.load(file)

data["systems"]["qnn_system"]["python_environment_path"] = QNN_PYTHON_BIN_FULL_PATH
data["input_model"]["model_path"] = MODEL

with open(config_path, "w") as file:
    json.dump(data, file, indent=4)
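
The load, modify, and write-back pattern above is repeated for the vision config. It could be factored into one helper, sketched below. `set_nested` and `patch_config` are hypothetical names introduced for illustration, not part of Olive; the key paths are the ones used in this notebook.

```python
import json
from pathlib import Path


def set_nested(config, keys, value):
    """Set config[k0][k1]...[kn] = value; raises KeyError if a parent key is missing."""
    node = config
    for key in keys[:-1]:
        node = node[key]
    node[keys[-1]] = value


def patch_config(path, updates):
    """Apply {key-path tuple: value} updates to a JSON file in place and return the data."""
    path = Path(path)
    data = json.loads(path.read_text())
    for keys, value in updates.items():
        set_nested(data, keys, value)
    path.write_text(json.dumps(data, indent=4))
    return data


# Example call for the text config (commented out because it edits a real file):
# patch_config("./gemma3-4b-text-qnn-config.json", {
#     ("systems", "qnn_system", "python_environment_path"): QNN_PYTHON_BIN_FULL_PATH,
#     ("input_model", "model_path"): MODEL,
# })
```

Raising `KeyError` on a missing parent key is a deliberate choice here: a config whose layout differs from what the notebook expects fails loudly instead of being silently extended.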

In [None]:
!./olive_venv/bin/olive run --config ./gemma3-4b-text-qnn-config.json

### 2️⃣ Vision model quantization

In [None]:
config_path = Path("./gemma3-4b-vision-qnn-config.json")
with open(config_path, "r") as file:
    data = json.load(file)
data["systems"]["qnn_system"]["python_environment_path"] = QNN_PYTHON_BIN_FULL_PATH

with open(config_path, "w") as file:
    json.dump(data, file, indent=4)

In [None]:
!./olive_venv/bin/olive run --config ./gemma3-4b-vision-qnn-config.json

### 3️⃣ Embedding model

In [None]:
!./olive_venv/bin/olive run --config ./gemma3-4b-embedding-qnn-config.json

### Keep output of the embedding model as uint16 instead of float

In [None]:
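# Load the quantized embedding model produced by the Olive recipe above.
model = onnx.load("./models/gemma-3-4b-it-embed/model/model.onnx")
graph = model.graph

# Drop the final node so the graph ends at the quantized tensor.
# Assumption: the last node is the one that converts the uint16 result to float.
last_node = graph.node[-1]
graph.node.remove(last_node)

# Expose the output of the new last node as the graph output, typed as uint16.
previous_node_output = graph.node[-1].output[0]
new_output = helper.make_tensor_value_info(
    name=previous_node_output,
    elem_type=TensorProto.UINT16,
    shape=["batch_size", "seq_length", 2560]
)
graph.output.remove(graph.output[0])
graph.output.extend([new_output])
onnx.save(model, "./models/gemma-3-4b-it-embed/model/embeddings_with_image.onnx")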

### Prepare final ORT GenAI folder for on-device inference 

In [None]:
!cp ./models/gemma-3-4b-it-embed/model/embeddings_with_image.onnx ./models/gemma3_qnn/model/
!cp ./models/gemma-3-4b-it-vision/model/model_ctx.onnx ./models/gemma3_qnn/model/model_ctx_vision.onnx 
!cp ./models/gemma-3-4b-it-vision/model/model_ctx_qnn.bin ./models/gemma3_qnn/model/model_ctx_qnn.bin 
!cp ./genai/*.* ./models/gemma3_qnn/model/
!ls -al ./models/gemma3_qnn/model/

print("ORT GenAI inference setup: ./models/gemma3_qnn")
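
As a final sanity check, the folder can be scanned for the files copied in the cell above. This is a minimal sketch: `missing_files` is a hypothetical helper, and `EXPECTED` lists only the three file names that appear explicitly in the copy commands, not the text model outputs or the contents of `./genai`.

```python
from pathlib import Path

# File names taken from the copy commands in this notebook.
EXPECTED = [
    "embeddings_with_image.onnx",
    "model_ctx_vision.onnx",
    "model_ctx_qnn.bin",
]


def missing_files(folder, expected=EXPECTED):
    """Return the expected file names that are not present in the folder."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


missing = missing_files("./models/gemma3_qnn/model")
print("Missing files:", missing if missing else "none")
```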