<a target="_blank" href="https://colab.research.google.com/github/google-ai-edge/LiteRT/blob/main/litert/samples/colab/LiteRT_AOT_Compilation_Tutorial.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

##### Copyright 2025 The AI Edge Authors.

In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# LiteRT NPU AOT Compilation, and Play for On-device AI integration

In this colab, you'll learn how to use the LiteRT AOT (ahead-of-time) compiler to compile a Selfie Segmentation model, starting from either a PyTorch or a TFLite model, into LiteRT models that are optimized and compiled for on-device NPUs.

The models used here were originally published in the MediaPipe [Image segmentation guide](https://ai.google.dev/edge/mediapipe/solutions/vision/image_segmenter). The two models used in this colab are:
* Selfie Segmentation: A model that segments the portrait of a person and can be used for replacing or modifying the background in an image.
* Selfie Multiclass: A LiteRT model that takes an image of a person, locates regions such as hair, skin, and clothing, and outputs an image segmentation map for these items.

This colab also walks you through preparing the models for delivery with **[Play for On-device AI](https://developer.android.com/google/play/on-device-ai) (PODAI)**.

**PODAI** delivers custom models for on-device AI features more efficiently. Google Play simplifies launching, targeting, versioning, and downloading. Together with LiteRT NPU AOT compilation, developers can deliver compiled ML models to a variety of end devices without needing to know which NPU each user's phone is equipped with.

## Prerequisites

### Install the required packages
Start by installing the required packages, including `ai-edge-litert`, which contains the NPU AOT compiler, and the other libraries you'll use for model conversion.

In [None]:
# Install libc++ dependencies

!wget https://apt.llvm.org/llvm.sh
!chmod +x llvm.sh
!./llvm.sh 18 all
!apt-get install -y libc++-18-dev libc++abi-18-dev

In [None]:
!pip install ai-edge-litert-sdk-mediatek-nightly
# Takes ~5 minutes to download and build the package
!pip install ai-edge-litert-sdk-qualcomm-nightly
!pip install ai-edge-litert-nightly
!pip install ai-edge-torch
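
Before moving on, it can help to confirm that the installation actually made the packages importable. The following is a minimal sketch, not part of the LiteRT tooling: the helper name `find_missing` is hypothetical, and the module names are simply the top-level modules this notebook imports later.

```python
import importlib.util

# Top-level module names taken from the import cell of this notebook.
REQUIRED_MODULES = ["ai_edge_litert", "ai_edge_torch", "torch", "huggingface_hub"]


def find_missing(modules):
  """Returns the module names that the import system cannot locate."""
  return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
  print("Missing modules:", ", ".join(missing))
else:
  print("All required modules are importable.")
```

If anything is reported as missing, re-running the install cell (and restarting the runtime if Colab asks for it) is usually the first thing to try.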

Import the required packages.

In [None]:
import os
import shutil

from ai_edge_litert.aot import aot_compile as aot_lib
from ai_edge_litert.aot.ai_pack import export_lib as ai_pack_export
from ai_edge_litert.aot.vendors.mediatek import target as mtk_target
from ai_edge_litert.aot.vendors.qualcomm import target as qnn_target
import ai_edge_torch
from ai_edge_torch.examples.selfie_segmentation import model as selfie_segmentation_model_lib
import huggingface_hub
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import requests
import torch

## Quickstart

### Prepare the PyTorch Model

Here let's use the [MediaPipe Selfie Segmentation](https://storage.googleapis.com/mediapipe-assets/Model%20Card%20MediaPipe%20Selfie%20Segmentation.pdf) model as the starting point.

This model is ported to PyTorch. The Torch implementation is available under [ai_edge_torch/examples/selfie_segmentation](https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/examples/selfie_segmentation/model.py).

The weights of the model are available on Hugging Face at [litert-community/MediaPipe-Selfie-Segmentation](https://huggingface.co/litert-community/MediaPipe-Selfie-Segmentation).

In [None]:
selfie_segmentation = selfie_segmentation_model_lib.SelfieSegmentation()

# Download the weights from Hugging Face Hub
work_dir = "selfie_segmentation"
os.makedirs(work_dir, exist_ok=True)

weights_path = huggingface_hub.hf_hub_download(
    repo_id="litert-community/MediaPipe-Selfie-Segmentation",
    filename="selfie_segmentation.pth",
)
selfie_segmentation.load_state_dict(torch.load(weights_path))

### Convert to LiteRT model, with NPU AOT compilation.

Let's first follow the regular PyTorch-to-LiteRT conversion using ai_edge_torch. The `convert` function provided by the ai_edge_torch package converts a PyTorch model to an on-device model.

To compile for NPUs, we only need to add a call to the `experimental_add_compilation_backend` method. By default (if you don't provide any target backends), the model is converted and compiled for all registered backends. For now, the backends include:
* General model runnable on LiteRT CPU and GPU.
* Qualcomm NPUs.
* MediaTek NPUs.

In [None]:
# Converts model I/O to Channel Last layout
channel_last_selfie_segmentation = ai_edge_torch.to_channel_last_io(
    selfie_segmentation, args=[0], outputs=[0]
)
sample_input = (torch.randn(1, 256, 256, 3),)
compiled_models = ai_edge_torch.experimental_add_compilation_backend().convert(
    channel_last_selfie_segmentation.eval(), sample_input
)

From the logs, we can see that compilation failed for some backends. This is because older generations of NPUs generally do not support executing floating-point models. In batch compilation, failing backends are skipped gracefully.

We can also inspect the compilation status using the `compilation_report` API.

In addition to the compilation status, the report also contains the following information:
* For each backend where compilation succeeded, how much of the graph is offloaded to the NPU, or whether the full model is compiled.
* Error log for failed backends.

In [None]:
# @title Print Compilation Report

print(compiled_models.compilation_report())

### Export and Validation on CPU.

Once compilation finishes, we can use the `export` method of the compiled models to write all models to disk.

By default, the models are stored in a flat structure in the output directory, with each model name suffixed with the backend ID.

For example:

| Model File Name                            | Backend  | SoC    | Note               |
|--------------------------------------------|----------|--------|--------------------|
| selfie_segmentation_fallback.tflite        | CPU/GPU  | N/A    | N/A                |
| selfie_segmentation_Qualcomm_SM8450.tflite | Qualcomm | SM8450 | Snapdragon 8 Gen 1 |
| selfie_segmentation_MediaTek_mt6989.tflite | MediaTek | mt6989 | Dimensity 9300     |
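
The naming scheme in the table can be used to find the right file after export. The sketch below uses only the standard library and assumes the `<model_name>_<backend_id>.tflite` pattern shown above; the helper names `backend_id_from_filename` and `group_exported_models` are hypothetical and not part of the LiteRT API.

```python
import os


def backend_id_from_filename(filename, model_name):
  """Extracts the backend ID suffix from an exported model file name.

  Returns None when the file does not follow the
  '<model_name>_<backend_id>.tflite' pattern.
  """
  stem, ext = os.path.splitext(os.path.basename(filename))
  prefix = model_name + "_"
  if ext != ".tflite" or not stem.startswith(prefix):
    return None
  return stem[len(prefix):] or None


def group_exported_models(export_dir, model_name):
  """Maps backend ID -> file path for every matching file in export_dir."""
  result = {}
  for entry in sorted(os.listdir(export_dir)):
    backend_id = backend_id_from_filename(entry, model_name)
    if backend_id:
      result[backend_id] = os.path.join(export_dir, entry)
  return result


print(backend_id_from_filename(
    "selfie_segmentation_Qualcomm_SM8450.tflite", "selfie_segmentation"
))  # Qualcomm_SM8450
```

After the export cell has run, `group_exported_models(work_dir, 'selfie_segmentation')` would list whichever backends actually produced a file.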

In [None]:
# Saving models to disk.

compiled_models.export(work_dir, model_name='selfie_segmentation')

In [None]:
# Downloading Testing image

test_image = huggingface_hub.hf_hub_download(
    repo_id="litert-community/MediaPipe-Selfie-Segmentation",
    filename="test_img.png",
)
pil_image = Image.open(test_image).convert("RGB").resize((256, 256))

In [None]:
# Run PyTorch with test image

channel_first_numpy_array = np.array(pil_image, dtype=np.float32)[
    None, ...
].transpose(0, 3, 1, 2)
torch_mask_out = (
    selfie_segmentation(torch.from_numpy(channel_first_numpy_array))
    .detach()
    .numpy()
    .transpose(0, 2, 3, 1)
)
torch_uint8_mask = (torch_mask_out.reshape((256, 256)) * 255).astype(np.uint8)
torch_mask_image = Image.fromarray(torch_uint8_mask, mode='L')
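
The PyTorch model expects channel-first (NCHW) input, while the converted LiteRT model uses channel-last (NHWC) I/O, which is why the cell above transposes on the way in and on the way out. A minimal NumPy-only sketch of that round trip, using a small made-up array rather than the real image:

```python
import numpy as np

# A stand-in for an RGB image batch in channel-last (NHWC) layout.
nhwc = np.arange(1 * 4 * 4 * 3, dtype=np.float32).reshape(1, 4, 4, 3)

# NHWC -> NCHW: move the channel axis in front of height and width.
nchw = nhwc.transpose(0, 3, 1, 2)

# NCHW -> NHWC: the inverse permutation restores the original layout.
restored = nchw.transpose(0, 2, 3, 1)

print(nhwc.shape, nchw.shape, restored.shape)
assert np.array_equal(nhwc, restored)
```

Note that `transpose` returns a view; the data is only rearranged in memory if a contiguous copy is requested, for example with `np.ascontiguousarray`.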

In [None]:
# Run LiteRT with test image
from ai_edge_litert.compiled_model import CompiledModel

numpy_array = np.array(pil_image, dtype=np.float32)[None, ...]
cpu_model_path = os.path.join(work_dir, "selfie_segmentation_fallback.tflite")
cm_model = CompiledModel.from_file(cpu_model_path)
sig_idx = 0
input_buffers = cm_model.create_input_buffers(sig_idx)
output_buffers = cm_model.create_output_buffers(sig_idx)
input_buffers[0].write(numpy_array)
cm_model.run_by_index(sig_idx, input_buffers, output_buffers)
uint8_mask = (
    output_buffers[0].read(256 * 256, np.float32).reshape((256, 256)) * 255
).astype(np.uint8)
mask_image = Image.fromarray(uint8_mask, mode="L")

In [None]:
# Show output results

fig, axes = plt.subplots(1, 3, figsize=(9, 3))

for idx, (title, image) in enumerate([
    ('Test Image', pil_image),
    ('PyTorch Mask Image', torch_mask_image),
    ('TFLite Mask Image', mask_image),
]):
  axes[idx].imshow(image)
  axes[idx].set_title(title)
  axes[idx].axis('off')

plt.tight_layout()
plt.show()
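
A visual check is useful, but a numeric comparison makes small regressions easier to spot. The following sketch is one possible way to compare two masks; the function name `compare_masks`, the 0.5 threshold, and the synthetic inputs are assumptions made for illustration, not part of the original validation flow.

```python
import numpy as np


def compare_masks(reference, candidate, threshold=0.5):
  """Returns (mean absolute difference, IoU of the thresholded masks)."""
  reference = np.asarray(reference, dtype=np.float32)
  candidate = np.asarray(candidate, dtype=np.float32)
  if reference.shape != candidate.shape:
    raise ValueError("Masks must have the same shape.")
  mean_abs_diff = float(np.mean(np.abs(reference - candidate)))
  ref_fg = reference >= threshold
  cand_fg = candidate >= threshold
  union = np.logical_or(ref_fg, cand_fg).sum()
  # Two empty masks agree perfectly, so define their IoU as 1.0.
  if union == 0:
    return mean_abs_diff, 1.0
  intersection = np.logical_and(ref_fg, cand_fg).sum()
  return mean_abs_diff, float(intersection / union)


# Synthetic masks standing in for the PyTorch and LiteRT outputs.
rng = np.random.default_rng(0)
reference_mask = rng.random((256, 256), dtype=np.float32)
candidate_mask = np.clip(reference_mask + 1e-4, 0.0, 1.0)
print(compare_masks(reference_mask, candidate_mask))
```

In this notebook, the two masks could be compared with `compare_masks(torch_uint8_mask / 255.0, uint8_mask / 255.0)`; a mean difference near zero and an IoU near one would suggest the conversion preserved the model's behavior on this image.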


### Exporting Models for Google Play On-Device AI (PODAI)

With your models verified, the next step is preparing them for deployment. This section details how to package your compiled models for upload to Google Play, enabling delivery to user devices through the Play for On-device AI (PODAI) framework.

The AiEdgeLiteRT AOT (Ahead-of-Time) module provides `ai_pack` utilities specifically for this purpose. These utilities create an **AI Pack**, which is a crucial data asset. An AI Pack bundles your compiled models with device-targeting configurations, ensuring that the correct models and assets are delivered to the appropriate user devices. This is particularly vital for NPU (Neural Processing Unit) compilations, as it guarantees that models optimized for a specific System-on-Chip (SoC) reach only the devices equipped with that SoC.

In [None]:
# Configuring the AI Pack
ai_pack_dir = os.path.join(work_dir, 'ai_pack')
ai_pack_name = 'selfie_segmentation'
litert_model_name = 'segmentation_model'

# Clean up
shutil.rmtree(ai_pack_dir, ignore_errors=True)

# Export
ai_pack_export.export(
    compiled_models, ai_pack_dir, ai_pack_name, litert_model_name
)

And now the models are ready for consumption by PODAI!

The following steps take place in Android Studio; please refer to the [Kotlin NPU image segmentation sample](https://github.com/google-ai-edge/LiteRT/tree/main/litert/samples/image_segmentation/kotlin_npu) for details.

If you are curious about the contents of the AI Pack, we can take a look inside the directory.

In [None]:
# @title Inspecting AI Pack source


def list_files(startpath):
  """Function to print out the tree structure of a directory."""
  for root, dirs, files in os.walk(startpath):
    level = root.replace(startpath, '').count(os.sep)
    indent = ' ' * 4 * (level)
    print('{}{}/'.format(indent, os.path.basename(root)))
    subindent = ' ' * 4 * (level + 1)
    for f in files:
      print('{}{}'.format(subindent, f))


list_files(ai_pack_dir)

## Advanced Usage

This section covers advanced usage, such as compiling a LiteRT (TFLite) model directly.


### NPU Compilation from TFLite model

In many cases, you might already have a converted TFLite model that was published earlier, while the source model is unavailable or is not a PyTorch model. In this case, instead of the AiEdgeTorch package, you can use the APIs provided by the AiEdgeLiteRT compiler directly.

#### Getting the TFLite Model

We will use [MediaPipe MultiClass Segmentation](https://storage.googleapis.com/mediapipe-assets/Model%20Card%20Multiclass%20Segmentation.pdf) model for this use case.

The TFLite model is available from [MediaPipe Image segmentation](https://ai.google.dev/edge/mediapipe/solutions/vision/image_segmenter#multiclass-model) page.


In [None]:
work_dir = '.'

model_url = 'https://storage.googleapis.com/mediapipe-models/image_segmenter/selfie_multiclass_256x256/float32/latest/selfie_multiclass_256x256.tflite'
tflite_model_path = os.path.join(work_dir, 'selfie_multiclass_256x256.tflite')

model_content = requests.get(model_url, timeout=60)
model_content.raise_for_status()  # Fail early if the download did not succeed.

with open(tflite_model_path, 'wb') as fout:
  fout.write(model_content.content)
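
A failed or redirected download can leave an HTML error page on disk under a `.tflite` name. As a quick sanity check, the sketch below looks for the `TFL3` FlatBuffer file identifier, which standard TFLite model files carry at bytes 4 to 8. The helper name `looks_like_tflite` is hypothetical, and the check assumes a plain, unwrapped TFLite FlatBuffer; it says nothing about whether the model itself is valid.

```python
import os
import tempfile


def looks_like_tflite(path):
  """Returns True if the file carries the 'TFL3' FlatBuffer identifier."""
  with open(path, "rb") as fin:
    header = fin.read(8)
  # FlatBuffer layout: a 4-byte root offset, then a 4-byte file identifier.
  return len(header) == 8 and header[4:8] == b"TFL3"


# Demonstrate with two synthetic files instead of a real download.
with tempfile.TemporaryDirectory() as tmp_dir:
  fake_model = os.path.join(tmp_dir, "fake_model.tflite")
  with open(fake_model, "wb") as fout:
    fout.write(b"\x1c\x00\x00\x00TFL3" + b"\x00" * 16)
  error_page = os.path.join(tmp_dir, "error_page.tflite")
  with open(error_page, "wb") as fout:
    fout.write(b"<html>Not Found</html>")
  print(looks_like_tflite(fake_model))
  print(looks_like_tflite(error_page))
```

In this notebook, the check could be applied as `looks_like_tflite(tflite_model_path)` right after the download.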

#### Using the LiteRT Python API to quickly verify the TFLite model

In the following example, we show both the mask image and the blended result.

In [None]:
from ai_edge_litert.compiled_model import CompiledModel

SEGMENT_COLORS = [
    (0, 0, 0),
    (255, 0, 0),
    (0, 255, 0),
    (0, 0, 255),
    (255, 255, 0),
    (255, 0, 255),
]
INPUT_SIZE = (256, 256)
NUM_CLASSES = 6

# Load the model and image
model = CompiledModel.from_file(tflite_model_path)
original_image = np.array(Image.open(test_image).convert('RGB'))
img_array = np.array(pil_image).astype(np.float32)

# Normalize the image
normalized = (img_array - 127.5) / 127.5
normalized = np.ascontiguousarray(normalized, dtype=np.float32)

# Run inference
sig_idx = 0
input_buffers = model.create_input_buffers(sig_idx)
output_buffers = model.create_output_buffers(sig_idx)
input_data = normalized.reshape(-1)
input_buffers[0].write(input_data)
model.run_by_index(sig_idx, input_buffers, output_buffers)

# Get output data
height, width = INPUT_SIZE
output_size = height * width * NUM_CLASSES
output_data = output_buffers[0].read(output_size, np.float32)
output_data = output_data.reshape(height, width, NUM_CLASSES)
mask = np.argmax(output_data, axis=2).astype(np.uint8)

# Create colored mask
colored_mask = np.zeros((height, width, 3), dtype=np.uint8)
for label_idx in range(NUM_CLASSES):
  class_mask = mask == label_idx
  color = SEGMENT_COLORS[label_idx]
  colored_mask[class_mask] = color

# Blend with original image
# Resize colored mask to match original image if necessary
if original_image.shape[:2] != colored_mask.shape[:2]:
  colored_mask_pil = Image.fromarray(colored_mask)
  colored_mask_pil = colored_mask_pil.resize(
      (original_image.shape[1], original_image.shape[0])
  )
  colored_mask = np.array(colored_mask_pil)

# Blend images with alpha 0.5
alpha = 0.5
blended_image = (original_image * (1 - alpha) + colored_mask * alpha).astype(
    np.uint8
)

# Display them
fig, axes = plt.subplots(1, 3, figsize=(9, 3))

for idx, (title, image) in enumerate([
    ('Original Image', original_image),
    ('Colored Mask', colored_mask),
    ('Blended Image', blended_image),
]):
  axes[idx].imshow(image)
  axes[idx].set_title(title)
  axes[idx].axis('off')

plt.tight_layout()
plt.show()
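
The per-class coloring loop in the cell above can also be written as a single palette lookup with NumPy integer indexing. This is an alternative sketch, not a change to the notebook's approach; the names `PALETTE` and `colorize` are hypothetical, and the colors simply mirror `SEGMENT_COLORS`.

```python
import numpy as np

# One RGB color per class index, mirroring SEGMENT_COLORS in the cell above.
PALETTE = np.array(
    [
        (0, 0, 0),
        (255, 0, 0),
        (0, 255, 0),
        (0, 0, 255),
        (255, 255, 0),
        (255, 0, 255),
    ],
    dtype=np.uint8,
)


def colorize(class_mask):
  """Maps an (H, W) array of class indices to an (H, W, 3) RGB image."""
  return PALETTE[class_mask]


# A tiny made-up class mask standing in for the argmax output.
tiny_mask = np.array([[0, 1], [5, 2]], dtype=np.uint8)
tiny_colored = colorize(tiny_mask)
print(tiny_colored.shape)
```

Indexing a `(6, 3)` palette with an `(H, W)` integer array yields an `(H, W, 3)` result directly, so `colorize(mask)` would be equivalent to the loop that fills `colored_mask`.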

#### Convert to LiteRT model, with NPU AOT compilation.

Since it's a TFLite model, we will use the `ai_edge_litert.aot` module instead of the AiEdgeTorch APIs.

In [None]:
compiled_models = aot_lib.aot_compile(tflite_model_path, keep_going=True)

The remaining steps are the same as for models compiled from PyTorch. For example, to export an AI Pack:

In [None]:
# Configuring the AI Pack
os.makedirs('selfie_multiclass', exist_ok=True)  # Safe to re-run the cell.
ai_pack_dir = os.path.join('selfie_multiclass', 'ai_pack')
ai_pack_name = 'selfie_multiclass'
litert_model_name = 'segmentation_multiclass'

# Clean up
shutil.rmtree(ai_pack_dir, ignore_errors=True)

# Export
ai_pack_export.export(
    compiled_models, ai_pack_dir, ai_pack_name, litert_model_name
)

In [None]:
list_files(ai_pack_dir)

### NPU Compilation for specific device / NPU

By default, LiteRT AOT compilation compiles for all registered backends. But for local development, you might only want to compile for specific devices, such as the development phones you have at hand. This is achievable by providing the compilation targets explicitly.

The following example compiles for the Qualcomm SM8450 and MediaTek MT6989 SoCs.

In [None]:
# @title Specifying the compilation target

sm8450_target = qnn_target.Target(qnn_target.SocModel.SM8450)
mt6989_target = mtk_target.Target(mtk_target.SocModel.MT6989)

In [None]:
# @title Compiling from PyTorch model

compiled_models = (
    ai_edge_torch.experimental_add_compilation_backend(sm8450_target)
    .experimental_add_compilation_backend(mt6989_target)
    .convert(channel_last_selfie_segmentation, sample_input)
)

In [None]:
# @title Compiling from TFLite model

compiled_models = aot_lib.aot_compile(
    tflite_model_path,
    target=[sm8450_target, mt6989_target],
    keep_going=False,  # We want to error out when there's failure.
)

# Read more

More links go here.
