# Tutorial 8: Support for Android

This tutorial covers the basics of working with the Android platform,
including deployment, authoring, and runtime options.

* * *

**DISCLAIMER: Before we continue, we'd like to remind you that everything you
see here is intended primarily for research and experimental purposes, and uses
in a non-experimental setting are at your own risk.**
If you're planning to build a production mobile app to run on Android, at this
time we recommend that you review the [Gemini API](https://ai.google.dev/) and [Gemini Nano on-device through Android AICore](https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html). AICore is the
new system-level capability introduced in Android 14 to provide Gemini-powered
solutions for high-end devices, including integrations with the latest ML
accelerators, use-case optimized LoRA adapters, and safety filters. To start
using Gemini Nano on-device with your app, apply to the
[Early Access Preview](https://docs.google.com/forms/d/e/1FAIpQLSdDvg0eEzcUY_-CmtiMZLd68KD3F0usCnRzKKzWb4sAYwhFJg/viewform?usp=header_link).

* * *

Now that you understand the risks, let's continue.

## Overview

The overall architecture that we're going to work with is shown in the
following diagram:

![GenC on-device](genc_on_device.png)

Let's go through this diagram step-by-step:

1.   The GenAI logic to be executed by GenC is always represented in the form
     of a portable
     [Intermediate Representation (IR)](https://github.com/google/genc/tree/master/genc/docs/ir.md), shown here in blue. As a developer
     using GenC, you generally don't create the IR directly - you use one of
     the supplied authoring APIs. The two most common ways to author the IR,
     shown here, are to either:

     *   Author it in Python, e.g., in a Jupyter notebook while prototyping,
         and deploy the IR by uploading it to the device (or bundling it with
         the app as a resource, etc.), as shown in some of the preceding
         tutorials.

     *   Author it directly in the mobile app using the supplied Java authoring
         APIs.

     Functionally, there's no real difference between the two - the result is
     always the same, so it's mostly a matter of preference. For more on the
     authoring methods, see the authoring section in
     [api.md](https://github.com/google/genc/tree/master/genc/docs/api.md).

2.   Whenever the app wishes to execute the GenAI logic defined in the IR, it
     passes it to the local instance of GenC runtime linked directly into the
     app process. This runtime can be configured in a number of ways to use
     different types of on-device or cloud LLMs and other services, including
     the ability to delegate to other remote instances of GenC runtimes. For
     simplicity, in this tutorial we're mostly going to use an example runtime
     setup. For more on how to customize the runtime, see
     [runtime.md](https://github.com/google/genc/tree/master/genc/docs/runtime.md).

3.   The instance of GenC runtime linked into the app executes the IR. During
     execution, the runtime makes calls to other components as needed (based on
     the logic encoded in the IR), including local or remote LLMs, etc. The set
     of such services, as noted above, is defined when setting up the runtime.
     After computing the result, the runtime passes control back to the app.

## Setup

As in all other tutorials, you need to set up your development environment so
that you can build GenC and connect this Python colab notebook to a Jupyter
runtime with GenC linked, which lets you execute the code shown below.

You can find a more comprehensive step-by-step walkthrough in
[android_setup.md](https://github.com/google/genc/tree/master/genc/docs/android_setup.md). Here, for simplicity's sake, we present a slightly shorter
version of that setup process tailored for this tutorial. Feel free to consult
the above, as well as the adjacent documentation on model support options, for
more details.

First, make sure to follow the basic steps outlined in
[SETUP.md](https://github.com/google/genc/tree/master/SETUP.md)
at the root of the repo to create a docker container inside of which you will
fetch GenC from GitHub, build and run examples.

Once in the docker container, be sure to run the initial Android build config
script `bash ./setup_android_build_env.sh`, and then proceed to build and run
all the tests (`bazel test genc/...`) to confirm that your build setup works.

Next, you're going to need to build a GenC demo app that we're going to use in
this tutorial (and that we'll modify later to illustrate various concepts). You
will do it as follows:

```
bazel build \
  --config=android_arm64 \
  genc/java/src/java/org/genc/examples/apps/gencdemo:app
```

The above process produces a file named `app.apk` in the `bazel-bin` directory
that corresponds to the source path above. Once the build completes, you will
want to copy it outside of the docker container, so that you can use the `adb`
tool to install it on your device. Assuming that you started the container as
shown in [SETUP.md](https://github.com/google/genc/tree/master/SETUP.md), and
have the `/genc` directory inside the container mapped to an external directory
outside of it, you can do it as follows:

```
cp bazel-bin/genc/java/src/java/org/genc/examples/apps/gencdemo/app.apk \
  /genc/app.apk
```

Now, outside the docker container, run `adb devices` to confirm that your `adb`
tool is set up correctly and your device is present (be sure to enable
Developer mode and USB debugging). Then navigate to the directory that maps
to `/genc` in the docker container, where you should find `app.apk`, and
install it on the device as follows:

```
adb install app.apk
```

You should see something like:

```
Performing Streamed Install
Success
```

Once the app is successfully deployed, you can find it under GenC Demo App in
the app launcher:

![GenC app icon](genc_app_icon.png)

Don't run the app just yet; the setup is not complete.

The next step is to deploy the Gemma 2B model weights that we're going to use
in this tutorial for on-device LLM calls. You can find them, e.g.,
[on HuggingFace](https://huggingface.co/google/gemma-2b), but for this tutorial,
you'll want the
[quantized versions](https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF).
Grab the `GGUF` file named `gemma-2b-it-q4_k_m.gguf`, and then push it to your
mobile device, e.g., as follows:

```
wget --directory-prefix=/tmp/ https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf
cd /tmp/
adb push gemma-2b-it-q4_k_m.gguf /data/local/tmp/gemma-2b-it-q4_k_m.gguf
```

You should see something like:

```
gemma-2b-it-q4_k_m.gguf: 1 file pushed, 0 skipped. 31.7 MB/s (1495245728 bytes in 44.971s)
```

You can confirm the weights are uploaded:

```
adb ls /data/local/tmp/
```

You should see something like:

```
000041f9 00000d7c 66578f85 .
000041e9 00000d7c 663eb4db ..
000081b6 591fa3a0 65d68e51 gemma-2b-it-q4_k_m.gguf
```
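
The three hexadecimal columns in this listing appear to be the file mode, the
size in bytes, and the modification time. As a quick sanity check, the size
column can be decoded and compared against the byte count that `adb push`
reported. The following is a minimal sketch: the helper name
`parse_adb_ls_line` is hypothetical, and the column layout is an assumption
based on the sample output above.

```python
import stat


def parse_adb_ls_line(line):
    """Split one `adb ls` line into (mode, size, mtime, name).

    Assumes the layout seen in the sample output: three hexadecimal
    fields followed by the file name.
    """
    mode_hex, size_hex, mtime_hex, name = line.split(maxsplit=3)
    return int(mode_hex, 16), int(size_hex, 16), int(mtime_hex, 16), name


# Line copied from the sample `adb ls` output above.
sample = "000081b6 591fa3a0 65d68e51 gemma-2b-it-q4_k_m.gguf"
mode, size, mtime, name = parse_adb_ls_line(sample)

# Byte count reported by `adb push` in the earlier sample output.
pushed_bytes = 1495245728

print(name, "is a regular file:", stat.S_ISREG(mode))
print("size matches adb push:", size == pushed_bytes)
```

If the two sizes differ on your device, the transfer was probably interrupted
and the push is worth repeating.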

Be sure to remember the mobile path where you pushed those Gemma model weights,
since you'll need to use it later when defining your GenAI workload.

This concludes the initial Android setup. Now, before continuing with the rest
of this tutorial, let's make sure that your Jupyter notebook is also set up
correctly. Run the Jupyter server as documented in
[SETUP.md](https://github.com/google/genc/tree/master/SETUP.md)
by calling `bash docs/tutorials/jupyter_setup/launch_jupyter.sh` from within
the `genc/genc` directory:

```
cd /genc/genc
bash docs/tutorials/jupyter_setup/launch_jupyter.sh
```

Then navigate to the page served by Jupyter, reopen this notebook, and connect
it to that server. Execute the code below to confirm that your setup works.


```
import genc
from genc.python import authoring
from genc.python import examples
from genc.python import interop
from genc.python import runtime
```
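
If the imports above fail, it can help to find out which top-level packages
are missing before digging into the build. The sketch below uses only the
Python standard library; `missing_modules` is a hypothetical helper name, and
the list of required packages is taken from the cells in this tutorial.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Packages used by this tutorial's notebook cells.
required = ["genc", "langchain"]
print("missing:", missing_modules(required))
```

An empty list suggests the Jupyter kernel can see both packages; anything else
points to a kernel that was not launched via the setup script described above.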

This concludes the initial setup.

## Authoring

### Authoring in Jupyter notebook and deployment to device

As mentioned earlier, there are two modes of authoring we're going to illustrate
in this tutorial. We'll start with authoring in the Jupyter notebook, since if
you followed any of the preceding tutorials, you're going to find it familiar.

Here's how you can define a simple chain in LangChain that consists of a prompt
followed by a call to the on-device model.

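```
import langchain
from langchain.prompts import PromptTemplate

# Reference to the on-device Gemma model; `model_path` must match the
# location of the GGUF file pushed to the device.
gemma = genc.python.interop.langchain.CustomModel(
    uri="/device/gemma",
    config={
        "model_path": "/data/local/tmp/gemma-2b-it-q4_k_m.gguf",
        "num_threads": 4,
        "max_tokens": 64,
    })

# A simple chain: fill in the prompt template, then call the model.
chain = langchain.chains.LLMChain(
    llm=gemma,
    prompt=PromptTemplate(
        input_variables=["topic"],
        template="Tell me about {topic}."))

# Convert the LangChain chain into portable IR.
portable_ir = genc.python.interop.langchain.create_computation(chain)
```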

If necessary, make sure to edit the `model_path` above to match the location of
the `GGUF` file you deployed on-device.
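
One way to keep the on-device path in a single place is to build the `config`
dictionary with a small helper. This is only a sketch: `gemma_config` and
`DEVICE_MODEL_PATH` are hypothetical names, the `.gguf` suffix check is an
added assumption, and the default values simply mirror the ones used above.

```python
def gemma_config(model_path, num_threads=4, max_tokens=64):
    """Build the `config` dictionary used for the on-device model above."""
    # Assumption for this sketch: the weights are always a GGUF file.
    if not model_path.endswith(".gguf"):
        raise ValueError(f"expected a GGUF file, got: {model_path}")
    return {
        "model_path": model_path,
        "num_threads": num_threads,
        "max_tokens": max_tokens,
    }


# On-device location used with `adb push` earlier in this tutorial.
DEVICE_MODEL_PATH = "/data/local/tmp/gemma-2b-it-q4_k_m.gguf"
config = gemma_config(DEVICE_MODEL_PATH)
print(config)
```

The resulting dictionary could then be passed as the `config` argument when
constructing the model, so a changed device path only needs to be edited once.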

Rather than execute the code directly in the colab, as we did in other
tutorials, we'll proceed directly to on-device deployment. Let's save the IR
to a local file, as follows.

```
with open("/tmp/genc_demo.pb", "wb") as f:
  f.write(portable_ir.SerializeToString())
```

Once this is done, move it outside the docker container (where you're running
your Jupyter instance), and then use the `adb push` command to copy it to the
device, just like you did with the Gemma model weights.

In docker:

```
cp /tmp/genc_demo.pb /genc/genc_demo.pb
```

Outside of docker, in the directory that maps to `/genc` in the container:

```
adb push genc_demo.pb /data/local/tmp/genc_demo.pb
```

You should see something like:

```
genc_demo.pb: 1 file pushed, 0 skipped. 1.3 MB/s (257 bytes in 0.000s)
```

Now, open the GenC Demo app and try interacting with the model to confirm
that it works:

![Screenshot 1](tutorial_8_app_screenshot_1.png)

### Authoring directly on-device within the Java app

(to be continued...)