# Tutorial 8: Support for Android

This tutorial covers the basics of working with the Android platform,
including deployment, authoring, and runtime options.

* * *

**DISCLAIMER: Before we continue, we'd like to remind you that everything you
see here is intended primarily for research and experimental purposes, and uses
in a non-experimental setting are at your own risk.**
If you're planning to build a production mobile app to run on Android, at this
time we recommend that you review the [Gemini API](https://ai.google.dev/), and the [Gemini Nano on-device through Android AICore](https://android-developers.googleblog.com/2023/12/a-new-foundation-for-ai-on-android.html). AICore is the
new system-level capability introduced in Android 14 to provide Gemini-powered
solutions for high-end devices, including integrations with the latest ML
accelerators, use-case optimized LoRA adapters, and safety filters. To start
using Gemini Nano on-device with your app, apply to the
[Early Access Preview](https://docs.google.com/forms/d/e/1FAIpQLSdDvg0eEzcUY_-CmtiMZLd68KD3F0usCnRzKKzWb4sAYwhFJg/viewform?usp=header_link).

* * *

Now that you understand the risks, let's continue.

## Overview

The overall architecture that we're going to work with is shown in the
following diagram:

![GenC on-device](genc_on_device.png)

Let's go through this diagram step-by-step:

1.   The GenAI logic to be executed by GenC is always represented in the form
     of a portable
     [Intermediate Representation (IR)](https://github.com/google/genc/tree/master/genc/docs/ir.md), shown here in blue. As a developer
     using GenC, you generally don't create the IR directly - you use one of
     the supplied authoring APIs. The two most common ways to author the IR,
     shown here, are to either:

     *   Author it in Python, e.g., in a Jupyter notebook while prototyping,
         and deploy the IR by uploading it to the device (or bundling it with
         the app as a resource, etc.), as shown in some of the preceding
         tutorials.

     *   Author it directly in the mobile app using the supplied Java authoring
         APIs.

     Functionally, there's no real difference between the two - the result is
     always the same, so it's mostly a matter of preference. For more on the
     authoring methods, see the authoring section in
     [api.md](https://github.com/google/genc/tree/master/genc/docs/api.md).

2.   Whenever the app wishes to execute the GenAI logic defined in the IR, it
     passes it to the local instance of GenC runtime linked directly into the
     app process. This runtime can be configured in a number of ways to use
     different types of on-device or cloud LLMs and other services, including
     the ability to delegate to other remote instances of GenC runtimes. For
     simplicity, in this tutorial we're mostly going to use an example runtime
     setup. For more on how to customize the runtime, see
     [runtime.md](https://github.com/google/genc/tree/master/genc/docs/runtime.md).

3.   The instance of GenC runtime linked into the app executes the IR. During
     execution, the runtime makes calls to other components as needed (based on
     the logic encoded in the IR), including local or remote LLMs, etc. The set
     of such services, as noted above, is defined when setting up the runtime.
     After computing the result, the runtime passes control back to the app.

## Setup

As in all other tutorials, you need to set up your development environment so
that you can build GenC and connect this Python colab notebook to a Jupyter
runtime with GenC linked, which lets you execute the code shown below.

You can find a more comprehensive step-by-step walkthrough in
[android_setup.md](https://github.com/google/genc/tree/master/genc/docs/android_setup.md). Here, for simplicity's sake, we present a slightly shorter
version of that setup process tailored for this tutorial. Feel free to consult
the above, as well as the adjacent documentation on model support options, for
more details.

First, make sure to follow the basic steps outlined in
[SETUP.md](https://github.com/google/genc/tree/master/SETUP.md)
at the root of the repo to create a docker container inside of which you will
fetch GenC from GitHub, build and run examples.

Once in the docker container, be sure to run the initial Android build config
script `bash ./setup_android_build_env.sh`, and then build and run
all the tests (`bazel test genc/...`) to confirm that your build setup works.

Next, you're going to need to build a GenC demo app that we're going to use in
this tutorial (and that we'll modify later to illustrate various concepts). You
will do it as follows:

```
bazel build \
  --config=android_arm64 \
  genc/java/src/java/org/genc/examples/apps/gencdemo:app
```

The above process produces a file named `app.apk` in the `bazel-bin` directory
that corresponds to the source path above. Once the build completes, you will
want to copy it outside of the docker container, so that you can use the `adb`
tool to install it on your device. Assuming that you started the container as
shown in [SETUP.md](https://github.com/google/genc/tree/master/SETUP.md), and
have the `/genc` directory inside the container mapped to an external directory
outside of it, you can do it as follows:

```
cp bazel-bin/genc/java/src/java/org/genc/examples/apps/gencdemo/app.apk \
  /genc/app.apk
```

Now, outside the docker container, run `adb devices` to confirm that your `adb`
tool is set up correctly and your device is present (be sure to enable
Developer mode and USB debugging). Then navigate to the directory that maps
to `/genc` in the docker container, where you should find `app.apk`, and
install it on the device, as follows:

```
adb install app.apk
```

You should see something like:

```
Performing Streamed Install
Success
```
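
If you script this setup, the `adb devices` check mentioned above can be automated. The following is a minimal sketch that parses the text printed by `adb devices` and keeps only devices in the `device` state (attached and authorized). The helper names and the sample serial numbers are made up for illustration; they are not part of `adb` or GenC.

```python
def parse_adb_devices(output):
    """Returns (serial, state) pairs from the text printed by `adb devices`."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output):
    """Serials of devices that are attached and authorized for debugging."""
    return [serial for serial, state in parse_adb_devices(output) if state == "device"]


# Hypothetical sample output; real serial numbers will differ.
sample = (
    "List of devices attached\n"
    "0123456789ABCDEF\tdevice\n"
    "FEDCBA9876543210\tunauthorized\n"
)
print(ready_devices(sample))
```

To run it against a real setup, you could pass in the output of `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout`. A device listed as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.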

Once the app is successfully deployed, you can find it under GenC Demo App in
the app launcher:

![GenC app icon](genc_app_icon.png)

Don't run the app just yet; the setup is not complete.

The next step is to deploy the Gemma 2B model weights that we're going to use
in this tutorial for on-device LLM calls. You can find them, e.g.,
[on HuggingFace](https://huggingface.co/google/gemma-2b), but for this tutorial,
you'll want the
[quantized versions](https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF).
Grab the `GGUF` file named `gemma-2b-it-q4_k_m.gguf`, and then push it to your
mobile device, e.g., as follows:

```
wget --directory-prefix=/tmp/ https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/resolve/main/gemma-2b-it-q4_k_m.gguf
cd /tmp/
adb push gemma-2b-it-q4_k_m.gguf /data/local/tmp/gemma-2b-it-q4_k_m.gguf
```

You should see something like:

```
gemma-2b-it-q4_k_m.gguf: 1 file pushed, 0 skipped. 31.7 MB/s (1495245728 bytes in 44.971s)
```

You can confirm the weights are uploaded:

```
adb ls /data/local/tmp/
```

You want to see something like:

```
000041f9 00000d7c 66578f85 .
000041e9 00000d7c 663eb4db ..
000081b6 591fa3a0 65d68e51 gemma-2b-it-q4_k_m.gguf
```
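
The three leading columns printed by `adb ls` are hexadecimal: the file mode, the size in bytes, and the modification time in seconds since the epoch. As a minimal sketch (the helper name is made up for illustration), you can decode a line and confirm that the size matches the byte count reported by `adb push` above:

```python
import stat
from datetime import datetime, timezone


def parse_adb_ls_line(line):
    """Decodes one `adb ls` line: hex mode, hex size, hex mtime, then the name."""
    mode_hex, size_hex, mtime_hex, name = line.split(maxsplit=3)
    mode = int(mode_hex, 16)
    return {
        "name": name,
        "is_regular_file": stat.S_ISREG(mode),
        "permissions": oct(stat.S_IMODE(mode)),
        "size_bytes": int(size_hex, 16),
        "modified": datetime.fromtimestamp(int(mtime_hex, 16), tz=timezone.utc),
    }


entry = parse_adb_ls_line("000081b6 591fa3a0 65d68e51 gemma-2b-it-q4_k_m.gguf")
print(entry["size_bytes"])  # 1495245728, the byte count `adb push` reported
```

If the decoded size is smaller than what `adb push` reported, the transfer was likely interrupted and the weights should be pushed again.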

Be sure to remember the mobile path where you pushed those Gemma model weights,
since you'll need to use it later when defining your GenAI workload.

This concludes the initial Android setup. Now, before continuing with the rest
of this tutorial, let's make sure that your Jupyter notebook is also set up
correctly. Run the Jupyter server as documented in
[SETUP.md](https://github.com/google/genc/tree/master/SETUP.md)
by calling `bash docs/tutorials/jupyter_setup/launch_jupyter.sh` from within
the `genc/genc` directory:

```
cd /genc/genc
bash docs/tutorials/jupyter_setup/launch_jupyter.sh
```

Then, navigate to the page served by Jupyter to
reopen this notebook and connect it to that server, then execute the code below
to confirm that your setup works.


```
import genc
from genc.python import authoring
from genc.python import examples
from genc.python import interop
from genc.python import runtime
```

This concludes the initial setup.

## Authoring

### Authoring in Jupyter notebook and deployment to device

As mentioned earlier, there are two modes of authoring we're going to illustrate
in this tutorial. We'll start with authoring in the Jupyter notebook, since if
you followed any of the preceding tutorials, you're going to find it familiar.

Here's how you can define a simple chain in LangChain that consists of a prompt
followed by a call to the on-device model.

```
import langchain
from langchain.prompts import PromptTemplate

gemma = genc.python.interop.langchain.CustomModel(
    uri="/device/gemma",
    config={
        "model_path": "/data/local/tmp/gemma-2b-it-q4_k_m.gguf",
        "num_threads": 4,
        "max_tokens": 64,
    })

chain = langchain.chains.LLMChain(
    llm=gemma,
    prompt=PromptTemplate(
        input_variables=["topic"],
        template="Tell me about {topic}."))

portable_ir = genc.python.interop.langchain.create_computation(chain)
```

If necessary, make sure to edit the `model_path` above to match the location of
the `GGUF` file you deployed on-device.

Rather than execute the code directly in the colab, as we did in other
tutorials, we'll proceed directly to on-device deployment. Let's save the IR
to a local file, as follows.

```
with open("/tmp/genc_demo.pb", "wb") as f:
  f.write(portable_ir.SerializeToString())
```

Once this is done, move it outside the docker container (where you're running
your Jupyter instance), and then use the `adb push` command to copy it to the
device, just like you did with the Gemma model weights.

In docker:

```
cp /tmp/genc_demo.pb /genc/genc_demo.pb
```

Outside of docker, in the directory that maps to `/genc` in the container:

```
adb push genc_demo.pb /data/local/tmp/genc_demo.pb
```

You should see something like:

```
genc_demo.pb: 1 file pushed, 0 skipped. 1.3 MB/s (257 bytes in 0.000s)
```
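
A quick way to catch a stale or truncated upload is to compare the byte count in the `adb push` summary with the size of the local file. The sketch below is only an illustration: the helper names are made up, and a throwaway 257-byte temporary file stands in for the real `genc_demo.pb`.

```python
import os
import re
import tempfile


def pushed_byte_count(summary):
    """Extracts the byte count from an `adb push` summary line, or None."""
    match = re.search(r"\((\d+) bytes in", summary)
    return int(match.group(1)) if match else None


def push_matches_local_file(summary, local_path):
    """True if the pushed byte count equals the size of the local file."""
    return pushed_byte_count(summary) == os.path.getsize(local_path)


# Stand-in for the serialized IR: a throwaway file of the same size.
with tempfile.NamedTemporaryFile(suffix=".pb", delete=False) as f:
    f.write(b"\0" * 257)
    stand_in_path = f.name

summary = "genc_demo.pb: 1 file pushed, 0 skipped. 1.3 MB/s (257 bytes in 0.000s)"
matches = push_matches_local_file(summary, stand_in_path)
print(matches)
os.remove(stand_in_path)
```

In practice you would pass the path of the file you actually pushed (for example, the copy in the directory that maps to `/genc`) instead of the temporary file.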

Now, open the GenC Demo app and try interacting with the model to confirm
that it works:

![Screenshot 1](tutorial_8_app_screenshot_1.png)

### Authoring directly on-device within the Java app

Now, let's switch gears to working directly within the app. First, let's look
at how the example app is wired. Navigate to the
[GencDemo.java](https://github.com/google/genc/blob/master/genc/java/src/java/org/genc/examples/apps/gencdemo/GencDemo.java)
file in the `genc/java/src/java/org/genc/examples/apps/gencdemo` directory,
open it, and look at the `onCreate` handler first, where the app performs its
initialization. Notice the following section of the code inside the exception
block (unimportant details omitted and replaced by dots):

```
Value computation = Computations.getComputation();
...
executor = new DefaultAndroidExecutor(getApplicationContext());
runner = Runner.create(computation, executor.getExecutorHandle());

```

What's happening here is:

*   First, the app calls the `Computations.getComputation()` helper to obtain the
    `Value` object. This is the IR that you saved from the Jupyter notebook
    that's now being loaded into the app. If you follow the source code to where
    this helper is defined (in
    [Computations.java](https://github.com/google/genc/blob/master/genc/java/src/java/org/genc/examples/apps/gencdemo/Computations.java)),
    you'll see that by default, it calls `readComputationFromFile()`, which
    loads it from the path where you pushed it earlier.

*   Next, the app constructs an `executor` it will use to run the IR.

*   Finally, given the IR (the `Value` object) and the executor, the app now
    constructs a `Runner` object, which is a Java callable that you can use
    much in the same way you used Python callables in other tutorials. You can
    see the use of the `Runner` object later in the `onClick` handler, where
    the app executes `responseString = runner.call(text);`. This is where the
    runner is invoked, and where the IR loaded at initialization is passed to
    the executor along with the input prompt.

Now, in order to change the behavior of the app from loading the IR from a file
to authoring it directly within the app, all we need to do is to replace the
default example code executed by the first of the above statements:

```
Value computation = Computations.getComputation();
```

If you followed the above explanation and actually looked at the content of
[Computations.java](https://github.com/google/genc/blob/master/genc/java/src/java/org/genc/examples/apps/gencdemo/Computations.java),
you can already see examples of Java authoring API in use.

Go ahead, navigate to the body of `getComputation()`, delete its content,
and replace it as follows:

```
@Nullable
public static Value getComputation() {
  return Constructor.createSerialChain(
      new ArrayList<>(
          ImmutableList.of(
              Constructor.createPromptTemplate("Tell me about the aesthetic qualities of {x}."),
              Constructor.createModelInferenceWithConfig(
                  "/device/gemma",
                  Constructor.createLlamaCppModelConfig(
                      "/data/local/tmp/gemma-2b-it-q4_k_m.gguf",
                      /* numThreads= */ 4,
                      /* maxTokens= */ 64)))));
}
```

Note that while the syntax varies, the overall structure of the code is the
same as what we did in Python. Indeed, both the Python and the Java APIs are
simply thin wrappers around a C++ authoring API that sits under the hood. We
used a different prompt here, so that you can confirm upon loading the app
that it now uses the IR you authored in Java instead of the one we previously
authored in Python.

Note that while we've authored the IR at initialization time, there's
nothing forcing you to follow this model. You can author the IR dynamically in
your app, and change it depending on the need. The same executor can handle
either sequential or concurrent invocations with different types of IR.
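
To make the data flow of a serial chain concrete, here is a toy sketch in plain Python. It does not use the GenC API: GenC authors a data structure (the IR) that a runtime executes later, whereas this sketch composes ordinary functions, and the model is a stand-in that only echoes its prompt. All names here (`prompt_template`, `fake_model`, `serial_chain`, `build_chain`) are hypothetical.

```python
def prompt_template(template, placeholder):
    """Returns a stage that substitutes its input into the template."""
    token = "{" + placeholder + "}"
    return lambda text: template.replace(token, text)


def fake_model(uri):
    """Stand-in for a model call; it only echoes what it was asked."""
    return lambda prompt: "[" + uri + "] " + prompt


def serial_chain(stages):
    """Composes stages so that each one consumes the previous one's output."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run


def build_chain(style):
    """Picks the prompt at call time, mimicking logic authored dynamically."""
    templates = {
        "plain": "Tell me about {x}.",
        "aesthetic": "Tell me about the aesthetic qualities of {x}.",
    }
    return serial_chain(
        [prompt_template(templates[style], "x"), fake_model("/device/gemma")])


print(build_chain("aesthetic")("bridges"))
# [/device/gemma] Tell me about the aesthetic qualities of bridges.
```

The point of the sketch is only the shape: a prompt stage feeds a model stage, and which chain gets built can be decided at run time rather than once at initialization.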

Go ahead and rebuild the APK, push it to the device with `adb install`, and
re-run the app to confirm that you now see the new code in action.

![Screenshot 2](tutorial_8_app_screenshot_2.png)

## Runtime options

Now that we've covered authoring, let's shift attention to the runtime options.
What you've been using so far is the Gemma model via `llamacpp`. As noted in
[models.md](https://github.com/google/genc/blob/master/genc/docs/models.md),
you can also use MediaPipe. Indeed, support for MediaPipe is already built into
the example runtime.

To use Gemma through MediaPipe, you'll need to download the Gemma weights in a
MediaPipe-compatible format. You can find detailed instructions at the
[MediaPipe section](https://github.com/google/genc/blob/master/genc/docs/models.md#mediapipe). Get a file named `gemma-2b-it-gpu-int4.bin`, and push it to the
device, e.g., in `/data/local/tmp/`, next to the other one you have downloaded
earlier. The location doesn't really matter; you just need to take note of where
the file is located, since you will need to refer to that path later at
authoring time.

Next, when constructing the IR in `getComputation()`, you'll want to replace
the second element of the chain (the
`Constructor.createModelInferenceWithConfig`
call) with a new one, as follows:

```
Constructor.createMediaPipeLlmInferenceModelConfig(
  "/device/mediapipe",
  /* maxTokens= */ 64,
  /* topK= */ 40,
  /* temperature= */ 0.8f,
  /* randomSeed= */ 100);
```

Then, rebuild and re-run the app to confirm that it loads the new IR, and that it still works.

Now, to understand what just happened, let's take a closer look at how we set up
the example runtime. You can find the Java code in
[DefaultAndroidExecutor.java](https://github.com/google/genc/blob/master/genc/java/src/java/org/genc/examples/apps/gencdemo/DefaultAndroidExecutor.java),
but that's just a thin wrapper around the C++ constructors defined in
[android_executor_stacks.cc](https://github.com/google/genc/blob/master/genc/cc/examples/executors/android/android_executor_stacks.cc).

In the body of `CreateAndroidExecutor` in that file, you will see sections
of code that look as follows:

```
SetMediapipeModelInferenceHandler(&config, jvm,
                                  mediapipe_text_generator_client,
                                  kMediapipeModelUri);

SetLlamaCppModelInferenceHandler(&config, kGemmaModelUri);
```

These calls associate the names you've been using in the IR, such
as `/device/gemma` and `/device/mediapipe`, with C++ handlers that route model
calls to the appropriate backends. This list of handlers is later fed to the
generic GenC runtime constructor to set up a customized runtime for your specific
environment.
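
Conceptually, this is a lookup table from model URIs to handlers. The toy Python sketch below illustrates that idea only; it is not the GenC runtime, the class and method names are hypothetical, and the handlers are stand-ins that merely tag the prompt with the backend name.

```python
class ToyRuntime:
    """Maps model URIs to handlers, loosely mirroring the handler list above."""

    def __init__(self):
        self._handlers = {}

    def set_model_inference_handler(self, uri, handler):
        """Registers the handler that serves model calls for this URI."""
        self._handlers[uri] = handler

    def infer(self, uri, prompt):
        """Routes a model call to whichever handler was registered for the URI."""
        if uri not in self._handlers:
            raise KeyError("no handler registered for model URI " + uri)
        return self._handlers[uri](prompt)


toy_runtime = ToyRuntime()
# Stand-in handlers; real ones would call llama.cpp or MediaPipe.
toy_runtime.set_model_inference_handler("/device/gemma", lambda p: "llamacpp: " + p)
toy_runtime.set_model_inference_handler("/device/mediapipe", lambda p: "mediapipe: " + p)

print(toy_runtime.infer("/device/mediapipe", "Tell me about bridges."))
# mediapipe: Tell me about bridges.
```

The sketch also shows why a URI used in the IR has to match a handler registered in the runtime: a call to an unregistered URI has nowhere to go.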

Feel free to play with this setup, e.g., by modifying the existing handlers or
adding new ones, then matching these changes in the IR and re-pushing the app
to the device to confirm that your changes took effect.

If you'd like to learn more about setting up custom runtimes, please review the
general overview in
[runtime.md](https://github.com/google/genc/blob/master/genc/docs/runtime.md).
Everything that's written there applies to Android as well. The only difference
between the C++ runtime setup in Android and non-Android environments is that the
former makes use of both C++ and Java components (e.g., to leverage Cronet in
lieu of CURL as a network client).

This concludes the Android tutorial.