# Simple chain in LangChain powered by a device-to-Cloud model cascade

This tutorial shows how to define simple application logic in LangChain, use our
interop APIs to power it with a cascade of models that spans a model in the Cloud
and an on-device model on Android, and deploy it in a Java app on an Android
phone. This illustrates many of the key interoperability and portability
benefits of GenC in one concise package. See the follow-up tutorials listed in
the parent directory for how to further extend and customize such logic to
power more complex use cases.

## Initial setup

Before we begin, you need to set up your environment so that you can follow the
rest of this tutorial without interruption.

TODO: add all the setup steps here, including things like:

* On-device model setup:
  * Please see the instructions (TODO: link MediaPipe announcement and developer guide when published) to convert and download supported models on device.
  * For this tutorial, download the Gemma model for GPU (TODO: external link) and copy the `.tflite` model file to your Android phone. Make sure that `MODEL_PATH` in the steps below is set to the path you copy the model file to, for example:
    * `adb push {{src-model-dir}}/model_gpu.tflite /data/local/tmp/llm/model_gpu.tflite`
* Starting a Jupyter notebook with the GenC dependency wired in.
* Connecting to that notebook.

## Defining application logic in LangChain

We're going to create some example application logic using LangChain APIs.
For the sake of simplicity, let's go with a simple chain that consists of a
prompt template feeding into an LLM call. Let's define it as a function that
takes the model as a parameter, so that we can later experiment with different
models.

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate


def create_my_chain(llm):
  """Builds a chain that feeds a prompt template into the given LLM."""
  return LLMChain(
      llm=llm,
      prompt=PromptTemplate(
          input_variables=["topic"],
          template="Tell me about {topic}?",
      ),
  )
```
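To see what such a chain does without LangChain installed, here is a minimal, dependency-free sketch. It assumes that an "LLM" is just a callable mapping a prompt string to a completion string. `create_toy_chain` and `echo_llm` are hypothetical stand-ins, not LangChain APIs. The point is that taking `llm` as a parameter lets you swap models without touching the template.

```python
from typing import Callable

# Assumption for this sketch: an "LLM" is any callable mapping a prompt string
# to a completion string. Real LangChain LLMs have a richer interface.
LLM = Callable[[str], str]


def create_toy_chain(llm: LLM) -> Callable[[str], str]:
  """Hypothetical stand-in for create_my_chain: prompt template -> LLM call."""
  template = "Tell me about {topic}?"

  def chain(topic: str) -> str:
    # Fill in the prompt template, then feed the result to the model.
    return llm(template.format(topic=topic))

  return chain


def echo_llm(prompt: str) -> str:
  """A fake model that echoes its prompt, handy for testing the wiring."""
  return f"[echo] {prompt}"


toy_chain = create_toy_chain(echo_llm)
print(toy_chain("scuba diving"))  # prints "[echo] Tell me about scuba diving?"
```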

## Declaring models you will use to power the chain

Now, recall that what we want to demonstrate in this tutorial is running your
application logic on a phone, where it might be powered by an on-device LLM.
To facilitate this, GenC provides interop APIs that let you declare the use of
an on-device model, for example:

Note: Make sure you have copied the on-device model to your Android phone at `MODEL_PATH`, as covered in the [initial setup section](#initial-setup) above.

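```python
import generative_computing as genc

# Path on the phone where you copied the .tflite model file during setup.
MODEL_PATH = "/data/local/tmp/llm/model_gpu.tflite"  #@param

# Declare an on-device model that LangChain can use like any other LLM.
my_on_device_model = genc.interop.langchain.CustomModel(
    uri="/device/llm_inference",
    config={"model_path": MODEL_PATH})
```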

Now, you can construct the chain with it:

```python
my_chain = create_my_chain(my_on_device_model)
```

Similarly, let's define a cloud model, which we'll use in the next part of the tutorial.

Here, we use the Gemini Pro model from Google AI Studio.

Please [see the instructions](https://ai.google.dev/tutorials/rest_quickstart) for how to get an API key to access this model.

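```python
import generative_computing as genc

# Paste your Google AI Studio API key here.
API_KEY = ""  #@param

# Declare the Gemini cloud model, configured with your API key.
my_cloud_model = genc.interop.langchain.CustomModel(
    uri="/cloud/gemini",
    config=genc.interop.gemini.create_config(API_KEY))
```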
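Rather than pasting the key into the notebook, where it is easy to share or commit by accident, you may prefer to read it from an environment variable. This is a minimal sketch that assumes you exported the key under the hypothetical name `GEMINI_API_KEY` before starting the notebook. You would then use the resulting `API_KEY` in place of the `#@param` value in the cell above.

```python
import os

# Hypothetical variable name; use whichever name you exported the key under.
API_KEY = os.environ.get("GEMINI_API_KEY", "")

if not API_KEY:
  print("GEMINI_API_KEY is not set; requests to the cloud model will likely fail.")
```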

But let's make things more interesting. As noted at the outset of this tutorial,
we want to illustrate a cascade of models that spans both cloud and on-device
LLMs. For simplicity's sake, let's define a two-model cascade that first tries a
cloud backend (in case we're online) and falls back to an on-device model
otherwise (if we're offline):

```python
# Try the cloud model first, and fall back to the on-device model.
my_model_cascade = genc.interop.langchain.ModelCascade(models=[
    my_cloud_model, my_on_device_model])

my_chain = create_my_chain(my_model_cascade)
```
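Conceptually, a cascade tries each model in order and falls back to the next one when a model is unavailable, for example because the device is offline. The sketch below illustrates those semantics in plain Python. `toy_cascade`, the fake models, and the use of `ConnectionError` as the "unavailable" signal are all assumptions for illustration. This is not how `genc.interop.langchain.ModelCascade` is implemented.

```python
from typing import Callable, Sequence

# Assumption: a model is any callable from prompt string to completion string.
LLM = Callable[[str], str]


def toy_cascade(models: Sequence[LLM]) -> LLM:
  """Hypothetical sketch: returns a model that tries `models` in order."""
  if not models:
    raise ValueError("A cascade needs at least one model.")

  def cascade(prompt: str) -> str:
    last_error = None
    for candidate in models:
      try:
        # The first model that answers without a connection error wins.
        return candidate(prompt)
      except ConnectionError as e:
        # Assumption: ConnectionError means "unavailable", so try the next one.
        last_error = e
    raise ConnectionError("All models in the cascade are unavailable.") from last_error

  return cascade


def offline_cloud_model(prompt: str) -> str:
  """Fake cloud model that behaves as if the device were offline."""
  raise ConnectionError("No network connection.")


def fake_on_device_model(prompt: str) -> str:
  """Fake on-device model that always answers."""
  return f"[on-device] {prompt}"


model = toy_cascade([offline_cloud_model, fake_on_device_model])
print(model("Tell me about scuba diving?"))  # prints "[on-device] Tell me about scuba diving?"
```

Reversing the list, as in `toy_cascade([fake_on_device_model, offline_cloud_model])`, would prefer the on-device model and only try the cloud model if the on-device one is unavailable.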

You could have ordered the models in the cascade differently to get different
behavior (for example, trying the on-device model first). In the next tutorial
in the sequence, we'll show you how to construct an even more powerful routing
mechanism, where routing is based on query sensitivity. For now, this simple
cascade will suffice.

## Generating portable intermediate representation (IR)

Now that you have the application logic (the chain you defined above), you need
to translate it into what we call a *portable intermediate representation* (IR
for short) that can be deployed on an Android phone. You do this by calling the
conversion function provided by GenC, as follows:

```python
my_portable_ir = genc.interop.langchain.create_computation(my_chain)
print(my_portable_ir)
```

At the time of writing, this converter supports only a subset of LangChain
functionality. We'll work to extend its coverage over time, and we welcome your
help if there's a LangChain feature you'd like to see covered and you're willing
to contribute it to the platform.

## Testing the IR locally in the Colab environment

Before we move on to deployment on Android, let's first test that the IR
actually works. Although our goal is to run it on-device, we can just as well
run it here in the Colab environment (remember, all the code is portable). To do
this, we first need to construct a runtime instance:

```python
my_runtime = ...
```

TODO: add the OSS runtime constructor above for Linux environments

The constructor above is provided for convenience in running the examples and
tutorials, and is configured with a number of runtime capabilities that we use
in this context. Runtimes in GenC are fully modular and configurable. For most
advanced uses, you'll want to configure a runtime that suits your specific
environment or application (e.g., with additional custom dependencies, or
without certain dependencies you don't want in your environment). A later
tutorial in the sequence explains how to do that. For now, the default example
runtime will suffice.

Given the runtime and the portable IR we want to run, we can construct a
*runner* object that acts like an ordinary Python function and can be invoked
directly:

```python
my_runner = genc.runtime.Runner(my_portable_ir, my_runtime)

my_runner("scuba diving")
```
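The "ordinary Python function" behavior comes from the runner object being callable. As a hypothetical sketch of that pattern, not GenC's actual `Runner`, a class can bind a computation to a runtime and implement `__call__`. Here the "IR" is just a Python function and the "runtime" is a toy executor, both assumptions for illustration.

```python
from typing import Any, Callable


class ToyRuntime:
  """Hypothetical runtime: executes a computation on an argument."""

  def execute(self, computation: Callable[[Any], Any], arg: Any) -> Any:
    return computation(arg)


class ToyRunner:
  """Hypothetical runner: binds a computation to a runtime, acts like a function."""

  def __init__(self, computation: Callable[[Any], Any], runtime: ToyRuntime):
    self._computation = computation
    self._runtime = runtime

  def __call__(self, arg: Any) -> Any:
    # Calling the runner delegates execution to the bound runtime.
    return self._runtime.execute(self._computation, arg)


toy_runner = ToyRunner(lambda topic: f"Tell me about {topic}?", ToyRuntime())
print(toy_runner("scuba diving"))  # prints "Tell me about scuba diving?"
```

Keeping the computation separate from the runtime is what lets the same portable IR run here and, later, on the phone with only the runtime swapped.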

## Saving the IR to a file and deploying it on the phone

Now that you have tested the IR locally, it's time to deploy it on your phone and
test it there. First, let's save the IR to a file on the local filesystem.

```python
# Serialize the IR and write it to a file on the local filesystem.
with open("/tmp/genc_tutorial_1.pb", "wb") as f:
  f.write(my_portable_ir.SerializeToString())
```
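Before copying the file to the phone, it can be worth checking that what landed on disk is byte-for-byte what you serialized. Below is a minimal sketch that uses only the standard library. `verify_saved_bytes` is a hypothetical helper, and the demo uses placeholder bytes and a throwaway temporary file so that it runs on its own without touching `/tmp/genc_tutorial_1.pb`.

```python
import os
import tempfile


def verify_saved_bytes(path: str, expected: bytes) -> int:
  """Checks that the file at `path` holds exactly `expected`; returns its size."""
  with open(path, "rb") as f:
    loaded = f.read()
  if loaded != expected:
    raise ValueError(f"{path} does not match the serialized IR.")
  return len(loaded)


# Self-contained demo with placeholder bytes and a throwaway temp file.
placeholder = b"\x0a\x05hello"
with tempfile.NamedTemporaryFile(suffix=".pb", delete=False) as tmp:
  tmp.write(placeholder)
print(verify_saved_bytes(tmp.name, placeholder))  # prints 7
os.remove(tmp.name)
```

With the real IR, you would call `verify_saved_bytes("/tmp/genc_tutorial_1.pb", my_portable_ir.SerializeToString())` right after saving it.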

TODO: continue with the rest of this tutorial to explain how to load it on the
phone, what the code in Java looks like, how to run it there, etc.