# Tutorial 1. Simple chain in LangChain powered by a device-to-Cloud model cascade

This tutorial shows how to define simple application logic in LangChain in
Python and use our interop APIs to power it with a cascade of models. The
cascade spans the Gemini Pro model in the Cloud and the Gemma model on-device.
It then shows how to migrate that logic from this Python notebook, where you
prototype, to deployment in the target environment. Here we'll use a generic
Java client for simplicity's sake, but the same steps apply to deployment on
the mobile platforms of your choice. Together, these steps illustrate many of
the key interoperability and portability benefits of GenC in one concise
package. See the follow-up tutorials listed in the parent directory for how
you can further extend and customize such logic to power more complex use
cases.

## Initial setup

Before we begin, you need to set up your environment so that you can follow
the rest of this tutorial without interruption.

*   First, you need to start a Jupyter notebook with the GenC dependency
    wired-in, and connect to that notebook - see
    [SETUP.md](https://github.com/google/genc/tree/master/SETUP.md)
    at the root of the repo, and the supporting files in the
    [Jupyter setup directory](https://github.com/google/genc/tree/master/genc/docs/tutorials/jupyter_setup/)
    for instructions on how to set up the build and run environment and get Jupyter
    up and running. To keep the setup simple and avoid dependency issues, we
    will be running your Jupyter notebook in a docker instance.

*   Next, you need to set up access to the Gemini Pro model that will be used in
    this and follow-up tutorials as the Cloud model. In order to use this model,
    you will need to obtain an API key and enter it in this notebook as a part
    of the configuration. Please see the
    [instructions](https://ai.google.dev/tutorials/rest_quickstart)
    on how to get an API key to access this model through Google AI Studio.

*   Finally, for the part of this tutorial that uses an on-device model, you
    will need to obtain a model file, e.g., the
    [Gemma 2B quantized model](https://huggingface.co/lmstudio-ai/gemma-2b-it-GGUF/tree/main).
    For this tutorial, you'll want to fetch `gemma-2b-it-q4_k_m.gguf` and place
    it somewhere on your local filesystem within the docker container where
    you'll be running all the examples. Make a note of the path where you
    download the model. You'll need to supply it later in the tutorial so that
    GenC can find and load the model at runtime.
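Before moving on, it can save some debugging time to confirm that the downloaded file really is a GGUF model. The sketch below is our own helper, not part of GenC. `looks_like_gguf` and `has_gguf_magic` are hypothetical names, and the only thing the check relies on is that GGUF files begin with the four ASCII bytes `GGUF`. A `False` result usually means either a wrong path or a bad download, for example a saved web page in place of the model.

```python
from pathlib import Path

# GGUF files start with these four ASCII bytes (the format's magic number).
GGUF_MAGIC = b"GGUF"


def has_gguf_magic(header: bytes) -> bool:
    """Returns True if `header` begins with the GGUF magic number."""
    return header[:len(GGUF_MAGIC)] == GGUF_MAGIC


def looks_like_gguf(path: str) -> bool:
    """Returns True if `path` is an existing file whose first bytes are the GGUF magic."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return has_gguf_magic(f.read(len(GGUF_MAGIC)))


# Adjust the path to wherever you placed the model inside the container.
print(looks_like_gguf("/tmp/gemma-2b-it-q4_k_m.gguf"))
```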

Now, to verify that GenC dependencies are loaded correctly, let's run a bunch
of imports we're going to use later.

```python
import genc
from genc.python import authoring
from genc.python import interop
from genc.python import runtime
from genc.python import examples
```
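If one of these imports fails, Python stops at the first missing module. As an optional alternative, a small helper built only on the standard library's `importlib.util.find_spec` can report every missing module at once. `missing_modules` is a hypothetical name for this sketch, and the module list is simply the set of imports used in this tutorial. For dotted names, `find_spec` imports the parent packages, so a missing parent is also reported as missing.

```python
import importlib.util


def missing_modules(names):
    """Returns the subset of (possibly dotted) module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for dotted names whose parent package is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


print(missing_modules([
    "genc",
    "genc.python.authoring",
    "genc.python.interop",
    "genc.python.runtime",
    "genc.python.examples",
    "langchain",
]))
```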

## Defining application logic in LangChain

We're going to create an example of application logic using the LangChain APIs.
For the sake of simplicity, let's go with a simple chain that consists of a
prompt template feeding into an LLM call. We define it as a Python function
parameterized by the model, because we'll want to test different versions of
the chain as we swap in different models later on.

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate


def create_my_chain(llm):
  """Returns a chain that fills in the prompt template and passes it to `llm`."""
  return LLMChain(
      llm=llm,
      prompt=PromptTemplate(
          input_variables=["topic"],
          template="Tell me about {topic}?",
      ),
  )
```
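To see what `create_my_chain` builds without any LangChain machinery, the plain-Python sketch below mimics the same two steps: it fills in the prompt template and then calls the model. It is only an illustration, not how LangChain implements `LLMChain`. `create_toy_chain` is a hypothetical name, and the "model" is any function that maps a prompt string to a response string.

```python
from typing import Callable


def create_toy_chain(llm: Callable[[str], str]) -> Callable[[str], str]:
    """Plain-Python analogue of create_my_chain: fill the template, then call the model."""
    template = "Tell me about {topic}?"

    def chain(topic: str) -> str:
        prompt = template.format(topic=topic)  # Prompt-template step.
        return llm(prompt)                      # Model-call step.

    return chain


# Any prompt-to-text function can stand in for the model here.
echo_chain = create_toy_chain(lambda prompt: f"[model saw: {prompt}]")
print(echo_chain("scuba diving"))  # → [model saw: Tell me about scuba diving?]
```

Taking the model as an argument is also why `create_my_chain` accepts `llm`: switching models later is a one-argument change.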

## Declaring a Cloud model you will use to power the chain

Now, let's define a model we can use. In GenC, we refer to models symbolically,
since the same model may be provisioned differently depending on where you run
the code. Recall that what we want to demonstrate in this tutorial is running
your application logic in this Jupyter notebook first, and then porting it to
run elsewhere, possibly on a different platform, where the mechanism used to
access your model may vary. To facilitate this, GenC provides interop APIs that
let you declare the use of a model, as shown below. As noted earlier, for this
tutorial we're going to use the Gemini Pro model from Google AI Studio.
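To make symbolic model references concrete, here is a conceptual sketch, not GenC's actual mechanism. The application refers to a model only by a URI such as `/cloud/gemini`, and each environment supplies its own backend for that URI. `ModelRegistry` and the two environments are hypothetical, and the backends are stand-in functions rather than real model clients.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]


class ModelRegistry:
    """Toy mapping from symbolic model URIs to environment-specific backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, ModelFn] = {}

    def register(self, uri: str, backend: ModelFn) -> None:
        self._backends[uri] = backend

    def resolve(self, uri: str) -> ModelFn:
        if uri not in self._backends:
            raise LookupError(f"No backend provisioned for {uri!r} in this environment")
        return self._backends[uri]


# The application code only ever names "/cloud/gemini"; each environment
# decides how that name is served.
notebook_env = ModelRegistry()
notebook_env.register("/cloud/gemini", lambda prompt: f"[notebook backend] {prompt}")

client_env = ModelRegistry()
client_env.register("/cloud/gemini", lambda prompt: f"[client backend] {prompt}")

for env in (notebook_env, client_env):
    print(env.resolve("/cloud/gemini")("Tell me about scuba diving?"))
```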

NOTE: Please make sure you have an API key for Gemini Pro, as covered in the
initial setup section above.
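You may prefer not to paste the key into the notebook, where it could end up committed alongside the code. One option is to export it as an environment variable and read it at runtime. The sketch below assumes a variable named `GEMINI_API_KEY`. We picked that name for illustration; GenC does not read it on its own. When running in docker, the variable also has to be passed into the container, for example with `docker run -e`.

```python
import os


def read_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Reads an API key from the environment, failing loudly if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook.")
    return key
```

You could then write `API_KEY = read_api_key()` in place of the empty string in the next cell.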

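```python
API_KEY = ""  #@param  (paste your Gemini API key here)

# Declares the Cloud model symbolically by URI, with a Gemini-specific config.
my_cloud_model = genc.python.interop.langchain.CustomModel(
    uri="/cloud/gemini",
    config=genc.python.interop.gemini.create_config(API_KEY))
```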

Now, you can construct the chain with it:

```python
my_chain = create_my_chain(my_cloud_model)
```

## Generating portable intermediate representation (IR)

Now that you have the application logic (the chain you defined above), you need
to translate it into what we call a *portable intermediate representation* (IR
for short) that can be deployed in the target application environment. You do
this by calling the conversion function provided by GenC, as follows:

```python
my_portable_ir = genc.python.interop.langchain.create_computation(my_chain)
print(my_portable_ir)
```

At the time of this writing, this converter only supports a subset of LangChain
functionality; we'll work to augment the coverage over time (and we welcome your
help if there's a feature of LangChain you'd like to see covered and are willing
to contribute it to the platform).
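Conceptually, the IR turns the chain into plain data. It describes the steps and refers to the model only by its symbolic URI, so any runtime on any platform can load it. Judging by the `SerializeToString` call used later to save it, GenC's IR appears to be a protocol buffer. The JSON sketch below is only a toy analogue of that idea, and its field names (`op`, `steps`, `template`, `uri`) are hypothetical rather than GenC's actual schema.

```python
import json

# Toy stand-in for a portable IR: the chain described as plain data, with the
# model referred to only by its symbolic URI.
toy_ir = {
    "op": "chain",
    "steps": [
        {"op": "prompt_template", "template": "Tell me about {topic}?"},
        {"op": "model_inference", "uri": "/cloud/gemini"},
    ],
}

# Serialize to bytes that could be written to a file or sent to another process...
payload = json.dumps(toy_ir, sort_keys=True).encode("utf-8")

# ...and rebuild an identical structure on the other side, in any language
# with a JSON parser.
restored = json.loads(payload.decode("utf-8"))
print(restored == toy_ir)  # → True
```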

## Testing the IR locally in the Jupyter notebook environment

Before we move over to deployment on the client, let's first test that the IR is
indeed working. While our goal is to run it on-device, we can just as well run
it here, in this Jupyter notebook (remember, all the code is portable). To do
this, we first need to construct a runtime instance:

```python
my_runtime = genc.python.examples.executor.create_default_executor()
```

The `create_default_executor()` function above is provided for convenience in
running the examples and tutorials, and is configured with a number of runtime
capabilities that we use in this context. Runtimes in GenC are fully modular
and configurable. In more advanced uses, you'll want to configure a runtime
that suits the specific environment you want to run in, or your particular
application (e.g., with additional custom dependencies, or without certain
dependencies you don't want in your environment). One of the later tutorials in
the sequence explains how to do that. For now, the default example runtime will
suffice.

Given the runtime and the portable IR we want to run, we can construct a
*runner* object that will act like an ordinary Python function, and can
be directly invoked, like this:

```python
my_runner = genc.python.runtime.Runner(my_portable_ir, my_runtime)

print(my_runner("scuba diving"))
```
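The sketch below is a toy analogue of what `Runner` does, under our own simplifying assumptions. A "runtime" is modeled as a table that maps each kind of IR step to a handler. The runner checks that every step in the IR has a handler, then binds the two into something you can call like a function. Leaving a handler out mimics building a runtime without a capability you don't need: running an IR that requires it then raises an error. `ToyRunner`, `toy_ir`, and `toy_runtime` are hypothetical names, not GenC APIs, and the model call is faked so the example runs offline.

```python
from typing import Any, Callable, Dict, List

Step = Dict[str, Any]
Handler = Callable[[Step, str], str]


def fill_template(step: Step, value: str) -> str:
    """Handler for prompt-template steps."""
    return step["template"].format(topic=value)


def fake_model(step: Step, prompt: str) -> str:
    """Offline stand-in for a model call; tags the output with the model's URI."""
    return f"<{step['uri']}> {prompt}"


class ToyRunner:
    """Binds a toy IR to a toy runtime and can be called like a function."""

    def __init__(self, ir: List[Step], runtime: Dict[str, Handler]) -> None:
        # Fail early if the runtime lacks a capability the IR needs.
        missing = {step["op"] for step in ir} - set(runtime)
        if missing:
            raise ValueError(f"Runtime has no handler for: {sorted(missing)}")
        self._ir = ir
        self._runtime = runtime

    def __call__(self, value: str) -> str:
        # Run the steps in order, feeding each step's output into the next.
        for step in self._ir:
            value = self._runtime[step["op"]](step, value)
        return value


toy_ir = [
    {"op": "prompt_template", "template": "Tell me about {topic}?"},
    {"op": "model_inference", "uri": "/cloud/gemini"},
]
toy_runtime = {"prompt_template": fill_template, "model_inference": fake_model}

toy_runner = ToyRunner(toy_ir, toy_runtime)
print(toy_runner("scuba diving"))  # → </cloud/gemini> Tell me about scuba diving?
```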

## Adding an on-device model

Recall that we promised earlier to form a cascade of two models, one of them
being an on-device model. You can declare an on-device model much as you
declared the Cloud model, as shown below.

NOTE: Make sure you have the model downloaded as per the setup instructions
above, and update `MODEL_PATH` below to match.

```python
MODEL_PATH = "/tmp/gemma-2b-it-q4_k_m.gguf"

# Declares the on-device model by URI; the config points at the local model file.
my_on_device_model = genc.python.interop.langchain.CustomModel(
    uri="/device/gemma",
    config={"model_path": MODEL_PATH,
            "num_threads": 4,
            "max_tokens": 64})
```

Now, just like you did with the Cloud model, you can construct the chain with
your on-device model, generate the IR, and run it in your Jupyter notebook to
test that it works:

```python
my_chain = create_my_chain(my_on_device_model)
my_portable_ir = genc.python.interop.langchain.create_computation(my_chain)
my_runner = genc.python.runtime.Runner(my_portable_ir, my_runtime)
print(my_runner("scuba diving"))
```
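The `num_threads` value of 4 in the on-device model declaration is a reasonable default, but a container may have fewer CPUs available. As an optional sketch, you could build the config with a helper that caps the thread count at the number of CPUs this process may run on. It uses `os.sched_getaffinity` on Linux and falls back to `os.cpu_count()`; neither reflects a CPU quota set with docker's `--cpus` flag. `make_device_config` and `usable_cpus` are hypothetical helpers that use only the three config keys from the declaration. We're assuming, not relying on any documented GenC behavior, that running more threads than CPUs won't speed up inference.

```python
import os


def usable_cpus() -> int:
    """Number of CPUs this process may run on (Linux affinity mask if available)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # cpu_count() can return None.


def make_device_config(model_path: str, max_tokens: int = 64, max_threads: int = 4) -> dict:
    """Builds the on-device config dict with the thread count capped at usable CPUs."""
    return {
        "model_path": model_path,
        "num_threads": max(1, min(max_threads, usable_cpus())),
        "max_tokens": max_tokens,
    }


print(make_device_config("/tmp/gemma-2b-it-q4_k_m.gguf"))
```

You would then pass `config=make_device_config(MODEL_PATH)` when declaring `my_on_device_model`.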

## Forming a model cascade

Now that you've confirmed that each of your models is working, you can combine
them into a model cascade. For simplicity's sake, we'll start with a cascade
that first tries the Cloud model and falls back to the on-device model if the
Cloud model is unavailable. You can declare it as shown below.

```python
my_model_cascade = genc.python.interop.langchain.ModelCascade(models=[
    my_cloud_model, my_on_device_model])
```
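To make the fallback behavior concrete, here is a plain-Python sketch of a first-success cascade. It assumes that "unavailable" shows up as an exception raised by the model call. GenC's `ModelCascade` may detect unavailability differently, so treat this as an illustration of the idea, not of its implementation. `make_cascade`, `offline_cloud`, and `local_device` are hypothetical stand-ins.

```python
from typing import Callable, Sequence

ModelFn = Callable[[str], str]


def make_cascade(models: Sequence[ModelFn]) -> ModelFn:
    """Returns a model that tries each model in order and returns the first success."""
    if not models:
        raise ValueError("A cascade needs at least one model")

    def cascade(prompt: str) -> str:
        errors = []
        for model in models:
            try:
                return model(prompt)
            except Exception as e:  # Treat any failure as "unavailable" and move on.
                errors.append(e)
        raise RuntimeError(f"All {len(models)} models failed: {errors!r}")

    return cascade


def offline_cloud(prompt: str) -> str:
    raise ConnectionError("cloud unreachable")


def local_device(prompt: str) -> str:
    return f"device answer to: {prompt}"


cloud_first = make_cascade([offline_cloud, local_device])
print(cloud_first("Tell me about scuba diving?"))  # → device answer to: Tell me about scuba diving?
```

Reversing the list, as in `make_cascade([local_device, offline_cloud])`, gives a device-first cascade. That is the kind of reordering discussed later in this section.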

With that, you can pass the model cascade as a parameter to your chain,
reconstruct the IR with your logic powered by the cascade, and test that the
setup still works, as follows:

```python
my_chain = create_my_chain(my_model_cascade)
my_portable_ir = genc.python.interop.langchain.create_computation(my_chain)
my_runner = genc.python.runtime.Runner(my_portable_ir, my_runtime)
print(my_runner("scuba diving"))
```

You could've chosen to order models in the cascade differently to achieve a
different behavior. Everything is customizable! In the next tutorial in the
sequence, we'll show you how you can construct an even more powerful routing
mechanism, where routing is based on query sensitivity. For now, this simple
cascade will suffice.

## Saving the IR to a file for deployment

Now that you have the full setup with a model cascade, and tested the IR
locally, it's time to deploy it on the Java client as promised, and test it
there. First, let's save the IR into a file on the local filesystem.

```python
with open("/tmp/genc_demo.pb", "wb") as f:
  f.write(my_portable_ir.SerializeToString())
```
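Before shipping the file, it can be worth checking that it loads back into an identical IR. The helpers below are a sketch that assumes only that the IR object behaves like a Python protocol buffer message: it has `SerializeToString()`, and its class has a `FromString()` classmethod, as generated protobuf classes do. `save_ir`, `load_ir`, and `roundtrips` are hypothetical names.

```python
import os
import tempfile


def save_ir(ir, path: str) -> None:
    """Writes a protobuf-style message to `path` in its binary serialized form."""
    with open(path, "wb") as f:
        f.write(ir.SerializeToString())


def load_ir(message_cls, path: str):
    """Parses the file at `path` into a new instance of `message_cls`."""
    with open(path, "rb") as f:
        return message_cls.FromString(f.read())


def roundtrips(ir) -> bool:
    """Saves `ir` to a scratch file, loads it back, and compares the two."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "ir.pb")
        save_ir(ir, path)
        return load_ir(type(ir), path) == ir
```

Under that assumption, we'd expect `roundtrips(my_portable_ir)` to return `True`. `load_ir(type(my_portable_ir), "/tmp/genc_demo.pb") == my_portable_ir` checks the file you just wrote.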

## Deployment in a Java client in the target environment

The portable IR you saved to a file above can be deployed on a variety of
platforms. For simplicity's sake, here we'll just use a simple Java client
process that you can run from the command line, either on the same machine or
on a different one. In the near future, we'll follow up with separate tutorials
for loading the IR on specific mobile platforms to illustrate the portability
benefits. Aside from the initial setup, which may vary per platform, the
overall process looks roughly the same.

If you look at the code in [GencDemo.java](https://github.com/google/genc/tree/master/genc/java/src/java/org/genc/examples/GencDemo.java),
you will see a sequence similar to what you saw above in Python, but expressed
in Java APIs. The steps are always the same. First, you provision the portable
IR (here loaded from a local file whose path is given as the first command-line
argument). Next, you construct an executor that will run the IR (here using the
default we supplied for use with the tutorials). Finally, you create a Java
*runner* object that combines the two, and call it with the prompt (passed as
the second command-line argument to the Java client).

```java
Value ir = Constructor.readComputationFromFile(args[0]);
DefaultExecutor executor = new DefaultExecutor();
Runner runner = Runner.create(ir, executor.getExecutorHandle());
String result = runner.call(args[1]);
System.out.println(result);
```

If you're already running this notebook inside a docker container and followed
the steps in
[SETUP.md](https://github.com/google/genc/tree/master/SETUP.md),
then you already have a build environment set up. You can build and run the
Java client as follows:

```
bazel run genc/java/src/java/org/genc/examples:genc_demo -- /tmp/genc_demo.pb "scuba diving"
```
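If you'd like to launch the Java client from the notebook itself, a minimal sketch is to build the command as an argument list and hand it to `subprocess.run`. Passing a list delivers the multi-word prompt as a single argument without any shell quoting. The `--` separator tells bazel that the remaining arguments belong to the program rather than to bazel. This assumes bazel is on the `PATH` inside the container and that the notebook's working directory is the root of the GenC repository. `demo_command` is a hypothetical helper.

```python
import shlex

DEMO_TARGET = "genc/java/src/java/org/genc/examples:genc_demo"


def demo_command(ir_path: str, prompt: str) -> list:
    """Builds the `bazel run` argv; `--` separates bazel's flags from the demo's args."""
    return ["bazel", "run", DEMO_TARGET, "--", ir_path, prompt]


cmd = demo_command("/tmp/genc_demo.pb", "scuba diving")
# Shows the equivalent shell command, with the prompt quoted as one argument.
print(shlex.join(cmd))
# To actually launch it (requires bazel and the repo root as working directory):
#   import subprocess
#   subprocess.run(cmd, check=True)
```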

This concludes the first tutorial.