# Tutorial 6: Confidential Computing

In this tutorial, we set up a GenC-based service to run in a
[Trusted Execution Environment (TEE)](https://en.wikipedia.org/wiki/Trusted_execution_environment)
using the
[Confidential Computing](https://cloud.google.com/security/products/confidential-computing)
capabilities on the
[Google Cloud Platform (GCP)](https://cloud.google.com/)
to offload your GenAI workloads from a remote client to securely run in the
Cloud, with formal cryptographic assurances that your data and results will
remain confidential.

## Overview

The overall architecture of the example we're going to build is shown in the
diagram below.

![GenC in a TEE](genc_tee.png)

Let's go through this diagram step-by-step:

1.   We use the
     [Confidential Space](https://cloud.google.com/docs/security/confidential-space)
     service on GCP to set up a
     [Confidential VM](https://cloud.google.com/confidential-computing/confidential-vm/docs/confidential-vm-overview),
     a virtual machine running on specialized hardware that supports trusted
     execution, with a Confidential Space system image that provides
     additional services such as generating
     [attestation](https://cloud.google.com/confidential-computing/confidential-vm/docs/attestation)
     reports.

2.   Within the
     [Trusted Execution Environment (TEE)](https://en.wikipedia.org/wiki/Trusted_execution_environment)
     created in that VM, we deploy a GenC service
     [container](https://docs.docker.com/reference/cli/docker/container/)
     based on our
     [Dockerfile](../../cc/examples/confidential_computing/Dockerfile).
     The container is set up similarly to the one you used for development,
     with an Ubuntu image and a copy of GenC from GitHub, but unlike the one used
     for development, instead of an interactive prompt, it's set to launch a
     [small C++ binary](../../cc/examples/worker/server.cc) that listens for
     incoming HTTP connections on port 80, and hosts a GenC runtime configured
     with the
     [Gemma 2B model](https://huggingface.co/google/gemma-2b),
     the same as what we used in other tutorials.

3.   A remote client (here, this Colab notebook process, or an app on a mobile
     device when we later deploy the code on a phone) submits IR to its local
     GenC runtime that may contain a chunk of processing to be performed in the
     Cloud, in a confidential manner. The GenC runtime on the client initiates
     an HTTP connection to the server mentioned above, and establishes a
     communication channel with the GenC runtime that runs privately in a TEE.
     In this tutorial, we use a plain HTTP connection, but you can change this
     to use HTTPS (see [this example](../../cc/examples/worker/README.md) for
     details on how to do this).

4.   The first request the remote client issues to the server in a TEE is to
     fetch the
     [attestation](https://cloud.google.com/confidential-computing/confidential-vm/docs/attestation)
     report, which includes a public key that will be used to establish an
     encrypted communication channel and information about the software that
     runs on the server (including the SHA256 digest of the deployed image),
     signed by the trusted computing platform. The GenC binary on the server
     uses crypto libraries from
     [Project Oak](https://github.com/project-oak/oak)
     to generate the keys, and the attestation API provided by Confidential
     Space via a local UNIX socket in the TEE to obtain the attestation report
     in the form of a [JSON Web Token (JWT)](https://jwt.io/introduction) that
     contains a number of
     [claims](https://cloud.google.com/confidential-computing/confidential-space/docs/reference/token-claims),
     including those mentioned above (the signed copy of the public key to use
     for encrypted communication, the SHA256 digest of the docker container
     image, etc.). This is relayed back to the client.

5.   The client verifies the validity of the attestation report to confirm that
     it's interacting with a service that runs the code expected by the client.
     In particular, the client verifies that the image digest matches the one
     that was obtained by the developer when they built and pushed the GenC
     image to run on GCP (the developer may distribute the image digest along
     with their app, or as in this example, the developer and the user running
     the app may be the same person; many other arrangements are possible, and
     will be discussed in future tutorials).

6.   Upon confirming that it's talking to a server built with the OSS code from
     the GenC repo distributed as a part of this tutorial, the client proceeds
     to establish an encrypted communication channel, through which it submits
     a remote execution request (in the form of GenC IR representing a remote
     computation, and serialized arguments to feed as input), and retrieves the
     results. The client talks to the server using an encrypted gRPC
     incarnation of the same general-purpose
     [`Executor` interface](https://github.com/google/genc/blob/master/genc/docs/runtime.md)
     that we use in all other examples. The interface supports stateful,
     multi-round communication, although in this particular example, we only
     issue one-off requests, and the server is not configured with any
     persistent memory that would enable it to host user data or persist state.
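
The digest check in step 5 can be illustrated with a minimal sketch. It assumes
the attestation token is a standard three-part JWT and that the image digest
appears under the `submods.container.image_digest` claim, as described in the
Confidential Space token claims reference linked above. The helper names and
the sample token are hypothetical, and this is not the verification code that
GenC itself runs. The sketch deliberately skips signature verification, which
a real client must perform before trusting any claim.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decodes the payload of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # base64url payloads omit padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def image_digest_matches(token: str, expected_digest: str) -> bool:
    """Compares the digest claimed in the token with the expected one."""
    claims = decode_jwt_claims(token)
    # Assumed claim path: submods.container.image_digest.
    actual = claims.get("submods", {}).get("container", {}).get("image_digest")
    return actual is not None and actual == expected_digest


def _b64url(data: dict) -> str:
    """Encodes a dict as unpadded base64url JSON (test helper only)."""
    raw = json.dumps(data).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


# Hypothetical, unsigned token used only to exercise the helpers above.
expected = "sha256:" + "ab" * 32
sample_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"submods": {"container": {"image_digest": expected}}}),
    "",  # Empty signature part; a real token carries a signature here.
])
print(image_digest_matches(sample_token, expected))
```

Comparing digests only makes sense once the token's signature and issuer have
been validated; otherwise anyone could forge a token with the expected digest.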

Now that you understand the basic flow, let's try it out.

## Step 0. Configure your client environment.

As in all other tutorials, you need to connect this Python colab notebook to a
Jupyter runtime that can load GenC, so that you can execute the code shown
below. Make sure to follow the basic steps in
[SETUP.md](https://github.com/google/genc/tree/master/SETUP.md)
at the root of the repo, and open this tutorial from the page served by the
Jupyter process you launched as per the above, then execute the code below to
confirm that your client setup works correctly.

In [None]:
import genc
from genc.python import authoring
from genc.python import runtime

## Step 1. Create your own confidential computation service on GCP

Next, we'll need to set up a confidential service on GCP that will handle your
workloads.

Start by following the usual steps to
[create a new GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects),
and make sure that the firewall rules in your organization/project will allow
inbound HTTP traffic (this will often be inherited from the organization where
you're creating your project). Make sure that you have the
[`gcloud`](https://cloud.google.com/sdk/docs/install) command-line tool set up,
since we'll use it to automate the remaining steps.

Once that's done, enter the
[`cc/examples/confidential_computing`](../../cc/examples/confidential_computing)
directory, and edit the
[config_environment.sh](../../cc/examples/confidential_computing/config_environment.sh)
script in that directory with your GCP project id, preferred VM and service
account names, etc., and then execute the
[create_environment.sh](../../cc/examples/confidential_computing/create_environment.sh)
one-time setup script to populate the appropriate parts of your project (the
service account, repository for images, basic permissions, etc.). Feel free to
further tweak the setup if your organization has additional constraints. The
goal is to be able to create new Confidential VMs with images you will build
locally on your workstation and manually push to the Cloud image repo. You only
need to run this setup once.
```
cd cc/examples/confidential_computing
bash ./create_environment.sh
```

Once the above is done, you're ready to build and push the container image, as
follows:

```
bash ./build_image.sh
bash ./push_image.sh
```

After the above completes, the script will print text that looks like the following:
```
latest: digest: sha256:SOMETHING size: WHATEVER
```

Copy the `sha256:SOMETHING` value and paste it below, then run the colab cell to
retain the image digest. We're going to need this to set up the client to only
forward requests to the image we deployed.

In [None]:
image_digest = ""  # Paste here the `sha256:SOMETHING` part printed out by ./push_image.sh, then execute this cell.
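
If you prefer not to copy the digest by hand, the following minimal sketch
pulls it out of a line of push output. It assumes the output has the
`latest: digest: sha256:... size: ...` shape shown above and that the digest is
64 lowercase hexadecimal characters; `extract_image_digest` is a hypothetical
helper, not part of GenC or of the scripts in this directory.

```python
import re

# A sha256 image digest is "sha256:" followed by exactly 64 lowercase hex
# characters; the lookahead rejects longer hex strings.
_DIGEST_PATTERN = re.compile(r"sha256:[0-9a-f]{64}(?![0-9a-f])")


def extract_image_digest(push_output: str) -> str:
    """Returns the first well-formed sha256 digest found in the given text."""
    match = _DIGEST_PATTERN.search(push_output)
    if match is None:
        raise ValueError("no sha256 digest found in the push output")
    return match.group(0)


# Made-up example line that mimics the shape of the push output.
example_line = "latest: digest: sha256:" + "0" * 64 + " size: 1234"
print(extract_image_digest(example_line))
```

Failing loudly on a malformed digest is useful here, because a truncated or
mistyped value would only surface later as a failed attestation check.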

Once the image has been pushed (or sometimes, a minute later when things settle in),
you're ready to deploy a new confidential VM that runs this image. In this
example, we're going to use a `DEBUG` image for convenience, as that will enable
you to see debug printouts on the serial console, and even SSH into the VM to
debug it. For a real deployment (where you'll be sending actual confidential
data), you'll want to disable the `DEBUG` flag to harden the image (so that no
debug logs are printed, and no one can SSH into the VM, such that the only way
in is through the HTTP endpoint we set up). For illustration purposes, you can
leave it as is.

```
bash ./create_debug_vm.sh
```

Once the above completes, you will see a printout showing the IP address of the
server (under `EXTERNAL IP`). Copy it, enter below, and execute the following
cell, so that we can use it later to set up the client.

In [None]:
server_ip = "" # copy here the `EXTERNAL IP` address of the server printed out by ./create_debug_vm.sh
server_address = server_ip + ":80"

It often takes a few minutes for the Confidential VM to become ready to accept
incoming requests. Since you're running in `DEBUG` mode, you can click through
to the `VM instances` in your GCP dashboard, open your VM (it will be `worker`
unless you renamed it), and follow to `serial port 1 (console)` under `Logs` to
verify that the message `workload task started` appears at the end of the log
stream. Later, when you disable the `DEBUG` mode, you may just want to wait for
a minute or two.
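
As an alternative to waiting a fixed amount of time, you could poll the server
port until it accepts TCP connections. The sketch below is one way to do that
using only the Python standard library; `wait_for_port` and its default
timeouts are hypothetical choices, not part of GenC. An open port only shows
that something is listening, so in `DEBUG` mode the serial console log remains
the more reliable signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0,
                  poll_interval_s: float = 5.0) -> bool:
    """Returns True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=poll_interval_s):
                return True
        except OSError:
            # Connection refused or timed out; give up after the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval_s)


# Example call, using the `server_ip` you entered earlier:
#   wait_for_port(server_ip, 80)
```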

Once that's done, we're ready to launch the client side and submit a workload
for confidential execution.

## Step 2. Create and run a confidential computation

The first step looks just like in [Tutorial 1](tutorial_1_simple_cascade.ipynb).
We set up a simple chain that consists of a prompt and an LLM call. The
confidential runtime image we deployed in the Cloud is configured with the
[Gemma 2B model](https://huggingface.co/google/gemma-2b)
bound to `/device/gemma`, just like in all other tutorials, for your
convenience (so you can later play with redirecting code from other tutorials
to run in the TEE).

Let's go ahead and define the confidential workload:

In [None]:
# Computation to be executed in a secure enclave on Confidential Computing.
@genc.python.authoring.traced_computation
def private_workload(x):
  prompt_template = genc.python.authoring.prompt_template["Tell me about {topic}"]
  model_inference = genc.python.authoring.model_inference_with_config[{
      "model_uri": "/device/gemma",
      "model_config": {"model_path": "/gemma-2b-it-q4_k_m.gguf"}}]
  return model_inference(prompt_template(x))

Now, let's write another computation that will submit this workload to run in
the confidential runtime you deployed on Cloud. You achieve this by using the
`confidential_computation` operator. The operator is parameterized with the
computation being delegated, and parameters that describe the server where it
should be forwarded, including the address and image digest you saved earlier.

In [None]:
# Computation to run locally, including the delegation of `private_workload` to the TEE.
@genc.python.authoring.traced_computation
def run_private_workload_on_my_trusted_server(x):
  backend = {"server_address": server_address, "image_digest": image_digest}
  return genc.python.authoring.confidential_computation[private_workload, backend](x)

That's it! You can now run the above, just like you ran all other examples.

In [None]:
genc.python.runtime.set_default_executor()
result = run_private_workload_on_my_trusted_server("scuba diving")
print(result)

## Step 3. Deploy the client on a mobile device

Coming soon...