<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/sangalo20/Serverless-agents-cloudrun/blob/main/serverless_agents.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| | |
|-|-|
| Author(s) | [Sangalo Mwenyinyo](https://github.com/sangalo20) |

# Serverless Agents on Cloud Run

Welcome to **Serverless Agents on Cloud Run**! In this session, we go beyond simple chatbots and build a production-ready, event-driven **Multi-Agent System**.

## Architecture

**The Challenge:**
Building a GenAI application is easy, but building one that is **scalable**, **up-to-date**, and **responsive** is hard. How do we ensure our agent knows the latest information without constantly retraining it? How do we prevent long document processing times from blocking user chat?

**The Solution: Micro-Agents**
We will solve this by decomposing our system into two specialized, independent micro-services. This "Micro-Agent" pattern allows us to separate the heavy lifting of data ingestion from the real-time demands of user interaction.

We will build:
1.  **The Librarian (Ingestion Agent)**: An event-driven background service. Its sole job is to listen for new information (PDFs uploaded to Cloud Storage), read it, understand it using Gemini, and file it away in our knowledge base (Firestore). It scales to zero when not in use and scales out automatically when many files arrive at once.
2.  **The Guide (Interface Agent)**: A user-facing chat service. It connects to the knowledge base created by the Librarian to answer user questions accurately. It maintains conversation history and provides a helpful, human-like interface.
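
To make the split concrete, here is a minimal in-memory sketch of the two roles. It is an illustration only, not the code in the repository: `KnowledgeBase`, `fake_summarize`, and the keyword matching are hypothetical stand-ins for Firestore, the Gemini call, and real retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """In-memory stand-in (assumption) for the Firestore collections."""
    summaries: dict[str, str] = field(default_factory=dict)
    history: dict[str, list[tuple[str, str]]] = field(default_factory=dict)


def fake_summarize(text: str) -> str:
    """Placeholder for the Gemini call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def _words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation removed."""
    return {w.strip("?.,!").lower() for w in text.split()}


class Librarian:
    """Ingestion agent: turns an uploaded document into a stored summary."""

    def __init__(self, kb: KnowledgeBase):
        self.kb = kb

    def on_upload(self, filename: str, text: str) -> None:
        self.kb.summaries[filename] = fake_summarize(text)


class Guide:
    """Interface agent: answers from stored summaries and records each turn."""

    def __init__(self, kb: KnowledgeBase):
        self.kb = kb

    def chat(self, session_id: str, query: str) -> str:
        # Naive keyword overlap; a real Guide would prompt the model instead.
        hits = [s for s in self.kb.summaries.values() if _words(query) & _words(s)]
        answer = " ".join(hits) if hits else "I don't know yet."
        self.kb.history.setdefault(session_id, []).append((query, answer))
        return answer


kb = KnowledgeBase()
Librarian(kb).on_upload("schedule.pdf", "The keynote starts at 9am. Lunch is at noon.")
guide = Guide(kb)
print(guide.chat("s1", "When is the keynote?"))
```

The two classes share nothing except the knowledge base, which is the property that lets the real services be deployed and scaled independently.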

## Technologies Used

> **[Google Cloud Run](https://cloud.google.com/run)**
> Cloud Run is a fully managed compute platform that lets you run containers directly on top of Google's scalable infrastructure. It abstracts away infrastructure management, allowing you to focus on building your agents. It scales down to zero when idle and back up under load, so you pay only while your code is running. In this workshop, we use Cloud Run to host our agent services so they can absorb bursts of traffic without manual intervention.

> **[Vertex AI & Gemini 2.5 Flash](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-models)**
> Gemini 2.5 Flash is a lightweight, low-latency multimodal model from Google designed for high-frequency tasks. It favors speed and cost-efficiency while keeping strong reasoning capabilities. We use Vertex AI to access this model, enabling our agents to process documents and generate natural language responses within Google Cloud's security and access controls.

> **[Eventarc](https://cloud.google.com/eventarc)**
> Eventarc allows you to build event-driven architectures by routing events from Google Cloud sources (like Cloud Storage) to your services. It handles the complexity of event ingestion, delivery, and security. We use Eventarc to trigger our "Librarian" agent instantly whenever a new file is uploaded, creating a reactive and real-time ingestion pipeline.

> **[Firestore](https://cloud.google.com/firestore)**
> Firestore is a flexible, scalable NoSQL cloud database for storing and syncing data. It keeps your data in sync across client apps through realtime listeners and offers offline support. We use Firestore as the "Brain" of our system, storing both the ingested knowledge (summaries) and the conversation history (short-term memory) for our agents.

Let's build it!

## 1. Setup & Authentication

**Why do we need this?**
To interact with Google Cloud resources (like Cloud Run, Firestore, etc.) from this notebook, we need to prove who we are.

**What is ADC (Application Default Credentials)?**
ADC is a strategy used by Google Cloud libraries to automatically find your credentials. Instead of hardcoding API keys (which is insecure), ADC looks for credentials in a known location on your system. By running `gcloud auth login --update-adc`, we place your personal credentials in that location. This way, when our Python code runs `storage.Client()` or `vertexai.init()`, it automatically finds and uses your identity.
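
As a rough illustration of that lookup, the sketch below mimics the order in which ADC checks for credentials. It is a simplified model, not the library's actual code: `find_adc_source` is a hypothetical helper, the file path shown is the default on Linux and macOS, and environments that relocate the gcloud configuration directory (for example through `CLOUDSDK_CONFIG`) will use a different path.

```python
import os
from pathlib import Path


def find_adc_source(env: dict, home: Path) -> str:
    """Return which credential source a simplified ADC lookup would pick."""
    # 1. An explicit credentials file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"env var file: {explicit}"
    # 2. The user credentials file written by gcloud (default Linux/macOS path).
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    if well_known.is_file():
        return f"gcloud user credentials: {well_known}"
    # 3. Otherwise, the service account attached to the resource, obtained
    #    from the metadata server (what a Cloud Run service uses at runtime).
    return "attached service account (metadata server)"


print(find_adc_source(dict(os.environ), Path.home()))
```

The third branch is why the deployed agents need no key files: on Cloud Run the same client code picks up the service's own identity.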

If you don't have a project yet:

1. [Create a project](https://console.cloud.google.com/projectcreate) in the Google Cloud Console.
2. Copy your `Project ID` from the project's [Settings page](https://console.cloud.google.com/iam-admin/settings).

In [None]:
import os

PROJECT_ID = "[your-project-id]"  # @param {type:"string", isTemplate: true}
REGION = "us-central1"  # @param {type:"string", isTemplate: true}

if PROJECT_ID == "[your-project-id]" or not PROJECT_ID:
    print("Please specify your project id in PROJECT_ID variable.")
    raise KeyboardInterrupt

!gcloud auth print-identity-token -q &> /dev/null || gcloud auth login --project="{PROJECT_ID}" --update-adc --quiet

!gcloud config set project {PROJECT_ID}
os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID
os.environ["GOOGLE_CLOUD_REGION"] = REGION

## 1.1 Clone Repository

**Why are we doing this?**
Google Colab is a temporary virtual machine. It starts empty. The code for our agents (`main.py`, `Dockerfile`) lives in GitHub. We need to `git clone` (download) that code into this machine so we can build and deploy it.

In [None]:
!git clone https://github.com/sangalo20/Serverless-agents-cloudrun.git
%cd Serverless-agents-cloudrun

## 1.2 Define Dockerfiles

**Infrastructure as Code**
To give us full control, we will define our Dockerfiles right here in the notebook. This allows us to see exactly how our container is built and make changes if needed.

In [None]:
%%writefile librarian/Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Run with Gunicorn
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "--workers", "1", "--threads", "8", "--timeout", "0", "main:app"]

In [None]:
%%writefile guide/Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Run with Gunicorn
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "--workers", "1", "--threads", "8", "--timeout", "0", "main:app"]

## 2. Enable APIs

**What are these?**
Most Google Cloud APIs are disabled by default in a new project. We need to turn on the specific services we plan to use:
*   `run.googleapis.com`: **Cloud Run** (to run our containers).
*   `eventarc.googleapis.com`: **Eventarc** (to trigger the Librarian when a file is uploaded).
*   `aiplatform.googleapis.com`: **Vertex AI** (to use the Gemini model).
*   `firestore.googleapis.com`: **Firestore** (our database).
*   `cloudbuild.googleapis.com`: **Cloud Build** (to build our Docker containers).
*   `storage.googleapis.com`: **Cloud Storage** (to store the PDF files).
*   `artifactregistry.googleapis.com`: **Artifact Registry** (to store our Docker images).
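
If you want to check what is already enabled before running the next cell, a small helper like the one below can compare the list above with the output of `gcloud services list --enabled --format="value(config.name)"`. This is a sketch: `missing_apis` and `sample_output` are hypothetical names, and the sample text stands in for real command output.

```python
REQUIRED_APIS = [
    "run.googleapis.com",
    "eventarc.googleapis.com",
    "aiplatform.googleapis.com",
    "firestore.googleapis.com",
    "cloudbuild.googleapis.com",
    "storage.googleapis.com",
    "artifactregistry.googleapis.com",
]


def missing_apis(required: list[str], enabled_output: str) -> list[str]:
    """Return the required APIs that do not appear in the enabled list."""
    # One service name per line; ignore blank lines and stray whitespace.
    enabled = {line.strip() for line in enabled_output.splitlines() if line.strip()}
    return [api for api in required if api not in enabled]


# Stand-in for the captured command output (assumption for illustration).
sample_output = "run.googleapis.com\nstorage.googleapis.com\n"
print(missing_apis(REQUIRED_APIS, sample_output))
```

In the notebook you could capture the real output with `enabled = !gcloud services list ...` and pass `"\n".join(enabled)` to the helper.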

In [None]:
!gcloud services enable run.googleapis.com eventarc.googleapis.com aiplatform.googleapis.com firestore.googleapis.com cloudbuild.googleapis.com storage.googleapis.com artifactregistry.googleapis.com

## 3. Create Infrastructure

**The Plan:**
1.  **Cloud Storage Bucket**: We need a place to upload our conference schedules (PDFs). This bucket will act as the "Inbox" for our Librarian agent.
2.  **Permissions**: We grant the service account used for builds and at runtime permission to write logs, push images, call Vertex AI, and read and write Firestore (done in section 3.1).
3.  **Firestore Database**: We need a fast, serverless database to store the *summarized knowledge* and the *chat history*. We use Firestore in "Native" mode.
4.  **Artifact Registry**: We need a repository to store our Docker images.
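
The bucket name in the next cell is derived from the project ID, so it inherits whatever characters the ID contains. The sketch below applies a simplified subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). `bucket_name_for` is a hypothetical helper, and the check does not cover every rule, such as reserved prefixes.

```python
import re

# Simplified check: first and last characters alphanumeric, 3-63 chars total.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def bucket_name_for(project_id: str, suffix: str = "knowledge-base") -> str:
    """Build '<project>-<suffix>' and reject names that fail the simplified check."""
    name = f"{project_id}-{suffix}"
    if not _BUCKET_RE.match(name):
        raise ValueError(f"{name!r} is not a valid bucket name under the simplified check")
    return name


print(bucket_name_for("my-project-123"))
```

A domain-scoped project ID such as `example.com:my-project` would be rejected here because of the colon, which is a case where you would need to pick the bucket name by hand.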

In [None]:
BUCKET_NAME = f"{PROJECT_ID}-knowledge-base"
!gsutil mb -l {REGION} gs://{BUCKET_NAME}
print(f"Created bucket: {BUCKET_NAME}")

# Create the default Firestore database in Native mode (the command reports an error if a database already exists)
!gcloud firestore databases create --location={REGION} --type=firestore-native

# Create Artifact Registry Repository
!gcloud artifacts repositories create containers --repository-format=docker --location={REGION} --description="Docker repository"

## 3.1 Setup Permissions

**Granting Access**
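In this workshop, the Compute Engine default service account is used both for Cloud Build and as the runtime identity of our Cloud Run services. We grant it the roles it needs to write logs, push images to Artifact Registry, access Cloud Storage, call Vertex AI, and read and write Firestore.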
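
The grants follow one pattern: the same member, a different role each time. As a sketch of that pattern, the helper below generates the command strings from a list of roles. `binding_commands` and the example project values are hypothetical; the roles are the ones used in this section.

```python
import shlex

RUNTIME_ROLES = [
    "roles/logging.logWriter",
    "roles/artifactregistry.writer",
    "roles/storage.admin",
    "roles/aiplatform.user",
    "roles/datastore.user",
]


def binding_commands(project_id: str, service_account: str, roles: list[str]) -> list[str]:
    """Return one 'gcloud projects add-iam-policy-binding' command per role."""
    member = f"serviceAccount:{service_account}"
    return [
        # shlex.join quotes any argument that needs it for a POSIX shell.
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", f"--role={role}",
        ])
        for role in roles
    ]


# Example values (assumptions for illustration only).
for cmd in binding_commands(
    "demo-project", "123-compute@developer.gserviceaccount.com", RUNTIME_ROLES
):
    print(cmd)
```

Keeping the roles in one list also makes it easier to review them later and narrow broad ones such as `roles/storage.admin`.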

In [None]:
# Get Project Number and Service Account
PROJECT_NUMBER = !gcloud projects describe {PROJECT_ID} --format='value(projectNumber)'
PROJECT_NUMBER = PROJECT_NUMBER[0]
SERVICE_ACCOUNT = f"{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# Grant permissions for Cloud Build and Runtime (Logging, Artifact Registry, Vertex AI, Firestore)
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/logging.logWriter
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/artifactregistry.writer
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/storage.admin
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/aiplatform.user
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/datastore.user

# Grant Pub/Sub Publisher to Cloud Storage Service Agent (required for Eventarc)
GCS_SERVICE_AGENT = f"service-{PROJECT_NUMBER}@gs-project-accounts.iam.gserviceaccount.com"
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{GCS_SERVICE_AGENT} --role=roles/pubsub.publisher

## 4. Build "The Librarian" Service

**Step 1: Build Container**
We use `gcloud builds submit` to package our Python code (`librarian/main.py`) into a Docker container image. This image is stored in Artifact Registry and is ready to be deployed.
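
The `--tag` value follows the Artifact Registry path layout: regional host, project, repository, then image name. A small sketch of how that string is assembled is shown below. `image_uri` is a hypothetical helper, and the explicit `tag` parameter is an addition for illustration; the commands in this notebook omit the tag and therefore use `latest`.

```python
def image_uri(region: str, project_id: str, repository: str, image: str, tag: str = "latest") -> str:
    """Build an Artifact Registry Docker image path."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Example values (assumptions); 'containers' is the repository created earlier.
print(image_uri("us-central1", "demo-project", "containers", "librarian"))
```

Using a distinct tag per build, for example a short commit hash, makes it possible to roll a service back to a known image.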

In [None]:
SERVICE_NAME_LIBRARIAN = "librarian"
!gcloud builds submit --tag {REGION}-docker.pkg.dev/{PROJECT_ID}/containers/{SERVICE_NAME_LIBRARIAN} librarian/

## 4.1 Deploy "The Librarian" Service

**Step 2: Deploy to Cloud Run**
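Now we take the image we just built and deploy it to Cloud Run. We pass `--allow-unauthenticated` to keep the workshop simple, but note that it makes the service publicly invocable. Eventarc calls the service using the trigger's service account, which is granted `roles/run.invoker` in section 6, so a real deployment could likely omit this flag for the Librarian.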

In [None]:
!gcloud run deploy {SERVICE_NAME_LIBRARIAN} --image {REGION}-docker.pkg.dev/{PROJECT_ID}/containers/{SERVICE_NAME_LIBRARIAN} --region {REGION} --allow-unauthenticated

## 5. Build "The Guide" Service

**Step 1: Build Container**
Similar to the Librarian, we first build the container image for the Guide service.

In [None]:
SERVICE_NAME_GUIDE = "guide"
!gcloud builds submit --tag {REGION}-docker.pkg.dev/{PROJECT_ID}/containers/{SERVICE_NAME_GUIDE} guide/

## 5.1 Deploy "The Guide" Service

**Step 2: Deploy to Cloud Run**
We deploy the Guide service. This service will host the chat endpoint.

In [None]:
!gcloud run deploy {SERVICE_NAME_GUIDE} --image {REGION}-docker.pkg.dev/{PROJECT_ID}/containers/{SERVICE_NAME_GUIDE} --region {REGION} --allow-unauthenticated

## 6. Wire it up with Eventarc

**The Magic Glue**
Right now, the Librarian service is running, but it doesn't know when a file is uploaded. We need **Eventarc** to bridge the gap.

We create a **Trigger** that says:
*   **IF** a file is `finalized` (uploaded) ...
*   **IN** the specific bucket `{BUCKET_NAME}` ...
*   **THEN** send a POST request to the `{SERVICE_NAME_LIBRARIAN}` service.

*Note: We also grant the necessary IAM permissions so Eventarc is allowed to call our Cloud Run service.*
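
For reference, the POST that Eventarc sends is a CloudEvent: the event metadata arrives in `ce-*` HTTP headers and the body is JSON describing the storage object. The sketch below shows one way a handler could extract the bucket and object name. It is not the Librarian's actual code, which lives in the repository; `parse_storage_event` and the sample values are hypothetical.

```python
import json

FINALIZED = "google.cloud.storage.object.v1.finalized"


def parse_storage_event(headers: dict, body: bytes):
    """Return bucket, name, and content type for a finalized event, else None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    if lowered.get("ce-type") != FINALIZED:
        return None
    data = json.loads(body)
    return {
        "bucket": data["bucket"],
        "name": data["name"],
        "content_type": data.get("contentType", ""),
    }


# Sample request parts (made-up values for illustration).
sample_headers = {"Ce-Type": FINALIZED, "Ce-Subject": "objects/schedule.pdf"}
sample_body = json.dumps(
    {"bucket": "demo-knowledge-base", "name": "schedule.pdf", "contentType": "application/pdf"}
).encode("utf-8")
print(parse_storage_event(sample_headers, sample_body))
```

Returning `None` for other event types lets the handler acknowledge and ignore anything it was not written for.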

In [None]:
# Grant permission to the Compute Engine service account (default for Eventarc)
PROJECT_NUMBER = !gcloud projects describe {PROJECT_ID} --format='value(projectNumber)'
PROJECT_NUMBER = PROJECT_NUMBER[0]
SERVICE_ACCOUNT = f"{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/eventarc.eventReceiver
!gcloud projects add-iam-policy-binding {PROJECT_ID} --member=serviceAccount:{SERVICE_ACCOUNT} --role=roles/run.invoker

# Create the trigger
!gcloud eventarc triggers create librarian-trigger \
  --location={REGION} \
  --destination-run-service={SERVICE_NAME_LIBRARIAN} \
  --destination-run-region={REGION} \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket={BUCKET_NAME}" \
  --service-account={SERVICE_ACCOUNT}

## 7. Test it!

**Interactive UI**
Let's test our agents with a real UI! Use the buttons below to upload any PDF or text file, and then chat with your agent.

In [None]:
import ipywidgets as widgets
from IPython.display import display
from google.colab import files
import requests
import os

# --- Configuration ---
# Get Guide URL dynamically
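GUIDE_URL = !gcloud run services describe {SERVICE_NAME_GUIDE} --region {REGION} --format='value(status.url)'
# The capture is a list of output lines; on success the first line is the https URL.
GUIDE_URL = GUIDE_URL[0] if GUIDE_URL and GUIDE_URL[0].startswith("https://") else ""
if not GUIDE_URL:
    print("Error: Could not find Guide service URL. Make sure it is deployed.")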

SESSION_ID = "interactive-session"

# --- UI Elements ---

# 1. Upload Section
upload_btn = widgets.Button(description="Upload Document", button_style='info', icon='upload')
upload_out = widgets.Output()

def on_upload_clicked(b):
    with upload_out:
        upload_out.clear_output()
        print("Select a file to upload...")
        uploaded = files.upload()
        for filename in uploaded.keys():
            print(f"Uploading {filename} to {BUCKET_NAME}...")
            !gsutil cp "{filename}" "gs://{BUCKET_NAME}/{filename}"
            print(f"✅ {filename} uploaded! The Librarian is processing it (wait ~10-20s)...")

upload_btn.on_click(on_upload_clicked)

# 2. Chat Section
chat_history = widgets.Output(layout={'border': '1px solid #ccc', 'height': '300px', 'overflow_y': 'scroll'})
user_input = widgets.Text(placeholder='Ask a question about your document...', layout={'width': '70%'})
send_btn = widgets.Button(description="Send", button_style='primary')

def on_send_clicked(b):
    question = user_input.value
    if not question: return
    
    # Display User Message
    with chat_history:
        print(f"You: {question}")
    
    user_input.value = '' # Clear input
    
    # Call API
    try:
        response = requests.post(f"{GUIDE_URL}/chat", json={"session_id": SESSION_ID, "query": question}, timeout=60)
        if response.status_code == 200:
            data = response.json()
            answer = data.get('answer', 'No answer received.')
            with chat_history:
                print(f"Agent: {answer}\n")
        else:
            with chat_history:
                print(f"Error: {response.status_code} - {response.text}\n")
    except Exception as e:
        with chat_history:
            print(f"Connection Error: {e}\n")

send_btn.on_click(on_send_clicked)
user_input.on_submit(on_send_clicked)

# Layout
print(f"Connected to Guide Agent at: {GUIDE_URL}")
display(widgets.VBox([
    widgets.HTML("<h3>1. Knowledge Base</h3>"),
    upload_btn,
    upload_out,
    widgets.HTML("<hr><h3>2. Chat Interface</h3>"),
    chat_history,
    widgets.HBox([user_input, send_btn])
]))
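
Outside Colab, the same chat call can be made without widgets. The sketch below reuses the request shape from the cell above (`session_id` and `query` in, `answer` out) with only the standard library. `ask_guide` and `http_post_json` are hypothetical helpers, and the injectable `post` argument exists so the function can be exercised without a deployed service.

```python
import json
import urllib.request


def http_post_json(url: str, payload: dict, timeout: float = 30.0):
    """Default transport: POST a JSON body and return (status, text)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


def ask_guide(guide_url: str, session_id: str, query: str, post=http_post_json) -> str:
    """Send one question to the Guide's /chat endpoint and return the answer."""
    status, body = post(
        f"{guide_url.rstrip('/')}/chat",
        {"session_id": session_id, "query": query},
    )
    if status != 200:
        raise RuntimeError(f"Guide returned {status}: {body}")
    return json.loads(body).get("answer", "No answer received.")
```

With a deployed service, `ask_guide(GUIDE_URL, "cli-session", "When is the keynote?")` would send the same request as the Send button.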

## 8. Cleanup

**Clean up your resources**
To avoid incurring charges, delete the resources used in this workshop.

In [None]:
print("Starting cleanup...\n")

# 1. Delete Eventarc Trigger
print("Deleting Eventarc trigger: librarian-trigger...")
!gcloud eventarc triggers delete librarian-trigger --location={REGION} --quiet
print("✅ Eventarc trigger deleted.\n")

# 2. Delete Cloud Run Services
print(f"Deleting Cloud Run service: {SERVICE_NAME_LIBRARIAN}...")
!gcloud run services delete {SERVICE_NAME_LIBRARIAN} --region {REGION} --quiet
print(f"✅ Service {SERVICE_NAME_LIBRARIAN} deleted.\n")

print(f"Deleting Cloud Run service: {SERVICE_NAME_GUIDE}...")
!gcloud run services delete {SERVICE_NAME_GUIDE} --region {REGION} --quiet
print(f"✅ Service {SERVICE_NAME_GUIDE} deleted.\n")

# 3. Delete Cloud Storage Bucket
print(f"Deleting Bucket: {BUCKET_NAME}...")
!gsutil rm -r gs://{BUCKET_NAME}
print(f"✅ Bucket {BUCKET_NAME} deleted.\n")

# 4. Delete Artifact Registry Repository
print("Deleting Artifact Registry repository: containers...")
!gcloud artifacts repositories delete containers --location={REGION} --quiet
print("✅ Artifact Registry repository deleted.\n")

# 5. Delete Firestore Database
print("Deleting Firestore database: (default)...")
!gcloud firestore databases delete --database="(default)" --quiet
print("✅ Firestore database deleted.\n")

print("Cleanup complete!")