<a href="https://colab.research.google.com/github/hardrave/GCP_Guild_AI_in_GCP/blob/main/cloudrunui_gemma2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GCP Guild Association: Deploying Ollama as a Sidecar with Cloud Run and Open WebUI

This collaborative notebook was created for the GCP Guild Association to demonstrate how to deploy Ollama as a sidecar container with Cloud Run, using Open WebUI as the frontend ingress container. It walks step by step through setting up the Cloud Run environment, building the Ollama image, publishing the Ollama and Open WebUI images to Artifact Registry, configuring resources, and deploying both containers as a single multi-container Cloud Run service.

# Authenticate with Google Cloud

This cell authenticates your Google Cloud account with the `gcloud` command-line tool. The `--update-adc` flag also updates your Application Default Credentials (ADC), which the Vertex AI SDK uses, and `--quiet` disables interactive confirmation prompts.

In [None]:
!gcloud auth login --update-adc --quiet

# Initialize Vertex AI with Project and Location

This cell initializes the Vertex AI SDK with your Google Cloud project and location settings.

1. **Import necessary libraries:** Imports the `os` module for environment variables and the `vertexai` library for Vertex AI interactions.
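2. **Project ID:** Sets the `PROJECT_ID` variable. If you enter a value in the Colab form field, it uses that. If the field is empty or still holds the `[your-project-id]` placeholder, it reads the project ID from the `GOOGLE_CLOUD_PROJECT` environment variable.
3. **Location:** Sets the `LOCATION` variable from the `GOOGLE_CLOUD_REGION` environment variable, defaulting to "us-central1" if that variable is not set. Later cells hardcode `us-central1` in image paths and in the service manifest, so keep this default unless you update those as well.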
4. **Initialize Vertex AI:** Initializes the Vertex AI SDK with the specified project and location, enabling you to use Vertex AI services in your code.
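The fallback in steps 2 and 3 can be isolated into a small helper. This is a sketch, not part of the notebook's required code: `resolve_project_id` is a hypothetical name, and it treats both an empty form field and the `[your-project-id]` placeholder as "not provided". Passing the environment in as a mapping keeps it easy to test.

```python
import os
from typing import Mapping


def resolve_project_id(
    param_value: str,
    env: Mapping[str, str] = os.environ,
    placeholder: str = "[your-project-id]",
) -> str:
    """Return the form value, or fall back to GOOGLE_CLOUD_PROJECT.

    An empty field or the placeholder counts as "not provided". If the
    environment variable is also unset, an empty string is returned.
    """
    value = param_value.strip()
    if not value or value == placeholder:
        return env.get("GOOGLE_CLOUD_PROJECT", "")
    return value


print(resolve_project_id("my-project"))  # my-project
print(resolve_project_id("", env={"GOOGLE_CLOUD_PROJECT": "env-project"}))  # env-project
```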

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

import vertexai

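PROJECT_ID = ""  # @param {type:"string", isTemplate: true}
# Fall back to the environment when the form field is empty or still the placeholder.
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "")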

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

vertexai.init(project=PROJECT_ID, location=LOCATION)

# Define Artifact Repository Name

This cell defines a variable to store the name of your Artifact Registry repository.

*   **AR_REPOSITORY_NAME:** This variable is assigned the value "ollama-sidecar-codelab-repo", which will be used as the name of the repository for storing your Docker images. The image paths in later cells use this same name, so the two must match.

In [None]:
AR_REPOSITORY_NAME = "ollama-sidecar-codelab-repo"

# Create Artifact Registry Repository

This cell creates a new Artifact Registry repository to store your Docker images.

*   **gcloud artifacts repositories create:** This command uses the `gcloud` command-line tool to create a new Artifact Registry repository.
*   **$AR_REPOSITORY_NAME:** Expands to the repository name you defined earlier ("ollama-sidecar-codelab-repo").
*   **--repository-format=docker:** Specifies that the repository will store Docker images.
*   **--location=$LOCATION:** Uses the location you specified earlier (e.g., "us-central1").
*   **--project=$PROJECT_ID:** Uses the project ID you specified or retrieved from the environment variable.

In [None]:
!gcloud artifacts repositories create $AR_REPOSITORY_NAME \
      --repository-format=docker \
      --location=$LOCATION \
      --project=$PROJECT_ID

# Define Model Name

This cell defines a variable to store the name of your machine learning model.

*   **MODEL_NAME:** This variable is assigned the value "gemma2:2b", an Ollama model reference in `name:tag` form: the `gemma2` model with the `2b` (2-billion-parameter) tag.
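Ollama model references follow a `name:tag` pattern, and a reference without a tag resolves to `latest`. A small parser makes that split explicit. This is a simplified sketch: `parse_model_ref` and `ModelRef` are hypothetical names, and it does not handle namespaced or registry-qualified references.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    name: str
    tag: str


def parse_model_ref(ref: str, default_tag: str = "latest") -> ModelRef:
    """Split an Ollama-style 'name:tag' reference; assume 'latest' if no tag."""
    name, sep, tag = ref.partition(":")
    if not name:
        raise ValueError(f"empty model name in {ref!r}")
    return ModelRef(name=name, tag=tag if sep and tag else default_tag)


print(parse_model_ref("gemma2:2b"))  # ModelRef(name='gemma2', tag='2b')
print(parse_model_ref("gemma2"))     # ModelRef(name='gemma2', tag='latest')
```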

In [None]:
MODEL_NAME = "gemma2:2b"

# Create Dockerfile for Ollama Model Serving

This cell creates a Dockerfile that defines the environment for serving your Ollama model.

1. **Define Dockerfile content:** It creates a multiline f-string (`dockerfile_content`) containing the instructions for building the Docker image.
    * **Base Image:** It starts with the `ollama/ollama` base image.
    * **Environment Variables:** It configures the Ollama server's listen address, model storage directory, logging verbosity, and keep-alive setting.
    * **Model Download:** It downloads the model named by `MODEL_NAME` with `ollama pull`, so the weights are baked into the image. Because the `ollama` CLI talks to a running server, the build step first starts `ollama serve` in the background and waits five seconds.
    * **Entrypoint:** It defines the command to run when the container starts, which is `ollama serve` to start the Ollama server.
2. **Write Dockerfile:** It writes the content of `dockerfile_content` to a file named "Dockerfile" in the current working directory, where the build cell expects it.
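Once the image runs, `ollama serve` listens on port 11434 (set by `OLLAMA_HOST`) and exposes Ollama's HTTP API, which Open WebUI calls through `OLLAMA_BASE_URL`. A minimal sketch of one such call, assuming Ollama's `/api/generate` endpoint with `stream` set to `false`; `build_generate_request` is a hypothetical helper, and actually sending the request requires a reachable Ollama server.

```python
import json
import urllib.request


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("http://localhost:11434", "gemma2:2b", "Why is the sky blue?")
print(req.get_method(), req.full_url)  # POST http://localhost:11434/api/generate

# Sending it needs a running Ollama server, for example:
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(json.loads(resp.read())["response"])
```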

In [None]:
dockerfile_content = f"""
FROM ollama/ollama

# Listen on all interfaces, port 11434
ENV OLLAMA_HOST=0.0.0.0:11434

# Store model weight files in /models
ENV OLLAMA_MODELS=/models

# Reduce logging verbosity
ENV OLLAMA_DEBUG=false

# Never unload model weights from memory
ENV OLLAMA_KEEP_ALIVE=-1

# Store the model weights in the container image (MODEL_NAME from the cell above)
ENV MODEL={MODEL_NAME}
RUN ollama serve & sleep 5 && ollama pull $MODEL

# Start Ollama
ENTRYPOINT ["ollama", "serve"]
"""

# Write the Dockerfile
with open("Dockerfile", "w") as f:
    f.write(dockerfile_content)

# Build and Push Docker Image

This cell builds a Docker image using Cloud Build and pushes it to Artifact Registry.

*   **gcloud builds submit:** Uploads the current directory, including the Dockerfile, to Cloud Build and builds the image there.
*   **--project $PROJECT_ID:** Specifies the Google Cloud project ID for the build.
*   **--tag us-central1-docker.pkg.dev/$PROJECT_ID/ollama-sidecar-codelab-repo/ollama-gemma-2b:** Names the image with its full Artifact Registry path. Cloud Build pushes the image to this path when the build succeeds; because no tag is given, the image is tagged `latest`.
*   **--machine-type e2-highcpu-32:** Runs the build on a 32-vCPU high-CPU machine to speed up the build and model download.
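The `--tag` value follows the Artifact Registry Docker path format `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE`. The sketch below composes that path and rejects empty parts, which guards against, for example, an empty `PROJECT_ID` silently producing `us-central1-docker.pkg.dev//...`. `artifact_registry_image` is a hypothetical helper, and `my-project` is a placeholder project ID.

```python
def artifact_registry_image(
    location: str, project_id: str, repository: str, image: str, tag: str = ""
) -> str:
    """Compose LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE[:TAG]."""
    parts = {"location": location, "project_id": project_id,
             "repository": repository, "image": image}
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} must not be empty")
    path = f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}"
    # Without an explicit tag, Docker tooling resolves the reference to ":latest".
    return f"{path}:{tag}" if tag else path


print(artifact_registry_image("us-central1", "my-project", "ollama-sidecar-codelab-repo", "ollama-gemma-2b"))
# us-central1-docker.pkg.dev/my-project/ollama-sidecar-codelab-repo/ollama-gemma-2b
```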

In [None]:
!gcloud builds submit --project $PROJECT_ID \
   --tag us-central1-docker.pkg.dev/$PROJECT_ID/ollama-sidecar-codelab-repo/ollama-gemma-2b \
   --machine-type e2-highcpu-32

# Install crane and gcrane

Colab runtimes do not provide a Docker daemon, so `docker pull` and `docker push` cannot run there. This cell instead installs `crane` and `gcrane` from the go-containerregistry project; both copy images directly between registries without a local Docker daemon.

*   **curl ... | tar:** Downloads the latest Linux x86_64 release archive from GitHub and extracts only the `crane` and `gcrane` binaries into `/usr/local/bin`.

In [None]:
!curl -sL "https://github.com/google/go-containerregistry/releases/latest/download/go-containerregistry_Linux_x86_64.tar.gz" | tar -xzf - -C /usr/local/bin crane gcrane

# Copy Open WebUI Image to Artifact Registry

This cell copies the Open WebUI image from the GitHub Container Registry to your Artifact Registry repository.

*   **gcrane cp:** Copies an image from a source reference to a destination reference, registry to registry. `gcrane` authenticates to Artifact Registry with the Google Cloud credentials from the authentication cell.
*   **ghcr.io/open-webui/open-webui:main:** The source image: Open WebUI from the GitHub Container Registry, using the `main` tag.
*   **us-central1-docker.pkg.dev/$PROJECT_ID/ollama-sidecar-codelab-repo/openwebui:** The destination: the image "openwebui" in the "ollama-sidecar-codelab-repo" repository of your project. Without an explicit tag, it is stored as `latest`.

If Docker is available (for example, in Cloud Shell), the equivalent is to `docker pull` the source image, `docker tag` it with the destination path, and `docker push` it after running `gcloud auth configure-docker us-central1-docker.pkg.dev`.

In [None]:
!gcrane cp ghcr.io/open-webui/open-webui:main us-central1-docker.pkg.dev/$PROJECT_ID/ollama-sidecar-codelab-repo/openwebui
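Each image reference in this step combines a registry host, a repository path, and an optional tag. Untagged references, such as the Artifact Registry destination here and the image paths in the service manifest, resolve to `latest`. The sketch below splits references using Docker's convention that the first path component is a registry only if it contains `.` or `:` or is `localhost`. It is a simplified, hypothetical helper (`split_image_ref`) that does not handle digests.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def split_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository[:tag]' into parts (simplified; no digests)."""
    if "@" in ref:
        raise ValueError("digest references are not handled in this sketch")
    first, slash, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref
    # A tag is a ':' that appears after the last '/'.
    last_slash, last_colon = remainder.rfind("/"), remainder.rfind(":")
    if last_colon > last_slash:
        repository, tag = remainder[:last_colon], remainder[last_colon + 1:]
    else:
        repository, tag = remainder, "latest"
    # Docker Hub official images live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = f"library/{repository}"
    return ImageRef(registry, repository, tag)


print(split_image_ref("ghcr.io/open-webui/open-webui:main"))
# ImageRef(registry='ghcr.io', repository='open-webui/open-webui', tag='main')
```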

# Define Knative Service Configuration

This cell defines the Cloud Run service configuration as a YAML manifest in the Knative Serving format, which Cloud Run accepts.

1. **Import os:** Imports the `os` module for environment variables.
2. **Project ID:** The image paths are filled in from `PROJECT_ID`, which must hold your Google Cloud project ID when this cell runs.
3. **Define YAML Content:** Defines the `yaml_content` f-string containing the service manifest. This configuration specifies:
    * **API Version and Kind:** Sets the API version to `serving.knative.dev/v1` and the kind to `Service`.
    * **Metadata:** Defines the service name (`ollama-sidecar-codelab`) and a label identifying the region (`us-central1`).
    * **Spec:** Specifies the service's specifications, including:
        * **Template:** Defines the revision template. Its annotations cap autoscaling at 5 instances, keep CPU allocated outside of requests (`cpu-throttling: 'false'`), enable startup CPU boost, and declare that `openwebui` depends on `ollama-sidecar`. The template also allows up to 80 concurrent requests per instance and a 300-second request timeout.
        * **Containers:** Defines two containers, each with a TCP startup probe on its port:
            * `openwebui`: The ingress container for Open WebUI, using the image from Artifact Registry. It serves on port 8080, reaches Ollama at `http://localhost:11434` through `OLLAMA_BASE_URL`, and has login disabled (`WEBUI_AUTH: 'false'`).
            * `ollama-sidecar`: The Ollama model server built earlier, also from Artifact Registry, listening on port 11434 with 6 vCPUs and 20 GiB of memory.
        * **Volumes:** Defines two in-memory volumes (`emptyDir` with `medium: Memory`): 1 GiB mounted at `/app/backend/data` for Open WebUI and 10 GiB mounted at `/root/.ollama` for Ollama.
4. **Write the YAML content to service.yaml:** Creates a file called `service.yaml` and writes the YAML content into it.
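The `run.googleapis.com/container-dependencies` annotation maps each container to the containers it depends on. According to the Cloud Run multi-container documentation, dependencies start first, so `ollama-sidecar` comes up before `openwebui`. The sketch below only models that ordering with Python's standard `graphlib` module; it is not how Cloud Run schedules containers. It also shows that building the annotation with `json.dumps` yields the same string as the hand-escaped `{{ }}` in the f-string.

```python
import json
from graphlib import TopologicalSorter

# Same mapping as the container-dependencies annotation:
# each container lists the containers that must start before it.
dependencies = {"openwebui": ["ollama-sidecar"]}

# Compact JSON, matching the annotation value rendered by the f-string.
annotation_value = json.dumps(dependencies, separators=(",", ":"))
print(annotation_value)  # {"openwebui":["ollama-sidecar"]}

# Predecessors come first in a topological order.
start_order = list(TopologicalSorter(dependencies).static_order())
print(start_order)  # ['ollama-sidecar', 'openwebui']
```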

In [None]:
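import os

# Reuse the PROJECT_ID set in the Vertex AI cell (run that cell first), falling back
# to the environment. An empty value would yield broken image paths such as
# us-central1-docker.pkg.dev//ollama-sidecar-codelab-repo/openwebui.
PROJECT_ID = PROJECT_ID or os.environ.get("GOOGLE_CLOUD_PROJECT", "")
if not PROJECT_ID:
    raise ValueError("PROJECT_ID is empty; set it in the Vertex AI cell first.")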

yaml_content = f"""
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: ollama-sidecar-codelab
  labels:
    cloud.googleapis.com/location: us-central1
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: '5'
        run.googleapis.com/cpu-throttling: 'false'
        run.googleapis.com/startup-cpu-boost: 'true'
        run.googleapis.com/container-dependencies: '{{"openwebui":["ollama-sidecar"]}}'
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      containers:
      - name: openwebui
        image: us-central1-docker.pkg.dev/{PROJECT_ID}/ollama-sidecar-codelab-repo/openwebui
        ports:
        - name: http1
          containerPort: 8080
        env:
        - name: OLLAMA_BASE_URL
          value: http://localhost:11434
        - name: WEBUI_AUTH
          value: 'false'
        resources:
          limits:
            memory: 2Gi
            cpu: '2'
        volumeMounts:
        - name: in-memory-1
          mountPath: /app/backend/data
        startupProbe:
          timeoutSeconds: 240
          periodSeconds: 240
          failureThreshold: 1
          tcpSocket:
            port: 8080
      - name: ollama-sidecar
        image: us-central1-docker.pkg.dev/{PROJECT_ID}/ollama-sidecar-codelab-repo/ollama-gemma-2b
        resources:
          limits:
            cpu: '6'
            memory: 20Gi
        volumeMounts:
        - name: in-memory-2
          mountPath: /root/.ollama
        startupProbe:
          timeoutSeconds: 1
          periodSeconds: 10
          failureThreshold: 3
          tcpSocket:
            port: 11434
      volumes:
      - name: in-memory-2
        emptyDir:
          medium: Memory
          sizeLimit: 10Gi
      - name: in-memory-1
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
"""

# Write the YAML content to service.yaml
with open("service.yaml", "w") as f:
    f.write(yaml_content)

print(f"Wrote service.yaml with images from project {PROJECT_ID}")
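Both containers run in the same instance, so their limits add up: 2 + 6 = 8 vCPU and 2 + 20 = 22 GiB. The sketch below parses the limits from the manifest and compares the totals against assumed per-instance ceilings of 8 vCPU and 32 GiB for CPU-only Cloud Run instances; treat those ceilings as assumptions and check the current Cloud Run limits. `parse_cpu` and `parse_memory_gib` are hypothetical helpers that handle only the quantity formats used here.

```python
def parse_cpu(value: str) -> float:
    """Parse a CPU quantity such as '2' or '500m' (millicores) into vCPUs."""
    return int(value[:-1]) / 1000 if value.endswith("m") else float(value)


def parse_memory_gib(value: str) -> float:
    """Parse binary memory quantities such as '512Mi' or '20Gi' into GiB."""
    units = {"Mi": 1 / 1024, "Gi": 1.0}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {value!r}")


# Limits copied from the service manifest above.
limits = {
    "openwebui": {"cpu": "2", "memory": "2Gi"},
    "ollama-sidecar": {"cpu": "6", "memory": "20Gi"},
}

total_cpu = sum(parse_cpu(c["cpu"]) for c in limits.values())
total_mem = sum(parse_memory_gib(c["memory"]) for c in limits.values())
print(total_cpu, total_mem)  # 8.0 22.0

# Assumed per-instance ceilings; verify against the current Cloud Run docs.
MAX_CPU, MAX_MEMORY_GIB = 8, 32
assert total_cpu <= MAX_CPU and total_mem <= MAX_MEMORY_GIB, "limits exceed one instance"
```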


# Deploy Knative Service

This cell deploys the Cloud Run service defined in the `service.yaml` file.

*   **gcloud beta run services replace:** Creates the Cloud Run service described in the YAML file, or replaces the configuration of an existing service with the same name. The `beta` component runs the command from the beta release track of the `run` command group.
*   **service.yaml:** This refers to the YAML file containing the service configuration that was created in the previous cell.
*   **--project=$PROJECT_ID:** Specifies the Google Cloud project ID where the service will be deployed.

In [None]:
!gcloud beta run services replace service.yaml --project=$PROJECT_ID