
# Lab02vb: Deploying PEFTModel on GCP

**Objective**: Learn how to fine-tune, containerize, and deploy a Parameter-Efficient Fine-Tuning (PEFT) model to Google Cloud Platform (GCP) for production use.

## Prerequisites:
1. Basic knowledge of Python, Machine Learning, and Docker.
2. A Google Cloud Project set up with billing enabled.
3. Google Cloud SDK installed.


In [None]:

# Step 1: Connect to Google Colab and Configure the Environment
# Mount Google Drive to access data and model files
from google.colab import drive
drive.mount('/content/drive')

# Install necessary libraries in Colab
!pip install transformers peft torch google-cloud-storage flask gunicorn



### Explanation:
- We mount Google Drive so that data and model files persist beyond the Colab session.
- We install `transformers`, `peft`, and `torch` to load and fine-tune the model, `google-cloud-storage` to upload it to GCS, and `flask` and `gunicorn` to build and serve the API.
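Before moving on, it can help to confirm that the installs succeeded. The sketch below uses only the standard library's `importlib.metadata` to report the installed version of each package from the `!pip install` line; the `REQUIRED` list and the `installed_versions` helper are illustrative names, not part of the lab.

```python
from importlib import metadata

# Distribution names from the Step 1 `!pip install` line (illustrative list)
REQUIRED = ["transformers", "peft", "torch", "google-cloud-storage", "flask", "gunicorn"]


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:22s} {version or 'NOT INSTALLED'}")
```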


In [None]:

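# Step 2: Fine-Tune a Hugging Face Model using PEFT

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a pre-trained model and tokenizer from Hugging Face
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the model with a LoRA adapter. GPT-2's attention projection `c_attn`
# is a Conv1D layer, which is why fan_in_fan_out=True is set.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn"],
    fan_in_fan_out=True,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

# NOTE: no training happens in this cell. Train `peft_model` on your data here
# (for example with transformers.Trainer) before saving it.

# Save the LoRA adapter (adapter_config.json + adapter weights) and the tokenizer
output_dir = "/content/drive/MyDrive/peft_model/"
peft_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)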



### Explanation:
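- `AutoTokenizer` and `AutoModelForCausalLM` load the pre-trained GPT-2 tokenizer and model from the Hugging Face Hub.
- **PEFT (Parameter-Efficient Fine-Tuning)** is applied here in the form of **LoRA (Low-Rank Adaptation)**: the original GPT-2 weights are frozen, and only small low-rank matrices (rank `r=16`) added to selected layers are trained, so just a small fraction of the parameters needs updating.
- This cell only attaches the adapter; it does not run a training loop, so the adapter must be trained on your data before the model is actually fine-tuned.
- `save_pretrained` on a PEFT model writes only the adapter (`adapter_config.json` plus the adapter weights) to Google Drive, not a full copy of GPT-2.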
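To make "fewer parameters" concrete, the arithmetic below counts what LoRA adds for this configuration. It assumes LoRA is applied only to GPT-2 small's fused attention projection `c_attn` (768 inputs, 2304 outputs, in each of 12 blocks), which is PEFT's default target for GPT-2. Each adapted layer learns `A` (r × in) and `B` (out × r), and the update `B @ A` is scaled by `lora_alpha / r`.

```python
def lora_param_count(in_features, out_features, r):
    """Trainable parameters LoRA adds to one layer: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# GPT-2 small (assumed target): 12 blocks, each with `c_attn` mapping 768 -> 3 * 768
N_LAYERS, D_MODEL, D_QKV = 12, 768, 3 * 768
r, lora_alpha = 16, 32  # values from the LoraConfig in Step 2

per_layer = lora_param_count(D_MODEL, D_QKV, r)
total_lora = N_LAYERS * per_layer
full_c_attn = N_LAYERS * D_MODEL * D_QKV  # pre-trained weights that LoRA keeps frozen
scaling = lora_alpha / r                  # multiplier applied to the B @ A update

print(f"LoRA params per layer:  {per_layer:,}")
print(f"LoRA params in total:   {total_lora:,}")
print(f"Frozen c_attn weights:  {full_c_attn:,}")
print(f"Update scaling alpha/r: {scaling}")
```

With `r=16` this comes to 49,152 trainable parameters per layer and 589,824 in total, under 0.5% of GPT-2 small's roughly 124M parameters, while the frozen `c_attn` weights alone number 21,233,664.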


In [None]:

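# Step 3: Save the Model to Google Cloud Storage (GCS)

import os

from google.colab import auth
from google.cloud import storage

# Authorize this Colab session to use your Google Cloud credentials
auth.authenticate_user()

# Set up your Google Cloud project ID and Storage bucket name
project_id = 'your-project-id'
bucket_name = 'your-gcs-bucket-name'

def save_model_to_gcs(local_model_dir, bucket_name, gcs_prefix="models/peft_model"):
    """Upload every file in local_model_dir to gs://<bucket_name>/<gcs_prefix>/."""
    client = storage.Client(project=project_id)
    bucket = client.bucket(bucket_name)
    for root, _, files in os.walk(local_model_dir):
        for fname in files:
            local_path = os.path.join(root, fname)
            rel_path = os.path.relpath(local_path, local_model_dir)
            bucket.blob(f"{gcs_prefix}/{rel_path}").upload_from_filename(local_path)

# PEFT saves a directory of adapter files (no pytorch_model.bin), so upload the whole directory
save_model_to_gcs('/content/drive/MyDrive/peft_model/', bucket_name)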



### Explanation:
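- We upload the saved model files to Google Cloud Storage (GCS) with the `google-cloud-storage` client library.
- GCS gives other GCP services (for example Cloud Run or Vertex AI) a durable place to fetch the model from. Note that the `Dockerfile` in Step 5 copies the model from a local `peft_model` directory instead, so this upload serves as a backup or an alternative source.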
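A serving container (or any other GCP service) could pull the files back down instead of baking them into the image. The sketch below assumes the adapter files were stored under a common object prefix such as `models/peft_model/` (a hypothetical layout) and uses `Client.list_blobs` and `Blob.download_to_filename` from `google-cloud-storage`; the helper names are illustrative. The path-mapping helper is kept separate so it can be checked without GCP credentials.

```python
import os


def local_path_for_blob(blob_name, prefix, dest_dir):
    """Map an object name under `prefix` to a file path under `dest_dir`."""
    if not blob_name.startswith(prefix):
        raise ValueError(f"{blob_name!r} is not under prefix {prefix!r}")
    rel_path = blob_name[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, rel_path)


def download_model_from_gcs(bucket_name, prefix, dest_dir):
    """Download every object under gs://<bucket_name>/<prefix> into dest_dir."""
    # Imported lazily so the path helper above works without the GCS client installed
    from google.cloud import storage

    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):  # skip zero-byte "folder" placeholder objects
            continue
        local_path = local_path_for_blob(blob.name, prefix, dest_dir)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        blob.download_to_filename(local_path)
    return dest_dir
```

For example, `download_model_from_gcs("your-gcs-bucket-name", "models/peft_model/", "/tmp/peft_model")` would recreate the directory locally. Keeping the trailing slash on the prefix prevents objects such as `models/peft_model_v2/...` from matching by accident.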


In [None]:

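%%writefile app.py
# Step 4: Build a Flask API to Serve the Model
import os

import torch
from flask import Flask, request, jsonify
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

app = Flask(__name__)

# Load the LoRA adapter that the Dockerfile copies into the image (/app/peft_model).
# AutoPeftModelForCausalLM reads adapter_config.json and loads the base model it points to.
MODEL_DIR = os.environ.get("MODEL_DIR", "/app/peft_model")
BASE_MODEL = os.environ.get("BASE_MODEL", "gpt2")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoPeftModelForCausalLM.from_pretrained(MODEL_DIR)
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True) or {}
    input_data = payload.get("input")
    if not isinstance(input_data, str) or not input_data:
        return jsonify({"error": "JSON body must contain a non-empty 'input' string"}), 400
    inputs = tokenizer(input_data, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=50,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
        )
    prediction = tokenizer.decode(output[0], skip_special_tokens=True)
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))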



### Explanation:
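- We define a minimal Flask API with a single `/predict` endpoint that accepts a JSON body of the form `{"input": "..."}`.
- The tokenizer and model are loaded once, when the module is imported, rather than on every request. Each request tokenizes the input string, calls `model.generate`, and returns the decoded text as `{"prediction": "..."}`.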
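A hedged sketch of stricter input handling for the endpoint: it validates the JSON body before anything reaches the tokenizer. The optional `max_new_tokens` field and both limits are assumptions for illustration, not part of the original API, and `parse_predict_payload` is a hypothetical helper name.

```python
MAX_INPUT_CHARS = 2000      # hypothetical limit on prompt length
DEFAULT_MAX_NEW_TOKENS = 50
MAX_NEW_TOKENS_CAP = 200    # hypothetical upper bound on generation length


def parse_predict_payload(payload):
    """Validate a /predict JSON body; return (text, max_new_tokens) or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    text = payload.get("input")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'input' must be a non-empty string")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"'input' exceeds {MAX_INPUT_CHARS} characters")
    max_new_tokens = payload.get("max_new_tokens", DEFAULT_MAX_NEW_TOKENS)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(max_new_tokens, bool) or not isinstance(max_new_tokens, int):
        raise ValueError("'max_new_tokens' must be an integer")
    if not 1 <= max_new_tokens <= MAX_NEW_TOKENS_CAP:
        raise ValueError(f"'max_new_tokens' must be between 1 and {MAX_NEW_TOKENS_CAP}")
    return text, max_new_tokens
```

Inside `predict()`, you would call it on `request.get_json(silent=True)`, return `jsonify({"error": str(err)}), 400` when it raises `ValueError`, and pass the returned `max_new_tokens` to `model.generate`.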


In [None]:

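%%writefile Dockerfile
# Step 5: Containerize the API Using Docker
# Use Python slim image
FROM python:3.9-slim

# Install CPU-only PyTorch (much smaller than the default CUDA build; the Cloud Run
# service below uses CPUs only), then the remaining libraries
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir transformers peft flask gunicorn google-cloud-storage

# Set the working directory, then copy the API code and the saved adapter
WORKDIR /app
COPY ./app.py /app/
COPY ./peft_model /app/peft_model/

# Expose the port
EXPOSE 8080

# Run the app with Gunicorn: one worker (each worker loads its own copy of the model)
# and a longer timeout, since text generation can exceed Gunicorn's 30-second default
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "--workers", "1", "--timeout", "120", "app:app"]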



### Explanation:
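- This `Dockerfile` starts from a slim Python 3.9 image, installs the dependencies needed to serve the model, and copies `app.py` and the saved `peft_model` directory into `/app`.
- Port 8080 (Cloud Run's default) is exposed, and the app runs under Gunicorn, a production WSGI server, rather than Flask's built-in development server. Gunicorn starts a single worker by default; each extra worker would load its own copy of the model.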
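Gunicorn can also read its settings from a Python config file, which makes it easy to honor the `PORT` environment variable that Cloud Run sets for the container. A minimal sketch, assuming a file named `gunicorn.conf.py` next to `app.py`; `bind`, `workers`, and `timeout` are standard Gunicorn settings, and `WEB_CONCURRENCY` is the environment variable Gunicorn itself consults for its default worker count.

```python
# gunicorn.conf.py (hypothetical file, loaded with `gunicorn -c gunicorn.conf.py app:app`)
import os

# Cloud Run passes the port to listen on in $PORT (8080 unless configured otherwise)
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

# Each worker process loads its own copy of the model, so keep the count small
workers = int(os.environ.get("WEB_CONCURRENCY", "1"))

# Text generation can take longer than Gunicorn's 30-second default timeout
timeout = 120
```

To use it, add `COPY ./gunicorn.conf.py /app/` to the `Dockerfile` and change the last line to `CMD ["gunicorn", "-c", "gunicorn.conf.py", "app:app"]`; options given on the command line (such as `--bind`) take precedence over the file.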


In [None]:

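# Step 6: Deploy the Container to Google Cloud Run

# 6a. Authenticate with Google Cloud
!gcloud auth login

# 6b. Set your Google Cloud project and enable the APIs used below
!gcloud config set project your-project-id
!gcloud services enable run.googleapis.com cloudbuild.googleapis.com

# 6c. Stage the build context: the Dockerfile copies ./peft_model, so copy it out of Drive,
#     and keep the mounted Drive, Colab's credentials, and sample data out of the upload
!cp -r /content/drive/MyDrive/peft_model ./peft_model
!printf "drive/\n.config/\nsample_data/\n" > .gcloudignore

# 6d. Build the Docker image with Cloud Build and push it to Google Container Registry
!gcloud builds submit --tag gcr.io/your-project-id/peftmodel-api .

# 6e. Deploy the Docker container to Cloud Run
!gcloud run deploy peftmodel-api \
    --image gcr.io/your-project-id/peftmodel-api \
    --platform managed \
    --region us-central1 \
    --allow-unauthenticated \
    --memory 4Gi \
    --cpu 2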



### Explanation:
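- First, authenticate with Google Cloud and select the project to deploy into.
- `gcloud builds submit` uploads the current directory to **Cloud Build**, which builds the image from the `Dockerfile` and pushes it to the **Google Container Registry (GCR)** path given by `--tag`. Container Registry is deprecated in favor of **Artifact Registry**, so on newer projects you may need to push to an Artifact Registry repository (`us-central1-docker.pkg.dev/your-project-id/<repository>/peftmodel-api`) instead.
- The image is then deployed to **Google Cloud Run**, a serverless platform that starts and stops container instances with request traffic (scaling to zero when idle by default). `--allow-unauthenticated` makes the endpoint public, and `--memory 4Gi --cpu 2` sets the resources of each instance.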
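If you deploy to more than one project or region, building the command in Python avoids hand-editing the long `gcloud run deploy` line. The sketch below only assembles the same flags used in this step into an argument list (plus `--no-allow-unauthenticated` for a private service); `cloud_run_deploy_args` and `deploy` are hypothetical helper names, and nothing is executed unless you pass `dry_run=False`.

```python
import shlex
import subprocess


def cloud_run_deploy_args(service, image, region="us-central1", memory="4Gi", cpu=2, public=True):
    """Assemble a `gcloud run deploy` argument list with the flags used in Step 6."""
    args = [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--platform", "managed",
        "--region", region,
        "--memory", memory,
        "--cpu", str(cpu),
    ]
    args.append("--allow-unauthenticated" if public else "--no-allow-unauthenticated")
    return args


def deploy(args, dry_run=True):
    """Print the shell-quoted command; run it only when dry_run is False."""
    print(shlex.join(args))
    if not dry_run:
        subprocess.run(args, check=True)
```

For example, `deploy(cloud_run_deploy_args("peftmodel-api", "gcr.io/your-project-id/peftmodel-api"))` prints the same command as Step 6 without running it.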


In [None]:

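# Step 7: Testing the Deployed API

import requests

# Replace with the service URL printed by `gcloud run deploy`, or look it up with:
# gcloud run services describe peftmodel-api --region us-central1 --format="value(status.url)"
service_url = "https://your-cloud-run-url"

# Test the API endpoint
response = requests.post(
    f"{service_url}/predict",
    json={"input": "What is the weather today?"},
    timeout=120,  # the first request after a cold start can be slow while the model loads
)
response.raise_for_status()
print(response.json())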
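The same request can be sent without `requests`, using only the standard library. In this sketch, `build_predict_request` and `predict` are illustrative helper names, and the service URL is still a placeholder for the URL that `gcloud run deploy` prints.

```python
import json
import urllib.request


def build_predict_request(service_url, text):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(service_url, text, timeout=120):
    """Send the request and return the decoded JSON response."""
    req = build_predict_request(service_url, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `predict("https://your-cloud-run-url", "What is the weather today?")` should return the same `{"prediction": ...}` dictionary as the `requests` version, once the placeholder URL is replaced.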
