# Finetuning LLMs with Vertex AI Custom Job on Google Cloud



## Setup Development Environment
1. A Google Cloud Project.
2. [A service account](https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console) with the `Vertex AI User`, `Storage Object Admin`, and `Artifact Registry Reader` roles, which the fine-tuning job runs as.
3. Make sure you have Python 3.8 or later installed.
4. Install the Google Cloud SDK with: `curl https://sdk.cloud.google.com | bash`
5. Authenticate the gcloud CLI and select your project:

```bash
gcloud auth login
gcloud config set project <your-project-id>
gcloud auth application-default login
```
6. Install the `google-cloud-aiplatform` package:
```bash
pip install google-cloud-aiplatform
```
7. Make sure you have access to the [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers.git) GitHub repository and clone it using:
```bash
git clone https://github.com/huggingface/Google-Cloud-Containers.git
```
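Before moving on, it can help to confirm the prerequisites above from Python. The following is a minimal sketch and not part of the original setup: the helper name `check_prerequisites` and the list of tools it looks for (`gcloud`, `git`, `docker`) are assumptions based on the commands used in this guide.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8), tools=("gcloud", "git", "docker")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in min_python)
        problems.append(f"Python {needed}+ is required, found {found}")

    # Each command-line tool must be discoverable on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

The check only confirms that the tools are installed. It does not verify that `gcloud` is authenticated or that the right project is selected.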

## Build Docker Image and Push it to Artifact Registry

The user account you authenticated with in step 5 of `Setup Development Environment` must have the `Artifact Registry Writer` role to push the image to Artifact Registry.

In [1]:
%%writefile push-to-gar.sh
#!/bin/bash
####################################################################
# Description: Builds the Docker image and pushes it to Google Artifact Registry
####################################################################

REGION="us-central1"
DOCKER_ARTIFACT_REPO="deep-learning-images"
PROJECT_ID="project-id" # Set it properly
FRAMEWORK="pytorch"
TYPE="training"
ACCELERATOR="gpu"
FRAMEWORK_VERSION="2.1"
TRANSFORMERS_VERSION="4.38.1"
PYTHON_VERSION="py310"

SERVING_CONTAINER_IMAGE_URI="${REGION}-docker.pkg.dev/${PROJECT_ID}/${DOCKER_ARTIFACT_REPO}/huggingface-${FRAMEWORK}-${TYPE}-${ACCELERATOR}.${FRAMEWORK_VERSION}.transformers.${TRANSFORMERS_VERSION}.${PYTHON_VERSION}:latest"

# Set Google-Cloud Region
gcloud config set region "${REGION}" --quiet

# Enable Artifact-Registry API
gcloud services enable artifactregistry.googleapis.com

# Create a new Docker repository in your region
gcloud artifacts repositories create "${DOCKER_ARTIFACT_REPO}" \
  --repository-format=docker \
  --location="${REGION}" \
  --description="Deep Learning Images"

# verify that your repository was created.
gcloud artifacts repositories list \
  --location="${REGION}" \
  --filter="name~${DOCKER_ARTIFACT_REPO}"

# configure docker to use your repository    
gcloud auth configure-docker "${REGION}-docker.pkg.dev"

# build and push
docker build -t "${SERVING_CONTAINER_IMAGE_URI}" -f "Google-Cloud-Containers/containers/pytorch/training/gpu/2.1/transformers/4.38.1/py310/Dockerfile" .

docker push "${SERVING_CONTAINER_IMAGE_URI}"

Writing push-to-gar.sh


**Execute the script from the CLI**

```bash
chmod +x push-to-gar.sh
./push-to-gar.sh

```
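The image name that `push-to-gar.sh` assembles from its variables is long and easy to mistype when it is referenced again later. As a rough sketch, the same naming scheme can be reproduced in Python; the function name `build_image_uri` and its parameter names are hypothetical, and the defaults simply mirror the values set in the script.

```python
def build_image_uri(region, project_id, repository, framework="pytorch",
                    image_type="training", accelerator="gpu",
                    framework_version="2.1", transformers_version="4.38.1",
                    python_version="py310", tag="latest"):
    """Compose the Artifact Registry URI using the same pattern as push-to-gar.sh."""
    image_name = (
        f"huggingface-{framework}-{image_type}-{accelerator}"
        f".{framework_version}.transformers.{transformers_version}.{python_version}"
    )
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image_name}:{tag}"


uri = build_image_uri("us-central1", "project-id", "deep-learning-images")
print(uri)
```

Note that the accelerator and the framework version are joined by a dot, not a hyphen, so any later reference to the image has to use the same separator.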

## Use the Image from Artifact Registry and a Local Training Script to Run a CustomJob on Vertex AI

### Import Required Libraries

In [2]:
import os
from datetime import datetime

from google.cloud import aiplatform

### Define GCP Variables

In [3]:
# Cloud project id.
PROJECT_ID = "project-id"  # Set it properly

# Region for launching jobs.
REGION = "us-central1"  

# Cloud Storage bucket for storing experiments output.
BUCKET_URI = "gs://gcp-notebook-test-bucket"  
STAGING_BUCKET = f"{BUCKET_URI}/temporal"
EXPERIMENT_BUCKET = f"{BUCKET_URI}/peft"
MODEL_BUCKET = f"{BUCKET_URI}/model"

# Service Account for Launching Jobs
SERVICE_ACCOUNT = "compute@developer.gserviceaccount.com"  # Set it properly
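
Several of the values above are placeholders marked "Set it properly". A small sanity check can catch a forgotten placeholder before a job is submitted. This is only an illustrative sketch: the helper name `find_config_problems` and the specific rules it applies are assumptions, not checks performed by Vertex AI.

```python
def find_config_problems(project_id, bucket_uri, service_account):
    """Return a list of likely configuration mistakes (empty if none are found)."""
    problems = []

    # "project-id" is the placeholder used in this notebook.
    if not project_id or project_id == "project-id":
        problems.append("PROJECT_ID still holds the placeholder value")

    # Cloud Storage URIs start with gs:// followed by a bucket name.
    if not bucket_uri.startswith("gs://") or len(bucket_uri) <= len("gs://"):
        problems.append("BUCKET_URI must look like gs://<bucket-name>")

    # Service accounts are identified by an email-style address.
    if "@" not in service_account or not service_account.endswith(".gserviceaccount.com"):
        problems.append("SERVICE_ACCOUNT does not look like a service account email")

    return problems


for problem in find_config_problems("project-id", "gs://gcp-notebook-test-bucket",
                                    "compute@developer.gserviceaccount.com"):
    print("CHECK:", problem)
```

The check is purely syntactic. It cannot tell whether the project, bucket, or service account actually exists or has the required roles.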

### Define your training script

In [4]:
%%writefile finetune-gemma-lora.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import TrainingArguments
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset


def train_model():
    
    # Load the dataset
    raw_dataset = load_dataset("Abirate/english_quotes", split="train")
    
    # Define Model
    model_id = "google/gemma-2b"

    # Load Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token
    
    
    # Define 4-bit NF4 quantization with bfloat16 compute
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )

    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
    
    # LoRA config
    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.05,
        r=8,
        bias="none",
        target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM", 
    )


    # Define training arguments
    training_args = TrainingArguments(
        output_dir="output",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        logging_strategy="steps",
        logging_steps=20,
        bf16=True,
        optim="paged_adamw_8bit",

    )

    # Initialize our Trainer
    trainer = SFTTrainer(
        model=model,
        peft_config=peft_config,
        args=training_args,
        dataset_text_field="text",
        packing=True,
        train_dataset=raw_dataset,
        tokenizer=tokenizer,
    )

    # Train the model
    trainer.train()

    # Save the trained LoRA adapter to output_dir
    trainer.save_model()

if __name__ == "__main__":
    train_model()

Writing finetune-gemma-lora.py
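
To get a feel for how small the LoRA adapter configured above is, the number of trainable parameters can be estimated by hand: each adapted linear layer gains two low-rank matrices, one of shape `r x in_features` and one of shape `out_features x r`. The sketch below does that arithmetic. The layer shapes and layer count for `google/gemma-2b` are assumptions that are not stated in this notebook, so verify them against the model's configuration before relying on the result.

```python
def lora_trainable_params(rank, module_shapes, num_layers):
    """Count LoRA parameters: rank * (in_features + out_features) per adapted module."""
    per_layer = sum(
        rank * (in_features + out_features)
        for in_features, out_features in module_shapes.values()
    )
    return per_layer * num_layers


# ASSUMED shapes (in_features, out_features) for gemma-2b attention projections.
gemma_2b_shapes = {"q_proj": (2048, 2048), "v_proj": (2048, 256)}

# r=8 and the two target modules match the LoraConfig in the training script;
# num_layers=18 is an assumption about the model.
total = lora_trainable_params(rank=8, module_shapes=gemma_2b_shapes, num_layers=18)
print(f"{total:,} trainable LoRA parameters")
```

Under these assumed shapes the adapter has fewer than one million trainable parameters, which is why the run fits on a single GPU when combined with 4-bit quantization of the base model.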


### Launch Job using Vertex AI 

In [5]:
def get_job_name_with_datetime(prefix: str) -> str:
    """Gets the job name with date time when triggering training or deployment
    jobs in Vertex AI.
    """
    return prefix + datetime.now().strftime("_%Y%m%d_%H%M%S")

job_name = get_job_name_with_datetime("gemma-lora-finetune")

In [6]:
# Initialize the Vertex AI SDK to store common configurations 
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)

In [7]:
# Artifact Registry URI of the image pushed earlier; it must match the name built in push-to-gar.sh
TRAIN_DOCKER_URI = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/deep-learning-images/huggingface-pytorch-training-gpu.2.1.transformers.4.38.1.py310:latest"

In [8]:
# Fine-tune with one NVIDIA L4 GPU (24 GB).
machine_type = "g2-standard-4"
accelerator_type = "NVIDIA_L4"
accelerator_count = 1
replica_count = 1
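
These four settings end up in the job's worker pool specification. The sketch below builds an equivalent dictionary by hand to show how they fit together. The helper name `make_worker_pool_spec` is hypothetical, and the spec that the SDK generates for a local script (for example the container command) may contain more fields than shown here.

```python
def make_worker_pool_spec(image_uri, machine_type, accelerator_type,
                          accelerator_count, replica_count):
    """Assemble one worker pool entry from machine, accelerator, and image settings."""
    return {
        "machine_spec": {
            "machine_type": machine_type,
            "accelerator_type": accelerator_type,
            "accelerator_count": accelerator_count,
        },
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri},
    }


spec = make_worker_pool_spec(
    image_uri="REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:latest",  # placeholder URI
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    replica_count=1,
)
print(spec["machine_spec"])
```

A list containing such a dictionary is what the `worker_pool_specs` argument of `aiplatform.CustomJob` expects when a job is defined without a local script.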

In [9]:
# Pass training arguments, training script, launch job using a local script.
train_job = aiplatform.CustomJob.from_local_script(
    display_name=job_name,
    script_path="finetune-gemma-lora.py",
    container_uri=TRAIN_DOCKER_URI,
    environment_variables={"HF_TOKEN": "xxxx"},  # Needed to download the gated model; set it properly
    replica_count=replica_count,
    machine_type=machine_type,
    accelerator_type=accelerator_type,
    accelerator_count=accelerator_count,
)

Training script copied to:
gs://gcp-notebook-test-bucket/temporal/aiplatform-2024-03-15-14:46:36.960-aiplatform_custom_trainer_script-0.1.tar.gz.


In [None]:
train_job.run(
    service_account=SERVICE_ACCOUNT,
)
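
The training script saves the adapter to a local `output` directory inside the container, which does not outlive the job. One common approach, sketched here as an assumption and not as part of the original notebook, is to write to a Cloud Storage bucket through the `/gcs/` mount that Vertex AI custom training exposes. The helper name `gcs_uri_to_fuse_path` is hypothetical.

```python
def gcs_uri_to_fuse_path(gcs_uri):
    """Translate gs://bucket/path into the /gcs/bucket/path form used by the FUSE mount."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got {gcs_uri!r}")
    return "/gcs/" + gcs_uri[len(prefix):]


print(gcs_uri_to_fuse_path("gs://gcp-notebook-test-bucket/model"))
```

The resulting path could be used as `output_dir` in `TrainingArguments`, provided the job's service account is allowed to write to the bucket.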