# Fine-tuning Llama 3 8B Instruct Model with SageMaker

This notebook demonstrates how to fine-tune the Llama 3 8B Instruct model using Amazon SageMaker. We'll use the JumpStart feature to simplify the fine-tuning process.

In [15]:
import os
import json
import time
import boto3
from uuid import uuid4
import sagemaker
from sagemaker import hyperparameters
from sagemaker.jumpstart.estimator import JumpStartEstimator

## Setup AWS Configuration

First, we'll set up our AWS configuration including the profile, region, and S3 bucket details. This establishes our connection to AWS services.

In [16]:
PROFILE_NAME = "dev"
REGION_NAME = "us-east-1"
BUCKET_NAME = "khalil-adib-bucket"

session = boto3.Session(profile_name=PROFILE_NAME, region_name=REGION_NAME)
s3_client = session.client('s3')
sagemaker_session = sagemaker.Session(boto_session=session)

## Define Training Template

We'll create a template for instruction tuning that defines how the `{prompt}` and `{completion}` fields of each training example are formatted into the model's input text and target text. This template will be used to format our training data.

In [17]:
instruct_train_template = {
    "prompt": "Answer user query, be helpful, don't come up with facts. "
    "Write a response that appropriately completes the request.\n\n"
    "### Input:\n{prompt}\n\n",
    "completion": " {completion}",
}

with open("template.json", "w") as f:
    json.dump(instruct_train_template, f)
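To make the template concrete, the sketch below fills it with one training record using plain `str.format`. The record contents and the `render` helper are hypothetical and only illustrate the substitution; the exact rendering done inside the training container may differ.

```python
import json

# Same template as above, repeated so this snippet runs on its own.
template = {
    "prompt": (
        "Answer user query, be helpful, don't come up with facts. "
        "Write a response that appropriately completes the request.\n\n"
        "### Input:\n{prompt}\n\n"
    ),
    "completion": " {completion}",
}

# Hypothetical training record; its keys must match the template placeholders.
record = {
    "prompt": "What is Amazon S3?",
    "completion": "An object storage service.",
}


def render(template: dict, record: dict) -> dict:
    """Fill every template field with the values from one dataset record."""
    return {field: text.format(**record) for field, text in template.items()}


rendered = render(template, record)
print(rendered["prompt"] + rendered["completion"])

# Assumption: training data is stored one JSON object per line (JSON Lines).
jsonl_line = json.dumps(record)
```

A record that lacks one of the placeholder keys raises `KeyError` here, which is a cheap way to catch schema mismatches before starting a paid training job.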

## Upload Template to S3

The training template needs to be uploaded to S3 so it can be accessed by the SageMaker training job.

In [9]:
data_root = "webinar"
input_root = f"{data_root}/dataset/"
output_key = f"{data_root}/output/"

template_params = {
    "Filename": "template.json",
    "Bucket": BUCKET_NAME,
    # The "/" separator keeps the template inside the data_root prefix
    "Key": f"{data_root}/template.json"
}

s3_client.upload_file(**template_params)
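S3 keys are plain strings, so a missing or doubled `/` silently produces a different object key. A minimal sketch of a key-joining helper follows; `s3_key` and the `train.jsonl` file name are hypothetical and not part of boto3 or the original notebook.

```python
def s3_key(*parts: str) -> str:
    """Join key segments with exactly one '/' between them and no leading slash."""
    cleaned = [part.strip("/") for part in parts if part and part.strip("/")]
    return "/".join(cleaned)


print(s3_key("webinar", "template.json"))
print(s3_key("webinar/", "/dataset/", "train.jsonl"))
```

The result can be passed as the `Key` argument of `upload_file`, so the prefix and the file name are always separated consistently.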

## Configure Training Parameters

Here we set up the essential parameters for our fine-tuning job:
- Training data location
- Output path for the model
- Instance type (ml.g5.12xlarge, which provides four NVIDIA A10G GPUs)
- IAM role for SageMaker execution
- Model ID and version (Llama 3 8B Instruct)

In [32]:
train_data_uri = f"s3://{BUCKET_NAME}/{data_root}"
output_path = f"s3://{BUCKET_NAME}/{output_key}"

JOB_NAME = f"webinar-finetune-job-{uuid4()}"

INSTANCE_TYPE = "ml.g5.12xlarge"
ROLE = "arn:aws:iam::026090512591:role/sagemaker-execution-role-SageMakerExecutionRole-lZm8CUm9jqkj"

model_id = "meta-textgeneration-llama-3-8b-instruct"
model_version = "2.13.0"
accept_eula = "true"
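Training job names are constrained by the `CreateTrainingJob` API: at most 63 characters, alphanumeric, with hyphens allowed only between alphanumeric characters. The sketch below checks a name locally before submitting; `is_valid_job_name` is a hypothetical helper, and the pattern is reproduced from the API reference as best understood, so it is worth confirming against the current documentation.

```python
import re
from uuid import uuid4

# Assumption: pattern documented for TrainingJobName in CreateTrainingJob.
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_JOB_NAME_LENGTH = 63


def is_valid_job_name(name: str) -> bool:
    """Return True if the name fits the length limit and the allowed pattern."""
    if not 1 <= len(name) <= MAX_JOB_NAME_LENGTH:
        return False
    return JOB_NAME_PATTERN.fullmatch(name) is not None


candidate = f"webinar-finetune-job-{uuid4()}"
print(candidate, is_valid_job_name(candidate))
```

A 21-character prefix plus a 36-character UUID gives 57 characters, which leaves little room for longer prefixes.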

## Customize Hyperparameters

We'll modify the default hyperparameters to suit our specific use case:
- Enable instruction tuning
- Disable chat dataset format
- Set maximum input length
- Configure validation split ratio (kept at 0.2, the same as the default)

In [33]:
webinar_training_hyperparamter = hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version, sagemaker_session=sagemaker_session
)
webinar_training_hyperparamter

{'int8_quantization': 'False',
 'enable_fsdp': 'True',
 'epoch': '1',
 'learning_rate': '0.0001',
 'lora_r': '8',
 'lora_alpha': '32',
 'target_modules': 'q_proj,v_proj',
 'lora_dropout': '0.05',
 'instruction_tuned': 'False',
 'chat_dataset': 'True',
 'add_input_output_demarcation_key': 'True',
 'per_device_train_batch_size': '1',
 'per_device_eval_batch_size': '1',
 'max_train_samples': '-1',
 'max_val_samples': '-1',
 'seed': '10',
 'max_input_length': '-1',
 'validation_split_ratio': '0.2',
 'train_data_split_seed': '0',
 'preprocessing_num_workers': 'None'}

In [34]:
webinar_training_hyperparamter['instruction_tuned'] = 'True'
webinar_training_hyperparamter['chat_dataset'] = 'False'
webinar_training_hyperparamter['max_input_length'] = '512'
webinar_training_hyperparamter['validation_split_ratio'] = '0.2'

In [35]:
hyperparameters.validate(
    sagemaker_session=sagemaker_session,
    model_id=model_id,
    model_version=model_version,
    hyperparameters=webinar_training_hyperparamter
)
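The defaults shown above are all strings, and a mistyped key would simply add a new entry rather than change an existing one. A minimal sketch of a stricter override step follows; `apply_overrides` is a hypothetical helper, and the `defaults` dictionary is a shortened copy of the output above.

```python
def apply_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a copy of defaults with overrides applied as strings.

    Raises KeyError for names that are not present in the defaults,
    which catches typos such as 'max_input_lenght'.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown hyperparameters: {sorted(unknown)}")
    merged = dict(defaults)
    merged.update({name: str(value) for name, value in overrides.items()})
    return merged


# Shortened copy of the default hyperparameters, for illustration only.
defaults = {
    "instruction_tuned": "False",
    "chat_dataset": "True",
    "max_input_length": "-1",
    "validation_split_ratio": "0.2",
}

tuned = apply_overrides(
    defaults,
    {"instruction_tuned": True, "chat_dataset": False, "max_input_length": 512},
)
print(tuned)
```

The original dictionary is left untouched, so the defaults remain available for comparison.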

## Initialize and Start Training

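Now we'll create the JumpStart estimator and start the fine-tuning job. Because `fit` is called with `wait=False`, the call returns immediately and the job keeps running in the background. This will: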
1. Initialize the training environment
2. Configure the model parameters
3. Start the training process

In [None]:
estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    role=ROLE,
    environment={"accept_eula": accept_eula},  # "true" accepts the EULA required for gated models
    disable_output_compression=True,
    instance_type=INSTANCE_TYPE,
    hyperparameters=webinar_training_hyperparamter,
    output_path=output_path,
    sagemaker_session=sagemaker_session,
)

# Start the fine-tuning job
estimator.fit(inputs={"training": train_data_uri}, job_name=JOB_NAME, wait=False)

## Monitor Training Progress

This cell will monitor the training job status and provide updates until completion. The training process may take several hours depending on your dataset size and instance type.

In [None]:
# Statuses after which the training job no longer changes
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

while True:
    job_status = sagemaker_session.describe_training_job(job_name=JOB_NAME)
    status = job_status['TrainingJobStatus']
    print(status)
    if status in TERMINAL_STATUSES:
        break
    # Poll every 10 seconds while the job is still running or stopping
    time.sleep(10)

# Surface the failure reason, if the job reported one
if status == 'Failed':
    print(job_status.get('FailureReason'))
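A polling loop like the one above can also be wrapped in a function with a timeout, so the notebook cell cannot block forever. The sketch below is one way to do that; `wait_for_job` and its parameters are hypothetical, and the status source is injected as a callable so the logic can be exercised without AWS.

```python
import time
from typing import Callable

# Statuses after which a training job no longer changes.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 6 * 3600,
) -> str:
    """Poll get_status() until a terminal state is seen or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Simulated status sequence standing in for real API responses.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)

# In this notebook the callable could be (assumes the session and job name above):
# lambda: sagemaker_session.describe_training_job(job_name=JOB_NAME)["TrainingJobStatus"]
```

Treating `Stopping` as non-terminal means the function only returns once the job has actually reached `Stopped`.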

## Access Model Artifacts

Once training is complete, we can access the model artifacts from S3. These artifacts contain the fine-tuned model that can be deployed for inference.

In [None]:
job_status['ModelArtifacts']['S3ModelArtifacts']
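The artifact location is returned as an `s3://` URI, while boto3 calls take a bucket and a key prefix separately. A minimal sketch of splitting the URI follows; `split_s3_uri` is a hypothetical helper and the example URI is made up for illustration.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Made-up URI; the real value comes from the training job description.
bucket, prefix = split_s3_uri("s3://example-bucket/webinar/output/example-job/output/model/")
print(bucket, prefix)

# The pair can then be used with boto3, for example:
# s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
```

Because the estimator was created with `disable_output_compression=True`, the URI is expected to point at a prefix holding the model files rather than at a single archive.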