# 🚀 Customize and Deploy `meta-llama/Llama-3.2-3B-Instruct` on Amazon SageMaker AI
---
In this notebook, we explore how to use **Llama 3.2-3B-Instruct**, a compact, instruction-tuned multilingual model from Meta’s Llama 3.2 family. You’ll learn how to fine-tune it on your dataset, evaluate its performance, and deploy it at scale with SageMaker.

**What is Llama 3.2-3B-Instruct?**

Meta released Llama 3.2 on **September 25, 2024**, introducing both text-only and vision-capable variants. The **Llama 3.2-3B-Instruct** model is a **3-billion-parameter, instruction-tuned, text-in/text-out model**. It is distributed under the **Llama 3.2 Community License** and designed for dialogue, summarization, retrieval, translation, and other assistant-style tasks.  
🔗 Model card: [meta-llama/Llama-3.2-3B-Instruct on Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)

---

**Key Specifications**

| Feature | Details |
|---|---|
| **Parameters** | ~3.21 billion |
| **Architecture** | Auto-regressive transformer; instruction-tuned (SFT + RLHF-style alignment) |
| **Input / Output** | Text-in / Text-out only (no vision in this variant) |
| **Languages** | English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (officially supported; trained on a broader set of languages) |
| **Context Length** | Up to **128K tokens** |
| **Attention** | Grouped Query Attention (GQA) for efficient inference |
| **License** | Llama 3.2 Community License |

---

**Benchmarks & Behavior**

- Strong performance across **dialogue, summarization, multilingual tasks, and instruction-following**.  
- Efficient inference with **Grouped Query Attention**, which shrinks the key/value cache and makes the model suitable for latency- and cost-sensitive deployments.  
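
To make the GQA benefit concrete, here is a back-of-the-envelope estimate of the KV-cache size at the full context length. This is a sketch that assumes the architecture values from the model's published `config.json` (28 layers, 24 query heads, 8 key/value heads, head dimension 128) and a 16-bit (bf16) cache; verify these values against the model card before relying on them.

```python
# Back-of-the-envelope KV-cache size for Llama-3.2-3B-Instruct.
# ASSUMPTION: architecture values taken from the model's config.json;
# double-check them against the Hugging Face model card.
NUM_LAYERS = 28
NUM_QUERY_HEADS = 24
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # bf16 / fp16


def kv_cache_bytes(num_tokens: int, num_kv_heads: int, batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens."""
    # Factor 2: one key tensor and one value tensor in every layer.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


context = 128 * 1024  # full 128K-token context (131,072 tokens)
gqa = kv_cache_bytes(context, NUM_KV_HEADS)
mha = kv_cache_bytes(context, NUM_QUERY_HEADS)  # hypothetical: one KV head per query head

print(f"GQA KV cache @128K tokens: {gqa / 2**30:.1f} GiB")
print(f"MHA KV cache @128K tokens: {mha / 2**30:.1f} GiB")
print(f"Reduction factor: {mha / gqa:.0f}x")
```

Under these assumptions a single 128K-token sequence needs 14 GiB of KV cache with GQA versus 42 GiB if every query head had its own key/value head, a 3x reduction.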

---

**Using This Notebook**

Here’s what you’ll cover:

* Load a sample dataset from Hugging Face and prepare it for fine-tuning  
* Fine-tune with SageMaker Training Jobs  
* Run Model Evaluation  
* Deploy to SageMaker Endpoints  

---

Let’s begin by installing the required packages and setting up the SageMaker session.


In [1]:
%pip install -Uq sagemaker datasets

/home/ubuntu/pranavvm/py312-training/bin/python3: No module named pip
Note: you may need to restart the kernel to use updated packages.


In [1]:
import boto3
import sagemaker

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ubuntu/.config/sagemaker/config.yaml


In [2]:
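region = boto3.Session().region_name

from sagemaker.local import LocalSession

# This notebook runs training in SageMaker local mode (a Docker container on this machine).
# For a managed SageMaker training job, use instead:
#   sess = sagemaker.Session(boto3.Session(region_name=region))
sess = LocalSession()
sess.config = {"local": {"local_code": True}}  # use source_dir from local disk instead of uploading it to S3

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = sagemaker.get_execution_role()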

In [3]:
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

sagemaker role arn: arn:aws:iam::811828458885:role/Admin
sagemaker bucket: sagemaker-us-east-1-811828458885
sagemaker session region: us-east-1


## Data Preparation for Supervised Fine-tuning

### [Finance-Instruct-500k](https://huggingface.co/datasets/Josephgflowers/Finance-Instruct-500k)

**Finance-Instruct-500k** is a large-scale dataset with about **518,000 entries** focused on the financial domain. It spans topics such as investments, banking, markets, accounting, and corporate finance, offering a wide variety of instruction–response examples.

**Data Format & Structure**:
- Distributed in **JSON** format, with simple conversion to Parquet.  
- Contains a single `train` split with ~518k records.  
- Each record includes:  
  - `system` – context or metadata for the task  
  - `user` – the financial prompt or query  
  - `assistant` – the corresponding response  

**License**: Released under the **Apache-2.0** license.  

**Applications**:

The dataset can support finance-focused tasks such as:  
- Financial question answering  
- Market and investment analysis  
- Topic and sentiment classification  
- Financial entity extraction and document understanding  

In [4]:
import os
import json
import pprint
from tqdm import tqdm
from datasets import load_dataset

In [5]:
dataset_parent_path = os.path.join(os.getcwd(), "tmp_cache_local_dataset")
os.makedirs(dataset_parent_path, exist_ok=True)

**Preparing Your Dataset in `messages` format**

This section walks you through converting the dataset into the conversation-style `messages` format required for training LLMs directly with SageMaker AI.

**What Is the `messages` Format?**

The `messages` format represents each example as a chat exchange: a JSON array of conversation turns, each labeled with a `role` (`system`, `user`, or `assistant`) and its `content`. It’s widely used by frameworks like TRL.

Example entry:

```json
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "How do I bake sourdough?" },
    { "role": "assistant", "content": "First, you need to create a starter by..." }
  ]
}
```

In [7]:
dataset_name = "Josephgflowers/Finance-Instruct-500k"
dataset = load_dataset(dataset_name, split="train[:1000]")

In [8]:
pprint.pp(dataset[0])

{'system': '\n',
 'user': 'Explain tradeoffs between fiscal and monetary policy as tools in a '
         "nation's economic toolkit. Provide examples of past instances when "
         'each were utilized, the economic conditions that led to them being '
         'deployed, their intended effects, and an evaluation of their '
         'relative efficacy and consequences.',
 'assistant': 'Fiscal and monetary policy are the two main tools that '
              'governments have to influence economic activity. They each have '
              'benefits and drawbacks.\n'
              '\n'
              'Fiscal policy refers to government spending and taxation '
              'decisions. Examples of fiscal policy include:\n'
              '\n'
              '• During the Great Recession, the U.S. government implemented a '
              'fiscal stimulus through the American Recovery and Reinvestment '
              'Act of 2009. This included increased spending on '
              'infrastructu

In [9]:
print(f"total number of fine-tunable samples: {len(dataset)}")

total number of fine-tunable samples: 1000


In [10]:
def convert_to_messages(row):
    # Replace the dataset's `system` field (empty in the sample above) with a task-specific system prompt
    system_content = "You are a financial reasoning assistant. Read the user’s query, restate the key data, and solve step by step. Show calculations clearly, explain any rounding or adjustments, and present the final answer in a concise and professional manner."
    user_content = row["user"]
    assistant_content = row["assistant"]

    return {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": assistant_content},
        ]
    }


# Convert every row and drop the original system/user/assistant columns
dataset = dataset.map(convert_to_messages, remove_columns=dataset.column_names)


In [11]:
dataset_filename = os.path.join(dataset_parent_path, f"{dataset_name.replace('/', '--').replace('.', '-')}.jsonl")
dataset.to_json(dataset_filename, lines=True)


2188126
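
Before uploading, it can be worth sanity-checking the JSONL file. The following is a minimal sketch; the function name `validate_messages_jsonl` and the specific checks are our own assumptions, not a SageMaker or TRL requirement. Each line must parse as JSON, contain a non-empty `messages` list, use only the `system`, `user`, and `assistant` roles with non-empty string content, and end with an assistant turn.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages_jsonl(path: str) -> int:
    """Check a `messages`-format JSONL file and return the number of valid records.

    Raises ValueError on the first malformed line. These checks are an
    illustrative assumption, not a requirement imposed by SageMaker or TRL.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {line_no}: missing or empty 'messages' list")
            for turn in messages:
                if not isinstance(turn, dict) or turn.get("role") not in ALLOWED_ROLES:
                    raise ValueError(f"line {line_no}: malformed turn or unexpected role")
                content = turn.get("content")
                if not isinstance(content, str) or not content.strip():
                    raise ValueError(f"line {line_no}: empty content for role {turn['role']!r}")
            if messages[-1]["role"] != "assistant":
                raise ValueError(f"line {line_no}: last turn should be the assistant's answer")
            count += 1
    return count
```

In this notebook you could call `validate_messages_jsonl(dataset_filename)`; if every record passes, it returns the number of converted rows.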

#### Upload file to S3

In [12]:
from sagemaker.s3 import S3Uploader

In [13]:
data_s3_uri = f"s3://{sess.default_bucket()}/dataset"

uploaded_s3_uri = S3Uploader.upload(
    local_path=dataset_filename,
    desired_s3_uri=data_s3_uri
)
print(f"Uploaded {dataset_filename} to > {uploaded_s3_uri}")

Uploaded /home/ubuntu/pranavvm/git-stage/amazon-sagemaker-generativeai/3_distributed_training/sm_huggingface_oss_recipes/supervised_finetuning/tmp_cache_local_dataset/Josephgflowers--Finance-Instruct-500k.jsonl to > s3://sagemaker-us-east-1-811828458885/dataset/Josephgflowers--Finance-Instruct-500k.jsonl


## Fine-Tune LLMs using SageMaker `Estimator`/`ModelTrainer`

In [14]:
import time
from sagemaker.pytorch import PyTorch
from sagemaker.huggingface import HuggingFace
from getpass import getpass
import yaml
from jinja2 import Template

In [15]:
hf_token = getpass("Hugging Face access token: ")  # required to download the gated Llama 3.2 weights

 ········


### Training using `PyTorch` Estimator

The `PyTorch` estimator uses the official PyTorch SageMaker training container to run a custom training script with the Accelerate and DeepSpeed libraries. This option is ideal for users who want full control over the training pipeline.

---
**Observability**: SageMaker AI offers [SageMaker MLflow](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html), a managed MLflow capability for tracking experiments and monitoring the performance of models and AI applications from a single tool.

To track fine-tuning metrics in real time, set `MLFLOW_TRACKING_SERVER_ARN` below to the ARN of your MLflow tracking server.

Alternatively, you can report to **tensorboard** or **wandb**; the cell below falls back to TensorBoard when no MLflow ARN is set.

In [16]:
MLFLOW_TRACKING_SERVER_ARN = None # or "arn:aws:sagemaker:us-west-2:<account-id>:mlflow-tracking-server/<server-name>"

if MLFLOW_TRACKING_SERVER_ARN:
    reports_to = "mlflow"
else:
    reports_to = "tensorboard"

In [17]:
job_name = 'meta-llama--Llama-3.2-3B-Instruct'
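training_instance_type = "local_gpu"  # local mode on this machine's GPUs; use an ml.* GPU instance type with sagemaker.Session() for a managed job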

In [18]:
if MLFLOW_TRACKING_SERVER_ARN:
    training_env = {
        "MLFLOW_EXPERIMENT_NAME": f"exp-{job_name}",
        "MLFLOW_TAGS": '{"source.job": "sm-training-jobs", "source.type": "sft", "source.framework": "pytorch"}',
        "HF_TOKEN": hf_token,
        "MLFLOW_TRACKING_URI": MLFLOW_TRACKING_SERVER_ARN,
    }
else:
    training_env = {
        "HF_TOKEN": hf_token
    }
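
`MLFLOW_TAGS` must hold a JSON object: the Hugging Face `transformers` MLflow integration parses it with `json.loads` (assuming the training script logs through that integration, as reporting to `mlflow` implies). A hand-written JSON string is easy to break with a stray quote, so as an alternative sketch the same environment can be built from a Python dict with `json.dumps`. The helper name `build_training_env` is our own, and the tag values mirror the cell above.

```python
import json
from typing import Optional


def build_training_env(hf_token: str, job_name: str, mlflow_arn: Optional[str] = None) -> dict:
    """Return the container environment; MLflow variables are added only when an ARN is given."""
    env = {"HF_TOKEN": hf_token}
    if mlflow_arn:
        tags = {
            "source.job": "sm-training-jobs",
            "source.type": "sft",
            "source.framework": "pytorch",
        }
        env.update(
            {
                "MLFLOW_EXPERIMENT_NAME": f"exp-{job_name}",
                "MLFLOW_TAGS": json.dumps(tags),  # serialized by json.dumps, so always valid JSON
                "MLFLOW_TRACKING_URI": mlflow_arn,
            }
        )
    return env
```

Usage in this notebook: `training_env = build_training_env(hf_token, job_name, MLFLOW_TRACKING_SERVER_ARN)`.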

In [19]:
pytorch_image_uri = f"763104351884.dkr.ecr.{region}.amazonaws.com/pytorch-training:2.8.0-gpu-py312-cu129-ubuntu22.04-sagemaker"
print(f"Using image: {pytorch_image_uri}")

Using image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.8.0-gpu-py312-cu129-ubuntu22.04-sagemaker


#### Training strategy: `PEFT/QLoRA`
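
LoRA freezes the base weights and learns a low-rank update `B A` for selected projection matrices, so a `d_out x d_in` layer gains only `r * (d_in + d_out)` trainable parameters; QLoRA additionally stores the frozen base weights in 4-bit precision. The sketch below estimates the trainable-parameter count under assumptions of our own: rank `r=16`, adapters on the four attention projections only, and architecture values from the model's `config.json` (hidden size 3072, 8 KV heads of dimension 128, 28 layers). The actual rank and target modules are set in the `Llama-3.2-3B-Instruct--vanilla-peft-qlora.yaml` recipe and may differ.

```python
# Rough count of LoRA trainable parameters for Llama-3.2-3B-Instruct.
# ASSUMPTIONS: rank r=16, adapters on q/k/v/o projections only, and
# architecture values from the model's config.json. Check the recipe YAML
# for the real settings.
HIDDEN = 3072
KV_DIM = 8 * 128  # num_key_value_heads * head_dim (GQA shrinks the k/v projections)
NUM_LAYERS = 28
TOTAL_PARAMS = 3.21e9  # approximate, from the model card


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)


def lora_params_per_model(r: int = 16) -> int:
    # (d_in, d_out) for each adapted projection in one decoder layer
    projections = {
        "q_proj": (HIDDEN, HIDDEN),
        "k_proj": (HIDDEN, KV_DIM),
        "v_proj": (HIDDEN, KV_DIM),
        "o_proj": (HIDDEN, HIDDEN),
    }
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in projections.values())
    return per_layer * NUM_LAYERS


trainable = lora_params_per_model(r=16)
print(f"Trainable LoRA params: {trainable:,} ({trainable / TOTAL_PARAMS:.2%} of the model)")
```

Under these assumptions the adapters add about 9.2 million trainable parameters, well under 1% of the model, which is why PEFT fits on much smaller GPUs than full fine-tuning.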

In [20]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="sm_accelerate_train.sh", # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="sagemaker_code",
    instance_type=training_instance_type,
    instance_count=1,
    base_job_name=f"{job_name}-pytorch",
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=training_env,
    sagemaker_session=sess,
    hyperparameters={
        "config": "hf_recipes/meta-llama/Llama-3.2-3B-Instruct--vanilla-peft-qlora.yaml"
    }
)

# Start training; the uploaded dataset is provided to the container as the `training` input channel
pytorch_estimator.fit(
    {"training": uploaded_s3_uri}, 
    wait=False
)

INFO:sagemaker:Creating training-job with name: meta-llama--Llama-3.2-3B-Instruct-pytor-2025-09-23-03-37-34-719
INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker.local.image:'Docker Compose' found using Docker CLI.
INFO:sagemaker.local.local_session:Starting training job
INFO:sagemaker.local.image:docker compose file: 
networks:
  sagemaker-local:
    name: sagemaker-local
services:
  algo-1-ffzvh:
    command: train
    container_name: njbn96vdmt-algo-1-ffzvh
    deploy:
      resources:
        reservations:
          devices:
          - capabilities:
            - gpu
            count: all
    e

Container njbn96vdmt-algo-1-ffzvh  Creating
Container njbn96vdmt-algo-1-ffzvh  Created
Attaching to njbn96vdmt-algo-1-ffzvh


In [None]:
s3_model_data_uri = pytorch_estimator.model_data
print(f"Fine-tuned model location: {s3_model_data_uri}")

#### Training strategy: `Spectrum`
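
As we understand it, Spectrum ranks layers by a signal-to-noise ratio (SNR) derived from the singular values of their weight matrices, then fine-tunes only the highest-SNR fraction of each module type (for example, the top 25% of `mlp.down_proj` layers) while freezing everything else. The sketch below illustrates only that selection step. The function names `select_spectrum_layers` and `is_trainable`, the parameter-name parsing, and the toy SNR scores are hypothetical; computing SNR itself is out of scope, and the actual layer list comes from the `Llama-3.2-3B-Instruct--vanilla-spectrum.yaml` recipe.

```python
import math
from collections import defaultdict


def select_spectrum_layers(snr_scores: dict[str, float], top_fraction: float = 0.25) -> list[str]:
    """Pick the highest-SNR modules within each module type.

    `snr_scores` maps module names such as "model.layers.3.mlp.down_proj"
    to precomputed SNR values (computing them is not shown here).
    """
    by_type = defaultdict(list)
    for name, snr in snr_scores.items():
        # "model.layers.3.mlp.down_proj" -> module type "mlp.down_proj"
        module_type = name.split(".layers.")[-1].split(".", 1)[-1]
        by_type[module_type].append((snr, name))

    selected = []
    for entries in by_type.values():
        k = max(1, math.ceil(len(entries) * top_fraction))
        entries.sort(reverse=True)  # highest SNR first
        selected.extend(name for _, name in entries[:k])
    return sorted(selected)


def is_trainable(param_name: str, selected: list[str]) -> bool:
    """True if a parameter (e.g. "...down_proj.weight") belongs to a selected module."""
    return any(param_name == prefix or param_name.startswith(prefix + ".") for prefix in selected)


toy_scores = {  # hypothetical SNR values for illustration only
    "model.layers.0.mlp.down_proj": 0.8,
    "model.layers.1.mlp.down_proj": 2.1,
    "model.layers.2.mlp.down_proj": 1.3,
    "model.layers.3.mlp.down_proj": 0.4,
    "model.layers.0.self_attn.q_proj": 1.7,
    "model.layers.1.self_attn.q_proj": 0.9,
    "model.layers.2.self_attn.q_proj": 0.2,
    "model.layers.3.self_attn.q_proj": 1.1,
}
selected = select_spectrum_layers(toy_scores, top_fraction=0.5)
print(selected)
```

With `top_fraction=0.5`, the toy example keeps two of the four layers per module type; in a real training script, parameters for which `is_trainable` returns `False` would have `requires_grad` set to `False`.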

In [22]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="sm_accelerate_train.sh", # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="sagemaker_code",
    instance_type=training_instance_type,
    instance_count=1,
    base_job_name=f"{job_name}-pytorch",
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=training_env,
    sagemaker_session=sess,
    hyperparameters={
        "config": "hf_recipes/meta-llama/Llama-3.2-3B-Instruct--vanilla-spectrum.yaml"
    }
)

# Start training; the uploaded dataset is provided to the container as the `training` input channel
pytorch_estimator.fit(
    {"training": uploaded_s3_uri}, 
    wait=False
)

INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker:Creating training-job with name: meta-llama--Llama-3.2-3B-Instruct-pytor-2025-09-23-03-54-31-380
INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker.local.image:'Dock

Container f6fopzt5f4-algo-1-d1uzf  Creating
Container f6fopzt5f4-algo-1-d1uzf  Created
Attaching to f6fopzt5f4-algo-1-d1uzf


In [None]:
s3_model_data_uri = pytorch_estimator.model_data
print(f"Fine-tuned model location: {s3_model_data_uri}")

#### Training strategy: `Full Fine-Tuning`
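
Full fine-tuning updates all ~3.21B parameters, so weights, gradients, and optimizer state dominate memory. A widely used rule of thumb for mixed-precision training with AdamW is about 16 bytes per parameter (2 for bf16 weights, 2 for bf16 gradients, 4 for an fp32 master copy, and 4 + 4 for Adam's two moment estimates), before counting activations. The sketch below applies that rule; it is an estimate, not a measurement of this recipe. Sharding (as with DeepSpeed ZeRO stage 3) divides this total across GPUs, and CPU offloading moves part of it into host memory, but neither shrinks the total.

```python
# Rule-of-thumb memory estimate for full fine-tuning with mixed-precision AdamW.
# ASSUMPTION: 16 bytes/parameter of model + optimizer state; activations,
# CUDA workspace, and framework overhead are NOT included.
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "Adam first moment (fp32)": 4,
    "Adam second moment (fp32)": 4,
}


def full_finetune_state_bytes(num_params: float) -> float:
    """Total bytes for weights, gradients, and optimizer state."""
    return num_params * sum(BYTES_PER_PARAM.values())


def per_gpu_gib(num_params: float, num_gpus: int) -> float:
    """GiB per GPU if state is sharded evenly (as with ZeRO stage 3), ignoring activations."""
    return full_finetune_state_bytes(num_params) / num_gpus / 2**30


params = 3.21e9
print(f"Total model + optimizer state: {full_finetune_state_bytes(params) / 2**30:.1f} GiB")
for gpus in (1, 4, 8):
    print(f"  sharded over {gpus} GPU(s): {per_gpu_gib(params, gpus):.1f} GiB each")
```

For ~3.21B parameters this comes to roughly 51 GB (about 48 GiB) before activations, several times what the PEFT and Spectrum strategies need.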

In [23]:
pytorch_estimator = PyTorch(
    image_uri=pytorch_image_uri,
    entry_point="sm_accelerate_train.sh", # Adapted bash script to train using accelerate on SageMaker - Multi-GPU
    source_dir="sagemaker_code",
    instance_type=training_instance_type,
    instance_count=1,
    base_job_name=f"{job_name}-pytorch",
    role=role,
    volume_size=300,
    py_version="py312",
    keep_alive_period_in_seconds=3600,
    environment=training_env,
    sagemaker_session=sess,
    hyperparameters={
        "config": "hf_recipes/meta-llama/Llama-3.2-3B-Instruct--vanilla-full.yaml"
    }
)

# Start training; the uploaded dataset is provided to the container as the `training` input channel
pytorch_estimator.fit(
    {"training": uploaded_s3_uri}, 
    wait=False
)

INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker:Creating training-job with name: meta-llama--Llama-3.2-3B-Instruct-pytor-2025-09-23-03-55-23-084
INFO:sagemaker.telemetry.telemetry_logging:SageMaker Python SDK will collect telemetry to help us better understand our user's needs, diagnose issues, and deliver additional features.
To opt out of telemetry, please disable via TelemetryOptOut parameter in SDK defaults config. For more information, refer to https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk.
INFO:sagemaker.local.image:'Dock

Container b12mti3zk8-algo-1-9yklz  Creating
Container b12mti3zk8-algo-1-9yklz  Created
Attaching to b12mti3zk8-algo-1-9yklz
b12mti3zk8-algo-1-9yklz exited with code 137
Aborting on container exit...
Container b12mti3zk8-algo-1-9yklz  Stopping
Container b12mti3zk8-algo-1-9yklz  Stopped
time="2025-09-23T04:00:43Z" level=error msg=137


ERROR:sagemaker:Please check the troubleshooting guide for common errors: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-python-sdk-troubleshooting.html#sagemaker-python-sdk-troubleshooting-create-training-job


In [None]:
s3_model_data_uri = pytorch_estimator.model_data
print(f"Fine-tuned model location: {s3_model_data_uri}")