# 02. The Draft Model

Our draft model is [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), which is available on Hugging Face as well as in SambaStudio.

The `Llama 3.2` collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. According to the model card, they outperform many of the available open-source and closed chat models on common industry benchmarks.

This notebook covers the following steps:
1. Download the draft model from Hugging Face to your local directory.
2. Configure the draft model checkpoint.
3. Upload the draft model to SambaStudio.

## Outline
- [1. Setup](#1-setup)
    - [1.1 Imports](#11-imports)
    - [1.2 Instantiate the SambaStudio client for BYOC](#12-instantiate-the-sambastudio-client-for-byoc)
- [2. Target model: `Llama-3.2-1B-Instruct`](#2-target-model-llama-32-1b-instruct)
    - [2.1 Download the target model from HuggingFace](#21-download-the-target-model-from-huggingface)
    - [2.2 Configure checkpoint](#22-configure-checkpoint)
    - [2.3 Set and check chat template (optional)](#23-set-and-check-chat-template-optional)
    - [2.4 Set padding token (required for training)](#24-set-padding-token-required-for-training)
    - [2.5 Get model params and SambaStudio suitable apps](#25-get-model-params-and-sambastudio-suitable-apps)
- [3. Upload the checkpoint to SambaStudio](#3-upload-the-checkpoint-to-sambastudio)
    - [3.1 Using the checkpoint dictionary](#31-using-the-checkpoint-dictionary)
    - [3.2 Using the `draft_checkpoint_config.yaml`](#32-using-the-target_checkpoint_configyaml)

## 1. Setup

### 1.1 Imports

In [None]:
# Import libraries
import os
import sys
import re
import json
from pprint import pprint
import time
import yaml
from huggingface_hub import hf_hub_download, HfApi

# Get absolute paths for kit_dir and repo_dir
current_dir = os.getcwd()
kit_dir = os.path.abspath(os.path.join(current_dir, '..'))
repo_dir = os.path.abspath(os.path.join(kit_dir, '..'))

# Adding directories to the Python module search path
sys.path.append(repo_dir)

# Bring Your Own Checkpoint (BYOC)
from utils.byoc.src.snsdk_byoc_wrapper import BYOC

### 1.2 Instantiate the SambaStudio client for BYOC

In [None]:
# Instantiate the BYOC (Bring Your Own Checkpoint) SambaStudio client
byoc = BYOC()

In [None]:
# Load the draft model config
config_draft_yaml = '02_config_draft.yaml'

# Open and load the YAML file into a dictionary
with open(config_draft_yaml, 'r') as file:
    config_draft = yaml.safe_load(file)['model']

pprint(config_draft)
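
The rest of this notebook reads five keys from `config_draft`: `hf_name`, `model_name`, `publisher`, `description`, and `param_count`. A minimal sketch of a check that fails early when one of them is missing could look like this; the helper name `missing_config_keys` and the example values are illustrative assumptions, not part of the kit:

```python
# Keys this notebook reads from the `model` section of the YAML config
REQUIRED_KEYS = ('hf_name', 'model_name', 'publisher', 'description', 'param_count')


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `config` (hypothetical helper)."""
    return [key for key in required if config.get(key) in (None, '')]


# Illustrative values only; the real ones come from 02_config_draft.yaml
example_config = {
    'hf_name': 'meta-llama/Llama-3.2-1B-Instruct',
    'model_name': 'my-draft-model',
    'publisher': 'meta-llama',
    'description': 'Example draft checkpoint',
    'param_count': 1,
}

missing = missing_config_keys(example_config)
if missing:
    raise KeyError(f'Missing keys in draft config: {missing}')
```

In the notebook itself, the same check would be applied to `config_draft` right after loading it.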

## 2. Target model: `Llama-3.2-1B-Instruct`

### 2.1 Download the target model from HuggingFace
You can use your own fine-tuned model or download a [Hugging Face model checkpoint](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In this example we use [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct), which is available on Hugging Face as well as in SambaStudio.


In [None]:
# Hugging Face model name
hf_model = config_draft['hf_name']
# Our local target directory
target_dir = os.path.join(kit_dir, 'data', 'models') 

In [None]:
# Create target dir if it does not exist
os.makedirs(target_dir, exist_ok=True)

# Download checkpoint to your target directory
# (the meta-llama repositories are gated: accept the license on the model page
# and authenticate with Hugging Face before running this cell)
repo_files = HfApi().list_repo_files(hf_model)
for file_name in repo_files:
    hf_hub_download(repo_id=hf_model, filename=file_name, cache_dir=target_dir)

In [None]:
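# Find the snapshot folder inside your target directory
checkpoint_folder = None
for root, dirs, files in os.walk(target_dir):
    if "snapshots" in root and hf_model.replace("/", "--") in root and dirs:
        checkpoint_folder = os.path.join(root, dirs[0])
        break

# Fail with a clear message instead of a NameError if nothing was downloaded
if checkpoint_folder is None:
    raise FileNotFoundError(f"No snapshot folder found for {hf_model} in {target_dir}")

print("Checkpoint folder: ", checkpoint_folder)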
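
When `cache_dir` is passed, `hf_hub_download` stores files in the Hugging Face cache layout, `models--<org>--<name>/snapshots/<commit_hash>/`, which is why the cell above looks for a `snapshots` folder. A minimal sketch of the same lookup without walking the whole tree, assuming that layout; the function name `find_snapshot_dir` is hypothetical:

```python
from pathlib import Path


def find_snapshot_dir(cache_dir, repo_id):
    """Return the first snapshot directory for `repo_id` under `cache_dir`.

    Assumes the Hugging Face cache layout:
    <cache_dir>/models--<org>--<name>/snapshots/<commit_hash>/
    """
    snapshots = Path(cache_dir) / ('models--' + repo_id.replace('/', '--')) / 'snapshots'
    if not snapshots.is_dir():
        raise FileNotFoundError(f'No snapshots folder at {snapshots}')
    # Sort so the result is deterministic if several snapshots exist
    candidates = sorted(p for p in snapshots.iterdir() if p.is_dir())
    if not candidates:
        raise FileNotFoundError(f'No snapshot found in {snapshots}')
    return candidates[0]
```

In this notebook it would be called as `str(find_snapshot_dir(target_dir, hf_model))`.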

### 2.2 Configure checkpoint

To upload an existing checkpoint, collect its parameters in a checkpoint dictionary.

In [None]:
# Initialise the checkpoint dictionary
checkpoint = {
    'model_name': config_draft['model_name'],
    'publisher': config_draft['publisher'],
    'description': config_draft['description'],
    'param_count': config_draft['param_count'],
    'checkpoint_path': checkpoint_folder
}

### 2.3 Set and check chat template (optional) 

If you want to use chat templates (role-based message structures), add a chat template to the tokenizer config or update the existing one.

The template must be a Jinja2 string, as in the following `llama` example.

In [None]:
# Jinja chat template
jinja_chat_template = """ 
{% for message in messages %}
    {% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>' + '\n' + message['content'] | trim + '<|eot_id|>'+'\n' %}
    {% if loop.index0 == 0 %}{% set content = bos_token + content %}
    {% endif %}
    {{content}}
{% endfor %}
{{'<|start_header_id|>assistant<|end_header_id|>'+'\n'}}
"""
# Remove the layout newlines and indentation (newlines inside quoted strings are kept)
jinja_chat_template = re.sub(r"(?<!')\n(?!')", "", jinja_chat_template).strip().replace('  ','')

In [None]:
# Open and update your tokenizer config from the checkpoint path
with open(os.path.join(checkpoint['checkpoint_path'], 'tokenizer_config.json'), 'r+') as file:
    data = json.load(file)
    data['chat_template'] = jinja_chat_template
    file.seek(0)
    file.truncate()
    json.dump(data, file, indent=4)

In [None]:
# Render template when using a roles / chat structure
test_messages = [
    {"role": "system", "content": "This is a system prompt."},
    {"role": "user", "content": "This is a user prompt."},
    {"role": "assistant", "content": "This is a response from the assistant."},
    {"role": "user", "content": "This is a user follow-up."}
]
byoc.check_chat_templates(test_messages, checkpoint_paths=checkpoint['checkpoint_path'])
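
To make the template easier to verify, here is a plain-Python sketch of the string it is meant to produce: a header, the trimmed content, and an end-of-turn marker per message, with the BOS token before the first message and an open assistant header at the end. The function name `render_llama_chat` is hypothetical, and the default `bos_token` is the Llama 3 begin-of-text token, which is an assumption here because the real value comes from the tokenizer config:

```python
def render_llama_chat(messages, bos_token='<|begin_of_text|>'):
    """Build the prompt string the Jinja chat template is meant to produce."""
    parts = []
    for index, message in enumerate(messages):
        content = (
            '<|start_header_id|>' + message['role'] + '<|end_header_id|>' + '\n'
            + message['content'].strip() + '<|eot_id|>' + '\n'
        )
        if index == 0:
            # The BOS token is prepended to the first message only
            content = bos_token + content
        parts.append(content)
    # Generation prompt: an open assistant header for the model to complete
    parts.append('<|start_header_id|>assistant<|end_header_id|>' + '\n')
    return ''.join(parts)


example_messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is a user prompt.'},
]
print(render_llama_chat(example_messages))
```

Comparing this output with the result of `check_chat_templates` is a quick way to spot stray whitespace or a missing token in the template.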

### 2.4 Set padding token (required for training)

If `pad_token_id` is not yet set in your checkpoint configuration and you plan to fine-tune the checkpoint further, set `pad_token_id` in the checkpoint's `config.json`.

In [None]:
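# Add pad_token_id to the checkpoint config.
# Note: None is written as JSON null; replace it with your tokenizer's
# padding token ID if you plan to fine-tune this checkpoint.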
with open(os.path.join(checkpoint['checkpoint_path'], 'config.json'), 'r+') as file:
    data = json.load(file)
    data['pad_token_id'] = None
    file.seek(0)
    file.truncate()
    json.dump(data, file, indent=4)
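
The cell above overwrites `pad_token_id` unconditionally. A minimal sketch of a more defensive update that leaves an existing value alone is shown below; the helper name `set_pad_token_id` is hypothetical, and the ID you pass in should be the padding token ID of your own tokenizer:

```python
import json
import os


def set_pad_token_id(checkpoint_path, pad_token_id):
    """Set `pad_token_id` in config.json only if it is missing or null.

    Returns the value stored in the file after the call.
    """
    config_path = os.path.join(checkpoint_path, 'config.json')
    with open(config_path, 'r') as file:
        data = json.load(file)

    if data.get('pad_token_id') is None:
        data['pad_token_id'] = pad_token_id
        # Rewrite the whole file so no stale bytes are left behind
        with open(config_path, 'w') as file:
            json.dump(data, file, indent=4)

    return data['pad_token_id']
```

Opening the file a second time in write mode avoids the `seek`/`truncate` pair, at the cost of one extra `open` call.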

### 2.5 Get model params and SambaStudio suitable apps

Extra parameters are required to upload your checkpoint, including model architecture, sequence length, and vocabulary size.

These parameters can be extracted from your checkpoint configuration, and included in your checkpoint dictionary parameters.


In [None]:
# Extract all the checkpoint parameters
checkpoint_config_params = byoc.find_config_params(checkpoint_paths=checkpoint['checkpoint_path'])[0]
# Update your checkpoint dictionary
checkpoint.update(checkpoint_config_params)
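
For Hugging Face Llama checkpoints, these values are stored in `config.json` under `model_type`, `max_position_embeddings`, and `vocab_size`. A rough sketch of reading them directly, which can help sanity-check the extracted parameters; the mapping onto the `model_arch`, `seq_length`, and `vocab_size` keys is an assumption about how `byoc.find_config_params` derives them, and `read_config_params` is a hypothetical helper:

```python
import json
import os


def read_config_params(checkpoint_path):
    """Read architecture, sequence length and vocabulary size from config.json."""
    with open(os.path.join(checkpoint_path, 'config.json'), 'r') as file:
        config = json.load(file)
    return {
        'model_arch': config['model_type'],               # assumed mapping
        'seq_length': config['max_position_embeddings'],  # assumed mapping
        'vocab_size': config['vocab_size'],
    }
```

If the values differ from those in `checkpoint_config_params`, trust the wrapper's output, since it is what the upload call expects.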

To upload a model checkpoint, you need to assign it a SambaStudio app.

You can search for suitable apps using the checkpoint parameters and then select the best match.

In [None]:
# Look for suitable apps in SambaStudio
suitable_apps = byoc.get_suitable_apps(checkpoint)

In [None]:
# We have one suitable app
checkpoint["app_id"] = suitable_apps[0][0]['id']
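
The indexing `suitable_apps[0][0]` suggests one list of candidate apps per checkpoint. A small sketch that fails with a clear message when no app matches, assuming that structure; `first_app_id` is a hypothetical helper and the example data is made up:

```python
def first_app_id(suitable_apps):
    """Return the `id` of the first app found for the first checkpoint.

    Assumes `suitable_apps` is a list with one entry per checkpoint, each
    entry being a list of app dictionaries that carry an `id` key.
    """
    if not suitable_apps or not suitable_apps[0]:
        raise ValueError('No suitable SambaStudio app found for this checkpoint')
    return suitable_apps[0][0]['id']


# Illustrative data only; real entries come from byoc.get_suitable_apps(checkpoint)
example_apps = [[{'id': 'app-123', 'name': 'example-app'}]]
print(first_app_id(example_apps))
```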

In [None]:
# We can see here all the parameters required to upload the checkpoint
print(checkpoint)

## 3. Upload the checkpoint to SambaStudio

### 3.1 Using the checkpoint dictionary

In [None]:
# Call the client's `upload_checkpoint` method with your checkpoint parameters (this can take a while)
model_id = byoc.upload_checkpoint(
    model_name=checkpoint['model_name'],
    checkpoint_path=checkpoint['checkpoint_path'],
    description=checkpoint['description'],
    publisher=checkpoint['publisher'],
    param_count=checkpoint['param_count'],
    model_arch=checkpoint['model_arch'],
    seq_length=checkpoint['seq_length'],
    vocab_size=checkpoint['vocab_size'],
    app_id=checkpoint['app_id'],
    retries=3
)

In [None]:
# Check the status of the uploaded checkpoint 
byoc.get_checkpoints_status(model_id)
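
Because uploads can take a while, a single status check may not be enough. The following is a generic polling sketch; `fetch_status` and `is_done` are caller-supplied callables because the shape of the value returned by `get_checkpoints_status` is not shown in this notebook, and the helper name `wait_until` is hypothetical:

```python
import time


def wait_until(fetch_status, is_done, interval=30.0, timeout=3600.0):
    """Call `fetch_status()` every `interval` seconds until `is_done(status)` is true.

    Returns the last status; raises TimeoutError once the next wait would
    exceed `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f'Status not final after {timeout} seconds: {status!r}')
        time.sleep(interval)
```

It could be used as `wait_until(lambda: byoc.get_checkpoints_status(model_id), is_done=...)`, with `is_done` written against the status structure you actually observe.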