## Fine Tune for Matt

### Resources
- [HF smol course](https://github.com/huggingface/smol-course)
- [https://www.philschmid.de/fine-tune-llms-in-2025](https://www.philschmid.de/fine-tune-llms-in-2025)
- [https://www.philschmid.de/fine-tune-llms-in-2024-with-trl](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl)

### Deps

In [1]:
pip install transformers==4.48.1 datasets==3.2.0 trl==0.13.0 huggingface_hub==0.27.1 peft==0.14.0

Note: you may need to restart the kernel to use updated packages.


### Setup

In [2]:
from huggingface_hub import login
from notebook_utils import get_hf_token

HF_TOKEN = get_hf_token()
login(HF_TOKEN)

MODELS_OUTPUT_DIR = "./models"
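
`notebook_utils` is a local helper module that is not included in this notebook. A minimal sketch of what `get_hf_token` could look like, assuming the token is stored in an `HF_TOKEN` environment variable (the variable name and error handling are assumptions, not the author's implementation):

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable.

    Hypothetical stand-in for notebook_utils.get_hf_token; the real helper
    may read the token from somewhere else entirely.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running this notebook")
    return token
```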

### Dataset

In [3]:
DATASET_NAME = "matt_shill_demo"
BUILD_DATASET = False

#### Source Data

In [4]:
messages_data_train = [
    {
        "full_topic": "leadership",
        "messages": [
            {"role": "user", "content": "What leadership experience does Matt have?"},
            {
                "role": "assistant",
                "content": "Matt has extensive leadership experience as a Web Platform Team Lead at COMPANY_NAME, overseeing developers, project pipelines, and team development practices.",
            },
        ],
    },
    {
        "full_topic": "leadership",
        "messages": [
            {"role": "user", "content": "How does Matt improve team efficiency?"},
            {
                "role": "assistant",
                "content": "Matt improves team efficiency by implementing internal tools, automating workflows, and optimizing development practices, such as using Docker and GitLab CI.",
            },
        ],
    },
    {
        "full_topic": "background",
        "messages": [
            {"role": "user", "content": "What are some of Matt's technical skills?"},
            {
                "role": "assistant",
                "content": "Matt is skilled in Svelte, Vue.js, Elasticsearch, OpenSearch, Flask, and machine learning tools like Llama.cpp and SciKit-Learn.",
            },
        ],
    },
    {
        "full_topic": "experience",
        "messages": [
            {
                "role": "user",
                "content": "What web development experience does Matt have?",
            },
            {
                "role": "assistant",
                "content": "Matt has over a decade of experience in full-stack web development, including leading teams at COMPANY_NAME and building custom solutions with Drupal, Symfony, and Svelte.",
            },
        ],
    },
    {
        "full_topic": "experience",
        "messages": [
            {
                "role": "user",
                "content": "What is Matt's experience with search platforms?",
            },
            {
                "role": "assistant",
                "content": "Matt has extensive experience with Elasticsearch and OpenSearch, designing search schemas and optimizing search algorithms for better relevance and user experience.",
            },
        ],
    },
    {
        "full_topic": "experience",
        "messages": [
            {"role": "user", "content": "Has Matt worked in DevOps?"},
            {
                "role": "assistant",
                "content": "Yes, Matt has significant DevOps experience, including using GitLab CI, Docker, and AWS to streamline deployment and infrastructure management.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {"role": "user", "content": "What tools has Matt developed?"},
            {
                "role": "assistant",
                "content": "Matt has developed tools like Symphony CLI for environment automation and reporting tools that integrate GitLab, Slack, and ServiceNow APIs.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {
                "role": "user",
                "content": "What is an example of Matt's machine learning work?",
            },
            {
                "role": "assistant",
                "content": "Matt implemented a Retrieval-Augmented Generation (RAG) system using SentenceTransformers and OpenSearch to process and generate content efficiently.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {"role": "user", "content": "What front-end projects has Matt completed?"},
            {
                "role": "assistant",
                "content": "Matt developed front-end interfaces for COMPANY_NAME intranet search using Svelte, improving accessibility and usability for enterprise users.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {
                "role": "user",
                "content": "What is one of Matt's open-source contributions?",
            },
            {
                "role": "assistant",
                "content": "Matt has contributed to open-source projects on GitHub, creating tutorials and tools that help developers integrate machine learning into web applications.",
            },
        ],
    },
    {
        "full_topic": "projects",
        "messages": [
            {
                "role": "user",
                "content": "What kind of projects does Matt work on in his free time?",
            },
            {
                "role": "assistant",
                "content": "In his free time, Matt works on solo projects like APP_NAME, an application for musicians built with Rails and Vue.js, and publishes tutorials for developers on GitHub.",
            },
        ],
    },
    {
        "full_topic": "projects",
        "messages": [
            {
                "role": "user",
                "content": "Can you explain Matt's work with AI-driven applications?",
            },
            {
                "role": "assistant",
                "content": "Matt designed AI-driven applications using Flask and Llama.cpp for summarization and content generation, combined with clustering and embedding techniques.",
            },
        ],
    },
]

messages_data_test = [
    {
        "full_topic": "leadership",
        "messages": [
            {"role": "user", "content": "What leadership roles has Matt taken on?"},
            {
                "role": "assistant",
                "content": "Matt has significant leadership experience, including serving as the Web Platform Team Lead at COMPANY_NAME, managing developers, coordinating projects, and refining team workflows.",
            },
        ],
    },
    {
        "full_topic": "leadership",
        "messages": [
            {"role": "user", "content": "How does Matt enhance team productivity?"},
            {
                "role": "assistant",
                "content": "Matt boosts team productivity by creating internal tools, streamlining workflows, and implementing technologies like Docker and GitLab CI to simplify processes.",
            },
        ],
    },
    {
        "full_topic": "background",
        "messages": [
            {"role": "user", "content": "What technical expertise does Matt have?"},
            {
                "role": "assistant",
                "content": "Matt is experienced in technologies such as Svelte, Vue.js, Elasticsearch, OpenSearch, Flask, and machine learning libraries like SciKit-Learn and Llama.cpp.",
            },
        ],
    },
    {
        "full_topic": "experience",
        "messages": [
            {
                "role": "user",
                "content": "What experience does Matt have in web development?",
            },
            {
                "role": "assistant",
                "content": "Matt has over ten years of full-stack web development experience, including leading teams at COMPANY_NAME and creating custom solutions using frameworks like Drupal, Symfony, and Svelte.",
            },
        ],
    },
    {
        "full_topic": "experience",
        "messages": [
            {
                "role": "user",
                "content": "What has Matt done with search platforms?",
            },
            {
                "role": "assistant",
                "content": "Matt has worked extensively with Elasticsearch and OpenSearch, designing advanced search schemas and fine-tuning algorithms to improve search relevance and user satisfaction.",
            },
        ],
    },
    {
        "full_topic": "experience",
        "messages": [
            {"role": "user", "content": "Does Matt have experience with DevOps?"},
            {
                "role": "assistant",
                "content": "Absolutely. Matt's DevOps experience includes managing deployments and infrastructure with tools like GitLab CI, Docker, and AWS.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {"role": "user", "content": "What are some tools Matt has created?"},
            {
                "role": "assistant",
                "content": "Matt has built tools like Symphony CLI for automating environments and reporting systems that connect GitLab, Slack, and ServiceNow APIs.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {
                "role": "user",
                "content": "Can you give an example of Matt's machine learning projects?",
            },
            {
                "role": "assistant",
                "content": "Matt developed a Retrieval-Augmented Generation (RAG) system that uses SentenceTransformers and OpenSearch for efficient content processing and generation.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {"role": "user", "content": "What front-end work has Matt done?"},
            {
                "role": "assistant",
                "content": "Matt created front-end applications for COMPANY_NAME's intranet search using Svelte, focusing on improving accessibility and user experience.",
            },
        ],
    },
    {
        "full_topic": "achievement",
        "messages": [
            {
                "role": "user",
                "content": "What open-source contributions has Matt made?",
            },
            {
                "role": "assistant",
                "content": "Matt has contributed to various open-source projects on GitHub, including developing tutorials and tools to help developers incorporate machine learning into their applications.",
            },
        ],
    },
    {
        "full_topic": "projects",
        "messages": [
            {
                "role": "user",
                "content": "What personal projects has Matt worked on?",
            },
            {
                "role": "assistant",
                "content": "Matt dedicates his free time to projects like APP_NAME, a Rails and Vue.js app for musicians, and publishes tutorials to guide other developers on GitHub.",
            },
        ],
    },
    {
        "full_topic": "projects",
        "messages": [
            {
                "role": "user",
                "content": "What kind of AI-driven applications has Matt developed?",
            },
            {
                "role": "assistant",
                "content": "Matt has built AI-powered applications using Flask and Llama.cpp for content summarization and generation, leveraging clustering and embedding techniques for better performance.",
            },
        ],
    },
]

messages_dataset = messages_data_train + messages_data_test
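
Each record pairs a `full_topic` label with a `messages` list that alternates `user` and `assistant` turns. A small check like the following can catch malformed records before the dataset is built; `validate_record` is a hypothetical helper, and the sample record is abbreviated from the data above.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one chat record (empty if valid)."""
    problems = []
    if not isinstance(record.get("full_topic"), str):
        problems.append("full_topic must be a string")
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
        return problems
    expected_roles = ("user", "assistant")
    for i, message in enumerate(messages):
        # Turns are expected to alternate, starting with the user
        if message.get("role") != expected_roles[i % 2]:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        if not str(message.get("content", "")).strip():
            problems.append(f"message {i}: empty content")
    return problems


sample = {
    "full_topic": "experience",
    "messages": [
        {"role": "user", "content": "Has Matt worked in DevOps?"},
        {"role": "assistant", "content": "Yes, Matt has significant DevOps experience."},
    ],
}
print(validate_record(sample))  # prints []
```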

In [5]:
messages_dataset

[{'full_topic': 'leadership',
  'messages': [{'role': 'user',
    'content': 'What leadership experience does Matt have?'},
   {'role': 'assistant',
    'content': 'Matt has extensive leadership experience as a Web Platform Team Lead at COMPANY_NAME, overseeing developers, project pipelines, and team development practices.'}]},
 {'full_topic': 'leadership',
  'messages': [{'role': 'user',
    'content': 'How does Matt improve team efficiency?'},
   {'role': 'assistant',
    'content': 'Matt improves team efficiency by implementing internal tools, automating workflows, and optimizing development practices, such as using Docker and GitLab CI.'}]},
 {'full_topic': 'background',
  'messages': [{'role': 'user',
    'content': "What are some of Matt's technical skills?"},
   {'role': 'assistant',
    'content': 'Matt is skilled in Svelte, Vue.js, Elasticsearch, OpenSearch, Flask, and machine learning tools like Llama.cpp and SciKit-Learn.'}]},
 {'full_topic': 'experience',
  'messages': [{'r

#### Dataset Init

In [6]:
from datasets import Dataset, load_dataset

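if BUILD_DATASET:
    # Shuffle the combined examples and split them 50/50 into train and test.
    # The split is random; pass seed=... to make it reproducible.
    ds = Dataset.from_list(messages_dataset).train_test_split(test_size=0.5)
    # Upload to the Hugging Face Hub as a private dataset
    ds.push_to_hub(DATASET_NAME, private=True)
else:
    # Reuse the previously published dataset
    ds = load_dataset(path=f"matthewhaynesonline/{DATASET_NAME}")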

In [7]:
ds

DatasetDict({
    train: Dataset({
        features: ['full_topic', 'messages'],
        num_rows: 12
    })
    test: Dataset({
        features: ['full_topic', 'messages'],
        num_rows: 12
    })
})

### Base Model

In [8]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, setup_chat_format

from notebook_utils import get_torch_device 

device = get_torch_device()

BASE_MODEL_OWNER = "HuggingFaceTB"

# BASE_MODEL_NAME = "SmolLM2-135M"
BASE_MODEL_NAME = "SmolLM2-1.7B"

USE_INSTRUCT_BASE_MODEL = True

if USE_INSTRUCT_BASE_MODEL:
    BASE_MODEL_NAME = f"{BASE_MODEL_NAME}-Instruct"

FULL_BASE_MODEL_NAME = f"{BASE_MODEL_OWNER}/{BASE_MODEL_NAME}"

model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=FULL_BASE_MODEL_NAME
).to(device)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=FULL_BASE_MODEL_NAME)

if not USE_INSTRUCT_BASE_MODEL:
    model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)
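
The instruct variant ships with its own chat template, while `setup_chat_format` configures a ChatML-style one for a base model that lacks it. As a rough illustration of that layout, the sketch below renders messages by hand. It approximates ChatML and is not the tokenizer's actual template (which may, for example, inject a default system prompt); `render_chatml` is a hypothetical helper.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Render chat messages in a ChatML-style layout (illustrative only)."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        text += "<|im_start|>assistant\n"
    return text


example = [{"role": "user", "content": "Has Matt worked in DevOps?"}]
print(render_chatml(example, add_generation_prompt=True))
```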

### Finetune Config

In [9]:
FINETUNE_NAME = f"{BASE_MODEL_NAME}-Matt-Shill"
FINETUNE_TAGS = ["smol", "sft", "matthewhaynes"]
FT_MODEL_DIR = f"{MODELS_OUTPUT_DIR}/{FINETUNE_NAME}"

### Trainer Config

In [10]:
sft_config = SFTConfig(
    output_dir=f"{MODELS_OUTPUT_DIR}/sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=100,  # Frequency of saving model checkpoints
    eval_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    hub_model_id=FINETUNE_NAME,  # Set a unique name for your model
)

trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=ds["train"],
    processing_class=tokenizer,
    eval_dataset=ds["test"],
)
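
With only 12 training rows, `max_steps=1000` translates into many passes over the data. The arithmetic can be sketched as follows, assuming a single device and no gradient accumulation (both are assumptions; `epochs_for_steps` is a hypothetical helper):

```python
import math


def epochs_for_steps(num_rows: int, batch_size: int, max_steps: int) -> float:
    """Approximate number of epochs covered by max_steps optimizer steps."""
    steps_per_epoch = math.ceil(num_rows / batch_size)
    return max_steps / steps_per_epoch


# Values taken from the dataset summary and SFTConfig above
print(round(epochs_for_steps(num_rows=12, batch_size=4, max_steps=1000)))  # prints 333
```

At roughly 333 epochs the model is likely to memorize the training answers, which may be acceptable for a demo but is worth keeping in mind when reading the evaluation output.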

### Train

In [11]:
trainer.train()
trainer.save_model(FT_MODEL_DIR)

### Eval

In [15]:
prompt = "Should I hire Matt?"
messages = [{"role": "user", "content": prompt}]

In [16]:
base_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=FULL_BASE_MODEL_NAME
).to(device)

base_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=FULL_BASE_MODEL_NAME)

if not USE_INSTRUCT_BASE_MODEL:
    base_model, base_tokenizer = setup_chat_format(model=base_model, tokenizer=base_tokenizer)

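# add_generation_prompt=True opens the assistant turn so the model answers directly
formatted_prompt = base_tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)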

inputs = base_tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = base_model.generate(**inputs, max_new_tokens=100)

print("Before training:")
print(base_tokenizer.decode(outputs[0], skip_special_tokens=True))

Before training:
system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
user
Should I hire Matt?
assistant
As an AI, I don't have personal experiences or opinions. However, I can provide you with some general advice based on common practices. Here are a few points to consider:

1. Skills and Experience: Matt's skills and experience in the field you're hiring for should align with what you're looking for in a candidate.

2. Work Ethic: A good candidate should demonstrate a strong work ethic. They should be reliable, punctual, and willing to learn


In [17]:
ft_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=FT_MODEL_DIR
).to(device)

ft_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=FT_MODEL_DIR)

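# add_generation_prompt=True opens the assistant turn so the model answers directly
formatted_prompt = ft_tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)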

inputs = ft_tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = ft_model.generate(**inputs, max_new_tokens=100)

print("After training:")
print(ft_tokenizer.decode(outputs[0], skip_special_tokens=True))

After training:
system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
user
Should I hire Matt?
assistant
Matt is a highly skilled full-stack developer with over ten years of experience, including leading teams at COMPANY_NAME and creating custom solutions using frameworks like Drupal and Symfony. He has a strong focus on improving team efficiency and implementing internal tools.

Matt's work has been recognized with several awards, including being named a Drupal Hero and receiving a nomination for a prestigious open-source contribution. He is also an active contributor to various open-source


### Publish Model

In [None]:
trainer.push_to_hub(tags=FINETUNE_TAGS)

### GGUF & Quantize

- [DIY converting safetensors format to GGUF on a Mac](https://www.reddit.com/r/LocalLLaMA/comments/18elm98/diy_converting_safetensors_format_to_gguf_on_a_mac/)
- [Convert and Upload Your GGUF Model to Hugging Face – Step-by-Step Guide](https://gpustack.ai/convert-and-upload-your-gguf-model-to-huggingface-step-by-step-guide/)

#### Clone and Build `llama.cpp`:

**Install cmake to be able to compile llama.cpp**
```bash
brew install cmake
```

**Compile llama.cpp**
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

**Set up the llama.cpp Python scripts (Python 3.11 was required as of 2025-01-26)**
```bash
python3.11 -m venv llamacpp_env
source llamacpp_env/bin/activate
python3.11 -m pip install -r requirements.txt
```

#### Convert SafeTensors to GGUF:

Use `convert_hf_to_gguf.py` from `llama.cpp` to convert the model:

```bash
python3.11 ./convert_hf_to_gguf.py /path/to/hf_model --outfile /path/to/hf_model.gguf --outtype f16
```
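
As a quick sanity check after conversion, the output file should start with the 4-byte magic `GGUF` followed by a little-endian format version. A minimal sketch, where `read_gguf_version` is a hypothetical helper and not part of `llama.cpp`:

```python
import struct
from pathlib import Path


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise ValueError if the magic is wrong."""
    with Path(path).open("rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Bytes 4..8 hold the format version as an unsigned 32-bit little-endian int
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version("/path/to/hf_model.gguf")` on a valid file returns the version number; a truncated or non-GGUF file raises `ValueError`.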

#### (Optional) Quantize the Model:

```bash
./build/bin/llama-quantize /path/to/hf_model.gguf /path/to/hf_model-q6_k.gguf Q6_K
```
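
The convert and quantize steps can also be scripted from Python. The sketch below only builds the same command lines shown above and hands them to `subprocess.run`; the function names and the default `llama.cpp` working directory are assumptions for illustration.

```python
import subprocess


def build_convert_cmd(model_dir: str, gguf_path: str, outtype: str = "f16") -> list[str]:
    """Command for llama.cpp's convert_hf_to_gguf.py, mirroring the shell example."""
    return [
        "python3.11", "./convert_hf_to_gguf.py", model_dir,
        "--outfile", gguf_path,
        "--outtype", outtype,
    ]


def build_quantize_cmd(gguf_path: str, quantized_path: str, quant_type: str = "Q6_K") -> list[str]:
    """Command for the llama-quantize binary built earlier."""
    return ["./build/bin/llama-quantize", gguf_path, quantized_path, quant_type]


def run_pipeline(model_dir: str, gguf_path: str, quantized_path: str, cwd: str = "llama.cpp") -> None:
    """Run both steps from the llama.cpp checkout; raises if either command fails."""
    for cmd in (
        build_convert_cmd(model_dir, gguf_path),
        build_quantize_cmd(gguf_path, quantized_path),
    ):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Because the commands run with `cwd` set to the checkout, the model and output paths should be absolute.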