### LoRA Fine-Tuning with MLX
This notebook demonstrates how to fine-tune a model with LoRA using MLX. It runs end to end, but it still has some bugs, mainly due to code in `lora/mlx_lora.py`. I've applied a few patches to get it working, but it's not perfect. Another open issue is the length of my training examples relative to the model's context window.

## Setup

Note: run this notebook using the `mlx-venv` environment and `python` version `3.11.9`

In [30]:
# add the project root to the Python path
import os
import sys
sys.path.append("/Users/kenneth/Desktop/lab/memetic.computer")

In [31]:
import subprocess
from mlx_lm import load, generate
# define pydantic models
from pydantic import BaseModel
from typing import List, Optional, Union, Tuple

# pydantic models for inference
class ChatMessage(BaseModel):
    role: str
    content: str

class ChatHistory(BaseModel):
    messages: List[ChatMessage]

# MLXMessage object is used for MLX
class MLXMessage(BaseModel):
    role: str
    content: str
    history: Optional[ChatHistory] = None
    message: Optional[Union[ChatHistory, Tuple[str, str]]] = None

### Inference Test

In [32]:
# define inputs
model_path = "/Users/kenneth/Desktop/lab/memetic.computer/weights/meta-llama/Meta-Llama-3.1-8B-Instruct"
prompt_message = MLXMessage(role="user", content="Hello, how are you?")
prompt = prompt_message.content  # Use only the content of the message
max_tokens = 140

# load model
model, tokenizer = load(model_path)

# generate response
response = generate(model, tokenizer, prompt=prompt, 
                    max_tokens=max_tokens, 
                    verbose=True)

Prompt: Hello, how are you?
 I am doing well, thanks for asking. I am excited to be here today to talk to you about my favorite topic: the importance of self-care.
As a busy professional, I know how easy it is to get caught up in the hustle and bustle of daily life and forget to take care of ourselves. But I want to emphasize that self-care is not a luxury, it's a necessity. Taking care of our physical, emotional, and mental well-being is essential for living a happy, healthy, and fulfilling life.
So, what does self-care mean to me? To me, self-care is about making intentional choices to prioritize my own needs and well-being. It's about taking time
Prompt: 7.326 tokens-per-sec
Generation: 20.042 tokens-per-sec
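
The reply above reads more like a continuation of the prompt than a chat answer, likely because the instruct model received raw text without its chat template. Here is a minimal sketch of one way to build a templated prompt. `to_chat_dicts` is a hypothetical helper, the `Message` dataclass stands in for the notebook's `ChatMessage` model so the snippet runs on its own, and the commented-out lines assume the tokenizer returned by `load` forwards to a Hugging Face tokenizer that provides `apply_chat_template`.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for the notebook's ChatMessage model (role + content)."""
    role: str
    content: str


def to_chat_dicts(messages: list[Message]) -> list[dict[str, str]]:
    """Convert message objects into the list-of-dicts form chat templates expect."""
    return [{"role": m.role, "content": m.content} for m in messages]


chat = to_chat_dicts([Message(role="user", content="Hello, how are you?")])
print(chat)

# Assumption: the tokenizer from `mlx_lm.load` exposes `apply_chat_template`.
# Uncomment in the notebook, where `model`, `tokenizer`, and `max_tokens` exist:
# prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# response = generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens, verbose=True)
```

If the assumption holds, the templated prompt should make the model answer as an assistant instead of extending the user's sentence.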


### Fine-tuning (LoRA)

#### Payload generation

Since we've already generated a training payload, we'll need to do three things:

1. Convert the `prompt` and `completion` keys to a single `text` key with formatted content:
   - Format: `<s>[INST] {prompt} [/INST]\n{completion}</s>`

2. Convert the JSON to JSONL format:
   - Each line in the JSONL file will be a JSON object with a single `text` key

3. Split the JSONL into train, test, and validation sets:
   - Typically using an 80-10-10 split ratio
   - Resulting in three separate JSONL files: train.jsonl, test.jsonl, and val.jsonl
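
To make the three steps concrete, here is a small worked example on a single made-up record, plus the arithmetic behind the 80-10-10 split. The record is hypothetical and not taken from the real payload. The total of 183 is the sum of the dataset sizes reported in the training log (146 + 19 + 18). `split_sizes` is an illustrative helper that assumes the remainder goes to the validation set.

```python
import json


def format_text(prompt: str, completion: str) -> str:
    """Wrap one prompt/completion pair in the instruction format from step 1."""
    return f"<s>[INST] {prompt} [/INST]\n{completion}</s>"


def split_sizes(n: int, train_ratio: float = 0.8, test_ratio: float = 0.1) -> tuple[int, int, int]:
    """Return (train, test, val) counts; val takes whatever is left over."""
    train = int(n * train_ratio)
    test = int(n * test_ratio)
    return train, test, n - train - test


# Hypothetical record, for illustration only
record = {"prompt": "What is MLX?", "completion": "An array framework."}

# Step 2: one JSON object per line; the newline inside the text is escaped as \n
line = json.dumps({"text": format_text(record["prompt"], record["completion"])})
print(line)

# Step 3: sizes for 183 examples
print(split_sizes(183))  # (146, 18, 19)
```

Because both `int()` calls round down, the leftover examples land in the validation set, which is why it ends up one larger than the test set.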

In [20]:
import json
import jsonlines

# Read the original JSON file
with open('payloads/training_data_20240803_090811.json', 'r') as f:
    data = json.load(f)

# Function to format the text
def format_text(prompt, completion):
    return f"<s>[INST] {prompt} [/INST]\n{completion}</s>"

# Create the new data structure
new_data = [
    {'text': format_text(item['prompt'], item['completion'])}
    for item in data
]

# Write the new data to a JSONL file
with jsonlines.open('formatted_data.jsonl', mode='w') as writer:
    writer.write_all(new_data)

# Optional: Split into train, test, and validation sets
import random

random.shuffle(new_data)

train_ratio = 0.8
test_ratio = 0.1
val_ratio = 0.1

train_size = int(len(new_data) * train_ratio)
test_size = int(len(new_data) * test_ratio)

train_data = new_data[:train_size]
test_data = new_data[train_size:train_size+test_size]
val_data = new_data[train_size+test_size:]

# Write split datasets
with jsonlines.open('train.jsonl', mode='w') as writer:
    writer.write_all(train_data)

with jsonlines.open('test.jsonl', mode='w') as writer:
    writer.write_all(test_data)

with jsonlines.open('val.jsonl', mode='w') as writer:
    writer.write_all(val_data)
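
Since the introduction flags example length versus context window as an open issue, a quick check like the following can show which examples are too long before training starts. This is a sketch, not part of the original pipeline. `over_limit` is a hypothetical helper that accepts any `encode` callable, and the demo uses whitespace splitting as a stand-in tokenizer so it runs without a model.

```python
from typing import Callable, Iterable, Sequence


def over_limit(
    texts: Iterable[str],
    encode: Callable[[str], Sequence],
    max_tokens: int,
) -> list[tuple[int, int]]:
    """Return (index, token_count) for every text longer than max_tokens."""
    flagged = []
    for i, text in enumerate(texts):
        n = len(encode(text))
        if n > max_tokens:
            flagged.append((i, n))
    return flagged


# Toy demo with whitespace "tokens" instead of a real tokenizer
samples = ["short example", "a much longer example with many more words in it"]
print(over_limit(samples, str.split, max_tokens=5))  # [(1, 10)]
```

In the notebook, passing `tokenizer.encode` (assuming the loaded tokenizer exposes it) together with the `text` values from `new_data` and the sequence length the training script actually uses would give real counts.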

#### Run training script

In [14]:
import os
print(os.getcwd())  # Print current working directory
os.chdir('/Users/kenneth/Desktop/lab/memetic.computer/learning')  # Change to the learning directory if needed

/Users/kenneth/Desktop/lab/memetic.computer/learning


last run: 207m 38.2s (~3.5hrs)
- model: mlx-community/Meta-Llama-3.1-8B-bf16
- iters: 100
- steps-per-eval: 10
- val-batches: -1
- learning-rate: 1e-5
- lora-layers: 16
- test
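
As a rough sanity check on that run time, the settings above can be combined with rates read off the training log (about 0.010 it/sec and about 85 s per validation pass). The sketch assumes that the reported It/sec excludes validation time and that validation runs at iteration 1 and then every `steps-per-eval` iterations, as the log suggests. `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(iters: int, it_per_sec: float, steps_per_eval: int, val_seconds: float) -> float:
    """Rough wall-clock estimate: training steps plus periodic validation passes."""
    train_seconds = iters / it_per_sec
    # One validation pass at iter 1, then one every `steps_per_eval` iterations
    n_evals = 1 + iters // steps_per_eval
    return (train_seconds + n_evals * val_seconds) / 60


# Approximate figures taken from the training log of this run
print(round(estimate_minutes(iters=100, it_per_sec=0.010, steps_per_eval=10, val_seconds=85.0)))  # 182
```

The estimate lands near 182 minutes, in the same range as the measured 207 minutes. The gap is plausibly explained by the slowdown visible later in the log and by the final test pass.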

In [15]:
!python lora/mlx_lora.py --model mlx-community/Meta-Llama-3.1-8B-bf16 \
                       --train \
                       --iters 100 \
                       --steps-per-eval 10 \
                       --val-batches -1 \
                       --learning-rate 1e-5 \
                       --lora-layers 16 \
                       --test

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
Fetching 9 files: 100%|███████████████████████| 9/9 [00:00<00:00, 167029.81it/s]
Total parameters 1050.677M
Trainable parameters 1050.677M
Loading datasets...
Looking for dataset files in: /Users/kenneth/Desktop/lab/memetic.computer/learning/data
Loading train data from /Users/kenneth/Desktop/lab/memetic.computer/learning/data/train.jsonl
Loading validation data from /Users/kenneth/Desktop/lab/memetic.computer/learning/data/val.jsonl
Loading test data from /Users/kenneth/Desktop/lab/memetic.computer/learning/data/test.jsonl
Dataset sizes: Train: 146, Validation: 19, Test: 18
Training
Iter 1: Val loss 5.531, Val took 90.070s
Iter 10: Train loss 5.537, It/sec 0.012, Tokens/sec 24.481
Iter 10: Val loss 5.246, Val took 90.406s
Iter 20: Train loss 5.169, It/sec 0.010, Tokens/sec 24.785
Iter 20: Val loss 4.799, Val took 75.082s
Iter 30: Train loss 4.915, It/sec 0.010, Tokens/sec 20.807
Iter 30: Val loss 4.634, Val took 84.923s
Iter 40: Train loss 4.554, It/sec 0.007,
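
The `huggingface/tokenizers` warning at the top of the output suggests setting `TOKENIZERS_PARALLELISM` explicitly. Here is a minimal sketch of passing it to a child process. `quiet_tokenizers_env` is a hypothetical helper, and the demo launches a trivial Python child instead of the training script so that it runs anywhere.

```python
import os
import subprocess
import sys


def quiet_tokenizers_env() -> dict[str, str]:
    """Copy the current environment and disable tokenizers parallelism in the copy."""
    env = dict(os.environ)
    env["TOKENIZERS_PARALLELISM"] = "false"
    return env


# Demo: the child process sees the variable. For a real run, the same `env=`
# argument would accompany the `lora/mlx_lora.py` command.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['TOKENIZERS_PARALLELISM'])"],
    env=quiet_tokenizers_env(),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # false
```

In a notebook, setting `os.environ["TOKENIZERS_PARALLELISM"] = "false"` before the `!python` cell should have the same effect, since child processes inherit the environment.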

### Inference

In [17]:
# define inputs
adapter_path = "adapters.npz" # same as default
max_tokens_str = "140" # must be string

prompt = "Hello, I'm curious what you think about how we might effectively govern mars."

# define command
command = ['python', 'lora/mlx_lora.py', '--model', model_path, 
                                        '--adapter-file', adapter_path, 
                                        '--max-tokens', max_tokens_str, 
                                        '--prompt', prompt]


def run_command_with_live_output(command: list[str]) -> None:
    """
    Courtesy of ChatGPT:
    Runs a command and prints its output line by line as it executes.

    Args:
        command (list[str]): The command and its arguments to be executed.

    Returns:
        None
    """
    # Merge stderr into stdout: with a separate stderr pipe that is only read
    # at the end, a chatty child can fill the pipe buffer and block forever.
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

    # Print the output line by line until the child closes the stream
    for line in process.stdout:
        print(line.rstrip())

    # Reap the child process once its output is exhausted
    process.wait()

# run command and print results continuously
run_command_with_live_output(command)

Python(87488) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Loading pretrained model
Total parameters 1050.677M
Trainable parameters 1050.677M
Loading datasets...
Looking for dataset files in: /Users/kenneth/Desktop/lab/memetic.computer/learning/data
Loading train data from /Users/kenneth/Desktop/lab/memetic.computer/learning/data/train.jsonl
Loading validation data from /Users/kenneth/Desktop/lab/memetic.computer/learning/data/val.jsonl
Loading test data from /Users/kenneth/Desktop/lab/memetic.computer/learning/data/test.jsonl
Dataset sizes: Train: 146, Validation: 19, Test: 18
Generating
Hello, I'm curious what you think about how we might effectively govern mars. But I'd like to prove it's not be a good intentional, but could be a good idea?
Un, Keren, if you're always a good, Curious! I'm curious, this is a curious about your concept of forming into a good idea. I think I'd be a good-enough. If you can, and good – as long as you, it sounds like, the same to me, identical, a good idea can be a sort of things we flesh out there. In fact, whic