<a href="https://colab.research.google.com/github/pavlikbond/domain-name-generator/blob/main/domain_variant_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Completion finetuning using unsloth

This notebook makes use of unsloth to finetune a model for a completion task.
In this example we will finetune the llama 3.2 base model to generate ascii art. I would recommend using the unsloth library compared to just using the huggingface library as it requires less memory and is faster.

Adapted from unsloth notebooks, if something is broken check on:
https://unsloth.ai/

In [1]:
%%capture
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3  peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

### Load base model

In [2]:
from unsloth import FastLanguageModel
import torch
from google.colab import userdata


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-3B",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
    token=userdata.get('HF_TOKEN')
)

ðŸ¦¥ Unsloth: Will patch your computer to enable 2x faster free finetuning.
ðŸ¦¥ Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.8: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/2.35G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/230 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/459 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

In [3]:
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.1f} MB")

Memory footprint: 2327.7 MB


### Add lora to base model and patch with Unsloth

In [4]:
# More info about parameters: https://huggingface.co/docs/peft/v0.11.0/en/package_reference/lora#peft.LoraConfig
target_modules =  ["q_proj", "k_proj", "v_proj", "o_proj"]
                   #,"gate_proj", "up_proj", "down_proj"]

# When adding special tokens
train_embeddings = False

if train_embeddings:
  target_modules = target_modules + ["lm_head"]

model = FastLanguageModel.get_peft_model(
    model,
    r = 4, # rank of lora matrices according to paper not much loss when set relatively low
    target_modules = target_modules,  # On which modules of the llm the lora weights are used
    lora_alpha = 8, # scales the weights of the adapters (more influence on base model), 16 was recommended on reddit
    lora_dropout = 0.05, # Default on 0.05 in tutorial but unsloth says 0 is better
    bias = "none",    # "none" is optimized
    use_gradient_checkpointing = "unsloth", #"unsloth" for very long context, decreases vram
    random_state = 3407,
    use_rslora = False,  # scales lora_alpha with 1/sqrt(r), huggingface says this works better
    loftq_config = None, # And LoftQ
)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.6.8 patched 28 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


In [25]:
import pandas as pd
from datasets import Dataset
LOCAL_TRAINING_CSV = "training_data.csv"

def format_llama_prompt(row, for_training=True):
  instructions =f"Generate 3-5 creative and memorable domains {row['business_description']}"
  output = row.get('output', '') # Use .get() to safely handle missing 'output' for inference

  prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n" \
            f"{instructions}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" \
            f"### Output: "

  if for_training:
      # For training, include the actual output
      prompt += f"{output}<|eot_id|>"
  else:
      # For inference, just provide the prompt for the model to complete
      # We don't add <|eot_id|> here as the model is expected to generate the completion
      pass

  return prompt


print(f"Loading local training data from: {LOCAL_TRAINING_CSV}")
try:
    df = pd.read_csv(LOCAL_TRAINING_CSV)
    testing_df = df.sample(frac=0.1, random_state=42)  # 20% for testing
    training_df = df.drop(testing_df.index)  # Remaining 80% for training

    df['text'] = df.apply(format_llama_prompt, axis=1)
    training_dataset = Dataset.from_pandas(df[['text']]) # Create Hugging Face Dataset from DataFrame

    # testing_df['text'] = testing_df.apply(lambda row: format_llama_prompt(row, for_training=False), axis=1)
    # testing_dataset = Dataset.from_pandas(testing_df[['text']])
    print("\nExample formatted text for training:")
    print(training_dataset[0]['text'])

except FileNotFoundError:
    print(f"Error: {LOCAL_TRAINING_CSV} not found. Please ensure the file is in the same directory as main.py.")
    exit()

Loading local training data from: training_data.csv

Example formatted text for training:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Generate 3-5 creative and memorable domains an artisanal coffee roastery focusing on single-origin beans.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

### Output: 1. PureOriginRoast.com 2. BeanCrafted.co 3. SummitBrewers.net 4. EchoingBeans.coffee 5. SingleSourceSip.store<|eot_id|>


In [7]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = training_dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4, # process 4 batches before updating parameters (parameter update == step)
        num_train_epochs = 2, # between 1 - 3 to prevent overfitting
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none"
    ),
)

Unsloth: Tokenizing ["text"]:   0%|          | 0/142 [00:00<?, ? examples/s]

In [8]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 142 | Num Epochs = 2 | Total steps = 36
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 2,293,760/3,000,000,000 (0.08% trained)


Step,Training Loss
1,5.2243
2,5.1697
3,5.0955
4,4.6938
5,4.789
6,4.6799
7,4.477
8,4.3479
9,4.2529
10,4.1462


### inference

In [26]:
from transformers import TextStreamer
import re
from evaluate_response import DomainResponseEvaluator
def extract_domains(text_output):
    # Find the index of "Output:"
    output_start_index = text_output.find("Output:")

    # If "Output:" is found, take the substring after it; otherwise, use the whole text
    if output_start_index != -1:
        text_to_process = text_output[output_start_index + len("Output:"):]
    else:
        text_to_process = text_output # If "Output:" isn't found, process the entire text
    domain_pattern = re.compile(r'\d+\.\s*([a-zA-Z0-9-]+\.[a-zA-Z]{2,})')

    # Find all matches in the processed text
    domains = domain_pattern.findall(text_to_process)

    return domains or 'No domains were found'

def get_model_prediction(test_row):
    # Format the prompt for inference
    prompt = format_llama_prompt(test_row, for_training=False)

    # Tokenize input
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    # Generate response
    outputs = model.generate(**inputs, max_new_tokens=50, num_return_sequences=1)

    # Decode response
    response = tokenizer.decode(outputs[0])

    # Extract domains
    domains = extract_domains(response)

    return response, domains

evaluator = DomainResponseEvaluator()
total_confidence = 0.0
successful_evaluations = 0

for row in testing_df.iterrows():
  response, domains = get_model_prediction(row[1])
  print(f"Domains: {domains}")
  if domains == 'No domains were found':
    continue
  evaluation_results = evaluator.evaluate_domains(row[1].get('business_description'), domains)
  if evaluation_results:
      # Calculate average confidence for this test case
      case_confidence = sum(result.get('confidence', 0) for result in evaluation_results) / len(evaluation_results)
      total_confidence += case_confidence
      successful_evaluations += 1

      print(f"  Average confidence: {case_confidence:.3f}")

      # Show top domain for this case
      best_domain = max(evaluation_results, key=lambda x: x.get('confidence', 0))
      print(f"  Best domain: {best_domain['domain']} (confidence: {best_domain.get('confidence', 0):.3f})")
  else:
      print("  Evaluation failed")

if successful_evaluations > 0:
    average_confidence = total_confidence / successful_evaluations
    print(f"\n{'='*60}")
    print(f"TESTING COMPLETE")
    print(f"Successful evaluations: {successful_evaluations}/{len(testing_df)}")
    print(f"Average confidence score: {average_confidence:.3f}")
    print(f"{'='*60}")
else:
    print(f"\n{'='*60}")
    print(f"TESTING COMPLETE")
    print(f"No successful evaluations")
    print(f"{'='*60}")

sk-1CPM2ShRyVgL79rmqyIzWb8rqSRIMS4DRTYD6ArvKDT3BlbkFJMxZqGvVlTcfsINuv0o3rqPEKZnUk220fa_z6qbFuUA
Domains: ['PainCareMD.com', 'PainReliefClinic.net', 'PainFreeClinic.co', 'PainManagementClinic.org', 'HealYourPainCenter.us', 'PainFreeCenter.health']
  Average confidence: 0.760
  Best domain: PainCareMD.com (confidence: 0.800)
Domains: ['CodePathBootcamp.io', 'WebDevCampCode.com', 'FullStackBootcamp.online', 'WebDevAcademy.tech', 'CodeCampBootcamp.us']
  Average confidence: 0.744
  Best domain: CodePathBootcamp.io (confidence: 0.800)
Domains: ['GreenVegDelivery.com', 'PlantBasedMealz.co', 'VeggiesDelivered.net', 'VeganMealBox.us', 'GreenLeafGroceries.io', 'GreenVegDelivery.com']
  Average confidence: 0.750
  Best domain: GreenVegDelivery.com (confidence: 0.800)
Domains: ['GreenSavvyConsulting.com', 'EcoBizSolutions.co', 'SustainableStrategies.co', 'GreenStepConsulting.net', 'EarthwiseAdvisors.org']
  Average confidence: 0.764
  Best domain: GreenSavvyConsulting.com (confidence: 0.800)
Doma

## Saving

### Save lora adapter

This is both useful for inference and if you want to load the model again

In [28]:
model.push_to_hub(
    "pashko-bond/Llama-3.2-3B-domains-iteration-1",
    tokenizer,
    token = userdata.get('HF_TOKEN')
)

README.md:   0%|          | 0.00/598 [00:00<?, ?B/s]

Uploading...:   0%|          | 0.00/9.20M [00:00<?, ?B/s]

Saved model to https://huggingface.co/pashko-bond/Llama-3.2-3B-domains-iteration-1
