<a href="https://colab.research.google.com/github/hby1117/class2025Spring/blob/main/08_AI_Finetune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install --no-deps unsloth

In [None]:
from unsloth import FastLanguageModel
from unsloth.chat_templates import standardize_sharegpt
from datasets import Dataset
from transformers import TextStreamer
import json
import pandas as pd
import torch

# Data


* raw
> In the latest AP dispatch from Frankfurt, which details how the European Union's chief trade negotiator said he had “good calls” with Trump administration officials on Monday, David McHugh noted that the EU "has offered Trump a 'zero for zero' deal in which tariffs would be removed on industrial goods including automobiles, but the U.S. administration has said it will not lower tariffs below a 10% baseline imposed on almost all its trading partners. Trump has also announced tariffs of 25% on steel and automobiles."
That raises the question of where the negotiations, which got back on the rails over the weekend after a call between President Trump and EU Commission President Ursula von der Leyen, will go from here — and what Trump will have to say next time he discusses the $1.8 trillion trade relationship.

* pretrain
	-	one record:
		> In the latest AP dispatch from Frankfurt, which details how the European Union's chief trade negotiator said he had “good calls” with Trump administration officials on Monday, David McHugh noted that the EU "has offered Trump a 'zero for zero' deal in which tariffs would be removed on industrial goods including automobiles, but the U.S. administration has said it will not lower tariffs below a 10% baseline imposed on almost all its trading partners. Trump has also announced tariffs of 25% on steel and automobiles."
		That raises the question of where the negotiations, which got back on the rails over the weekend after a call between President Trump and EU Commission President Ursula von der Leyen, will go from here — and what Trump will have to say next time he discusses the $1.8 trillion trade relationship.

* instructional - alpaca
```
[
	{
		"Instruction": "Task we want the model to perform. 11111",
		"Input": "Optional, but useful, it will essentially be the user's query. 11111",
		"Output": "The expected result of the task and the output of the model. 11111"
	},
	{
		"Instruction": "Task we want the model to perform. 22222",
		"Input": "Optional, but useful, it will essentially be the user's query. 22222",
		"Output": "The expected result of the task and the output of the model. 22222"
	}
]
```

* instructional - full fine-tuning | LoRA fine-tuning
	- one record:
		```
		"""Below is an instruction that describes a task, paired with an input that provides further context.
		Write a response that appropriately completes the request.

		### Instruction:
		Task we want the model to perform. 11111

		### Input:
		Optional, but useful, it will essentially be the user's query. 11111

		### Response:
		The expected result of the task and the output of the model. 11111"""
		```
	- one record:
		```
		"""Below is an instruction that describes a task, paired with an input that provides further context.
		Write a response that appropriately completes the request.

		### Instruction:
		Task we want the model to perform. 22222

		### Input:
		Optional, but useful, it will essentially be the user's query. 22222

		### Response:
		The expected result of the task and the output of the model. 22222"""
		```
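The step from an Alpaca record to the prompt text above can be sketched as a simple template fill. This is a minimal illustration, not the notebook's own code: `ALPACA_TEMPLATE` and `format_alpaca` are hypothetical names, and the key casing follows the example records shown earlier.

```python
# Hypothetical helper: renders one Alpaca-style record into the prompt layout shown above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context.\n"
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)


def format_alpaca(record: dict) -> str:
    """Fill the template from a record with Instruction/Input/Output keys."""
    return ALPACA_TEMPLATE.format(
        instruction=record["Instruction"],
        input=record.get("Input", ""),  # Input is optional in this format
        output=record["Output"],
    )


records = [
    {
        "Instruction": "Task we want the model to perform. 11111",
        "Input": "Optional, but useful, it will essentially be the user's query. 11111",
        "Output": "The expected result of the task and the output of the model. 11111",
    },
]

# One training string per record
texts = [format_alpaca(r) for r in records]
print(texts[0])
```

Each resulting string is one training record, whether the model is fully fine-tuned or trained with LoRA.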

* conversational - ShareGPT
```
{
	"conversations": [
		{
			"from": "system",
			"value": "You are a helpful assistant."
		},
		{
			"from": "human",
			"value": "Can you help me make pasta carbonara?"
		},
		{
			"from": "gpt",
			"value": "Would you like the traditional Roman recipe, or a simpler version?"
		},
		{
			"from": "human",
			"value": "The traditional version please"
		},
		{
			"from": "gpt",
			"value": "The authentic Roman carbonara uses just a few ingredients: pasta, guanciale, eggs, Pecorino Romano, and black pepper. Would you like the detailed recipe?"
		}
	]
}
```

* conversational (ShareGPT): full fine-tuning | LoRA fine-tuning
	- one record:
		```
		<|im_start|>system
		You are a helpful assistant.<|im_end|>
		<|im_start|>user
		Can you help me make pasta carbonara?<|im_end|>
		<|im_start|>assistant
		<think>
		</think>
		Would you like the traditional Roman recipe, or a simpler version?<|im_end|>
		<|im_start|>user
		The traditional version please<|im_end|>
		<|im_start|>assistant
		<think>
		</think>
		The authentic Roman carbonara uses just a few ingredients: pasta, guanciale, eggs, Pecorino Romano, and black pepper. Would you like the detailed recipe?<|im_end|>
		```
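Before a chat template can be applied, ShareGPT turns (`from`/`value`) have to be expressed as `role`/`content` messages. The sketch below shows that renaming in plain Python. It assumes the speaker mapping `human` to `user` and `gpt` to `assistant`; `ROLE_MAP` and `sharegpt_to_messages` are hypothetical names, and this is not the implementation of any library function.

```python
# Assumed mapping from ShareGPT speaker names to chat roles (illustrative only).
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(record: dict) -> list:
    """Convert one ShareGPT record into a list of role/content messages."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]


example = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Can you help me make pasta carbonara?"},
        {"from": "gpt", "value": "Would you like the traditional Roman recipe, or a simpler version?"},
    ]
}

messages = sharegpt_to_messages(example)
for message in messages:
    print(message["role"], "->", message["content"])
```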

* conversational - ChatML
```
{
	"messages": [
		{
			"role": "system",
			"content": "You are a helpful assistant."
		},
		{
			"role": "user",
			"content": "What is 1+1?"
		},
		{
			"role": "assistant",
			"content": "It's 2!"
		},
		{
			"role": "user",
			"content": "What is 2+2?"
		},
		{
			"role": "assistant",
			"content": "It's 4!"
		}
	]
}
```

* conversational (ChatML): full fine-tuning | LoRA fine-tuning
	- one record:
		```
		<|im_start|>system
		You are a helpful assistant.<|im_end|>
		<|im_start|>user
		What is 1+1?<|im_end|>
		<|im_start|>assistant
		<think>
		</think>
		It's 2!<|im_end|>
		<|im_start|>user
		What is 2+2?<|im_end|>
		<|im_start|>assistant
		<think>
		</think>
		It's 4!<|im_end|>
		```
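To make the rendered layout concrete, the following sketch turns a `messages` list into the text shown above. It only mimics that layout, including the empty `<think>` block before each assistant reply. In the notebook the tokenizer's own chat template produces this text, and `render_chatml` is a hypothetical helper.

```python
def render_chatml(messages: list, empty_think: bool = True) -> str:
    """Render role/content messages in the <|im_start|> ... <|im_end|> layout."""
    parts = []
    for message in messages:
        body = message["content"]
        if message["role"] == "assistant" and empty_think:
            # Mirror the empty reasoning block shown in the rendered record
            body = "<think>\n</think>\n" + body
        parts.append(f"<|im_start|>{message['role']}\n{body}<|im_end|>")
    return "\n".join(parts)


chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "It's 2!"},
]

rendered = render_chatml(chat)
print(rendered)
```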

In [None]:
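# Load the training records from a local JSON file (a list of conversation records)
with open('class-2025-spring_0530-0.json', 'r', encoding='utf-8') as f:
    loaded_data = json.load(f)

# Wrap the list of dicts in a Hugging Face Dataset
dataset = Dataset.from_list(loaded_data)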
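
Since later cells read a `conversations` column, it can help to check the loaded records before building the dataset. The sketch below assumes the file holds ShareGPT-style records (a list of dicts whose `conversations` entries carry `from` and `value`). `validate_sharegpt` is a hypothetical helper, and the sample records are made up for illustration.

```python
def validate_sharegpt(records: list) -> list:
    """Return a list of problems found; an empty list means the records look usable."""
    problems = []
    for i, rec in enumerate(records):
        turns = rec.get("conversations") if isinstance(rec, dict) else None
        if not isinstance(turns, list) or not turns:
            problems.append(f"record {i}: missing or empty 'conversations'")
            continue
        for j, turn in enumerate(turns):
            if not isinstance(turn, dict) or not {"from", "value"} <= turn.keys():
                problems.append(f"record {i}, turn {j}: needs 'from' and 'value'")
    return problems


# Illustrative records: the first is well formed, the second is not
sample = [
    {"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello!"}]},
    {"conversations": [{"from": "human"}]},
]
issues = validate_sharegpt(sample)
print(issues)
```

In the notebook this would be called as `validate_sharegpt(loaded_data)` before `Dataset.from_list`.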

# Unsloth

In [None]:
# https://huggingface.co/unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,   # Context length - can be longer, but uses more memory
    load_in_4bit = True,     # 4bit uses much less memory
    full_finetuning = False,
)

In [None]:
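# Attach LoRA adapters to the loaded model
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,           # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],  # attention and MLP projections
    lora_alpha = 32,  # LoRA scaling numerator (scale = lora_alpha / r)
    use_gradient_checkpointing = "unsloth",
)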
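
To give a feel for what `r` and `lora_alpha` control, the sketch below computes how many trainable parameters a rank-`r` adapter adds to one linear layer and the scale applied to the adapter's update. It assumes the standard LoRA formulation (two low-rank matrices per layer, scale `lora_alpha / r`). The 4096 x 4096 layer shape is a made-up example, not a dimension of the model loaded above.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the two low-rank matrices A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Scale applied to the low-rank update in the standard LoRA formulation."""
    return lora_alpha / r


# Hypothetical layer shape, chosen only for illustration
d_in, d_out, r, lora_alpha = 4096, 4096, 32, 32

added = lora_param_count(d_in, d_out, r)
full = d_in * d_out
print(f"adapter params: {added}, full layer params: {full}")
print(f"fraction trained: {added / full:.4%}")
print(f"scaling: {lora_scaling(lora_alpha, r)}")
```

With `r` equal to `lora_alpha`, as in the cell above, the scale is 1.0.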

# Data Check


In [None]:
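# Normalize ShareGPT-style turns ("from"/"value") into "role"/"content" messages
organized_dataset = standardize_sharegpt(dataset)
# Render each conversation into one training string using the model's chat template
processed_texts = tokenizer.apply_chat_template(
    organized_dataset["conversations"],
    tokenize = False,
)
# Build a single-column ("text") dataset and shuffle it reproducibly
df = pd.DataFrame({"text": processed_texts})
processed_dataset = Dataset.from_pandas(df)
processed_dataset = processed_dataset.shuffle(seed = 3407)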

# Training


In [None]:
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = processed_dataset,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",      # column holding the rendered conversations
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,  # gradients are accumulated over 8 batches per optimizer step
        warmup_steps = 5,
        max_steps = 30,                   # short demonstration run
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        report_to = "none",               # disable external experiment loggers
    ),
)
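
The settings above imply an effective batch size and a learning-rate curve that are easy to work out by hand. The sketch below assumes a single GPU and that the `linear` scheduler warms up linearly from zero and then decays linearly to zero at `max_steps`. `linear_lr` is a hypothetical helper, not a call into the training library.

```python
def linear_lr(step: int, base_lr: float = 2e-4, warmup_steps: int = 5, max_steps: int = 30) -> float:
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


# per_device_train_batch_size * gradient_accumulation_steps (single device assumed)
effective_batch = 1 * 8
# Sequences consumed over the whole run of max_steps optimizer steps
sequences_seen = effective_batch * 30

print("effective batch size:", effective_batch)
print("sequences seen:", sequences_seen)
for step in (0, 2, 5, 15, 30):
    print(step, linear_lr(step))
```

Under these assumptions the run peaks at the configured learning rate at step 5 and processes 240 sequences in total.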

In [None]:
trainer_stats = trainer.train()

# Inference

According to the Qwen3 team, the recommended sampling settings for reasoning (thinking) inference are `temperature = 0.6, top_p = 0.95, top_k = 20`.
For normal chat-based (non-thinking) inference, use `temperature = 0.7, top_p = 0.8, top_k = 20`.
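
One way to keep these two groups of settings from drifting apart is to store them as presets and pick one based on whether thinking is enabled. This is a minimal sketch: `SAMPLING_PRESETS` and `sampling_kwargs` are hypothetical names, and the values are the ones quoted above.

```python
# Sampling settings quoted above, keyed by inference mode
SAMPLING_PRESETS = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "chat": {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
}


def sampling_kwargs(enable_thinking: bool) -> dict:
    """Return a copy of the preset matching the chosen inference mode."""
    mode = "thinking" if enable_thinking else "chat"
    return dict(SAMPLING_PRESETS[mode])


kwargs = sampling_kwargs(enable_thinking=False)
print(kwargs)
```

The returned dictionary could then be unpacked into the generation call, for example `model.generate(..., **kwargs)`, with `enable_thinking` set to the same value in `apply_chat_template`.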

In [None]:
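messages = [
    # English: "What is the principle of free evaluation of evidence? How is it applied? I find it confusing."
    {"role" : "user", "content" : "자유심증주의란 무엇인가요? 어떻게 적용되나요? 헷갈리는데?"},
    # Alternative prompt, in English: "What is Gyeongbokgung?"
    # {"role" : "user", "content" : "경복궁이 뭐에요?"}
]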
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
    enable_thinking = False, # Disable thinking
)
output = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 256, # Increase for longer outputs!
    temperature = 0.7, top_p = 0.8, top_k = 20, # For non thinking
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

# Saving & loading

In [None]:
model.save_pretrained("my_lora_model")
tokenizer.save_pretrained("my_lora_model")

Loading the trained model

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "my_lora_model",
    max_seq_length = 2048,
    load_in_4bit = True,
)