In [None]:
! pip3 install transformers torch datasets peft

In [2]:
import torch
from transformers import pipeline




In [3]:
device = "mps" if torch.backends.mps.is_available() else ("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device == "mps" else torch.float32

In [3]:
ask_llm = pipeline(
  task="text-generation",
  model="Qwen/Qwen2.5-3B-Instruct",
  device=device,
  torch_dtype=dtype
)

print(ask_llm("Who is Scott Lai?")[0]["generated_text"])

`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.27s/it]
Device set to use cuda:0


Who is Scott Lai? He is a Chinese American actor, director and singer. He is best known for his role as "Terry" in the Disney Channel Original Movie "The Last Run" (2014). He has also appeared in other Disney Channel Original Movies, including "Journey to the West: The Last Dragon" (2013), "A Boy Named Charlie Brown" (2013) and "The Princess and the Frog" (2012).
Scott Lai was born on July 19, 1996, in New York City, United States. He is of Chinese descent and grew up in California. He began his acting career at a young age and has appeared in various stage productions, commercials, and music videos.
In addition to acting, Scott Lai is also a singer. He released his first single, "Love Story," in 2012, which gained popularity among his fans. He has since released several other singles and has been featured in various music videos.
As an actor, Scott Lai has been praised for his performances in Disney Channel Original Movies. He has received critical acclaim for his portrayal of Terry i

As the response above shows, the model has no idea who I am and simply makes up a biography.

Let's cook it!

First, let's teach the model who I am. You can use your own personal data here and convert it into the exact format used for fine-tuning. ChatGPT works well for this: just ask it to convert your resume into a trainable JSON format with "prompt" and "completion" fields.
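To make the target format concrete, here is a minimal sketch that writes and checks a tiny training file. The two records are taken from text shown elsewhere in this notebook; the file name `example_train.jsonl`, the JSON Lines layout, and the `validate` helper are assumptions for illustration, since the actual layout of the resume file is not shown.

```python
import json
import os
import tempfile

# Two prompt/completion pairs in the shape the notebook trains on
records = [
    {"prompt": "What is Scott Lai’s profession?",
     "completion": "AI Engineer and Data Scientist."},
    {"prompt": "What type of workflows is Scott Lai experienced in building?",
     "completion": "End-to-end ML pipelines, data engineering, and automation."},
]

def validate(records):
    """Check every record has exactly the two non-empty string fields."""
    for i, rec in enumerate(records):
        if set(rec) != {"prompt", "completion"}:
            raise ValueError(f"record {i}: unexpected keys {sorted(rec)}")
        if not all(isinstance(v, str) and v.strip() for v in rec.values()):
            raise ValueError(f"record {i}: empty or non-string field")
    return len(records)

validate(records)

# Hypothetical file name; written as JSON Lines (one object per line)
path = os.path.join(tempfile.mkdtemp(), "example_train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Read it back to confirm the file round-trips cleanly
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```

Short, single-fact answers like these keep each example well inside the token limit used later.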

In [12]:
# load data 
from datasets import load_dataset

raw_data = load_dataset('json', data_files = "scott_lai_resume_train.json")
raw_data

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 122
    })
})

In [None]:
raw_data["train"][0]
raw_data["train"][1]

{'prompt': 'What is Scott Lai’s profession?',
 'completion': 'AI Engineer and Data Scientist.'}

As you can see, the dataset gives us each record as raw text, but for fine-tuning the model needs it as small, precise chunks. So here we apply tokenization, which takes the text and splits it into smaller chunks. Each chunk is called a token, and it is the smallest unit of meaning that LLMs work with.
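Before using the real tokenizer, a toy version may help show what `max_length`, `truncation`, and `padding="max_length"` do. This is only a sketch: `toy_tokenize` is a hypothetical helper that splits on whitespace, whereas the Qwen tokenizer splits text into subword pieces and uses its own vocabulary and padding token.

```python
def toy_tokenize(text, vocab, max_length, pad_id=0):
    """Whitespace 'tokenizer' that mimics truncation and max_length padding."""
    # Give each new word the next free id (ids start at 1; 0 is kept for padding)
    ids = [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]
    ids = ids[:max_length]                 # like truncation=True
    pad = max_length - len(ids)            # like padding="max_length"
    return {
        "input_ids": ids + [pad_id] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,  # 1 = real token, 0 = padding
    }

vocab = {}
short = toy_tokenize("Who is Scott Lai?", vocab, max_length=6)
cut = toy_tokenize("Who is Scott Lai?", vocab, max_length=3)
print(short)
print(cut)
```

Every example comes out with the same length, which is what lets the trainer stack them into batches.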

In [14]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct"
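)

def preprocess(sample):
    # Join the question and its answer into one training string
    text = sample['prompt'] + '\n' + sample['completion']
    tokenized = tokenizer(
        text,
        max_length = 128,        # cap every example at 128 tokens
        truncation = True,       # cut off anything longer
        padding = "max_length"   # pad shorter examples up to 128 tokens
    )

    # Causal LM training: the labels are the input ids themselves
    tokenized['labels'] = tokenized['input_ids'].copy()
    return tokenized

data = raw_data.map(preprocess)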


In [15]:
print(data['train'])

Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 122
})
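One thing to be aware of: in the preprocessing cell, `labels` is a straight copy of `input_ids`, so the padded positions also count toward the loss. A common variation (not what this notebook does) replaces the labels at padded positions with -100, the value PyTorch's cross-entropy loss ignores by default. The helper name `mask_padding` below is hypothetical, and the ids are made up for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, but blank out positions that are padding."""
    return [
        tok if keep == 1 else IGNORE_INDEX
        for tok, keep in zip(input_ids, attention_mask)
    ]

# Made-up ids: three real tokens followed by two padding positions
labels = mask_padding([5, 6, 7, 0, 0], [1, 1, 1, 0, 0])
print(labels)
```

Inside `preprocess`, this would be applied to `tokenized['input_ids']` and `tokenized['attention_mask']` in place of the plain `.copy()`.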


## LoRA

Now let's move on to training.

In [16]:
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM
import torch

In [17]:
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map = device,
    torch_dtype = torch.float16
)

# Attach LoRA adapters to the attention query/key/value projections only
lora_config = LoraConfig(
    task_type = TaskType.CAUSAL_LM,
    target_modules = ["q_proj", "k_proj", "v_proj"]
)
model = get_peft_model(model, lora_config)

Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.43it/s]
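Why LoRA makes this cheap: the original weight matrix of each targeted layer stays frozen, and training only learns a low-rank update made of two small matrices (shapes `d_out x r` and `r x d_in`). The sketch below just counts parameters for a single layer; the 2048 x 2048 shape and rank 8 are assumptions for illustration, not values read from the model above.

```python
def lora_param_counts(d_in, d_out, r):
    """Parameters in one full weight matrix vs. its rank-r LoRA update."""
    full = d_in * d_out          # frozen original weights
    lora = r * (d_in + d_out)    # trainable: A is (r x d_in), B is (d_out x r)
    return full, lora

# Illustrative shape and rank (assumed, not taken from the Qwen config)
full, lora = lora_param_counts(d_in=2048, d_out=2048, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

On the real model, `model.print_trainable_parameters()` from PEFT reports the actual trainable share.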


In [18]:
from transformers import TrainingArguments, Trainer


train_args = TrainingArguments(
    num_train_epochs = 10, # go through the whole dataset from start to finish 10 times
    learning_rate=0.001, 
    logging_steps = 25, # log the training loss every 25 steps
    fp16 = False # fp16 mixed-precision training, which can speed things up on a GPU; left off here
)

trainer = Trainer(
    args = train_args,
    model = model, 
    train_dataset=data["train"]
)

The model is already on multiple devices. Skipping the move to device specified in `args`.


In [19]:
trainer.train()

Step,Training Loss
25,2.3544
50,0.3788
75,0.2372
100,0.1902
125,0.1604
150,0.1274


TrainOutput(global_step=160, training_loss=0.5464043714106083, metrics={'train_runtime': 53.5074, 'train_samples_per_second': 22.801, 'train_steps_per_second': 2.99, 'total_flos': 2602200748523520.0, 'train_loss': 0.5464043714106083, 'epoch': 10.0})
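The `global_step=160` in the output can be checked by hand. This sketch assumes a single device, no gradient accumulation, and the `TrainingArguments` default `per_device_train_batch_size` of 8, since the cell above does not set a batch size.

```python
import math

def total_steps(num_rows, batch_size, epochs):
    """Optimizer steps for a full run: batches per epoch times epochs."""
    steps_per_epoch = math.ceil(num_rows / batch_size)  # last batch may be partial
    return steps_per_epoch * epochs

steps = total_steps(num_rows=122, batch_size=8, epochs=10)
print(steps)
```

With `logging_steps = 25`, that also explains why the loss table stops at step 150: the next log point would be 175, past the end of training.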

In [20]:
# save the model
trainer.save_model("./my-qwen")
tokenizer.save_pretrained("./my-qwen")

('./my-qwen/tokenizer_config.json',
 './my-qwen/special_tokens_map.json',
 './my-qwen/chat_template.jinja',
 './my-qwen/vocab.json',
 './my-qwen/merges.txt',
 './my-qwen/added_tokens.json',
 './my-qwen/tokenizer.json')

Now let's test it out

In [1]:
import torch
from transformers import pipeline

device = "mps" if torch.backends.mps.is_available() else ("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device == "mps" else torch.float32

ask_llm = pipeline(
  task="text-generation",
  model="./my-qwen",
  tokenizer='./my-qwen',
  device=device,
  torch_dtype=dtype
)

print(ask_llm("What type of workflows is Scott Lai experienced in building?")[0]["generated_text"])

`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 2/2 [00:13<00:00,  6.73s/it]
Device set to use cuda:0


What type of workflows is Scott Lai experienced in building? End-to-end ML pipelines, data engineering, and automation.


In [3]:
ask_llm = pipeline(
  task="text-generation",
  model="./my-qwen",
  tokenizer='./my-qwen',
  device=device,
  torch_dtype=dtype
)

print(ask_llm("How many years of experience does Scott Lai have in generative AI and LLM solutions?")[0]["generated_text"])

`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 2/2 [00:14<00:00,  7.11s/it]
Device set to use cuda:0


How many years of experience does Scott Lai have in generative AI and LLM solutions? Over 5 years.
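Both outputs repeat the question before the answer, because the pipeline returns the prompt plus the generated continuation. The text-generation pipeline accepts `return_full_text=False` to drop the prompt; the hypothetical helper below does the same thing by hand on an already generated string.

```python
def answer_only(prompt, generated_text):
    """Strip the echoed prompt from the front of a generation, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()

prompt = "How many years of experience does Scott Lai have in generative AI and LLM solutions?"
full_text = prompt + " Over 5 years."
answer = answer_only(prompt, full_text)
print(answer)
```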
