In [24]:
! pip install transformers accelerate datasets peft

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [1]:
import torch
from transformers import pipeline


In [2]:
device = "mps" if torch.backends.mps.is_available() else ("cuda:0" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device == "mps" else torch.float32

In [3]:
ask_llm = pipeline(
  task="text-generation",
  model="Qwen/Qwen2.5-3B-Instruct",
  device=device,
  torch_dtype=dtype
)

print(ask_llm("Who is Jaymie Xu?")[0]["generated_text"])

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


Who is Jaymie Xu? Jaymie Xu, also known as Jaymie Xue or Jaymie, is a Chinese-American singer-songwriter and musician. She gained significant attention in the music industry for her unique blend of pop, indie, and electronic genres.

Key points about Jaymie Xu:

1. Born on July 27, 1994, in Shanghai, China, she moved to the United States with her family when she was young.
2. She began her musical journey at a young age, learning piano and violin.
3. Jaymie released her debut EP "Shine" in 2016, which gained popularity through various online platforms.
4. Her music often reflects themes of self-discovery, relationships, and personal growth.
5. In addition to singing, Jaymie also plays multiple instruments, including piano, guitar, and drums.
6. She has collaborated with other artists in the music industry, both locally and internationally.
7. Jaymie has performed at various events and festivals, including SXSW (South by Southwest) and Coachella.
8. Her style has been described as eclec

If only I were that successful lol.

In [1]:
# load data 
from datasets import load_dataset

raw_data = load_dataset('json', data_files = "jaymie_xu_resume_train.json")
raw_data

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 90
    })
})

In [2]:
raw_data["train"][0]

{'prompt': 'What is Jaymie Xu’s profession?',
 'completion': 'Engine Programmer.'}

As you can see, each record is a short prompt/completion pair stored as plain text. For fine-tuning, the model cannot consume raw strings, so we apply tokenization to split the text into smaller chunks. Each chunk is called a token, and it is the smallest unit of text that LLMs work with.
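To make the idea of tokens, truncation, and padding concrete, here is a toy sketch. It splits on whitespace, which is an assumption for illustration only: real tokenizers such as the Qwen one use a learned subword vocabulary. The function name `toy_tokenize` and the on-the-fly `vocab` are hypothetical.

```python
# Toy illustration only: real tokenizers use learned subword vocabularies,
# not whitespace splitting.
def toy_tokenize(text, vocab, max_length, pad_id=0):
    """Map whitespace-separated words to ids, then truncate/pad to max_length."""
    # Assign the next free id (starting at 1) to any word not seen before
    ids = [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]
    ids = ids[:max_length]               # truncation: drop anything past max_length
    attention_mask = [1] * len(ids)      # 1 marks a real token
    padding = max_length - len(ids)
    ids += [pad_id] * padding            # padding: fill up to the fixed length
    attention_mask += [0] * padding      # 0 marks a padding position
    return {"input_ids": ids, "attention_mask": attention_mask}

vocab = {}  # hypothetical vocabulary, grown as new words appear
encoded = toy_tokenize("What is Jaymie Xu's profession?", vocab, max_length=8)
print(encoded)
```

The real tokenizer call in the next cell returns the same two fields, `input_ids` and `attention_mask`, just with a much larger vocabulary.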

In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct"
)
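
def preprocess(sample):
    # Join prompt and completion into one training string
    text = sample['prompt'] + '\n' + sample['completion']
    print(text)  # debug: show the combined text
    tokenized = tokenizer(
        text,
        max_length = 128,        # cap every example at 128 tokens
        truncation = True,       # cut anything longer
        padding = "max_length"   # pad anything shorter
    )

    # Causal LM training: the labels are the input ids themselves
    tokenized['labels'] = tokenized['input_ids'].copy()
    return tokenized

data = raw_data.map(preprocess)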
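One thing to be aware of: copying `input_ids` straight into `labels` means the padding positions also count toward the loss. A common variant (not what this notebook ran) replaces the label at every padded position with -100, the value that PyTorch's cross-entropy loss ignores by default. The sketch below shows the idea on made-up ids; `mask_padding` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_padding(input_ids, attention_mask):
    """Copy input_ids into labels, but hide padded positions from the loss."""
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]

# Made-up example: three real tokens followed by two padding tokens
example = {"input_ids": [11, 42, 7, 0, 0], "attention_mask": [1, 1, 1, 0, 0]}
labels = mask_padding(example["input_ids"], example["attention_mask"])
print(labels)
```

Inside `preprocess`, this would replace the plain `.copy()` line. With only 90 short examples padded to 128 tokens, most positions are padding, so the reported loss values would likely look different with this change.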


In [4]:
print(data['train'])

Dataset({
    features: ['prompt', 'completion', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 90
})


## LoRA

Now, let's move on to training.

In [3]:
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM
import torch

In [9]:
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map = device,
    torch_dtype = torch.float16
)

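# Attach LoRA adapters to the attention query/key/value projections;
# every other LoRA setting is left at the PEFT default
lora_config = LoraConfig(
    task_type = TaskType.CAUSAL_LM,
    target_modules = ["q_proj", "k_proj", "v_proj"]
)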
model = get_peft_model(model, lora_config)

`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
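Why bother with LoRA instead of training the whole model? LoRA freezes each targeted weight matrix and trains two small low-rank matrices next to it, so far fewer parameters need gradients. Here is a rough count for a single projection layer. The layer sizes and the rank `r = 8` are assumptions picked for illustration, not values read from this model.

```python
def lora_param_counts(d_in, d_out, r):
    """Compare a full weight matrix with its rank-r LoRA adapter."""
    full = d_in * d_out        # frozen weight W has shape (d_out, d_in)
    lora = r * (d_in + d_out)  # A has shape (r, d_in), B has shape (d_out, r)
    return full, lora

# Hypothetical layer: 2048 -> 2048 projection with rank 8
full, lora = lora_param_counts(d_in=2048, d_out=2048, r=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

On the actual model, calling `model.print_trainable_parameters()` after `get_peft_model` reports the real trainable and total counts.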

In [10]:
from transformers import TrainingArguments, Trainer


train_args = TrainingArguments(
    num_train_epochs = 10, # go through the dataset from start to finish 10 times
    learning_rate = 0.001,
    logging_steps = 25, # log the training loss every 25 steps
    fp16 = False # 16-bit mixed-precision training; set to True to speed things up on a CUDA GPU
)

trainer = Trainer(
    args = train_args,
    model = model, 
    train_dataset=data["train"]
)

In [11]:
trainer.train()

Step,Training Loss
25,2.35
50,0.4617
75,0.3367
100,0.2578


TrainOutput(global_step=120, training_loss=0.7455982049306233, metrics={'train_runtime': 39.8666, 'train_samples_per_second': 22.575, 'train_steps_per_second': 3.01, 'total_flos': 1919656289894400.0, 'train_loss': 0.7455982049306233, 'epoch': 10.0})
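Where do `global_step=120` and the four logged rows come from? A quick back-of-the-envelope check, assuming the default batch size of 8 per device, a single device, and no gradient accumulation (none of these were set explicitly above):

```python
import math

num_rows = 90        # size of the training split
num_epochs = 10      # num_train_epochs
batch_size = 8       # assumption: default per_device_train_batch_size, one device
logging_steps = 25   # logging_steps

steps_per_epoch = math.ceil(num_rows / batch_size)  # last partial batch still counts
total_steps = steps_per_epoch * num_epochs
logged_at = list(range(logging_steps, total_steps + 1, logging_steps))

print(steps_per_epoch, total_steps, logged_at)
```

That gives 12 steps per epoch and 120 steps in total, with the loss logged at steps 25, 50, 75, and 100, which lines up with the table above.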

In [12]:
# save the model
trainer.save_model("./my-qwen")
tokenizer.save_pretrained("./my-qwen")

('./my-qwen/tokenizer_config.json',
 './my-qwen/special_tokens_map.json',
 './my-qwen/chat_template.jinja',
 './my-qwen/vocab.json',
 './my-qwen/merges.txt',
 './my-qwen/added_tokens.json',
 './my-qwen/tokenizer.json')

Now let's test it out

In [4]:
ask_llm = pipeline(
  task="text-generation",
  model="./my-qwen",
  tokenizer='./my-qwen',
  device=device,
  torch_dtype=dtype
)

print(ask_llm("Who is Jaymie Xu?")[0]["generated_text"])

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


Who is Jaymie Xu? An engine programming specialist at Ubisoft.


Well, close enough. Not anymore, though.