# Training and saving the model

In [13]:
!nvidia-smi

Tue Nov 28 19:03:19 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   31C    P0    57W / 300W |   4975MiB / 23028MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!python --version

Python 3.10.1


In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0


## Issues I ran into with the project

Since my personal laptop does not have a powerful enough GPU, I am using AWS infrastructure for this project. One problem is that the AWS EC2 instance runs an old AMI (a pre-defined Linux snapshot with libraries preinstalled), which ships an old version of CUDA. That is why I needed to install matching versions of PyTorch and transformers.
The next command uninstalls one problematic package, nvidia_cublas_cu11.

In [4]:
#!yes | pip uninstall nvidia_cublas_cu11
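To pick a PyTorch build that matches the CUDA toolkit on the AMI, it helps to read the release number out of the `nvcc --version` output. The snippet below is a minimal sketch of that step; the helper names `cuda_release` and `wheel_tag` are hypothetical, and whether a wheel actually exists for a given tag still has to be checked against the PyTorch installation instructions.

```python
import re

def cuda_release(nvcc_output: str) -> str:
    """Return the CUDA toolkit release (e.g. '11.3') from `nvcc --version` text."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    return match.group(1)

def wheel_tag(release: str) -> str:
    """Turn '11.3' into the 'cu113' style tag used in PyTorch wheel names."""
    return "cu" + release.replace(".", "")

# Lines copied from the `!nvcc --version` output earlier in this notebook.
nvcc_text = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0"""

release = cuda_release(nvcc_text)
print(release, wheel_tag(release))
```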

In [5]:
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig
from trl import SFTTrainer
import time



In [6]:
hf_link = "NousResearch/Llama-2-7b-chat-hf"
data = "./data/train_data.csv"
new_model = "./model/new_model/"
base_model = "./model/base_model/"

In [7]:
train_data = load_dataset("csv", data_files=data)
train_data = train_data.rename_column("promped_text", "text")

In [8]:
train_data

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 63260
    })
})

In [9]:
train_data["train"][0]

{'text': '<s>[INST] Искам да сготвя нещо хубаво. Имам налични следните продукти: пудра захар, желатин ( пакетче), маргарин, глюкоза, ванилия [/INST] Може да приготвиш Фондан за покритие на торти - II вид: Захарта се пресява и се изсипва в дълбок съд. В средата се прави кладенче. Желатина се накисва в 3 ч.л. хладка вода за 5 минути, след това се разтопява на водна баня. Добавят се маргарина и глюкозата и всичко се изсипва в кладенчето, заедно с ванилията. След това се омесва добре, докато тестото се получи горе-долу като пластелин. Неизползваното количество се прибира в пликче в хладилник.'}
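The row above follows a fixed `<s>[INST] ... [/INST] ...` layout. As a rough sketch of how such a row could be assembled from a list of products, a recipe title and the preparation steps, consider the snippet below. The function `build_training_text` and its arguments are hypothetical and are not the code that produced `train_data.csv`; only the two Bulgarian phrases are copied from the sample row.

```python
# Fixed Bulgarian phrases copied from the sample row above.
INSTRUCTION_PREFIX = "Искам да сготвя нещо хубаво. Имам налични следните продукти: "
RESPONSE_PREFIX = "Може да приготвиш "

def build_training_text(products, title, steps):
    """Assemble one training row in the [INST] layout of the sample row."""
    instruction = INSTRUCTION_PREFIX + ", ".join(products)
    response = f"{RESPONSE_PREFIX}{title}: {steps}"
    return f"<s>[INST] {instruction} [/INST] {response}"

# Shortened, illustrative inputs (not a real row from the dataset).
row = build_training_text(
    products=["пудра захар", "маргарин", "ванилия"],
    title="Фондан",
    steps="Захарта се пресява.",
)
print(row)
```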

In [10]:
compute_dtype = getattr(torch, "float16")

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)

In [11]:
# Check GPU compatibility with bfloat16
if compute_dtype == torch.float16:
    major, _ = torch.cuda.get_device_capability()
    if major >= 8:
        print("=" * 80)
        print("Your GPU supports bfloat16: accelerate training with bf16=True")
        print("=" * 80)

Your GPU supports bfloat16: accelerate training with bf16=True
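The check above only prints a hint; the notebook keeps `float16` as the compute dtype. A minimal sketch of acting on the same check could look like this. The helper name `pick_compute_dtype_name` is hypothetical, and the `major >= 8` threshold is the one already used in the cell above.

```python
def pick_compute_dtype_name(major: int) -> str:
    """Return 'bfloat16' for compute capability 8.x or newer, else 'float16'."""
    return "bfloat16" if major >= 8 else "float16"

# Hypothetical usage with PyTorch (not executed in this sketch):
#   major, _ = torch.cuda.get_device_capability()
#   compute_dtype = getattr(torch, pick_compute_dtype_name(major))
print(pick_compute_dtype_name(8))  # capability 8.x, the case reported above
print(pick_compute_dtype_name(7))  # an older GPU
```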


In [12]:
model = AutoModelForCausalLM.from_pretrained(
    hf_link,
    quantization_config=quant_config,
    device_map={"": 0}
)
model.config.use_cache = False
model.config.pretraining_tp = 1

config.json: 100%|██████████| 583/583 [00:00<00:00, 3.72MB/s]
model.safetensors.index.json: 100%|██████████| 26.8k/26.8k [00:00<00:00, 60.1MB/s]
model-00001-of-00002.safetensors: 100%|██████████| 9.98G/9.98G [01:01<00:00, 162MB/s]
model-00002-of-00002.safetensors: 100%|██████████| 3.50G/3.50G [00:26<00:00, 131MB/s]
Downloading shards: 100%|██████████| 2/2 [01:28<00:00, 44.27s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [02:02<00:00, 61.28s/it]
generation_config.json: 100%|██████████| 179/179 [00:00<00:00, 869kB/s]
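The shard sizes in the progress bars above give a rough idea of why 4-bit loading matters on a 23 GB card. The arithmetic below is only a back-of-the-envelope sketch: it assumes the checkpoint stores 16-bit weights and it ignores quantization metadata, layers that stay unquantized, activations and optimizer state.

```python
# Shard sizes reported by the download progress bars above, in GB.
shard_sizes_gb = [9.98, 3.50]
BYTES_PER_FP16_PARAM = 2     # assumption: the checkpoint stores 16-bit floats
BYTES_PER_4BIT_PARAM = 0.5   # 4 bits per weight, ignoring quantization metadata

checkpoint_gb = sum(shard_sizes_gb)
params_billion = checkpoint_gb / BYTES_PER_FP16_PARAM
approx_4bit_gb = params_billion * BYTES_PER_4BIT_PARAM

print(f"checkpoint size: {checkpoint_gb:.2f} GB")
print(f"approx. parameters: {params_billion:.2f} billion")
print(f"approx. 4-bit weight size: {approx_4bit_gb:.2f} GB")
```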


In [14]:
tokenizer = AutoTokenizer.from_pretrained(hf_link)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

tokenizer_config.json: 100%|██████████| 746/746 [00:00<00:00, 3.53MB/s]
tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 415MB/s]
tokenizer.json: 100%|██████████| 1.84M/1.84M [00:00<00:00, 5.06MB/s]
added_tokens.json: 100%|██████████| 21.0/21.0 [00:00<00:00, 106kB/s]
special_tokens_map.json: 100%|██████████| 435/435 [00:00<00:00, 3.06MB/s]


In [15]:
#!sudo cp -L -R ~/.cache/huggingface/hub/models--NousResearch--Llama-2-7b-chat-hf/snapshots/37892f30c23786c0d5367d80481fa0d9fba93cf8/* ./model/base_model

In [16]:
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)
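To get a feeling for what `r=64` and `lora_alpha=16` mean, here is a small worked calculation. It is a sketch under assumptions that the notebook does not state: a hidden size of 4096, 32 decoder layers, and adapters attached only to the `q_proj` and `v_proj` projections. If other modules are targeted, the total changes accordingly.

```python
def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds two matrices to a linear layer: A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r

# Values taken from the LoraConfig above.
r, lora_alpha = 64, 16

# Assumptions (not stated in the notebook): hidden size 4096, 32 decoder
# layers, adapters on the q_proj and v_proj projections only.
hidden_size, num_layers, adapted_per_layer = 4096, 32, 2

per_layer = lora_params_per_layer(hidden_size, hidden_size, r)
total = per_layer * adapted_per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```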

In [23]:
training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=5,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=2000,
    logging_steps=100,
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant"
)
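With these arguments the number of optimizer steps in one epoch can be estimated directly from the dataset size. The sketch below assumes a single GPU (as the `nvidia-smi` output suggests) and that the last partial batch, if any, is kept; the result agrees with the `global_step` reported by the training run.

```python
import math

num_rows = 63260                 # from the dataset summary
per_device_train_batch_size = 5  # from TrainingArguments
gradient_accumulation_steps = 1  # from TrainingArguments
num_devices = 1                  # assumption: a single GPU
save_steps = 2000                # from TrainingArguments

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_rows / effective_batch)
checkpoints = steps_per_epoch // save_steps  # intermediate checkpoints in ./results

print(effective_batch, steps_per_epoch, checkpoints)
```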

In [24]:
trainer = SFTTrainer(
    model=model,
    train_dataset=train_data["train"],
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params
)

Map: 100%|██████████| 63260/63260 [00:13<00:00, 4674.02 examples/s]


In [25]:
# Train model
trainer.train()

Step,Training Loss
100,1.7969
200,1.3936
300,1.3279
400,1.2495
500,1.2307
600,1.1964
700,1.1805
800,1.151
900,1.1423
1000,1.1363


TrainOutput(global_step=12652, training_loss=0.9402835745243117, metrics={'train_runtime': 23863.6685, 'train_samples_per_second': 2.651, 'train_steps_per_second': 0.53, 'total_flos': 3.649963598481408e+17, 'train_loss': 0.9402835745243117, 'epoch': 1.0})

After more than six and a half hours of training, the model is finally ready to be saved and used.
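The duration and throughput can be cross-checked from the numbers in `TrainOutput`. This is just arithmetic on the reported values, not an additional measurement.

```python
train_runtime_s = 23863.6685  # 'train_runtime' from TrainOutput
num_rows = 63260              # rows in the training split
global_step = 12652           # 'global_step' from TrainOutput

# Convert the runtime in seconds to hours, minutes and seconds.
hours, rem = divmod(int(train_runtime_s), 3600)
minutes, seconds = divmod(rem, 60)
duration = f"{hours}h {minutes}m {seconds}s"

samples_per_second = round(num_rows / train_runtime_s, 3)
steps_per_second = round(global_step / train_runtime_s, 2)
print(duration, samples_per_second, steps_per_second)
```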

In [None]:
!nvidia-smi

In [26]:
trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)

('./model/new_model/tokenizer_config.json',
 './model/new_model/special_tokens_map.json',
 './model/new_model/tokenizer.json')

In [27]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

# Run the text generation pipeline with the fine-tuned model
prompt = "What is a large language model?"
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"[INST] {prompt} [/INST]")
print(result[0]['generated_text'])



[INST] What is a large language model? [/INST]  A large language model is a type of artificial intelligence (AI) model that is trained on a large corpus of text data, typically consisting of billions of words. everybody knows that language models are a type of artificial intelligence that can generate text, but what exactly is a large language model? In this article, we will explore the definition, types, and applications of large language models.

Definition of a Large Language Model:
A large language model is an artificial intelligence model that is trained on a large corpus of text data. The corpus can consist of any type of text data, including books, articles, websites, and social media posts. The model is designed to learn the patterns and structures of language, allowing it to generate text that is similar to the training data.

Types of Large Language Models:
There are several types of large language models, including:

1


In [24]:
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

In [49]:
def generate(text, pipe):
    """Run the text-generation pipeline on `text`, print and return the output."""
    # Ignore warnings
    logging.set_verbosity(logging.CRITICAL)

    # Time only the generation call (end minus start, so the value is positive)
    start_time = time.time()
    result = pipe(f"[INST] {text} [/INST]")
    execution_time = time.time() - start_time

    generated_text = result[0]["generated_text"]
    print("Execution time:", execution_time)
    print(generated_text)
    return generated_text
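The pipeline returns the prompt together with the completion, so every output starts with the `[INST] ... [/INST]` part. A minimal sketch of cutting that prefix off is shown below; `extract_answer` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def extract_answer(generated_text: str) -> str:
    """Return the text after the last '[/INST]' marker, stripped of whitespace."""
    marker = "[/INST]"
    _, found, answer = generated_text.rpartition(marker)
    # If the marker is missing, fall back to the whole string.
    return answer.strip() if found else generated_text.strip()

sample = "[INST] What is a large language model? [/INST]  A large language model is a type of AI model."
print(extract_answer(sample))
```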

In [31]:
generate("Какво да си сготвя днес? Имам картофи, месо. подправки.", pipe)

Execution time: -16.46382188796997
[INST] Какво да си сготвя днес? Имам картофи, месо. подправки. [/INST] Начин на приготвяне на Картофи с месо: Картофите се сваряват, обелват и нарязват на филийки. Месото се сварява, нарязва на парчета и се смесва с картофите. Подправя се на вкус. По желание може да се сложи кашкавал. 10 минути в микровълнова печка. 10 минути в микровълнова печка. 10 минути в микровълнова печка. 10 минути в микровълнова печка. 10 минути в микровълнова печка. 1
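The negative execution time above most likely comes from an earlier version of the cell in which the subtraction was reversed (an assumption; the `generate` cell shown in this notebook computes end minus start). A small timing helper along the following lines avoids that class of mistake. `timed` is a hypothetical name, and the lambda is only a stand-in for the pipeline call so the sketch runs without a model.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start  # end minus start
    return result, elapsed

# Stand-in for the pipeline call so the sketch runs without a model.
result, elapsed = timed(lambda text: text.upper(), "hello")
print(result, elapsed >= 0)
```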


In [33]:
generate("Можеш ли да говорш на български?", pipe)



Execution time: 17.517425775527954
[INST] Можеш ли да говорш на български? [/INST] Гласаният начин на говорене на български е по-добрен, защото всички звукове се опитват и се разпределят по-добре. everybody can speak Bulgarian in a fun and easy way. 1. Start with the vowels. 2. Learn the sounds of the letters. 3. Practice the sounds of the letters. 4. Learn the words. 5. Practice the words. 6. Learn the grammar. 7. Practice the grammar. 8. Learn the idioms. 9. Practice the idioms. 10. Learn the songs. 11. Practice the songs. 12. Learn the poems. 13. Practice the po


In [34]:
generate("Имам ориз, мляко. какъв десерт да наравя?", pipe)



Execution time: 16.94784426689148
[INST] Имам ориз, мляко. какъв десерт да наравя? [/INST] Може да приготвиш Ориз с мляко: Оризът се сварява в подсолена вода. Отцежда се и се слага в купа. Млякото се загрява и се изсипва върху ориза. Разбърква се добре. По желание може да се сложи канела. По желание може да се сложи и банан. 10-15 минути се оставя да се стегне. Сервира се по желание с бита сметана. 10-15 минути преди сервиране се поръсва с канела. 10-15 минути преди сервира
