# AKI Seminar2 Demo
## Finetuning of LLaMA




by Syon Kadkade


Table of contents

> [Install packages](#install)   
> [Import libraries](#imports)   


**Resources**:
- [Meta AI Paper: LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

In [1]:
!nvidia-smi

Fri Jan 19 10:20:37 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   37C    P0              26W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

--------------
<a id="install"></a>
### Install packages

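**Description**:   
Install the packages used in this notebook: `accelerate`, `bitsandbytes`, `datasets`, `gradio`, `sentencepiece`, and `peft` (installed from its GitHub repository).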

In [1]:
!pip install accelerate --quiet
!pip install bitsandbytes --quiet
!pip install datasets --quiet
!pip install gradio --quiet
!pip install git+https://github.com/huggingface/peft.git --quiet

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [1]:
!pip install sentencepiece
!pip install accelerate --quiet

Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99

---------------
<a id="imports"></a>
### Import libraries
**Description**:   
Load all necessary libraries.

In [2]:
import numpy as np
import pandas as pd
import torch
import transformers
import gradio as gr
from datasets import load_dataset
from peft import LoraConfig, TaskType
from peft import get_peft_model
from accelerate import Accelerator
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorWithPadding,
    DataCollatorForSeq2Seq,
    GenerationConfig,
    TrainingArguments,
    Trainer,
)

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
DEVICE

'cuda:0'

-----------------
### Load LLaMA-7B-Model

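**Description**:  
This section collects resources on LLaMA-7B and on where to obtain its weights. The checkpoint itself is loaded further down, in the section that prepares the Trainer.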

**Resources**:
- [Tutorial](https://www.youtube.com/watch?v=t68IV5t5UOA)
- [Hugging Face: Transformer Tutorial](https://huggingface.co/learn/nlp-course/chapter2/2?fw=pt)
- [Hugging Face: LLaMA-7B-Model](https://huggingface.co/docs/transformers/main/model_doc/llama)
- [Hugging Face: LLaMA weights](https://huggingface.co/luodian/llama-7b-hf)
- [Hugging Face: 7B Weights](https://huggingface.co/huggyllama/llama-7b)

**Note**: I use a community-hosted checkpoint on the Hugging Face Hub that already contains the weights, and load those weights into the actual LLaMA model. Normally you have to request the weights from Meta AI by filling out a form. I have filled it out several times but have not received a response.

----------------
### Load Alpaca dataset
**Resources**:
- [HuggingFace: datasets tutorial](https://huggingface.co/docs/datasets/tutorial)
- [HuggingFace: vicgalle/alpaca-gpt4 dataset](https://huggingface.co/datasets/vicgalle/alpaca-gpt4)

----------------
### Prepare Trainer for finetuning
Finetuning can be done either with the Hugging Face `Trainer` API or with a custom training pipeline. This notebook uses the `Trainer` API.

In [5]:
#MODEL_NAME= "TheBloke/Llama-2-7B-GPTQ"
#MODEL_NAME = 'huggyllama/llama-7b'
MODEL_NAME = "Enoch/llama-7b-hf"
#MODEL_NAME = "baffo32/decapoda-research-llama-7B-hf"
#PEFT_MODEL_NAME = "tloen/alpaca-lora-7b"
#PEFT_MODEL_NAME = "dominguesm/alpaca-lora-ptbr-7b"
#PEFT_MODEL_NAME = "Eterna2/alpaca-lora-7b"

In [6]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token_id = 0  # LLaMA defines no pad token; reuse token id 0 (<unk>)
tokenizer.padding_side = "left"

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [7]:
dataset = load_dataset("imdb")
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [8]:
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

In [9]:
tokenized_imdb = dataset.map(preprocess_function, batched=True)

In [10]:
tokenized_imdb_train = tokenized_imdb["train"].select(range(10))
tokenized_imdb_train

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 10
})

In [11]:
tokenized_imdb_test = tokenized_imdb["test"].select(range(10))
tokenized_imdb_test

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 10
})

In [12]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
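
To make the collator's job concrete, here is a minimal pure-Python sketch of dynamic padding as I understand it: each batch is padded only up to its longest sequence, on the left (because `tokenizer.padding_side = "left"`) and with pad id 0. The helper `pad_batch_left` and the token ids are hypothetical and only illustrate the idea. The real collator delegates to the tokenizer and returns tensors.

```python
from typing import Dict, List

PAD_TOKEN_ID = 0  # same value as tokenizer.pad_token_id set above


def pad_batch_left(batch: List[List[int]], pad_id: int = PAD_TOKEN_ID) -> Dict[str, List[List[int]]]:
    """Left-pad token id lists to the longest sequence in the batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append([pad_id] * n_pad + ids)
        # 0 marks padding positions, 1 marks real tokens
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical token ids, chosen only for illustration.
example = pad_batch_left([[1, 306, 5360], [1, 910]])
print(example["input_ids"])       # [[1, 306, 5360], [0, 1, 910]]
print(example["attention_mask"])  # [[1, 1, 1], [0, 1, 1]]
```

With `per_device_train_batch_size=1`, as used later in this notebook, every batch holds a single sequence, so no padding is actually added during training.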

In [13]:
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}

In [14]:
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)


Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at Enoch/llama-7b-hf and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:
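lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # the update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # attention query and value projections
    lora_dropout=0.05,
    bias="none",                          # bias terms stay frozen
    # PEFT also offers TaskType.SEQ_CLS, which matches the classification head
    # loaded above and keeps that head trainable.
    task_type=TaskType.CAUSAL_LM,
)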
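
As a reminder of what `r` and `lora_alpha` control, the standard LoRA formulation for one adapted projection can be written as below. The symbols $W_0$, $A$, $B$, $d$ and $k$ are my notation, not names from this notebook:

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}
$$

Here $W_0$ is the frozen pretrained weight and only $A$ and $B$ are trained. With `r=8` and `lora_alpha=16`, the scaling factor is $\alpha / r = 2$.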

In [16]:
lora_model = get_peft_model(model=model, peft_config=lora_config)
lora_model.print_trainable_parameters()

trainable params: 4,194,304 || all params: 6,611,546,112 || trainable%: 0.06343907958816639
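
The trainable parameter count can be reproduced by hand. The sketch below assumes the usual LLaMA-7B dimensions (32 decoder layers, hidden size 4096), which are not printed in this notebook, and the hypothetical helper `lora_trainable_params` counts one `A` and one `B` matrix per targeted projection.

```python
# Assumed LLaMA-7B dimensions (not printed in this notebook).
NUM_LAYERS = 32
HIDDEN_SIZE = 4096

LORA_RANK = 8                          # r in LoraConfig
TARGET_MODULES = ["q_proj", "v_proj"]  # target_modules in LoraConfig


def lora_trainable_params(num_layers, d_in, d_out, rank, num_targets):
    """Count LoRA parameters: A is (rank x d_in) and B is (d_out x rank) per target module."""
    per_module = rank * d_in + d_out * rank
    return num_layers * num_targets * per_module


trainable = lora_trainable_params(
    NUM_LAYERS, HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK, len(TARGET_MODULES)
)
all_params = 6_611_546_112  # "all params" from the output above

print(f"{trainable:,}")                       # 4,194,304
print(f"{100 * trainable / all_params:.4f}")  # 0.0634
```

Both numbers agree with the output of `print_trainable_parameters()` above: only about 0.06% of the weights are updated during finetuning.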


In [17]:
lora_model.config.use_cache = False
#lora_model.config.quantization_config.to_dict()

In [18]:
# gradient_accumulation_steps = batch_size // micro_batch_size
"""
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps= (128 // 4),
    warmup_steps=100,
    max_steps=300,
    learning_rate=3e-4,
    logging_steps=10,
    optim="adamw_torch",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    eval_steps=50,
    save_steps=50,
    output_dir="./content/experiments",
    save_total_limit=3,
    load_best_model_at_end=True
)
"""


In [22]:
training_args = TrainingArguments(
    learning_rate=3e-4,
    gradient_accumulation_steps=20,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    fp16=False,
    evaluation_strategy="steps",
    save_strategy="steps",
    optim="adafactor",
    output_dir="./content/experiments",
    load_best_model_at_end=True
)
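
The relation between the per-device batch size and `gradient_accumulation_steps` can be checked with a little arithmetic. This is a sketch under my own assumptions: the helper names are hypothetical, a single GPU is assumed, and the steps-per-epoch rule (floor division with a minimum of one step) is my reading of how the `Trainer` behaves, not something this notebook states.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def optimizer_steps_per_epoch(num_samples, per_device_batch_size, gradient_accumulation_steps):
    """Assumption: floor division over the batches, with at least one step per epoch."""
    num_batches = -(-num_samples // per_device_batch_size)  # ceiling division
    return max(num_batches // gradient_accumulation_steps, 1)


print(effective_batch_size(1, 20))           # 20  (arguments used in this cell)
print(effective_batch_size(4, 128 // 4))     # 128 (commented-out arguments above)
print(optimizer_steps_per_epoch(10, 1, 20))  # 1
```

With only 10 training samples and an effective batch size of 20, one epoch amounts to a single optimizer step, which is consistent with `global_step=1` reported by `trainer.train()` in the next section.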

In [23]:
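# get_peft_model injects the LoRA layers into `model` in place,
# so passing `model` here trains the adapter weights.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_imdb_train,
    eval_dataset=tokenized_imdb_test,
    data_collator=data_collator
)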

In [27]:
torch.cuda.empty_cache()

----------------
### Finetune Model

In [24]:
trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss


TrainOutput(global_step=1, training_loss=0.1618174910545349, metrics={'train_runtime': 7.1749, 'train_samples_per_second': 1.394, 'train_steps_per_second': 0.139, 'total_flos': 132551617486848.0, 'train_loss': 0.1618174910545349, 'epoch': 1.0})

In [25]:
model.save_pretrained("/Content/llama_7b")

In [29]:
trained_model = AutoModelForSequenceClassification.from_pretrained("/Content/llama_7b")

Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

Some weights of the model checkpoint at /Content/llama_7b were not used when initializing LlamaForSequenceClassification: ['model.layers.5.self_attn.v_proj.lora_B.default.weight', 'model.layers.28.self_attn.v_proj.base_layer.weight', 'model.layers.22.self_attn.q_proj.lora_B.default.weight', 'model.layers.16.self_attn.q_proj.lora_B.default.weight', 'model.layers.3.self_attn.q_proj.lora_B.default.weight', 'model.layers.0.self_attn.v_proj.lora_A.default.weight', 'model.layers.26.self_attn.q_proj.base_layer.weight', 'model.layers.9.self_attn.v_proj.lora_B.default.weight', 'model.layers.14.self_attn.v_proj.lora_A.default.weight', 'model.layers.28.self_attn.q_proj.lora_B.default.weight', 'model.layers.21.self_attn.v_proj.lora_B.default.weight', 'model.layers.4.self_attn.q_proj.lora_B.default.weight', 'model.layers.13.self_attn.v_proj.base_layer.weight', 'model.layers.22.self_attn.q_proj.base_layer.weight', 'model.layers.3.self_attn.v_proj.lora_A.default.weight', 'model.layers.2.self_attn.v_p

----------------
### Example Prompting

In [58]:
text = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."

In [62]:
inputs = tokenizer(text, return_tensors="pt")

In [63]:
with torch.no_grad():
  output = trained_model(**inputs).logits


In [65]:
predicted_class_id = output.argmax().item()
trained_model.config.id2label[predicted_class_id]

'POSITIVE'
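
To show what happens between the logits and the label, here is a small pure-Python sketch. The logits are hypothetical, since the notebook does not print the real values, and `softmax` and `classify` are illustrative helpers rather than part of any library. Taking the argmax of the logits gives the same class as taking the argmax of the softmax probabilities; the softmax only adds a confidence value.

```python
import math

ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}  # same mapping as id2label above


def softmax(logits):
    """Convert raw logits into probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return the label with the highest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for a single review.
label, confidence = classify([-1.2, 2.3])
print(label)  # POSITIVE
```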

-----------
### Implement User Interface via Gradio Library

In [73]:
def user_interface(message, history):
    """Classify the sentiment of a chat message with `trained_model`."""
    inputs = tokenizer(message, return_tensors="pt")
    with torch.no_grad():
        output = trained_model(**inputs).logits

    predicted_class_id = output.argmax().item()
    sentiment = trained_model.config.id2label[predicted_class_id]
    return f'LLaMA-7B-Model says: {sentiment}'


gr.ChatInterface(fn=user_interface).launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://c69d28fefcdb12170d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


