**Alpaca-350M Fine-Tuned**
1. Stanford Alpaca's Training Recipe
2. 350M Parameters (Smaller Model)
3. LoRA fine-tuning, which cuts the number of trainable parameters and the compute needed
4. The PEFT (Parameter-Efficient Fine-Tuning) library from Hugging Face, used to apply LoRA
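
As a rough illustration of the LoRA idea in the list above: the base weight `W` stays frozen and only two small matrices `A` and `B` are trained, so the layer effectively applies `W + (alpha / r) * B @ A`. The sketch below uses plain Python lists and tiny, made-up shapes; the helper names `matmul` and `lora_effective_weight` are hypothetical and are not part of PEFT.

```python
# Minimal LoRA sketch in plain Python, for illustration only.
# Shapes and values are made up; real layers are far larger.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A.

    w: frozen base weight, shape (out, in)
    a: trainable matrix, shape (r, in)
    b: trainable matrix, shape (out, r)
    """
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 base weight
a = [[1.0, 2.0, 3.0]]                   # rank r=1, in=3
b = [[0.5], [0.0]]                      # out=2, rank r=1
w_eff = lora_effective_weight(w, a, b, alpha=2, r=1)
print(w_eff)
```

Only `a` and `b` would receive gradients during fine-tuning, which is why the approach fits on a smaller GPU.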

In [None]:
## Building this Colab notebook on Eric Wang's alpaca-lora, a recreation of Alpaca using LoRA.
!git clone https://github.com/tloen/alpaca-lora.git
%cd alpaca-lora/

Cloning into 'alpaca-lora'...
remote: Enumerating objects: 607, done.
remote: Counting objects: 100% (51/51), done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 607 (delta 28), reused 33 (delta 19), pack-reused 556
Receiving objects: 100% (607/607), 27.78 MiB | 6.03 MiB/s, done.
Resolving deltas: 100% (360/360), done.
/content/alpaca-lora/alpaca-lora


In [None]:
## Installing dependencies
!pip install bitsandbytes
!pip install GPUtil
!pip install -q datasets loralib sentencepiece
!pip install -q git+https://github.com/zphang/transformers@c3dc391
!pip install -q git+https://github.com/huggingface/peft.git
!pip install torch

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [None]:
## Checking Dataset
from datasets import load_dataset
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("RootYuan/opt-350m-alpaca", add_eos_token=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

data = load_dataset("json", data_files="alpaca_data.json")


def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response: """
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

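# Pass the record's fields to generate_prompt rather than the whole record.
data = data.map(lambda data_point: {"prompt": tokenizer(generate_prompt(data_point["instruction"], data_point["input"]))})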

Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-5c6f749ce3cda98b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-5c6f749ce3cda98b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

Map:   0%|          | 0/52002 [00:00<?, ? examples/s]

Token indices sequence length is longer than the specified maximum sequence length for this model (546 > 512). Running this sequence through the model will result in indexing errors
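
The warning above shows that at least one prompt (546 tokens) exceeds the 512-token limit. One way to choose a cutoff is to measure what share of prompts fit under it. The sketch below is a minimal, framework-free version of that check; `fraction_within_cutoff` is a hypothetical helper and the token counts are made-up placeholders, not measurements from this dataset.

```python
def fraction_within_cutoff(lengths, cutoff):
    """Share of sequences whose token count is at most `cutoff`."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)


# Made-up token counts. In the notebook they could be collected with
# len(tokenizer(prompt)["input_ids"]) for each example.
example_lengths = [48, 97, 120, 256, 301, 546]

for cutoff in (256, 512):
    share = fraction_within_cutoff(example_lengths, cutoff)
    print(f"cutoff={cutoff}: {share:.1%} of prompts fit")
```

Running the same function over the real token counts would show how much data a given `CUTOFF_LEN` truncates.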


In [None]:
## Fine-tuning process
import os
import torch
import torch.nn as nn
from datasets import load_dataset
import bitsandbytes as bnb
import transformers
# The OPT checkpoint loads through the Auto* classes; the LLaMA classes are not needed here.
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
from peft import get_peft_model, prepare_model_for_int8_training, LoraConfig

MICRO_BATCH_SIZE = 4 # 4 works with a smaller GPU
BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 2 # Stanford's Alpaca uses 3
LEARNING_RATE = 2e-5 # Stanford's Alpaca uses 2e-5
CUTOFF_LEN = 256 # Stanford's Alpaca uses 512, but 256 accounts for 96% of the data and runs far quicker
LORA_R = 4
LORA_ALPHA = 16
LORA_DROPOUT = 0.05

model = AutoModelForCausalLM.from_pretrained(
    "RootYuan/opt-350m-alpaca",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "RootYuan/opt-350m-alpaca", add_eos_token=True
)
model = prepare_model_for_int8_training(model)


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)


Downloading (…)lve/main/config.json:   0%|          | 0.00/749 [00:00<?, ?B/s]



Downloading pytorch_model.bin:   0%|          | 0.00/1.43G [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]
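
The batch settings in the cell above interact: each optimizer update sees `MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS` examples. The arithmetic below is a rough sketch of the resulting step counts, assuming a single GPU and that an incomplete accumulation window at the end of an epoch is dropped; the exact count the `Trainer` reports can differ by library version. `NUM_EXAMPLES` is taken from the `Map` progress bar (52002).

```python
import math

MICRO_BATCH_SIZE = 4
BATCH_SIZE = 32
EPOCHS = 2
NUM_EXAMPLES = 52002  # train split size shown by the Map progress bar

# Number of micro-batches whose gradients are summed before one update.
accumulation_steps = BATCH_SIZE // MICRO_BATCH_SIZE
effective_batch = MICRO_BATCH_SIZE * accumulation_steps

# Forward/backward passes per epoch (the last micro-batch may be partial).
micro_batches_per_epoch = math.ceil(NUM_EXAMPLES / MICRO_BATCH_SIZE)

# Assumption: a trailing, incomplete accumulation window is dropped.
updates_per_epoch = micro_batches_per_epoch // accumulation_steps
total_updates = updates_per_epoch * EPOCHS

print(accumulation_steps, effective_batch, micro_batches_per_epoch, total_updates)
```

Under these assumptions, `warmup_steps=50` covers only a small share of the run.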



In [None]:
config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
tokenizer.pad_token_id = 0
data = load_dataset("json", data_files="alpaca_data.json")



  0%|          | 0/1 [00:00<?, ?it/s]
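
To get a feel for how few parameters the LoRA configuration above trains, the count can be estimated by hand: each adapted linear layer adds `r * (in_features + out_features)` weights. The sketch below assumes a hidden size of 1024 and 24 decoder layers for the 350M OPT model; these figures are assumptions, not values read from this checkpoint, so verify them against `model.config` before relying on the result.

```python
def lora_param_count(in_features, out_features, r):
    """Trainable weights LoRA adds to one linear layer: A is (r, in), B is (out, r)."""
    return r * (in_features + out_features)


# Assumed architecture values for the 350M OPT model (check model.config).
HIDDEN_SIZE = 1024
NUM_LAYERS = 24
TARGETS_PER_LAYER = 2  # q_proj and v_proj, as in target_modules
LORA_R = 4

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)
total_trainable = per_module * TARGETS_PER_LAYER * NUM_LAYERS

print(f"per module: {per_module}, total: {total_trainable}")
```

In the notebook itself, `model.print_trainable_parameters()` on the PEFT-wrapped model reports the actual figure.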

In [9]:
def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response: """
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""

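data = data.shuffle().map(
    lambda data_point: tokenizer(
        # Build the prompt from the record's fields and append the target
        # response so the model is trained on the answer, not just the prompt.
        generate_prompt(data_point["instruction"], data_point["input"]).rstrip()
        + "\n"
        + data_point["output"],
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
)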

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=MICRO_BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        warmup_steps=50,
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        fp16=True,
        logging_steps=1,
        output_dir="lora-alpaca",
        save_total_limit=3,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train(resume_from_checkpoint=False)

model.save_pretrained("lora-alpaca")

Map:   0%|          | 0/52002 [00:00<?, ? examples/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
1,3.2618
2,3.0394
3,3.2325
4,3.1498
5,3.2146
6,3.0682
7,3.2146
8,3.012
9,3.1832
10,3.1566
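
With `logging_steps=1` the logged loss is noisy from step to step, so a smoothed view is easier to read. The sketch below applies a simple moving average to the ten values logged above and converts the mean loss to a perplexity, assuming the loss is the mean token-level cross-entropy in nats. `moving_average` is a hypothetical helper, not part of `transformers`.

```python
import math

# Training loss for steps 1-10, copied from the log above.
losses = [3.2618, 3.0394, 3.2325, 3.1498, 3.2146, 3.0682, 3.2146, 3.012, 3.1832, 3.1566]


def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


smoothed = moving_average(losses, window=5)
mean_loss = sum(losses) / len(losses)
# Assumption: loss is mean cross-entropy in nats, so exp(loss) is perplexity.
perplexity = math.exp(mean_loss)

print([round(v, 4) for v in smoothed])
print(round(mean_loss, 4), round(perplexity, 2))
```

Ten steps are too few to judge convergence; the same helper can be run over the full log once training finishes.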




In [None]:
## Push Model to HuggingFace
from huggingface_hub import notebook_login

notebook_login()

# Change the repository name below to push the model to your own Hugging Face account
model.push_to_hub("RyanAir/Alpaca-350M-Fine-Tuned", use_auth_token=True)

In [None]:
## Generation Process

from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# The base checkpoint is an OPT model, so load it through the Auto* classes
# rather than the LLaMA-specific ones.
tokenizer = AutoTokenizer.from_pretrained("RootYuan/opt-350m-alpaca")

model = AutoModelForCausalLM.from_pretrained(
    "RootYuan/opt-350m-alpaca",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "RyanAir/Alpaca-350M-Fine-Tuned")

# Prompt can be edited as per requirement
# Blank lines between sections match the template used during fine-tuning.
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a poem as an Alpaca.

### Response:"""

inputs = tokenizer(
    PROMPT,
    return_tensors="pt",
)
input_ids = inputs["input_ids"].cuda()

generation_config = GenerationConfig(
    do_sample=True,  # temperature and top_p only take effect when sampling is enabled
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.15,
)
print("Generating...")
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=128,
)
for s in generation_output.sequences:
    print(tokenizer.decode(s))

In [None]:
%%time
# %%time is a cell magic and must be the first line of the cell.
PROMPT = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write on the purpose of an Alpaca

### Response:
'''

inputs = tokenizer(
    PROMPT,
    return_tensors="pt",
)
input_ids = inputs["input_ids"].cuda()

generation_config = GenerationConfig(
    do_sample=True,  # temperature and top_p only take effect when sampling is enabled
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.15,
)
print("Generating...")
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=128,
)
for s in generation_output.sequences:
    print(tokenizer.decode(s))
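
The decoded sequence printed above contains the full prompt followed by the model's answer. A small post-processing step can keep only the answer. The sketch below splits on the `### Response:` marker from the prompt template; `extract_response` is a hypothetical helper, and the `</s>` end-of-sequence string is an assumption about this tokenizer's special tokens.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded, eos_token="</s>"):
    """Return the text after the first response marker, without EOS tokens.

    If the marker is missing, the whole decoded string is returned stripped.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        return decoded.strip()
    return tail.replace(eos_token, "").strip()


# Made-up decoded output, shaped like the prompt template used above.
decoded_example = (
    "</s>Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite on the purpose of an Alpaca\n\n"
    "### Response:\nAlpacas are kept mainly for their fleece.</s>"
)
print(extract_response(decoded_example))
```

In the notebook this could be applied as `extract_response(tokenizer.decode(s))` inside the final loop.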