<h1 align="center">Efficient Llama Training 101 (Part 3): A llama that obeys your instructions</h1>

---
<div align="center">

![](https://img.shields.io/badge/build-passing-green.svg)
![](https://img.shields.io/badge/transformers-4.28.0-green.svg)
![](https://img.shields.io/badge/version-1.1-blue.svg)
![](https://img.shields.io/badge/python-%203.8%20|%203.9-blue.svg)

</div>

This notebook demonstrates efficient Llama tuning using LoRA and meta-learning techniques.
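LoRA (Low-Rank Adaptation) keeps each pretrained weight matrix `W` frozen and trains only a low-rank update: a layer `y = Wx` becomes `y = Wx + (alpha / r) * B A x`, where `A` is `r x d_in` and `B` is `d_out x r`. PEFT's LoRA layers use the same `lora_alpha / r` scaling. The numpy sketch below uses toy sizes (not Llama's real dimensions) and a hypothetical `lora_linear` helper to show two properties: with `B` initialized to zero the adapter starts as a no-op, and after training the update can be merged back into a single matrix. For scale, a 4096 x 4096 projection (Llama-7B's hidden size) with an illustrative `r = 8` trains 8 * (4096 + 4096) = 65,536 parameters instead of 16,777,216.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha, r):
    """Linear layer with a LoRA adapter: x W^T + (alpha / r) * x A^T B^T.

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen pretrained weight, never updated
    A: (r, d_in) trainable down-projection
    B: (d_out, r) trainable up-projection
    """
    scaling = alpha / r
    return x @ W.T + scaling * (x @ A.T) @ B.T

# toy sizes for illustration only
d_in, d_out, r, alpha = 16, 8, 4, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B_init = np.zeros((d_out, r))  # B starts at zero, so the adapter is a no-op at init
x = rng.standard_normal((2, d_in))

y_init = lora_linear(x, W, A, B_init, alpha, r)

# pretend B has been trained; the low-rank update can then be merged into W
B_trained = rng.standard_normal((d_out, r))
W_merged = W + (alpha / r) * B_trained @ A
y_trained = lora_linear(x, W, A, B_trained, alpha, r)

full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"trainable params: {lora_params} (LoRA) vs {full_params} (full fine-tuning)")
```

Merging is why a LoRA adapter adds no inference latency once folded in, while keeping it separate (as `PeftModel` does here) lets the same base model be reused with different adapters.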

#### Limitations:
1. The model was not trained on math calculations or summarization tasks, resulting in suboptimal performance in these areas. Instead, we incorporated title generation tasks during meta-training.
2. If performance is unsatisfactory, consider prompt engineering, such as utilizing ChatGPT to refine your prompts for better results.
3. This model is designed for zero-shot classification of marketing-related data; retraining may be necessary for optimal performance in your specific domain.

### 1. Setting Up the Environment
#### 1.1 Package installation

In [1]:
# uncomment these lines only if you need to install the packages

# !pip install bitsandbytes datasets loralib sentencepiece tqdm

# Llama support needs a recent transformers release (4.28.0, or the 4.28.0 dev build)
# !pip install git+https://github.com/huggingface/transformers.git
# If this does not work for you, try the original contributor's pull request (https://github.com/huggingface/transformers/pull/21955)

# for loading the parameter-efficient fine-tuning (LoRA) weights
# !pip install git+https://github.com/huggingface/peft.git

# PyTorch: choose the build that matches your hardware
# GPU version: use this if you have a GPU with CUDA set up on your machine
# !pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
# CPU version
# !pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu
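The badges at the top pin transformers 4.28.0. Before running the rest of the notebook, a minimal standard-library sketch like the one below can confirm what is installed. `parse_version` and `check_requirements` are hypothetical helpers; the parser keeps only the leading numeric components, so a dev build such as `4.28.0.dev0` is treated as `4.28.0`, and the `REQUIRED` minimum is an assumption taken from the badge.

```python
import sys
from importlib import metadata

def parse_version(version):
    """Keep the leading numeric components: '4.28.0.dev0' -> (4, 28, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # e.g. '1+cu117': stop after the numeric prefix
    return tuple(parts)

def check_requirements(minimums):
    """Map each package name to (installed_version or None, meets_minimum)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= parse_version(minimum))
    return report

# assumption: minimum taken from the transformers badge at the top of this notebook
REQUIRED = {"transformers": "4.28.0"}

print("python", sys.version.split()[0])
for pkg, (installed, ok) in check_requirements(REQUIRED).items():
    status = "ok" if ok else "missing or too old"
    print(f"{pkg}: {installed or 'not installed'} ({status})")
```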

#### 1.2 Import packages

In [2]:
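import torch
from transformers import GenerationConfig
from peft import PeftModel
from tqdm.notebook import tqdm

# Released transformers (>= 4.28.0) names the Llama classes LlamaTokenizer /
# LlamaForCausalLM, while the pre-merge PR branch (#21955) used LLaMATokenizer /
# LLaMAForCausalLM. Import whichever exists under the names used in this notebook.
try:
    from transformers import LlamaTokenizer as LLaMATokenizer
    from transformers import LlamaForCausalLM as LLaMAForCausalLM
except ImportError:
    from transformers import LLaMATokenizer, LLaMAForCausalLM

# set up the device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# set up tqdm
# tqdm.pandas()

# if sentencepiece raises an error, try running on CPU or on a Linux machine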


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda117.so...




#### 1.3 Model location

In [3]:
# path to the base model; leave blank to download it from the Hugging Face Hub
llama_model_path = ""
if not llama_model_path:
    llama_model_path = "decapoda-research/llama-7b-hf"

# path to the auxiliary model (the fine-tuned LoRA weights loaded with PeftModel)
efficient_llama_model_path = ""

if not efficient_llama_model_path:
    raise ValueError("Please set efficient_llama_model_path to your auxiliary model")
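`PeftModel.from_pretrained` accepts either a local directory or a Hugging Face Hub repo id. A local adapter written by PEFT's `save_pretrained` contains `adapter_config.json` plus a weights file (`adapter_model.bin` in older PEFT releases, `adapter_model.safetensors` in newer ones). The hypothetical helper below is a sketch for catching a mistyped local path early; a path that fails the check may still be a valid Hub repo id.

```python
import os
import tempfile

ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.bin", "adapter_model.safetensors")

def looks_like_local_adapter(path):
    """True if `path` is a directory holding a PEFT adapter config and a weights file."""
    if not os.path.isdir(path):
        return False
    files = set(os.listdir(path))
    return ADAPTER_CONFIG in files and any(w in files for w in ADAPTER_WEIGHTS)

# usage: a throwaway directory that gains the two files one at a time
with tempfile.TemporaryDirectory() as demo_dir:
    empty_result = looks_like_local_adapter(demo_dir)
    for name in (ADAPTER_CONFIG, "adapter_model.bin"):
        open(os.path.join(demo_dir, name), "w").close()
    full_result = looks_like_local_adapter(demo_dir)

print(empty_result, full_result)
```

In the notebook you would call it on `efficient_llama_model_path` and, if it returns `False`, double-check whether you meant a local folder or a Hub repo id.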

### 2. Load Model and Set Up Templates

#### 2.1 Load the model

In [4]:
# load the Llama tokenizer
llama_tokenizer = LLaMATokenizer.from_pretrained(llama_model_path)

if device != "cpu":
    # GPU: load the base model quantized to 8-bit (requires bitsandbytes);
    # device_map="auto" places the layers on the available GPU(s)
    llama_model = LLaMAForCausalLM.from_pretrained(llama_model_path, load_in_8bit=True, torch_dtype=torch.float16, device_map="auto")
    # apply the fine-tuned LoRA weights on top of the base model
    llama_model = PeftModel.from_pretrained(llama_model, efficient_llama_model_path, torch_dtype=torch.float16)
else:
    # CPU: keep all weights on the CPU
    device_map = {"": device}
    # weights stay in float32 here; low_cpu_mem_usage=True lowers peak RAM while loading
    llama_model = LLaMAForCausalLM.from_pretrained(llama_model_path, device_map=device_map, low_cpu_mem_usage=True)
    # apply the fine-tuned LoRA weights on top of the base model
    llama_model = PeftModel.from_pretrained(llama_model, efficient_llama_model_path, device_map=device_map)

# inference only: disable dropout and other training-time behavior
llama_model = llama_model.eval()

Loading checkpoint shards: 100%|██████████| 33/33 [00:19<00:00,  1.71it/s]
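On GPU, `load_in_8bit=True` asks bitsandbytes to store the base model's linear-layer weights as int8 (the LLM.int8() method), roughly halving weight memory compared with float16. LLM.int8() scales values vector-wise with an absmax scheme and keeps outlier feature dimensions in float16; the simplified numpy sketch below shows only the absmax step, per row, with no outlier handling. The function names are hypothetical.

```python
import numpy as np

def absmax_quantize(W):
    """Row-wise absmax quantization of a float matrix to int8 (no outlier handling)."""
    # one scale per row: the largest |value| in the row maps to 127
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # all-zero rows: avoid dividing by zero
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to floats using the stored per-row scales."""
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
W_fp = rng.standard_normal((4, 8))
q, scale = absmax_quantize(W_fp)
W_deq = dequantize(q, scale)
print("max abs error:", np.abs(W_fp - W_deq).max())
```

Rounding to the nearest integer bounds each element's error by half a quantization step (`scale / 2`), which is why a single large outlier in a row hurts: it inflates `scale` for every other value in that row. LLM.int8() handles that case separately.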


#### 2.2 Human-instruction / Self-instruction template

This section shows how instructions are formatted, which you need to know in order to compose your own.

- General question instructions:
    Pose questions directly without any additional input.
    ```yaml
    Example without input:
        instruction: What is the capital of France?
        input: 
        output: The capital of France is Paris.
    ```

In [5]:
instruction_only_template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{_instruction}

### Response:
"""

- Task-specific instruction and input:
    For task-oriented questions, specify the task in the instruction and provide a sample in the input.
    ```yaml
    Example with input:
        instruction: Classify the following into animals, plants, and minerals
        input: Oak tree, copper ore, elephant
        output: Oak tree: Plant\n Copper ore: Mineral\n Elephant: Animal\n
    ```


In [6]:
instruction_and_input_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{_instruction}

### Input:
{_input}

### Response:
"""
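Instruction-tuning data in this instruction / input / output shape is commonly turned into training text by rendering the prompt with the same template used at inference and appending the target output after `### Response:`. The sketch below assumes that Alpaca-style convention; this notebook does not state exactly how the adapter's training data was formatted, and `build_training_example` is a hypothetical helper. Returning the prompt separately is useful because many training scripts exclude prompt tokens from the loss.

```python
# same wording as the two inference templates defined in this notebook
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_training_example(record):
    """Return (prompt, full_text) for one {instruction, input, output} record."""
    if record.get("input"):
        prompt = PROMPT_WITH_INPUT.format(instruction=record["instruction"], input=record["input"])
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    # the target response follows directly after "### Response:\n"
    return prompt, prompt + record["output"]

sample = {
    "instruction": "Classify the following into animals, plants, and minerals",
    "input": "Oak tree, copper ore, elephant",
    "output": "Oak tree: Plant\nCopper ore: Mineral\nElephant: Animal",
}
prompt, full_text = build_training_example(sample)
print(full_text)
```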

In [7]:
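def get_model_input(_instruction: str, _input: str) -> str:
    """This function generates the prompt to feed into the LLM.
    input:
        _instruction: string, the task description
        _input: string, optional context; if empty or None, the instruction-only template is used

    return:
        string, the formatted prompt
    """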
    if _input:
        return instruction_and_input_template.format(_instruction=_instruction, _input=_input)
    else:
        return instruction_only_template.format(_instruction=_instruction)

#### 2.3 Model Run Setup

In [8]:
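from typing import Optional

def run(_instruction: str,
        _input: Optional[str] = None,
        temperature: float = 0.1,
        top_p: float = 0.75,
        num_beams: int = 4,
        max_len: int = 256
) -> str:
    """This function runs the model and returns the decoded output.
    input:
        _instruction: string, the task description
        _input: string, optional context; None or "" selects the instruction-only template

        temperature: the value used to modulate the next-token probabilities (sampling only).
        top_p: if set to a float < 1, only the smallest set of most probable tokens whose probabilities add up to ``top_p`` or higher is kept for generation (sampling only).
        num_beams: number of beams for beam search; 1 means no beam search.
        max_len: maximum number of newly generated tokens

    return:
        string, the prompt followed by the generated response
    """
    # generation settings. do_sample defaults to False, so with num_beams > 1
    # generate() runs deterministic beam search and temperature/top_p have no
    # effect; add do_sample=True here if you want them to take effect.
    model_config = GenerationConfig(
        temperature=temperature,
        top_p=top_p,
        num_beams=num_beams
    )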
    
    # get the instruction input
    model_input = get_model_input(_instruction, _input)
    
    # tokenized input
    model_input = llama_tokenizer(model_input, return_tensors="pt")
    model_input_ids = model_input["input_ids"].to(device)

    # infer only, do not compute gradient
    with torch.no_grad():
        model_output = llama_model.generate(
            input_ids=model_input_ids,
            generation_config=model_config,
            max_new_tokens=max_len,
            return_dict_in_generate=True,
            output_scores=True
        )
    
    return llama_tokenizer.decode(model_output.sequences[0])
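The `temperature` and `top_p` parameters in `run()` only matter when the model samples. The numpy sketch below follows the standard definition of temperature plus nucleus (top-p) sampling for a single step; `sample_next_token` is a hypothetical helper, and the Hugging Face logits warpers differ in edge-case details (for example a minimum number of tokens to keep).

```python
import numpy as np

def sample_next_token(logits, temperature=0.1, top_p=0.75, rng=None):
    """Pick one token id using temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = np.random.default_rng()
    # temperature < 1 sharpens the distribution, > 1 flattens it
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    # keep the smallest set of most probable tokens whose mass reaches top_p
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))

# usage: with the notebook's defaults (temperature=0.1, top_p=0.75) the top token
# dominates so strongly that the nucleus contains only that token
rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.5, -1.0]
low_t = [sample_next_token(logits, temperature=0.1, top_p=0.75, rng=rng) for _ in range(10)]
print(low_t)
```

With values this low, sampling behaves almost like greedy decoding, so switching `run()` to `do_sample=True` would mainly add variety if you also raise `temperature` or `top_p`.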

### 3. Run Model - Zero Shot!

In [12]:
# Understand domain-specific words (e.g. the BHPH acronym)
_instruction = "We car dealership. Generate a marketing email to our customer and give them a 20 off for new year. Also promot our BHPH loan plan."
_input = ""

# Disambiguation
# _instruction = "Explain how 'mac' is used differently in these sentences?"
# _input = "['I love big mac.', 'My mac is broken']"

# Classification
# _instruction = "Categorize the given sentence into the following categories.: Finance, Romantic, Retail, Food, and None of the above. Assign multiple categories if needed."
# _input = "What if we go to Macy's and grab some lunch at Chick-fil-A?"

# Audience targeting
# _instruction = "Should I target or consider the user who sent the following message as my audience for promoting our new laptop product? Explain why."
# _input = "I just got my salary. I'll just save it for future usage."
# _input = 'My macbook has just broken.'

# Life-stage inference
# _instruction = "What is the life stage of the user who sent the following messages? Explain why. Life Stages: in college, married, have a baby, new house."
# _input = "We need get the car seats for her."
# _input = 'I am a little bit nevers about going to Umass this summar.'

print(run(_instruction, _input))



Instruction:
We car dealership. Generate a marketing email to our customer and give them a 20 off for new year. Also promot our BHPH loan plan.

Input:
None



Response:
Dear Valued Customer,

Happy New Year! We hope you had a wonderful holiday season and are looking forward to a prosperous 2021. 

To celebrate the new year, we're offering 20% off on all new car purchases. We're also promoting our Buy Here, Pay Here (BHPH) loan plan, which allows you to pay for your car in affordable monthly installments. 

If you're interested in learning more about our BHPH loan plan or taking advantage of our 20% off offer, please don't hesitate to contact us. We look forward to hearing from you soon.

Sincerely,
[Your Name]
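`run()` decodes `model_output.sequences[0]`, which for a decoder-only model like Llama contains the prompt tokens followed by the generated ones, so the printed text repeats the whole prompt before the answer. A small post-processing helper can keep only the answer; `extract_response` is a hypothetical sketch that assumes the `### Response:` marker from the templates above and Llama's default `<s>` / `</s>` special-token strings.

```python
RESPONSE_MARKER = "### Response:"
NEXT_TURN_MARKER = "### Instruction:"

def extract_response(decoded: str) -> str:
    """Keep only the generated answer from a decoded prompt + response sequence."""
    _, found, answer = decoded.partition(RESPONSE_MARKER)
    if not found:
        answer = decoded  # no marker: return the text as-is (minus special tokens)
    # stop if the model started writing a new instruction block on its own
    answer = answer.split(NEXT_TURN_MARKER, 1)[0]
    # remove BOS/EOS strings that decode() keeps unless skip_special_tokens=True
    for special in ("<s>", "</s>"):
        answer = answer.replace(special, "")
    return answer.strip()

decoded = (
    "<s> Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n### Instruction:\nWhat is the capital of France?"
    "\n\n### Response:\nThe capital of France is Paris.</s>"
)
print(extract_response(decoded))
```

In the notebook this would wrap the existing call, e.g. `print(extract_response(run(_instruction, _input)))`.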

