<a href="https://colab.research.google.com/github/sudhang/css-nlp/blob/master/llama/LLaMa2_7B_QLORA_Generate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Make it pretty
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In this notebook, we will use LLaMa2 from meta, which was released very recently.  We have obtained access from meta, and will fine-tune it using QLORA.  We will do 4-bit quantization, enabling this 7Billion Parameter model to be trained on a free Google Colab

We rely a lot on the google colab notebooks and the tutorials provided by huggingface:  https://huggingface.co/blog/4bit-transformers-bitsandbytes

Apart form that, we used a number of tutorial blogs and even youtube videos:



1.   [Fine-tuning Alpaca and LLaMA: Training on a Custom Dataset](https://www.mlexpert.io/machine-learning/tutorials/alpaca-fine-tuning#user-content-fn-6)
2.   [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685.pdf)
3.   [How to Fine-Tune Open-Source LLMs Locally Using QLoRA!](https://youtu.be/2bkrL2ZcOiM)
4.   [QLORA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/pdf/2305.14314.pdf)
5. [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate)
6. [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)







### Installations

In [2]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install rouge

  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone


## FLAGS and PARAMS

In [28]:
GDRIVEPATH = "/content/drive/MyDrive/TU/Sem 4/NLP"

In [4]:
DEBUG = False
NUM_TO_GEN = 10

## Imports

To use the llama2 models from huggingface, we need to input an access token.

In [5]:
from huggingface_hub import login
login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [6]:
import pandas as pd

import torch
import transformers
from transformers import AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import BitsAndBytesConfig       # For quantization
from peft import prepare_model_for_kbit_training

from peft import LoraConfig                       # For LORA
from peft import get_peft_model

from datasets import Dataset, load_dataset, DatasetDict

## Load a previous model

In [7]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
import nltk
nltk.download('punkt')

adapter_model_id = "llama2_cssnlp"
peft_model_id = f"sudhangshankar/{adapter_model_id}"

config = PeftConfig.from_pretrained(peft_model_id)
the_base_model = config.base_model_name_or_path

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [8]:
config

PeftConfig(peft_type='LORA', auto_mapping=None, base_model_name_or_path='meta-llama/Llama-2-7b-hf', revision=None, task_type='CAUSAL_LM', inference_mode=True)

In [9]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,         # nested quantization to preserve memory
    bnb_4bit_quant_type="nf4",              # NF4 gives higher precision than FP4
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
                the_base_model,
                return_dict=True,
                quantization_config=bnb_config,
                device_map='auto'
              )
tokenizer = AutoTokenizer.from_pretrained(the_base_model)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## Generation


In [10]:
if DEBUG:
  # Generate a prompt
  prompt = 'Greek Coast Guard vessels on Saturday evacuated hundreds of tourists and locals trapped in seaside villages on Rhodes that were threatened by five-day-old wildfires, moving them to safer parts of the island.'

  device = "cuda:0"
  inputs = tokenizer(prompt, return_tensors="pt").to(device)

  model.config.use_cache = True
  outputs = model.generate(**inputs,
                  # Use sampling instead of greedy decoding
                  do_sample=True,
                  # Keep only top 50 token with
                  # the highest probability
                  top_k=50,
                  # Maximum sequence length
                  max_length=300,             # TODO: Max token length for LLaMa2 is 4096
                  # Keep only the most probable tokens
                  # with cumulative probability of 95%
                  top_p=0.95,
                  # Changes randomness of generated sequences
                  temperature=0.7,
                  repetition_penalty=1.2,  # Corrected here
                  # Number of sequences to generate
                  num_return_sequences=1)


  for i, sample_output in enumerate(outputs):
      print(f"{i}: {tokenizer.decode(sample_output, skip_special_tokens=True)}\n\n")

In [11]:
def count_sentences(text_list):
    total_sentences = 0
    for text in text_list:
        sentences = nltk.sent_tokenize(text)
        total_sentences += len(sentences)
    return total_sentences

# Example usage:
text_list = [
    "This is the first sentence. This is the second sentence.",
    "This is another sentence."
  ]
print(count_sentences(text_list))  # Output: 3


3


In [12]:
def generate_news_article(prompt="Graz, Austria - ", min_sentences = 50):

  device = "cuda:0"

  gen_text_snippets = [prompt]
  count_gen_sentences = count_sentences(gen_text_snippets)

  while count_gen_sentences < min_sentences:
    last_gen_snippet = gen_text_snippets[-1].rstrip('. ')
                                                # rstrip('. ') to trick it into
                                                # thinking the sentence isn't
                                                # over so that it doesn't decide
                                                # to go on a tangent

    inputs = tokenizer(last_gen_snippet, return_tensors="pt").to(device)

    outputs = model.generate(**inputs,
                    # Use sampling instead of greedy decoding
                    do_sample=True,
                    # Keep only top 50 token with
                    # the highest probability
                    top_k=50,
                    # Maximum sequence length
                    max_length=1000,             # TODO: Max token length for LLaMa2 is 4096
                    # Keep only the most probable tokens
                    # with cumulative probability of 95%
                    top_p=0.95,
                    temperature=0.4,        # Low temperature, since we have such a large sequence being generated
                    repetition_penalty=1.2,  # Corrected here
                    # Number of sequences to generate
                    num_return_sequences=1)

    last_gen_snippet_length = len(tokenizer.encode(last_gen_snippet))
    gen_text = tokenizer.decode(
        outputs[0][last_gen_snippet_length:],
        skip_special_tokens=True
      )
    gen_text_snippets.append(gen_text)
    count_gen_sentences = count_sentences(gen_text_snippets)
    if DEBUG:
      print(f"{gen_text=}\n{count_gen_sentences=}====\n")

  gen_text = " ".join(gen_text_snippets)

  return gen_text



In [13]:
if DEBUG:
  the_prompt = "NEW DELHI - Thousands of people were evacuated from their homes "
  article = generate_news_article(prompt = the_prompt, min_sentences=51)
  display(article)
  print("\n\n")

In [14]:
# Load the csv file
df = pd.read_csv(f'{GDRIVEPATH}/data/nyt_test.csv')

# Initialize a new dataframe
new_df = pd.DataFrame(columns=['Original Article', 'Prompt', 'Generated Article'])

for i in range(NUM_TO_GEN):
    random_article = df['content'].sample(1).values[0]

    sentences = nltk.sent_tokenize(random_article)
    # Use the first two sentences of the real article as the prompt
    prompt = ' '.join(sentences[:2])

    generated_article = generate_news_article(prompt=prompt, min_sentences=51)

    current_df = pd.DataFrame({
        'Original Article': [random_article],
        'Prompt': [prompt],
        'Generated Article': [generated_article]
    })

    # Append the current dataframe to the new dataframe
    new_df = pd.concat([new_df, current_df], ignore_index=True)

# Post-processing to remove incomplete sentences
new_df['Generated Article'] = new_df['Generated Article'].apply(lambda text:
                                      ' '.join(nltk.sent_tokenize(text)[:-1])
                                      if not text.endswith(('.', '!', '?'))
                                      else text
                                    )




In [16]:
new_df

Unnamed: 0,Original Article,Prompt,Generated Article
0,WASHINGTON — After decades of maintaining a mi...,WASHINGTON — After decades of maintaining a mi...,WASHINGTON — After decades of maintaining a mi...
1,"KABUL, Afghanistan — Soon, American Embassy em...","KABUL, Afghanistan — Soon, American Embassy em...","KABUL, Afghanistan — Soon, American Embassy em..."
2,MEXICO CITY — Mexico’s most prominent human ri...,MEXICO CITY — Mexico’s most prominent human ri...,MEXICO CITY — Mexico’s most prominent human ri...
3,"WASHINGTON — Tom Price, the health and human s...","WASHINGTON — Tom Price, the health and human s...","WASHINGTON — Tom Price, the health and human s..."
4,"• Bruno Mars swept the top categories, winning...","• Bruno Mars swept the top categories, winning...","• Bruno Mars swept the top categories, winning..."
5,A jury in San Diego on Thursday rejected claim...,A jury in San Diego on Thursday rejected claim...,A jury in San Diego on Thursday rejected claim...
6,WASHINGTON — A Senate security officer stepped...,WASHINGTON — A Senate security officer stepped...,WASHINGTON — A Senate security officer stepped...
7,"WASHINGTON — In private, President Trump somet...","WASHINGTON — In private, President Trump somet...","WASHINGTON — In private, President Trump somet..."
8,Wells Fargo’s board said on Monday that it wou...,Wells Fargo’s board said on Monday that it wou...,Wells Fargo’s board said on Monday that it wou...
9,"KHOGYANI, Afghanistan — When the American mili...","KHOGYANI, Afghanistan — When the American mili...","KHOGYANI, Afghanistan — When the American mili..."


In [27]:
# Save the new dataframe to a csv file
new_df.to_csv(f'{GDRIVEPATH}/generated/llama2qlora_nyt.csv', index=False)

In [21]:
new_df

Unnamed: 0,Original Article,Prompt,Generated Article
0,WASHINGTON — After decades of maintaining a mi...,WASHINGTON — After decades of maintaining a mi...,WASHINGTON — After decades of maintaining a mi...
1,"KABUL, Afghanistan — Soon, American Embassy em...","KABUL, Afghanistan — Soon, American Embassy em...","KABUL, Afghanistan — Soon, American Embassy em..."
2,MEXICO CITY — Mexico’s most prominent human ri...,MEXICO CITY — Mexico’s most prominent human ri...,MEXICO CITY — Mexico’s most prominent human ri...
3,"WASHINGTON — Tom Price, the health and human s...","WASHINGTON — Tom Price, the health and human s...","WASHINGTON — Tom Price, the health and human s..."
4,"• Bruno Mars swept the top categories, winning...","• Bruno Mars swept the top categories, winning...","• Bruno Mars swept the top categories, winning..."
5,A jury in San Diego on Thursday rejected claim...,A jury in San Diego on Thursday rejected claim...,A jury in San Diego on Thursday rejected claim...
6,WASHINGTON — A Senate security officer stepped...,WASHINGTON — A Senate security officer stepped...,WASHINGTON — A Senate security officer stepped...
7,"WASHINGTON — In private, President Trump somet...","WASHINGTON — In private, President Trump somet...","WASHINGTON — In private, President Trump somet..."
8,Wells Fargo’s board said on Monday that it wou...,Wells Fargo’s board said on Monday that it wou...,Wells Fargo’s board said on Monday that it wou...
9,"KHOGYANI, Afghanistan — When the American mili...","KHOGYANI, Afghanistan — When the American mili...","KHOGYANI, Afghanistan — When the American mili..."


In [25]:
new_df.loc[4,"Generated Article"]

'• Bruno Mars swept the top categories, winning album, record and song of the year. • Our writers and critics weigh in on the best and worst moments of the Grammys. . • Read more about this year’s Grammy Awards here. The Grammys were a night for old-school music to shine — but not always as expected. In an awards show that seemed at times like it was trying too hard to be “relevant,” many of its biggest winners had little or no connection with contemporary pop culture. And when they did connect, it wasn’t necessarily how you might have predicted: For example, Adele won three trophies Sunday evening (including one for her new single, “Hello”), while Beyoncé went home empty handed after being nominated six times. Still, there were plenty of memorable performances by artists who are very much part of today’s musical landscape. One such artist is Rihanna, whose performance of “Kiss It Better” from her latest No. 1 hit album, “Anti,” showed off both her vocal prowess and her sartorial style