<a href="https://colab.research.google.com/github/finardi/WatSpeed_LLM_foundation/blob/main/Solution_Module_1_Using_BloomZ_7b_for_sentence_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q datasets transformers accelerate bitsandbytes

# Downloading Dataset

SST-2, the binary version of the Stanford Sentiment Treebank, is a popular dataset for sentiment analysis in Natural Language Processing (NLP). It consists of sentences from Rotten Tomatoes movie reviews, each labeled as positive or negative. The GLUE release used here has 67,349 training, 872 validation, and 1,821 test examples, and the two classes are roughly balanced. The sentences are short, with a median length of around 18 tokens.

The SST-2 dataset has become a benchmark dataset for sentiment analysis in NLP, and many researchers use it to evaluate the performance of their models. The dataset's popularity is partly due to its high-quality labels and the task's relative simplicity, making it an accessible starting point for researchers and developers new to NLP.

In this example, we use the **`datasets`** library to download and load the validation split. The GLUE test labels are not public, so the validation split serves as our test set.

In [40]:
import torch

import pandas as pd
pd.set_option('display.max_rows', 200)
pd.set_option('max_colwidth', 400)

from tqdm.auto import tqdm

from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
from datasets import load_metric

device = 'cuda' if torch.cuda.is_available() else 'cpu'

MANUAL_SEED = 2711

def deterministic(rep=True, manual_seed=MANUAL_SEED):
    """Seed PyTorch and configure cuDNN for reproducible runs."""
    if rep:
        torch.manual_seed(manual_seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed(manual_seed)
            torch.cuda.manual_seed_all(manual_seed)
        torch.backends.cudnn.enabled = False
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        print(f'Deterministic experiment, seed: {manual_seed}')
        if device == 'cuda':
            print(f'Found {torch.cuda.device_count()} GPU(s): {torch.cuda.get_device_name(0)}.')
    else:
        print('Non-deterministic experiment')
deterministic()

Deterministic experiment, seed: 2711
Found 1 GPU(s): Tesla T4.
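
`deterministic()` seeds PyTorch only. If other code in the notebook draws from Python's `random` module or from NumPy, those generators would need their own seeds. A minimal sketch, assuming a hypothetical helper `seed_python_rngs` that is not part of the original notebook:

```python
import random

def seed_python_rngs(seed=2711):
    """Seed Python's `random` module, and NumPy's global RNG if NumPy is installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch

seed_python_rngs(2711)
first = [random.random() for _ in range(3)]
seed_python_rngs(2711)
second = [random.random() for _ in range(3)]
print(first == second)  # re-seeding with the same value reproduces the same draws
```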


In [9]:
test_dataset = load_dataset('glue', 'sst2', split='validation')
test_dataset



Dataset({
    features: ['sentence', 'label', 'idx'],
    num_rows: 872
})
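
A quick way to check the class balance of the loaded split is to count the labels. The sketch below assumes each row is a dict with a `label` key, which is what iterating over a `datasets.Dataset` yields. The `label_distribution` helper and the toy rows are hypothetical and only stand in for `test_dataset`.

```python
from collections import Counter

def label_distribution(rows):
    """Return {label: fraction} for an iterable of dict-like rows with a 'label' key."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}

# Toy rows in the same schema as the SST-2 split (illustrative data, not real examples).
toy_rows = [
    {"sentence": "a charming film", "label": 1, "idx": 0},
    {"sentence": "a tedious mess", "label": 0, "idx": 1},
    {"sentence": "surprisingly moving", "label": 1, "idx": 2},
    {"sentence": "flat and lifeless", "label": 0, "idx": 3},
]
print(label_distribution(toy_rows))
```

With the real data, `label_distribution(test_dataset)` should work the same way.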

In [28]:
shots = [
    {"text":"hide new secretions from the parental units ","label":"negative"},
    {"text":"contains no wit , only labored gags ", "label":"negative"},
    {"text":"that loves its characters and communicates something rather beautiful about human nature ","label":"positive"}
]

def generate_static_prompt(data_text, num_fewshot=3):
    """Build a few-shot prompt that ends with `data_text` awaiting its label."""
    if not 0 <= num_fewshot <= len(shots):
        raise ValueError(f"num_fewshot must be between 0 and {len(shots)}")

    labeled_examples = ""
    if num_fewshot > 0:
        fewshotex = shots[:num_fewshot]
        labeled_examples = "Given the examples below, answer after ##label: the sentence as positive or negative:\n"
        for i, doc_shot in enumerate(fewshotex):
            labeled_examples += f"Text {i+1}: {doc_shot['text']} ##label: {doc_shot['label']}\n"
        labeled_examples += f"Text {len(fewshotex) + 1}: "

    # Build the query for every value of num_fewshot, including 0.
    example = data_text + "##label:"
    return labeled_examples + example

three_shot_inputs = [generate_static_prompt(text, num_fewshot=3) for text in test_dataset['sentence']]

print(three_shot_inputs[1])

Given the examples below, answer after ##label: the sentence as positive or negative:
Text 1: hide new secretions from the parental units  ##label: negative
Text 2: contains no wit , only labored gags  ##label: negative
Text 3: that loves its characters and communicates something rather beautiful about human nature  ##label: positive
Text 4: unflinchingly bleak and desperate ##label:


In [29]:
model_bloom = 'bigscience/bloomz-7b1-mt'
tokenizer_bloom = AutoTokenizer.from_pretrained(model_bloom)

# load bloom in 8bits with bitsandbytes
BLOOM_model = AutoModelForCausalLM.from_pretrained(model_bloom, device_map="auto", load_in_8bit=True)



Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


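
The reason for loading in 8 bits is memory. A back-of-the-envelope estimate of the weight footprint is sketched below. It assumes roughly 7.1 billion parameters (inferred from the model name) and ignores activations, the attention cache, and quantization overhead, so the numbers are a lower bound rather than a measurement.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 7.1e9  # assumption: approximate parameter count implied by "7b1"

for name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_memory_gb(NUM_PARAMS, nbytes):.1f} GB")
```

Under these assumptions the float16 weights alone come to about 14 GB, which leaves little headroom on a 16 GB Tesla T4, while 8-bit weights need about half of that.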


# Run

In [None]:
deterministic()

trues, preds = [], []
loop = tqdm(three_shot_inputs, leave=True)

for ix, prompt in enumerate(loop):
    # Tokenize one prompt at a time; truncation=True makes max_length take effect.
    inputs = tokenizer_bloom.encode(prompt, return_tensors="pt", truncation=True, max_length=1024).to(device)
    outputs = BLOOM_model.generate(inputs)
    # The decoded string contains the prompt followed by the generated label.
    preds.append(tokenizer_bloom.decode(outputs[0]))
    trues.append(test_dataset['label'][ix])
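
The loop above sends one prompt at a time through the model. If throughput matters, the prompts could be grouped into batches first. The sketch below shows only the grouping step, using a hypothetical `batched` helper; generating on a whole batch would additionally require padding the tokenized prompts, which is not shown here.

```python
def batched(items, batch_size):
    """Yield (start_index, chunk) pairs covering `items` in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]

# Stand-in prompts; in the notebook this would be `three_shot_inputs`.
prompts = [f"prompt {i}" for i in range(5)]
for start, chunk in batched(prompts, batch_size=2):
    print(start, chunk)
```

The start index makes it easy to line each chunk up with the matching slice of true labels.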

# Model Evaluation



In [37]:
# Inspect the last 8 characters of each output. Both "positive" and "negative"
# are 8 characters long, so the slice captures the label without cutting it.
dataframe = pd.DataFrame({'true': trues, 'pred':preds})
dataframe.pred.apply(lambda x: x[-8:])

0      positive
1      positive
2      positive
3      positive
4      negative
         ...   
867    negative
868    positive
869    negative
870    negative
871    positive
Name: pred, Length: 872, dtype: object
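
Slicing the last eight characters works here because both labels are eight characters long and every decoded output ends with the label. A slightly more defensive parser is sketched below. It assumes the query is the text after the last `##label:` marker; `extract_label` is a hypothetical helper, not part of the notebook.

```python
def extract_label(generated, marker="##label:", labels=("positive", "negative")):
    """Return the first known label after the last `marker`, or None if none is found."""
    continuation = generated.rsplit(marker, 1)[-1].lower()
    # Record where each label occurs and keep the earliest one.
    hits = [(continuation.find(label), label) for label in labels if label in continuation]
    return min(hits)[1] if hits else None

few_shot = "Text 1: contains no wit , only labored gags  ##label: negative\n"
query = "Text 2: unflinchingly bleak and desperate ##label:"

print(extract_label(few_shot + query + " negative"))
print(extract_label(few_shot + query + " Positive</s>"))
print(extract_label(few_shot + query + " unsure"))
```

Returning `None` for unparseable outputs makes such cases visible instead of silently counting them as negative.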

In [38]:
dataframe['pred'] = dataframe.pred.apply(lambda x: x[-8:])
dataframe.pred.apply(lambda x: 1 if x == "positive" else 0)

0      1
1      1
2      1
3      1
4      0
      ..
867    0
868    1
869    0
870    0
871    1
Name: pred, Length: 872, dtype: int64

In [39]:
dataframe['pred'] = dataframe.pred.apply(lambda x: 1 if x == "positive" else 0)
dataframe

     true  pred
0       1     1
1       0     1
2       1     1
3       1     1
4       0     0
..    ...   ...
867     0     0
868     1     1
869     0     0
870     0     0


In [41]:
metric = load_metric("accuracy")
metric.compute(predictions=dataframe['pred'], references=dataframe['true'])

{'accuracy': 0.908256880733945}
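
Accuracy is simply the fraction of predictions that match the references, so it can be cross-checked without the metric loader. The sketch below uses short stand-in lists rather than the full `dataframe` columns; `accuracy` and `confusion_counts` are hypothetical helper names.

```python
from collections import Counter

def accuracy(predictions, references):
    """Fraction of positions where the prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

def confusion_counts(predictions, references):
    """Count (reference, prediction) pairs; for example, (0, 1) is a false positive."""
    return Counter(zip(references, predictions))

# Stand-in values: the first five rows of the dataframe shown above.
true = [1, 0, 1, 1, 0]
pred = [1, 1, 1, 1, 0]
print(accuracy(pred, true))
print(confusion_counts(pred, true))
```

On the notebook's data the call would be `accuracy(list(dataframe['pred']), list(dataframe['true']))`.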

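With three **few-shot** examples in the prompt, the model classifies 792 of the 872 validation sentences correctly, an accuracy of about 90.8%.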

# Try it yourself

To try sentiment analysis on your own sentences, replace the string appended to the prompt in the **`text`** variable below. Note that this snippet uses a zero-shot prompt, without the labeled examples used above.


In [46]:
prompt = "Given the text classify the sentence as positive or negative:\n" 
text = prompt + "This movie is a bad movie!" #@param 
inputs = tokenizer_bloom.encode(text, return_tensors="pt", max_length=1024).to(device)
outputs = BLOOM_model.generate(inputs)
tokenizer_bloom.decode(outputs[0])



'Given the text classify the sentence as positive or negative:\nThis movie is a bad movie! negative'
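
To classify several sentences, the snippet can be wrapped in a function. The sketch below is one possible shape, assuming a hypothetical `generate_fn` callable that maps a prompt string to the decoded model output with the prompt included. A fake generator stands in for the model so that the example runs without a GPU.

```python
PROMPT = "Given the text classify the sentence as positive or negative:\n"

def classify(sentence, generate_fn):
    """Build the zero-shot prompt, call the generator, and return the text after the prompt."""
    text = PROMPT + sentence
    decoded = generate_fn(text)
    # Keep only what follows the prompt, if the prompt was echoed back.
    answer = decoded[len(text):] if decoded.startswith(text) else decoded
    return answer.strip()

def fake_generate(text):
    """Stand-in for the real model: appends a canned label (illustration only)."""
    return text + (" negative" if "bad" in text else " positive")

print(classify("This movie is a bad movie!", fake_generate))
```

With the real model, `generate_fn` could be `lambda t: tokenizer_bloom.decode(BLOOM_model.generate(tokenizer_bloom.encode(t, return_tensors="pt").to(device))[0])`.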