Користејќи го моделот FLAN-T5 со техниката few-shot prompting за секој примерок 
од податочното множество за препознавање на навредлив текст одредете дали 
примерокот содржи навредлив текст или не. Испробајте со користење различен 
број на примероци (n = 1, 2, 3, 5, 10).
Добиените предвидувања евалуирајте ги со метриките: точност
(accuracy_score), прецизност (precision_score), одзив (recall_score) и F1-
мерка (f1_score). Евалуацијата направете ја посебно за сите подмножества 
(подмножество за тренирање, валидација и тестирање).
Испробајте ги следните prompts:
1. „Here is a text: <text>, which is <label>. Classify the following text: <text> into
<label1> or <label2>.“.
2. „Here is a text: <text>, which is not <label>. Classify the following text: <text> into
<label1> or <label2>.“.
Дали овој модел е подобар од моделите во втората лабораториска вежба?


In [37]:
# !pip install sentence_transformers

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from sentence_transformers import SentenceTransformer, util

In [3]:
test_en_path = "C:/Users/Mia/Desktop/FINKI/NLP/nlp/data/offensive text detection/test_en.txt"
train_en_path = "C:/Users/Mia/Desktop/FINKI/NLP/nlp/data/offensive text detection/train_en.txt"
val_en_path = "C:/Users/Mia/Desktop/FINKI/NLP/nlp/data/offensive text detection/val_en.txt"

In [4]:
train_en = pd.read_table(train_en_path).dropna()
test_en = pd.read_table(test_en_path).dropna()
val_en = pd.read_table(val_en_path).dropna()

In [5]:
dataset = pd.concat([train_en, pd.concat([test_en, val_en])])

In [6]:
train_samples = train_en['Sentence'].values.tolist()
train_labels = train_en['Label'].values.tolist()
test_samples = test_en['Sentence'].values.tolist()
test_labels = test_en['Label'].values.tolist()
val_samples = val_en['Sentence'].values.tolist()
val_labels = val_en['Label'].values.tolist()

In [7]:
model = AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')
tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-base')

In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [9]:
def evaluate(y_test, y_pred, prompt_type, train_test_or_val):
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)

    print(train_test_or_val + ": " + prompt_type)
    print("Accuracy:", accuracy)
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1 Score:", f1)

In [22]:
def prompt_with_number_of_examples(samples, labels, no_of_examples):
    pred_labels = []

    for sample, label in zip(samples, labels):
        example = []
       
        for i in range(no_of_examples):
            example_text = samples[i]
            example_label = 'offensive' if labels[i] == 1 else 'non-offensive'
            result_example = f'Text: {example_text}\nCategory: {example_label}'
            example.append(result_example)

        prompt = f'{example}\nBased on the above example, classify the text into offensive or non-offensive: {sample}'
        
        # print(prompt)

        input_data = tokenizer(prompt, return_tensors='pt')
        input_ids = input_data.input_ids
        
        output = model.generate(input_ids)
        pred_label = tokenizer.decode(output[0])

        pred_labels.append(pred_label)
    
    return pred_labels

In [11]:
import re

def clean_prediction(pred_label):
    pattern = re.compile('<.*?>')
    pred_list = []

    for pred in pred_label:
        pred = re.sub(pattern, '', pred)
        pred = pred.strip()
        # print(pred)

        if pred == "non-offensive":
            pred = 0
        elif pred == "offensive":
            pred = 1

        pred_list.append(pred)

    return pred_list

In [18]:
def predict_with_few_shot_prompting(samples, labels, train_test_or_val):

    pred_labels_n_1 = prompt_with_number_of_examples(samples, labels, 1) 
    pred_labels_n_2 = prompt_with_number_of_examples(samples, labels, 2)
    pred_labels_n_5 = prompt_with_number_of_examples(samples, labels, 5)
    pred_labels_n_10 = prompt_with_number_of_examples(samples, labels, 10)

    evaluate(labels, clean_prediction(pred_labels_n_1), "N 1", train_test_or_val)
    evaluate(labels, clean_prediction(pred_labels_n_2), "N 2", train_test_or_val)
    evaluate(labels, clean_prediction(pred_labels_n_5), "N 5", train_test_or_val)
    evaluate(labels, clean_prediction(pred_labels_n_10), "N 10", train_test_or_val)

In [23]:
pred_labels_train = predict_with_few_shot_prompting(train_samples[:100], train_labels[:100], "Train")

In [21]:
pred_labels_test = predict_with_few_shot_prompting(test_samples[:100], test_labels[:100], "Test")

['Text: So maybe you should be more retarded.\nCategory: offensive']
Based on the above example, classify the text into offensive or non-offensive: So maybe you should be more retarded.
['Text: So maybe you should be more retarded.\nCategory: offensive']
Based on the above example, classify the text into offensive or non-offensive: THERES A MEGATHREAD FOR VACCINE OR COVID RELATED TOPICS. DON'T TALK ABOUT THAT SHIT HERE IDIOT!
['Text: So maybe you should be more retarded.\nCategory: offensive', "Text: THERES A MEGATHREAD FOR VACCINE OR COVID RELATED TOPICS. DON'T TALK ABOUT THAT SHIT HERE IDIOT!\nCategory: offensive"]
Based on the above example, classify the text into offensive or non-offensive: So maybe you should be more retarded.
['Text: So maybe you should be more retarded.\nCategory: offensive', "Text: THERES A MEGATHREAD FOR VACCINE OR COVID RELATED TOPICS. DON'T TALK ABOUT THAT SHIT HERE IDIOT!\nCategory: offensive"]
Based on the above example, classify the text into offensive or

IndexError: list index out of range

In [1]:
pred_labels_val = predict_with_few_shot_prompting(val_samples[:100], val_labels[:100], "Val")

NameError: name 'predict_with_few_shot_prompting' is not defined

Train evaluation

In [62]:
evaluate(train_labels[:100], clean_prediction(pred_labels_train))

Accuracy: 0.53
Precision: 1.0
Recall: 0.53
F1 Score: 0.6928104575163399


In [63]:
evaluate(test_labels[:100], clean_prediction(pred_labels_test))

Accuracy: 0.59
Precision: 1.0
Recall: 0.59
F1 Score: 0.7421383647798743


In [64]:
evaluate(val_labels[:100], clean_prediction(pred_labels_val))

Accuracy: 0.53
Precision: 1.0
Recall: 0.53
F1 Score: 0.6928104575163399
