## Objective:
## 1- Neutralizing the biased text using Llama 3.2
## 2- Neutralization evaluation using LLM Reclassification (here we are using Llama 3.2 and not DistilBERT).

## Note: in the future we will run Llama on the full set of texts to build confidence. Step 2, which has so far been done with DistilBERT, will also be tested with Llama 3.2, as is done here.

### Written by: Etienne Ndedi
### Date: 11_25_2024

## 1- Llama 3.2 debiasing

## 2- Llama 3.2 Reclassification after debiasing (experimental step)

###  The Reclassification rates will provide support to our debiasing strategy

In [None]:
!pip install openai==0.28



In [None]:
import torch
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import numpy as np
import random
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The file below has been replaced with the bias_test_med file which has a short length.

In [None]:
bias_test_full = pd.read_csv("/content/bias_test_full.csv", on_bad_lines='skip', engine='python')
len(bias_test_full)

25666

In [None]:
#Create two data sets, one with bias_test_full['src_raw'] another with bias_test_full['tgt_raw']
df_src = bias_test_full['src_raw']
df_tgt = bias_test_full['tgt_raw']

#rename df_tgt to 'example'
df_tgt = df_tgt.rename('example') # Changed 'columns' to 'name' for Series
#create a variable df_tgt['label'] with value 0
df_tgt = df_tgt.to_frame() # Convert Series to DataFrame to add a new column
df_tgt['label'] = 0
#rename df_src to 'example'
df_src = df_src.rename('example') # Changed 'columns' to 'name' for Series
#create a variable df_src['label'] with value 1
df_src = df_src.to_frame() # Convert Series to DataFrame to add a new column
df_src['label'] = 1
# Concatenate df_src and df_tgt to have a larger data set that we will shuffle
# This data set bias_unbias will have both the positive and negative labels
bias_unbias_test = pd.concat([df_src, df_tgt])
# Shuffle the data set. This is crucial to ensure there is a mix of bias and unbias
bias_unbias_test = bias_unbias_test.sample(frac=1).reset_index(drop=True)

############################only code to implement for neutralization ####
#Only get the records with a label of 1
#bias_unbias_test = bias_unbias_test[bias_unbias_test['label'] == 1]
###############################

# Print the number of rows in the DataFrame
print(f"Number of rows in DataFrame: {len(bias_unbias_test)}")

# Print the number of columns in the DataFrame
print(f"Number of columns in DataFrame: {len(bias_unbias_test.columns)}")

# Print the counts of the label column
print(bias_unbias_test['label'].value_counts())

#create a variable that simply counts the records, like an index
#This will be used to purge the records in sample below
bias_unbias_test['index'] = bias_unbias_test.index

#Isolate a sample of rows that will serve as few-shot examples (shots) in the LLM prompt
sample_bias_unb = bias_unbias_test.sample(10)
#to_csv returns None when a path is given, so its result is not assigned
sample_bias_unb.to_csv('bias_unbias_sample.csv', index=False)
bias_unbias_test.sample(5)

Number of rows in DataFrame: 51332
Number of columns in DataFrame: 2
label
0    25666
1    25666
Name: count, dtype: int64


Unnamed: 0,example,label,index
36389,the reason is primarily historic: before 1990 ...,1,36389
5119,the hilltop altar is located on the north side...,0,5119
22206,the new york times is third in national circul...,0,22206
33079,the four noble truths (pali: cattri ariyasaccn...,0,33079
2656,the concept behind the set ranks being applied...,0,2656
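
The labeling and shuffling step above can be reproduced on a toy frame to check the logic in isolation. This is a minimal sketch, assuming a two-row stand-in for bias_test_full with invented sentences; the fixed `random_state` is an addition for reproducibility and is not used in the cell above.

```python
import pandas as pd

# Toy stand-in for bias_test_full; both sentence pairs are invented.
pairs = pd.DataFrame({
    "src_raw": ["a stunning victory", "the so-called reform"],
    "tgt_raw": ["a victory", "the reform"],
})

# Biased source sentences get label 1, neutral targets get label 0.
biased = pairs["src_raw"].rename("example").to_frame().assign(label=1)
neutral = pairs["tgt_raw"].rename("example").to_frame().assign(label=0)

# Stack both halves, then shuffle; the seed makes the order repeatable.
labeled = (
    pd.concat([biased, neutral])
    .sample(frac=1, random_state=0)
    .reset_index(drop=True)
)
print(labeled)
```

Every source row contributes one biased and one neutral example, which is why the label counts in the output above are balanced.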


Below are text sequences that could be used for prompting in the Llama 3.2 model

In [None]:
sample_bias_unb = pd.read_csv("/content/bias_unbias_sample.csv")
sample_bias_unb

Unnamed: 0,example,label,index
0,kemp first came into the public's limelight af...,1,42806
1,criticism of facebook's motives and violation ...,1,10225
2,"in 1812, with the new commander manuel belgran...",0,25281
3,"at least in some countries, communists have be...",0,33206
4,the first record of the concept of the vicar o...,0,7608
5,the team won 9 out of their first 12 games bef...,0,20287
6,he was a recipient of the knight's cross of th...,0,47734
7,"parameters, in the plural form, has recently b...",1,18685
8,the book follows them from their lessons in th...,0,26358
9,"hilary du pr was born in woking, surrey, the e...",1,35787


In [None]:
bias_unbias_test[bias_unbias_test['index'] == 37938]

Unnamed: 0,example,label,index
37938,the africa group for justice and accountabilit...,0,37938


## 1- Now remove the sample rows from the rest of the data. The sample will be used to build the prompt.
## What remains are the rows that do not appear in the sample

In [None]:
bias_unbias_test = bias_unbias_test[~bias_unbias_test['index'].isin(sample_bias_unb['index'])]
bias_unbias_test.describe()

Unnamed: 0,label,index
count,51322.0,51322.0
mean,0.500019,25665.279393
std,0.500005,14818.819345
min,0.0,0.0
25%,0.0,12832.25
50%,1.0,25665.5
75%,1.0,38498.75
max,1.0,51331.0
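
The filter above is an anti-join on the index column: it keeps every row whose index is not among the sampled shots, which is why the count drops from 51332 to 51322. A minimal sketch on invented data, with hypothetical names `pool`, `shots` and `remaining`:

```python
import pandas as pd

# Invented five-row pool standing in for bias_unbias_test.
pool = pd.DataFrame({"example": list("abcde"), "index": [0, 1, 2, 3, 4]})

# Pretend rows 1 and 3 were drawn as few-shot examples.
shots = pool[pool["index"].isin([1, 3])]

# Anti-join: keep only the rows whose index is absent from the shots.
remaining = pool[~pool["index"].isin(shots["index"])]

# Sanity check that no shot leaks into the evaluation pool.
overlap = set(remaining["index"]) & set(shots["index"])
print(len(pool), len(shots), len(remaining), overlap)
```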


Now Restrict the dataset to the biased texts

In [None]:
bias_unbias_test = bias_unbias_test[bias_unbias_test['label'] == 1]

In [None]:
#bias_unbias_new = pd.concat([sample_bias_unb[['example', 'label', 'index']], bias_unbias_test[['example', 'label', 'index']]]).drop_duplicates(keep=False)
#bias_unbias_new.describe()

In [None]:
#Rebuild both columns as Series with a fresh 0..n-1 index
V_keys = pd.Series(bias_unbias_test['example'].to_list(), name = 'text')
V_values = pd.Series(bias_unbias_test['label'].to_list(), name = 'label')

Save the test set to CSV, since it has been reshuffled. This is for auditing and validation after classification.

In [None]:
#to_csv returns None when a path is given, so its result is not assigned
bias_unbias_test.to_csv('bias_unbias_test.csv', index=False)

Store the biased text and labels in a dictionary

In [None]:
!pip install datasets
#Note: this Dataset shadows the torch.utils.data Dataset imported earlier
from datasets import Dataset

#Build a dictionary with the keys 'text' and 'label'
all_dict = {'text': V_keys.to_list(), 'label': V_values.to_list()}

#store all_dict using Dataset module
my_test_dataset = Dataset.from_dict(all_dict)



In [None]:
my_datasets = {'test': my_test_dataset.select_columns(['text','label'])}

In [None]:
my_datasets.keys()

dict_keys(['test'])

In [None]:
my_datasets['test'][8]['text']

'in 1972, the israeli olympic team were massacred in an attack by palestinian gunmen that started at the olympic village and eventually resulted in the deaths of 17.'

In [None]:
import re

#Tags are passed as plain strings; re.escape keeps characters such as | literal in the patterns
def remove_text_between_tags(text, start_tag, end_tag):
  pattern = re.escape(start_tag) + r'.*?' + re.escape(end_tag)
  cleaned_text = re.sub(pattern, '', text, flags=re.DOTALL)
  return cleaned_text


def remove_final_tag(text, end_tag):
  pattern = r'\s?' + re.escape(end_tag)
  cleaned_text = re.sub(pattern, '', text, flags=re.DOTALL)
  return cleaned_text


#Plain string split: return whatever follows the last occurrence of end_tag
def ret_post_final_tag(text, end_tag):
  cleaned_text = text.split(end_tag)[-1]
  return cleaned_text


def remove_after_last_curlybrace(string):
  last_brace_index = string.rfind('}')
  if last_brace_index != -1 and last_brace_index != len(string) - 1:
    string = string[:last_brace_index + 1]
  return string


start = "<|begin_of_text|>"
fin = "assistant<|end_header_id|>\n\n"
fin2 = "<|eot_id|>"
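
The cleanup helpers above only produce a clean reply when the assistant header is present in the decoded string. A minimal sketch of a stricter alternative follows, assuming the Llama 3 special-token strings that appear in the leaked output later in this notebook. `extract_reply` is a hypothetical helper, not part of the pipeline: it returns `None` when the header is missing, so a failed generation can be detected instead of being stored as a reply.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"
EOT = "<|eot_id|>"

def extract_reply(decoded):
    """Return the text after the last assistant header, or None if absent."""
    _, found, tail = decoded.rpartition(ASSISTANT_HEADER)
    if not found:
        # No assistant turn in the decoded string: signal the failure.
        return None
    return tail.replace(EOT, "").strip()

# Invented decoded string in the same layout as the model output.
decoded = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Give the value of A.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n0<|eot_id|>"
)
print(extract_reply(decoded))
```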


Now import Transformers

In [None]:
%%capture

!pip uninstall -y transformers
!pip install -q -U transformers

In [None]:
!pip install -q accelerate
!pip install -q bitsandbytes

Select the Llama checkpoint and use a GPU if one is available. Note that the model_id below points to Llama 3.1 8B Instruct rather than Llama 3.2.

In [None]:
import torch
import pprint

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
device

'cuda:0'

Adding quantization steps

In [None]:
from transformers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_quant_type="nf4",
   bnb_4bit_use_double_quant=True,
   bnb_4bit_compute_dtype=torch.bfloat16)
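
As a rough guide to why 4-bit loading matters on a single Colab GPU, the weight footprint can be estimated from the parameter count and the bits per parameter. This is back-of-the-envelope arithmetic only, assuming a round 8 billion parameters for the 8B checkpoint; it ignores quantization constants, layers that stay unquantized, activations and the KV cache.

```python
N_PARAMS = 8e9  # assumed round figure for an "8B" model

def weight_gib(n_params, bits_per_param):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

bf16_gib = weight_gib(N_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gib = weight_gib(N_PARAMS, 4)    # 4-bit NF4 weights

print(f"bf16: {bf16_gib:.1f} GiB, nf4: {nf4_gib:.1f} GiB")
```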

In [None]:
!pip install huggingface_hub
from huggingface_hub import login

# Get My Hugging Face token from https://huggingface.co/settings/tokens
token = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz" #subscribe to a token to run it
login(token)



In [None]:
!pip install bitsandbytes --upgrade

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch # Import torch here

device = "cuda" # the device to load the model onto

# Assuming nf4_config and model_id are defined in previous cells
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=nf4_config, token = token, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, token= token, trust_remote_code=True)



`low_cpu_mem_usage` was None, now default to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Save the text data, which excludes the few rows we set aside earlier. As a reminder, those rows will be used as the shots in prompt engineering.

In [None]:
import pandas as pd

store_data_bias = my_datasets['test']['text']
store_data_bias_df = pd.DataFrame(store_data_bias, columns=['text'])
store_data_bias_df.to_csv('store_data_bias.csv', index=False)

Here are a few examples of biased texts to neutralize through prompt engineering. The whole dataset could be run as well, but due to computing constraints, I am picking 20 examples (rows 1000 to 1019).

In [None]:
my_datasets['test'][1000]

{'text': 'it is a beautiful building, located on the southern shore of tsim sha tsui, kowloon.',
 'label': 1}

In [None]:
#all_test_text = my_datasets['test']['text'][0:3]
#print(all_test_text)
all_test_text = my_datasets['test']['text'][1000:1020]

Now through prompt engineering, use a few shots of neutral text to neutralize the biased texts

In [None]:
H = "kemp first came into the public's limelight after starring in a major tv commercial for kellogg's fruit 'n fibre breakfast cereal."
C = "in 1812, with the new commander manuel belgrano, the junta decided to fund a second expeditionary campaign to upper peru with the objective of finally defeating the royalist troops, conquer upper peru and avenge the overwhelming defeat at huaqui, and also to raise the moral of the troops and citizenry."
D = "at least in some countries, communists have been founders and leading figures in attac, and even ramonet himself praises cuban fidel castro and is his only authorised biographer ."
E = "the team won 9 out of their first 12 games before christmas 2012, sitting top of the league going into the new year."
F = "he was a recipient of the knight's cross of the iron cross with oak leaves of nazi germany."
G = "parameters, in the plural form, has recently become popular with non-technical users who incorrectly believe it means limits, apparently from confusing it with perimeter ."
I = "criticism of facebook's motives and violation of net neutrality principles"
R = "the re-phrased sentence is"

rev_rec_list = []
rev_rec_text_list = []

for i in range(len(all_test_text)):

  myvoice = "Anti-bias activist"

  myprompt = f"Bias index A is set to 1 for biased text while it is 0 for neutral text. Using the text in variables C,D,E,F,G,H, and I as examples, please re-phrase the following biased text and turn it into a neutral text. Please strictly include the one re-phrased text in the output with no additional explanations or introductory text. "\
             f"Please, do not introduce the re-phrased neutral text. Do not use phrases like {R}"\
             f"A= 1 for the biased text {H}"\
             f"A= 0 for the neutral text {C}"\
             f"A= 0 for the neutral text {D}"\
             f"A= 0 for the neutral text {E}"\
             f"A= 0 for the neutral text {F}"\
             f"A= 1 for the biased text {G}"\
             f"A= 1 for the biased text {I}"\
             f"The biased text is {all_test_text[i]}."
  messages = [{"role": "user", "content": myprompt}]

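  #add_generation_prompt=True appends the assistant header so the model starts its reply directly
  encodeds = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")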

  model_inputs = encodeds.to(device)

  generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  print(".")
  decoded = tokenizer.batch_decode(generated_ids)
  cleaned = decoded[0]

  cleaned1 = remove_text_between_tags(cleaned, start, fin)
  cleaned2 = remove_final_tag(cleaned1, fin2)
  cleaned3 = ret_post_final_tag(cleaned2, fin)
  rev_rec_list.append(cleaned3.strip())
  #Append the text also for auditing purpose
  rev_rec_text_list.append(all_test_text[i])

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
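
Adjacent f-string literals are joined with no separator, so in the prompt built above each example runs directly into the next one. A minimal sketch of a prompt builder that puts every shot on its own line follows. `build_neutralize_prompt` and its instruction wording are hypothetical and would need to be aligned with the prompt actually used before any comparison of results.

```python
def build_neutralize_prompt(shots, biased_text):
    """shots: list of (text, label) pairs; label 1 = biased, 0 = neutral."""
    lines = [
        "Bias index A is 1 for biased text and 0 for neutral text.",
        "Using the examples below, re-phrase the final biased text as neutral text.",
        "Output only the re-phrased text.",
        "",
    ]
    for text, label in shots:
        kind = "biased" if label == 1 else "neutral"
        lines.append(f"A = {label} for the {kind} text: {text.strip()}")
    lines.append("")
    lines.append(f"The biased text is: {biased_text.strip()}")
    # One newline between parts keeps the examples from fusing together.
    return "\n".join(lines)

# Invented shots, for illustration only.
toy_shots = [("a stunning victory", 1), ("a victory", 0)]
toy_prompt = build_neutralize_prompt(toy_shots, "the so-called reform")
print(toy_prompt)
```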


In [None]:
store_bias_df = pd.DataFrame(rev_rec_text_list, columns=['biased_text'])
store_bias_df.to_csv('bias_df.csv', index=False)
#The rephrased outputs go in their own column name to avoid confusion with the inputs
store_neutral_df = pd.DataFrame(rev_rec_list, columns=['neutral_text'])
store_neutral_df.to_csv('neutral_df.csv', index=False)

In [None]:
for record in rev_rec_list:
        print(record)

It is a notable building, situated on the southern coast of Tsim Sha Tsui, Kowloon.
The team that won 9 out of their first 12 games before Christmas 2012 finished the year at the top of the league.
The teen's family expressed strong disappointment with the sentence, labeling it as "a lenient punishment" which "may be perceived as condoning the violent treatment of minors."
The team committed a significant number of penalties, resulting in a loss of 9-6.
Unfortunately, this was his only season in which he played.
Avraham stern, the leader of the lehi terrorist gang, had a strong desire to fight the british, and some zionists attempted to persuade the nazis to view emigration from europe as a potential solution to their "jewish problem", but the nazis ultimately shifted their approach towards containment and physical extermination.
Sonnenfeld was known for his anti-Zionist views and sent a delegate, a former Dutch diplomat and writer who had converted to Orthodox Judaism, Dr. Jacob Israe

Store the rephrased sentence

In [None]:
rephrased_bias_df = pd.DataFrame(rev_rec_list, columns=['text'])
rephrased_bias_df.to_csv('rephrased_bias_list.csv', index=False)

To evaluate, the generated neutral text could be re-classified for bias. The expectation is that it should have a bias index of 0.

In [None]:
V_keys = pd.Series(rev_rec_list, name = 'text')

In [None]:
#Now create a dictionary with the series names 'text'
res = {'text': V_keys}
####
#create a dictionary of the elements of res
all_dict = dict(res)

#store all_dict using Dataset module
my_test_dataset = Dataset.from_dict(all_dict)

In [None]:
print(my_test_dataset[0])

{'text': 'It is a notable building, situated on the southern coast of Tsim Sha Tsui, Kowloon.'}


## The following prompt HAS TO BE the same as the one used for bias classification.
## The LLM model also has to be the same as the one used for bias classification


In [None]:
H = "kemp first came into the public's limelight after starring in a major tv commercial for kellogg's fruit 'n fibre breakfast cereal."
C = "in 1812, with the new commander manuel belgrano, the junta decided to fund a second expeditionary campaign to upper peru with the objective of finally defeating the royalist troops, conquer upper peru and avenge the overwhelming defeat at huaqui, and also to raise the moral of the troops and citizenry."
D = "at least in some countries, communists have been founders and leading figures in attac, and even ramonet himself praises cuban fidel castro and is his only authorised biographer ."
E = "the team won 9 out of their first 12 games before christmas 2012, sitting top of the league going into the new year."
F = "he was a recipient of the knight's cross of the iron cross with oak leaves of nazi germany."
G = "parameters, in the plural form, has recently become popular with non-technical users who incorrectly believe it means limits, apparently from confusing it with perimeter ."
I = "criticism of facebook's motives and violation of net neutrality principles"

rev_rec_list = []
rev_rec_text_list = []

for i in range(len(my_test_dataset)):

  myvoice = "Anti-bias activist"

  #my_test_dataset[i] is a dict, so take its 'text' field rather than the whole record
  myprompt = f"Bias index A is 1 for biased text and 0 for neutral text. Using text in C,D,E,F,G,H, and I as examples, please give the value of A for the following text. Only the value of A must be in the output. Please, no text in output"\
             f"{H} is biased and A= 1"\
             f"{C} is neutral and A= 0"\
             f"{D} is neutral and A=  0"\
             f"{E} is neutral and A=  0"\
             f"{F} is neutral and A=  0"\
             f"{G} is biased and A=  1"\
             f"{I} is biased and A=  1"\
             f"The text is {my_test_dataset[i]['text']}.  A = "

  messages = [{"role": "user", "content": myprompt}]

  #add_generation_prompt=True appends the assistant header so the model answers directly
  encodeds = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

  model_inputs = encodeds.to(device)

  generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  print(".")
  decoded = tokenizer.batch_decode(generated_ids)
  cleaned = decoded[0]

  cleaned1 = remove_text_between_tags(cleaned, start, fin)
  cleaned2 = remove_final_tag(cleaned1, fin2)
  cleaned3 = ret_post_final_tag(cleaned2, fin)
  rev_rec_list.append(cleaned3.strip())

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.


In [None]:
for record in rev_rec_list:
        print(record)

0
0
0
0
0
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024
<|start_header_id|>user<|end_header_id|>

Bias index A is 1 for biased text and 0 for neutral text. Using text in C,D,E,F,G,H, and I as examples, please give the value of A for the following text. Only the value of A must be in the output. Please, no text in outputkemp first came into the public's limelight after starring in a major tv commercial for kellogg's fruit 'n fibre breakfast cereal. is biased and A= 1in 1812, with the new commander manuel belgrano, the junta decided to fund a second expeditionary campaign to upper peru with the objective of finally defeating the royalist troops, conquer upper peru and avenge the overwhelming defeat at huaqui, and also to raise the moral of the troops and citizenry. is neutral and A= 0at least in some countries, communists have been founders and leading figures in attac, and even ramonet himself praises cuban fid
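
The output above mixes clean labels with a record in which the whole prompt leaked through. A minimal sketch of a strict parser that accepts only a bare 0 or 1 (optionally written as "A = 0") and flags everything else follows. `parse_bias_label` is a hypothetical helper, not part of the original pipeline.

```python
import re

def parse_bias_label(reply):
    """Return 0 or 1 for a clean reply, or None if the reply is malformed."""
    match = re.fullmatch(r"\s*A?\s*=?\s*([01])\s*\.?\s*", reply)
    return int(match.group(1)) if match else None

# Invented replies covering the clean and the malformed case.
toy_replies = ["0", " 1 ", "A = 1", "<|begin_of_text|> ... A= 1in 1812"]
toy_labels = [parse_bias_label(r) for r in toy_replies]
print(toy_labels)
```

Returning `None` rather than guessing keeps malformed generations out of the rate calculation, so they can be counted or re-run separately.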

Now calculate the neutralization rate

In [None]:
diff_label_df = pd.DataFrame(rev_rec_list, columns=['diff_label'])
#Share of rephrased texts that the model reclassifies as neutral (A = 0)
neutralization_rate = (diff_label_df['diff_label'] == '0').mean()
print(neutralization_rate)

0.95
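
The rate above treats any output other than the string '0' as a failed neutralization, including malformed generations. A minimal sketch that reports malformed outputs separately follows, assuming labels have already been parsed to 0, 1 or `None`. The function name and the toy labels are invented for illustration and are not results from this notebook.

```python
def neutralization_rate(labels):
    """labels: list of 0, 1 or None (None = output could not be parsed)."""
    if not labels:
        raise ValueError("labels must not be empty")
    valid = [x for x in labels if x is not None]
    n_invalid = len(labels) - len(valid)
    n_neutral = sum(1 for x in valid if x == 0)
    # Rate over every attempt, counting unparseable outputs as failures.
    rate_all = n_neutral / len(labels)
    # Rate over parseable outputs only.
    rate_valid = n_neutral / len(valid) if valid else float("nan")
    return rate_all, rate_valid, n_invalid

# Invented example: 18 neutral, 1 still biased, 1 unparseable.
toy = [0] * 18 + [1, None]
rate_all, rate_valid, n_invalid = neutralization_rate(toy)
print(rate_all, rate_valid, n_invalid)
```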
