In [None]:
!pip -q install groq pypdf faiss-cpu llama-index llama-index-readers-file llama-index-llms-groq llama-index-embeddings-huggingface llama-index-vector-stores-faiss bert-score ragas



## **Environment Setup**
This solution was developed in Google Colab.

In [None]:
import os
from getpass import getpass

import faiss
import pandas as pd
from llama_index.core import Document, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter, SentenceWindowNodeParser, SimpleNodeParser
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.response.pprint_utils import pprint_response
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq
from llama_index.vector_stores.faiss import FaissVectorStore

In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
os.environ['GROQ_API_KEY'] = getpass('GROQ_API_KEY')

GROQ_API_KEY··········


**Creating the LLM**

In [None]:
llm = Groq(model="llama-3.3-70b-versatile",
           api_key=os.environ.get("GROQ_API_KEY"))

**Creating Embeddings**

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name = "BAAI/bge-small-en-v1.5",
                                   embed_batch_size=10)


In [None]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
Settings.node_parser = SentenceSplitter(chunk_size=512,
                                        chunk_overlap=20)
Settings.num_output = 512
Settings.context_window = 3900
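
The `chunk_size` and `chunk_overlap` settings above can be pictured as a sliding window over the text. The sketch below is only an illustration, not how `SentenceSplitter` is implemented: the real splitter counts tokenizer tokens and prefers sentence boundaries, whereas this version assumes whitespace-separated words stand in for tokens. `chunk_words` is a hypothetical helper name.

```python
def chunk_words(text, chunk_size=512, chunk_overlap=20):
    """Split text into windows of at most chunk_size words.

    Consecutive windows share chunk_overlap words, so content near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()  # assumption: one word ~ one token
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With ten words, a window of four and an overlap of one, the helper produces three chunks, each starting with the last word of the previous one.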

# **Loading Data and Creating Indexes**

In [None]:
racism_report = SimpleDirectoryReader(
    input_files=["HR.pdf"]
).load_data()

**Building Indices**

In [None]:
dim = 384  # embedding dimension of BAAI/bge-small-en-v1.5
faiss_index = faiss.IndexFlatL2(dim)  # exact (brute-force) L2 index
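
`IndexFlatL2` stores every vector as-is and compares a query against all of them, ranking by squared Euclidean distance. A minimal pure-Python sketch of that idea follows; it assumes small lists of floats rather than the 384-dimensional embeddings used here, and `flat_l2_search` is a hypothetical name, not part of the FAISS API.

```python
def flat_l2_search(vectors, query, k=5):
    """Return the k nearest stored vectors as (squared_distance, index) pairs."""
    scored = []
    for idx, vec in enumerate(vectors):
        if len(vec) != len(query):
            raise ValueError("stored vector and query differ in dimension")
        # Squared L2 distance: sum of squared coordinate differences
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first
    return scored[:k]


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    print(flat_l2_search(stored, [0.0, 0.0], k=2))
```

The exhaustive scan is exact but grows linearly with the number of stored vectors, which is acceptable for a single PDF.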

In [None]:
vec_store = FaissVectorStore(faiss_index=faiss_index)
context = StorageContext.from_defaults(vector_store=vec_store)
report_index = VectorStoreIndex.from_documents(racism_report, storage_context=context)

**Building the Query engine**

In [None]:
report_engine = report_index.as_query_engine(similarity_top_k=5)

In the section below, the questions and their baseline responses are read from the Excel file `proposed_responses.xlsx` into a pandas DataFrame.

In [None]:
file_path = 'proposed_responses.xlsx'

In [None]:
question_list=pd.read_excel(file_path)

In [None]:
pd.set_option('display.max_colwidth', None)

In [None]:
question_list.head(5)

Unnamed: 0,Question,Responses
0,What is the aim of the report study?,The aim of the study is to better understand how racism prevails in workplaces and highlight the organizational and systemic factors that contribute to the issue.
1,How is racism at work experienced by marginalised racial and ethnic groups?,"Racism at work can mean being passed over for a promotion, being paid less, and being excluded from advancement opportunities due to the individual belonging to marginalised racial and ethnic groups. It can also mean being targeted with slurs and stereotypes, as well as derisive comments about physical features, dress, and food."
2,What is the most experienced form of racism at work as per the report?,"Close to half described most experienced form of workplace harassment as racist, slurs, jokers and other derogatory comments. Also stereotypes, as well as derisive comments about physical features, dress, and food are experienced."
3,What are top 3 solutions proposed by the report to combat racism at work?,"By ensuring, managers are trained to act accordingly in cases of racism at work by following clear process of investigation and penaliszing racist incident. Training employees to identify microaggressions and\nteaching them how to step in if they witness one.Creating a code of conduct for customers and clients that explicitly states theorganization’s expectations regarding civility, respect, common courtesy, and hurtful comments about race, ethnicity, gender, and\nother personal characteristics."


In [None]:
question_list.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Question   4 non-null      object
 1   Responses  4 non-null      object
dtypes: object(2)
memory usage: 196.0+ bytes


In [None]:
question_list.Question[0]

'What is the aim of the report study?'

In [None]:
question_list.Responses[0]

'The aim of the study is to better understand how racism prevails in workplaces and highlight the organizational and systemic factors that contribute to the issue.'

### Retrieving responses from the document using RAG

In the section below, each question is sent to the RAG query engine built above. The question, its baseline response, and the RAG-generated response are appended to the list `data`, which is then converted to a pandas DataFrame, `new_data`. This DataFrame places the baseline and RAG-generated responses side by side so that the BERTScore can be calculated in the next section.

In [None]:
data=[]

In [None]:
for _, row in question_list.iterrows():
    question = row['Question']
    # Baseline (reference) answer taken from the spreadsheet
    response = row['Responses']
    # Query the RAG engine and keep only the answer text, not the Response object
    generated_response = str(report_engine.query(question))
    data.append({
        'Question': question,
        'Response': response,
        'RAG_Response': generated_response
    })
    print(f"Question: {question}")
    print(f"Response:{response}")
    print(f"Generated Response: {generated_response}\n")

Question: What is the aim of the report study?
Response:The aim of the study is to better understand how racism prevails in workplaces and highlight the organizational and systemic factors that contribute to the issue.
Generated Response: The aim of the report study is to identify patterns of racism in the workplace and determine strategies for action by analyzing survey responses from participants who have experienced racism in their current job.

Question: How is racism at work experienced by marginalised racial and ethnic groups? 
Response:Racism at work can mean being passed over for a promotion, being paid less, and being excluded from advancement opportunities due to the individual belonging to marginalised racial and ethnic groups. It can also mean being targeted with slurs and stereotypes, as well as derisive comments about physical features, dress, and food.
Generated Response: Racism at work is experienced by marginalized racial and ethnic groups in various ways, including be

In [None]:
new_data=pd.DataFrame(data)

In [None]:
new_data

Unnamed: 0,Question,Response,RAG_Response
0,What is the aim of the report study?,The aim of the study is to better understand how racism prevails in workplaces and highlight the organizational and systemic factors that contribute to the issue.,The aim of the report study is to identify patterns of racism in the workplace and determine strategies for action by analyzing survey responses from participants who have experienced racism in their current job.
1,How is racism at work experienced by marginalised racial and ethnic groups?,"Racism at work can mean being passed over for a promotion, being paid less, and being excluded from advancement opportunities due to the individual belonging to marginalised racial and ethnic groups. It can also mean being targeted with slurs and stereotypes, as well as derisive comments about physical features, dress, and food.","Racism at work is experienced by marginalized racial and ethnic groups in various ways, including being passed over for promotions, being paid less, and being excluded from advancement opportunities. It can also take the form of being targeted with slurs and stereotypes, as well as derisive comments about physical features, dress, and food. Additionally, racism at work can be perpetrated by anyone, regardless of their gender, race, or relationship to the victim, and can come from superiors, colleagues, and customers. The experiences of racism can be subtle, such as being treated differently or excluded from social interactions, or more overt, such as being verbally abused or having one's culture or heritage disrespected. Overall, racism at work is a common experience for many employees from marginalized racial and ethnic groups, with two-thirds reporting that they have experienced racism at work during their career, and half experiencing racism in their current job."
2,What is the most experienced form of racism at work as per the report?,"Close to half described most experienced form of workplace harassment as racist, slurs, jokers and other derogatory comments. Also stereotypes, as well as derisive comments about physical features, dress, and food are experienced.","The most common expressions of racism involved workplace harassment, which was cited by almost half of participants, and employment and professional inequities, which were cited by about one-third. Workplace harassment includes being subjected to derogatory and snide remarks, racial slurs, racist jokes, comments about a person’s accent or assumed native language, and comments to “go back to your country.”"
3,What are top 3 solutions proposed by the report to combat racism at work?,"By ensuring, managers are trained to act accordingly in cases of racism at work by following clear process of investigation and penaliszing racist incident. Training employees to identify microaggressions and\nteaching them how to step in if they witness one.Creating a code of conduct for customers and clients that explicitly states theorganization’s expectations regarding civility, respect, common courtesy, and hurtful comments about race, ethnicity, gender, and\nother personal characteristics.","The top 3 solutions proposed to combat racism at work are: \n\n1. Training managers to learn about the emotional tax that being on guard to bias against race, ethnicity, and gender levies against people from marginalized racial and ethnic groups, and training them on allyship and curiosity to decrease experiences of racism.\n\n2. Clarifying team expectations and norms for mutual respect and building an inclusive environment where everyone feels valued, trusted, authentic, and psychologically safe, and helping employees practice communicating across differences to have a dialogue rather than a debate.\n\n3. Giving managers the authority to follow through on clear processes for investigating and penalizing racist incidents, and creating a code of conduct for customers and clients that explicitly states the organization's expectations regarding civility, respect, common courtesy, and hurtful comments about race, ethnicity, gender, and other personal characteristics."


### Calculating BERT Score

The section below loops over the rows of `new_data` and calculates the BERTScore F1 between each baseline response (the reference) and the corresponding RAG-generated response (the candidate). The scores are then added as a new column to produce the DataFrame `new_data_bert`, which is exported to the Excel file "Bert_Score.xls" and downloaded.
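
To make the metric less of a black box, here is a simplified sketch of the greedy matching behind BERTScore: each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and F1 combines the two. The toy 2-D vectors are assumptions standing in for contextual token embeddings, and the sketch omits the optional idf weighting and baseline rescaling of the real library. `greedy_match_f1` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_f1(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from greedy token matching."""
    if not cand_vecs or not ref_vecs:
        raise ValueError("both token lists must be non-empty")
    # Precision: how well each candidate token is covered by the reference
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: how well each reference token is covered by the candidate
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    candidate = [[1.0, 0.0], [0.0, 1.0]]  # two toy "token embeddings"
    reference = [[1.0, 0.0]]              # one toy "token embedding"
    print(greedy_match_f1(candidate, reference))
```

In the toy case the reference token is fully covered (recall 1.0), while only one of the two candidate tokens has a match (precision 0.5). The same asymmetry applies when a RAG response is much longer than its baseline.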

In [None]:
from bert_score import score

In [None]:
bertScores = []

In [None]:
for _, row in new_data.iterrows():
    # bert_score expects plain strings, so cast both sides explicitly
    resp_baseline_text = str(row["Response"])
    resp_llm_text = str(row["RAG_Response"])

    # Candidate (RAG answer) goes first, reference (baseline) second; keep only F1
    P, R, F1 = score([resp_llm_text], [resp_baseline_text], lang="en")
    bertScores.append(F1.item())


Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
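
The RoBERTa warning appears once per row, which suggests the scoring model is loaded again on every call to `score`. A possible alternative is to pass all pairs in a single call. The sketch below assumes `score_fn` behaves like `bert_score.score`, that is, it accepts two lists of strings plus `lang` and returns a `(P, R, F1)` triple of per-pair values. `batched_f1` is a hypothetical helper, and the scoring function is passed in so the helper can be tried without downloading the model.

```python
def batched_f1(candidates, references, score_fn, lang="en"):
    """Score all candidate/reference pairs in one call and return F1 as floats."""
    cands = [str(c) for c in candidates]
    refs = [str(r) for r in references]
    if len(cands) != len(refs):
        raise ValueError("candidates and references must have the same length")
    # One call for the whole batch instead of one call per row
    _, _, f1 = score_fn(cands, refs, lang=lang)
    return [float(x) for x in f1]


# Intended use in this notebook (assumes bert_score is installed):
# from bert_score import score
# bertScores = batched_f1(new_data["RAG_Response"], new_data["Response"], score)
```

The values should agree with the row-by-row loop up to floating-point noise, but that has not been verified here.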

In [None]:
bertScores_df=pd.DataFrame(bertScores, columns=["BERTScore"])

In [None]:
new_data_bert=pd.concat([new_data,bertScores_df],axis=1)

In [None]:
new_data_bert

Unnamed: 0,Question,Response,RAG_Response,BERTScore
0,What is the aim of the report study?,The aim of the study is to better understand how racism prevails in workplaces and highlight the organizational and systemic factors that contribute to the issue.,The aim of the report study is to identify patterns of racism in the workplace and determine strategies for action by analyzing survey responses from participants who have experienced racism in their current job.,0.914589
1,How is racism at work experienced by marginalised racial and ethnic groups?,"Racism at work can mean being passed over for a promotion, being paid less, and being excluded from advancement opportunities due to the individual belonging to marginalised racial and ethnic groups. It can also mean being targeted with slurs and stereotypes, as well as derisive comments about physical features, dress, and food.","Racism at work is experienced by marginalized racial and ethnic groups in various ways, including being passed over for promotions, being paid less, and being excluded from advancement opportunities. It can also take the form of being targeted with slurs and stereotypes, as well as derisive comments about physical features, dress, and food. Additionally, racism at work can be perpetrated by anyone, regardless of their gender, race, or relationship to the victim, and can come from superiors, colleagues, and customers. The experiences of racism can be subtle, such as being treated differently or excluded from social interactions, or more overt, such as being verbally abused or having one's culture or heritage disrespected. Overall, racism at work is a common experience for many employees from marginalized racial and ethnic groups, with two-thirds reporting that they have experienced racism at work during their career, and half experiencing racism in their current job.",0.911348
2,What is the most experienced form of racism at work as per the report?,"Close to half described most experienced form of workplace harassment as racist, slurs, jokers and other derogatory comments. Also stereotypes, as well as derisive comments about physical features, dress, and food are experienced.","The most common expressions of racism involved workplace harassment, which was cited by almost half of participants, and employment and professional inequities, which were cited by about one-third. Workplace harassment includes being subjected to derogatory and snide remarks, racial slurs, racist jokes, comments about a person’s accent or assumed native language, and comments to “go back to your country.”",0.870931
3,What are top 3 solutions proposed by the report to combat racism at work?,"By ensuring, managers are trained to act accordingly in cases of racism at work by following clear process of investigation and penaliszing racist incident. Training employees to identify microaggressions and\nteaching them how to step in if they witness one.Creating a code of conduct for customers and clients that explicitly states theorganization’s expectations regarding civility, respect, common courtesy, and hurtful comments about race, ethnicity, gender, and\nother personal characteristics.","The top 3 solutions proposed to combat racism at work are: \n\n1. Training managers to learn about the emotional tax that being on guard to bias against race, ethnicity, and gender levies against people from marginalized racial and ethnic groups, and training them on allyship and curiosity to decrease experiences of racism.\n\n2. Clarifying team expectations and norms for mutual respect and building an inclusive environment where everyone feels valued, trusted, authentic, and psychologically safe, and helping employees practice communicating across differences to have a dialogue rather than a debate.\n\n3. Giving managers the authority to follow through on clear processes for investigating and penalizing racist incidents, and creating a code of conduct for customers and clients that explicitly states the organization's expectations regarding civility, respect, common courtesy, and hurtful comments about race, ethnicity, gender, and other personal characteristics.",0.878213
