<a href="https://colab.research.google.com/github/stigsfoot/genai-free-workshops-2024/blob/main/byte_an_atom_naive_rag_eval_llamaindex_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval Augmented Generation (RAG) with Actions (LlamaIndex Edition)

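In this example, we'll look at how to apply **R**etrieval **A**ugmented **G**eneration (RAG) to LLMs: we build a vector index over a small document with LlamaIndex, query it, and then evaluate both the retriever and the generated answers.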

We'll get started by installing the prerequisite libraries:

In [None]:
!pip install -qU \
    nemoguardrails \
    datasets \
    openai \
    chromadb \
    tqdm \
    langchain \
    pandas \
    llama-index \
    llama-index-llms-openai

In [None]:
# The nest_asyncio module enables the nesting of asynchronous functions within an already running async loop.
# This is necessary because Jupyter notebooks inherently operate in an asynchronous loop.
# By applying nest_asyncio, we can run additional async functions within this existing loop without conflicts.
import nest_asyncio

nest_asyncio.apply()

from llama_index.core.evaluation import generate_question_context_pairs
from llama_index.core.evaluation import RetrieverEvaluator
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.core.node_parser import SentenceSplitter, SimpleNodeParser
from llama_index.llms.openai import OpenAI

import os
import pandas as pd

## Download Data

To begin, we need to set up our data and retrieval components for RAG. We'll start with a text file containing the Black Code Collective's manifesto for Black tech workers:

In [None]:
!mkdir -p 'data/black-code-collective/'
!curl 'https://raw.githubusercontent.com/stigsfoot/genai-free-workshops-2024/main/labs/black_code_collective.txt' -o 'data/black-code-collective/black_code_collective.txt'

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 26989  100 26989    0     0  65803      0 --:--:-- --:--:-- --:--:-- 65826


## Load our data/build an index

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core import Settings
from google.colab import userdata
import openai

openai_api_key = userdata.get('OPENAI_KEY')

openai.api_key = openai_api_key

documents = SimpleDirectoryReader("./data/black-code-collective/").load_data()

Settings.chunk_size = 512
Settings.chunk_overlap = 50

index = VectorStoreIndex.from_documents(
    documents,
)

query_engine = index.as_query_engine(similarity_top_k=4)
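To make the `chunk_size` and `chunk_overlap` settings above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification and not LlamaIndex's implementation: the real splitter counts tokens with a tokenizer and tries to respect sentence boundaries, whereas the hypothetical `chunk_tokens` helper below just slides a window over a plain list.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration only; not part of LlamaIndex.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy input: 1,200 made-up "tokens"
toy_tokens = [f"t{i}" for i in range(1200)]
toy_chunks = chunk_tokens(toy_tokens, chunk_size=512, chunk_overlap=50)

print(len(toy_chunks))                  # number of chunks produced
print([len(c) for c in toy_chunks])     # size of each chunk
```

The overlap means the last 50 tokens of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.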


In [None]:
documents = SimpleDirectoryReader("./data/black-code-collective/").load_data()

# Define an LLM
llm = OpenAI(model="gpt-4")

# Build index with a chunk_size of 512
node_parser = SimpleNodeParser.from_defaults(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents)
vector_index = VectorStoreIndex(nodes)

In [None]:
'''
Question samples:
How does the manifesto suggest improving tech recruiting for diversity?
What changes to job descriptions does the manifesto advocate for inclusivity?
What leadership actions are recommended to enhance workplace diversity?
'''
query_engine = vector_index.as_query_engine()
response_vector = query_engine.query("What changes to job descriptions does the manifesto advocate for inclusivity?")
response_vector.response

"The manifesto advocates for job descriptions to be gender-neutral and to include qualifications that accurately reflect the tasks that need to be performed. It suggests stating that candidates are not expected to know everything listed in the job description and emphasizing the organization's value for growth and the ability to learn. These changes are aimed at increasing the likelihood of diverse candidates applying."

In [None]:
# First retrieved node
response_vector.source_nodes[0].get_text()

'Make sure that the job descriptions are gender-neutral and qualifications that reflect what the actual job will need to be done. Including in job descriptions that candidates are not expected to know 100% of the things listed. It should state that the organization values growth and the ability to learn. These changes will increase the likelihood of diverse candidates applying.\n\nABOLISH GPA REQUIREMENTS\nStudies have also found that Grade Point Averages (GPAs) are ineffective in predicting an applicant\'s performance on the job. GPAs create barriers for people who may not have had the economic wealth/support to maintain or even earn "good grades". As universities and schools become more expensive a lot of Black and Brown students have had to work more jobs outside of school to pay for their education. As they try to balance work/school GPAs tend to decrease. We challenge companies to look beyond GPAs and test these students through their skills rather than completely discarding them 

In [None]:
# Second retrieved node
response_vector.source_nodes[1].get_text()

"BLACK CODE COLLECTIVE\n\nBLACK LIVES MATTER\nBLACK TECH WORKERS MATTER\nA BLACK TECH WORKER'S MANIFESTO\n\nAs first-generation Black tech workers, we have excelled in positions in a field once obscured for many who look like us. However, recent events have reminded us that no matter the success we obtain personally, we still exist in a culture where our bodies threaten and make others feel uncomfortable. We refuse to let this go by unaddressed in the tech community. We do not proclaim to have all the answers. Frankly, it was exhausting for us to sit down under the weight of oppression and still muster the energy to research and write to educate others. However, when we think about why we created Black Code Collective we remember that this community has recognized and empowered our Black voices and we want to contribute to building a world where the next generation of Black and Brown tech workers have respect our ancestors deeply desired. We don't write these words to educate every whi

## Eval

In [None]:
qa_dataset = generate_question_context_pairs(
    nodes,
    llm=llm,
    num_questions_per_chunk=2
)

100%|██████████| 18/18 [00:44<00:00,  2.48s/it]
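The generated dataset pairs each synthetic question with the node it was generated from. As a rough mental model, and assuming the dataset exposes `queries`, `corpus`, and `relevant_docs` mappings (only `queries` is used later in this notebook; the other two names are our assumption), the structure can be sketched with plain dictionaries. All IDs and texts below are made up.

```python
# Toy stand-ins for the assumed dataset layout; every ID and string is invented.
toy_queries = {
    "q1": "What changes to job descriptions does the manifesto advocate?",
    "q2": "Why does the manifesto suggest abolishing GPA requirements?",
    "q3": "Who wrote the manifesto?",
}
toy_corpus = {
    "node_1": "Make sure that the job descriptions are gender-neutral ...",
    "node_2": "A BLACK TECH WORKER'S MANIFESTO ...",
}
# Each question points back at the node(s) it was generated from
toy_relevant_docs = {
    "q1": ["node_1"],
    "q2": ["node_1"],
    "q3": ["node_2"],
}


def questions_per_node(relevant_docs):
    """Count how many generated questions point at each node."""
    counts = {}
    for node_ids in relevant_docs.values():
        for node_id in node_ids:
            counts[node_id] = counts.get(node_id, 0) + 1
    return counts


node_counts = questions_per_node(toy_relevant_docs)
print(node_counts)
```

With `num_questions_per_chunk=2`, we would expect roughly two questions per node in the real dataset.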


### Retriever

In [None]:
retriever = vector_index.as_retriever(similarity_top_k=2)
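Conceptually, a top-k retriever embeds the query, scores every node embedding against it, and returns the `k` best matches. The sketch below illustrates that idea, assuming cosine similarity as the scoring function; the three-dimensional vectors and node IDs are invented, and the helper names are ours rather than LlamaIndex's.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k=2):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings; real embeddings have hundreds or thousands of dimensions
toy_node_vecs = {
    "node_a": [1.0, 0.0, 0.0],
    "node_b": [0.8, 0.6, 0.0],
    "node_c": [0.0, 0.0, 1.0],
}
toy_query_vec = [0.9, 0.1, 0.0]

toy_results = top_k(toy_query_vec, toy_node_vecs, k=2)
for node_id, score in toy_results:
    print(node_id, round(score, 3))
```

A smaller `similarity_top_k` makes the retrieval metrics stricter, because the relevant node has fewer slots in which to appear.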

In [None]:
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=retriever
)

In [None]:
# Evaluate
eval_results = await retriever_evaluator.aevaluate_dataset(qa_dataset)

In [None]:
def display_results(name, eval_results):
    """Display results from evaluate."""

    metric_dicts = []
    for eval_result in eval_results:
        metric_dict = eval_result.metric_vals_dict
        metric_dicts.append(metric_dict)

    full_df = pd.DataFrame(metric_dicts)

    hit_rate = full_df["hit_rate"].mean()
    mrr = full_df["mrr"].mean()

    metric_df = pd.DataFrame(
        {"Retriever Name": [name], "Hit Rate": [hit_rate], "MRR": [mrr]}
    )

    return metric_df

# Get the list of queries from the above created dataset

queries = list(qa_dataset.queries.values())
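For reference, the two metrics averaged in `display_results` can be computed by hand. The sketch below uses the common definitions: hit rate is 1 when any expected node appears in the retrieved list, and reciprocal rank is 1 divided by the position of the first expected node (0 on a miss). The function names and node IDs are illustrative and not taken from LlamaIndex.

```python
def hit_rate(retrieved_ids, expected_ids):
    """1.0 if any expected node was retrieved, else 0.0."""
    return 1.0 if any(doc_id in expected_ids for doc_id in retrieved_ids) else 0.0


def reciprocal_rank(retrieved_ids, expected_ids):
    """1 / rank of the first expected node in the retrieved list, or 0.0 on a miss."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in expected_ids:
            return 1.0 / rank
    return 0.0


# (retrieved node IDs in ranked order, expected node IDs); all IDs are made up
toy_examples = [
    (["n1", "n7"], ["n1"]),  # expected node at rank 1
    (["n4", "n2"], ["n2"]),  # expected node at rank 2
    (["n9", "n3"], ["n5"]),  # expected node not retrieved
]

mean_hit_rate = sum(hit_rate(r, e) for r, e in toy_examples) / len(toy_examples)
mean_mrr = sum(reciprocal_rank(r, e) for r, e in toy_examples) / len(toy_examples)

print("Hit rate:", mean_hit_rate)
print("MRR:", mean_mrr)
```

Hit rate only asks whether the right node was found, while MRR also rewards ranking it near the top, so MRR can never exceed hit rate.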

In [None]:
# gpt-3.5-turbo
gpt35 = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_gpt35 = ServiceContext.from_defaults(llm=gpt35)

# gpt-4
gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)




In [None]:
vector_index = VectorStoreIndex(nodes, service_context = service_context_gpt35)
query_engine = vector_index.as_query_engine()

### Faithfulness checks

`FaithfulnessEvaluator` checks whether a response is faithful to the retrieved context, that is, whether the answer is supported by the source nodes rather than invented (hallucinated) by the model.

In [None]:
from llama_index.core.evaluation import FaithfulnessEvaluator
faithfulness_gpt4 = FaithfulnessEvaluator(service_context=service_context_gpt4)
eval_query = queries[10]

eval_query

'According to the context, what are some of the challenges faced by Black tech workers in advocating for their careers and how can organizations help combat these issues?'

In [None]:
# Query the model
response_vector = query_engine.query(eval_query)

# Display the generated answer in a clear and concise format
print("Generated Answer:\n", response_vector)

# Evaluate the response for faithfulness
eval_result = faithfulness_gpt4.evaluate_response(response=response_vector)

# Check and display if the response passed the faithfulness evaluation
is_passing = eval_result.passing
print("Faithfulness Evaluation Passed?\n", is_passing)



Generated Answer:
 Some of the challenges faced by Black tech workers in advocating for their careers include feeling inadequate due to Imposter Syndrome, not knowing how to ask for positions, salaries, and responsibilities they deserve, and facing unconscious bias in leadership selection processes. Organizations can combat these issues by providing coaching on how to communicate contributions effectively, offering training for management on equitable talent evaluation, ensuring diverse voices are part of promotion conversations, establishing clear paths to promotion, and creating a more inclusive workplace culture that recognizes and validates workplace racism while promoting diverse talent at all levels of the organization.
Faithfulness Evaluation Passed?
 True


### Relevancy Evaluator
`RelevancyEvaluator` checks whether the response, together with its retrieved context, actually addresses the query: it should stay on topic and cover the specific points raised in the question, without unrelated or tangential information.

In [None]:
from llama_index.core.evaluation import RelevancyEvaluator

relevancy_gpt4 = RelevancyEvaluator(service_context=service_context_gpt4)
# Pick a query
query = queries[10]

query

'According to the context, what are some of the challenges faced by Black tech workers in advocating for their careers and how can organizations help combat these issues?'

In [None]:
# Generate response.
# response_vector has response and source nodes (retrieved context)
response_vector = query_engine.query(query)

# Relevancy evaluation
eval_result = relevancy_gpt4.evaluate_response(
    query=query, response=response_vector
)

# eval_result.passing is True if the response passed the evaluation.
eval_result.passing

True

In [None]:
# Generate the response from the query engine
response_vector = query_engine.query(query)

# Display the generated response (printing a Response object shows only the answer text)
print("Generated Response:\n", response_vector)

# Perform relevancy evaluation on the generated response
eval_result = relevancy_gpt4.evaluate_response(query=query, response=response_vector)

# Extract and display the outcome of the relevancy evaluation
is_passing = eval_result.passing
print("Relevancy Evaluation Passed?\n", is_passing)


Generated Response:
 Black tech workers face challenges in advocating for their careers due to Imposter Syndrome, lack of knowledge on how to ask for positions, salaries, and responsibilities, and feeling unsupported by their organization. Organizations can combat these issues by providing coaching on how to communicate contributions and training for management on how to evaluate talent equitably. Additionally, having clear paths to promotion and promoting diverse voices in leadership positions can help empower Black tech workers and create a more inclusive workplace culture.
Relevancy Evaluation Passed?
 True


In [None]:
# You can get the feedback for the evaluation.
eval_result.feedback

'YES'
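A single passing query tells us little, so in practice we would run the evaluator over many queries and summarize the outcomes. The sketch below shows one way to aggregate results into a pass rate. `ToyEvalResult` is a hypothetical stand-in that mimics only the `passing` and `feedback` fields used above, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class ToyEvalResult:
    """Hypothetical stand-in for an evaluation result (illustration only)."""
    query: str
    passing: bool
    feedback: str


def summarize(results):
    """Compute the pass rate and collect the queries that failed."""
    total = len(results)
    passed = sum(1 for r in results if r.passing)
    return {
        "total": total,
        "pass_rate": passed / total if total else 0.0,
        "failed_queries": [r.query for r in results if not r.passing],
    }


# Invented outcomes for four queries
toy_eval_results = [
    ToyEvalResult("query A", True, "YES"),
    ToyEvalResult("query B", True, "YES"),
    ToyEvalResult("query C", False, "NO"),
    ToyEvalResult("query D", True, "YES"),
]

summary = summarize(toy_eval_results)
print(summary)
```

Keeping the failed queries alongside the pass rate makes it easier to inspect which questions the pipeline answers poorly.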

# Advanced Eval with Tonic


In [None]:
!pip install tonic-validate -q

# Inference