# **Naive RAG**
Naive RAG is the simplest technique in the RAG ecosystem: it retrieves data relevant to the user's query and passes it, together with the query, to an LLM that generates the response.

Research Paper: [RAG](https://arxiv.org/pdf/2005.11401)
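
Before wiring up real components, the retrieve-then-generate flow can be sketched with the standard library alone. This is only an illustration under stated assumptions: word-count cosine similarity stands in for the embedding model, a plain list stands in for the vector database, and the names `tokenize`, `cosine`, `retrieve`, `build_prompt`, and `toy_chunks` are hypothetical helpers, not part of any library used in this notebook.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (toy retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Combine the retrieved context with the question, as a RAG prompt does."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical sample data, for illustration only
toy_chunks = [
    "World War I ended on 11 November 1918.",
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
]
top_chunks = retrieve("when did world war I end?", toy_chunks, k=1)
toy_prompt = build_prompt("when did world war I end?", top_chunks)
print(toy_prompt)
```

In the rest of the notebook, OpenAI embeddings and Pinecone take the place of the toy similarity function and list, and the prompt is sent to a chat model instead of being printed.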

## **Initial Setup**

In [None]:
# ! pip install --q athina

In [None]:
from dotenv import load_dotenv
import os
# from google.colab import userdata
# os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY').strip()
# os.environ['PINECONE_API_KEY'] = userdata.get('PINECONE_API_KEY').strip()
load_dotenv()


## **Indexing**

In [None]:
# load embedding model
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
# load data
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("./context.csv")
documents = loader.load()

In [None]:
# split documents
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
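
The splitter is configured with `chunk_size=500` and `chunk_overlap=0`. As a rough illustration of what these two parameters control, the sketch below uses a plain fixed-width character window. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraph and line breaks); `split_fixed` is a hypothetical helper written only for this example.

```python
def split_fixed(text, chunk_size, chunk_overlap=0):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


no_overlap = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=0)
with_overlap = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With no overlap each character appears in exactly one chunk; a non-zero overlap repeats text across neighbouring chunks, which can help when a relevant sentence straddles a chunk boundary, at the cost of storing more vectors.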

## **Pinecone Vector Database**

In [None]:
# initialize pinecone client
from pinecone import Pinecone as PineconeClient, ServerlessSpec
pc = PineconeClient(
    api_key=os.environ.get("PINECONE_API_KEY"),
)

In [None]:
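# create index (skipped if it already exists, so the cell can be re-run safely)
# dimension must match the embedding model's output size
# (1536 for OpenAI's text-embedding-ada-002)
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )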

In [None]:
# load index
index_name = "my-index"

In [None]:
# create vectorstore
from langchain_community.vectorstores import Pinecone
vectorstore = Pinecone.from_documents(
    documents=documents,
    embedding=embeddings,
    index_name=index_name
)

## **FAISS (Optional)**

In [None]:
# # optional vectorstore
# !pip install --q faiss-gpu

# # create vectorstore
# from langchain_community.vectorstores import FAISS
# vectorstore = FAISS.from_documents(documents, embeddings)

## **Retriever**

In [None]:
# create retriever
retriever = vectorstore.as_retriever()

## **RAG Chain**

In [None]:
# load llm
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

In [None]:
# create document chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """
You are a helpful assistant that answers questions based on the provided context.
Use the provided context to answer the question.
Question: {input}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

# Setup RAG pipeline
rag_chain = (
    {"context": retriever,  "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
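
In the chain above, the retriever's output (a list of document objects) is placed into the `{context}` slot as-is, so the prompt receives the list's string representation. A common alternative is to join the text of the retrieved documents first. The sketch below shows such a helper; `format_docs` is a hypothetical name, and `SimpleNamespace` objects stand in for retrieved documents so the example runs without any API keys.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join the page_content of retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for retrieved documents (hypothetical sample data):
# anything with a page_content attribute works here.
sample_docs = [
    SimpleNamespace(page_content="World War I ended on 11 November 1918."),
    SimpleNamespace(page_content="It was fought between the Allies and the Central Powers."),
]
formatted_context = format_docs(sample_docs)
print(formatted_context)

# Assumed usage inside the chain (not executed here):
# {"context": retriever | format_docs, "input": RunnablePassthrough()}
```

Whether this improves answers depends on the data; the original chain works as written, and this variant only changes how the context is rendered in the prompt.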

In [None]:
# response
response = rag_chain.invoke("when did ww1 end?")
response

'World War I ended on November 11, 1918.'

## **Preparing Data for Evaluation**

In [None]:
# collect queries, responses, and retrieved contexts
question = ["when did ww1 end?"]
response = []
contexts = []

# Inference: run the chain and record the retrieved chunks for each query
for query in question:
    response.append(rag_chain.invoke(query))
    contexts.append([doc.page_content for doc in retriever.invoke(query)])

# To dict
data = {
    "query": question,
    "response": response,
    "context": contexts,
}

In [None]:
# create dataset
from datasets import Dataset
dataset = Dataset.from_dict(data)

In [None]:
# create dataframe
import pandas as pd
df = pd.DataFrame(dataset)

In [None]:
df

| | query | response | context |
|---|---|---|---|
| 0 | when did ww1 end? | World War I ended on November 11, 1918. | [context: ['World War I or the First World War... |


In [None]:
# Convert to dictionary
df_dict = df.to_dict(orient='records')

# Convert context to list
for record in df_dict:
    if not isinstance(record.get('context'), list):
        if record.get('context') is None:
            record['context'] = []
        else:
            record['context'] = [record['context']]
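
The normalization rule in the cell above (missing context becomes an empty list, a single value is wrapped in a list, an existing list is left alone) can also be written as a small function, which makes it easy to check on sample records before sending real data to the evaluator. This is a sketch: `normalize_context` and `sample_records` are hypothetical names introduced for the example.

```python
def normalize_context(value):
    """Return value as a list: None -> [], list -> unchanged, other -> [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hypothetical records covering the three cases
sample_records = [
    {"query": "q1", "context": None},
    {"query": "q2", "context": "a single passage"},
    {"query": "q3", "context": ["already", "a list"]},
]
normalized = [
    {**record, "context": normalize_context(record.get("context"))}
    for record in sample_records
]
print(normalized)
```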

## **Evaluation in Athina AI**

We will use the **Does Response Answer Query** eval here. It checks whether the response answers the user's query. For further details, refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview).

In [None]:
# set api keys for Athina evals
from athina.keys import AthinaApiKey, OpenAiApiKey
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

In [None]:
# load dataset
from athina.loaders import Loader
dataset = Loader().load_dict(df_dict)

In [None]:
# evaluate
from athina.evals import DoesResponseAnswerQuery
DoesResponseAnswerQuery(model="gpt-4o").run_batch(data=dataset).to_df()

You can view your dataset at: https://app.athina.ai/develop/80872384-24ac-4ad9-824d-74dc02cb7cca


| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | passed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | when did ww1 end? | [context: ['World War I or the First World War (28 July 1914 – 11 November 1918), often abbreviated as WWI, was one of the deadliest global conflicts in history. It was fought between two coalitions, the Allies and the Central Powers. Fighting occurred throughout Europe, the Middle East, Africa, the Pacific, and parts of Asia. An estimated 9 million soldiers were killed in combat, plus another 23 million wounded, while 5 million civilians died as a result of military action, hunger, and dise... | World War I ended on November 11, 1918. | | Does Response Answer Query | False | The response directly answers the user's query by providing the specific date on which World War I ended, which is November 11, 1918. This sufficiently covers all aspects of the user's query. | 787 | gpt-4o | 1.0 |
