# LLM evaluation

## 1. Create a simple RAG
I'm using Azure OpenAI implementations. The simple RAG code is based on a Ragas example and has been modified for educational purposes.

I have configured the llm_judge model to use a stronger model than the one behind the RAG, opting for o3 over o4. This follows a common practice in LLM evaluation: a more capable judge model tends to assess results more reliably.

**Notes about environment variables**
- _ENDPOINT_AZ_AI_: used by the _simple_rag_ library. It contains the Azure private endpoint of the model (AI Foundry).
- _AZURE_OPENAI_API_KEY_: set at the system level and read by default by LangChain.
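
Before creating the clients, it can help to check that both variables are present. The sketch below is only an illustration: the helper names `azure_base_endpoint` and `missing_env_vars` are hypothetical, and it assumes that `ENDPOINT_AZ_AI` holds the resource name that goes in front of `.cognitiveservices.azure.com`, as the URL built in this notebook suggests.

```python
import os


def azure_base_endpoint(resource_name: str) -> str:
    """Build the base URL of an Azure AI resource from its name (assumed format)."""
    if not resource_name:
        raise ValueError("resource name is empty")
    return f"https://{resource_name}.cognitiveservices.azure.com/"


def missing_env_vars(names, environ=None):
    """Return the variables from `names` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


required = ["ENDPOINT_AZ_AI", "AZURE_OPENAI_API_KEY"]
missing = missing_env_vars(required)
if missing:
    # Report the problem early instead of failing later inside a client call
    print(f"Missing environment variables: {', '.join(missing)}")
```

Failing early with a clear message is usually easier to debug than an authentication error raised deep inside a client call.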

In [None]:
import os
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from rag import simple_rag
# Ragas libraries for LLM evaluation
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness

endpoint_az = os.environ["ENDPOINT_AZ_AI"]
# azure_endpoint takes the base URL of the resource; the client appends the
# deployment path and the api-version on its own.
azure_endpoint = f'https://{endpoint_az}.cognitiveservices.azure.com/'

# The judge model is stronger than the model used by the RAG
llm_judge = AzureChatOpenAI(
    azure_endpoint=azure_endpoint,
    azure_deployment='o3-mini_pv',
    openai_api_version='2024-12-01-preview')

# The judge's embedding model is the same one used by the RAG
embeddings_judge = AzureOpenAIEmbeddings(
    azure_endpoint=azure_endpoint,
    azure_deployment='text-embedding-3-large',
    openai_api_version='2024-02-01')

In [13]:
sample_docs = [
    "Google Inc published the transformative paper 'Attention is All You Need' in 2017, introducing the Transformer architecture that revolutionized natural language processing.",
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Alan Turing is considered the father of computer science for his work on algorithms and computation. He introduced the Turing test, a criterion of intelligence.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine.",
    "Microsoft was founded by Bill Gates and Paul Allen in 1975, revolutionizing personal computing with software like Windows and Office.",
]

### 1.1 Let's ask a question

In [None]:
# Initialize my local RAG instance
rag = simple_rag.RAG(endpoint_az)

# Load simulated documents
rag.load_documents(sample_docs)

# Query and retrieve the most relevant document
query = "Tell me about Microsoft"
relevant_doc = rag.get_most_relevant_docs(query)

# Generate an answer
answer = rag.generate_answer(query, relevant_doc)

print(f"Query: {query}")
print(f"Relevant Document: {relevant_doc}")
print(f"Answer: {answer}")

Query: Tell me about Microsoft
Relevant Document: ['Microsoft was founded by Bill Gates and Paul Allen in 1975, revolutionizing personal computing with software like Windows and Office.']
Answer: Microsoft was founded by Bill Gates and Paul Allen in 1975. The company revolutionized personal computing with its software products, notably Windows and Office.
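
The retrieval step can be pictured with a small, self-contained sketch. This is not the code of the _simple_rag_ library: it assumes retrieval ranks documents by cosine similarity between the query and each document, and it replaces the real embedding model with a toy word-count vector so that it runs without any API call. The names `embed`, `cosine_similarity` and `most_relevant_docs` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_relevant_docs(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


docs = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Microsoft was founded by Bill Gates and Paul Allen in 1975, revolutionizing personal computing with software like Windows and Office.",
]
print(most_relevant_docs("Tell me about Microsoft", docs))
```

With a real embedding model the ranking also captures semantic similarity, not only shared words, but the ranking logic stays the same.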


## 2. Evaluate the RAG with ragas

### 2.1 Collect evaluation dataset

In [None]:
sample_queries = [
    "Who introduced the theory of relativity?",
    "Who published the concept of the Transformer?",
    "What did Ada Lovelace do?",
    "Who won two Nobel Prizes for research on radioactivity?",
    "Tell me about Microsoft"
]

# One expected response per query, in the same order as sample_queries
expected_responses = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Google Inc published the transformative paper 'Attention is All You Need' in 2017, introducing the Transformer architecture that revolutionized natural language processing.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Microsoft was founded by Bill Gates and Paul Allen in 1975, revolutionizing personal computing with software like Windows and Office."
]
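
The `evaluate` call below expects an `evaluation_dataset`, which has to be built by running each query through the RAG and pairing the result with its expected response. The following is a minimal sketch of that step, not the notebook's original cell: `StubRAG` and `build_eval_records` are hypothetical names, and the stub stands in for the real RAG so that the example runs without Azure access. The record keys (`user_input`, `retrieved_contexts`, `response`, `reference`) follow the single-turn format described in the Ragas getting-started guide.

```python
class StubRAG:
    """Stand-in for the notebook's RAG object, so the sketch runs on its own."""

    def __init__(self, docs):
        self.docs = docs

    def get_most_relevant_docs(self, query):
        # Naive retrieval: the document sharing the most words with the query
        query_words = set(query.lower().split())
        best = max(self.docs, key=lambda d: len(query_words & set(d.lower().split())))
        return [best]

    def generate_answer(self, query, relevant_docs):
        # A real RAG would call the LLM here; the stub echoes the context
        return relevant_docs[0]


def build_eval_records(rag, queries, references):
    """Run every query through the RAG and pair it with its reference answer."""
    if len(queries) != len(references):
        raise ValueError("queries and references must have the same length")
    records = []
    for query, reference in zip(queries, references):
        relevant_docs = rag.get_most_relevant_docs(query)
        records.append({
            "user_input": query,
            "retrieved_contexts": relevant_docs,
            "response": rag.generate_answer(query, relevant_docs),
            "reference": reference,
        })
    return records


docs = ["Marie Curie won two Nobel Prizes.", "Microsoft was founded in 1975."]
records = build_eval_records(
    StubRAG(docs),
    queries=["Tell me about Microsoft"],
    references=["Microsoft was founded in 1975."],
)
print(records[0]["retrieved_contexts"])
# With Ragas installed, the records can then be converted:
# evaluation_dataset = EvaluationDataset.from_list(records)
```

In the notebook itself, the real `rag` instance, `sample_queries` and `expected_responses` would take the place of the stub and the toy lists.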

In [None]:
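# Wrap the judge model so that Ragas can use it as the evaluator LLM
evaluator_llm = LangchainLLMWrapper(llm_judge)
result = evaluate(dataset=evaluation_dataset, metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()], llm=evaluator_llm)
result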




### References
- https://docs.ragas.io/en/stable/getstarted/rag_eval/#basic-setup
- https://python.langchain.com/docs/integrations/chat/
- Private repositories