# LlamaIndex

[LlamaIndex](https://github.com/run-llama/llama_index) is a data framework for LLM applications to ingest, structure, and access private or domain-specific data. It makes it easy to connect LLMs with your own data. To find the best LlamaIndex configuration for your data, however, you need an objective measure of performance. This is where Ragas comes in: it helps you evaluate your `QueryEngine` and gives you the confidence to tweak the configuration to get the highest score.

This guide assumes you have familiarity with the LlamaIndex framework.

## Building the Testset

You will need a testset to evaluate your `QueryEngine` against. You can either build one yourself or use the [Testset Generator Module](../../getstarted/testset_generation.md) in Ragas to get started with a small synthetic one.

Let's see how that works with LlamaIndex.

In [1]:
# load the documents
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./nyc_wikipedia").load_data()

Now let's initialize the `TestsetGenerator` object with the generator LLM and the embedding model.

In [6]:
from ragas.testset import TestsetGenerator

from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# generator with openai models
generator_llm = OpenAI(model="gpt-4o")
embeddings = OpenAIEmbedding(model="text-embedding-3-large")

generator = TestsetGenerator.from_llama_index(
    llm=generator_llm,
    embedding_model=embeddings,
)

Now you are all set to generate the dataset

In [7]:
# generate testset
testset = generator.generate_with_llamaindex_docs(
    documents,
    testset_size=5,
)

Applying [SummaryExtractor, HeadlinesExtractor]:   0%|          | 0/2 [00:00<?, ?it/s]

Applying EmbeddingExtractor:   0%|          | 0/1 [00:00<?, ?it/s]

Applying HeadlineSplitter:   0%|          | 0/1 [00:00<?, ?it/s]

Applying [EmbeddingExtractor, KeyphrasesExtractor, TitleExtractor]:   0%|          | 0/42 [00:00<?, ?it/s]

unable to apply transformation: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 27528 tokens (27528 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
unable to apply transformation: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 14879 tokens (14879 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
unable to apply transformation: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}


Applying CosineSimilarityBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

unable to apply transformation: Node d3a44301-4216-4805-9cad-7321a50eddcf has no embedding


Applying SummaryCosineSimilarityBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating common themes:   0%|          | 0/2 [00:00<?, ?it/s]

Generating common_concepts:   0%|          | 0/1 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/7 [00:00<?, ?it/s]
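
The `maximum context length` errors in the log above indicate that some requests were far larger than the model accepts. One possible mitigation, sketched here under the assumption that oversized document text is the cause, is to split long texts into smaller pieces before generation. The helper name `split_by_paragraph` and the character budget are hypothetical, and character counts are only a crude proxy for tokens.

```python
from typing import List


def split_by_paragraph(text: str, max_chars: int = 8000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Hypothetical helper for illustration; not part of Ragas or LlamaIndex.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # A single paragraph longer than the budget is hard-split.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# Toy input: two short paragraphs and one that exceeds the budget.
demo_text = "a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "c" * 25
demo_chunks = split_by_paragraph(demo_text, max_chars=22)
print([len(chunk) for chunk in demo_chunks])
```

Whether this removes the errors depends on where the long inputs originate, so treat it as a starting point for debugging rather than a confirmed fix.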

In [8]:
df = testset.to_pandas()
df.head()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | What events led to New York being named after ... | [Etymology ==\n\nIn 1664, New York was named i... | New York was named after the Duke of York in 1... | AbstractQuerySynthesizer |
| 1 | How early European explorers and Native Americ... | [History ==\n\n\n=== Early history ===\nIn the... | Early European explorers and Native Americans ... | AbstractQuerySynthesizer |
| 2 | New York City population economy challenges | [New York City, the most populous city in the ... | New York City, as the most populous city in th... | ComparativeAbstractQuerySynthesizer |
| 3 | How do the economic aspects of New York City, ... | [New York City, the most populous city in the ... | New York City's economic aspects as a global c... | ComparativeAbstractQuerySynthesizer |
| 4 | What are some of the cultural and architectura... | [Geography ==\n\nDuring the Wisconsin glaciati... | Brooklyn is distinct within New York City due ... | SpecificQuerySynthesizer |


With a testset in hand, let's now build a `QueryEngine` and evaluate it.

## Building the `QueryEngine`

To start, let's build a `VectorStoreIndex` over New York City's [Wikipedia page](https://en.wikipedia.org/wiki/New_York_City) as an example and use Ragas to evaluate it.

Since we already loaded the dataset into `documents`, let's use that.

In [9]:
# build query engine
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)

query_engine = vector_index.as_query_engine()

Let's try a sample question from the generated testset to see if it is working.

In [11]:
# convert the testset to a pandas DataFrame
df = testset.to_pandas()
df["user_input"][0]

'What events led to New York being named after the Duke of York?'

In [12]:
response_vector = query_engine.query(df["user_input"][0])

print(response_vector)

New York was named in honor of the Duke of York because King Charles II appointed the Duke as proprietor of the former territory of New Netherland, which included the city of New Amsterdam, when England seized it from Dutch control.


## Evaluating the `QueryEngine`

Now that we have a `QueryEngine` for the `VectorStoreIndex`, we can use the LlamaIndex integration in Ragas to evaluate it.

To run an evaluation with Ragas and LlamaIndex you need three things:

1. LlamaIndex `QueryEngine`: what we will be evaluating
2. Metrics: Ragas defines a set of metrics that can measure different aspects of the `QueryEngine`. The available metrics and their meaning can be found [here](https://github.com/explodinggradients/ragas/blob/main/docs/metrics.md)
3. Questions: A list of questions that Ragas will test the `QueryEngine` against.

First, let's generate the questions. Ideally you should use questions that you see in production, so that the distribution of questions used for evaluation matches the distribution seen in production. This ensures that the scores reflect the performance seen in production, but to start off we'll use a few example questions.
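
One simple way to keep the evaluation questions close to the production distribution is to sample uniformly from a log of real queries, duplicates included, so that frequent questions are proportionally more likely to be drawn. The sketch below assumes such a log is available as a plain list of strings; the function name `sample_questions` and the toy log are hypothetical.

```python
import random
from typing import List


def sample_questions(logged_queries: List[str], k: int, seed: int = 0) -> List[str]:
    """Draw up to k queries uniformly at random, without replacement, from a log.

    Duplicates in the log are kept on purpose: a query asked often in
    production is more likely to end up in the evaluation set.
    """
    rng = random.Random(seed)  # fixed seed so the evaluation set is reproducible
    return rng.sample(logged_queries, min(k, len(logged_queries)))


# Toy log (hypothetical): one frequent query and one rare query.
query_log = ["frequent question"] * 8 + ["rare question"] * 2
eval_questions = sample_questions(query_log, k=5)
print(eval_questions)
```

Fixing the seed keeps the sampled set stable between runs, which makes scores comparable across configuration changes.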

Now let's import the metrics we will use for the evaluation.

In [13]:
# import metrics
from ragas.metrics import (
    Faithfulness,
    AnswerRelevancy,
    ContextPrecision,
    ContextRecall,
)

# init metrics with evaluator LLM
from ragas.llms import LlamaIndexLLMWrapper
evaluator_llm = LlamaIndexLLMWrapper(OpenAI(model="gpt-4o"))
metrics = [
    Faithfulness(llm=evaluator_llm),
    AnswerRelevancy(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm),
]

The `evaluate()` function expects a Ragas `EvaluationDataset` whose samples carry the question (`user_input`) and the ground truth (`reference`) that the metrics need. You can easily convert the `testset` to that format.

In [17]:
# convert to Ragas Evaluation Dataset
ragas_dataset = testset.to_evaluation_dataset()
ragas_dataset

EvaluationDataset(samples=[SingleTurnSample(user_input='What events led to New York being named after the Duke of York?', retrieved_contexts=None, reference_contexts=["Etymology ==\n\nIn 1664, New York was named in honor of the Duke of York, who would become King James II of England. James's elder brother, King Charles II, appointed the Duke as proprietor of the former territory of New Netherland, including the city of New Amsterdam, when England seized it from Dutch control.\n\n\n== "], response=None, multi_responses=None, reference='New York was named after the Duke of York in 1664 when King Charles II appointed him as proprietor of the former territory of New Netherland, including the city of New Amsterdam, after England seized it from Dutch control.', rubric=None), SingleTurnSample(user_input='How early European explorers and Native Americans shape New York City?', retrieved_contexts=None, reference_contexts=['History ==\n\n\n=== Early history ===\nIn the pre-Columbian era, the are
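
The printed dataset shows that each sample already has `user_input`, `reference`, and `reference_contexts`, while `response` and `retrieved_contexts` are still `None`; these are presumably filled in when the query engine runs during evaluation. The sketch below mirrors that layout with a hypothetical stand-in dataclass, `SampleSketch`, which is not the Ragas class, to make the empty fields explicit.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SampleSketch:
    """Hypothetical stand-in for one evaluation sample; NOT the Ragas class."""

    user_input: str
    reference: str
    reference_contexts: List[str]
    retrieved_contexts: Optional[List[str]] = None
    response: Optional[str] = None


sample = SampleSketch(
    user_input="What events led to New York being named after the Duke of York?",
    reference=(
        "New York was named after the Duke of York in 1664 when King Charles II "
        "appointed him as proprietor of the former territory of New Netherland, "
        "including the city of New Amsterdam, after England seized it from Dutch control."
    ),
    # Only the first sentence of the reference context, to keep the sketch short.
    reference_contexts=[
        "In 1664, New York was named in honor of the Duke of York, "
        "who would become King James II of England."
    ],
)

# Fields that are still empty before the query engine has been run.
missing = [name for name, value in asdict(sample).items() if value is None]
print(missing)  # ['retrieved_contexts', 'response']
```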

Finally, let's run the evaluation.

In [18]:
from ragas.integrations.llama_index import evaluate

result = evaluate(
    query_engine=query_engine,
    metrics=metrics,
    dataset=ragas_dataset,
)

Running Query Engine:   0%|          | 0/7 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/28 [00:00<?, ?it/s]

In [19]:
# final scores
print(result)

{'faithfulness': 0.9746, 'answer_relevancy': 0.9421, 'context_precision': 0.9286, 'context_recall': 0.6857}
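
Since the goal is to tweak the configuration toward a higher score, a quick way to decide where to look first is to pick the lowest-scoring metric. This is a minimal sketch with the scores printed above copied by hand into a plain dict; it assumes nothing about the type of `result`.

```python
# Scores copied from the printed result above.
scores = {
    "faithfulness": 0.9746,
    "answer_relevancy": 0.9421,
    "context_precision": 0.9286,
    "context_recall": 0.6857,
}

# The metric with the lowest aggregate score is the first candidate to improve.
weakest = min(scores, key=scores.get)
print(weakest, scores[weakest])  # context_recall 0.6857
```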


You can convert the result into a pandas DataFrame to run more analysis on it.

In [20]:
result.to_pandas()

| | user_input | retrieved_contexts | reference_contexts | response | reference | faithfulness | answer_relevancy | context_precision | context_recall |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What events led to New York being named after ... | [New York City is the headquarters of the glob... | [Etymology ==\n\nIn 1664, New York was named i... | New York was named in honor of the Duke of Yor... | New York was named after the Duke of York in 1... | 1.0 | 0.950377 | 1.0 | 1.0 |
| 1 | How early European explorers and Native Americ... | [=== Dutch rule ===\n\nA permanent European pr... | [History ==\n\n\n=== Early history ===\nIn the... | Early European explorers established a permane... | Early European explorers and Native Americans ... | 1.0 | 0.8963 | 1.0 | 0.8 |
| 2 | New York City population economy challenges | [=== Wealth and income disparity ===\nNew York... | [New York City, the most populous city in the ... | New York City has faced challenges related to ... | New York City, as the most populous city in th... | 1.0 | 0.915717 | 1.0 | 0.0 |
| 3 | How do the economic aspects of New York City, ... | [=== Wealth and income disparity ===\nNew York... | [New York City, the most populous city in the ... | The economic aspects of New York City, as a gl... | New York City's economic aspects as a global c... | 0.913043 | 0.929317 | 1.0 | 0.0 |
| 4 | What are some of the cultural and architectura... | [==== Staten Island ====\nStaten Island (Richm... | [Geography ==\n\nDuring the Wisconsin glaciati... | Brooklyn is known for its cultural diversity, ... | Brooklyn is distinct within New York City due ... | 1.0 | 0.902664 | 0.5 | 1.0 |
| 5 | What measures has New York City implemented to... | [==== International events ====\nIn terms of h... | [Environment ==\n\n \nEnvironmental issues in ... | New York City has implemented various measures... | New York City has implemented several measures... | 0.909091 | 1.0 | 1.0 | 1.0 |
| 6 | What role did New York City play during the Am... | [=== Province of New York and slavery ===\n\nI... | [History ==\n\n\n=== Early history ===\nIn the... | New York City served as a significant military... | During the American Revolution, New York City ... | 1.0 | 1.0 | 1.0 | 1.0 |
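
As an example of such analysis, the per-sample scores in the table above can be averaged and scanned for weak rows. The sketch below copies the metric columns by hand into plain lists so it runs without the notebook state. The averages it computes match the aggregate scores printed earlier, which suggests those are simple means over the samples. The 0.5 threshold is an arbitrary, hypothetical cut-off.

```python
from statistics import mean

# Metric columns copied from the results table above (rows 0 to 6).
per_sample = {
    "faithfulness": [1.0, 1.0, 1.0, 0.913043, 1.0, 0.909091, 1.0],
    "answer_relevancy": [0.950377, 0.8963, 0.915717, 0.929317, 0.902664, 1.0, 1.0],
    "context_precision": [1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0],
    "context_recall": [1.0, 0.8, 0.0, 0.0, 1.0, 1.0, 1.0],
}

# Average each metric over the seven samples, rounded like the printed result.
averages = {name: round(mean(values), 4) for name, values in per_sample.items()}
print(averages)

# Row indices whose context recall falls below the (hypothetical) threshold.
threshold = 0.5
weak_rows = [i for i, value in enumerate(per_sample["context_recall"]) if value < threshold]
print(weak_rows)  # [2, 3]
```

Here rows 2 and 3 account for the low `context_recall` average, so their `retrieved_contexts` and `reference_contexts` are a reasonable place to start inspecting.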
