# Building and Evaluating a RAG Pipeline with LlamaCloud and our Batch Evaluator

This notebook shows how to build a RAG pipeline on top of a LlamaCloud index and then evaluate that pipeline with our batch evaluator (`BatchEvalRunner`).

## Setup

Here we define some basic imports.

In [1]:
# attach to the same event-loop
import nest_asyncio

nest_asyncio.apply()

In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Response
from llama_index.llms.openai import OpenAI
from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)
from llama_index.core.node_parser import SentenceSplitter
import pandas as pd

pd.set_option("display.max_colwidth", 0)

## Build RAG Pipeline from LlamaCloud Index

The LlamaCloud index is built over the 2021 Lyft and Uber 10K documents.

To create the index:
1. Download the documents ([Uber 10K](https://www.dropbox.com/s/te0a2w227v27iag/uber_2021.pdf?dl=1), [Lyft 10K](https://www.dropbox.com/s/qctkz6nxhm0y5qe/lyft_2021.pdf?dl=1)).
2. Sign up for an account at `https://cloud.llamaindex.ai/`, then create a pipeline by uploading these documents.

In [None]:
! pip install llama-index-indices-managed-llama-cloud

In [None]:
import os

os.environ["LLAMA_CLOUD_API_KEY"] = "llx-"  # placeholder: replace with your LlamaCloud API key
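
The cell above leaves `"llx-"` as a placeholder, so it is easy to run the rest of the notebook with an unset key. A minimal sketch of a guard is shown below; `read_llama_cloud_key` is a hypothetical helper (not part of any LlamaIndex package), and the only thing it assumes is that the bare placeholder string `"llx-"` is never a valid key.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "llx-"  # the unfilled value used in the cell above


def read_llama_cloud_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the LlamaCloud API key from `env`, or raise if it is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get("LLAMA_CLOUD_API_KEY", "").strip()
    # Reject both an unset variable and the untouched placeholder.
    if not key or key == PLACEHOLDER:
        raise ValueError(
            "LLAMA_CLOUD_API_KEY is not set; replace the 'llx-' placeholder "
            "with a real key before building the index."
        )
    return key
```

Calling `read_llama_cloud_key()` before constructing the index would surface a missing key as a clear error instead of a failed request later on.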

In [3]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
    name="<index_name>",
    project_name="<project_name>",
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
)

In [5]:
query = "Tell me about the risk factors for Uber"

In [6]:
response = index.as_query_engine().query(query)

In [7]:
print(str(response))

Airbyte is a platform that allows users to sync their data from various sources into a destination database, such as Snowflake. It provides functionalities for data ingestion and transformation, enabling users to easily move and work with data from different platforms.


## Setup Batch Evaluator

Here we set up a batch evaluator, which runs evaluations over a batch of queries. We start by defining the metrics we want to measure over this dataset.

In [4]:
# GPT-4 serves as the judge LLM for all evaluators below
gpt4 = OpenAI(temperature=0, model="gpt-4")
gpt35 = OpenAI(model="gpt-3.5-turbo")  # not used in the rest of this notebook

faithfulness_gpt4 = FaithfulnessEvaluator(llm=gpt4)
relevancy_gpt4 = RelevancyEvaluator(llm=gpt4)
correctness_gpt4 = CorrectnessEvaluator(llm=gpt4)

In [None]:
print(response.source_nodes[2].get_content())
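
Indexing `source_nodes[2]` directly raises an `IndexError` when fewer than three nodes are retrieved. A minimal sketch of a safer preview follows; `preview_sources` and `FakeNode` are hypothetical names introduced here, and the helper relies only on each node exposing `get_content()`, as used in the cell above.

```python
from typing import Iterable, List


def preview_sources(source_nodes: Iterable, limit: int = 3, width: int = 200) -> List[str]:
    """Return a short, whitespace-normalized preview of up to `limit` nodes."""
    previews = []
    for node in list(source_nodes)[:limit]:
        # Collapse newlines and repeated spaces so each preview fits on one line.
        text = " ".join(node.get_content().split())
        previews.append(text[:width])
    return previews


class FakeNode:
    """Hypothetical stand-in for a retrieved node, used only for this demo."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_content(self) -> str:
        return self._text


for i, preview in enumerate(preview_sources([FakeNode("Risk  factors\ninclude ...")])):
    print(f"[{i}] {preview}")
```

In the notebook, `preview_sources(response.source_nodes)` would take the place of the stand-in list.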

In [9]:
from llama_index.core.evaluation import BatchEvalRunner

runner = BatchEvalRunner(
    {"faithfulness": faithfulness_gpt4, "relevancy": relevancy_gpt4},
    workers=8,
)

eval_results = await runner.aevaluate_queries(
    index.as_query_engine(), queries=[query]
)
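
The run above passes a single query, but `queries` accepts a list, so the same call can cover a small batch. As a minimal sketch, the list could be generated from the question already used in this notebook, applied to both companies the index covers; `build_queries` is a hypothetical helper.

```python
from typing import List, Sequence


def build_queries(template: str, companies: Sequence[str]) -> List[str]:
    """Fill `template` once per company to produce a batch of queries."""
    return [template.format(company=company) for company in companies]


queries = build_queries(
    "Tell me about the risk factors for {company}",
    ["Uber", "Lyft"],
)
print(queries)
```

The resulting list could then be passed as `queries=queries` to `aevaluate_queries`.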

In [None]:
eval_results
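
`eval_results` maps each metric name to a list of evaluation results, one per query. A minimal sketch for condensing that into a pass rate per metric is shown below. It assumes each result exposes a boolean `passing` attribute (possibly `None` when no verdict was produced); `pass_rates` and `FakeResult` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in exposing only the `passing` field of a result."""

    passing: Optional[bool]


def pass_rates(results: Dict[str, List[object]]) -> Dict[str, Optional[float]]:
    """Return the fraction of passing results per metric (None if none were judged)."""
    rates: Dict[str, Optional[float]] = {}
    for metric, items in results.items():
        # Skip results without a verdict so they do not count as failures.
        judged = [r for r in items if getattr(r, "passing", None) is not None]
        rates[metric] = (
            sum(bool(r.passing) for r in judged) / len(judged) if judged else None
        )
    return rates


demo_results = {
    "faithfulness": [FakeResult(True), FakeResult(False)],
    "relevancy": [FakeResult(True)],
}
print(pass_rates(demo_results))  # {'faithfulness': 0.5, 'relevancy': 1.0}
```

In the notebook, `pass_rates(eval_results)` would give one number per metric passed to the runner.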

## Upload Results

Once you have a set of eval results, you can upload them to LlamaCloud with `upload_eval_results`.

In [None]:
runner.upload_eval_results(
    project_name="default project",
    app_name="default app",
    results=eval_results
)