# Humanloop RAG Evaluation Walkthrough
The goal of this notebook is to demonstrate how to take an existing RAG pipeline and integrate Humanloop in order to:
1. Manage your [Prompts](https://humanloop.com/docs/v5/concepts/prompts) in code but be able to version and run them on Humanloop
2. Set up logging for both your retriever [Tool](https://humanloop.com/docs/v5/concepts/prompts) managed in code and your Prompt managed on Humanloop
3. Create a [Dataset](https://humanloop.com/docs/v5/concepts/prompts) and run Evaluations to benchmark the performance of your RAG pipeline
4. Configure [Evaluators](https://humanloop.com/docs/v5/concepts/evaluators) for monitoring your RAG pipeline in production


## What is Humanloop?
Humanloop is an interactive development environment designed to streamline the entire lifecycle of LLM app development. It serves as a central hub where AI, Product, and Engineering teams can collaborate on Prompt management, Evaluation and Monitoring workflows. 


## What is RAG?
RAG stands for Retrieval Augmented Generation.
- **Retrieval** - Getting the relevant information from a larger data source for a given query.
- **Augmented** - Using the retrieved information as input to an LLM.
- **Generation** - Generating an output from the model given the input.

In practice, it remains an effective way to use LLMs for tasks such as question answering and summarization, where the data source is too large to fit in the context window of the LLM, or where providing the full data source for each query is not cost-effective.
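
The three steps can be sketched in a few lines of plain Python. This is a toy illustration rather than the pipeline used later in this notebook: retrieval here is simple word overlap instead of embeddings, and `retrieve`, `augment`, `rag`, and `echo_model` are hypothetical names introduced only for this example.

```python
from typing import Callable


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, context: list[str]) -> list[dict]:
    """Augmented: place the retrieved text in the model input."""
    joined = "\n".join(context)
    return [
        {"role": "system", "content": f"Answer using this context:\n{joined}"},
        {"role": "user", "content": query},
    ]


def rag(query: str, documents: list[str], generate: Callable[[list[dict]], str]) -> str:
    """Generation: pass the augmented input to any model callable."""
    return generate(augment(query, retrieve(query, documents)))


docs = ["Aspirin is used to reduce fever.", "Insulin regulates blood glucose."]


def echo_model(messages: list[dict]) -> str:
    """Stand-in for an LLM call: returns the system message it was given."""
    return messages[0]["content"]


print(rag("what does insulin regulate", docs, echo_model))
```

Swapping `echo_model` for a real chat completion call and `retrieve` for a vector search gives the shape of the reference pipeline built below.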


## What are the major challenges with RAG?
Implementing RAG and other similar flows complicates the process of [Prompt Engineering](https://humanloop.com/blog/prompt-engineering-101) because you expand the design space of your application. There are lots of choices you need to make around the retrieval component that can significantly impact the performance of your overall application. For example,
- How do you select the data source?
- How should it be chunked up and indexed?
- What embedding and retrieval model should you use?
- How should you combine the retrieved information with the query?
- What should your system Prompt be? 

The process of versioning, evaluating and monitoring your pipeline therefore needs to consider both the retrieval and generation components. This is where Humanloop can help.
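
To make one of these choices concrete, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_words` and the default `size` and `overlap` values are assumptions for illustration; the right values depend on your data and embedding model, which is exactly the kind of decision that needs evaluating.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once this chunk has reached the end of the text
        if start + size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
```

Each chunk would then be embedded and indexed as its own document, so changing `size` or `overlap` changes what the retriever can return.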


# Example RAG Pipeline

We first need a reference RAG implementation. Our use case will be Q&A over medical docs - leveraging the [MedQA dataset](https://huggingface.co/datasets/bigbio/med_qa) from Hugging Face.

We're going to use [Chroma](https://docs.trychroma.com/getting-started) as a simple local vector DB. You can replace this with your favorite retrieval system.




## Pre-requisites 

In [1]:
!pip install --quiet chromadb
!pip install --quiet openai
!pip install --quiet python-dotenv
!pip install --quiet humanloop==0.8.0b6


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [3]:
# Set up dependencies for reference implementation
import os

import chromadb
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables (including OPENAI_KEY) from a local .env file
load_dotenv()

# init clients
chroma = chromadb.Client()
openai = OpenAI(api_key=os.getenv("OPENAI_KEY"))

# init collection into which we will add documents
# (get_or_create_collection avoids a UniqueConstraintError when this cell is re-run)
collection = chroma.get_or_create_collection(name="MedQA")

# load dataset
# TODO: LOAD MEDQA DATASET HERE

# Add to Chroma - will by default use local vector DB and model all-MiniLM-L6-v2
# (upsert rather than add, so re-running this cell does not duplicate ids)
collection.upsert(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ],
    ids=["id1", "id2"]
)

In [6]:
# Reference RAG pipeline using Chroma and OpenAI

def ask_question(question: str) -> str:
    """Ask a question and get an answer using a simple RAG pipeline"""

    # Retrieve the document most relevant to the question
    response = collection.query(query_texts=[question], n_results=1)
    retrieved_doc = response["documents"][0][0]
    

    # Generate answer
    messages = [
            {"role": "system", "content": f"You are a helpful assistant. Here is some context: {retrieved_doc}."},
            {"role": "user", "content": question},
        ]
    print(messages)
    answer = openai.chat.completions.create(
        model="gpt-4o",
        messages=messages
    ).choices[0].message.content
    return answer

In [7]:
# Test the pipeline

print(ask_question("What is a pineapple?"))

[{'role': 'system', 'content': 'You are a helpful assistant. Here is some context: This is a document about pineapple.'}, {'role': 'user', 'content': 'What is a pineapple?'}]
A pineapple is a tropical fruit with a tough, spiky exterior and sweet, juicy, yellow flesh inside. It is known for its unique appearance, with a globe-like shape covered in rough, diamond-shaped scales and topped with a crown of stiff, green leaves. Pineapples are rich in vitamins, enzymes, and antioxidants, making them a popular fruit for their health benefits. They can be eaten fresh, cooked, juiced, or preserved and are used in a variety of dishes and beverages around the world. The scientific name for the pineapple is *Ananas comosus*, and it belongs to the bromeliad family. Pineapples are also noted for containing bromelain, an enzyme that can aid in digestion.
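
The reference pipeline above passes a single retrieved document to the model. One possible way to combine several retrieved documents with the query is sketched here; `build_messages` and the numbered-context format are illustrative choices, not part of the original pipeline.

```python
def build_messages(question: str, retrieved_docs: list[str]) -> list[dict]:
    """Format several retrieved documents as numbered context in the system message."""
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieved_docs, start=1)
    )
    system = (
        "You are a helpful assistant. "
        "Answer using the numbered context below.\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


example = build_messages(
    "What is a pineapple?",
    ["This is a document about pineapple", "This is a document about oranges"],
)
print(example[0]["content"])
```

With Chroma, the list for `retrieved_docs` would come from `response["documents"][0]` after querying with `n_results` greater than 1.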


# Humanloop Integration

The steps to the Humanloop integration are as follows:
....

We demonstrate how you can log to or call any of the core entities on Humanloop.

In [None]:
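# Manage your Prompt in code

# Assumption: the Humanloop client is created as below; HUMANLOOP_KEY is a
# placeholder environment variable name, so use whatever your .env file defines.
from humanloop import Humanloop

hl = Humanloop(api_key=os.getenv("HUMANLOOP_KEY"))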

def ask_question(question: str) -> str:
    """Ask a question and get an answer using a simple RAG pipeline"""
    # Retrieve relevant documents (query with the question itself, not a literal string)
    response = collection.query(query_texts=[question], n_results=1)
    retrieved_doc = response["documents"][0][0]
    
    # Generate answer using Prompt managed on Humanloop
    messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": retrieved_doc}
        ]
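    # Assumption: the SDK exposes this call as `prompts.call`, with the model
    # settings nested under `prompt`; check against your installed version.
    answer = hl.prompts.call(
        path="faq-bot/rag-prompt",
        prompt={"model": "gpt-3.5-turbo", "temperature": 0.5},
        messages=messages,
    )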
    return answer

In [None]:
# Manage your Prompt on Humanloop

def ask_question(question: str) -> str:
    """Ask a question and get an answer using a simple RAG pipeline"""
    # Retrieve relevant documents (query with the question itself, not a literal string)
    response = collection.query(query_texts=[question], n_results=1)
    retrieved_doc = response["documents"][0][0]
    
    # Generate answer using Prompt managed on Humanloop
    messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": retrieved_doc}
        ]
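    # `hl` is assumed to be an initialized Humanloop client.
    # Assumption: the SDK exposes this call as `prompts.call`; with no `prompt`
    # argument, the version saved on Humanloop at this path is used.
    answer = hl.prompts.call(
        path="faq-bot/rag-prompt",
        messages=messages,
    )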
    return answer

In [None]:
# Log your tool calls too

def ask_question(question: str) -> str:
    """Ask a question and get an answer using a simple RAG pipeline"""
    # Retrieve relevant documents (query with the question itself, not a literal string)
    response = collection.query(query_texts=[question], n_results=1)
    retrieved_doc = response["documents"][0][0]
    
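    # Log the retriever Tool call to Humanloop
    # Assumption: `hl` is an initialized Humanloop client and the SDK exposes
    # `tools.log` with `inputs` and `output` arguments; check your installed version.
    hl.tools.log(
        path="faq-bot/rag-retriever",
        inputs={"query": question},
        output=retrieved_doc,
    )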
    
    # Generate answer using Prompt managed on Humanloop
    messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": retrieved_doc}
        ]
    # Assumption: the SDK exposes this call as `prompts.call`
    answer = hl.prompts.call(
        path="faq-bot/rag-prompt",
        messages=messages,
    )
    return answer

# Setting up Evaluations


## Creating a dataset
- From your existing logs
- Using the SDK
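
Either route ends with a list of datapoints. As a rough sketch of the first route, the snippet below turns logged question/answer pairs into datapoint dictionaries in plain Python. The `question` and `answer` log fields, the `inputs`/`target` datapoint shape, and the helper name `logs_to_datapoints` are all assumptions for illustration; uploading the result through the SDK is not shown.

```python
def logs_to_datapoints(logs: list[dict]) -> list[dict]:
    """Convert logged question/answer pairs into datapoints, skipping blanks and duplicates."""
    datapoints = []
    seen_questions = set()
    for log in logs:
        question = (log.get("question") or "").strip()
        answer = (log.get("answer") or "").strip()
        # Skip incomplete logs and questions we already have a datapoint for
        if not question or not answer or question in seen_questions:
            continue
        seen_questions.add(question)
        datapoints.append(
            {"inputs": {"question": question}, "target": {"output": answer}}
        )
    return datapoints


example_logs = [
    {"question": "What is a pineapple?", "answer": "A tropical fruit."},
    {"question": "What is a pineapple?", "answer": "A duplicate entry."},
    {"question": "What is an orange?", "answer": None},
]
print(logs_to_datapoints(example_logs))
```

Keeping only the first answer per question is a simplification; in practice you would likely review the logged answers before treating them as targets.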