# Building a Trustworthy RAG Chatbot using 🔭 Galileo Observe

In this tutorial, we'll build a RAG-based chatbot and monitor its results in Galileo Observe.

This notebook pulls its source data from the web and uses OpenAI as the LLM provider. Feel free to swap in other data sources or models.

## 1. Setting Up the Environment

Let's start by installing the required libraries.

In [None]:
! pip install galileo-observe langchain langchain-community langchain_openai faiss-cpu openai ipywidgets beautifulsoup4

## 2. Setting Up the Galileo Client

Next, we set up the Galileo Observe client. You will need to provide three things:
 - GALILEO API KEY: the API key used to connect to Galileo. You can get it from the Galileo console.
 - OPENAI API KEY: this notebook uses OpenAI, so enter your OpenAI key here. If you are using a different model provider, you can skip this.
 - Project name: the name of the Galileo Observe project your runs are logged to.

The code below also sets `GALILEO_CONSOLE_URL`; point it at your own Galileo console.

In [None]:
import os
from galileo_observe import ObserveWorkflows

os.environ["GALILEO_CONSOLE_URL"] = "https://console.dev.rungalileo.io"  # Replace with your Galileo console URL
os.environ["GALILEO_API_KEY"] = "" # Enter Galileo key here
os.environ["OPENAI_API_KEY"] = "" # Enter Open AI Key here
observe_logger = ObserveWorkflows(project_name="observe-rag-chatbot")
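Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. As an alternative, here is a minimal sketch of a hypothetical helper, `ensure_env` (not part of the Galileo or OpenAI SDKs), that prompts for a key with Python's standard `getpass` module only when the environment variable is unset or empty, so the secret never has to be typed into a code cell.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], asking for it via `prompt` (getpass by default) if unset or empty."""
    value = os.environ.get(name)
    if not value:
        # getpass does not echo what the user types
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value
    return value
```

For example, calling `ensure_env("GALILEO_API_KEY")` and `ensure_env("OPENAI_API_KEY")` in place of the two empty-string assignments, before creating `ObserveWorkflows`, would prompt for each key once per session.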

## 3. Loading and Preparing Data

For this tutorial, we use a fictitious use case: a chatbot that answers questions about Galileo. A common technique for building such a chatbot is Retrieval-Augmented Generation (RAG).

To build the chatbot, we will first fetch some documents from Galileo's website, then write a few questions, ask the chatbot those questions, and use Galileo Observe to check how well its responses are grounded in the documents.

Let's start by downloading a few pages from Galileo's documentation site.

In [None]:
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS

# Load data from a website URL
from langchain_community.document_loaders import WebBaseLoader

urls = [
    "https://docs.rungalileo.io/galileo",
    "https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-guardrail-metrics/context-adherence",
    "https://docs.rungalileo.io/galileo/gen-ai-studio-products/galileo-guardrail-metrics/context-relevance"
]
loader = WebBaseLoader(urls)
documents = loader.load()

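Now that the documents are downloaded, we split them into smaller text chunks using LangChain's `CharacterTextSplitter`. It splits the text on a separator (blank lines by default) and merges the pieces back into chunks of up to `chunk_size` characters (a single piece longer than that is kept whole). A non-zero `chunk_overlap` repeats some characters between neighboring chunks so that context is not lost at chunk boundaries; here we use 1,000-character chunks with no overlap. Note that `chunk_size` is measured in characters, not tokens: choose a size such that the retrieved chunks fit comfortably in your LLM's context window, and feel free to experiment with different values.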

In [None]:
# Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
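To make `chunk_size` and `chunk_overlap` concrete, here is a simplified, hypothetical fixed-window chunker. It is not how `CharacterTextSplitter` works internally (that class first splits on a separator and then merges the pieces); it only illustrates how an overlap makes consecutive chunks share characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters. This is a simplified
    illustration, not LangChain's CharacterTextSplitter algorithm.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive windows
    # Stop once the remaining text is already covered by the previous window's overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=0))  # ['abcd', 'efgh', 'ij']
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

With an overlap of 1, each chunk starts with the last character of the previous one, so a word cut at a boundary still appears intact in at least one chunk when it is shorter than the overlap.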

Let's look at the size of our data:

In [None]:
# Print summary statistics for the loaded pages and the resulting chunks
def avg_doc_length(docs):
    """Average number of characters per document (rounded down)."""
    return sum(len(doc.page_content) for doc in docs) // len(docs)

avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(texts)
print(f'Average length among {len(documents)} pages loaded is {avg_char_count_pre} characters.')
print(f'After the split you have {len(texts)} chunks.')
print(f'Average length among {len(texts)} chunks is {avg_char_count_post} characters.')

Next, we convert our chunks into embeddings and store them in a vector database. This is a standard RAG technique: instead of passing all the documents to the LLM as context, we retrieve only the chunks most relevant to a given question and pass those. Relevance is determined by a semantic similarity search in the vector DB between the question's embedding and the chunk embeddings. Passing concise, relevant context to the LLM tends to improve answer accuracy and also reduces token usage.

In [None]:
# Initialize OpenAI embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

# Create a vector store
vectorstore = FAISS.from_documents(texts, embeddings)
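Conceptually, the similarity search compares the question's embedding against every chunk embedding and keeps the closest `k`. The pure-Python sketch below shows that idea with cosine similarity on tiny hand-made 2-D vectors (illustrative values only; real OpenAI embeddings have far more dimensions). It is a brute-force stand-in for what FAISS does much faster. Note that LangChain's `FAISS` wrapper ranks by Euclidean (L2) distance by default, which yields the same ordering as cosine similarity when the vectors have unit length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k chunk vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" for three chunks (illustrative values only)
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], chunk_vecs, k=2))  # [0, 2]
```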

## 4. Running Inference with Galileo Observe

In [None]:
# Create a function to generate a response using OpenAI
from typing import Optional

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def generate_response(prompt: str, history: Optional[list] = None, model_name: str = "gpt-4o-mini"):
    """Send the conversation history plus `prompt` to the chat model; return the reply and token counts."""
    # Avoid a mutable default argument: build a fresh message list on every call
    messages = (history or []) + [{"role": "user", "content": prompt}]

    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        max_tokens=512,
        temperature=1,
        top_p=1,
    )

    response_text = response.choices[0].message.content
    input_tokens = response.usage.prompt_tokens
    output_tokens = response.usage.completion_tokens
    total_tokens = response.usage.total_tokens

    return response_text, input_tokens, output_tokens, total_tokens

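By default, the notebook runs on the predefined questions below. For a more interactive chatbot experience, set `USE_PREDEFINED_QUESTIONS` to `False` and ask your own questions out loud: the chat loop then listens on your microphone and transcribes your speech with the `SpeechRecognition` package (install it, together with `PyAudio` for microphone access, before switching modes). Say "exit" to end the chat.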

In [None]:
USE_PREDEFINED_QUESTIONS = True

questions = [
    "What does Galileo do?",
    "What are some of the RAG Metrics Galileo provides?",
    "What is LUNA and where is it used?",
    "How does LUNA calculate context adherence?",
    "What is chainpoll?",
]


Here we define the model to use and the system prompt for the LLM.

In [None]:
MODEL_ID = "gpt-4o-mini"
history = [
    {"role": "system", "content": "You are a helpful assistant."}
]
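Because `history` is sent with every request and grows by two messages per round, a long chat can eventually exceed the model's context window. One option, sketched below as a hypothetical helper that is not used elsewhere in this notebook, is to keep the system prompt plus only the most recent question/answer pairs.

```python
def trim_history(history: list[dict], max_turns: int) -> list[dict]:
    """Keep all system messages plus the last `max_turns` user/assistant pairs."""
    system_msgs = [m for m in history if m["role"] == "system"]
    dialogue = [m for m in history if m["role"] != "system"]
    # dialogue[-0:] would return everything, so handle max_turns <= 0 explicitly
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system_msgs + recent


example = [{"role": "system", "content": "You are a helpful assistant."}]
for n in range(3):
    example.append({"role": "user", "content": f"question {n}"})
    example.append({"role": "assistant", "content": f"answer {n}"})

print([m["content"] for m in trim_history(example, max_turns=1)])
# ['You are a helpful assistant.', 'question 2', 'answer 2']
```

Inside the chat loop, you could then pass `history=trim_history(history, max_turns=5)` to `generate_response` instead of the full history.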

Now let's run the inference and log each step to Galileo! If you want the chat to run for more rounds, increase `max_rounds` (when using the predefined questions, you will also need to add more questions to the list).

In [None]:
rounds = 0
max_rounds = 5

while rounds < max_rounds:
    if USE_PREDEFINED_QUESTIONS:
        if rounds >= len(questions):
            break  # No predefined questions left
        question = questions[rounds]
    else:
        import speech_recognition as sr
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            print("Listening... (say 'exit' to end)")
            audio = recognizer.listen(source)
            try:
                question = recognizer.recognize_google(audio)
                print("You said:", question)
            except sr.UnknownValueError:
                print("Could not understand audio")
                question = ""
            except sr.RequestError:
                print("Could not request results")
                question = ""

    if not question:
        continue  # Nothing was recognized; listen again
    if question.lower() == 'exit':
        break

    # Retrieve the 3 most relevant chunks from the vector store
    relevant_docs = vectorstore.similarity_search(question, k=3)
    context_list = [doc.page_content for doc in relevant_docs]
    context = "\n\n".join(context_list)
    prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer: "

    # Create a workflow for this question and log the retrieval step to Galileo
    wf = observe_logger.add_workflow(input={"question": question}, name="Chatbot", metadata={"env": "demo"})
    wf.add_retriever(
        input=question,
        documents=context_list,
        metadata={"env": "demo"},
        name="Chatbot",
    )

    # Generate the response, passing the conversation so far as history
    model_response, input_tokens, output_tokens, total_tokens = generate_response(prompt, model_name=MODEL_ID, history=history)

    # Store only the bare question (not the prompt with retrieved context) to keep the history compact,
    # followed by the assistant's answer
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": model_response})

    print("You: ", question)
    print(f"Assistant: {model_response}")
    print("*" * 100)

    # Log the LLM call step to Galileo
    wf.add_llm(
        input=prompt,
        output=model_response,
        model=MODEL_ID,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=total_tokens,
        metadata={"env": "demo"},
        name="Chatbot",
    )

    # Conclude the workflow
    wf.conclude(output={"output": model_response})
    rounds += 1

As a last step, we upload all the logged workflows to Galileo.

In [None]:
# Log the workflow to Galileo.
logged_workflows = observe_logger.upload_workflows()

You can then view the results for the `observe-rag-chatbot` project in the Galileo console.

## Conclusion

Throughout this notebook, we explored how to build and evaluate a chatbot for a QA-RAG application using GPT-4o mini via OpenAI, Python, and LangChain. We covered the essential steps: setting up the environment, loading and preparing context data, retrieving relevant context, generating answers, and logging each step to Galileo.