
## Building applications and evaluating their performance

Now let's move on to our application. This part gives you an overview of everything you need to do to get a chat application working.

The folder chat_solution contains the app. Let's initialize it and try it out.

For our chat we will use the data available in the data folder.
Our first step is to load this data into the embedding database that the LLM will retrieve context from.
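
Before text can be searched it is usually split into overlapping chunks; the log output of the next cell reports chunks of size 700 with overlap 200. Below is a minimal sketch of that idea, assuming character-based windows. `chunk_text` is a hypothetical helper for illustration and not necessarily how `create_db` splits the data.

```python
def chunk_text(text, size=700, overlap=200):
    """Split text into windows of `size` characters.

    Each window shares `overlap` characters with the previous one.
    Assumption: character-based splitting, chosen only for illustration.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(text):
            break
    return chunks


sample = "".join(str(i % 10) for i in range(1200))
sample_chunks = chunk_text(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.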



In [1]:
from chat_solution.create_db import create_db

db = create_db()
print(db.retrieve("what is a llm?"))

Loading environment variables from /Users/jean.machado@getyourguide.com/prj/rag-workshop/.env


  from tqdm.autonotebook import tqdm, trange


Loading database from /tmp/embedding_db.pkl
Created 6 chunks of size 700 with overlap 200
Database saved to /tmp/embedding_db.pkl
Database saved successfully
['What are LLMs?\n\nLarge Language Models (LLMs) are trained on massive datasets of text to predict and generate language based on given prompts, learning patterns, structures, and relationships in text to produce human-like responses.\n\nHow do they work?\n\nWhat are the most known LLMs\nGpt-X series developed by Open-AI, they are proprietary and very powerful\nMistral Series: developed by Mistral AI, built by an eu company\nLLamma series: developed by Meta\n\nClosed source vs Open source LLMs\n\nClosed-source LLMs are proprietary, with code and models kept private, while open-source LLMs allow public access to the model architecture and often the training data, enabling more transparency and community-dr', 'What are LLMs?\n\nLarge Language Models (LLMs) are trained on massive datasets of text to predict and generate language bas
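
`db.retrieve` returns the chunks most relevant to the query. The sketch below illustrates similarity-based retrieval in general, with a toy bag-of-words vector standing in for a real embedding model. `embed`, `cosine`, and `retrieve` are hypothetical names and not the workshop's implementation.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    # Rank all chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "large language models generate text",
    "streamlit builds web apps",
    "hallucination means unsupported output",
]
top = retrieve("what are large language models", toy_chunks, k=1)
```

A real embedding model also matches paraphrases that share no words with the query, which word counts cannot do.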

## Creating our RAG script


In [3]:
# User input and response handling
from chat_solution.rag import LearningAssistant

query2 = "what is an hallucination?"
rag = LearningAssistant()  
response = rag.query(query2)

Loading database from /tmp/embedding_db.pkl
Quis assistant initialized
[]
 You are a helpful AI knowledge quiz chat assistant. Your goal is to test the knowledge of the users on the given topic."
The user gives you a topic or a question and you should generate a new relevant quiz based on the context.
- Generate a new relevant quiz question based on the context and the topic provided. The topic you select on the context does not need to be exactly the same, but should be related to the topic.
- Your primary audience are students learning about AI. Do not use technical jargon that is not common knowledge or that you dont explain first.
- The question should be relevant to the given topic and the answer should be found within the given context. If not say: "You did not equip me with the knowledge to answer this question."
- Provide 4 answer choices for the question, one of which should be correct and the other three should be incorrect but plausible. Answer choices should be formulated c
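
A RAG query typically combines a system prompt like the one printed above with the retrieved chunks and the user's input before calling the model. Here is a rough sketch of that assembly step, assuming a plain string template. `build_prompt` and its layout are illustrative and not taken from `LearningAssistant`.

```python
def build_prompt(system_prompt, context_chunks, question):
    # Join the retrieved chunks; make an empty retrieval explicit in the prompt.
    if context_chunks:
        context = "\n\n".join(context_chunks)
    else:
        context = "(no context retrieved)"
    return (
        f"{system_prompt}\n\n"
        f"Context:\n{context}\n\n"
        f"Topic or question: {question}"
    )


prompt = build_prompt("SYS", ["chunk a", "chunk b"], "what is an hallucination?")
```

Making the empty-context case explicit gives the model a clear signal to fall back to its "You did not equip me with the knowledge" answer.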


## Running our chat application

In [13]:
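import os

# Stop any Streamlit process that is still running from a previous run
os.system("pkill -f streamlit")
# Start the chat app in the background
os.system("streamlit run ../chat_solution/start_streamlit.py &")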

0


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8503
  Network URL: http://192.168.178.64:8503

Loading environment variables from /Users/jean.machado@getyourguide.com/prj/rag-workshop/.env
Loading database from /tmp/embedding_db.pkl
Quis assistant initialized


2024-11-22 15:40:03.762 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_


[]
 You are a helpful AI knowledge quiz chat assistant. Your goal is to test the knowledge of the users on the given topic."
The user gives you a topic or a question and you should generate a new relevant quiz based on the context.
- Generate a new relevant quiz question based on the context and the topic provided. The topic you select on the context does not need to be exactly the same, but should be related to the topic.
- Your primary audience are students learning about AI. Do not use technical jargon that is not common knowledge or that you dont explain first.
- The question should be relevant to the given topic and the answer should be found within the given context. If not say: "You did not equip me with the knowledge to answer this question."
- Provide 4 answer choices for the question, one of which should be correct and the other three should be incorrect but plausible. Answer choices should be formulated clearly and concisely.
- Mark the index of the correct answer in the ans
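
`os.system` with a trailing `&` relies on a POSIX shell. As an alternative sketch, `subprocess.Popen` can start the app in the background and return a handle for stopping it later. `streamlit_command` and `start_app` are hypothetical helpers, and the port value is only an example.

```python
import subprocess
import sys


def streamlit_command(script_path, port=None):
    # Build the command as a list so no shell is needed.
    cmd = [sys.executable, "-m", "streamlit", "run", script_path]
    if port is not None:
        cmd += ["--server.port", str(port)]
    return cmd


def start_app(script_path, port=None):
    # Popen returns immediately; the app keeps running in the background.
    return subprocess.Popen(streamlit_command(script_path, port))


# Example usage (assumes Streamlit is installed and the path exists):
# app = start_app("../chat_solution/start_streamlit.py", port=8503)
# app.terminate()  # stop the app when done
```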


## Evaluation

In [7]:
from ragas import EvaluationDataset

data = [
    {'user_input': 'what is a llm?',
     'reference': 'A large language model is a type of artificial intelligence model that is designed to understand and generate human language. These models are trained on vast amounts of text data and are capable of generating coherent and contextually appropriate responses to a wide range of questions.',
     },
     {'user_input': 'role models in the area of artificial intelligence?',
      'reference': 'Who are some role models in the area of artificial intelligence?',
     },
     {'user_input': "hallucination",
      'reference': "An hallucination is a false perception or sensation that occurs in the absence of a corresponding external stimulus. It is often associated with mental health conditions such as schizophrenia or hallucinogenic substances.",
      }
]

# Augment each record with the response from our RAG assistant
for d in data:
    d['response'] = rag.query(d['user_input'])


dataset = EvaluationDataset.from_list(data)

[('what is an hallucination?', 'Question: What is a hallucination in the context of AI?\n1. A hallucination is a type of error that occurs when an AI model provides incorrect or irrelevant responses.\n2. Hallucinations occur when an AI model generates information that is not supported by its training data. (CORRECT)\n3. Hallucinations are caused by overfitting, where the AI model becomes too specialized to the training data.\n4. Hallucinations happen only in image-generating AI models, not in language models.'), ('what is a llm?', 'Question: What is a Large Language Model (LLM)?\n1. An LLM is a type of artificial intelligence that can understand and generate human language. (CORRECT)\n2. LLMs are primarily used for image processing and analysis.\n3. LLMs are designed to predict and generate mathematical equations.\n4. LLMs are used exclusively in robotics for motion control.'), ('how are some role models people the area of artificial intelligence?', 'Question: Who is a notable role mod

In [9]:
from ragas.metrics import FactualCorrectness
from ragas import evaluate

factual_correctness = FactualCorrectness()
eval_results = evaluate(
    dataset=dataset,
    metrics=[factual_correctness],
    raise_exceptions=False,
)



Evaluating: 100%|██████████| 3/3 [00:16<00:00,  5.60s/it]




In [20]:
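import pandas as pd

# Widen the console display so long text columns are easier to read
pd.set_option("display.width", 1000)
evaluation_result_df = eval_results.to_pandas()
evaluation_result_df.iloc[:5]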


|   | user_input | response | reference | factual_correctness |
|---|---|---|---|---|
| 0 | what is a llm? | Question: What is a Large Language Model (LLM)... | A large language model is a type of artificial... | 0.8 |
| 1 | role models in the area of artificial intellig... | Question: Who is a notable role model in the f... | Who are some role models in the area of artifi... | 0.0 |
| 2 | what is an hallucination? | Question: What is a hallucination in the conte... | An hallucination is a false perception or sens... | 0.0 |
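
To help read the factual_correctness column: as we understand the Ragas documentation, the metric breaks the response and the reference into claims and, in its default mode, reports an F1 score over the claims that match or are missing. The sketch below shows only that arithmetic with hypothetical claim counts; it is not the Ragas implementation.

```python
def claim_f1(tp, fp, fn):
    """F1 over claims.

    tp: response claims supported by the reference
    fp: response claims not supported by the reference
    fn: reference claims missing from the response
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 4 supported claims, 1 unsupported, 1 missing.
score = round(claim_f1(tp=4, fp=1, fn=1), 2)
no_overlap = claim_f1(tp=0, fp=3, fn=2)
```

A score of 0.0 means no response claim matched the reference. In the table above this likely explains the hallucination row: the reference describes the medical sense of the word, while the response talks about AI models.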
