# Notebook: Why RAG

## About

In this notebook, we motivate Retrieval Augmented Generation (RAG).

We will see why we can't rely on a Large Language Model (LLM) directly.



## Imports

In [1]:
from dotenv import load_dotenv
import rich
import logging
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI
from huggingface_hub import InferenceClient
import os

In [2]:
#logging.basicConfig(level=logging.DEBUG)


In [3]:
# Prefer an "env" file in the current directory; otherwise fall back to the parent directory.
dotenv_path = "../env"
if os.path.isfile("env"):
    dotenv_path = "env"
load_dotenv(dotenv_path=dotenv_path)

True

In [4]:
llm = OpenAI(model="gpt-3.5-turbo",temperature=0)


## Problems

There are at least two problems with using a Large Language Model directly:

- Knowledge Cutoff
- Hallucination

### Knowledge Cutoff

In [5]:
llm = OpenAI(model="gpt-4o-mini",temperature=0)


In [6]:
def get_response(query: str):
    """Send a single user message to the LLM and return its chat response."""
    messages = [
        ChatMessage(role="user", content=query),
    ]

    resp = llm.chat(messages)

    return resp

In [7]:
query="what is different about Llama3.2 than Llama2 ?"


In [8]:
response = get_response(query)

In [9]:
rich.print(response.message.content)

In [10]:
rich.print(response)

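Note that the response reports the model as `model='gpt-4o-mini-2024-07-18'`, a snapshot dated 18 July 2024.

The model's training data therefore cannot include anything published after that date, so it has no knowledge of later releases such as Llama 3.2.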
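The date comparison behind this argument can be sketched in a few lines. This is a minimal illustration, not part of the original notebook run: it assumes that the trailing `YYYY-MM-DD` in the resolved model name is the snapshot date, and it assumes a Llama 3.2 release date of 25 September 2024. The snapshot date is only an upper bound; the actual training cutoff may be earlier.

```python
import re
from datetime import date

# Resolved model name as reported in the response object.
resolved_model = "gpt-4o-mini-2024-07-18"

# Assumption: the trailing YYYY-MM-DD in the model name is the snapshot date.
match = re.search(r"(\d{4})-(\d{2})-(\d{2})$", resolved_model)
snapshot_date = date(*map(int, match.groups()))

# Assumption: Llama 3.2 was released on 2024-09-25.
llama_3_2_release = date(2024, 9, 25)

# True means the topic we asked about postdates the model snapshot.
released_after_snapshot = llama_3_2_release > snapshot_date
print(snapshot_date, released_after_snapshot)
```

If `released_after_snapshot` is true, the model cannot have seen the topic during training, whatever it answers.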

### Hallucination

![Snapshot of Hallucination](../images/llama_2__hallucination.png)


In the snapshot above, we ask about a nonexistent Llama model.

We also give it a link to a study-music YouTube video.

Note: we also use an "older" Llama model for inference.
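To see that the premise of the question is false, we can inspect the link itself. The sketch below parses the URL used in the prompt with the standard library only. It treats the `ab_channel` query parameter as a hint about the channel, which is an assumption: it is not authoritative metadata about the video.

```python
from urllib.parse import urlparse, parse_qs

# The link passed to the model in the prompt.
url = "https://www.youtube.com/watch?v=n61ULEU7CO0&ab_channel=LofiGirl"

# Split the query string into a dict of parameter name -> list of values.
params = parse_qs(urlparse(url).query)

video_id = params["v"][0]
# Assumption: ab_channel names the channel; treat it as a hint only.
channel_hint = params.get("ab_channel", [None])[0]

print(video_id, channel_hint)
```

The channel hint points to a music channel rather than a talk, so any confident summary of "what was said" in it is invented by the model.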

In [11]:
client = InferenceClient(api_key=os.environ['HF_API_KEY'])

In [12]:
message = """
What did Andrej Karpathy say about Meta's Llama 5.9 in the below youtube talk

https://www.youtube.com/watch?v=n61ULEU7CO0&ab_channel=LofiGirl

"""

In [13]:
messages = [
    { "role": "user", "content": message },
]

response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf", 
    messages=messages, 
    temperature=0,
)

In [14]:
rich.print(response)

In [15]:
rich.print(response.choices[0].message.content)

## Notes

We explored at least two problems with using LLMs directly:

- Knowledge Cutoff
- Hallucination

