# LLM RAG Example

A retrieval-augmented generation (RAG) example using GPT-3.5 as the LLM and Pinecone as the vector database.

First, we instantiate a connection to the OpenAI GPT model.

In [1]:
import os
from langchain.chat_models import ChatOpenAI

# Read the API key from the environment rather than hard-coding it in the notebook.
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)

Next, we use the model to answer a basic question.

In [2]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I am great. Thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand thermodynamics theory.")
]

In [3]:
res = chat(messages)

In [4]:
res

AIMessage(content="Thermodynamics is the branch of physics that deals with the relationships between heat, work, and energy. It studies the behavior of systems in response to changes in temperature, pressure, and volume. The laws of thermodynamics govern these relationships and provide a framework for understanding how energy is transferred and transformed within a system.\n\nThere are four laws of thermodynamics, but the first and second laws are the most fundamental:\n\n1. The first law of thermodynamics states that energy cannot be created or destroyed, only transferred or converted from one form to another. This law is often summarized as the conservation of energy.\n\n2. The second law of thermodynamics states that the total entropy of an isolated system can never decrease over time. Entropy is a measure of the disorder or randomness in a system, and this law implies that natural processes tend to increase the overall disorder of a system.\n\nThermodynamics is a broad and complex 

In [5]:
print(res.content)

Thermodynamics is the branch of physics that deals with the relationships between heat, work, and energy. It studies the behavior of systems in response to changes in temperature, pressure, and volume. The laws of thermodynamics govern these relationships and provide a framework for understanding how energy is transferred and transformed within a system.

There are four laws of thermodynamics, but the first and second laws are the most fundamental:

1. The first law of thermodynamics states that energy cannot be created or destroyed, only transferred or converted from one form to another. This law is often summarized as the conservation of energy.

2. The second law of thermodynamics states that the total entropy of an isolated system can never decrease over time. Entropy is a measure of the disorder or randomness in a system, and this law implies that natural processes tend to increase the overall disorder of a system.

Thermodynamics is a broad and complex field with applications in 

In [6]:
# Add previous query's answer as the context to the conversation
messages.append(res)

prompt = HumanMessage(
    content='Please tell me any specific aspect of thermodynamics that exist'
)

messages.append(prompt)
res = chat(messages)
print(res.content)

One specific aspect of thermodynamics is the concept of heat transfer. Heat transfer is the process by which thermal energy is exchanged between different systems or objects. There are three main mechanisms of heat transfer:

1. Conduction: Heat transfer through a material without any movement of the material itself. This process occurs when two objects at different temperatures are in direct contact with each other.

2. Convection: Heat transfer through the movement of fluids (liquids or gases). Convection can be natural (due to density differences causing fluid movement) or forced (induced by an external force like a fan).

3. Radiation: Heat transfer through electromagnetic waves, such as infrared radiation. Radiation does not require a medium for heat transfer and can occur in a vacuum.

Understanding heat transfer is crucial in designing efficient heating and cooling systems, analyzing thermal performance in engineering applications, and studying the behavior of materials under di
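
The append-then-call pattern from cell 6 can be wrapped in a small helper so that every turn is recorded in the history. The following is a minimal sketch, not part of the original notebook: `ask` and `echo_model` are hypothetical names, messages are plain `(role, text)` tuples rather than LangChain message objects, and `echo_model` stands in for the real `chat` call so the snippet runs without an API key.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text); a simplified stand-in for LangChain messages


def ask(model: Callable[[List[Message]], str], history: List[Message], question: str) -> str:
    """Append the question, call the model with the full history, and record the answer."""
    history.append(("human", question))
    answer = model(history)
    history.append(("ai", answer))
    return answer


def echo_model(history: List[Message]) -> str:
    # Hypothetical stand-in for a chat model: reports how many messages it received.
    return f"received {len(history)} messages"


history: List[Message] = [("system", "You are a helpful assistant.")]
first = ask(echo_model, history, "I'd like to understand thermodynamics.")
second = ask(echo_model, history, "Tell me about one specific aspect.")
print(first)
print(second)
```

Because the model receives the whole `history` on every call, the second question is answered with the first exchange as context, which is what `messages.append(res)` achieves in the notebook.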

## RAG

Now that we have successfully queried the OpenAI GPT model, we will add retrieval-augmented generation (RAG) on top of it.
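
Conceptually, RAG has two steps: retrieve the passages most similar to the question, then place them in the prompt so the model answers from them. The sketch below illustrates that flow with the standard library only. It is an illustration under simplifying assumptions, not the notebook's implementation: word counts stand in for the sentence-transformer embeddings, an in-memory list stands in for the Pinecone index, and the names `embed`, `cosine`, `retrieve`, and `DOCS` are hypothetical.

```python
import math
from collections import Counter
from typing import List

# Hypothetical toy corpus standing in for the documents stored in the vector DB.
DOCS = [
    "Messi is known for his dribbling and ball control.",
    "Thermodynamics studies heat, work, and energy.",
    "Pinecone is a managed vector database.",
]


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts with surrounding punctuation stripped.
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, top_k: int = 1) -> List[str]:
    # Rank every document by similarity to the question and keep the best top_k.
    query_vector = embed(question)
    ranked = sorted(DOCS, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True)
    return ranked[:top_k]


question = "How good is Messi at dribbling?"
context = "\n".join(retrieve(question))
prompt = f"CONTEXT:\n{context}\n\nQUESTION:\n{question}"
print(prompt)
```

The rest of the notebook follows the same shape, with a learned embedding model and a hosted index in place of the toy pieces.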

In [7]:
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

MODEL = SentenceTransformer('all-mpnet-base-v2')

In [8]:
from typing import List

def transform_query_to_vector(text: str) -> List[float]:
    # encode() returns a NumPy array; convert it to a plain list of floats.
    return MODEL.encode(text).tolist()

Initialize the Pinecone client.

In [9]:
# Read the API key from the environment rather than hard-coding it in the notebook.
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
INDEX_NAME = "learn-rag"

pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(INDEX_NAME)

Next, we will test a query against the Pinecone database.

In [20]:
import numpy as np

def search_query(query: str) -> dict:
    query_vector = transform_query_to_vector(query)
    query_vector = np.array(query_vector).tolist()

    response = index.query(
        vector=query_vector,
        top_k=4,
        # include_values=True,
        include_metadata=True
    )

    return response.to_dict()

search_query("Messi dribbling ability")

{'matches': [{'id': 'a3c56cc259855f053c4afb01d005f5b2-318',
   'score': 0.753761351,
   'values': [],
   'metadata': {'content': 'Messi\'s pace and technical ability enable him to undertake individual dribbling runs towards goal, in particular during counterattacks, usually starting from the halfway line or the right side of the pitch.[544][552][556] Widely considered to be the best dribbler in the world,[557] and one of the greatest dribblers of all time,[558] with regard to this ability, his former Argentina manager Diego Maradona has said of him, "The ball stays glued to his foot; I\'ve seen great players in my career, but I\'ve never seen anyone with Messi\'s ball control."[547] Beyond his individual qualities, he is also a well- rounded, hard-working team player, known for his creative combinations, in and Andres particular with Iniesta.[539][540]former Barcelona midfielders Xavi\n|    | 0                                                                                             
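
Each match in the response carries a similarity `score`, so weak matches can be dropped before they reach the prompt. Below is a minimal sketch, assuming the response shape shown above (a `matches` list whose items have `score` and `metadata`). The helper name `filter_matches`, the `0.5` threshold, and the sample response are illustrative assumptions, not values from the notebook.

```python
from typing import Any, Dict, List


def filter_matches(response: Dict[str, Any], min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Keep only the matches whose similarity score is at least min_score."""
    return [m for m in response.get("matches", []) if m.get("score", 0.0) >= min_score]


# Hand-written sample that mimics the structure returned by search_query().
sample_response = {
    "matches": [
        {"id": "doc-1", "score": 0.75, "values": [], "metadata": {"content": "relevant text"}},
        {"id": "doc-2", "score": 0.20, "values": [], "metadata": {"content": "unrelated text"}},
    ]
}

kept = filter_matches(sample_response)
for match in kept:
    print(match["id"], match["score"], match["metadata"]["content"])
```

A suitable threshold depends on the embedding model and the index's similarity metric, so it is best chosen by inspecting scores for known good and bad queries.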

We successfully queried the Pinecone vector DB with a text query. Now let's do the RAG part.

First, we will create a prompt template for the LLM.

In [11]:
PROMPT_TEMPLATE = """
<|system|>
</s>
<|user|>
You are a chatbot that helps answering user questions about 'Footballer'.
You must refer to the user as 'You'.

Below are the context that can be used to answer the questions.

CONTEXT:
{context}

INSTRUCTIONS:
Your task is to answer the Human question based on the context in English language.
The context is the only source of truth.

You can only use information, Title, and Source that are explicitly present in the context.
You MUST NOT create or use information, Title, and Source that are not explicitly present in the context.

First, determine whether there are relevant information that are explicitly stated in the context:

> Scenario 1
If you find context that are relevant:
- Answer the question only using information explicitly stated in the context.
- Do not derive anything that is not explicitly stated in the context.
- Answer in a valid markdown format. This means you must add double spaces after every new line tokens.

> Scenario 2
If you don't find any relevant context:
- State politely that you can not find the answer.
- Recommend the user to ask about `Footballer` instead.

QUESTION:
{question}

Do not use, add, or assume information that is not explicitly stated in the CONTEXT.
Do not need to give additional information other than what is asked.
I'd prefer to not get an answer than to get information that is not explicitly in the context.

ANSWER:</s>
<|assistant|>
"""

In [21]:
def build_prompt(question: str) -> str:
    # Retrieve the top matches and format each one as a context block.
    query_response = search_query(query=question)
    context_string = "Title: {title}\nSource:{source}\nContent:{content}"
    contexts = [
        context_string.format(
            title=elem['metadata']['title'],
            source=elem['metadata']['source'],
            content=elem['metadata']['content'],
        )
        for elem in query_response['matches']
    ]
    return PROMPT_TEMPLATE.format(context="\n\n".join(contexts), question=question)

In [22]:
build_prompt("messi is a super dribbler")

'\n<|system|>\n</s>\n<|user|>\nYou are a chatbot that helps answering user questions about \'Footballer\'.\nYou must refer to the user as \'You\'.\n\nBelow are the context that can be used to answer the questions.\n\nCONTEXT:\nTitle: Lionel Messi\nSource:Lionel Messi - Wikipedia.pdf\nContent:Messi\'s pace and technical ability enable him to undertake individual dribbling runs towards goal, in particular during counterattacks, usually starting from the halfway line or the right side of the pitch.[544][552][556] Widely considered to be the best dribbler in the world,[557] and one of the greatest dribblers of all time,[558] with regard to this ability, his former Argentina manager Diego Maradona has said of him, "The ball stays glued to his foot; I\'ve seen great players in my career, but I\'ve never seen anyone with Messi\'s ball control."[547] Beyond his individual qualities, he is also a well- rounded, hard-working team player, known for his creative combinations, in and Andres particu
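
`build_prompt` indexes `title`, `source`, and `content` directly, so a record missing one of those keys would raise a `KeyError`. One possible hedge is to fall back to a placeholder. The sketch below assumes the same metadata keys as the notebook, while `format_match`, the `"unknown"` placeholder, and the sample records are hypothetical choices.

```python
from typing import Any, Dict

CONTEXT_STRING = "Title: {title}\nSource:{source}\nContent:{content}"


def format_match(match: Dict[str, Any], placeholder: str = "unknown") -> str:
    """Render one match as a context block, tolerating missing metadata keys."""
    metadata = match.get("metadata") or {}
    return CONTEXT_STRING.format(
        title=metadata.get("title", placeholder),
        source=metadata.get("source", placeholder),
        content=metadata.get("content", ""),
    )


# Hypothetical sample records; the file name is made up for illustration.
complete = {"metadata": {"title": "Lionel Messi", "source": "wiki.pdf", "content": "Some text."}}
partial = {"metadata": {"content": "Text without a title."}}

print(format_match(complete))
print(format_match(partial))
```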

Now, let's pass the prompt to the GPT model.

In [36]:
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")

In [43]:
question = "tell me about messi"
prompt = build_prompt(question)

for chunk in llm.stream(prompt):
    print(chunk, end="", flush=True)

Messi is a professional footballer from Argentina who plays for Paris Saint-Germain. He is known for his agility, balance, and ability to evade tackles and dribble at speed. He is also a prolific goalscorer and a creative playmaker, and has won numerous awards and titles throughout his career. He has also represented his country at both the youth and senior levels.
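
When streaming, it can be useful to keep the full answer as well as print it. Here is a small sketch: `collect_stream` is a hypothetical helper, and `fake_stream` stands in for `llm.stream(prompt)` so the example runs offline. It assumes each chunk is a plain string, as in the loop above.

```python
from typing import Iterable, Iterator


def collect_stream(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for llm.stream(prompt).
    yield from ["Messi ", "is ", "a ", "footballer."]


answer = collect_stream(fake_stream())
```

The returned string can then be appended to a conversation history or logged alongside the question.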