# Understanding Memory in LLMs

In the previous notebook (03), we explored how OpenAI models can enhance the results from Azure Cognitive Search. [Bing Chat](http://chat.bing.com/) works in a similar way: it pairs a search engine with a GPT-4 model that uses the content of the search results as context to deliver accurate answers to queries.

However, we have not yet seen how to hold a conversation with the LLM. Bing Chat can do this: the LLM understands and can refer back to its previous responses.

There is a common misconception that GPT models have memory. They do not: although they possess knowledge learned during training, they do not retain any information from the previous questions asked to them.

The aim of this Notebook is to demonstrate how we can "provide memory" to the LLM by utilizing prompts and context.
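To make this concrete before touching Azure, here is a toy sketch. `toy_model` is a hypothetical stand-in for a chat completion call, not a real API: like the real service, its answer can depend only on the text it receives in that single call. The only way for it to "remember" an earlier answer is for that answer to appear inside the prompt.

```python
def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: its answer depends only on `prompt`."""
    if "rephrase" in prompt:
        # The only way to "remember" is to find an earlier AI turn in this very prompt
        earlier = [line[len("AI: "):] for line in prompt.splitlines() if line.startswith("AI: ")]
        if earlier:
            return "In other words: " + earlier[-1]
        return "I didn't say anything previously."
    return "RL is used in robotics, games and recommendations."


question = "Tell me some use cases for reinforcement learning?"
follow_up = "Can you rephrase what you just said?"
first_answer = toy_model(question)

# Second call with the follow-up alone: nothing from the first call is visible
without_history = toy_model(follow_up)

# Second call with the earlier exchange pasted into the prompt
history = f"Human: {question}\nAI: {first_answer}\n"
with_history = toy_model(history + f"Human: {follow_up}\nAI:")

print(without_history)  # I didn't say anything previously.
print(with_history)     # In other words: RL is used in robotics, games and recommendations.
```

The same pattern, with a real model in place of `toy_model`, is what the rest of this notebook builds.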

In [1]:
import os
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.docstore.document import Document

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from app.utils import (
    get_search_results,
    order_search_results,
    model_tokens_limit,
    num_tokens_from_docs,
    embed_docs,
    search_docs,
    get_answer,
)

from app.prompts import COMBINE_QUESTION_PROMPT, COMBINE_PROMPT, COMBINE_CHAT_PROMPT

# Don't mess with this unless you really know what you are doing
AZURE_SEARCH_API_VERSION = '2021-04-30-Preview'
AZURE_OPENAI_API_VERSION = "2023-03-15-preview"

# Change these below with your own services credentials
AZURE_SEARCH_ENDPOINT = "ENTER YOUR VALUE"
AZURE_SEARCH_KEY = "ENTER YOUR VALUE"  
AZURE_OPENAI_ENDPOINT= "ENTER YOUR VALUE"
AZURE_OPENAI_API_KEY = "ENTER YOUR VALUE"

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"] = AZURE_OPENAI_ENDPOINT
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"] = AZURE_OPENAI_API_KEY
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"] = AZURE_OPENAI_API_VERSION
os.environ["AZURE_SEARCH_KEY"] = AZURE_SEARCH_KEY
os.environ["AZURE_SEARCH_ENDPOINT"] = AZURE_SEARCH_ENDPOINT
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. Again, we will use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Can you rephrase what you just said?"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
# Create an Azure OpenAI chat model bound to the deployment named MODEL
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Robotics: Reinforcement learning can be used to train robots to perform specific tasks, such as grasping objects or navigating through an environment.

2. Gaming: Reinforcement learning has been used to train game-playing agents to learn optimal strategies and make decisions based on rewards.

3. Autonomous Vehicles: Reinforcement learning can be used to train autonomous vehicles to make decisions based on real-time sensor data and navigate through complex environments.

4. Personalized Recommendations: Reinforcement learning can be used to personalize recommendations for users based on their past behavior and preferences.

5. Healthcare: Reinforcement learning can be used to optimize treatment plans for patients based on their medical history and response to previous treatments.

6. Finance: Reinforcement learning can be used to optimize investment portfolios and make trading decisions based on market trends and historical data.

7. Advertising: Reinforcement learning can be used to optimize ad placement and targeting to maximize engagement and conversions.

8. Supply Chain Management: Reinforcement learning can be used to optimize supply chain operations, such as inventory management and logistics, to minimize costs and improve efficiency.

In [7]:
# Now let's ask a follow-up question
chain.run(FOLLOW_UP_QUESTION)

"As an AI language model, I didn't say anything previously. Could you please provide me with the context or the sentence you want me to rephrase?"

As you can see, it doesn't remember its previous response. This proves that the LLM does NOT have memory, and that we need to supply that memory ourselves by passing the conversation history as part of the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION})

'Sure, here are some examples of how reinforcement learning can be applied:\n\n1. Game playing: Agents can be trained to play games like chess, Go, and poker.\n2. Robotics: Robots can be trained to perform complex tasks like grasping objects, walking, and navigating through environments.\n3. Autonomous driving: Self-driving cars can make decisions based on real-time data from sensors and cameras.\n4. Recommender systems: Algorithms can suggest products, movies, or music based on user feedback.\n5. Healthcare: Personalized treatment plans can be developed for patients based on their medical history and current condition.\n6. Finance: Trading algorithms can make investment decisions based on market trends and historical data.\n7. Advertising: Ad placement and targeting can be optimized to maximize revenue for advertisers.\n8. Energy management: Energy consumption in buildings can be optimized to reduce costs.\n9. Manufacturing: Production processes can be optimized to reduce waste.\n10. 

**Bingo!** We now know how to create a chatbot with LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
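Keeping that state by hand can be sketched as a small class that stores every (question, answer) pair and re-renders them into the prompt on each call. This is a minimal sketch under stated assumptions: `ManualChat` and `fake_llm` are hypothetical names, `fake_llm` stands in for a real model so the example runs offline, and `TEMPLATE` mirrors the layout of `hist_prompt` with its indentation removed.

```python
from typing import Callable, List, Tuple

# Same layout as hist_prompt above, with the indentation removed
TEMPLATE = "{history}\nHuman: {question}\nAI:"


class ManualChat:
    """Minimal sketch: keep every (question, answer) turn and resend it on each call."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # any function that maps a prompt string to a completion
        self.turns: List[Tuple[str, str]] = []

    def history(self) -> str:
        # Render past turns in the same "Human: / AI:" format used in Conversation_history
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)

    def build_prompt(self, question: str) -> str:
        return TEMPLATE.format(history=self.history(), question=question)

    def ask(self, question: str) -> str:
        answer = self.llm(self.build_prompt(question))
        self.turns.append((question, answer))  # this list is the chatbot's "memory"
        return answer


def fake_llm(prompt: str) -> str:
    # Hypothetical model: reports how many human turns are visible in its prompt
    return f"I can see {prompt.count('Human:')} human turn(s)."


chat = ManualChat(fake_llm)
print(chat.ask("Tell me some use cases for reinforcement learning?"))  # I can see 1 human turn(s).
print(chat.ask("Can you rephrase what you just said?"))  # I can see 2 human turn(s).
```

The trade-off of this design is that the prompt grows with every turn, so a long conversation eventually competes for the same token budget as the retrieved documents.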

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

To avoid duplicating code, we have moved much of the code used in Notebook 03 into functions in the `app/utils.py` and `app/prompts.py` files. This way, we can reuse these functions in the app that we will build later.

In [11]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index1_name, index2_name]

agg_search_results = get_search_results(QUESTION, indexes)
ordered_results = order_search_results(agg_search_results, reranker_threshold=1)

In [12]:
docs = []
for key, value in ordered_results.items():
    for page in value["chunks"]:
        docs.append(Document(page_content=page, metadata={"source": value["location"]}))

# Calculate number of tokens of our docs
tokens_limit = model_tokens_limit(MODEL)

if len(docs) > 0:
    num_tokens = num_tokens_from_docs(docs)
    print("Custom token limit for", MODEL, ":", tokens_limit)
    print("Combined docs tokens count:", num_tokens)
else:
    num_tokens = 0  # avoid a NameError in the next cell when the search returns nothing
    print("NO RESULTS FROM AZURE SEARCH")


Custom token limit for gpt-35-turbo : 3000
Combined docs tokens count: 80428


In [13]:
%%time
if num_tokens > tokens_limit:
    index = embed_docs(docs)
    top_docs = search_docs(index,QUESTION)
    
    # Now we need to recalculate the tokens count of the top results from similarity vector search
    # in order to select the chain type: stuff or map_reduce
    
    num_tokens = num_tokens_from_docs(top_docs)   
    print("Token count after similarity search:", num_tokens)
    chain_type = "map_reduce" if num_tokens > tokens_limit else "stuff"
    
else:
    # if total tokens is less than our limit, we don't need to vectorize and do similarity search
    top_docs = docs
    chain_type = "stuff"
    
print("Chain Type selected:", chain_type)

Number of chunks: 70
Token count after similarity search: 4693
Chain Type selected: map_reduce
CPU times: user 19.4 s, sys: 1.86 s, total: 21.2 s
Wall time: 9.77 s
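The cell above picks a chain type in two stages. The same decision can be sketched in plain Python with no Azure calls, under these assumptions: `select_chain_type` is a hypothetical helper, `chunk_tokens` stands in for the per-chunk counts that `num_tokens_from_docs` adds up, `scores` is a hypothetical list of similarity scores standing in for the vector search done by `embed_docs`/`search_docs`, and `top_k` is an assumed value, since the notebook does not show how many chunks `search_docs` returns.

```python
from typing import List, Tuple


def select_chain_type(chunk_tokens: List[int], scores: List[float],
                      tokens_limit: int, top_k: int = 4) -> Tuple[List[int], str]:
    """Return (indices of chunks to use, chain type), mirroring the cell above.

    chunk_tokens: token count of each chunk
    scores: similarity of each chunk to the question (hypothetical stand-in for vector search)
    top_k: number of chunks the similarity search keeps (assumed value)
    """
    all_idx = list(range(len(chunk_tokens)))
    if sum(chunk_tokens) <= tokens_limit:
        # Everything fits: skip the vector search and stuff all chunks into one prompt
        return all_idx, "stuff"
    # Too many tokens: keep only the top_k most similar chunks
    top = sorted(all_idx, key=lambda i: scores[i], reverse=True)[:top_k]
    # Recount the tokens of the reduced set to choose the chain type
    top_tokens = sum(chunk_tokens[i] for i in top)
    return top, ("map_reduce" if top_tokens > tokens_limit else "stuff")


tokens = [1500, 1200, 900, 800, 100]
sims = [0.2, 0.9, 0.8, 0.7, 0.1]
print(select_chain_type([500, 400, 300], [0.1, 0.9, 0.5], tokens_limit=3000))  # ([0, 1, 2], 'stuff')
print(select_chain_type(tokens, sims, tokens_limit=3000, top_k=3))  # ([1, 2, 3], 'stuff')
print(select_chain_type(tokens, sims, tokens_limit=3000, top_k=4))  # ([1, 2, 3, 0], 'map_reduce')
```

The last call mirrors the run above: even after the similarity search, the selected chunks still exceed the limit (4693 > 3000 tokens in the notebook), so `map_reduce` is chosen, which processes each chunk in a separate LLM call before combining the partial answers.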


In [14]:
# Get the answer
response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type)
response['output_text']

'There are a variety of small problems where reinforcement-learning techniques work effectively. However, there are no specific use cases for reinforcement learning mentioned in the given portion of the document.\nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf'

And if we ask the follow-up question:

In [15]:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type)
response['output_text']

'For the first three contents, there is no question being asked. For the last content, no question was provided.\nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf'

So far, we have only reproduced what we built in Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/en/latest/modules/memory/examples/adding_memory_chain_multiple_inputs.html
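Conceptually, a buffer memory does two things: before the LLM runs, it hands the transcript so far to the prompt under a variable name (`memory_key`); after the LLM runs, it appends the new human and AI turns. Because our chain receives several inputs (the question, but also e.g. `language`), the memory has to be told which input is the human's turn, which is why the next cell passes `input_key="question"`. The sketch below is a hypothetical, simplified stand-in written for illustration, not LangChain's implementation; it only reproduces the `Human: ...\nAI: ...` buffer format that appears later in `memory.buffer`. That the prompts in `app/prompts.py` read a `{chat_history}` variable is presumed here, not shown.

```python
from typing import Any, Dict, List


class BufferMemorySketch:
    """Hypothetical, simplified stand-in for ConversationBufferMemory (not LangChain's code)."""

    def __init__(self, memory_key: str = "history", input_key: str = "question",
                 output_key: str = "output_text"):
        self.memory_key = memory_key  # prompt variable that receives the transcript
        self.input_key = input_key    # which chain input counts as the human turn
        self.output_key = output_key  # which chain output counts as the AI turn
        self.lines: List[str] = []

    @property
    def buffer(self) -> str:
        return "\n".join(self.lines)

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:
        # Before the LLM runs: expose the transcript under memory_key (inputs unused here)
        return {self.memory_key: self.buffer}

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        # After the LLM runs: record only the selected input and output
        self.lines.append(f"Human: {inputs[self.input_key]}")
        self.lines.append(f"AI: {outputs[self.output_key]}")


sketch_memory = BufferMemorySketch(memory_key="chat_history", input_key="question")
chain_inputs = {"question": "Tell me some use cases for reinforcement learning?",
                "language": "English"}
sketch_memory.save_context(chain_inputs, {"output_text": "Robotics, games and finance."})
print(sketch_memory.load_memory_variables({})["chat_history"])
```

Note that `language` never reaches the transcript: only the input named by `input_key` is recorded as the human turn.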

In [19]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                        memory=memory)
response['output_text']

'Reinforcement learning has a variety of use cases, including training animals, shaping hierarchical reinforcement-learning systems, rewarding agents for taking steps up a gradient, using imitation learning, decomposing a huge learning problem into smaller ones, and programming reflexes to keep agents alive and pointed in the right direction while they learn. Theoretical foundations and practical applications of the field can be found in the document at https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf. The exploration-exploitation trade-off in reinforcement learning is discussed in the article at https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf, which provides optimal decision thresholds for both infinite horizon discounted reward and finite horizon undiscounted reward cases. However, none of the sources provide a comprehensive list of use cases for reinforcement learning.'

In [20]:
# Now we add a follow up question:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                      memory=memory)
response['output_text']

'I apologize for the confusion. I previously mentioned that reinforcement learning has a variety of use cases, including training animals, shaping hierarchical reinforcement-learning systems, rewarding agents for taking steps up a gradient, using imitation learning, decomposing a huge learning problem into smaller ones, and programming reflexes to keep agents alive and pointed in the right direction while they learn. However, I cannot provide a comprehensive list of use cases for reinforcement learning. If you have any specific questions or topics you would like me to cover, please let me know. You can find more information on reinforcement learning in the document at https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf. Additionally, the article at https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf discusses the exploration-exploitation trade-off in reinforcement learning and provides optimal decision thresholds for both infinite horizon discounte

In [21]:
# Another follow up query
response = get_answer(docs=top_docs, query="Thank you", language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

"You're welcome! If you have any more questions, feel free to ask. You can find more information on reinforcement learning in the document at https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf. Additionally, the article at https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf discusses the exploration-exploitation trade-off in reinforcement learning and provides optimal decision thresholds for both infinite horizon discounted reward and finite horizon undiscounted reward cases. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf"

Let's check the memory to confirm that it is keeping track of the conversation:

In [22]:
memory.buffer

"Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has a variety of use cases, including training animals, shaping hierarchical reinforcement-learning systems, rewarding agents for taking steps up a gradient, using imitation learning, decomposing a huge learning problem into smaller ones, and programming reflexes to keep agents alive and pointed in the right direction while they learn. Theoretical foundations and practical applications of the field can be found in the document at https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf. The exploration-exploitation trade-off in reinforcement learning is discussed in the article at https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf, which provides optimal decision thresholds for both infinite horizon discounted reward and finite horizon undiscounted reward cases. However, none of the sources provide a comprehensive list of use cases for reinforcement learning.\nHuman:

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM: memory is something we must provide to the LLM ourselves, as context for each question.

We can also notice that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar docs every time, so it struggles to respond to prompts such as "Hello", "Thank you", "Bye", "What's your name?", "What's the weather?" and any other request that does not involve searching the knowledge base.



# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover every scenario that a virtual assistant will need to handle? **What if answering the question requires looking into tabular data instead of text?** The next notebook, 04, explains the tabular problem, shows how to solve it, and introduces the concept of Agents.