# Understanding Memory in LLMs

In the previous Notebook 03, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. [Bing Chat](http://chat.bing.com/) is a search engine with a GPT-4 model that utilizes the content of search results to provide context and deliver accurate responses to queries.

However, we have yet to discover how to engage in a conversation with the LLM. With Bing Chat, this is possible, as the LLM can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

The aim of this Notebook is to demonstrate how we can "provide memory" to the LLM by utilizing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    order_search_results,
    model_tokens_limit,
    num_tokens_from_docs,
    embed_docs,
    search_docs,
    get_answer,
)

from common.prompts import COMBINE_QUESTION_PROMPT, COMBINE_PROMPT, COMBINE_CHAT_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"
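
Indexing `os.environ` directly raises a bare `KeyError` when `credentials.env` lacks a value. A minimal sketch of a pre-flight check follows, assuming only the variable names used in the cell above; the helper `find_missing_vars` is hypothetical and is not part of `common.utils`.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
]


def find_missing_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = find_missing_vars(REQUIRED_VARS)
if missing:
    print("Missing settings in credentials.env:", ", ".join(missing))
```

Running a check like this right after `load_dotenv` makes a misconfigured environment easier to diagnose than a stack trace from the middle of a cell.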

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Can you summarize your last response?"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Robotics: Reinforcement learning can be used to train robots to perform tasks such as picking and placing objects, navigating through unfamiliar environments, and performing complex movements.

2. Gaming: Reinforcement learning can be used to develop game-playing agents that can learn to play games such as chess, Go, and poker at a high level.

3. Autonomous Vehicles: Reinforcement learning can be used to train autonomous vehicles to navigate through traffic, avoid obstacles, and make decisions in real-time.

4. Personalized Recommendations: Reinforcement learning can be used to develop personalized recommendation systems that can learn from user feedback and provide more relevant recommendations over time.

5. Finance: Reinforcement learning can be used to optimize investment strategies, fraud detection, and risk management.

6. Healthcare: Reinforcement learning can be used to develop personalized treatment plans for patients based on their medical history and current condition.

7. Advertising: Reinforcement learning can be used to optimize ad placement and targeting to maximize conversions and revenue.

8. Energy Management: Reinforcement learning can be used to optimize energy consumption in buildings and reduce energy costs.

9. Manufacturing: Reinforcement learning can be used to optimize manufacturing processes and reduce waste.

10. Agriculture: Reinforcement learning can be used to optimize crop yields and reduce resource usage in agriculture.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

'I am sorry, as an AI language model, I cannot summarize my last response without knowing the context or the specific response you are referring to. Please provide more information so that I can assist you better.'

As you can see, it doesn't remember what it just responded. This proves that the LLM does NOT have memory, and that we need to supply the memory ourselves by passing the conversation history as part of the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION})

'Reinforcement learning can be used in a variety of industries such as robotics, gaming, autonomous vehicles, personalized recommendations, finance, healthcare, advertising, energy management, manufacturing, and agriculture to optimize processes, reduce waste, and improve decision-making.'

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
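
The same idea can be sketched without calling any model. In the illustration below, `fake_llm` is a hypothetical stateless stand-in (not an Azure OpenAI call) that only reports how many human turns it can see in the prompt, which shows that any "memory" comes entirely from the text we pass in.

```python
def build_prompt(history, question):
    """Prepend the running transcript to the new question."""
    return f"{history}\nHuman: {question}\nAI:"


def fake_llm(prompt):
    """Stateless stand-in for a model: it only sees the prompt text."""
    human_turns = prompt.count("Human:")
    return f"I can see {human_turns} human turn(s) in this prompt."


def chat(questions, llm=fake_llm):
    """Ask each question in order, feeding all earlier turns back in."""
    history = ""
    answers = []
    for question in questions:
        answer = llm(build_prompt(history, question))
        # The caller, not the model, is responsible for keeping state.
        history += f"\nHuman: {question}\nAI: {answer}"
        answers.append(answer)
    return answers


answers = chat([
    "Tell me some use cases for reinforcement learning?",
    "Can you summarize your last response?",
])
for answer in answers:
    print(answer)
```

The second answer reports two human turns only because the loop appended the first exchange to `history`; drop that line and every call looks like the first one.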

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

To avoid duplicating code, we have moved much of the code used in Notebook 03 into functions. These functions live in the `common/utils.py` and `common/prompts.py` files (the modules imported in the first cell). This way we can reuse these functions in the app that we will build later.

In [11]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index1_name, index2_name]

agg_search_results = get_search_results(QUESTION, indexes)
ordered_results = order_search_results(agg_search_results, reranker_threshold=1)

In [12]:
docs = []
for key,value in ordered_results.items():
    for page in value["chunks"]:
        docs.append(Document(page_content=page, metadata={"source": value["location"]}))

# Calculate number of tokens of our docs
tokens_limit = model_tokens_limit(MODEL)

if len(docs) > 0:
    num_tokens = num_tokens_from_docs(docs)
    print("Custom token limit for", MODEL, ":", tokens_limit)
    print("Combined docs tokens count:", num_tokens)
else:
    # Define num_tokens anyway so the next cell does not fail with a NameError
    num_tokens = 0
    print("NO RESULTS FROM AZURE SEARCH")


Custom token limit for gpt-35-turbo : 3000
Combined docs tokens count: 111872


In [13]:
%%time
if num_tokens > tokens_limit:
    index = embed_docs(docs)
    top_docs = search_docs(index,QUESTION)
    
    # Now we need to recalculate the tokens count of the top results from similarity vector search
    # in order to select the chain type: stuff or map_reduce
    
    num_tokens = num_tokens_from_docs(top_docs)   
    print("Token count after similarity search:", num_tokens)
    chain_type = "map_reduce" if num_tokens > tokens_limit else "stuff"
    
else:
    # if total tokens is less than our limit, we don't need to vectorize and do similarity search
    top_docs = docs
    chain_type = "stuff"
    
print("Chain Type selected:", chain_type)

Token count after similarity search: 4152
Chain Type selected: map_reduce
CPU times: user 676 ms, sys: 32.4 ms, total: 708 ms
Wall time: 5.55 s
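
The decision flow in the cell above can be sketched in plain Python. This is an illustration only: `approx_tokens` and `overlap_score` are hypothetical stand-ins for `num_tokens_from_docs` and the embedding-based `search_docs`, whose real implementations live in `common.utils` and are not shown in this notebook. Word counts and keyword overlap are used here purely so the example runs without any service.

```python
def approx_tokens(text):
    """Very rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def overlap_score(query, text):
    """Count distinct query words that also appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def select_docs_and_chain(docs, query, tokens_limit, k=2):
    """Mirror the decision flow above using the stand-in helpers."""
    total = sum(approx_tokens(d) for d in docs)
    if total <= tokens_limit:
        # Everything fits in one prompt.
        return docs, "stuff"
    # Too large: keep only the k most relevant chunks, then re-count.
    ranked = sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)
    top = ranked[:k]
    total = sum(approx_tokens(d) for d in top)
    return top, ("map_reduce" if total > tokens_limit else "stuff")


sample_docs = [
    "reinforcement learning trains robots",
    "cooking pasta requires boiling water",
    "games are a classic reinforcement learning benchmark",
]
picked, picked_chain = select_docs_and_chain(
    sample_docs, "reinforcement learning", tokens_limit=8, k=1
)
print(picked_chain, picked)
```

With `k=1` the single best chunk fits the budget, so the sketch selects `stuff`; with `k=2` the two best chunks exceed it and the sketch falls back to `map_reduce`, as happened in the run above.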


In [14]:
# Get the answer
response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type)
response['output_text']

'Reinforcement learning can be applied in a variety of use cases, including robotics, industrial manufacturing, combinatorial search problems (such as computer game playing), filling containers with non-identical products, controlling machinery, juggling robots, mobile robots, and packaging tasks. Further work is in progress on practical implementations of reinforcement learning. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf'

And if we ask the follow up question:

In [15]:
response = get_answer(docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English",deployment=MODEL, chain_type=chain_type)
response['output_text']

'The first three contents do not provide any information relevant to the question. The last content describes the standard reinforcement-learning model and models of optimal behavior.\nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf'

So far we have the same result as in Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/en/latest/modules/memory/examples/adding_memory_chain_multiple_inputs.html

In [16]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                        memory=memory)
response['output_text']

"Reinforcement learning has a wide range of practical applications, including game playing, robotics, and autonomous driving. Some specific examples of reinforcement learning use cases include filling containers with non-identical products, controlling machinery to produce containers with specific weights, juggling robots, mobile robots performing tasks, and packaging tasks. However, it's important to note that many reinforcement-learning techniques work effectively on small problems and may not scale well to larger problems. With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. Some recent work explores the use of reflexes to make robot learning safer and more efficient. For more information, please refer to the following source: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf."

In [17]:
# Now we add a follow up question:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                      memory=memory)
response['output_text']

"My apologies, my previous response was not provided. To answer your question, reinforcement learning has a variety of practical applications, such as game playing, robotics, and autonomous driving. Some specific examples of reinforcement learning use cases include filling containers with non-identical products, controlling machinery to produce containers with specific weights, juggling robots, mobile robots performing tasks, and packaging tasks. However, it's important to note that many reinforcement-learning techniques work effectively on small problems and may not scale well to larger problems. With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. For more information, please refer to the following source: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf.\nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf"

In [18]:
# Another follow up query
response = get_answer(docs=top_docs, query="Thank you", language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

"You're welcome! Is there anything else I can help you with?\nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf"

You might get a different answer in the cell above, and that is OK: this bot is not yet well configured to answer questions unrelated to its knowledge base, including salutations.

Let's check our memory to see that it's keeping the conversation:

In [19]:
memory.buffer

"Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has a wide range of practical applications, including game playing, robotics, and autonomous driving. Some specific examples of reinforcement learning use cases include filling containers with non-identical products, controlling machinery to produce containers with specific weights, juggling robots, mobile robots performing tasks, and packaging tasks. However, it's important to note that many reinforcement-learning techniques work effectively on small problems and may not scale well to larger problems. With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. Some recent work explores the use of reflexes to make robot learning safer and more efficient. For more information, please refer to the following source: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf.\nHuman: Can you summarize your last respo

## Using CosmosDB as persistent memory

In the previous cells we added local in-RAM memory to our chatbot. However, it is not persistent: it is deleted once the app user's session ends. We therefore need a database to persist each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use a LangChain class called CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)

In [20]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [21]:
# Create our memory object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [22]:
# Testing using our Question
response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                        memory=memory)
response['output_text']

'Reinforcement learning has a wide range of practical applications, including robotics, industrial manufacturing, and combinatorial search problems such as computer game playing. Some specific examples include filling containers with variable numbers of non-identical products, controlling machinery setpoints in factories, juggling robots, mobile robots for box-pushing and disk-collecting, and a packaging task. It is important to note that in order to solve highly complex problems, reinforcement learning techniques must incorporate bias that will give leverage to the learning process. With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf'

In [23]:
# Now we add a follow up question:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                      memory=memory)
response['output_text']

"My previous response discussed the practical applications of reinforcement learning, which include robotics, industrial manufacturing, and combinatorial search problems such as computer game playing. It also emphasized that appropriate biases, supplied by human programmers or teachers, are necessary to solve highly complex problems. If you would like a summary of the content of the document, it describes the standard reinforcement-learning model, where an agent interacts with its environment through perception and action, receiving input about the current state and choosing actions to generate output. The agent's behavior should aim to increase the long-run sum of values of the reinforcement signal, and it can learn to do this through trial and error with various algorithms. Reinforcement learning differs from supervised learning in that there are no input/output pairs, and the agent must actively gather experience to act optimally. The text also discusses models of optimal behavior, 

In [24]:
# Another follow up query
response = get_answer(docs=top_docs, query="Thank you", language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

"You're welcome! Let me know if you have any other questions. \nSOURCES: N/A"

Let's check our Azure CosmosDB to see the whole conversation:


In [25]:
# Load the messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning?', additional_kwargs={}, example=False),
 AIMessage(content='Reinforcement learning has a wide range of practical applications, including robotics, industrial manufacturing, and combinatorial search problems such as computer game playing. Some specific examples include filling containers with variable numbers of non-identical products, controlling machinery setpoints in factories, juggling robots, mobile robots for box-pushing and disk-collecting, and a packaging task. It is important to note that in order to solve highly complex problems, reinforcement learning techniques must incorporate bias that will give leverage to the learning process. With appropriate biases, supplied by human programmers or teachers, complex reinforcement-learning problems will eventually be solvable. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf', additional_kwargs={}, example=False),
 HumanMessage(con

![CosmosDB Memory](./images/cosmos-chathistory.png)
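
The persistence pattern used above (messages stored per session and per user, then reloaded later) can be sketched without any cloud dependency. The class below is a hypothetical stand-in built on the standard-library `sqlite3` module; it is not LangChain's `CosmosDBChatMessageHistory` and does not reproduce its interface, it only illustrates the idea of keying a conversation by `session_id` and `user_id`.

```python
import sqlite3


class SqliteChatHistory:
    """Hypothetical stand-in for a persistent chat message store."""

    def __init__(self, conn, session_id, user_id):
        self.conn = conn
        self.session_id = session_id
        self.user_id = user_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role, content):
        """Append one message to this session's conversation."""
        self.conn.execute(
            "INSERT INTO messages (session_id, user_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.conn.commit()

    def load_messages(self):
        """Return this session's messages in insertion order."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        )
        return [(role, content) for role, content in rows]


conn = sqlite3.connect(":memory:")  # use a file path for real persistence
history = SqliteChatHistory(conn, "session-1", "user-1")
history.add_message("human", "Tell me some use cases for reinforcement learning?")
history.add_message("ai", "Robotics, game playing, ...")

# A second object with the same ids sees the stored conversation.
restored = SqliteChatHistory(conn, "session-1", "user-1")
print(restored.load_messages())
```

Because the rows are filtered by both ids, a different session starts with an empty history while the original conversation remains available for auditing.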

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM in the form of context for the question.

We added persistent memory using CosmosDB.

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar docs every time, and it struggles to respond to prompts like: Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search in the knowledge base.



# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover all the scenarios that a virtual assistant will require? **What if answering the question requires looking into tabular data rather than text?** The next Notebook 05 explains and solves the tabular problem and introduces the concept of Agents.