# Understanding Memory in LLMs

In the previous Notebook 03, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. [Bing Chat](http://chat.bing.com/) is a search engine with a GPT-4 model that utilizes the content of search results to provide context and deliver accurate responses to queries.

However, we have yet to discover how to engage in a conversation with the LLM. With Bing Chat, this is possible, as the LLM can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

The aim of this Notebook is to demonstrate how we can "provide memory" to the LLM by utilizing prompts and context.

In [10]:
from dotenv import load_dotenv
# Load the .env file
load_dotenv()

True

In [11]:
import os
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from app.utils import (
    get_search_results,
    order_search_results,
    model_tokens_limit,
    num_tokens_from_docs,
    embed_docs,
    search_docs,
    get_answer,
)

from app.prompts import COMBINE_QUESTION_PROMPT, COMBINE_PROMPT, COMBINE_CHAT_PROMPT

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

# Don't mess with this unless you really know what you are doing
AZURE_SEARCH_API_VERSION = '2021-04-30-Preview'
AZURE_OPENAI_API_VERSION = "2023-03-15-preview"


# Change these below with your own services credentials
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT") #"Enter your Azure Cognitive Search Endpoint ..."
AZURE_SEARCH_KEY = os.getenv("AZURE_SEARCH_KEY") #"Enter your Azure Cognitive Search Key ..."  Make sure it is the MANAGEMENT KEY, not the query key
COG_SERVICES_NAME = os.getenv("COG_SERVICES_NAME") #"Enter your Cognitive Services NAME, note: not the Endpoint ..."
COG_SERVICES_KEY = os.getenv("COG_SERVICES_KEY") #"Enter your Cognitive Services Key ..."
AZURE_COSMOSDB_ENDPOINT = os.getenv("AZURE_COSMOSDB_ENDPOINT")#"ENTER YOUR VALUE"
AZURE_COSMOSDB_NAME = os.getenv("AZURE_COSMOSDB_NAME")#"ENTER YOUR VALUE"
AZURE_COSMOSDB_CONTAINER_NAME = os.getenv("AZURE_COSMOSDB_CONTAINER_NAME")#"ENTER YOUR VALUE"
AZURE_COMOSDB_CONNECTION_STRING = os.getenv("AZURE_COMOSDB_CONNECTION_STRING")#"ENTER YOUR VALUE"

In [13]:
AZURE_COSMOSDB_ENDPOINT

'https://cosmosdb-account-airqdqawghv7u.documents.azure.com:443/'

In [14]:
# Set the ENV variables that LangChain needs to connect to Azure OpenAI.
# Never hard-code the endpoint or API key in the notebook; read them from the environment instead.
# This assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are defined in your .env file.
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"] = AZURE_OPENAI_API_VERSION
os.environ["OPENAI_API_TYPE"] = "azure"

os.environ["AZURE_SEARCH_KEY"] = AZURE_SEARCH_KEY
os.environ["AZURE_SEARCH_ENDPOINT"] = AZURE_SEARCH_ENDPOINT
os.environ["AZURE_COSMOSDB_ENDPOINT"]=  AZURE_COSMOSDB_ENDPOINT
os.environ["AZURE_COSMOSDB_NAME"]=  AZURE_COSMOSDB_NAME
os.environ["AZURE_COSMOSDB_CONTAINER_NAME"]=  AZURE_COSMOSDB_CONTAINER_NAME
os.environ["AZURE_COMOSDB_CONNECTION_STRING"]=  AZURE_COMOSDB_CONNECTION_STRING

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [15]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Can you rephrase what you just said?"

In [16]:
# Define model
MODEL = "gpt-35-turbo"
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5)

In [17]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [18]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Game playing: Reinforcement learning has been used extensively in game playing, such as AlphaGo, AlphaZero, and OpenAI Five, to develop strategies and improve performance.

2. Robotics: Reinforcement learning is used to train robots to perform complex tasks, such as grasping objects, navigating through environments, and manipulating objects.

3. Autonomous vehicles: Reinforcement learning is used to train autonomous vehicles to make decisions while driving, such as changing lanes, avoiding obstacles, and navigating intersections.

4. Personalized recommendations: Reinforcement learning is used to develop personalized recommendations for products and services based on user behavior and preferences.

5. Healthcare: Reinforcement learning is used to optimize treatment plans for patients with chronic diseases, such as diabetes and cancer.

6. Finance: Reinforcement learning is used to develop trading strategies for financial markets, such as predicting stock prices and managing portfolios.

7. Advertising: Reinforcement learning is used to optimize ad placement and targeting to maximize revenue and engagement.

8. Energy management: Reinforcement learning is used to optimize energy consumption in buildings and power grids, reducing costs and improving efficiency.

9. Education: Reinforcement learning is used to develop personalized learning paths for students based on their performance and progress.

10. Agriculture: Reinforcement learning is used to optimize crop yields and reduce waste by predicting weather patterns and soil conditions.

In [19]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

'I apologize, but as an AI language model, I am unable to determine which statement you are referring to. Can you please provide me with more information or context so that I may assist you better?'

As you can see, it doesn't remember what it just responded. This is proof that the LLM does NOT have memory and that we need to supply the memory ourselves, as a conversation history included in the prompt, like this:

In [20]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [21]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [22]:
chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION})

'Sure, here are some examples of how reinforcement learning is used in various industries:\n\n1. Game playing: Reinforcement learning is used to improve game strategies and performance.\n2. Robotics: Reinforcement learning is used to train robots to perform complex tasks.\n3. Autonomous vehicles: Reinforcement learning is used to help autonomous vehicles make driving decisions.\n4. Personalized recommendations: Reinforcement learning is used to develop personalized product and service recommendations.\n5. Healthcare: Reinforcement learning is used to optimize treatment plans for patients with chronic diseases.\n6. Finance: Reinforcement learning is used to develop trading strategies for financial markets.\n7. Advertising: Reinforcement learning is used to optimize ad placement and targeting.\n8. Energy management: Reinforcement learning is used to optimize energy consumption in buildings and power grids.\n9. Education: Reinforcement learning is used to develop personalized learning pat

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
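The same idea can be sketched without any LLM service. The example below is a minimal illustration, not the notebook's implementation: `build_prompt`, `chat_turn` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in for a real model call that only reports how many earlier turns it can see in the prompt.

```python
from typing import Callable, List, Tuple

def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render the earlier turns plus the new question in the Human/AI format."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)

def chat_turn(llm: Callable[[str], str], history: List[Tuple[str, str]], question: str) -> str:
    """Send the full history with the question, then record the new turn."""
    answer = llm(build_prompt(history, question))
    history.append((question, answer))
    return answer

def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call: it has no state of its own,
    # so it can only "remember" what is present in the prompt it receives.
    seen = prompt.count("Human:") - 1
    return f"I can see {seen} earlier turn(s)."

history: List[Tuple[str, str]] = []
first = chat_turn(fake_llm, history, "Tell me some use cases for reinforcement learning?")
second = chat_turn(fake_llm, history, "Can you rephrase what you just said?")
print(first)
print(second)
```

The stand-in model is stateless, yet the second call sees the first turn because the caller keeps the history and resends it with every question.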

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

To avoid duplicating code, we have moved much of the code used in Notebook 3 into functions. These functions are in the app/utils.py and app/prompts.py files. This way we can reuse these functions in the app that we will build later.

In [23]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index1_name, index2_name]

agg_search_results = get_search_results(QUESTION, indexes)
ordered_results = order_search_results(agg_search_results, reranker_threshold=1)

In [24]:
docs = []
for key,value in ordered_results.items():
    for page in value["chunks"]:
        docs.append(Document(page_content=page, metadata={"source": value["location"]}))

# Calculate number of tokens of our docs
tokens_limit = model_tokens_limit(MODEL)

if(len(docs)>0):
    num_tokens = num_tokens_from_docs(docs)
    print("Custom token limit for", MODEL, ":", tokens_limit)
    print("Combined docs tokens count:",num_tokens)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")


Custom token limit for gpt-35-turbo : 3000
Combined docs tokens count: 58744


In [25]:
%%time
if num_tokens > tokens_limit:
    index = embed_docs(docs)
    top_docs = search_docs(index,QUESTION)
    
    # Now we need to recalculate the tokens count of the top results from similarity vector search
    # in order to select the chain type: stuff or map_reduce
    
    num_tokens = num_tokens_from_docs(top_docs)   
    print("Token count after similarity search:", num_tokens)
    chain_type = "map_reduce" if num_tokens > tokens_limit else "stuff"
    
else:
    # if total tokens is less than our limit, we don't need to vectorize and do similarity search
    top_docs = docs
    chain_type = "stuff"
    
print("Chain Type selected:", chain_type)

Number of chunks: 52
Token count after similarity search: 3681
Chain Type selected: map_reduce
CPU times: user 15.1 s, sys: 2.45 s, total: 17.6 s
Wall time: 5.01 s
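The decision made in the cell above can be isolated into a small function. This is only a sketch under stated assumptions: `pick_docs_and_chain`, `word_count` and `keep_first_two` are hypothetical names, a whitespace word count stands in for a real tokenizer, and keeping the first two documents stands in for the similarity search done by `embed_docs` and `search_docs`.

```python
from typing import Callable, List, Tuple

def pick_docs_and_chain(
    docs: List[str],
    tokens_limit: int,
    count_tokens: Callable[[List[str]], int],
    narrow: Callable[[List[str]], List[str]],
) -> Tuple[List[str], str]:
    """Return the documents to send to the LLM and the chain type to use."""
    # Everything fits in one prompt: no need to narrow the documents down.
    if count_tokens(docs) <= tokens_limit:
        return docs, "stuff"
    # Too large: keep only the most relevant documents, then re-count.
    top_docs = narrow(docs)
    chain_type = "map_reduce" if count_tokens(top_docs) > tokens_limit else "stuff"
    return top_docs, chain_type

def word_count(docs: List[str]) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return sum(len(d.split()) for d in docs)

def keep_first_two(docs: List[str]) -> List[str]:
    # Assumption: stand-in for a similarity search that returns the top results.
    return docs[:2]

sample_docs = ["a b c", "d e f g", "h i"]  # 9 "tokens" in total, 7 in the first two
fits = pick_docs_and_chain(sample_docs, 10, word_count, keep_first_two)
narrowed = pick_docs_and_chain(sample_docs, 8, word_count, keep_first_two)
too_big = pick_docs_and_chain(sample_docs, 5, word_count, keep_first_two)
print(fits[1], narrowed[1], too_big[1])
```

The three calls cover the three paths: the documents fit as they are, they fit after narrowing, or they still exceed the limit and need `map_reduce`.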


In [26]:
# Get the answer
response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type)
response['output_text']

'Some use cases for reinforcement learning include routing, robotics, game playing, autonomous vehicles, elevator scheduling, traffic signal control, and resource management in computer networks. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0207/0207073v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204040v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204043v1.pdf'

And if we ask the follow up question:

In [27]:
response = get_answer(docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English",deployment=MODEL, chain_type=chain_type)
response['output_text']

'The given content discusses the development of value estimators for reinforcement learning to estimate the value of another policy, resulting in more data-efficient algorithms. The paper also considers bounds on sample size for policy evaluation. There is no relevant text to answer the other questions. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0207/0207073v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204040v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204043v1.pdf'

So far we have the same as in Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/en/latest/modules/memory/examples/adding_memory_chain_multiple_inputs.html

In [28]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                        memory=memory)
response['output_text']

'Reinforcement learning has various use cases, including routing and optimizing long-term return in a class of behaviors. However, the agent does not know the correct behavior or the true model of the environment it interacts with. A policy is often used to choose the action, and the effectiveness of the action is communicated to the agent through a scalar value (reinforcement signal). For more information, you can refer to the following sources: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0207/0207073v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204040v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204043v1.pdf.'

In [29]:
# Now we add a follow up question:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                      memory=memory)
response['output_text']

"One of the papers discusses reinforcement learning, which involves finding the optimal course of action in Markovian environments without knowledge of the environment's dynamics. The value of a policy is estimated from results of simulating that policy in the environment, which requires a large amount of simulation as different points in the policy space are considered. Another paper discusses a reinforcement learning algorithm that can learn from scarce experience using a proxy environment. The algorithm relies on four external routines: pick sample, sample, add data, and optimize. The routine pick sample represents the balance between exploration and exploitation and has a single parameter p∗. The larger the value of p∗, the more exploitative the algorithm is. However, there is no relevant text in the given portion of the document to rephrase. For more information, you can refer to the following sources: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https:/

In [30]:
# Another follow up query
response = get_answer(docs=top_docs, query="Thank you", language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

"You're welcome! Do you have any other questions I can help you with?\nSOURCES: N/A"

Let's check our memory to see that it's keeping the conversation

In [31]:
memory.buffer

"Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has various use cases, including routing and optimizing long-term return in a class of behaviors. However, the agent does not know the correct behavior or the true model of the environment it interacts with. A policy is often used to choose the action, and the effectiveness of the action is communicated to the agent through a scalar value (reinforcement signal). For more information, you can refer to the following sources: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0207/0207073v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204040v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204043v1.pdf.\nHuman: Can you rephrase what you just said?\nAI: One of the papers discusses reinforcement learning, which involves finding the optimal course of action in Markovian environments without knowledge 

## Using CosmosDB as persistent memory

In the previous cells we added local RAM memory to our chatbot. However, it is not persistent: it is deleted once the app user's session is terminated. We therefore need a database to persistently store each bot user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class CosmosDBChatMessageHistory, see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
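The idea of persistent, per-session history can be illustrated without an Azure account. The sketch below swaps in Python's built-in `sqlite3` as a stand-in for CosmosDB. The class `SQLiteChatHistory`, its methods and the table layout are hypothetical and are not LangChain's API.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List, Tuple

class SQLiteChatHistory:
    """Hypothetical stand-in for a persistent chat history keyed by user and session."""

    def __init__(self, db_path: str, session_id: str, user_id: str) -> None:
        self.db_path = db_path
        self.session_id = session_id
        self.user_id = user_id
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
            )

    def add_message(self, role: str, content: str) -> None:
        """Append one message to this user's session and commit it to disk."""
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "INSERT INTO messages (user_id, session_id, role, content) VALUES (?, ?, ?, ?)",
                (self.user_id, self.session_id, role, content),
            )

    def load_messages(self) -> List[Tuple[str, str]]:
        """Return (role, content) pairs for this user's session, oldest first."""
        with closing(sqlite3.connect(self.db_path)) as conn:
            return conn.execute(
                "SELECT role, content FROM messages "
                "WHERE user_id = ? AND session_id = ? ORDER BY id",
                (self.user_id, self.session_id),
            ).fetchall()

db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")

# First "app session": store one exchange.
store = SQLiteChatHistory(db_path, session_id="001", user_id="dummy01")
store.add_message("human", "Tell me some use cases for reinforcement learning?")
store.add_message("ai", "Game playing, robotics, routing.")

# Later "app session": a new object reads the same conversation back from disk.
restored = SQLiteChatHistory(db_path, session_id="001", user_id="dummy01").load_messages()
other = SQLiteChatHistory(db_path, session_id="002", user_id="dummy01").load_messages()
print(restored)
print(other)
```

Because the messages live on disk rather than in the Python process, the conversation survives the end of the session, and keying by session and user keeps different conversations apart.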

In [42]:
# Create CosmosDB instance from langchain cosmos class.

SESSION_ID = '001'  # String value; normally you would provide the user's session id. '001' is used here as an example.
USER_ID = 'dummy01' # String value; normally you would provide the user's id. 'dummy01' is used here as an example.

cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=AZURE_COSMOSDB_ENDPOINT,
    cosmos_database=AZURE_COSMOSDB_NAME,
    cosmos_container=AZURE_COSMOSDB_CONTAINER_NAME,
    connection_string=AZURE_COMOSDB_CONNECTION_STRING,
    session_id=SESSION_ID,
    user_id=USER_ID
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [43]:
# Create our memory object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [44]:
# Testing using our Question
response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                        memory=memory)
response['output_text']

"Reinforcement learning has been applied successfully to a wide range of problems, including game playing, robotics, control engineering, and telecommunications. It is also increasingly finding use in many important applications, including routing. Research in reinforcement learning focuses on designing algorithms for an agent interacting with an environment, to adjust its behavior in such a way as to optimize a long-term return. This means searching for an optimal behavior in a class of behaviors. Unfortunately, I couldn't find more specific use cases. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0207/0207073v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204040v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0204/0204043v1.pdf"

In [45]:
# Now we add a follow up question:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                      memory=memory)
response['output_text']

ValueError: A single document was longer than the context length, we cannot handle this.

In [46]:
# Another follow up query
response = get_answer(docs=top_docs, query="Thank you", language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

"You're welcome! Let me know if you have any other questions. \nSOURCES: N/A"

Let's check our Azure CosmosDB to see the whole conversation


In [47]:
#load message from cosmosdb
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning?', additional_kwargs={}, example=False),
 AIMessage(content="Reinforcement learning has been applied successfully to a variety of problems, including game playing, automated control of simulated and real robots, autonomous vehicles, and routing. The learning system does not know the correct behavior, or the true model of the environment it interacts with. Given the sensation of the environment state as an input, the agent chooses the action according to some rule, often called a policy. This action constitutes the output. The issue of finding a near-optimal policy from a given class of policies is analogous to a similar issue in supervised learning. There we are looking for a near-optimal hypothesis from a given class of hypotheses. Unfortunately, I couldn't find more specific use cases. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/0105/0105027v1.pdf, https://demodatasetsp.blob.core.windows.net/a

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM in the form of context for the question.

We added persistent memory using CosmosDB.

We can also see that the current chain is smart, but only up to a point. Although we have given it memory, it searches for similar docs every time, and it struggles to respond to prompts like Hello, Thank you, Bye, What's your name, What's the weather, or any other task that is not a search in the knowledge base.



# NEXT
We now know how to build a Smart Search Engine that can power a chatbot! Great!

But does this cover all the scenarios that a virtual assistant will require? **What if the answer is not found in text, but instead requires looking into tabular data?** The next Notebook 05 explains and solves the tabular problem and introduces the concept of Agents.