# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import AzureOpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT models have memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

Sure! Here are some use cases for reinforcement learning:

1. Game playing: Reinforcement learning has been successfully applied to game playing, such as playing chess, Go, or Atari games. DeepMind's AlphaGo is a famous example of reinforcement learning in action.

2. Robotics: Reinforcement learning can be used to train robots to perform complex tasks, such as grasping objects, walking, or flying. It enables robots to learn from trial and error in real-world environments.

3. Autonomous driving: Reinforcement learning can be used to train self-driving cars to make decisions, navigate traffic, and respond to various road conditions. It helps in optimizing driving behavior and improving safety.

4. Recommendation systems: Reinforcement learning can be used to personalize recommendations for users in various domains, such as movies, music, or products. It helps in learning user preferences and optimizing recommendations over time.

5. Energy management: Reinforcement learning can be used to optimize energy consumption and management in smart grids. It enables intelligent decision-making to balance energy generation, storage, and consumption.

6. Healthcare: Reinforcement learning can be used to optimize treatment plans, drug dosage, and personalized medicine. It helps in learning optimal strategies for disease management and patient care.

7. Finance: Reinforcement learning can be used for algorithmic trading, portfolio management, and risk assessment. It helps in learning optimal trading strategies and making informed investment decisions.

8. Resource allocation: Reinforcement learning can be used to optimize resource allocation in various domains, such as transportation, logistics, or supply chain management. It helps in making efficient decisions to allocate resources effectively.

9. Industrial control systems: Reinforcement learning can be used to optimize control and operation of industrial processes, such as chemical plants or manufacturing systems. It helps in learning optimal control policies and improving system performance.

10. Natural language processing: Reinforcement learning can be used to improve dialogue systems, machine translation, or text summarization. It helps in learning to generate more accurate and context-aware responses.

These are just a few examples, and reinforcement learning has a wide range of applications across various domains.

In [7]:
# Now let's ask a follow-up question
chain.run(FOLLOW_UP_QUESTION)

'1. The topic of our conversation was about the upcoming project deadline.\n2. We discussed the progress made so far and identified areas that need more attention.\n3. We talked about potential strategies to meet the deadline and discussed the resources available.\n4. We agreed on distributing tasks among team members and set specific targets for each.\n5. We discussed the importance of effective communication and coordination within the team.\n6. We addressed any concerns or challenges that may arise during the project and brainstormed possible solutions.\n7. We concluded the conversation by setting a follow-up meeting to track progress and make any necessary adjustments.'

As you can see, the model does not remember what it just answered. Sometimes it responds based only on the system prompt, and sometimes its answer is essentially random. This shows that the LLM does NOT have memory and that we need to supply the memory ourselves, as a conversation history included in the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Sure! Here are the main points of our conversation:

- Reinforcement learning has various use cases, including game playing, robotics, autonomous driving, recommendation systems, energy management, healthcare, finance, resource allocation, industrial control systems, and natural language processing.
- It is used in game playing to train AI agents to play chess, Go, or Atari games.
- In robotics, reinforcement learning helps train robots to perform complex tasks in real-world environments.
- It is used in autonomous driving to make decisions and navigate traffic, improving safety and optimizing driving behavior.
- Reinforcement learning is used in recommendation systems to personalize recommendations for users in domains like movies, music, and products.
- In energy management, it optimizes energy consumption and management in smart grids.
- In healthcare, it helps optimize treatment plans, drug dosage, and personalized medicine.
- In finance, reinforcement learning is used for algorithmic trading, portfolio management, and risk assessment.
- It optimizes resource allocation in domains like transportation, logistics, and supply chain management.
- In industrial control systems, it optimizes control and operation of industrial processes.
- Reinforcement learning can improve natural language processing tasks like dialogue systems, machine translation, and text summarization.

These are the main points we discussed about the use cases of reinforcement learning.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context with every request.
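The idea of keeping the history and passing it as context on every turn can be sketched without any LLM library. This is only a minimal illustration: `build_prompt`, `chat_turn`, and `fake_llm` are hypothetical names that do not come from LangChain or from this repo, and `fake_llm` is a stand-in for a real model call.

```python
from typing import Callable, List, Tuple

def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render past (human, ai) turns plus the new question into one prompt string."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)

def chat_turn(history: List[Tuple[str, str]], question: str,
              ask_llm: Callable[[str], str]) -> str:
    """Send the full history plus the question, then record the new exchange."""
    prompt = build_prompt(history, question)
    answer = ask_llm(prompt)
    history.append((question, answer))  # the "memory" lives here, outside the model
    return answer

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports how many human turns it received."""
    return f"(reply to a prompt containing {prompt.count('Human:')} human turn(s))"

sketch_history: List[Tuple[str, str]] = []
first_answer = chat_turn(sketch_history, "Tell me some use cases for reinforcement learning", fake_llm)
second_answer = chat_turn(sketch_history, "Give me the main points of our conversation", fake_llm)
print(first_answer)
print(second_answer)
```

The second call sees two human turns because the first exchange was stored in `sketch_history` and replayed in the prompt. The model itself stays stateless.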

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 2000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = AzureOpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
# index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 451 ms, sys: 44.1 ms, total: 495 ms
Wall time: 2.84 s


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2480
Max Completion Token count: 2000
Combined docs (context) token count: 5085
--------
Requested token count: 9565
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
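Once memory is added, the conversation history also consumes prompt tokens, so it is worth seeing how it could enter the same budget check. The sketch below reuses the 0.9 threshold and the token counts printed above. The function `select_chain_type` and its `history_tokens` parameter are hypothetical and are not part of `common/utils.py`.

```python
def select_chain_type(prompt_tokens: int, context_tokens: int, completion_tokens: int,
                      tokens_limit: int, history_tokens: int = 0,
                      safety_ratio: float = 0.9):
    """Return (chain_type, requested_tokens) using the same rule as the cell above,
    with an optional allowance for conversation history (an assumption of this sketch)."""
    requested = prompt_tokens + context_tokens + history_tokens + completion_tokens
    chain_type = "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"
    return chain_type, requested

# Numbers taken from the output above (gpt-35-turbo-16k, limit 16384)
no_history = select_chain_type(2480, 5085, 2000, 16384)
# Illustrative only: a long conversation adding 6000 tokens of history
with_history = select_chain_type(2480, 5085, 2000, 16384, history_tokens=6000)
print(no_history)
print(with_history)
```

With no history the request stays under 90% of the limit and `stuff` is selected. With a large enough history the same documents would push the request over the threshold.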


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Some use cases for reinforcement learning include:
1. Searching for plannable domains to speed up reinforcement learning [1]
2. Algorithms for reinforcement learning in routing [2]
3. Using gradient descent methods for policy search in reinforcement learning [3]
4. Reusing accumulated experience in reinforcement learning without a generative model of the environment [4]

References:
[1] Source: https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0212/0212025v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D
[2] Source: https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0207/0207073v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D
[3] Source: https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0105/0105027v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D
[4] Source: https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0105/0105027v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D

CPU times: user 22.2 ms, sys: 7.83 ms, total: 30 ms
Wall time: 8.51 s


And if we ask the follow up question:

In [18]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

I'm sorry, but I don't have enough information to provide the main points of our conversation.

You might get a different response from the one above, but whatever response you get, it will be based on the context given, not on previous answers.

So far we have the same thing as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [19]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has a wide range of use cases across various domains. Here are some examples:

1. **Game Playing**: Reinforcement learning has been successfully applied to game playing, where agents learn to make decisions and improve their strategies through trial and error. For example, AlphaGo, developed by DeepMind, used reinforcement learning to become a world champion in the game of Go.

2. **Robotics**: Reinforcement learning is used in robotics to teach robots how to perform complex tasks and interact with their environment. Robots can learn to navigate, manipulate objects, and perform actions based on rewards and penalties.

3. **Autonomous Vehicles**: Reinforcement learning is used in autonomous vehicles to make decisions about driving behavior, such as lane changing, merging, and navigating intersections. Agents can learn to optimize driving strategies based on safety and efficiency.

4. **Recommendation Systems**: Reinforcement learning can be used to personalize recommendations for users in various domains, such as e-commerce, entertainment, and content platforms. Agents can learn to recommend items or content based on user preferences and feedback.

5. **Resource Management**: Reinforcement learning can be applied to optimize the allocation and management of resources in various systems, such as energy grids, supply chains, and telecommunications networks. Agents can learn to make decisions that maximize efficiency and minimize costs.

6. **Healthcare**: Reinforcement learning has the potential to improve healthcare outcomes by optimizing treatment plans, personalized medicine, and clinical decision-making. Agents can learn to make treatment recommendations based on patient data and medical guidelines.

These are just a few examples of the many possible use cases for reinforcement learning. The applications are diverse and can be tailored to specific domains and problem domains.

In [20]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has a wide range of use cases across various domains.
2. Some examples of use cases for reinforcement learning include game playing, robotics, autonomous vehicles, recommendation systems, resource management, and healthcare.
3. Reinforcement learning can be used to teach agents how to make decisions and improve their strategies through trial and error.
4. It can be applied to tasks such as navigating, manipulating objects, making driving decisions, personalizing recommendations, optimizing resource allocation, and improving healthcare outcomes.
5. Reinforcement learning algorithms often integrate planning methods to speed up learning and decision-making.
6. Planning involves generating and evaluating different policies or actions based on a model of the environment.
7. The Markov decision process (MDP) and partially observable Markov decision process (POMDP) are commonly used models for reinforcement learning.
8. Policies in reinforcement learning define the actions to be taken based on the current state and history of interactions.
9. The value of a policy is the expected return or reward obtained by following that policy.
10. Off-line scenarios in reinforcement learning involve separating the data acquisition module from the optimization or learning module.
11. In off-line scenarios, it is important to generalize from limited interaction experience to make judgments about untried behaviors.
12. Importance sampling and likelihood ratio estimation can be used to estimate the value of policies and generalize from limited experience.
13. The estimation of policy value can be used to select among candidate classes of policies with different complexities.

These points summarize the main topics we discussed regarding reinforcement learning and its applications. Let me know if there's anything else I can help you with.

Sources:
[1]<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0212/0212025v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D" target="_blank">Source 1</a></sup>
[2]<sup><a href="https://demodatasetsp.blob.core.windows.net/books/Made_To_Stick.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D" target="_blank">Source 2</a></sup>
[3]<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0207/0207073v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D" target="_blank">Source 3</a></sup>
[4]<sup><a href="https://demodatasetsp.blob.core.windows.net/arxivcs/arxivcs/0105/0105027v1.pdf?sv=2022-11-02&ss=bfqt&srt=sco&sp=rwdlacupiytfx&se=2024-07-01T22:05:56Z&st=2023-12-27T15:05:56Z&spr=https&sig=9%2BxiwXXov2K8qAvXz6ep6Z%2Fi3sh7g1wr8huIvhggyMs%3D" target="_blank">Source 4</a></sup>

In [21]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

I'm sorry, but I couldn't find any information about the main points of our conversation in the provided sources. Is there anything else I can help you with?

You might get a different answer in the cell above, and that is OK: this bot is not yet well configured to answer questions that are unrelated to its knowledge base, including greetings.

Let's check our memory to see that it is keeping the conversation:

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has a wide range of use cases across various domains. Here are some examples:\n\n1. **Game Playing**: Reinforcement learning has been successfully applied to game playing, where agents learn to make decisions and improve their strategies through trial and error. For example, AlphaGo, developed by DeepMind, used reinforcement learning to become a world champion in the game of Go.\n\n2. **Robotics**: Reinforcement learning is used in robotics to teach robots how to perform complex tasks and interact with their environment. Robots can learn to navigate, manipulate objects, and perform actions based on rewards and penalties.\n\n3. **Autonomous Vehicles**: Reinforcement learning is used in autonomous vehicles to make decisions about driving behavior, such as lane changing, merging, and navigating intersections. Agents can learn to optimize driving strategies based on safety and efficiency.\n\n4. **Recommen
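To make the buffer format above concrete, here is a rough sketch of a conversation buffer that produces the same `Human: ... / AI: ...` string. It is not LangChain's implementation: `SimpleBufferMemory` and its `max_turns` option are hypothetical, and the optional cap is included only because the buffer grows with every turn and therefore adds tokens to each prompt.

```python
from collections import deque

class SimpleBufferMemory:
    """Hypothetical in-memory conversation buffer (not the LangChain class)."""

    def __init__(self, max_turns=None):
        # Each entry is one (question, answer) exchange.
        # With max_turns set, the deque silently drops the oldest exchange when full.
        self.turns = deque(maxlen=max_turns)

    def save_context(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    @property
    def buffer(self) -> str:
        # Same textual layout as the memory.buffer output shown above
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)

memory_sketch = SimpleBufferMemory(max_turns=2)
memory_sketch.save_context("Q1", "A1")
memory_sketch.save_context("Q2", "A2")
memory_sketch.save_context("Q3", "A3")  # pushes out the Q1/A1 exchange
print(memory_sketch.buffer)
```

Capping the number of turns trades recall of older exchanges for a smaller prompt. Whether that trade is acceptable depends on the application.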

## Using CosmosDB as persistent memory

In the previous cells we added in-memory (local RAM) conversation memory to our chatbot. However, it is not persistent: it is deleted once the user's session ends. We therefore need a database to persistently store each user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)

In [23]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()
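For readers without a CosmosDB account at hand, the idea of a persistent history keyed by session and user can be sketched with Python's built-in `sqlite3`. This is only a local stand-in to illustrate the concept: the `SQLiteChatHistory` class, its table layout, and the session/user values below are hypothetical, and it does not mirror the CosmosDB class internals.

```python
import os
import sqlite3
import tempfile

class SQLiteChatHistory:
    """Hypothetical file-backed chat history, scoped by session_id and user_id."""

    def __init__(self, path: str, session_id: str, user_id: str):
        self.conn = sqlite3.connect(path)
        self.session_id = session_id
        self.user_id = user_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add_message(self, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (session_id, user_id, role, content) VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.conn.commit()

    def load_messages(self):
        # Return this session's messages in insertion order
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        ).fetchall()
        return [(role, content) for role, content in rows]

db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")
store = SQLiteChatHistory(db_path, "session-1", "user-1")
store.add_message("human", "Tell me some use cases for reinforcement learning")
store.add_message("ai", "Game playing, robotics, ...")
store.conn.close()  # simulate the end of the app session

# A new object pointing at the same file recovers the conversation
restored = SQLiteChatHistory(db_path, "session-1", "user-1").load_messages()
other_session = SQLiteChatHistory(db_path, "session-2", "user-1").load_messages()
print(restored)
print(other_session)
```

Because the messages live in a file rather than in the Python process, they survive after the first object is closed, and a different `session_id` sees an empty history.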

In [24]:
# Create our Memory object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [25]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has a wide range of use cases. Here are a few examples:

1. **Game playing**: Reinforcement learning has been successfully applied to games such as chess, Go, and poker. By learning from experience and optimizing strategies, reinforcement learning algorithms have achieved superhuman performance in these games.

2. **Robotics**: Reinforcement learning can be used to train robots to perform complex tasks, such as grasping objects, navigating through environments, and even playing sports. By learning from trial and error, robots can improve their actions and adapt to different situations.

3. **Autonomous vehicles**: Reinforcement learning can be used to train self-driving cars and drones. By learning from real-world data and simulations, these vehicles can learn to navigate roads, avoid obstacles, and make safe and efficient decisions.

4. **Recommendation systems**: Reinforcement learning can be used to personalize recommendations for users in various domains, such as e-commerce, music streaming, and online advertising. By learning from user feedback and interactions, these systems can optimize recommendations to improve user satisfaction.

5. **Resource management**: Reinforcement learning can be used to optimize the allocation of resources in various domains, such as energy grids, transportation systems, and supply chains. By learning from historical data and real-time information, these systems can make intelligent decisions to maximize efficiency and minimize costs.

These are just a few examples of the many use cases for reinforcement learning. Its ability to learn from experience and optimize actions makes it a powerful tool in various domains. If you need more information or have specific questions about any of these use cases, feel free to ask!

In [26]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

In [None]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Let's check our Azure CosmosDB to see the whole conversation


In [None]:
# Load the messages from CosmosDB
cosmos.load_messages()
cosmos.messages

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.

We can also see that the chain we are currently using is smart, but only up to a point. Although we have given it memory, it searches for similar docs every time, regardless of the input, and it struggles to respond to prompts like: Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search in the knowledge base.


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot! Great!

But does this solve all the scenarios that a virtual assistant will require? **What if the answer the Smart Search Engine needs is not in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.