# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv()

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT models have memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Game playing: Reinforcement learning has been successfully applied to games such as chess, Go, and poker, where the agent learns to make optimal decisions by playing against itself or human opponents.

2. Robotics: Reinforcement learning can be used to train robots to perform complex tasks, such as grasping objects, navigating through environments, or even playing sports.

3. Autonomous driving: Reinforcement learning can be used to train self-driving cars to make decisions in real-time, such as lane changing, merging, and avoiding obstacles.

4. Recommendation systems: Reinforcement learning can be used to personalize recommendations for users in various domains, such as movies, music, or online shopping, by learning from user feedback and optimizing for user satisfaction.

5. Resource management: Reinforcement learning can be used to optimize the allocation of resources in various domains, such as energy management, traffic control, or supply chain management, by learning to make decisions that maximize efficiency and minimize costs.

6. Finance: Reinforcement learning can be used to develop trading strategies in financial markets, where the agent learns to make buy/sell decisions based on market data and maximize profit.

7. Healthcare: Reinforcement learning can be used to develop personalized treatment plans for patients, where the agent learns to make decisions about medication dosages or treatment options based on patient data and optimize for health outcomes.

8. Advertising: Reinforcement learning can be used to optimize online advertising campaigns, where the agent learns to make decisions about ad placements and targeting based on user interactions and maximize click-through rates or conversions.

9. Natural language processing: Reinforcement learning can be used to develop conversational agents or chatbots, where the agent learns to generate responses or carry out tasks based on user interactions and optimize for user satisfaction.

10. Industrial control: Reinforcement learning can be used to optimize control systems in industries such as manufacturing, energy, or chemical processing, where the agent learns to make decisions that maximize production efficiency and minimize downtime or waste.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

'I apologize, but as an AI, I do not have the capability to recall the specific details of our conversation. However, if you provide me with the key topics discussed, I can try to summarize the main points related to those topics.'

As you can see, it doesn't remember what it just responded. Sometimes it responds based only on the system prompt, or seemingly at random. This shows that the LLM does NOT have memory, and that we need to supply the memory ourselves as a conversation history included in the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

- Reinforcement learning can be used in various domains such as game playing, robotics, autonomous driving, recommendation systems, resource management, finance, healthcare, advertising, natural language processing, and industrial control.
- In game playing, reinforcement learning can help agents make optimal decisions by playing against themselves or human opponents.
- In robotics, reinforcement learning can train robots to perform complex tasks like grasping objects or navigating through environments.
- In autonomous driving, reinforcement learning can help self-driving cars make real-time decisions like lane changing or avoiding obstacles.
- In recommendation systems, reinforcement learning can personalize recommendations for users based on their feedback and optimize for user satisfaction.
- In resource management, reinforcement learning can optimize the allocation of resources in domains like energy management or supply chain management.
- In finance, reinforcement learning can develop trading strategies based on market data to maximize profit.
- In healthcare, reinforcement learning can develop personalized treatment plans for patients based on their data and optimize for health outcomes.
- In advertising, reinforcement learning can optimize online advertising campaigns based on user interactions to maximize click-through rates or conversions.
- In natural language processing, reinforcement learning can develop conversational agents or chatbots that generate responses or carry out tasks based on user interactions and optimize for user satisfaction.
- In industrial control, reinforcement learning can optimize control systems in industries like manufacturing or energy to maximize production efficiency and minimize downtime or waste.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
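
To make the idea concrete without calling any service, here is a minimal sketch in plain Python. It assumes a hypothetical `fake_llm` function standing in for the real model, and the helper names `build_prompt` and `chat` are ours for illustration, not part of LangChain. The only "memory" is the list of earlier turns that we re-send with every prompt.

```python
# Minimal sketch: "memory" is just the transcript re-sent with every prompt.
# `fake_llm` is a hypothetical stand-in for a real model call.

def build_prompt(history, question):
    """Concatenate prior turns and the new question into one prompt."""
    lines = []
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)


def fake_llm(prompt):
    """Hypothetical model: reports how many earlier questions it can see."""
    seen = prompt.count("Human:") - 1
    return f"I can see {seen} earlier question(s) in this prompt."


def chat(history, question, llm=fake_llm):
    """Ask a question with the full history, then record the new turn."""
    answer = llm(build_prompt(history, question))
    history.append((question, answer))
    return answer


history = []
first = chat(history, "Tell me some use cases for reinforcement learning")
second = chat(history, "Give me the main points of our conversation")
print(first)
print(second)
```

The second call only "remembers" the first question because `chat` placed it in the prompt; if we passed an empty history, the stand-in model would again see zero earlier questions.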

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 1000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 1.78 s, sys: 40.2 ms, total: 1.82 s
Wall time: 7.8 s


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 1000
Combined docs (context) token count: 924
--------
Requested token count: 4388
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
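
The selection rule used in the cell above can be isolated into a small function. This is only a sketch: `choose_chain_type` and its `safety_ratio` parameter are hypothetical names of ours, and the 4096-token limit in the second call is an assumed value for a smaller model, used purely for contrast.

```python
def choose_chain_type(prompt_tokens, context_tokens, completion_tokens,
                      tokens_limit, safety_ratio=0.9):
    """Return (requested_tokens, chain_type) using the 90% rule from the cell above."""
    requested = prompt_tokens + context_tokens + completion_tokens
    # Fall back to map_reduce when the request would use more than
    # safety_ratio of the model's context window.
    chain = "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"
    return requested, chain


# Same numbers as printed above for gpt-35-turbo-16k
large_requested, large_chain = choose_chain_type(2464, 924, 1000, 16384)
# Same request against an assumed 4096-token limit
small_requested, small_chain = choose_chain_type(2464, 924, 1000, 4096)

print(large_requested, large_chain)
print(small_requested, small_chain)
```

With the larger window the whole context fits in a single "stuff" prompt, while the same request against the assumed smaller window crosses the 90% threshold and would switch to "map_reduce".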


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning can be used in various use cases, including:
1. Learning prevention strategies in the context of pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.
2. Personalized music recommendation based on capturing changes in listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.
3. Fairness-aware recommendation systems that dynamically maintain a balance between accuracy and fairness<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.
4. Computing lockdown decisions for individual cities or regions, considering health and economic considerations<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[4]</a></sup>.
5. Modeling epidemics and predicting the spread of diseases, considering individual decisions and external interventions<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.

References:
[1] Source: [1]
[2] Source: [2]
[3] Source: [3]
[4] Source: [4]
[5] Source: [5]

CPU times: user 7.51 ms, sys: 347 µs, total: 7.86 ms
Wall time: 16.1 s


And if we ask the follow up question:

In [18]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

The main points of our conversation are about the use of reinforcement learning in different domains. In the context of pandemic influenza, a deep reinforcement learning approach is used to automatically learn prevention strategies, and collaboration between districts is shown to be advantageous in designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>. In the domain of music recommendation, a personalized hybrid recommendation algorithm based on reinforcement learning is proposed to recommend song sequences that match listeners' preferences better<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>. In the field of interactive recommender systems, a reinforcement learning-based framework called FairRec is proposed to dynamically maintain a balance between accuracy and fairness in recommendation systems<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>. Additionally, a quantitative approach is presented for computing lockdown decisions for individual cities or regions, taking into account health and economic considerations, and the policies are learned automatically using reinforcement learning<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[4]</a></sup>. 
Finally, a microscopic approach is introduced to model epidemics, where individual decisions can affect the spread of the disease, and game theory and multi-agent reinforcement learning are used to optimize individual decisions and make predictions about the spread of the disease<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.

You might get a different response from the one above, but whatever response you get, it will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [19]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases in various domains. Here are a few examples:

1. **Epidemiology**: Reinforcement learning can be used to automatically learn prevention strategies in the context of epidemics, such as pandemic influenza. It can help in designing mitigation policies in complex epidemiological models with a large state space, considering collaboration between districts or regions when designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup><sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.

2. **Recommendation Systems**: Reinforcement learning can be applied to personalized recommendation systems, such as music recommendation. It can capture the changes in users' preferences sensitively and enhance the simulation of the interaction process to continuously update the recommendation model based on user preferences for songs and song transitions<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.

3. **Fairness in Recommendation**: Reinforcement learning can be used to dynamically maintain a long-term balance between accuracy and fairness in interactive recommender systems. It compresses user preferences and the system's fairness status into a state representation to generate recommendations and aims to maximize a cumulative reward that combines accuracy and fairness<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.

4. **Policy Making**: Reinforcement learning can assist in policy-making decisions, such as computing lockdown decisions for individual cities or regions. By balancing health and economic considerations, reinforcement learning algorithms can automatically learn lockdown policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[4]</a></sup>.

These are just a few examples, and reinforcement learning can be applied to many other domains and use cases. Let me know if you have any other questions!

In [20]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Here are the main points of our conversation:

1. Reinforcement learning has several use cases in various domains.
2. In epidemiology, reinforcement learning can be used to automatically learn prevention strategies in the context of epidemics, such as pandemic influenza. It can help design mitigation policies in complex epidemiological models with a large state space, considering collaboration between districts or regions when designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup><sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.
3. In recommendation systems, reinforcement learning can be applied to personalized recommendation systems, such as music recommendation. It can capture changes in users' preferences sensitively and enhance the simulation of the interaction process to continuously update the recommendation model based on user preferences for songs and song transitions<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.
4. In fairness in recommendation, reinforcement learning can be used to dynamically maintain a long-term balance between accuracy and fairness in interactive recommender systems. It compresses user preferences and the system's fairness status into a state representation to generate recommendations and aims to maximize a cumulative reward that combines accuracy and fairness<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.
5. Reinforcement learning can assist in policy-making decisions, such as computing lockdown decisions for individual cities or regions. By balancing health and economic considerations, reinforcement learning algorithms can automatically learn lockdown policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[4]</a></sup>.

These are just a few examples, and reinforcement learning can be applied to many other domains and use cases. Let me know if you have any other questions!

In [21]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has several use cases in various domains.
2. In epidemiology, reinforcement learning can be used to automatically learn prevention strategies in the context of epidemics, such as pandemic influenza. It can help design mitigation policies in complex epidemiological models with a large state space, considering collaboration between districts or regions when designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup><sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.
3. In recommendation systems, reinforcement learning can be applied to personalized recommendation systems, such as music recommendation. It can capture changes in users' preferences sensitively and enhance the simulation of the interaction process to continuously update the recommendation model based on user preferences for songs and song transitions<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.
4. In fairness in recommendation, reinforcement learning can be used to dynamically maintain a long-term balance between accuracy and fairness in interactive recommender systems. It compresses user preferences and the system's fairness status into a state representation to generate recommendations and aims to maximize a cumulative reward that combines accuracy and fairness<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.
5. Reinforcement learning can assist in policy-making decisions, such as computing lockdown decisions for individual cities or regions. By balancing health and economic considerations, reinforcement learning algorithms can automatically learn lockdown policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[4]</a></sup>.

These points summarize our discussion on the use cases of reinforcement learning. Let me know if there's anything else I can help with!

You might get a different answer in the cell above, and that is OK: this bot is not yet well configured to answer questions that are unrelated to its knowledge base, including salutations.

Let's check our memory to see that it's keeping the conversation

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has several use cases in various domains. Here are a few examples:\n\n1. **Epidemiology**: Reinforcement learning can be used to automatically learn prevention strategies in the context of epidemics, such as pandemic influenza. It can help in designing mitigation policies in complex epidemiological models with a large state space, considering collaboration between districts or regions when designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup><sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.\n\n2. **Recommendation Systems**
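
Since the buffer grows with every turn and is sent again in each prompt, a long conversation can eventually exceed the model's token limit. One common way to handle this is to keep only the most recent turns. The sketch below is an assumption of ours, not something this notebook or `ConversationBufferMemory` does: `trim_history` is a hypothetical helper, and `approx_tokens` counts whitespace-separated words as a rough stand-in for a real tokenizer.

```python
def approx_tokens(text):
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(turns, max_tokens):
    """Keep the most recent (human, ai) turns whose combined size fits in max_tokens."""
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for human, ai in reversed(turns):
        cost = approx_tokens(human) + approx_tokens(ai)
        if used + cost > max_tokens:
            break
        kept.append((human, ai))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    ("What is reinforcement learning?", "A way to learn from rewards."),
    ("Give me one use case.", "Personalized music recommendation."),
    ("Thank you", "You are welcome!"),
]
recent = trim_history(turns, max_tokens=15)
print(recent)
```

Here the oldest turn is dropped because including it would exceed the budget, so the model would no longer "remember" it.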

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the user's session ends. We therefore need a database to persistently store each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use a LangChain class called CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
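
Before connecting to CosmosDB, it may help to see the idea of persistent, per-session history in isolation. The sketch below swaps in SQLite from the Python standard library as a local stand-in for the database; `SQLiteChatHistory` is a hypothetical class of ours and does not mirror the LangChain implementation. Messages are keyed by a session id and a user id, and they survive closing and reopening the store.

```python
import os
import sqlite3
import tempfile


class SQLiteChatHistory:
    """Hypothetical persistent message store (a stand-in, not the LangChain class)."""

    def __init__(self, path, session_id, user_id):
        self.session_id = session_id
        self.user_id = user_id
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add_message(self, role, content):
        """Append one message to this session's history."""
        self.conn.execute(
            "INSERT INTO messages (session_id, user_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.conn.commit()

    def messages(self):
        """Return this session's (role, content) pairs in insertion order."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        )
        return rows.fetchall()


# Write two messages, close the store, then reopen it to show persistence.
db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")
store = SQLiteChatHistory(db_path, "session-1", "user-1")
store.add_message("human", "Tell me some use cases for reinforcement learning")
store.add_message("ai", "Epidemiology, recommendation systems, policy making.")
store.conn.close()

reopened = SQLiteChatHistory(db_path, "session-1", "user-1")
restored = reopened.messages()
other_session = SQLiteChatHistory(db_path, "session-2", "user-1").messages()
print(restored)
print(other_session)
```

The reopened store returns both messages, while a different session id sees an empty history. This is the same separation that the `session_id` and `user_id` arguments provide in the CosmosDB cell below.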

In [23]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [24]:
# Create our Memory Object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [25]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases in different domains. Here are a few examples:

1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies in the context of pandemics, such as influenza. By constructing epidemiological models and using deep reinforcement learning algorithms, researchers have shown that reinforcement learning can be used to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.

2. **Personalized Music Recommendation**: Reinforcement learning can be used to improve personalized music recommendation systems. By simulating the interaction process between listeners and continuously updating the model based on their preferences, reinforcement learning algorithms can recommend song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.

3. **Fairness in Recommendation Systems**: Reinforcement learning can also be used to improve fairness in recommendation systems. By dynamically maintaining a long-term balance between accuracy and fairness, reinforcement learning algorithms can generate recommendations that consider both user preferences and the system's fairness status<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.

4. **Optimizing Lockdown Decisions**: Reinforcement learning can help compute lockdown decisions for individual cities or regions, taking into account health and economic considerations. By automatically learning policies based on disease parameters and population characteristics, reinforcement learning algorithms can provide a quantitative approach towards lockdown decisions<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.

5. **Modeling and Predicting Epidemics**: Reinforcement learning can be used to model and predict the spread of infectious diseases, such as COVID-19. By formulating microscopic multi-agent epidemic models and solving for optimal decisions using game theory and multi-agent reinforcement learning, researchers can make predictions about the spread of diseases and identify interventions to regulate agents' behaviors<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

These are just a few examples of the diverse applications of reinforcement learning. It is a powerful technique that can be applied to various problem domains to learn optimal policies and make informed decisions.

In [26]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has several use cases in different domains, including epidemic prevention, personalized music recommendation, fairness in recommendation systems, optimizing lockdown decisions, and modeling and predicting epidemics.

2. In the context of epidemic prevention, reinforcement learning can be used to automatically learn prevention strategies in the context of pandemics, such as influenza. Researchers have shown that reinforcement learning can be used to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.

3. In the field of personalized music recommendation, reinforcement learning algorithms can simulate the interaction process between listeners and continuously update the model based on their preferences, leading to better song sequence recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.

4. Fairness in recommendation systems can be improved using reinforcement learning algorithms that dynamically maintain a balance between accuracy and fairness. These algorithms generate recommendations that consider both user preferences and the system's fairness status<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.

5. Reinforcement learning can be used to compute lockdown decisions for individual cities or regions, taking into account health and economic considerations. These decisions are based on disease parameters and population characteristics, and reinforcement learning algorithms provide a quantitative approach towards lockdown decisions<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.

6. Reinforcement learning can be used to model and predict the spread of infectious diseases, such as COVID-19. By formulating microscopic multi-agent epidemic models and solving for optimal decisions using game theory and multi-agent reinforcement learning, researchers can make predictions about the spread of diseases and identify interventions to regulate agents' behaviors<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

These points summarize the main applications of reinforcement learning discussed in our conversation. Let me know if there's anything else I can help with.

In [27]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on the extracted parts from the documents, here are the main points of our conversation:

1. Reinforcement learning has several use cases in different domains, including epidemic prevention, personalized music recommendation, fairness in recommendation systems, optimizing lockdown decisions, and modeling and predicting epidemics.

2. In the context of epidemic prevention, reinforcement learning can be used to automatically learn prevention strategies in the context of pandemics, such as influenza. Researchers have shown that reinforcement learning can be used to learn mitigation policies in complex epidemiological models with a large state space[1].

3. In the field of personalized music recommendation, reinforcement learning algorithms can simulate the interaction process between listeners and continuously update the model based on their preferences, leading to better song sequence recommendations[2].

4. Fairness in recommendation systems can be improved using reinforcement learning algorithms that dynamically maintain a balance between accuracy and fairness. These algorithms generate recommendations that consider both user preferences and the system's fairness status[3].

5. Reinforcement learning can be used to compute lockdown decisions for individual cities or regions, taking into account health and economic considerations. These decisions are based on disease parameters and population characteristics, and reinforcement learning algorithms provide a quantitative approach towards lockdown decisions[4].

6. Reinforcement learning can be used to model and predict the spread of infectious diseases, such as COVID-19. By formulating microscopic multi-agent epidemic models and solving for optimal decisions using game theory and multi-agent reinforcement learning, researchers can make predictions about the spread of diseases and identify interventions to regulate agents' behaviors[5].

These points summarize the main applications of reinforcement learning discussed in our conversation. Let me know if there's anything else I can help with.

Sources:
[1]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">1</a></sup>
[2]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">2</a></sup>
[3]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206277/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">3</a></sup>
[4]<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">4</a></sup>
[5]<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">5</a></sup>

Let me know if there's anything else I can assist you with.

Let's check our Azure CosmosDB to see the whole conversation.


In [28]:
# Load the stored messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning'),
 AIMessage(content='Reinforcement learning has several use cases in different domains. Here are a few examples:\n\n1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies in the context of pandemics, such as influenza. By constructing epidemiological models and using deep reinforcement learning algorithms, researchers have shown that reinforcement learning can be used to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.\n\n2. **Personalized Music Recommendation**: Reinforcement learning can be used to improve personalized music recommendation systems. By simulating the interaction process between listeners and contin

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.
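
To make the idea of "memory as context" concrete, here is a minimal sketch in plain Python. It is not the code used by `get_answer` or by LangChain's memory classes; the function name `build_prompt`, the `max_turns` parameter, the `Human:`/`AI:` labels, and the follow-up question are all assumptions made for illustration. It only shows the principle: earlier turns are prepended to the new question, so the model can "remember" them.

```python
from typing import List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str, max_turns: int = 5) -> str:
    """Prepend the most recent (human, ai) exchanges to the new question.

    Hypothetical helper: the model itself stores nothing between calls,
    so the earlier turns must travel inside the prompt.
    """
    recent = history[-max_turns:]  # keep only the last `max_turns` exchanges
    lines = []
    for human, ai in recent:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    lines.append("AI:")  # leave the final turn open for the model to complete
    return "\n".join(lines)


# Illustrative history; the AI text is shortened on purpose.
history = [
    (
        "Tell me some use cases for reinforcement learning",
        "Reinforcement learning has several use cases in different domains.",
    )
]
prompt = build_prompt(history, "Give me the main points of our conversation")
print(prompt)
```

Capping the history with `max_turns` is one simple way to keep the prompt within the model's token limit; a persistent store such as CosmosDB would supply the `history` list across sessions.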

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input, and it struggles to respond to prompts such as: Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search of the knowledge base.
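
As a rough illustration of this limitation, the sketch below shows a naive check that would skip the document search for small talk. It is not part of this notebook's chain: `SMALL_TALK` and `needs_search` are hypothetical names, and an exact-match phrase list is a deliberately simplistic stand-in for a more robust approach.

```python
# Hypothetical list of inputs that should not trigger a knowledge-base search.
SMALL_TALK = {"hello", "hi", "thank you", "thanks", "bye"}


def needs_search(user_input: str) -> bool:
    """Return False for simple greetings and thanks, True for everything else."""
    # Normalize case, surrounding whitespace, and trailing punctuation.
    normalized = user_input.strip().lower().rstrip("!.?")
    return normalized not in SMALL_TALK


for text in ["Thank you", "Tell me some use cases for reinforcement learning"]:
    print(text, "->", "search" if needs_search(text) else "skip search")
```

A fixed phrase list breaks down quickly (for example, "thanks a lot" would still trigger a search), which is why deciding when to search is better left to the model itself.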


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot! Great!

But does this cover every scenario that a virtual assistant will need to handle? **What if the answer to a question is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.