# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials (my).env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"]    = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"]     = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
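# OPENAI_API_TYPE is already read from the environment; this self-assignment only fails fast (KeyError) if it is missing
os.environ["OPENAI_API_TYPE"]    = os.environ["OPENAI_API_TYPE"]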

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT models have memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = os.environ["COMPLETION3516_DEPLOYMENT"] # GPT-4 is also available, but it is not needed in this case
COMPLETION_TOKENS = 500
# Create an AzureChatOpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

Reinforcement learning has numerous use cases across various domains. Here are some examples:

1. Game playing: Reinforcement learning has been successfully used to train agents to play complex games such as chess, Go, and poker. AlphaGo, a famous example, defeated world champion Go player Lee Sedol.

2. Robotics: Reinforcement learning enables robots to learn complex tasks such as grasping objects, walking, and navigation in dynamic environments.

3. Autonomous vehicles: Reinforcement learning can be used to train self-driving cars to make decisions in real-time, such as lane changing, merging, and handling complex traffic scenarios.

4. Recommendation systems: Reinforcement learning can be utilized to optimize personalized recommendations for users based on their preferences and historical data.

5. Finance: Reinforcement learning can be applied to algorithmic trading, portfolio management, and risk assessment, where agents learn to make optimal decisions in dynamic market environments.

6. Healthcare: Reinforcement learning can assist in optimizing treatment plans, drug dosage determination, and disease diagnosis by learning from patient data and medical records.

7. Energy management: Reinforcement learning can optimize energy consumption in smart grids, demand response, and energy-efficient systems by learning to make decisions that minimize costs and maximize efficiency.

8. Resource allocation: Reinforcement learning can be used to optimize resource allocation in various scenarios, such as scheduling tasks in cloud computing, optimizing network routing, and managing inventory in supply chains.

9. Advertising: Reinforcement learning can be employed to optimize online advertising strategies, such as selecting the most effective ads and determining bidding strategies to maximize conversions.

10. Education: Reinforcement learning can be used to design intelligent tutoring systems that adapt to individual students' learning styles and provide personalized recommendations for effective learning.

These are just a few examples, and reinforcement learning has potential applications in many other domains where decision-making in dynamic environments is involved.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, but as an AI language model, I do not have the ability to recall previous conversations or access any information about them. Each interaction with me is treated as a separate and independent conversation. Is there anything specific you would like to discuss or ask about?"

As you can see, the model doesn't remember what it just responded. Sometimes it answers based only on the system prompt, or just at random. This shows that the LLM does NOT have memory, and that we need to supply it ourselves by passing the conversation history as part of the prompt, like this:

In [8]:
from langchain.prompts import PromptTemplate
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history": conversation_history, "question": FOLLOW_UP_QUESTION}))

- Reinforcement learning has numerous use cases across various domains.
- Some examples include game playing, robotics, autonomous vehicles, recommendation systems, finance, healthcare, energy management, resource allocation, advertising, and education.
- Reinforcement learning can be used to train agents to play complex games, enable robots to learn complex tasks, train self-driving cars, optimize personalized recommendations, assist in finance and healthcare decision-making, optimize energy consumption, optimize resource allocation, optimize online advertising strategies, and design intelligent tutoring systems.
- These are just a few examples, and reinforcement learning has potential applications in many other domains.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
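
The idea above can be sketched as a small loop that does not depend on any particular library. This is a minimal illustration, not the notebook's implementation: `build_prompt`, `chat_turn`, and `fake_llm` are hypothetical names, and `fake_llm` merely stands in for a real model call so that the effect of the history is visible.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (question, answer)

def build_prompt(history: List[Turn], question: str) -> str:
    """Render all past turns followed by the new question."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Human: {past_question}")
        lines.append(f"AI: {past_answer}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)

def chat_turn(llm_call: Callable[[str], str], history: List[Turn], question: str) -> str:
    """Send history + question to the model, then record the new turn."""
    answer = llm_call(build_prompt(history, question))
    history.append((question, answer))
    return answer

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: reports how many earlier turns it was shown."""
    return f"I can see {prompt.count('Human:') - 1} earlier turn(s)."

history: List[Turn] = []
first = chat_turn(fake_llm, history, "Tell me some use cases for reinforcement learning?")
second = chat_turn(fake_llm, history, "Give me the main points of our conversation")
print(first)
print(second)
```

The model itself stays stateless; the only thing that changes between calls is the prompt, which grows by one turn each time.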

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model that allows more space in the prompt
COMPLETION_TOKENS = 1000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = OpenAIEmbeddings(deployment=os.environ["EMBEDDING_DEPLOYMENT"], chunk_size=1) 
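
Because the history grows with every turn, it can eventually exceed whatever space the prompt allows. One common mitigation, which this notebook does not implement, is to keep only the most recent turns that fit a budget. The sketch below assumes a deliberately crude word-based token count (`rough_token_count` is hypothetical); a real tokenizer-based counter such as the notebook's `num_tokens_from_string` helper would be the more accurate choice.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (question, answer)

def rough_token_count(text: str) -> int:
    """Crude approximation: one token per whitespace-separated word."""
    return len(text.split())

def trim_history(history: List[Turn], max_history_tokens: int) -> List[Turn]:
    """Keep the most recent turns whose combined size fits the budget."""
    kept: List[Turn] = []
    used = 0
    # Walk backwards so the newest turns are considered first.
    for question, answer in reversed(history):
        cost = rough_token_count(question) + rough_token_count(answer)
        if used + cost > max_history_tokens:
            break
        kept.append((question, answer))
        used += cost
    kept.reverse()  # restore chronological order
    return kept

turns = [
    ("first question", "a long first answer with many words in it"),
    ("second question", "short answer"),
    ("third question", "another short answer"),
]
trimmed = trim_history(turns, max_history_tokens=10)
print(trimmed)
```

With a budget of 10 words, the oldest and longest turn is dropped while the two most recent ones are kept in order.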

In [12]:
index1_name    = "cogsrch-index-files"
index2_name    = "cogsrch-index-csv"
index3_name    = "cogsrch-index-books-vector"
text_indexes   = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))

print("Number of results:", len(ordered_results))

Number of results: 5
CPU times: total: 203 ms
Wall time: 4.19 s


In [14]:
# Inspect the ordered results (comment out the line below to skip)
ordered_results

OrderedDict([('aHR0cHM6Ly9ibG9ic3RvcmFnZWx3YmFzbnM3bHMzdHkuYmxvYi5jb3JlLndpbmRvd3MubmV0L2FyeGl2Y3MvMDAwMS8wMDAxMDA4djIucGRm0_12',
              {'title': 'arXiv:cs/0001008v2  [cs.MA]  17 Jan 2000_chunk_12',
               'name': '0001008v2.pdf',
               'location': 'https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf',
               'caption': 'Matarić [1995] has studied reinforcement learning in multi-robot domains. She notes, for example, how learning can give rise to social behaviors (Matarić [1997]). The work shows how robots can be individually programmed to pro- duce certain group behaviors.',
               'index': 'cogsrch-index-files-vector',
               'chunk': 'predictive strategies, such as “if the state of the world was x ten time units\nbefore, then it will be x next time so take action a”. The authors later show\nhow learning can be used to eliminate these chaotic global fluctuations.\n\nMatarić [1995] has studied reinforcem

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Inspect the ordered top_docs (comment out the line below to skip)
print(f"Nr. of elements in top_docs: {len(top_docs)}. Here they are:\n{top_docs}")

Nr. of elements in top_docs: 5. Here they are:
[Document(page_content='predictive strategies, such as “if the state of the world was x ten time units\nbefore, then it will be x next time so take action a”. The authors later show\nhow learning can be used to eliminate these chaotic global fluctuations.\n\nMatarić [1995] has studied reinforcement learning in multi-robot domains.\nShe notes, for example, how learning can give rise to social behaviors (Matarić\n[1997]). The work shows how robots can be individually programmed to pro-\nduce certain group behaviors. It represents a good example of the usefulness\nand flexibility of learning agents in multi-agent domains. However, the author\ndoes not offer a mathematical justification for the chosen individual learning\nalgorithms, nor does she explain why the agents were able to converge to the\nglobal behaviors. Our research hopes to provide the first steps in this direction.\n\nOne particularly interesting approach is taken by Carmel an

In [17]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit   = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens  = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:", prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:", context_tokens)
    print("--------")
    print("Requested token count:", requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 1000
Combined docs (context) token count: 2666
--------
Requested token count: 6130
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
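
The selection rule used in the cell above can be isolated as a small function and checked against the printed numbers. `choose_chain_type` and its `safety_ratio` parameter are names introduced here for illustration only; the 0.9 factor and the token counts are taken from the cell and its output.

```python
def choose_chain_type(prompt_tokens: int, context_tokens: int,
                      completion_tokens: int, tokens_limit: int,
                      safety_ratio: float = 0.9):
    """Return (requested_tokens, chain_type) using the same rule as the cell above."""
    requested = prompt_tokens + context_tokens + completion_tokens
    chain_type = "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"
    return requested, chain_type

# Numbers from the output above (16k-token model).
requested, selected = choose_chain_type(2464, 2666, 1000, 16384)
print(requested, selected)

# The same request against a hypothetical 4096-token limit would not fit in one prompt.
print(choose_chain_type(2464, 2666, 1000, 4096))
```

The 10% margin leaves some headroom for tokens that are not counted here, such as the question itself and any conversation history.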


## Using load_qa_with_sources_chain
The *get_answer* function in the next cell is equivalent to the following code when **chain_type = map_reduce**:

```
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from common.prompts import (COMBINE_PROMPT, COMBINE_QUESTION_PROMPT)

llm35 = AzureChatOpenAI(deployment_name="gpt-35-turbo", temperature=0.5, max_tokens=COMPLETION_TOKENS)
chain = load_qa_with_sources_chain(llm35, chain_type="map_reduce",
                                   combine_prompt = COMBINE_PROMPT,
                                   question_prompt = COMBINE_QUESTION_PROMPT,
                                   return_intermediate_steps = True)

answer = chain( {"input_documents": top_docs, "question": QUESTION, "language": "it"}, return_only_outputs=True)
```
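
To make the difference between the two chain types concrete, here is a conceptual sketch. It is not LangChain's implementation: the prompt strings are invented, and `counting_llm` is a hypothetical stand-in that only counts how many times the model is called.

```python
from typing import Callable, List

def stuff_answer(llm_call: Callable[[str], str], docs: List[str], question: str) -> str:
    """'stuff': put every chunk into a single prompt and call the model once."""
    context = "\n\n".join(docs)
    return llm_call(f"Context:\n{context}\n\nQuestion: {question}")

def map_reduce_answer(llm_call: Callable[[str], str], docs: List[str], question: str) -> str:
    """'map_reduce': answer per chunk (map), then combine the partial answers (reduce)."""
    partials = [llm_call(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n\n".join(partials)
    return llm_call(f"Partial answers:\n{combined}\n\nQuestion: {question}")

calls: List[str] = []

def counting_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call that records each prompt it receives."""
    calls.append(prompt)
    return f"answer #{len(calls)}"

chunks = ["chunk one", "chunk two", "chunk three"]

stuff_result = stuff_answer(counting_llm, chunks, "What is RL used for?")
stuff_calls = len(calls)

calls.clear()
map_reduce_result = map_reduce_answer(counting_llm, chunks, "What is RL used for?")
map_reduce_calls = len(calls)

print(stuff_calls, map_reduce_calls)
```

The trade-off is visible in the call counts: "stuff" needs one call but a prompt large enough for all chunks, while "map_reduce" keeps each prompt small at the cost of one call per chunk plus a final combining call.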

In [18]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning has been applied to various use cases, including:

1. Multi-robot domains: Reinforcement learning has been used to study social behaviors and group behaviors in multi-robot domains<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[22]</a></sup>.

2. Model-based learning: Agents can build models of other agents via observations and learn effective models through reinforcement learning. This approach has been used to learn finite-state machine models of other agents<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[4]</a></sup>.

3. Agent coordination in multi-agent systems (MASs): Reinforcement learning agents in MASs can learn system-wide optimal behavior and develop agent coordination<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[28]</a></sup>.

4. Epidemic prevention strategies: Reinforcement learning has been used to learn prevention strategies in the context of pandemic influenza, balancing health and economic considerations<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[20]</a></sup>.

5. Lockdown decisions during epidemics: Reinforcement learning has been used to compute lockdown decisions for individual cities or regions, considering health and economic factors<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[21]</a></sup>.

6. Personalized recommendation systems: Reinforcement learning has been used to improve personalized music recommendation algorithms, capturing changes in listeners' preferences and song transitions<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7

CPU times: total: 15.6 ms
Wall time: 16.3 s


And if we ask the follow up question:

In [20]:
# follow-up questions don't work!
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

There is no information provided in the extracted parts about the main points of our conversation.

You might get a different response from the one above, but whatever response you get will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [21]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type,
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases across different domains. Here are some examples:

1. **Robotics**: Reinforcement learning can be used to train robots to perform complex tasks and navigate in dynamic environments. For example, Matarić [1995] studied reinforcement learning in multi-robot domains and showed how learning can give rise to social behaviors.

2. **Epidemiology**: Reinforcement learning can be applied to model and predict the spread of diseases, such as pandemic influenza. It can be used to learn prevention strategies and optimize policies for limiting the damage caused by infectious diseases [2].

3. **Multi-Agent Systems**: Reinforcement learning is useful in studying the behavior of multiple learning agents in complex environments. It can be applied to develop agent coordination and learn optimal system-wide behaviors [27, 28].

4. **Healthcare**: Reinforcement learning can be used in personalized recommendation systems for healthcare. For example, it can learn users' preferences and recommend personalized music playlists based on their feedback and interaction patterns [32].

These are just a few examples of the wide range of applications for reinforcement learning. It can be applied to various domains where decision-making and learning from interactions are important. 

If you would like to explore any of these topics further, please let me know.

References:
- [2]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[2]</a></sup>: "Deep Reinforcement Learning for Prevention Strategies in Pandemic Influenza"
- [27]<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[27]</a></sup>: "Learning Agents in Multi-Agent Systems"
- [32]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[32]</a></sup>: "Personalized Hybrid Recommendation Algorithm for Music Based on Reinforcement Learning"
- [28]<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v3.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[28]</a></sup>: "Reinforcement Learning in Multi-Agent Systems"

Is there anything else I can assist you with?

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has several use cases across different domains. Here are some examples:\n\n1. **Robotics**: Reinforcement learning can be used to train robots to perform complex tasks and navigate in dynamic environments. For example, Matarić [1995] studied reinforcement learning in multi-robot domains and showed how learning can give rise to social behaviors.\n\n2. **Epidemiology**: Reinforcement learning can be applied to model and predict the spread of diseases, such as pandemic influenza. It can be used to learn prevention strategies and optimize policies for limiting the damage caused by infectious diseases [2].\n\n3. **Multi-Agent Systems**: Reinforcement learning is useful in studying the behavior of multiple learning agents in complex environments. It can be applied to develop agent coordination and learn optimal system-wide behaviors [27, 28].\n\n4. **Healthcare**: Reinforcement learning can be used in per

In [23]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Here are the main points of our conversation:

1. Reinforcement learning has several use cases across different domains.
2. Some examples include robotics, epidemiology, multi-agent systems, and healthcare.
3. In robotics, reinforcement learning can be used to train robots to perform complex tasks and navigate in dynamic environments.
4. In epidemiology, reinforcement learning can be applied to model and predict the spread of diseases, such as pandemic influenza, and learn prevention strategies.
5. In multi-agent systems, reinforcement learning is useful for studying the behavior of multiple learning agents in complex environments and developing agent coordination.
6. In healthcare, reinforcement learning can be used in personalized recommendation systems, such as personalized music playlists based on user feedback and interaction patterns.
7. These are just a few examples, and reinforcement learning can be applied to various domains where decision-making and learning from interactions are important.

If you would like more information or have any other questions, feel free to let me know!

References:
- [2]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[2]</a></sup>: "Deep Reinforcement Learning for Prevention Strategies in Pandemic Influenza"
- [27]<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[27]</a></sup>: "Learning Agents in Multi-Agent Systems"
- [32]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[32]</a></sup>: "Personalized Hybrid Recommendation Algorithm for Music Based on Reinforcement Learning"
- [28]<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v3.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[28]</a></sup>: "Reinforcement Learning in Multi-Agent Systems"

Let me know if there's anything else I can assist you with!

In [24]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has several use cases across different domains.
2. Some examples include robotics, epidemiology, multi-agent systems, and healthcare.
3. In robotics, reinforcement learning can be used to train robots to perform complex tasks and navigate in dynamic environments.
4. In epidemiology, reinforcement learning can be applied to model and predict the spread of diseases, such as pandemic influenza, and learn prevention strategies [2].
5. In multi-agent systems, reinforcement learning is useful for studying the behavior of multiple learning agents in complex environments and developing agent coordination [27, 28].
6. In healthcare, reinforcement learning can be used in personalized recommendation systems, such as personalized music playlists based on user feedback and interaction patterns [32].

These are just a few examples, and reinforcement learning can be applied to various domains where decision-making and learning from interactions are important.

References:
- [2]<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[2]</a></sup>: "Deep Reinforcement Learning for Prevention Strategies in Pandemic Influenza"
- [27]<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[27]</a></sup>: "Learning Agents in Multi-Agent Systems"
- [32]<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[32]</a></sup>: "Personalized Hybrid Recommendation Algorithm for Music Based on Reinforcement Learning"
- [28]<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v3.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D" target="_blank">[28]</a></sup>: "Reinforcement Learning in Multi-Agent Systems"

Let me know if there's anything else I can assist you with!

You might get a different answer in the cell above, and that is OK: this bot is not yet configured to handle inputs that are unrelated to its knowledge base, including salutations.

Let's check our memory to see that it is keeping the conversation:

In [25]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has several use cases across different domains. Here are some examples:\n\n1. **Robotics**: Reinforcement learning can be used to train robots to perform complex tasks and navigate in dynamic environments. For example, Matarić [1995] studied reinforcement learning in multi-robot domains and showed how learning can give rise to social behaviors.\n\n2. **Epidemiology**: Reinforcement learning can be applied to model and predict the spread of diseases, such as pandemic influenza. It can be used to learn prevention strategies and optimize policies for limiting the damage caused by infectious diseases [2].\n\n3. **Multi-Agent Systems**: Reinforcement learning is useful in studying the behavior of multiple learning agents in complex environments. It can be applied to develop agent coordination and learn optimal system-wide behaviors [27, 28].\n\n4. **Healthcare**: Reinforcement learning can be used in per
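
The two arguments passed to `ConversationBufferMemory` earlier, `memory_key` and `input_key`, can be understood with a simplified analogue. The class below is a hypothetical sketch, not LangChain's code: `memory_key` names the prompt variable that receives the history, and `input_key` tells the memory which of several chain inputs is the human message. The template and the `summaries` variable are invented for illustration.

```python
class SketchBufferMemory:
    """Simplified, hypothetical analogue of a conversation buffer memory."""

    def __init__(self, memory_key: str, input_key: str):
        self.memory_key = memory_key  # prompt variable that will hold the history
        self.input_key = input_key    # which chain input is the human message
        self.turns = []

    def load_memory_variables(self) -> dict:
        """Return the history under the configured prompt variable name."""
        text = "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)
        return {self.memory_key: text}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        """Record one turn, picking the human message via input_key."""
        self.turns.append((inputs[self.input_key], outputs["output_text"]))

# Invented template: history first, then retrieved context, then the new question.
TEMPLATE = "{chat_history}\nContext: {summaries}\nHuman: {question}\nAI:"

sketch_memory = SketchBufferMemory(memory_key="chat_history", input_key="question")

inputs = {"question": "Tell me some use cases for reinforcement learning?",
          "summaries": "retrieved chunks go here"}
first_prompt = TEMPLATE.format(**inputs, **sketch_memory.load_memory_variables())
sketch_memory.save_context(inputs, {"output_text": "Robotics and epidemiology."})

follow_up = {"question": "Give me the main points of our conversation",
             "summaries": "retrieved chunks go here"}
second_prompt = TEMPLATE.format(**follow_up, **sketch_memory.load_memory_variables())
print(second_prompt)
```

The first prompt carries an empty history, while the second one contains the previous question and answer ahead of the retrieved context.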

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the app user's session is terminated. We therefore need a database to persistently store each bot user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use a LangChain class called CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
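
The principle behind a persistent chat history can be sketched without any cloud service. The example below uses Python's built-in `sqlite3` module as a stand-in for CosmosDB; `SqliteChatHistory` is a hypothetical class and does not mirror the `CosmosDBChatMessageHistory` API. It only shows messages being keyed by session and user so that a later object can restore them.

```python
import sqlite3

class SqliteChatHistory:
    """Hypothetical stand-in for a database-backed chat history."""

    def __init__(self, connection: sqlite3.Connection, session_id: str, user_id: str):
        self.connection = connection
        self.session_id = session_id
        self.user_id = user_id
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        """Append one message for this session and user."""
        self.connection.execute(
            "INSERT INTO messages (session_id, user_id, role, content) VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.connection.commit()

    def messages(self) -> list:
        """Return this session's messages in insertion order."""
        rows = self.connection.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        )
        return rows.fetchall()

# ":memory:" keeps the demo self-contained; a file path would survive restarts.
connection = sqlite3.connect(":memory:")

writer = SqliteChatHistory(connection, "session-1", "user-1")
writer.add_message("Human", "Tell me some use cases for reinforcement learning?")
writer.add_message("AI", "Robotics, epidemiology, and more.")

# A new object with the same keys sees the stored conversation; another session does not.
reader = SqliteChatHistory(connection, "session-1", "user-1")
other = SqliteChatHistory(connection, "session-2", "user-1")
restored = reader.messages()
print(restored)
print(other.messages())
```

Keying every message by session and user is what allows the history to serve both purposes mentioned above: resuming a conversation and auditing it later.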

In [26]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [27]:
# Create our Memory Object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [28]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases in various domains. Here are some examples:

1. **Robotics**: Reinforcement learning can be used to train robots to perform complex tasks and navigate their environment. For example, researchers have studied how learning can give rise to social behaviors in multi-robot domains, where robots can be individually programmed to produce certain group behaviors<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[22]</a></sup>.

2. **Epidemiology**: Reinforcement learning has been applied to develop prevention strategies in the context of pandemic influenza. Researchers have used deep reinforcement learning to automatically learn prevention strategies and control the spread of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[23]</a></sup>.

3. **Multi-Agent Systems**: Reinforcement learning can be used to study the behavior of multi-agent systems (MASs) composed of learning agents. It has been shown that learning agents in MASs can converge to system-wide optimal behavior, and reinforcement learning algorithms have been developed for agent coordination and cooperation<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v3.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[28]</a></sup>.

4. **Public Health**: Reinforcement learning can be used to compute lockdown decisions for individual cities or regions, considering both health and economic considerations. By using reinforcement learning algorithms, policies for controlling the spread of diseases can be learned automatically based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[29]</a></sup>.

5. **Recommendation Systems**: Reinforcement learning can be used to personalize recommendation systems, such as music recommendation. By continuously updating the model based on user preferences, reinforcement learning algorithms can recommend song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[36]</a></sup>.

These are just a few examples of the diverse applications of reinforcement learning. The field continues to evolve, and researchers are exploring new domains and problem areas where reinforcement learning can be applied effectively. Let me know if there's anything else I

In [29]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has several use cases in various domains.
2. In robotics, reinforcement learning can be used to train robots to perform complex tasks and navigate their environment<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[22]</a></sup>.
3. In epidemiology, reinforcement learning has been applied to develop prevention strategies for infectious diseases like pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[23]</a></sup>.
4. In multi-agent systems, reinforcement learning can be used to study the behavior of learning agents and achieve system-wide optimal behavior<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v3.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[28]</a></sup>.
5. In public health, reinforcement learning can be used to compute lockdown decisions for cities or regions, considering health and economic factors<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[29]</a></sup>.
6. In recommendation systems, reinforcement learning can be used to personalize recommendations, such as music recommendations, by continuously updating the model based on user preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[36]</a></sup>.

These examples highlight the diverse applications of reinforcement learning in different fields. If you have any more questions, feel free to ask!

In [30]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has several use cases in various domains.
2. In robotics, reinforcement learning can be used to train robots to perform complex tasks and navigate their environment<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[22]</a></sup>.
3. In epidemiology, reinforcement learning has been applied to develop prevention strategies for infectious diseases like pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[23]</a></sup>.
4. In multi-agent systems, reinforcement learning can be used to study the behavior of learning agents and achieve system-wide optimal behavior<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v3.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[28]</a></sup>.
5. In public health, reinforcement learning can be used to compute lockdown decisions for cities or regions, considering health and economic factors<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[29]</a></sup>.
6. In recommendation systems, reinforcement learning can be used to personalize recommendations, such as music recommendations, by continuously updating the model based on user preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[36]</a></sup>.

These examples highlight the diverse applications of reinforcement learning in different fields. If you have any more questions, feel free to ask!

Let's check our Azure CosmosDB to see the whole conversation.


In [31]:
# Load the messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning?', additional_kwargs={}, example=False),
 AIMessage(content='Reinforcement learning has several use cases in various domains. Here are some examples:\n\n1. **Robotics**: Reinforcement learning can be used to train robots to perform complex tasks and navigate their environment. For example, researchers have studied how learning can give rise to social behaviors in multi-robot domains, where robots can be individually programmed to produce certain group behaviors<sup><a href="https://blobstoragelwbasns7ls3ty.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2021-10-04&ss=btqf&srt=sco&st=2023-10-14T09%3A46%3A05Z&se=2030-12-30T23%3A00%3A00Z&sp=rl&sig=154k5RvEq964JHojm%2BU7iFiYzczAXcaHBZ7wClxSj5I%3D">[22]</a></sup>.\n\n2. **Epidemiology**: Reinforcement learning has been applied to develop prevention strategies in the context of pandemic influenza. Researchers have used deep reinforcement learning to automatically le

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to hold a conversation. However, this capability does not come with the LLM; memory is something that we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.
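
As a rough illustration of what "providing memory as context" means, the sketch below renders earlier turns into the prompt text that is sent with each new question. The function `build_prompt`, the prompt layout, and the sample follow-up question are assumptions made for this example; they are not the template or questions used by the chain in this notebook.

```python
def build_prompt(history, question, max_turns=3):
    """Render the most recent turns plus the new question into one prompt string.

    `history` is a list of (human_message, ai_message) tuples. This layout is
    illustrative only; it is not the notebook's actual prompt template.
    """
    # Keep only the latest turns so the prompt stays within the token limit.
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for human, ai in recent:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    chat_history = "\n".join(lines)
    return (
        "Chat history:\n"
        f"{chat_history}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


history = [
    (
        "Tell me some use cases for reinforcement learning?",
        "Robotics, epidemiology, and recommendation systems.",
    ),
]
# The model sees the earlier exchange only because we paste it into the prompt.
prompt = build_prompt(history, "Can you summarize our conversation so far?")
print(prompt)
```

The model itself keeps nothing between calls; without the history block in the prompt, the follow-up question would have no conversation to refer to.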

We can also see that the chain we are currently using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input, and it struggles to respond to prompts like Hello, Thank you, Bye, What's your name, What's the weather, and any other task that is not a search in the knowledge base.
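
To show the kind of check the current chain lacks, here is a deliberately naive sketch that decides whether an input needs a knowledge-base search at all. The function `needs_search` and the `SMALL_TALK` phrase list are hypothetical and exist only for this illustration; a keyword match like this is far too brittle for real use and is not part of this notebook's code.

```python
import re

# Hypothetical phrase list; a real assistant would need something far more robust.
SMALL_TALK = {"hello", "hi", "hey", "thank you", "thanks", "bye", "goodbye"}


def needs_search(user_input):
    """Return False for simple greetings or thanks, True for everything else."""
    # Lowercase, drop punctuation, and collapse whitespace before comparing.
    normalized = re.sub(r"[^a-z' ]", "", user_input.lower()).strip()
    normalized = re.sub(r"\s+", " ", normalized)
    return normalized not in SMALL_TALK


for text in ["Thank you", "Hello!", "Tell me some use cases for reinforcement learning?"]:
    route = "search + LLM" if needs_search(text) else "LLM only"
    print(f"{text!r} -> {route}")
```

With a check of this kind in front of the chain, an input such as "Thank you" would skip document retrieval instead of triggering another search over the knowledge base.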


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover every scenario that a virtual assistant will face? **What if the answer to a question posed to the Smart Search Engine is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.