# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. They do not. The models possess knowledge from training, but each request is processed independently, so they retain nothing from previous questions unless that information is sent again in the prompt.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import AzureOpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Autonomous vehicles: Reinforcement learning can be used to train self-driving cars to make decisions such as when to accelerate, brake, or change lanes based on the current road conditions and traffic.

2. Robotics: Reinforcement learning can be used to train robots to perform complex tasks such as grasping and manipulating objects, navigating through cluttered environments, and interacting with humans.

3. Game playing: Reinforcement learning has been used to train AI agents to play complex games such as chess, Go, and video games. These agents learn from experience and improve their strategies over time.

4. Personalized recommendation systems: Reinforcement learning can be used to create personalized recommendation systems for products, movies, or music based on user preferences and feedback.

5. Finance: Reinforcement learning can be used to optimize trading strategies, portfolio management, and risk assessment in financial markets.

6. Healthcare: Reinforcement learning can be used to develop personalized treatment plans for patients based on their medical history and current condition.

7. Energy management: Reinforcement learning can be used to optimize energy consumption in smart grids, HVAC systems, and other energy-intensive applications.

8. Industrial automation: Reinforcement learning can be used to optimize production processes, predictive maintenance, and quality control in manufacturing and other industrial settings.

9. Natural language processing: Reinforcement learning can be used to train chatbots and virtual assistants to understand and respond to user queries in a more natural and context-aware manner.

10. Adaptive control systems: Reinforcement learning can be used to develop adaptive control systems for autonomous vehicles, drones, and other complex systems that need to continuously adjust their behavior based on changing environments.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

'1. We discussed the current project status and the upcoming deadline.\n2. We talked about the challenges we are facing and potential solutions.\n3. We reviewed the responsibilities and tasks for each team member.\n4. We discussed the need for better communication and collaboration within the team.\n5. We agreed on a plan of action to address the issues and improve our project management.'

As you can see, the model doesn't remember what it just said. Sometimes it responds based only on the system prompt, and sometimes seemingly at random. This shows that the LLM does NOT have memory, and that we need to supply the memory ourselves by including the conversation history in the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

1. Use cases for reinforcement learning include autonomous vehicles, robotics, game playing, personalized recommendation systems, finance, healthcare, energy management, industrial automation, natural language processing, and adaptive control systems.
2. Reinforcement learning can be used to train self-driving cars, robots, AI agents for game playing, personalized recommendation systems, optimize trading strategies, develop personalized treatment plans, optimize energy consumption, industrial automation processes, train chatbots and virtual assistants, and develop adaptive control systems.
3. These applications of reinforcement learning involve training systems to make decisions, perform complex tasks, improve strategies over time, create personalized recommendations, optimize processes, and develop adaptive control systems based on changing environments.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context on every call.
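
The same pattern can be sketched without any framework. The snippet below is a minimal illustration, not the code used in this notebook: `fake_llm` is a hypothetical stand-in for a real model call, and `ChatSession` is an illustrative name. It shows the conversation state living outside the model and being replayed into the prompt on every turn.

```python
from typing import Callable, List, Tuple


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports how many earlier turns it was shown."""
    prior_turns = prompt.count("Human:") - 1  # the last "Human:" is the new question
    return f"I can see {prior_turns} earlier turn(s)."


class ChatSession:
    """Keeps the conversation state outside the model and replays it on each call."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []  # (question, answer) pairs

    def build_prompt(self, question: str) -> str:
        # Render all previous turns, then append the new question.
        history = "".join(f"Human: {q}\nAI: {a}\n" for q, a in self.turns)
        return f"{history}Human: {question}\nAI:"

    def ask(self, question: str) -> str:
        answer = self.llm(self.build_prompt(question))
        self.turns.append((question, answer))  # remember the turn for the next call
        return answer


session = ChatSession(fake_llm)
first = session.ask("Tell me some use cases for reinforcement learning")
second = session.ask("Give me the main points of our conversation")
print(first)
print(second)
```

The stub only counts turns, but it makes the point: the second call can "see" the first one only because the session object put it back into the prompt.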

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 2000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = AzureOpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 5.85 s, sys: 119 ms, total: 5.97 s
Wall time: 32.3 s


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 2000
Combined docs (context) token count: 1821
--------
Requested token count: 6285
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
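
The chain-type decision in the cell above can be isolated into a small function. This is only a sketch of that one rule: `choose_chain_type` and `safety_ratio` are illustrative names, the first call reuses the token counts printed above, and the second call uses a made-up, larger context size to show the other branch.

```python
def choose_chain_type(prompt_tokens: int, context_tokens: int,
                      completion_tokens: int, tokens_limit: int,
                      safety_ratio: float = 0.9) -> str:
    """Pick "stuff" when everything fits in one call, otherwise "map_reduce"."""
    requested = prompt_tokens + context_tokens + completion_tokens
    # Leave some headroom below the model limit, as the notebook cell does with 0.9.
    return "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"


# Values taken from the output above: 2464 + 1821 + 2000 = 6285 requested tokens.
fits = choose_chain_type(2464, 1821, 2000, 16384)
# Hypothetical larger context (11000 tokens) that exceeds 90% of the limit.
too_big = choose_chain_type(2464, 11000, 2000, 16384)
print(fits, too_big)
```

Once memory is added, the conversation history also consumes tokens, so the same budget has to cover the history as well.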


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning has several use cases, including:

1. Learning prevention strategies for epidemics: Deep reinforcement learning can be used to automatically learn prevention strategies in the context of pandemic influenza. It can learn mitigation policies in complex epidemiological models with a large state space, and consider collaboration between districts when designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.

2. Optimal behavior modeling: Reinforcement learning can be used to model and predict the spread of diseases like Covid-19 and compute lockdown decisions for individual cities or regions. It balances health and economic considerations and can learn policies automatically based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.

3. Sparse reward tasks and exploration: Reinforcement learning can be used in tasks with sparse rewards by combining self-imitation learning, which encourages exploitation, with exploration bonuses, which enhance exploration. This combination improves performance in environments with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.

4. Modeling epidemics and individual decisions: Reinforcement learning can be used to model epidemics at a microscopic level, where individual agents make decisions that affect the spread of the disease. By solving for the optimal decisions of individual agents, predictions about the spread of the disease can be made. This approach can also consider external interventions to regulate agents' behaviors<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

References:
[1] Source: [1]
[2] Source: [2]
[3] Source: [3]
[4] Source: [4]

CPU times: user 22 ms, sys: 3.54 ms, total: 25.6 ms
Wall time: 8.68 s


And if we ask the follow up question:

In [18]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

The main points of our conversation are about the use of deep reinforcement learning to automatically learn prevention strategies in the context of pandemic influenza, the construction of an epidemiological meta-population model to capture the infection process, the evaluation of the 'Proximal Policy Optimization' algorithm in a single district of the model, and the consideration of collaboration between districts when designing prevention strategies<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.

You might get a different response from the one above, but whatever response you get will be based on the context given, not on previous answers.

So far we have the same as the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [19]:
# memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases in different domains. Here are some examples:

1. **Epidemiology**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases like pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms, mitigation policies can be learned to control the spread of diseases in complex scenarios<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.

2. **Decision-making in pandemics**: Reinforcement learning can help compute lockdown decisions for individual cities or regions during a pandemic. By balancing health and economic considerations, reinforcement learning algorithms can automatically learn policies based on disease parameters and population characteristics<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.

3. **Sparse reward tasks**: Reinforcement learning can be applied to solve tasks with sparse rewards. Techniques like self-imitation learning and exploration bonuses can be combined to enhance both exploitation and exploration, leading to improved performance in challenging environments<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.

4. **Microscopic epidemic modeling**: Reinforcement learning can be used to model and predict the spread of epidemics at an individual level. By formulating a multi-agent epidemic model and applying game theory and multi-agent reinforcement learning, optimal decisions for individual agents can be determined, leading to predictions about the spread of the disease<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

These are just a few examples of the many use cases for reinforcement learning. It is a versatile approach that can be applied to various complex problems where decision-making and optimization are involved. Let me know if you have any other questions!

References:
1. [Source 1](https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
2. [Source 2](https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
3. [Source 3](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
4. [Source 4](https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)

In [20]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. In epidemiology, reinforcement learning can be used to learn prevention strategies for infectious diseases like pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.
3. Reinforcement learning can help compute lockdown decisions for individual cities or regions during a pandemic, balancing health and economic considerations<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.
4. Reinforcement learning can be applied to solve tasks with sparse rewards, using techniques like self-imitation learning and exploration bonuses<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.
5. Reinforcement learning can be used to model and predict the spread of epidemics at an individual level, considering optimal decisions for individual agents<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

These references provide more details and insights into each use case.

References:
1. [Source 1](https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
2. [Source 2](https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
3. [Source 3](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
4. [Source 4](https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)

Let me know if there's anything else I can help with!

In [21]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. In epidemiology, reinforcement learning can be used to learn prevention strategies for infectious diseases like pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.
3. Reinforcement learning can help compute lockdown decisions for individual cities or regions during a pandemic, balancing health and economic considerations<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.
4. Reinforcement learning can be applied to solve tasks with sparse rewards, using techniques like self-imitation learning and exploration bonuses<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.
5. Reinforcement learning can be used to model and predict the spread of epidemics at an individual level, considering optimal decisions for individual agents<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

These references provide more details and insights into each use case.

References:
1. [Source 1](https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
2. [Source 2](https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
3. [Source 3](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)
4. [Source 4](https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D)

Let me know if there's anything else I can help with!

You might get a different answer in the cell above, and that is OK: this bot is not yet configured to handle input that is unrelated to its knowledge base, including salutations.

Let's check our memory to see that it is keeping the conversation:

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has various use cases in different domains. Here are some examples:\n\n1. **Epidemiology**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases like pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms, mitigation policies can be learned to control the spread of diseases in complex scenarios<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.\n\n2. **Decision-making in pandemics**: Reinforcement learning can help compute lockdown decisions for individual cities or regions during a pandemic. By balancing health and economic considerations, reinforcement learning algorithms can automatically learn policies based on disease paramet
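
A buffer like the one above keeps every turn, so the history added to the prompt grows with each exchange. One common way to keep it within a token budget is to drop the oldest messages first. The sketch below illustrates that idea only; it is not part of this notebook's code, `trim_history` is a hypothetical helper, and `rough_token_count` is a crude word-based approximation rather than a real tokenizer.

```python
from typing import List


def rough_token_count(text: str) -> int:
    """Crude approximation (whitespace-separated words); an assumption, not a real tokenizer."""
    return len(text.split())


def trim_history(messages: List[str], max_tokens: int) -> List[str]:
    """Keep the most recent messages whose combined rough token count fits the budget."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = rough_token_count(message)
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = [
    "Human: Tell me some use cases for reinforcement learning",
    "AI: Epidemiology, lockdown decisions, sparse reward tasks",
    "Human: Give me the main points of our conversation",
    "AI: We covered four use cases",
]
trimmed = trim_history(history, max_tokens=16)
print(trimmed)
```

With a budget of 16 rough tokens, only the last two messages survive; a real application would count tokens with the model's tokenizer instead.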

## Using CosmosDB as persistent memory

In the previous cells we added in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the user's session ends. We therefore need a database to persist each user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
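
To make the idea of persistent, per-session history concrete before connecting to CosmosDB, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in. It is not the LangChain or CosmosDB API: `SqliteChatHistory`, its methods, and the table layout are all hypothetical and chosen only for illustration. It shows messages keyed by user and session surviving after the first object is closed.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


class SqliteChatHistory:
    """Illustrative persistent history keyed by user and session (not the LangChain API)."""

    def __init__(self, path: str, user_id: str, session_id: str) -> None:
        self.conn = sqlite3.connect(path)
        self.user_id = user_id
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
        )
        self.conn.commit()

    def add_message(self, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (user_id, session_id, role, content) VALUES (?, ?, ?, ?)",
            (self.user_id, self.session_id, role, content),
        )
        self.conn.commit()

    def messages(self) -> List[Tuple[str, str]]:
        # Return this session's messages in the order they were written.
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE user_id = ? AND session_id = ? ORDER BY id",
            (self.user_id, self.session_id),
        )
        return rows.fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")

first_run = SqliteChatHistory(db_path, "user-1", "session-1")
first_run.add_message("Human", "Tell me some use cases for reinforcement learning")
first_run.add_message("AI", "Epidemic prevention, lockdown decisions, sparse rewards")
first_run.conn.close()

# A new object (for example, after an app restart) sees the same conversation.
second_run = SqliteChatHistory(db_path, "user-1", "session-1")
restored = second_run.messages()
# A different session for the same user starts empty.
other_session = SqliteChatHistory(db_path, "user-1", "session-2").messages()
print(restored)
print(other_session)
```

The cells that follow use the same idea: the history is keyed by a session ID and a user ID and stored outside the process.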

In [23]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [24]:
# Create our Memory object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [25]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases in various fields. Here are a few examples:

1. Epidemic Prevention Strategies: In the context of infectious diseases like pandemic influenza, reinforcement learning can be used to automatically learn prevention strategies. By constructing epidemiological models and using reinforcement learning algorithms, policies can be learned to control the spread of diseases in different districts or communities<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.

2. Optimal Behavior Modeling: Reinforcement learning can be used to model and predict the spread of diseases like Covid-19. By balancing health and economic considerations, policies for lockdown decisions can be computed for individual cities or regions<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.

3. Sparse Reward Tasks: Reinforcement learning can be applied to solve tasks with sparse rewards. The combination of self-imitation learning and exploration bonuses can enhance both exploitation and exploration, leading to improved performance in environments with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.

4. Individual Decision-Making and Epidemic Spread: Reinforcement learning can be used to model epidemics at a microscopic level, where individual agents make decisions that affect the spread of the disease. By solving for optimal decisions using game theory and multi-agent reinforcement learning, predictions about the spread of the disease can be made<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

These are just a few examples of the use cases for reinforcement learning. It is a versatile approach that can be applied to various complex problems where learning optimal behavior is required.

In [26]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

The main points of our conversation are:

1. Reinforcement learning has several use cases in various fields, including epidemic prevention strategies, optimal behavior modeling, solving sparse reward tasks, and individual decision-making in epidemic spread<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup><sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup><sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup><sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

2. Reinforcement learning can be used to automatically learn prevention strategies for epidemic influenza by constructing epidemiological models and using reinforcement learning algorithms<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.

3. Reinforcement learning can be applied to model and predict the spread of diseases like Covid-19, balancing health and economic considerations to compute lockdown decisions for individual cities or regions<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.

4. Reinforcement learning can be used to solve tasks with sparse rewards by combining self-imitation learning and exploration bonuses, leading to improved performance in environments with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.

5. Reinforcement learning can be used to model epidemics at a microscopic level, where individual agents make decisions that affect the spread of the disease. Optimal decisions can be solved using game theory and multi-agent reinforcement learning, allowing predictions about the spread of the disease<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

These points highlight the diverse applications of reinforcement learning in addressing complex problems and learning optimal behavior in various domains.

In [27]:
# Another follow-up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English",
                      chain_type=chain_type, memory=memory)
printmd(response['output_text'])

The main points of our conversation are:

1. Reinforcement learning has several use cases in various fields, including epidemic prevention strategies, optimal behavior modeling, solving sparse reward tasks, and individual decision-making in epidemic spread<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup><sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup><sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup><sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

2. Reinforcement learning can be used to automatically learn prevention strategies for epidemic influenza by constructing epidemiological models and using reinforcement learning algorithms<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.

3. Reinforcement learning can be applied to model and predict the spread of diseases like Covid-19, balancing health and economic considerations to compute lockdown decisions for individual cities or regions<sup><a href="https://arxiv.org/pdf/2003.14093v2.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[2]</a></sup>.

4. Reinforcement learning can be used to solve tasks with sparse rewards by combining self-imitation learning and exploration bonuses, leading to improved performance in environments with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[3]</a></sup>.

5. Reinforcement learning can be used to model epidemics at a microscopic level, where individual agents make decisions that affect the spread of the disease. Optimal decisions can be solved using game theory and multi-agent reinforcement learning, allowing predictions about the spread of the disease<sup><a href="https://arxiv.org/pdf/2004.12959v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[4]</a></sup>.

These points highlight the diverse applications of reinforcement learning in addressing complex problems and learning optimal behavior in various domains.

Is there anything else I can help you with?

Let's check Azure CosmosDB to see the whole conversation:


In [28]:
# Load the stored messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning'),
 AIMessage(content='Reinforcement learning has several use cases in various fields. Here are a few examples:\n\n1. Epidemic Prevention Strategies: In the context of infectious diseases like pandemic influenza, reinforcement learning can be used to automatically learn prevention strategies. By constructing epidemiological models and using reinforcement learning algorithms, policies can be learned to control the spread of diseases in different districts or communities<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T18:11:44Z&spr=https&sig=ngrEqvqBVaxyuSYqgPVeF%2B9c0fXLs94v3ASgwg7LDBs%3D" target="_blank">[1]</a></sup>.\n\n2. Optimal Behavior Modeling: Reinforcement learning can be used to model and predict the spread of diseases like Covid-19. By balancing health and economic considerations, policies for lockdown decisions can be computed f


![CosmosDB Memory](./images/cosmos-chathistory.png)
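
The raw list printed above is hard to read. As a minimal sketch, the helper below renders a list of chat messages as a short transcript. The `Message` class is a hypothetical stand-in so that the example runs on its own; it assumes each message exposes `.type` and `.content`, as LangChain's `HumanMessage` and `AIMessage` do.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's HumanMessage / AIMessage."""
    type: str      # "human" or "ai"
    content: str


def format_transcript(messages, max_chars=80):
    """Return one line per message, truncating long content for readability."""
    labels = {"human": "Human", "ai": "AI"}
    lines = []
    for msg in messages:
        text = msg.content.replace("\n", " ")
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        lines.append(f"{labels.get(msg.type, msg.type)}: {text}")
    return "\n".join(lines)


sample = [
    Message("human", "Tell me some use cases for reinforcement learning"),
    Message("ai", "Reinforcement learning has several use cases in various fields."),
    Message("human", "Thank you"),
]
print(format_transcript(sample))
```

In the notebook itself, `format_transcript(cosmos.messages)` should behave the same way, provided the stored messages carry those two attributes.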

# Summary
##### Adding memory to our application lets the user hold a conversation. However, memory is not something that comes with the LLM; we must provide it to the LLM as context alongside each question.

We added persistent memory using CosmosDB.
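
To make "memory as context" concrete, here is a minimal sketch in plain Python. It is not the code behind `get_answer`; the function name `build_prompt` and the prompt wording are assumptions chosen for illustration. The point is that the application keeps the history and pastes it into every prompt, while the model itself stays stateless.

```python
def build_prompt(history, question):
    """Fold prior (question, answer) turns into the prompt text sent to the model."""
    turns = []
    for human, ai in history:
        turns.append(f"Human: {human}")
        turns.append(f"AI: {ai}")
    chat_history = "\n".join(turns) if turns else "(no previous turns)"
    return (
        "Use the chat history to answer the follow-up question.\n\n"
        f"Chat history:\n{chat_history}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# The application, not the model, owns this list.
history = []
history.append((
    "Tell me some use cases for reinforcement learning",
    "It is used in epidemic prevention strategies, among others.",
))

prompt = build_prompt(history, "Give me the main points of our conversation")
print(prompt)
```

Swapping the in-memory list for a database-backed store changes where the history lives, not how it reaches the model.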

Notice also that the chain we are using is smart, but only up to a point. Even with memory, it searches for similar documents every time, regardless of the input, and it struggles to respond to prompts such as "Hello", "Thank you", "Bye", "What's your name?", "What's the weather?", and any other task that is not a search of the knowledge base.
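
One naive way to see the problem is a guard that decides whether an input deserves a search at all. The sketch below is purely illustrative: the `SMALL_TALK` set and the `needs_search` function are hypothetical and not part of this repository.

```python
import re

# Hypothetical list of inputs that should not trigger a knowledge-base search.
SMALL_TALK = {"hello", "hi", "thank you", "thanks", "bye", "goodbye"}


def needs_search(query):
    """Return False for simple greetings and courtesies, True otherwise."""
    # Lowercase, drop punctuation and digits, and collapse whitespace.
    normalized = re.sub(r"[^a-z\s]", "", query.lower()).strip()
    normalized = re.sub(r"\s+", " ", normalized)
    return normalized not in SMALL_TALK


for q in ["Thank you", "Hello!", "Tell me some use cases for reinforcement learning"]:
    print(q, "->", "search" if needs_search(q) else "reply directly")
```

A hard-coded list like this does not scale: "What's your name?" or "What's the weather?" would still be sent to the search engine. Deciding what to do with each input is itself a job better suited to the LLM.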


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

The analogy above will become clearer as you complete the final notebook.
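
The "limited memory" in the analogy corresponds to the model's context window: once the chat history grows too long, something has to be dropped. A minimal sketch of one common option, keeping only the most recent messages that fit a budget, is shown below. The word-count estimate is an assumption standing in for a real tokenizer, and `trim_history` is a hypothetical helper, not part of `common.utils`.

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(messages, budget):
    """Keep the most recent messages whose combined estimate fits within budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that overflows.
    for text in reversed(messages):
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return list(reversed(kept))


messages = [
    "Tell me some use cases for reinforcement learning",
    "Reinforcement learning has several use cases",
    "Give me the main points of our conversation",
    "Thank you",
]
print(trim_history(messages, budget=10))
```

A model with a larger context window simply allows a larger `budget`, which is why the 16k and 32k variants can sustain longer conversations.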


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover every scenario that a virtual assistant must handle? **What if the answer to a question is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.