# Understanding Memory in LLMs

In the previous notebook, we explored how OpenAI models can enhance the results from Azure Cognitive Search.

However, we have not yet seen how to hold a conversation with the LLM. [Bing Chat](http://chat.bing.com/), for example, can do this: it understands and references its previous responses.

A common misconception is that GPT models have memory. They do not. They possess knowledge, but they do not retain information from the questions previously asked of them.

In this notebook, our goal is to show how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

# Custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Game playing: Reinforcement learning algorithms can be used to train agents to play games such as chess, Go, and video games.

2. Robotics: Reinforcement learning algorithms can be used to train robots to perform tasks such as grasping objects, navigating through environments, and interacting with humans.

3. Autonomous driving: Reinforcement learning can be used to train self-driving cars to navigate through traffic and avoid obstacles.

4. Personalized recommendations: Reinforcement learning can be used to personalize recommendations for users based on their past behavior and preferences.

5. Ad placement: Reinforcement learning can be used to optimize ad placement on websites and social media platforms to maximize revenue.

6. Healthcare: Reinforcement learning can be used to develop personalized treatment plans for patients based on their medical history and response to treatment.

7. Finance: Reinforcement learning can be used to develop trading strategies for financial markets.

8. Energy management: Reinforcement learning can be used to optimize energy consumption in buildings and power grids.

9. Supply chain management: Reinforcement learning can be used to optimize supply chain operations such as inventory management and logistics.

10. Agriculture: Reinforcement learning can be used to optimize crop yields and reduce waste in farming operations.

In [7]:
# Now let's ask a follow-up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, but as an AI language model, I do not have access to any previous conversation you may have had. Please provide me with more context or specific information about the conversation you are referring to so I can assist you better."

As you can see, the model doesn't remember what it just answered. Sometimes it responds based only on the system prompt, or with something unrelated. This shows that the LLM does NOT have memory, and that we need to supply that memory ourselves by including the conversation history in the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

1. Use cases for reinforcement learning include game playing, robotics, autonomous driving, personalized recommendations, ad placement, healthcare, finance, energy management, supply chain management, and agriculture.
2. Reinforcement learning can be used to train agents to perform tasks and make decisions based on rewards and punishments.
3. Reinforcement learning can be applied to various industries to optimize operations and improve outcomes.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
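
The same idea can be sketched as a small loop in plain Python. This is only an illustration: `fake_llm` is a hypothetical stand-in for the model call (it simply reports how many earlier answers it can see in its prompt), so the example runs without an Azure OpenAI deployment. In the notebook, the equivalent call would be `chain.run(...)`.

```python
from typing import Callable, List, Tuple

PROMPT_TEMPLATE = "{history}\nHuman: {question}\nAI:"


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render past (question, answer) pairs in the Human/AI layout used above."""
    return "\n".join(f"Human: {q}\nAI: {a}" for q, a in turns)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model: reports how many past answers it sees."""
    past_answers = prompt.count("AI:") - 1  # the last "AI:" is the open slot
    return f"I can see {past_answers} earlier answer(s) in my prompt."


def chat(question: str, turns: List[Tuple[str, str]], llm: Callable[[str], str]) -> str:
    """Send the question together with the full history, then record the new turn."""
    prompt = PROMPT_TEMPLATE.format(history=format_history(turns), question=question)
    answer = llm(prompt)
    turns.append((question, answer))
    return answer


turns: List[Tuple[str, str]] = []
first = chat("Tell me some use cases for reinforcement learning", turns, fake_llm)
second = chat("Give me the main points of our conversation", turns, fake_llm)
print(first)
print(second)
```

The model itself stays stateless; the only thing that changes between calls is the `turns` list that the application keeps and replays.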

## Now that we understand how to add memory by passing history as context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model with a larger context window
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 1000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: total: 16.7 s
Wall time: 44.7 s


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 1000
Combined docs (context) token count: 1273
--------
Requested token count: 4737
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
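
The chain type selection above is plain arithmetic, and it can be reproduced with the numbers printed by this run. The sketch below is illustrative only: `choose_chain_type` is a hypothetical helper, not a function from `common/utils.py`, and the second call uses an assumed 4096-token limit purely to show the other branch.

```python
def choose_chain_type(prompt_tokens: int, context_tokens: int,
                      completion_tokens: int, tokens_limit: int,
                      safety_ratio: float = 0.9) -> str:
    """Return "stuff" when everything fits in one call, else "map_reduce"."""
    requested = prompt_tokens + context_tokens + completion_tokens
    return "map_reduce" if requested > safety_ratio * tokens_limit else "stuff"


# Numbers printed by the cell above: 2464 + 1273 + 1000 = 4737, and 0.9 * 16384 = 14745.6
large_window = choose_chain_type(2464, 1273, 1000, 16384)
# Same request against an assumed 4096-token limit: 0.9 * 4096 = 3686.4
small_window = choose_chain_type(2464, 1273, 1000, 4096)
print(large_window, small_window)
```

Keep in mind that conversation history is not part of this estimate yet; once memory is added, its tokens also count against the same limit.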


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning can be used in various use cases, including:
1. Learning prevention strategies for epidemics of infectious diseases, such as pandemic influenza, in complex epidemiological models<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[1]</a></sup>.
2. Personalized music recommendation based on reinforcement learning, where the algorithm learns and updates models based on listeners' preferences for songs and song transitions<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[2]</a></sup>.
3. Abdominal wall reinforcement for ventral hernia repair or prophylaxis in contaminated fields, where reinforcement learning can be used to evaluate outcomes and compare synthetic non-absorbable and biologic prosthetics<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[3]</a></sup>.
4. Feature engineering in machine learning projects, where a framework called CAFEM (Cross-data Automatic Feature Engineering Machine) uses reinforcement learning to optimize feature transformation graphs and improve performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[4]</a></sup>.
5. Learning in sparse reward tasks, where the Explore-then-Exploit (EE) framework combines self-imitation learning and exploration bonuses to improve performance in environments with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D" target="_blank">[5]</a></sup>.

References:
[1] Source: [1]
[2] Source: [2]
[3] Source: [3]
[4] Source: [4]
[5] Source: [5]

CPU times: total: 31.2 ms
Wall time: 13.3 s


And if we ask the follow-up question:

In [18]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

I'm sorry, but I don't have access to the conversation we had.

You might get a different response from the one above, but whatever response you get will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [19]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases in different domains. Here are a few examples:

1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies for controlling epidemics of infectious diseases. By constructing epidemiological models and using reinforcement learning techniques, policies can be learned to control the spread of diseases in different regions<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.

2. **Personalized Recommendation Systems**: Reinforcement learning can be applied to personalized recommendation systems, such as music recommendation. By simulating the interaction process between listeners and continuously updating the recommendation model based on their preferences, reinforcement learning can improve the accuracy of song sequence recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.

3. **Abdominal Wall Reinforcement**: Reinforcement learning can be used to optimize the choice of prosthetics for abdominal wall reinforcement in contaminated fields. By analyzing outcomes from studies, reinforcement learning can help identify the most effective prosthetics for ventral hernia repair or prophylaxis<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.

4. **Feature Engineering**: Reinforcement learning can aid in automating feature engineering, which is a crucial and time-consuming task in machine learning projects. By formulating feature engineering as an optimization problem, reinforcement learning can learn fine-grained strategies for feature transformation and increase learning performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.

5. **Sparse Reward Tasks**: Reinforcement learning techniques can be used to tackle sparse reward tasks, which are challenging due to the lack of immediate feedback. By combining self-imitation learning and exploration bonuses, reinforcement learning can achieve better performance in tasks with episodic rewards, such as MuJoCo environments<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

Please note that the references provided correspond to the sources of the extracted parts and can be accessed for more detailed information. Let me know if there's anything else I can assist you with!

In [20]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. Epidemic prevention: Reinforcement learning can be used to automatically learn prevention strategies for controlling epidemics of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.
3. Personalized recommendation systems: Reinforcement learning can be applied to personalized recommendation systems, such as music recommendation, to improve the accuracy of song sequence recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.
4. Abdominal wall reinforcement: Reinforcement learning can help optimize the choice of prosthetics for abdominal wall reinforcement in contaminated fields, improving outcomes for ventral hernia repair or prophylaxis<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.
5. Feature engineering: Reinforcement learning can aid in automating feature engineering, a crucial task in machine learning projects, by learning fine-grained strategies for feature transformation<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.
6. Sparse reward tasks: Reinforcement learning techniques can be used to tackle sparse reward tasks, such as MuJoCo environments, by combining self-imitation learning and exploration bonuses<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

Please note that the references provided correspond to the sources of the extracted parts and can be accessed for more detailed information. Let me know if there's anything else I can assist you with!

In [21]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on the extracted parts, here are the main points of our conversation:

1. Reinforcement learning has various use cases in different domains.
2. Epidemic prevention: Reinforcement learning can be used to automatically learn prevention strategies for controlling epidemics of infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.
3. Personalized recommendation systems: Reinforcement learning can be applied to personalized recommendation systems, such as music recommendation, to improve the accuracy of song sequence recommendations<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.
4. Abdominal wall reinforcement: Reinforcement learning can help optimize the choice of prosthetics for abdominal wall reinforcement in contaminated fields, improving outcomes for ventral hernia repair or prophylaxis<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.
5. Feature engineering: Reinforcement learning can aid in automating feature engineering, a crucial task in machine learning projects, by learning fine-grained strategies for feature transformation<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.
6. Sparse reward tasks: Reinforcement learning techniques can be used to tackle sparse reward tasks, such as MuJoCo environments, by combining self-imitation learning and exploration bonuses<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

Please note that the references provided correspond to the sources of the extracted parts and can be accessed for more detailed information. Let me know if there's anything else I can assist you with!

You might get a different answer in the cell above, and that is OK: this bot is not yet well configured to answer questions unrelated to its knowledge base, including salutations.

Let's check our memory to see that it is keeping the conversation:

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has various use cases in different domains. Here are a few examples:\n\n1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies for controlling epidemics of infectious diseases. By constructing epidemiological models and using reinforcement learning techniques, policies can be learned to control the spread of diseases in different regions<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.\n\n2. **Personalized Recommendation Systems**: Reinforcement learning can be applied to personalized recommendation systems, such as music recommendation. By simulating the interaction process between listeners and continuously updating the recommendation model based on their preferences, reinforcement lear
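
Because the buffer is replayed in every prompt, it grows with each turn and eventually competes with the retrieved documents for context space. One common mitigation is to keep only the most recent turns that fit a token budget. The sketch below illustrates the idea in plain Python; `trim_history` is a hypothetical helper (not a LangChain function), and the whitespace word count is a crude stand-in for a real tokenizer such as the `num_tokens_from_string` helper used earlier.

```python
from typing import List, Tuple


def rough_token_count(text: str) -> int:
    """Crude approximation: one token per whitespace-separated word."""
    return len(text.split())


def trim_history(turns: List[Tuple[str, str]], max_tokens: int) -> List[Tuple[str, str]]:
    """Keep the newest (human, ai) turns whose combined size fits max_tokens."""
    kept: List[Tuple[str, str]] = []
    used = 0
    for human, ai in reversed(turns):  # walk from newest to oldest
        cost = rough_token_count(human) + rough_token_count(ai)
        if used + cost > max_tokens:
            break
        kept.append((human, ai))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("Tell me some use cases for reinforcement learning", "Game playing, robotics, finance"),
    ("Give me the main points of our conversation", "We discussed reinforcement learning use cases"),
    ("Thank you", "You are welcome"),
]
trimmed = trim_history(history, max_tokens=20)
print(len(trimmed), "turns kept")
```

Dropping the oldest turns is the simplest policy; summarizing them instead is another option, at the cost of an extra model call.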

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the app user's session ends. We therefore need a database to persistently store each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations.
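
To make the difference between in-session and persistent memory concrete, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for a real database. The `SQLiteChatHistory` class and its table layout are hypothetical and only illustrate the idea of storing messages keyed by user and session; they do not mirror any LangChain or CosmosDB API.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List, Tuple


class SQLiteChatHistory:
    """Hypothetical persistent chat history keyed by user and session."""

    def __init__(self, db_path: str, user_id: str, session_id: str) -> None:
        self.db_path = db_path
        self.user_id = user_id
        self.session_id = session_id
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
            )

    def add_message(self, role: str, content: str) -> None:
        """Append one message to the stored conversation."""
        with closing(sqlite3.connect(self.db_path)) as conn, conn:
            conn.execute(
                "INSERT INTO messages (user_id, session_id, role, content) "
                "VALUES (?, ?, ?, ?)",
                (self.user_id, self.session_id, role, content),
            )

    def messages(self) -> List[Tuple[str, str]]:
        """Return (role, content) pairs for this user and session, oldest first."""
        with closing(sqlite3.connect(self.db_path)) as conn:
            return conn.execute(
                "SELECT role, content FROM messages "
                "WHERE user_id = ? AND session_id = ? ORDER BY id",
                (self.user_id, self.session_id),
            ).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")

session = SQLiteChatHistory(db_path, user_id="user-1", session_id="session-1")
session.add_message("human", "Tell me some use cases for reinforcement learning")
session.add_message("ai", "Game playing, robotics, finance")

# A new object (think: the app restarted) still sees the stored conversation
restored = SQLiteChatHistory(db_path, user_id="user-1", session_id="session-1")
print(restored.messages())

# Another session for the same user starts empty
other = SQLiteChatHistory(db_path, user_id="user-1", session_id="session-2")
print(other.messages())
```

The point of the sketch is the keying scheme: as long as the history lives outside the process and is looked up by user and session, a conversation can be resumed or audited later.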

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)

In [23]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [24]:
# Create our Memory Object
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [25]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases across different domains. Here are a few examples:

1. **Epidemiology**: Reinforcement learning can be used to learn prevention strategies for controlling the spread of infectious diseases, such as pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms like Proximal Policy Optimization, researchers can learn mitigation policies and prevention strategies in complex models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.

2. **Recommendation Systems**: Reinforcement learning can be used in personalized recommendation systems, such as music recommendation. By simulating the interaction process between listeners and continuously updating the model based on their preferences, reinforcement learning algorithms can recommend song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.

3. **Medical Research**: Reinforcement learning can be applied in medical research, such as abdominal wall reinforcement in ventral hernia repair. By comparing outcomes after using synthetic non-absorbable and biologic prosthetics, reinforcement learning can help evaluate the effectiveness of different reinforcement materials<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.

4. **Machine Learning**: Reinforcement learning can be used in feature engineering, which is a crucial and time-consuming task in machine learning projects. By formalizing feature engineering as an optimization problem and using reinforcement learning techniques, researchers can design generalized ways to perform feature engineering and improve learning performance across different datasets<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.

5. **Reinforcement Learning Algorithms**: Reinforcement learning techniques, such as self-imitation learning and exploration bonuses, can be combined to tackle sparse reward tasks. The Explore-then-Exploit (EE) framework interleaves self-imitation learning with an exploration bonus to enhance both exploitation and exploration, leading to improved performance in tasks with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

These are just a few examples of the diverse applications of reinforcement learning. It can be applied to various domains where decision-making and optimization problems

In [26]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Our conversation covered several use cases for reinforcement learning. Here are the main points:

1. **Epidemiology**: Reinforcement learning can be used to learn prevention strategies for controlling the spread of infectious diseases, such as pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms like Proximal Policy Optimization, researchers can learn mitigation policies and prevention strategies in complex models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.

2. **Recommendation Systems**: Reinforcement learning can be used in personalized recommendation systems, such as music recommendation. By simulating the interaction process between listeners and continuously updating the model based on their preferences, reinforcement learning algorithms can recommend song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.

3. **Medical Research**: Reinforcement learning can be applied in medical research, such as abdominal wall reinforcement in ventral hernia repair. By comparing outcomes after using synthetic non-absorbable and biologic prosthetics, reinforcement learning can help evaluate the effectiveness of different reinforcement materials<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.

4. **Machine Learning**: Reinforcement learning can be used in feature engineering, which is a crucial and time-consuming task in machine learning projects. By formalizing feature engineering as an optimization problem and using reinforcement learning techniques, researchers can design generalized ways to perform feature engineering and improve learning performance across different datasets<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.

5. **Reinforcement Learning Algorithms**: Reinforcement learning techniques, such as self-imitation learning and exploration bonuses, can be combined to tackle sparse reward tasks. The Explore-then-Exploit (EE) framework interleaves self-imitation learning with an exploration bonus to enhance both exploitation and exploration, leading to improved performance in tasks with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

These examples demonstrate the versatility and potential of reinforcement learning in various domains. If you have any more questions, feel free to ask!



In [27]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases across different domains. Here are the main points of our conversation:

1. **Epidemiology**: Reinforcement learning can be used to learn prevention strategies for controlling the spread of infectious diseases, such as pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms like Proximal Policy Optimization, researchers can learn mitigation policies and prevention strategies in complex models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.

2. **Recommendation Systems**: Reinforcement learning can be used in personalized recommendation systems, such as music recommendation. By simulating the interaction process between listeners and continuously updating the model based on their preferences, reinforcement learning algorithms can recommend song sequences that better match listeners' preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[2]</a></sup>.

3. **Medical Research**: Reinforcement learning can be applied in medical research, such as abdominal wall reinforcement in ventral hernia repair. By comparing outcomes after using synthetic non-absorbable and biologic prosthetics, reinforcement learning can help evaluate the effectiveness of different reinforcement materials<sup><a href="https://doi.org/10.1007/s00464-014-3499-5; https://www.ncbi.nlm.nih.gov/pubmed/24619334/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[3]</a></sup>.

4. **Machine Learning**: Reinforcement learning can be used in feature engineering, which is a crucial and time-consuming task in machine learning projects. By formalizing feature engineering as an optimization problem and using reinforcement learning techniques, researchers can design generalized ways to perform feature engineering and improve learning performance across different datasets<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[4]</a></sup>.

5. **Reinforcement Learning Algorithms**: Reinforcement learning techniques, such as self-imitation learning and exploration bonuses, can be combined to tackle sparse reward tasks. The Explore-then-Exploit (EE) framework interleaves self-imitation learning with an exploration bonus to enhance both exploitation and exploration, leading to improved performance in tasks with episodic reward settings<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[5]</a></sup>.

These examples demonstrate the versatility and potential of reinforcement learning in various domains. If you have any more questions, feel free to

Let's check our Azure CosmosDB to see the whole conversation.


In [28]:
# Load the stored messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning'),
 AIMessage(content='Reinforcement learning has several use cases across different domains. Here are a few examples:\n\n1. **Epidemiology**: Reinforcement learning can be used to learn prevention strategies for controlling the spread of infectious diseases, such as pandemic influenza. By constructing epidemiological models and using reinforcement learning algorithms like Proximal Policy Optimization, researchers can learn mitigation policies and prevention strategies in complex models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rl&se=2025-11-06T23:27:04Z&st=2023-11-06T15:27:04Z&spr=https&sig=IxmYt1nWtSI0MtBHeQBC1t%2F4VeoN19HqQM1Xu6tvacU%3D">[1]</a></sup>.\n\n2. **Recommendation Systems**: Reinforcement learning can be used in personalized recommendation systems, such as music recommendation. By simulating the interaction process between listeners an

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM as context for the question.

We added persistent memory using CosmosDB.
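
The idea that memory is just context sent along with each question can be sketched in plain Python. This is a minimal illustration, not the notebook's implementation: the `TEMPLATE` layout, `format_history`, and `build_prompt` are hypothetical names, and the real chain uses `COMBINE_CHAT_PROMPT_TEMPLATE` together with a LangChain memory object.

```python
from typing import List, Tuple

# Hypothetical prompt layout, for illustration only.
# The notebook's real template is COMBINE_CHAT_PROMPT_TEMPLATE.
TEMPLATE = (
    "Chat history:\n{history}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, text) turns as one line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_prompt(turns: List[Tuple[str, str]], question: str) -> str:
    """Prepend the previous turns to the new question.

    The model itself is stateless, so every call must carry the
    conversation so far inside the prompt.
    """
    return TEMPLATE.format(history=format_history(turns), question=question)


# Example turns (the AI reply is shortened for readability).
history = [
    ("Human", "Tell me some use cases for reinforcement learning"),
    ("AI", "Reinforcement learning has several use cases across different domains."),
]

prompt = build_prompt(history, "Can you summarize what we discussed?")
print(prompt)
```

A persistent store such as CosmosDB plays the role of the `history` list here: it keeps the turns between requests so they can be injected into the next prompt.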

We can also see that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input, and it struggles to respond to prompts such as "Hello", "Thank you", "Bye", "What's your name", "What's the weather", or any other task that is not a search in the knowledge base.
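
One naive way to picture the missing piece is a check that decides whether a query needs a knowledge-base search at all. The sketch below is only an assumption-laden illustration: `SMALL_TALK`, `normalize`, and `needs_search` are hypothetical names, and a fixed phrase list is far too brittle for real use.

```python
import re

# Hypothetical list of inputs that need no knowledge-base lookup.
SMALL_TALK = {"hello", "hi", "thank you", "thanks", "bye", "goodbye"}


def normalize(text: str) -> str:
    """Remove punctuation, trim whitespace, and lowercase the text."""
    return re.sub(r"[^\w\s]", "", text).strip().lower()


def needs_search(query: str) -> bool:
    """Return False for small talk, True for everything else."""
    return normalize(query) not in SMALL_TALK


for q in ["Thank you", "Hello!", "Tell me some use cases for reinforcement learning"]:
    route = "search the knowledge base" if needs_search(q) else "reply directly"
    print(f"{q!r} -> {route}")
```

A phrase list like this cannot handle inputs such as "What's the weather", which need a different tool rather than no tool, so it only illustrates the routing decision that the current chain lacks.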


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this solve every scenario that a virtual assistant will face? **What if the answer is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.