# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the GPT models in Azure OpenAI have memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
COMPLETION_TOKENS = 500
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

Reinforcement learning has a wide range of applications across various domains. Here are some use cases for reinforcement learning:

1. Game Playing: Reinforcement learning has been successfully applied to games like Chess, Go, and Atari games, where the agent learns to make optimal decisions by playing against itself or learning from human experts.

2. Robotics: Reinforcement learning is used to train robots to perform complex tasks, such as grasping objects, walking, or navigating through dynamic environments, by rewarding desired behaviors and penalizing undesired ones.

3. Autonomous Vehicles: Reinforcement learning can be used to train self-driving cars to make decisions in real-time, such as lane changing, merging, and navigating intersections, based on the surrounding environment and traffic conditions.

4. Recommendation Systems: Reinforcement learning can be employed to build personalized recommendation systems that learn user preferences and provide relevant suggestions for movies, music, products, or advertisements.

5. Finance: Reinforcement learning can be used to optimize trading strategies in financial markets by learning to make buy/sell decisions based on historical data and market conditions.

6. Healthcare: Reinforcement learning can help in optimizing treatment plans by learning from patient data and medical guidelines to recommend personalized therapies or dosage adjustments.

7. Resource Management: Reinforcement learning can be applied to optimize resource allocation and scheduling problems, such as managing energy consumption in smart grids, controlling traffic signals, or optimizing supply chain logistics.

8. Natural Language Processing: Reinforcement learning can be used to train chatbots or virtual assistants to interact with users, understand natural language, and provide relevant responses.

9. Education: Reinforcement learning can be employed to create intelligent tutoring systems that adapt to individual student needs, providing personalized feedback and guidance in the learning process.

10. Healthcare Robotics: Reinforcement learning can be utilized to train robotic systems to assist in healthcare tasks, such as patient lifting, medication delivery, or physical therapy, while ensuring safety and patient comfort.

These are just a few examples, and the potential applications of reinforcement learning are vast and expanding as research progresses.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I'm sorry, but as an AI language model, I don't have the ability to remember or recall past conversations. Once a conversation ends, the information is not retained. Is there anything specific you would like to discuss or ask about?"

As you can see, it doesn't remember what it just responded. Sometimes it responds based only on the system prompt, or just randomly. This proves that the LLM does NOT have memory and that we need to supply the memory as a conversation history that is part of the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

- Reinforcement learning has a wide range of applications across various domains.
- Some use cases of reinforcement learning include game playing, robotics, autonomous vehicles, recommendation systems, finance, healthcare, resource management, natural language processing, education, and healthcare robotics.
- Reinforcement learning can be used to train agents to make optimal decisions in games, perform complex tasks in robotics, make real-time decisions in autonomous vehicles, provide personalized recommendations, optimize trading strategies, optimize treatment plans in healthcare, optimize resource allocation and scheduling, train chatbots or virtual assistants, create intelligent tutoring systems, and assist in healthcare tasks.
- The potential applications of reinforcement learning are vast and expanding as research progresses.

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
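The same pattern can be sketched without any library: keep a running transcript and prepend it to each new question. This is a minimal illustration under stated assumptions, not the notebook's implementation. `ChatSession` and `echo_llm` are hypothetical names, and `echo_llm` is only a stand-in for a real model call such as `chain.run`.

```python
from typing import Callable, List, Tuple


class ChatSession:
    """Keeps the conversation history and sends it with every new question."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm  # any function that maps a prompt string to a completion
        self.turns: List[Tuple[str, str]] = []  # (question, answer) pairs

    def history(self) -> str:
        # Same "Human: ... / AI: ..." layout used in the prompt template above
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)

    def ask(self, question: str) -> str:
        # Prepend the transcript so far; strip the leading newline on the first turn
        prompt = f"{self.history()}\nHuman: {question}\nAI:".lstrip("\n")
        answer = self.llm(prompt)
        self.turns.append((question, answer))
        return answer


def echo_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model: reports how many human turns it can see
    return f"I can see {prompt.count('Human:')} human turn(s)."


session = ChatSession(echo_llm)
first = session.ask("Tell me some use cases for reinforcement learning?")
second = session.ask("Give me the main points of our conversation")
print(first)
print(second)
```

The stand-in model sees one human turn on the first call and two on the second, which is only possible because the session resends the earlier exchange each time.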

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Since memory adds tokens to the prompt, we need a model that allows more space in the prompt
MODEL = "gpt-35-turbo-16k"
COMPLETION_TOKENS = 1000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = OpenAIEmbeddings(deployment="text-embedding-ada-002", chunk_size=1) 

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books-vector"
text_indexes = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k=10 # Top k results per each text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))
print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 528 ms, sys: 40.4 ms, total: 568 ms
Wall time: 3.02 s


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Calculate number of tokens of our docs
if(len(top_docs)>0):
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 1000
Combined docs (context) token count: 1019
--------
Requested token count: 4483
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
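The chain-type decision above is simple arithmetic, and it can be sketched as a small function. `choose_chain_type` is a hypothetical helper, not part of `common/utils.py`, and the `history_tokens` argument is an assumption added here to show how accumulated conversation history would eat into the same budget.

```python
def choose_chain_type(prompt_tokens: int, context_tokens: int,
                      completion_tokens: int, tokens_limit: int,
                      history_tokens: int = 0, threshold: float = 0.9) -> str:
    """Return "stuff" when the request fits within threshold * limit, else "map_reduce"."""
    requested = prompt_tokens + context_tokens + completion_tokens + history_tokens
    return "map_reduce" if requested > threshold * tokens_limit else "stuff"


# Numbers printed by the cell above: 2464 + 1019 + 1000 = 4483 requested tokens
no_history = choose_chain_type(2464, 1019, 1000, 16384)
# Hypothetical case: 11,000 tokens of accumulated chat history
long_history = choose_chain_type(2464, 1019, 1000, 16384, history_tokens=11000)
print(no_history, long_history)
```

With the notebook's numbers the request stays far below 90% of the 16,384-token limit, while the hypothetical long history pushes it over the threshold.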


In [17]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Reinforcement learning can be applied in various use cases, including:
1. Learning prevention strategies for epidemics of infectious diseases, such as pandemic influenza, by automatically learning mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
2. Learning sparse reward tasks efficiently by combining self-imitation learning with exploration bonuses, which enhances exploration by producing intrinsic rewards when the agent visits novel states<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
3. Personalized hybrid recommendation algorithms for music based on reinforcement learning, which consider the simulation of the interaction process to capture changes in listeners' preferences sensitively<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
4. Automatic feature engineering in machine learning projects, where a framework called CAFEM (Cross-data Automatic Feature Engineering Machine) is used to optimize feature transformation and improve learning performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
5. Job scheduling in data centers, where an Advantage Actor-Critic (A2C) deep reinforcement learning approach called A2cScheduler is used to automatically learn scheduling policies and achieve competitive scheduling performance<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

These references provide more details and information about the respective use cases.

CPU times: user 9.97 ms, sys: 0 ns, total: 9.97 ms
Wall time: 21.5 s


And if we ask the follow up question:

In [18]:
response = get_answer(llm=llm, docs=top_docs,  query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

The main points of our conversation are about reinforcement learning in various domains. We discussed the use of deep reinforcement learning to learn prevention strategies in the context of pandemic influenza<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>. We also talked about the Explore-then-Exploit framework, which combines self-imitation learning with exploration bonuses to improve performance in sparse reward tasks<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>. Additionally, we discussed the use of reinforcement learning in personalized music recommendation systems<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>. We also touched upon the topic of automatic feature engineering using a framework called CAFEM<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>. Lastly, we discussed the A2cScheduler, an Advantage Actor-Critic deep reinforcement learning approach for job scheduling in data centers<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

Anything else I can help you with?

You might get a different response from the one above, but whatever response you get, it will be based on the context given, not on previous answers.

So far we have the same as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs

In [19]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases across different domains. Here are some examples:

1. **Epidemic prevention strategies**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases. For example, a study used deep reinforcement learning to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.

2. **Sparse reward tasks**: Reinforcement learning can be challenging when dealing with tasks that have sparse rewards. Self-imitation learning and exploration bonuses are two approaches that can address this challenge. A recent framework called Explore-then-Exploit (EE) interleaves self-imitation learning with an exploration bonus to enhance both exploitation and exploration in learning tasks<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.

3. **Personalized recommendation systems**: Reinforcement learning can be applied to personalized recommendation systems. For example, a personalized hybrid recommendation algorithm for music based on reinforcement learning was proposed. It uses techniques like weighted matrix factorization and convolutional neural networks to learn and extract song feature vectors, and it continuously updates the model based on the preferences of listeners for songs and song transitions<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.

4. **Automatic feature engineering**: Feature engineering is a crucial task in machine learning projects. Reinforcement learning can be used to automate feature engineering. For example, a framework called Cross-data Automatic Feature Engineering Machine (CAFEM) formalizes the feature engineering problem as an optimization problem over a Feature Transformation Graph (FTG). It includes a feature engineering learner that learns fine-grained strategies on a single dataset and a cross-data component that speeds up feature engineering learning on unseen datasets<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.

5. **Job scheduling in data centers**: Reinforcement learning can be applied to job scheduling in data centers. For example, an Advantage Actor-Critic (A2C) deep reinforcement learning-based approach called A2cScheduler was proposed for job scheduling. It consists of two agents, an actor and a critic, that work together to learn the scheduling policy and reduce estimation errors. The approach showed competitive performance using both simulated workloads and real data from an academic data center<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X

In [20]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Here are the main points of our conversation:

1. Reinforcement learning has various use cases across different domains.
2. Some examples of use cases for reinforcement learning include epidemic prevention strategies, sparse reward tasks, personalized recommendation systems, automatic feature engineering, and job scheduling in data centers.

Would you like more information about any of these use cases?

In [21]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

The main points of our conversation are as follows:

1. Reinforcement learning has various use cases across different domains.
2. Some examples of use cases for reinforcement learning include:
   - Epidemic prevention strategies, where deep reinforcement learning is used to learn prevention strategies for infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.
   - Sparse reward tasks, where self-imitation learning and exploration bonuses can be used to address the challenge of sparse rewards<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[2]</a></sup>.
   - Personalized recommendation systems, where reinforcement learning can be applied to recommend personalized song sequences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[3]</a></sup>.
   - Automatic feature engineering, where reinforcement learning can be used to automate the process of feature engineering<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[4]</a></sup>.
   - Job scheduling in data centers, where reinforcement learning can be applied to optimize job scheduling and resource allocation<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[5]</a></sup>.

Please let me know if you would like more information about any of these use cases.

You might get a different answer in the above cell, and that is OK: this bot is not yet well configured to answer questions that are unrelated to its knowledge base, including salutations.

Let's check our memory to confirm that it is keeping the conversation:

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning has various use cases across different domains. Here are some examples:\n\n1. **Epidemic prevention strategies**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases. For example, a study used deep reinforcement learning to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D" target="_blank">[1]</a></sup>.\n\n2. **Sparse reward tasks**: Reinforcement learning can be challenging when dealing with tasks that have sparse rewards. Self-imitation learning and exploration bonuses are two approaches that can address this challenge. A recent framework called Explore-then-Exploit (EE) interleaves self-imitation learning with an e
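Because the buffer keeps every turn, it grows with the conversation and consumes more of the prompt each time. One common way to bound it is to keep only the most recent turns that fit a token budget. The sketch below is an illustration and not something this notebook does: `estimate_tokens` and `trim_history` are hypothetical helpers, and the four-characters-per-token estimate is a rough assumption that a real tokenizer should replace.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token; use a real tokenizer in practice
    return max(1, len(text) // 4)


def trim_history(turns: List[Tuple[str, str]], max_tokens: int) -> List[Tuple[str, str]]:
    """Keep the most recent (question, answer) turns that fit within max_tokens."""
    kept: List[Tuple[str, str]] = []
    used = 0
    for question, answer in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(f"Human: {question}\nAI: {answer}")
        if used + cost > max_tokens:
            break
        kept.append((question, answer))
        used += cost
    return list(reversed(kept))  # restore chronological order


turns = [
    ("Tell me some use cases for reinforcement learning?", "x" * 400),
    ("Give me the main points of our conversation", "y" * 80),
    ("Thank you", "z" * 40),
]
trimmed = trim_history(turns, max_tokens=60)
print([q for q, _ in trimmed])
```

Here the long first answer no longer fits the budget, so only the two most recent turns are kept.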

## Using CosmosDB as persistent memory

In the previous cells we added local RAM memory to our chatbot. However, it is not persistent: it is deleted once the app user's session is terminated. We therefore need a database for persistent storage of each bot user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use a LangChain class called CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
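To make the idea of a persistent message history concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for CosmosDB. It is not how `CosmosDBChatMessageHistory` is implemented; `SqliteChatHistory`, its table layout, and the sample IDs are hypothetical and chosen only for illustration. The point it shows is that messages are keyed by user and session, so a conversation can be reloaded later.

```python
import sqlite3
from typing import List, Tuple


class SqliteChatHistory:
    """Stores chat messages per (user_id, session_id) in a SQLite database."""

    def __init__(self, path: str, user_id: str, session_id: str) -> None:
        self.conn = sqlite3.connect(path)
        self.user_id = user_id
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO messages (user_id, session_id, role, content) "
                "VALUES (?, ?, ?, ?)",
                (self.user_id, self.session_id, role, content),
            )

    def load_messages(self) -> List[Tuple[str, str]]:
        # Return this session's messages in the order they were written
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE user_id = ? AND session_id = ? ORDER BY id",
            (self.user_id, self.session_id),
        )
        return list(rows)


# ":memory:" keeps the demo self-contained; a file path would persist across runs
store = SqliteChatHistory(":memory:", user_id="user-1", session_id="session-1")
store.add_message("human", "Tell me some use cases for reinforcement learning?")
store.add_message("ai", "Game playing, robotics, ...")
messages = store.load_messages()
print(messages)
```

Swapping `":memory:"` for a file path is what would make the history survive a restart in this sketch; a managed database plays that role in a deployed bot.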

In [23]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id="Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id="Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [24]:
# Create our memory object, backed by CosmosDB
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [25]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has various use cases in different domains. Here are some examples:

1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases. For example, a study used deep reinforcement learning to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[1]</a></sup>.

2. **Sparse Reward Tasks**: Reinforcement learning is used to tackle tasks with sparse rewards, where efficient exploitation and exploration are required. One approach is self-imitation learning, which encourages exploitation by imitating past good trajectories. Another approach is exploration bonuses, which enhance exploration by providing intrinsic rewards. A novel framework called Explore-then-Exploit (EE) has been introduced, which interleaves self-imitation learning with an exploration bonus to strengthen the effect of these two algorithms<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[2]</a></sup>.

3. **Personalized Recommendation Systems**: Reinforcement learning can be used to improve personalized recommendation systems. For example, a personalized hybrid recommendation algorithm for music based on reinforcement learning was proposed. It recommends song sequences that match listeners' preferences better by simulating the interaction process and updating the model continuously based on their preferences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[3]</a></sup>.

4. **Feature Engineering**: Reinforcement learning can be used to automate feature engineering, which is a time-consuming and challenging task in machine learning projects. A framework called Cross-data Automatic Feature Engineering Machine (CAFEM) has been proposed, which formalizes the feature engineering problem as an optimization problem and learns fine-grained feature engineering strategies using reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[4]</a></sup>.

5. **Job Scheduling**: Reinforcement learning can be used for efficient job scheduling in data centers. An approach called A2cScheduler, based on Advantage Actor-Critic (A2C) deep reinforcement learning, has been proposed for job scheduling. It consists of two agents, the actor and the critic, which learn the scheduling policy and reduce estimation error, respectively<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[5]</a></sup>.

These are just a few examples of the use cases for reinforcement learning. It

In [26]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Here are the main points of our conversation:

1. Reinforcement learning has various use cases in different domains.
2. One use case is epidemic prevention, where deep reinforcement learning is used to learn prevention strategies for infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[1]</a></sup>.
3. Another use case is tackling tasks with sparse rewards, where self-imitation learning and exploration bonuses are used to enhance exploitation and exploration<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[2]</a></sup>.
4. Reinforcement learning can be used to improve personalized recommendation systems, such as a hybrid recommendation algorithm for music based on reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[3]</a></sup>.
5. Reinforcement learning can automate feature engineering tasks, such as a framework called CAFEM that learns fine-grained feature engineering strategies using reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[4]</a></sup>.
6. Reinforcement learning can be used for efficient job scheduling in data centers, such as the A2cScheduler approach based on Advantage Actor-Critic (A2C) deep reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[5]</a></sup>.

These points summarize the main use cases of reinforcement learning that we discussed. Is there anything else I can help you with?

In [27]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has various use cases in different domains.
2. One use case is epidemic prevention, where deep reinforcement learning is used to learn prevention strategies for infectious diseases<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[1]</a></sup>.
3. Another use case is tackling tasks with sparse rewards, where self-imitation learning and exploration bonuses are used to enhance exploitation and exploration<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206262/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[2]</a></sup>.
4. Reinforcement learning can be used to improve personalized recommendation systems, such as a hybrid recommendation algorithm for music based on reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206183/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[3]</a></sup>.
5. Reinforcement learning can automate feature engineering tasks, such as a framework called CAFEM that learns fine-grained feature engineering strategies using reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206177/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[4]</a></sup>.
6. Reinforcement learning can be used for efficient job scheduling in data centers, such as the A2cScheduler approach based on Advantage Actor-Critic (A2C) deep reinforcement learning<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206316/?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[5]</a></sup>.

These points summarize the main use cases of reinforcement learning that we discussed. Let me know if there's anything else I can assist you with.

Let's check our Azure CosmosDB instance to see the whole conversation.


In [28]:
# Load the stored messages from CosmosDB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning?', additional_kwargs={}, example=False),
 AIMessage(content='Reinforcement learning has various use cases in different domains. Here are some examples:\n\n1. **Epidemic Prevention**: Reinforcement learning can be used to automatically learn prevention strategies for infectious diseases. For example, a study used deep reinforcement learning to learn mitigation policies in complex epidemiological models with a large state space<sup><a href="https://arxiv.org/pdf/2003.13676v1.pdf?sv=2022-11-02&ss=bf&srt=sco&sp=rltfx&se=2024-10-02T01:02:07Z&st=2023-08-03T17:02:07Z&spr=https&sig=gLxStXFSY6X29OPpPDpBEhoQDdtJNDrMVExNYJ%2BhmBQ%3D">[1]</a></sup>.\n\n2. **Sparse Reward Tasks**: Reinforcement learning is used to tackle tasks with sparse rewards, where efficient exploitation and exploration are required. One approach is self-imitation learning, which encourages exploitation by imitating past good trajectories. Another approac

![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. This feature does not come with the LLM, however: memory is something we must provide to the LLM as context alongside the question.

We added persistent memory using CosmosDB.
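To make "memory as context" concrete, here is a minimal, framework-free sketch. It is not the chain used in this notebook: the template text and the helper names `format_history` and `build_prompt` are assumptions made for illustration, and the AI turn is an abbreviated stand-in for the real answer. The point is only that earlier turns are pasted into the prompt on every call, because the model itself keeps nothing between calls.

```python
from typing import List, Tuple

# Hypothetical template for illustration only; the notebook's real prompt
# is imported from common.prompts.
TEMPLATE = """Chat history:
{history}

Question: {question}
Answer:"""


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, text) pairs as one 'Speaker: text' line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def build_prompt(turns: List[Tuple[str, str]], question: str) -> str:
    """Build the full prompt that would be sent to the stateless model."""
    return TEMPLATE.format(history=format_history(turns), question=question)


history = [
    ("Human", "Tell me some use cases for reinforcement learning?"),
    # Abbreviated stand-in for the model's earlier answer.
    ("AI", "Epidemic prevention, recommendation systems, job scheduling."),
]

# The follow-up only makes sense because the earlier turns travel with it.
prompt = build_prompt(history, "Can you summarize the main points?")
print(prompt)
```

A persistent store such as CosmosDB only changes where `history` comes from; the prompt-building step stays the same in spirit.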

Notice also that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input, and it struggles to respond to prompts such as "Hello", "Thank you", "Bye", "What's your name", "What's the weather", or any other task that is not a search in the knowledge base.
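As a rough illustration of the limitation described above, the sketch below checks whether an input is plain small talk before any search is attempted. This is a naive, hypothetical example and not part of this repository: the `SMALL_TALK` set and the `needs_search` function are invented here, and exact matching on a fixed phrase list is brittle (it will not catch "What's your name", for instance).

```python
import re

# Hypothetical phrases that should not trigger a knowledge-base search.
SMALL_TALK = {"hello", "hi", "thank you", "thanks", "bye", "goodbye"}


def needs_search(query: str) -> bool:
    """Return False for greetings and courtesies, True for everything else."""
    # Lowercase and drop punctuation so "Hello!" matches "hello".
    normalized = re.sub(r"[^a-z\s]", "", query.lower()).strip()
    return normalized not in SMALL_TALK


queries = [
    "Thank you",
    "Hello!",
    "Tell me some use cases for reinforcement learning?",
]
for q in queries:
    action = "search" if needs_search(q) else "reply directly"
    print(f"{q!r} -> {action}")
```

A fixed list like this cannot anticipate every non-search input, which is why deciding when to search is better left to a more capable routing mechanism than a keyword check.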


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models, we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with an increased attention span for longer instructions. However, it still faces difficulties accurately executing them about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover every scenario a virtual assistant will face? **What if the answer to a question is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.