# Understanding Memory in LLMs

In the previous Notebook, we successfully explored how OpenAI models can enhance the results from Azure Cognitive Search. 

However, we have not yet seen how to hold a conversation with the LLM. This is possible with [Bing Chat](http://chat.bing.com/), for example, because it can understand and refer back to its previous responses.

There is a common misconception that GPT models have memory. They do not. While they possess knowledge learned during training, each request is processed on its own, and they retain no information from the previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from openai.error import OpenAIError
from langchain.embeddings import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.memory import CosmosDBChatMessageHistory

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import (
    get_search_results,
    update_vector_indexes,
    model_tokens_limit,
    num_tokens_from_docs,
    num_tokens_from_string,
    get_answer,
)

from common.prompts import COMBINE_CHAT_PROMPT_TEMPLATE

from dotenv import load_dotenv
load_dotenv("credentials_my.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"]    = os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["OPENAI_API_KEY"]     = os.environ["AZURE_OPENAI_API_KEY"]
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]
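# OPENAI_API_TYPE is used as-is from the .env file (it should be "azure"); this line only fails early if it is missing
os.environ["OPENAI_API_TYPE"]    = os.environ["OPENAI_API_TYPE"]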

### Let's start with the basics
Let's use a very simple example to check whether the Azure OpenAI GPT model has memory. Again, we will use LangChain to simplify our code.

In [3]:
QUESTION           = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "Give me the main points of our conversation"

In [4]:
# Define model
MODEL = os.environ["COMPLETION3516_DEPLOYMENT"] # we DO have chatgpt-4 too, but not needed in this case
COMPLETION_TOKENS = 500
# Create an Azure OpenAI chat model instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Game playing: Reinforcement learning has been successfully used to train agents to play complex games like chess, Go, and poker. These agents can learn strategies, adapt to opponents, and improve their gameplay over time.

2. Robotics: Reinforcement learning can be applied to train robots to perform various tasks, such as object manipulation, grasping, and navigation. The agents learn to interact with their environment, receive rewards or penalties based on their actions, and optimize their behavior accordingly.

3. Autonomous vehicles: Reinforcement learning can be used to train self-driving cars to make decisions on the road. The agents learn to navigate traffic, follow traffic rules, and make safe and efficient driving choices.

4. Resource management: Reinforcement learning can be applied to optimize the allocation of resources, such as energy, bandwidth, or server capacity. The agents learn to make decisions that maximize efficiency, minimize costs, or optimize performance based on the available resources.

5. Recommendation systems: Reinforcement learning can be used to build personalized recommendation systems. The agents learn from user feedback to recommend relevant products, movies, or content, adapting to individual preferences and improving the accuracy of recommendations over time.

6. Healthcare: Reinforcement learning can assist in medical treatment planning, drug dosage optimization, and personalized therapy. The agents can learn optimal treatment strategies based on patient data, clinical guidelines, and feedback from medical professionals.

7. Finance: Reinforcement learning can be applied to financial trading and portfolio management. The agents learn to make investment decisions, manage risk, and optimize trading strategies based on market data and financial indicators.

8. Industrial control systems: Reinforcement learning can be used to optimize and control complex industrial processes, such as power generation, chemical manufacturing, or supply chain management. The agents learn to make decisions that maximize productivity, minimize costs, and ensure smooth operation.

9. Natural language processing: Reinforcement learning can be applied to build conversational agents or chatbots. The agents learn to generate appropriate responses, engage in meaningful conversations, and improve their language understanding and generation capabilities.

10. Personalized education: Reinforcement learning can be used to develop adaptive learning systems. The agents learn to personalize educational content, adapt teaching strategies, and provide tailored feedback to individual learners, enhancing the effectiveness and efficiency of the learning process.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

"I apologize, but as an AI language model, I don't have the capability to remember previous conversations. Once a conversation ends, the information is not stored or accessible. However, I'm here to help with any new questions or topics you'd like to discuss."

As you can see, the model does not remember what it just answered; sometimes it replies based only on the system prompt, or seemingly at random. This proves that the LLM does NOT have memory, and that we must supply the memory ourselves, as conversation history inside the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
printmd(chain.run({"history":Conversation_history, "question":FOLLOW_UP_QUESTION}))

- Reinforcement learning can be used in various domains such as game playing, robotics, autonomous vehicles, resource management, recommendation systems, healthcare, finance, industrial control systems, natural language processing, and personalized education.
- In game playing, reinforcement learning can train agents to play complex games and improve their gameplay over time.
- In robotics, reinforcement learning can train robots to perform tasks like object manipulation, grasping, and navigation.
- In autonomous vehicles, reinforcement learning can help self-driving cars make safe and efficient driving decisions.
- In resource management, reinforcement learning can optimize the allocation of resources like energy, bandwidth, or server capacity.
- In recommendation systems, reinforcement learning can personalize recommendations based on user feedback and improve accuracy over time.
- In healthcare, reinforcement learning can assist in treatment planning, drug dosage optimization, and personalized therapy.
- In finance, reinforcement learning can be used in trading and portfolio management to make investment decisions and manage risk.
- In industrial control systems, reinforcement learning can optimize complex processes like power generation or supply chain management.
- In natural language processing, reinforcement learning can be used to build conversational agents or chatbots that generate appropriate responses and improve language understanding.
- In personalized education, reinforcement learning can develop adaptive learning systems that personalize educational content and provide tailored feedback to individual learners.

**Bingo!** We now know how to create a chatbot with an LLM: we just need to keep the state (history) of the conversation and pass it as context on every call.
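The same idea can be sketched without LangChain. The snippet below is a minimal illustration, not the notebook's implementation: `fake_llm` is a hypothetical stand-in for a real model call, and the `Human:`/`AI:` layout simply mirrors the `hist_prompt` template used above.

```python
from typing import Callable, List, Tuple


def build_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Render past turns plus the new question in the Human/AI layout."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Human: {past_question}")
        lines.append(f"AI: {past_answer}")
    lines.append(f"Human: {question}")
    lines.append("AI:")
    return "\n".join(lines)


def chat_turn(llm: Callable[[str], str], history: List[Tuple[str, str]], question: str) -> str:
    """Send the full history with every call, then record the new turn."""
    answer = llm(build_prompt(history, question))
    history.append((question, answer))
    return answer


# Hypothetical stand-in for a real model: it reports how many Human turns it can see.
def fake_llm(prompt: str) -> str:
    return f"I can see {prompt.count('Human:')} question(s)"


history: List[Tuple[str, str]] = []
print(chat_turn(fake_llm, history, "Tell me some use cases for reinforcement learning"))  # I can see 1 question(s)
print(chat_turn(fake_llm, history, "Give me the main points of our conversation"))        # I can see 2 question(s)
```

Because the whole history is resent on every call, the prompt grows with each turn, which is why the token budget matters once memory is added.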

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

In [11]:
# Memory adds tokens to the prompt, so we rely on a model with a large context window (here the 16k-token deployment) and raise the completion limit
COMPLETION_TOKENS = 1000
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5, max_tokens=COMPLETION_TOKENS)
embedder = OpenAIEmbeddings(deployment=os.environ["EMBEDDING_DEPLOYMENT"], chunk_size=1) 

In [12]:
index1_name    = "cogsrch-index-files"
index2_name    = "cogsrch-index-csv"
index3_name    = "cogsrch-index-books-vector"
text_indexes   = [index1_name, index2_name]
vector_indexes = [index+"-vector" for index in text_indexes] + [index3_name]
vector_indexes

['cogsrch-index-files-vector',
 'cogsrch-index-csv-vector',
 'cogsrch-index-books-vector']

In [13]:
%%time

# Search in text-based indexes first and update vector indexes
k = 10 # Top k results per text-based index
ordered_results = get_search_results(QUESTION, text_indexes, k=k, reranker_threshold=1, vector_search=False)
update_vector_indexes(ordered_search_results=ordered_results, embedder=embedder)

# Search in all vector-based indexes available
similarity_k = 5 # top results from multi-vector-index similarity search
ordered_results = get_search_results(QUESTION, vector_indexes, k=k, vector_search=True,
                                        similarity_k=similarity_k,
                                        query_vector = embedder.embed_query(QUESTION))

print("Number of results:",len(ordered_results))

Number of results: 5
CPU times: user 4.65 s, sys: 82.6 ms, total: 4.73 s
Wall time: 23.1 s


In [14]:
# Comment out the line below if you don't want to inspect the ordered results
ordered_results

OrderedDict([('aHR0cHM6Ly9ibG9ic3RvcmFnZW9xdWRlcnRlamZjcXcuYmxvYi5jb3JlLndpbmRvd3MubmV0L2FyeGl2Y3MvMDAwMS8wMDAxMDA4djIucGRm0_12',
              {'title': 'arXiv:cs/0001008v2  [cs.MA]  17 Jan 2000_chunk_12',
               'name': '0001008v2.pdf',
               'location': 'https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf',
               'caption': 'Matarić [1995] has studied reinforcement learning in multi-robot domains. She notes, for example, how learning can give rise to social behaviors (Matarić [1997]). The work shows how robots can be individually programmed to pro- duce certain group behaviors.',
               'index': 'cogsrch-index-files-vector',
               'chunk': 'predictive strategies, such as “if the state of the world was x ten time units\nbefore, then it will be x next time so take action a”. The authors later show\nhow learning can be used to eliminate these chaotic global fluctuations.\n\nMatarić [1995] has studied reinforcem

In [15]:
top_docs = []
for key,value in ordered_results.items():
    location = value["location"] if value["location"] is not None else ""
    top_docs.append(Document(page_content=value["chunk"], metadata={"source": location+os.environ['BLOB_SAS_TOKEN']}))
        
print("Number of chunks:",len(top_docs))

Number of chunks: 5


In [16]:
# Comment / Uncomment the below line if you want to inspect the ordered top_docs
print(f"Nr. of elements in top_docs: {len(top_docs)}. Here they are:\n{top_docs}")

Nr. of elements in top_docs: 5. Here they are:
[Document(page_content='predictive strategies, such as “if the state of the world was x ten time units\nbefore, then it will be x next time so take action a”. The authors later show\nhow learning can be used to eliminate these chaotic global fluctuations.\n\nMatarić [1995] has studied reinforcement learning in multi-robot domains.\nShe notes, for example, how learning can give rise to social behaviors (Matarić\n[1997]). The work shows how robots can be individually programmed to pro-\nduce certain group behaviors. It represents a good example of the usefulness\nand flexibility of learning agents in multi-agent domains. However, the author\ndoes not offer a mathematical justification for the chosen individual learning\nalgorithms, nor does she explain why the agents were able to converge to the\nglobal behaviors. Our research hopes to provide the first steps in this direction.\n\nOne particularly interesting approach is taken by Carmel an

In [17]:
# Calculate number of tokens of our docs
if len(top_docs) > 0:
    tokens_limit = model_tokens_limit(MODEL) # this is a custom function we created in common/utils.py
    prompt_tokens = num_tokens_from_string(COMBINE_CHAT_PROMPT_TEMPLATE) # this is a custom function we created in common/utils.py
    context_tokens = num_tokens_from_docs(top_docs) # this is a custom function we created in common/utils.py
    
    requested_tokens = prompt_tokens + context_tokens + COMPLETION_TOKENS
    
    chain_type = "map_reduce" if requested_tokens > 0.9 * tokens_limit else "stuff"  
    
    print("System prompt token count:",prompt_tokens)
    print("Max Completion Token count:", COMPLETION_TOKENS)
    print("Combined docs (context) token count:",context_tokens)
    print("--------")
    print("Requested token count:",requested_tokens)
    print("Token limit for", MODEL, ":", tokens_limit)
    print("Chain Type selected:", chain_type)
        
else:
    print("NO RESULTS FROM AZURE SEARCH")

System prompt token count: 2464
Max Completion Token count: 1000
Combined docs (context) token count: 3132
--------
Requested token count: 6596
Token limit for gpt-35-turbo-16k : 16384
Chain Type selected: stuff
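The selection rule in the cell above reduces to a small pure function, restated below so it can be tested in isolation. The `history_tokens` parameter is an assumption added for this sketch: once memory is attached, the conversation history also consumes prompt space, and the cell above does not include it in `requested_tokens`.

```python
def choose_chain_type(prompt_tokens: int, context_tokens: int, completion_tokens: int,
                      tokens_limit: int, history_tokens: int = 0,
                      safety_margin: float = 0.9) -> str:
    """Return "stuff" if the request fits within safety_margin of the limit, else "map_reduce"."""
    requested = prompt_tokens + context_tokens + completion_tokens + history_tokens
    return "map_reduce" if requested > safety_margin * tokens_limit else "stuff"


# Numbers from the run above: 6596 requested tokens vs. 0.9 * 16384 = 14745.6
print(choose_chain_type(2464, 3132, 1000, 16384))                        # stuff
# A long conversation history (hypothetical 9000 tokens) pushes the request past the margin
print(choose_chain_type(2464, 3132, 1000, 16384, history_tokens=9000))  # map_reduce
```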


## Using load_qa_with_sources_chain
The *get_answer* function in the next cell is equivalent to the following code when **chain_type = map_reduce**:

```
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from common.prompts import (COMBINE_PROMPT, COMBINE_QUESTION_PROMPT)

llm35 = AzureChatOpenAI(deployment_name="gpt-35-turbo", temperature=0.5, max_tokens=COMPLETION_TOKENS)
chain = load_qa_with_sources_chain(llm35, chain_type="map_reduce",
                                   combine_prompt = COMBINE_PROMPT,
                                   question_prompt = COMBINE_QUESTION_PROMPT,
                                   return_intermediate_steps = True)

answer = chain({"input_documents": top_docs, "question": QUESTION, "language": "English"}, return_only_outputs=True)
```
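The two chain types differ in how the retrieved chunks reach the model. The sketch below is a simplified, LangChain-free illustration of the idea, not how `load_qa_with_sources_chain` is implemented internally: `stuff` sends all chunks in one prompt, while `map_reduce` asks the question against each chunk and then combines the partial answers in a final call. `fake_llm` is a hypothetical stand-in for the model that only records prompt sizes.

```python
from typing import Callable, List


def stuff(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    """'stuff': concatenate every chunk into a single prompt and call the model once."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce(llm: Callable[[str], str], docs: List[str], question: str) -> str:
    """'map_reduce': query each chunk separately (map), then combine the partial answers (reduce)."""
    partial_answers = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}")


# Hypothetical stand-in model that records each prompt's length instead of answering.
calls: List[int] = []

def fake_llm(prompt: str) -> str:
    calls.append(len(prompt))
    return f"answer #{len(calls)}"


docs = ["chunk one", "chunk two", "chunk three"]
stuff(fake_llm, docs, "q")
print(len(calls))  # 1 call
calls.clear()
map_reduce(fake_llm, docs, "q")
print(len(calls))  # 4 calls: 3 map + 1 reduce
```

The trade-off: `map_reduce` costs N+1 model calls but keeps each prompt small, which is why the notebook only falls back to it when the stuffed prompt would exceed roughly 90% of the context window.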

In [18]:
%%time
# Get the answer
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

Some use cases for reinforcement learning include:

1. Multi-robot domains: Reinforcement learning can be used to study social behaviors and program robots to produce certain group behaviors in multi-robot domains. This can help improve coordination and cooperation among robots. (Source: [1])

2. Model-based learning: Agents can build models of other agents through observations and learn effective models based on finite state machines. This can be useful for understanding the behaviors and actions of other agents in a system. (Source: [1])

3. Agent coordination in multi-agent systems: Reinforcement learning can be used to develop learning agents that converge to system-wide optimal behavior. This can help improve coordination and cooperation among agents in complex systems. (Source: [1])

4. Synthesis of synthetic DNA: Reinforcement learning can be used to reduce random errors in synthetic DNA synthesis. By using consensus shuffling, errors in synthetic DNA can be identified and removed, leading to the rapid and accurate synthesis of long DNA sequences. (Source: [2])

5. Protein database searching: Reinforcement learning can be applied to improve the efficiency and accuracy of protein database searches. For example, the ReHAB tool uses reinforcement learning to find new protein hits in repeated PSI-BLAST searches, allowing for the identification of potentially significant results buried in a long list of previous hits. (Source: [3])

6. Geographic Information Systems (GIS) in healthcare: Reinforcement learning can be used to improve community health and healthcare practices through GIS. GIS can inform and educate, empower decision-making, help in planning and prioritizing actions, and monitor and analyze changes in health and healthcare. (Source: [4])

Please note that the references provided are numerical references to the sources mentioned in the extracted content.

CPU times: user 45 ms, sys: 3.1 ms, total: 48.1 ms
Wall time: 6.36 s


And if we ask the follow-up question:

In [19]:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type)
printmd(response['output_text'])

I'm sorry, but I couldn't find any extracted parts that provide the main points of our conversation.

You might get a different response from the one above, but whatever the response is, it will be based only on the context given, not on previous answers.

So far, we have the same result as in the previous Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/docs/modules/memory/how_to/adding_memory_chain_multiple_inputs
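When a chain receives several inputs (in the `load_qa_with_sources_chain` example above: the documents, the question and the language), the memory must be told which one is the user's message. That is the role of `input_key="question"`, while `memory_key="chat_history"` names the variable under which the stored history is exposed to the prompt. The class below is a hypothetical, simplified stand-in written only to illustrate that idea; it is not LangChain's `ConversationBufferMemory` and does not reproduce its API.

```python
from typing import Dict, List


class SimpleBufferMemory:
    """Hypothetical stand-in illustrating memory_key / input_key (not LangChain's class)."""

    def __init__(self, memory_key: str = "chat_history", input_key: str = "question"):
        self.memory_key = memory_key
        self.input_key = input_key
        self.turns: List[str] = []

    def load(self) -> Dict[str, str]:
        # History is exposed under memory_key so a prompt template can reference {chat_history}.
        return {self.memory_key: "\n".join(self.turns)}

    def save(self, inputs: Dict[str, str], output: str) -> None:
        # Only inputs[input_key] is recorded; other inputs such as "language" are ignored.
        self.turns.append(f"Human: {inputs[self.input_key]}")
        self.turns.append(f"AI: {output}")


memory = SimpleBufferMemory()
inputs = {"question": "Tell me some use cases for reinforcement learning", "language": "English"}
memory.save(inputs, "Robotics and game playing.")
print(memory.load()["chat_history"])
```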

In [20]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases across different domains. Here are some examples that have been mentioned in the extracted parts:

1. **Multi-robot domains**: Reinforcement learning has been studied in multi-robot domains, where learning can give rise to social behaviors and enable robots to individually program certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.

2. **Model-based learning**: In model-based learning, agents build models of other agents via observations. This approach has been used to effectively learn models based on finite state machines<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.

3. **Learning about agents in multi-agent systems**: Experimental work has been done on learning agents in simple multi-agent systems. For example, learning agents have been shown to converge to system-wide optimal behavior, and different learning algorithms have been compared for developing agent coordination<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.

4. **Synthetic DNA synthesis**: Reinforcement learning has been used to significantly reduce random errors in synthetic DNA synthesis. A method called consensus shuffling has been introduced to improve the accuracy and efficiency of synthesizing long DNA sequences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1072806/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[2]</a></sup>.

5. **Finding new protein hits**: A tool called ReHAB (Recent Hits Acquired from BLAST) has been developed to find new protein hits in repeated PSI-BLAST searches. ReHAB compares results from PSI-BLAST searches performed with different versions of a protein sequence database and highlights hits that are present only in the updated database<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC549547/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[3]</a></sup>.

6. **Geographic

In [21]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Here are the main points of our conversation:

1. Reinforcement learning has several use cases across different domains.
   - Multi-robot domains: Reinforcement learning has been studied in multi-robot domains, where learning can give rise to social behaviors and enable robots to individually program certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.
   - Model-based learning: In model-based learning, agents build models of other agents via observations. This approach has been used to effectively learn models based on finite state machines<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.
   - Learning about agents in multi-agent systems: Experimental work has been done on learning agents in simple multi-agent systems. For example, learning agents have been shown to converge to system-wide optimal behavior, and different learning algorithms have been compared for developing agent coordination<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.

2. Synthetic DNA synthesis: Reinforcement learning has been used to significantly reduce random errors in synthetic DNA synthesis. A method called consensus shuffling has been introduced to improve the accuracy and efficiency of synthesizing long DNA sequences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1072806/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[2]</a></sup>.

3. Finding new protein hits: A tool called ReHAB (Recent Hits Acquired from BLAST) has been developed to find new protein hits in repeated PSI-BLAST searches. ReHAB compares results from PSI-BLAST searches performed with different versions of a protein sequence database and highlights hits that are present only in the updated database<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC549547/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[3]</a></sup>.

4. Geographic Information Systems (GIS): GIS has applications in health and

In [22]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has several use cases across different domains. Here are some examples that have been mentioned in the extracted parts:\n\n1. **Multi-robot domains**: Reinforcement learning has been studied in multi-robot domains, where learning can give rise to social behaviors and enable robots to individually program certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.\n\n2. **Model-based learning**: In model-based learning, agents build models of other agents via observations. This approach has been used to effectively learn models based on finite state machines<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&s

In [23]:
# Another follow up query
response = get_answer(llm=llm, docs=top_docs, query="Thank you", language="English", chain_type=chain_type,  
                      memory=memory)
printmd(response['output_text'])

Here are the main points of our conversation:

1. Reinforcement learning has several use cases across different domains.
   - Multi-robot domains: Reinforcement learning has been studied in multi-robot domains, where learning can give rise to social behaviors and enable robots to individually program certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.
   - Model-based learning: In model-based learning, agents build models of other agents via observations. This approach has been used to effectively learn models based on finite state machines<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.
   - Learning about agents in multi-agent systems: Experimental work has been done on learning agents in simple multi-agent systems. For example, learning agents have been shown to converge to system-wide optimal behavior, and different learning algorithms have been compared for developing agent coordination<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.

2. Synthetic DNA synthesis: Reinforcement learning has been used to significantly reduce random errors in synthetic DNA synthesis. A method called consensus shuffling has been introduced to improve the accuracy and efficiency of synthesizing long DNA sequences<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1072806/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[2]</a></sup>.

3. Finding new protein hits: A tool called ReHAB (Recent Hits Acquired from BLAST) has been developed to find new protein hits in repeated PSI-BLAST searches. ReHAB compares results from PSI-BLAST searches performed with different versions of a protein sequence database and highlights hits that are present only in the updated database<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC549547/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[3]</a></sup>.

4. Geographic Information Systems (GIS): GIS has applications in health and

You might get a different answer in the cell above, and that is fine: this bot is not yet configured to answer questions unrelated to its knowledge base, including greetings.

Let's check our memory to confirm that it is keeping the conversation:

In [24]:
memory.buffer

'Human: Tell me some use cases for reinforcement learning\nAI: Reinforcement learning has several use cases across different domains. Here are some examples that have been mentioned in the extracted parts:\n\n1. **Multi-robot domains**: Reinforcement learning has been studied in multi-robot domains, where learning can give rise to social behaviors and enable robots to individually program certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D">[1]</a></sup>.\n\n2. **Model-based learning**: In model-based learning, agents build models of other agents via observations. This approach has been used to effectively learn models based on finite state machines<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&s

## Using CosmosDB as persistent memory

In the previous cells we added in-memory (RAM) storage to our chatbot. However, it is not persistent: it is deleted once the app user's session ends. We therefore need a database to persistently store each user's conversations with the bot, not only for analytics and auditing, but also if we wish to provide recommendations.

Here we will store the conversation history in CosmosDB for future auditing.
We will use the LangChain class CosmosDBChatMessageHistory; see [HERE](https://python.langchain.com/en/latest/_modules/langchain/memory/chat_message_histories/cosmos_db.html)
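Conceptually, a persistent chat history only needs to append each message under a session and user identifier and read the messages back in order. The sketch below illustrates that with Python's built-in `sqlite3` module as a local stand-in; it is not the CosmosDB implementation and does not mirror `CosmosDBChatMessageHistory`'s API. The class name and table layout are hypothetical.

```python
import sqlite3
from typing import List, Tuple


class SQLiteChatHistory:
    """Hypothetical local stand-in for a persistent chat store (not CosmosDB or LangChain)."""

    def __init__(self, conn: sqlite3.Connection, session_id: str, user_id: str):
        self.conn = conn
        self.session_id = session_id
        self.user_id = user_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT, user_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        # Each message is written immediately, so it survives the end of the Python session.
        self.conn.execute(
            "INSERT INTO messages (session_id, user_id, role, content) VALUES (?, ?, ?, ?)",
            (self.session_id, self.user_id, role, content),
        )
        self.conn.commit()

    def messages(self) -> List[Tuple[str, str]]:
        # Read back only this session's messages, in insertion order.
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? AND user_id = ? ORDER BY id",
            (self.session_id, self.user_id),
        )
        return rows.fetchall()


conn = sqlite3.connect(":memory:")  # use a file path instead to keep history across restarts
history = SQLiteChatHistory(conn, session_id="session-1", user_id="user-1")
history.add_message("human", "Tell me some use cases for reinforcement learning")
history.add_message("ai", "Robotics and game playing.")

# A new object for the same session sees the stored conversation.
reloaded = SQLiteChatHistory(conn, session_id="session-1", user_id="user-1")
print(reloaded.messages())
```

Keying by both `session_id` and `user_id` mirrors the two identifiers passed in the CosmosDB cell, so each user's sessions stay separate.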

In [25]:
# Create CosmosDB instance from langchain cosmos class.
cosmos = CosmosDBChatMessageHistory(
    cosmos_endpoint   = os.environ['AZURE_COSMOSDB_ENDPOINT'],
    cosmos_database   = os.environ['AZURE_COSMOSDB_NAME'],
    cosmos_container  = os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
    connection_string = os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
    session_id        = "Agent-Test-Session" + str(random.randint(1, 1000)),
    user_id           = "Agent-Test-User" + str(random.randint(1, 1000))
    )

# prepare the cosmosdb instance
cosmos.prepare_cosmos()

In [26]:
# Create our memory object, backed by CosmosDB
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question",chat_memory=cosmos)

In [27]:
# Testing using our Question
response = get_answer(llm=llm, docs=top_docs, query=QUESTION, language="English", chain_type=chain_type, 
                        memory=memory)
printmd(response['output_text'])

Reinforcement learning has several use cases across different domains. Here are some examples:

1. **Multi-robot domains**: Reinforcement learning can be used to study the behavior of multiple robots in a coordinated manner. For example, learning algorithms can be used to program robots to exhibit social behaviors and produce certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.

2. **Model-based learning**: Agents can build models of other agents through observations. This approach involves using models based on finite state machines and learning them effectively by observing the actions of other agents<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.

3. **Agent coordination**: Learning agents can be used to coordinate their actions and achieve system-wide optimal behavior. This can be done by using learning algorithms such as Q-learning or modified classifier systems<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.

4. **Market-based multi-agent systems**: Reinforcement learning can be applied to market-based multi-agent systems, where agents learn to optimize their behavior in market-like environments. This can help in understanding the effects of learning biases and models of other agents<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.

These are just a few examples, and reinforcement learning has applications in various other fields. Let me know if you need more information.

References:
[1]<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">source</a></sup>
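Why can the follow-up question in the next cell refer to this answer? Because on the next call the stored exchange is sent back to the model together with the retrieved documents. We don't reproduce `get_answer` or `COMBINE_CHAT_PROMPT_TEMPLATE` here; `build_chat_prompt` below is a hypothetical layout, assuming the real template combines sources, chat history, and the question in some similar way.

```python
from typing import List


def build_chat_prompt(question: str, chat_history: str, sources: List[str]) -> str:
    """Hypothetical prompt layout; the notebook's COMBINE_CHAT_PROMPT_TEMPLATE may differ."""
    # Number the retrieved chunks so the model can cite them as [1], [2], ...
    numbered = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(sources, start=1))
    # The stored conversation is what lets the model resolve follow-up questions
    history_block = chat_history or "(none)"
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Chat history:\n{history_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


first_turn = build_chat_prompt(
    "Tell me some use cases for reinforcement learning", "", ["<retrieved chunk>"]
)
follow_up = build_chat_prompt(
    "Can you expand on the second one?",  # illustrative, not the notebook's FOLLOW_UP_QUESTION
    "Human: Tell me some use cases for reinforcement learning\nAI: 1. Multi-robot domains ...",
    ["<retrieved chunk>"],
)
print(follow_up)
```

Without the history block, "the second one" would be meaningless to the model.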

In [28]:
# Now we add a follow up question:
response = get_answer(llm=llm, docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", chain_type=chain_type, 
                      memory=memory)
printmd(response['output_text'])

Based on our conversation, here are the main points:

1. Reinforcement learning has several use cases across different domains, including:
   - Multi-robot domains: Reinforcement learning can be used to study the behavior of multiple robots in a coordinated manner, such as programming them to exhibit social behaviors and produce certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.
   - Model-based learning: Agents can build models of other agents through observations, using models based on finite state machines and learning them effectively by observing the actions of other agents<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.
   - Agent coordination: Learning agents can be used to coordinate their actions and achieve system-wide optimal behavior, using learning algorithms such as Q-learning or modified classifier systems<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.
   - Market-based multi-agent systems: Reinforcement learning can be applied to market-based multi-agent systems, where agents learn to optimize their behavior in market-like environments, helping understand the effects of learning biases and models of other agents<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.

2. Consensus shuffling is a method used to significantly reduce random errors in synthetic DNA. It involves re-hybridization of the DNA population, fragmentation, and removal of mismatched fragments. PCR assembly of the remaining fragments yields a new population of full-length sequences enriched for the consensus sequence of the input population<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1072806/?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">source</a

Let's check our Azure Cosmos DB to see the whole conversation.


In [30]:
# Load the messages stored in Cosmos DB
cosmos.load_messages()
cosmos.messages

[HumanMessage(content='Tell me some use cases for reinforcement learning'),
 AIMessage(content='Reinforcement learning has several use cases across different domains. Here are some examples:\n\n1. **Multi-robot domains**: Reinforcement learning can be used to study the behavior of multiple robots in a coordinated manner. For example, learning algorithms can be used to program robots to exhibit social behaviors and produce certain group behaviors<sup><a href="https://blobstorageoqudertejfcqw.blob.core.windows.net/arxivcs/0001/0001008v2.pdf?sv=2023-01-03&ss=btqf&srt=sco&st=2023-11-25T09%3A09%3A24Z&se=2030-11-26T09%3A09%3A00Z&sp=rl&sig=1zCNOg4UIcVHew2GngrqYs%2FyF1Nq%2BnvD5nPf6Ka3k%2B0%3D" target="_blank">[1]</a></sup>.\n\n2. **Model-based learning**: Agents can build models of other agents through observations. This approach involves using models based on finite state machines and learning them effectively by observing the actions of other agents<sup><a href="https://blobstorageoqudertejf

![CosmosDB Memory](./images/cosmos-chathistory.png)
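Because the history lives in Cosmos DB rather than in the notebook process, it can be reloaded later with `load_messages()`, as in the cell above. The class below is a hypothetical in-memory stand-in that mimics the idea of one stored document per session holding the ordered messages. It is not the schema that `CosmosDBChatMessageHistory` actually writes.

```python
import json
from typing import Dict, List


class InMemoryChatStore:
    """Hypothetical stand-in for a Cosmos DB container: one JSON document per session.

    This only mimics persistent, per-session history; it is not the real schema
    used by CosmosDBChatMessageHistory.
    """

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}  # session_id -> serialized JSON document

    def append(self, session_id: str, role: str, content: str) -> None:
        # Read the session's document (or start a new one), add the message, write it back
        doc = json.loads(self._documents.get(session_id, '{"messages": []}'))
        doc["messages"].append({"role": role, "content": content})
        self._documents[session_id] = json.dumps(doc)

    def load(self, session_id: str) -> List[dict]:
        # Rebuild the ordered conversation for one session; unknown sessions are empty
        if session_id not in self._documents:
            return []
        return json.loads(self._documents[session_id])["messages"]


store = InMemoryChatStore()
store.append("session-1", "human", "Tell me some use cases for reinforcement learning")
store.append("session-1", "ai", "Reinforcement learning has several use cases ...")
store.append("session-2", "human", "Hello")
print([m["role"] for m in store.load("session-1")])  # ['human', 'ai']
```

Keying by session keeps separate conversations from mixing, while each session's messages come back in the order they were written.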

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something we must provide to the LLM as context for each question.

We added persistent memory using Azure Cosmos DB.

We can also notice that the current chain is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input, and it struggles to respond to prompts such as "Hello", "Thank you", "Bye", "What's your name?", "What's the weather?", or any other request that is not a search of the knowledge base.
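One way to see what is missing: nothing decides *whether* a search is needed before retrieval runs. The keyword check below is a deliberately naive, hypothetical placeholder (the patterns are our own choice) that lets greetings skip the knowledge-base search.

```python
import re

# Hypothetical small-talk patterns (our own choice), matched against lowercased input
SMALL_TALK_PATTERNS = [
    r"^(hi|hello|hey)\b",
    r"^(thanks|thank you)\b",
    r"^(bye|goodbye)\b",
    r"\bwhat'?s your name\b",
]


def needs_search(user_input: str) -> bool:
    """Naive heuristic: False for obvious small talk, True for everything else."""
    text = user_input.strip().lower()
    return not any(re.search(pattern, text) for pattern in SMALL_TALK_PATTERNS)


for sample_input in ["Hello", "Thank you", "What's the weather?",
                     "Tell me some use cases for reinforcement learning"]:
    print(f"{sample_input!r}: search={needs_search(sample_input)}")
```

Note that "What's the weather?" still gets routed to the knowledge base: a fixed list of rules cannot tell which requests need a different tool entirely.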


## <u>Important Note</u>:<br>
As we proceed, all the code will remain compatible with GPT-3.5 models, but we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can provide it with concise instructions, but it frequently struggles to follow them accurately. Additionally, its limited memory can make sustained conversations challenging.

**GPT-3.5-Turbo-16k** resembles the same 7-year-old, but with a longer attention span for longer instructions. However, it still fails to execute them accurately about half the time.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It possesses enhanced reasoning skills and more consistently adheres to instructions. While its memory retention for instructions is moderate, it excels at following them.

**GPT-4-32k** is akin to the 10-12-year-old child with an extended memory. It comprehends lengthy sets of instructions and engages in meaningful conversations. Thanks to its robust memory, it offers detailed responses.

This analogy will become clearer as you complete the final notebook.
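In the analogy above, "memory" and "attention span" stand for the model's context window. Everything we send, including the chat history that `ConversationBufferMemory` keeps accumulating, counts against that window, so a long enough conversation eventually has to be trimmed. A minimal sketch, assuming a crude one-token-per-word estimate instead of a real tokenizer (such as the one behind the notebook's `num_tokens_from_string` helper); `trim_history` and `approx_tokens` are hypothetical names.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def approx_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word (real tokenizers differ)
    return len(text.split())


def trim_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the most recent messages whose combined estimated size fits in max_tokens."""
    kept: List[Message] = []
    used = 0
    for role, content in reversed(messages):  # walk from newest to oldest
        cost = approx_tokens(content)
        if used + cost > max_tokens:
            break  # older messages no longer fit, so drop them
        kept.append((role, content))
        used += cost
    return list(reversed(kept))  # restore chronological order


sample_history = [
    ("Human", "Tell me some use cases for reinforcement learning"),
    ("AI", "Multi-robot domains, model-based learning, agent coordination"),
    ("Human", "Summarize our conversation"),
]
print(trim_history(sample_history, max_tokens=10))  # keeps only the last two messages
```

Dropping the oldest turns first is the simplest policy; a larger context window simply lets more turns survive before trimming kicks in.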


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this cover every scenario a virtual assistant will face? **What if the answer is not found in text, but instead requires looking into tabular data?** The next notebook explains and solves the tabular problem and introduces the concept of Agents.