# Understanding Memory in LLMs

In the previous Notebook 03, we explored how OpenAI models can enhance the results from Azure Cognitive Search. [Bing Chat](http://chat.bing.com/) works in a similar way: it is built on a GPT-4 model that uses the content of search results as context to deliver accurate responses to queries.

However, we have yet to discover how to engage in a conversation with the LLM. With Bing Chat, this is possible, as the LLM can understand and reference the previous responses.

There is a common misconception that GPT models have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

The aim of this Notebook is to demonstrate how we can "provide memory" to the LLM by utilizing prompts and context.

In [1]:
import os
import random
from collections import OrderedDict
from IPython.display import display, HTML
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import AzureChatOpenAI
from openai.error import OpenAIError
from langchain.docstore.document import Document

from IPython.display import Markdown  # display is already imported above

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from app.utils import (
    get_search_results,
    order_search_results,
    model_tokens_limit,
    num_tokens_from_docs,
    embed_docs,
    search_docs,
    get_answer,
)

from app.prompts import COMBINE_QUESTION_PROMPT, COMBINE_PROMPT, COMBINE_CHAT_PROMPT

# Don't mess with this unless you really know what you are doing
AZURE_SEARCH_API_VERSION = '2021-04-30-Preview'
AZURE_OPENAI_API_VERSION = "2023-03-15-preview"

# Replace the values below with your own service credentials
AZURE_SEARCH_ENDPOINT = "Enter your Azure Cognitive Search Endpoint ..."
AZURE_SEARCH_KEY = "Enter your Azure Cognitive Search Key ..."
AZURE_OPENAI_ENDPOINT = "Enter your Azure OpenAI Endpoint ..."
AZURE_OPENAI_API_KEY = "Enter your Azure OpenAI Key ..."

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_BASE"] = os.environ["AZURE_OPENAI_ENDPOINT"] = AZURE_OPENAI_ENDPOINT
os.environ["OPENAI_API_KEY"] = os.environ["AZURE_OPENAI_API_KEY"] = AZURE_OPENAI_API_KEY
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"] = AZURE_OPENAI_API_VERSION
os.environ["AZURE_SEARCH_KEY"] = AZURE_SEARCH_KEY
os.environ["AZURE_SEARCH_ENDPOINT"] = AZURE_SEARCH_ENDPOINT
os.environ["OPENAI_API_TYPE"] = "azure"

### Let's start with the basics
Let's use a very simple example to see whether the GPT models of Azure OpenAI have memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning?"
FOLLOW_UP_QUESTION = "Can you rephrase what you just said?"

In [4]:
# Define model
MODEL = "gpt-35-turbo"
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=MODEL, temperature=0.5)

In [5]:
# We create a very simple prompt template, just the question as is:
prompt = PromptTemplate(
    input_variables=["question"],
    template="{question}",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [6]:
# Let's see what the GPT model responds
response = chain.run(QUESTION)
printmd(response)

1. Robotics: Reinforcement learning can be used to train robots to perform complex tasks such as object recognition, grasping, and manipulation.

2. Autonomous vehicles: Reinforcement learning can be used to train self-driving cars to navigate through complex environments and make decisions in real-time.

3. Game AI: Reinforcement learning can be used to train game agents to play games such as chess, Go, and poker at a professional level.

4. Recommendation systems: Reinforcement learning can be used to personalize recommendations for users based on their preferences and behavior.

5. Energy management: Reinforcement learning can be used to optimize energy consumption in buildings, factories, and other industrial settings.

6. Finance: Reinforcement learning can be used to optimize investment portfolios, predict stock prices, and detect fraudulent transactions.

7. Healthcare: Reinforcement learning can be used to develop personalized treatment plans for patients based on their medical history and symptoms.

8. Advertising: Reinforcement learning can be used to optimize ad placement and targeting to maximize engagement and conversions.

In [7]:
#Now let's ask a follow up question
chain.run(FOLLOW_UP_QUESTION)

'As an AI language model, I cannot rephrase what I just said as I have no memory of my previous response. However, if you provide me with the specific sentence or phrase you want me to rephrase, I can certainly do that for you.'

As you can see, it doesn't remember what it just responded. This is proof that the LLM does NOT have memory, and that we need to supply the memory ourselves by passing the conversation history as part of the prompt, like this:

In [8]:
hist_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template="""
                {history}
                Human: {question}
                AI:
            """
    )
chain = LLMChain(llm=llm, prompt=hist_prompt)

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response)

In [10]:
chain.run({"history":Conversation_history, "question": FOLLOW_UP_QUESTION})

'Sure, here are some examples of how reinforcement learning can be applied:\n\n1. Training robots to perform complex tasks like object recognition, grasping, and manipulation.\n2. Teaching self-driving cars to navigate through complex environments and make decisions in real-time.\n3. Developing game agents that can play games like chess, Go, and poker at a professional level.\n4. Personalizing recommendations for users based on their preferences and behavior.\n5. Optimizing energy consumption in buildings and factories.\n6. Predicting stock prices, detecting fraudulent transactions, and optimizing investment portfolios in finance.\n7. Developing personalized treatment plans for patients based on their medical history and symptoms in healthcare.\n8. Optimizing ad placement and targeting to maximize engagement and conversions in advertising.'

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context with every request.
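To make the pattern concrete, here is a minimal sketch of a chat loop that carries the history forward on every call. The names `format_history`, `ask`, and `fake_llm` are hypothetical and not part of this repository or of LangChain; `fake_llm` is only a stand-in for a real model call so the example runs without credentials.

```python
from typing import Callable, List, Tuple

def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render past (question, answer) pairs in the Human/AI layout used above."""
    return "\n".join(f"Human: {q}\nAI: {a}" for q, a in turns)

def ask(llm: Callable[[str], str], turns: List[Tuple[str, str]], question: str) -> str:
    """Send the whole conversation plus the new question, then record the turn."""
    history = format_history(turns)
    new_turn = f"Human: {question}\nAI:"
    prompt = f"{history}\n{new_turn}" if history else new_turn
    answer = llm(prompt)
    turns.append((question, answer))
    return answer

# Hypothetical stand-in for a real model: it reports how much history it was given
def fake_llm(prompt: str) -> str:
    return f"I can see {prompt.count('Human:') - 1} earlier question(s)."

turns: List[Tuple[str, str]] = []
print(ask(fake_llm, turns, "Tell me some use cases for reinforcement learning?"))
print(ask(fake_llm, turns, "Can you rephrase what you just said?"))
```

The model call itself stays stateless; the only "memory" is the `turns` list that the application keeps and replays in each prompt.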

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

To avoid duplicating code, we have moved much of the code from Notebook 03 into functions. These functions live in the app/utils.py and app/prompts.py files, so we can reuse them in the app that we will build later.

In [11]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index1_name, index2_name]

agg_search_results = get_search_results(QUESTION, indexes)
ordered_results = order_search_results(agg_search_results, reranker_threshold=1)

In [12]:
docs = []
for key,value in ordered_results.items():
    for page in value["chunks"]:
        docs.append(Document(page_content=page, metadata={"source": value["location"]}))

# Look up the custom token limit configured for this model
tokens_limit = model_tokens_limit(MODEL)

if docs:
    # Count the tokens across all retrieved chunks
    num_tokens = num_tokens_from_docs(docs)
    print("Custom token limit for", MODEL, ":", tokens_limit)
    print("Combined docs tokens count:",num_tokens)
else:
    # Default to 0 so the next cell can still compare against the limit
    num_tokens = 0
    print("NO RESULTS FROM AZURE SEARCH")


Custom token limit for gpt-35-turbo : 3000
Combined docs tokens count: 80428


In [13]:
%%time
if num_tokens > tokens_limit:
    index = embed_docs(docs)
    top_docs = search_docs(index,QUESTION)
    
    # Now we need to recalculate the tokens count of the top results from similarity vector search
    # in order to select the chain type: stuff or map_reduce
    
    num_tokens = num_tokens_from_docs(top_docs)   
    print("Token count after similarity search:", num_tokens)
    chain_type = "map_reduce" if num_tokens > tokens_limit else "stuff"
    
else:
    # if total tokens is less than our limit, we don't need to vectorize and do similarity search
    top_docs = docs
    chain_type = "stuff"
    
print("Chain Type selected:", chain_type)

Number of chunks: 70


Token count after similarity search: 4693
Chain Type selected: map_reduce
CPU times: user 18.5 s, sys: 1.99 s, total: 20.5 s
Wall time: 9.33 s
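The selection rule used in the cell above can be isolated into a small helper. In LangChain, the `stuff` chain type places all documents into a single prompt, while `map_reduce` runs the prompt over each document separately and then combines the partial answers, so it is the fallback when the documents do not fit within the limit. The helper name `select_chain_type` is hypothetical; this is a sketch of the rule, not code from app/utils.py.

```python
def select_chain_type(num_tokens: int, tokens_limit: int) -> str:
    """Pick "stuff" when the documents fit in one prompt, else "map_reduce"."""
    return "map_reduce" if num_tokens > tokens_limit else "stuff"

# Values from the run above: 4693 tokens against a limit of 3000
print(select_chain_type(4693, 3000))  # map_reduce

# Hypothetical smaller result set that fits within the limit
print(select_chain_type(1200, 3000))  # stuff
```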


In [14]:
# Get the answer
response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type)
response['output_text']

'The documents do not provide a comprehensive list of use cases for reinforcement learning, but they do provide information on different techniques and approaches that could be applied to various use cases. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf'

And if we ask the follow up question:

In [15]:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type)
response['output_text']

'The given portion of the document does not contain a statement that can be rephrased. It mainly discusses reinforcement learning techniques and their applications. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf'

So far we have the same thing as in the prior Notebook 03: results from Azure Search enhanced by an OpenAI model, with no memory.

**Now let's add memory to it:**

Reference: https://python.langchain.com/en/latest/modules/memory/examples/adding_memory_chain_multiple_inputs.html

In [20]:
# Memory object, which is necessary to track the inputs/outputs and hold a conversation.
memory = ConversationBufferMemory(memory_key="chat_history",input_key="question")

response = get_answer(docs=top_docs, query=QUESTION, language="English", deployment=MODEL, chain_type=chain_type, 
                      memory=memory)
response['output_text']

'Reinforcement learning techniques can be used in a variety of applications, such as robotics, game playing, and decision-making. While there is no single specific use case mentioned in the provided document, optimal decision thresholds for the multi-armed bandit problem is a common use case for reinforcement learning. Some algorithms have been used in a variety of applications, including the AHC architecture. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf'
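Conceptually, the memory object does little more than append each question and answer to a growing text buffer. Because the chain receives several inputs (the documents, the language, the question), `input_key="question"` tells the memory which of them to record as the human turn. The class below is a simplified, hypothetical stand-in written for illustration; it is not LangChain's implementation, and the `output_text` default is an assumption based on the response key used in this notebook.

```python
from typing import Dict, List

class TinyBufferMemory:
    """Simplified, hypothetical stand-in for a conversation buffer memory."""

    def __init__(self, input_key: str, output_key: str = "output_text"):
        self.input_key = input_key
        self.output_key = output_key
        self.lines: List[str] = []

    def save_context(self, inputs: Dict[str, object], outputs: Dict[str, str]) -> None:
        # Record only the user's question, not the documents or other inputs
        self.lines.append(f"Human: {inputs[self.input_key]}")
        self.lines.append(f"AI: {outputs[self.output_key]}")

    @property
    def buffer(self) -> str:
        # The full conversation as plain text, ready to be placed in a prompt
        return "\n".join(self.lines)

tiny_memory = TinyBufferMemory(input_key="question")
tiny_memory.save_context(
    {"question": "Thank you", "language": "English", "input_documents": []},
    {"output_text": "You're welcome!"},
)
print(tiny_memory.buffer)
```

Sharing one memory instance across calls is what lets each new question see the earlier turns.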

In [21]:
# Now we add a follow up question:
response = get_answer(docs=top_docs, query=FOLLOW_UP_QUESTION, language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

'I apologize, I misunderstood your previous question. To rephrase it, reinforcement learning techniques can be applied in various scenarios, including robotics, game playing, and decision-making. However, it can be challenging to scale these techniques for larger problems. Incorporating bias into the learning process is necessary to solve highly complex problems. The exploration/exploitation tradeoff is a crucial aspect of reinforcement learning and depends on the problem formulation and model definition. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf'

In [22]:
# Another follow up query
response = get_answer(docs=top_docs, query="Thank you", language="English", deployment=MODEL, chain_type=chain_type,  
                      memory=memory)
response['output_text']

"You're welcome! If you have any further questions, feel free to ask. \nSOURCES: N/A"

Let's check the memory object to confirm that it is keeping the conversation:

In [23]:
memory.buffer

"Human: Tell me some use cases for reinforcement learning?\nAI: Reinforcement learning techniques can be used in a variety of applications, such as robotics, game playing, and decision-making. While there is no single specific use case mentioned in the provided document, optimal decision thresholds for the multi-armed bandit problem is a common use case for reinforcement learning. Some algorithms have been used in a variety of applications, including the AHC architecture. \nSOURCES: https://demodatasetsp.blob.core.windows.net/arxivcs/9605/9605103v1.pdf, https://demodatasetsp.blob.core.windows.net/arxivcs/0604/0604010v1.pdf\nHuman: Can you rephrase what you just said?\nAI: I apologize, I misunderstood your previous question. To rephrase it, reinforcement learning techniques can be applied in various scenarios, including robotics, game playing, and decision-making. However, it can be challenging to scale these techniques for larger problems. Incorporating bias into the learning process i

# Summary
##### Adding memory to our application allows the user to have a conversation. However, memory is not a feature that comes with the LLM; it is something that we must provide to the LLM in the form of context.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

But does this solve all the scenarios that a virtual assistant will face? **What if the answer to a question is not found in text, but instead requires looking into tabular data? Or what if the assistant needs to answer, for example, a question about recent weather events?** The next notebook 04 explains and solves the tabular problem and introduces the concept of Agents.