# Understanding Memory in LLMs

In the previous Notebooks, we successfully explored how OpenAI models can enhance the results from Azure AI Search queries. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that LLMs (Large Language Models) have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from typing import List

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import CustomAzureSearchRetriever, get_answer
from common.prompts import DOCSEARCH_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "What was my prior question?"

In [8]:
COMPLETION_TOKENS = 1000
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=os.environ["GPT4o_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [9]:
# We create a very simple prompt template, just the question as is:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that give thorough responses to users."),
    ("user", "{input}")
])

In [10]:
# Let's see what the GPT model responds
chain = prompt | llm | output_parser
response_to_initial_question = chain.invoke({"input": QUESTION})
display(Markdown(response_to_initial_question))

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by taking actions in an environment to maximize some notion of cumulative reward. Here are some prominent use cases for RL across various domains:

### 1. **Gaming and Simulations**
   - **Game Playing:** RL has been famously used in games like Chess, Go, and video games. For example, DeepMind's AlphaGo and AlphaZero have demonstrated superhuman performance.
   - **Simulations:** RL can be used to train agents in simulated environments before deploying them in the real world, such as in robotic simulations or autonomous driving simulations.

### 2. **Robotics**
   - **Robot Control:** RL can be used to teach robots complex tasks such as walking, grasping objects, and performing assembly tasks.
   - **Autonomous Navigation:** RL helps in developing navigation systems for drones, self-driving cars, and other autonomous vehicles.

### 3. **Finance**
   - **Trading Algorithms:** RL can be applied to develop trading strategies that adapt to market conditions.
   - **Portfolio Management:** It can help in optimizing portfolios by learning to balance risk and return over time.

### 4. **Healthcare**
   - **Personalized Medicine:** RL can be used to tailor treatment plans for individual patients based on their responses to previous treatments.
   - **Drug Discovery:** It can help in optimizing the process of discovering new drugs by efficiently exploring the chemical space.

### 5. **Energy Management**
   - **Smart Grids:** RL can optimize the distribution of electricity in smart grids, balancing supply and demand in real-time.
   - **Energy-efficient Buildings:** It can be used to manage heating, ventilation, and air conditioning (HVAC) systems to minimize energy consumption while maintaining comfort.

### 6. **Natural Language Processing (NLP)**
   - **Dialogue Systems:** RL can be used to train chatbots and virtual assistants to have more natural and effective conversations.
   - **Text Summarization:** It can help in generating summaries that maximize the informativeness and coherence of the text.

### 7. **Marketing and Advertising**
   - **Personalized Recommendations:** RL can optimize recommendation systems for e-commerce platforms, streaming services, etc.
   - **Ad Placement:** It can help in deciding the best placement and timing of ads to maximize engagement and revenue.

### 8. **Manufacturing**
   - **Process Optimization:** RL can optimize manufacturing processes to improve efficiency and reduce waste.
   - **Predictive Maintenance:** It can help in predicting equipment failures and scheduling maintenance to minimize downtime.

### 9. **Telecommunications**
   - **Network Optimization:** RL can be used to optimize the allocation of resources in telecommunications networks to improve performance and reduce costs.
   - **Traffic Management:** It can help in managing network traffic to avoid congestion and ensure quality of service.

### 10. **Transportation**
   - **Traffic Signal Control:** RL can optimize the timing of traffic signals to reduce congestion and improve traffic flow.
   - **Fleet Management:** It can help in optimizing routes and schedules for delivery trucks, taxis, public transportation, etc.

### 11. **Education**
   - **Personalized Learning:** RL can be used to develop adaptive learning systems that tailor educational content to the needs of individual students.
   - **Tutoring Systems:** It can help in creating intelligent tutoring systems that provide personalized feedback and guidance.

### 12. **Resource Management**
   - **Supply Chain Optimization:** RL can optimize various aspects of the supply chain, from inventory management to logistics.
   - **Water Resource Management:** It can be used to optimize the allocation and use of water resources.

These are just a few examples, and the potential applications of reinforcement learning are vast and continually expanding as the technology matures and evolves.

In [11]:
#Now let's ask a follow up question
printmd(chain.invoke({"input": FOLLOW_UP_QUESTION}))

I'm sorry, but I don't have access to previous interactions or any prior questions you've asked. How can I assist you today?

As you can see, the model doesn't remember what it just answered. Sometimes it responds based only on the system prompt, or seemingly at random. This demonstrates that the LLM does NOT have memory, and that we need to supply the memory ourselves by including the conversation history in the prompt, like this:

In [12]:
hist_prompt = ChatPromptTemplate.from_template(
"""
    {history}
    Human: {question}
    AI:
"""
)
chain = hist_prompt | llm | output_parser

In [13]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response_to_initial_question)

In [14]:
printmd(chain.invoke({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Your prior question was: "Tell me some use cases for reinforcement learning."

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context on every call.
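To make the idea concrete without calling a real model, here is a minimal sketch of a chat loop that keeps the history and rebuilds the prompt on every turn. The helper names (`format_history`, `build_prompt`, `chat`) and the `fake_llm` function are hypothetical and exist only for illustration. `fake_llm` is not a model: it simply inspects the prompt text it receives, which is also all a real LLM can "see".

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (question, answer)


def format_history(turns: List[Turn]) -> str:
    """Render past turns in the same Human/AI layout used in the prompt above."""
    return "\n".join(f"Human: {q}\nAI: {a}" for q, a in turns)


def build_prompt(turns: List[Turn], question: str) -> str:
    """Prepend the conversation so far to the new question."""
    history = format_history(turns)
    return f"{history}\nHuman: {question}\nAI:".lstrip("\n")


def chat(llm: Callable[[str], str], turns: List[Turn], question: str) -> str:
    """One conversational turn: build the prompt, call the model, record the turn."""
    answer = llm(build_prompt(turns, question))
    turns.append((question, answer))  # the application, not the model, keeps state
    return answer


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: it only knows what is in the prompt."""
    human_lines = [line for line in prompt.splitlines() if line.startswith("Human: ")]
    if len(human_lines) < 2:
        return "I don't have access to any prior question."
    return "Your prior question was: " + human_lines[-2][len("Human: "):]


turns: List[Turn] = []
first = chat(fake_llm, turns, "Tell me some use cases for reinforcement learning")
second = chat(fake_llm, turns, "What was my prior question?")
print(second)
```

The stand-in can answer the follow-up only because the earlier turn was written into the prompt. Drop the `turns` list and it falls back to the "no prior question" reply.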

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

From the LangChain website:
    
A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can come from memory. A chain will interact with its memory system twice in a given run.

- AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory system and augment the user inputs.
- AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.
    
So this process adds delays to the response, but it is a necessary delay :)

![image](./images/memory_diagram.png)
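The READ-then-WRITE cycle described above can be sketched in a few lines of plain Python. This is only an illustration of the ordering, not LangChain's implementation: `ListMemory`, `run_with_memory` and `echo_logic` are hypothetical names, and `echo_logic` stands in for whatever the chain's core logic would be.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)


class ListMemory:
    """Hypothetical minimal memory: an ordered list of (role, text) messages."""

    def __init__(self) -> None:
        self.entries: List[Message] = []

    def read(self) -> List[Message]:
        # Return a copy so the core logic cannot mutate the stored history
        return list(self.entries)

    def write(self, user_input: str, output: str) -> None:
        self.entries.append(("human", user_input))
        self.entries.append(("ai", output))


def run_with_memory(core_logic: Callable[[Dict], str],
                    memory: ListMemory,
                    user_input: str) -> str:
    # READ: after receiving the user input, before executing the core logic
    inputs = {"question": user_input, "history": memory.read()}
    output = core_logic(inputs)
    # WRITE: after executing the core logic, before returning the answer
    memory.write(user_input, output)
    return output


def echo_logic(inputs: Dict) -> str:
    """Stand-in for the core logic; reports how much history it was given."""
    return f"{len(inputs['history'])} earlier messages; you said: {inputs['question']}"


memory = ListMemory()
r1 = run_with_memory(echo_logic, memory, "hello")
r2 = run_with_memory(echo_logic, memory, "again")
print(r1)
print(r2)
```

The second call sees two earlier messages (the first question and its answer), which is exactly the state the first run wrote before returning.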

In [15]:
index1_name = "srch-index-files"
index2_name = "srch-index-csv"
index3_name = "srch-index-books"
indexes = [index1_name, index2_name, index3_name]

In [16]:
# Initialize our custom retriever 
retriever = CustomAzureSearchRetriever(indexes=indexes, topK=10, reranker_threshold=1)

If you look closely at prompts.py, there is an optional variable in `DOCSEARCH_PROMPT` called `history`. Now is the time to use it. It is a placeholder where we inject the conversation into the prompt so that the LLM is aware of it before it answers.

**Now let's add memory to it:**

In [17]:
store = {} # Our first memory will be a dictionary in memory

# We have to define a custom function that takes a session_id and looks somewhere
# (in this case in a dictionary in memory) for the conversation
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


In [18]:
# We use our original chain with the retriever, but without the StrOutputParser
chain = (
    {
        "context": itemgetter("question") | retriever, 
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | DOCSEARCH_PROMPT
    | llm
)

## Then we pass the above chain to another chain that adds memory to it

output_parser = StrOutputParser()

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
) | output_parser

In [19]:
# This is where we configure the session id
config={"configurable": {"session_id": "abc123"}}

Notice below that the call only passes `question`. `RunnableWithMessageHistory` fills in the `history` variable for us (through `history_messages_key`), so the chat history ends up inside the prompt.

In [20]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

Reinforcement learning (RL) has a wide range of applications across various domains due to its ability to learn optimal policies through trial and error. Here are some notable use cases:

1. **Games and Simulations:**
   - **Backgammon:** Tesauro applied the temporal difference algorithm to backgammon, using a backpropagation-based neural network to approximate the value function. This approach allowed the system to learn by self-play and achieve a high level of proficiency, even competing at the top level of international human play [[1]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).
   - **Chess:** Similar to backgammon, chess can be learned by reinforcement learning through example games presented in the form of sensible (board-state, move) sequences. This method helps the system learn legal and good moves by evaluating its own moves after several games [[2]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/0004/0004001v1.pdf).

2. **Robotics:**
   - **Juggling Robot:** Schaal and Atkeson developed a two-armed robot that learns to juggle a device known as a devil-stick. The robot uses a combination of dynamic programming and locally weighted regression to improve its juggling policy from experience [[1]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).
   - **Mobile Robot Navigation:** Mahadevan and Connell discussed a task where a mobile robot learns to push large boxes for extended periods, showcasing RL's applicability in physical interaction and navigation tasks [[1]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).

3. **Control Systems:**
   - **Adaptive Control:** RL is used in adaptive control systems where the goal is to improve a sequence of decisions from experience. This is particularly useful in dynamic systems where states and actions are vectors, and system dynamics are smooth [[3]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).

4. **Epidemic Modeling:**
   - **Microscopic Multi-Agent Epidemic Model:** RL can be used to model epidemics by simulating individual agents' decisions that affect the spread of the disease. This approach helps in predicting the spread and identifying necessary external interventions to regulate behaviors [[4]](https://arxiv.org/pdf/2004.12959v1.pdf).

These examples illustrate the versatility of reinforcement learning in tackling complex problems across different fields by enabling systems to learn optimal behaviors through interactions with their environments.

In [21]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was: "Tell me some use cases for reinforcement learning."

In [22]:
# Remembers
printmd(chain_with_history.invoke({"question": "Thank you! Good bye"},config=config))

You're welcome! If you have any more questions in the future, feel free to ask. Goodbye!

## Using CosmosDB as persistent memory

In the previous cells we added local in-memory (RAM) history to our chatbot. However, it is not persistent: it is lost once the user's app session ends. We therefore need a database to persist each user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations in the future.

Here we will store the conversation history in CosmosDB for future auditing purposes.
We will use the LangChain class `CosmosDBChatMessageHistory`.
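Before wiring up CosmosDB, it may help to see the same idea with a database from the Python standard library. The sketch below is an assumption-laden stand-in, not the CosmosDB API: `SQLiteChatHistory` is a hypothetical class that stores messages in SQLite, keyed by `user_id` and `session_id`, to show why a history object recreated later still finds the earlier conversation.

```python
import sqlite3
from typing import List, Tuple


class SQLiteChatHistory:
    """Hypothetical stand-in for a persistent chat history, keyed by user and session."""

    def __init__(self, conn: sqlite3.Connection, user_id: str, session_id: str) -> None:
        self.conn = conn
        self.user_id = user_id
        self.session_id = session_id
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "user_id TEXT, session_id TEXT, role TEXT, content TEXT)"
        )

    def add_message(self, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (user_id, session_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (self.user_id, self.session_id, role, content),
        )
        self.conn.commit()

    def messages(self) -> List[Tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE user_id = ? AND session_id = ? ORDER BY id",
            (self.user_id, self.session_id),
        )
        return rows.fetchall()


# ":memory:" keeps this example self-contained; a file path would survive restarts
conn = sqlite3.connect(":memory:")

history = SQLiteChatHistory(conn, "user1", "session1")
history.add_message("human", "Tell me some use cases for reinforcement learning")
history.add_message("ai", "Games, robotics, ...")

# A fresh object with the same ids reads the same conversation from the database
reopened = SQLiteChatHistory(conn, "user1", "session1")
# A different session for the same user starts empty
other_session = SQLiteChatHistory(conn, "user1", "session2")

print(reopened.messages())
print(other_session.messages())
```

The state lives in the database rather than in the Python object, which is the property we want from the CosmosDB-backed history: any process that knows the user and session ids can pick the conversation back up.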

In [23]:
# Create the function to retrieve the conversation

def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos


In [24]:
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        ),
    ],
) | output_parser

In [25]:
# This is where we configure the session id and user id
random_session_id = "session"+ str(random.randint(1, 1000))
random_user_id = "user"+ str(random.randint(1, 1000))

config={"configurable": {"session_id": random_session_id, "user_id": random_user_id}}

In [26]:
config

{'configurable': {'session_id': 'session57', 'user_id': 'user779'}}

In [27]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

Reinforcement learning (RL) has a wide range of applications across various domains due to its ability to learn optimal policies through interaction with the environment. Here are some notable use cases:

1. **Games and Simulations:**
   - **Backgammon:** Tesauro applied the temporal difference algorithm to backgammon, creating a program called TD-Gammon. This program used a neural network as a function approximator for the value function and was trained through self-play. Despite its simplistic exploration strategy, TD-Gammon achieved a high level of play, competing at the top level of international human play [[1]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).
   - **Chess:** Chess can be learned by reinforcement learning, although the learning rate can be impractically slow due to the sparse feedback (only knowing if a move was good or bad at the end of the game). A more practical method involves presenting example games and asking the system to make its own moves based on these examples [[2]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/0004/0004001v1.pdf).

2. **Robotics and Control:**
   - **Juggling Robot:** Schaal and Atkeson developed a two-armed robot that learns to juggle a devil-stick, a complex non-linear control task. The robot learned from experience and used a function approximation scheme known as locally weighted regression to generalize to unvisited states, improving its policy through dynamic programming techniques [[3]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).
   - **Mobile Robots:** Mahadevan and Connell discussed tasks where a mobile robot pushes large boxes for extended periods, showcasing the use of RL in physical tasks that require continuous learning and adaptation [[3]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).

3. **Healthcare:**
   - **Epidemic Modeling:** A microscopic multi-agent epidemic model uses RL to determine optimal activity levels for individuals to minimize the spread of disease. This model can predict the spread of disease based on individual decisions and highlight the need for external interventions when infected agents do not have enough incentives to protect others [[4]](https://arxiv.org/pdf/2004.12959v1.pdf).

4. **Adaptive Control Systems:**
   - **Dynamic Systems:** In adaptive control, RL is used to improve a sequence of decisions from experience, especially in dynamic systems where states and actions are vectors and system dynamics are smooth. This approach is common in systems that require robust, practical algorithms for real-world deployment [[5]](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/9605/9605103v1.pdf).

These examples illustrate the versatility of reinforcement learning in solving complex problems across various fields by learning optimal policies through interaction with the environment and feedback mechanisms.



In [28]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was: "Tell me some use cases for reinforcement learning."

In [29]:
# Remembers
printmd(chain_with_history.invoke(
    {"question": "Can you tell me a one line summary of our conversation?"},
    config=config))

We discussed various use cases of reinforcement learning, including applications in games, robotics, healthcare, and adaptive control systems.

In [30]:
try:
    printmd(chain_with_history.invoke(
    {"question": "Thank you very much!"},
    config=config))
except Exception as e:
    print(e)

You're welcome! If you have any more questions, feel free to ask. Have a great day!

In [31]:
printmd(chain_with_history.invoke(
    {"question": "I do have one more question, why did you give me a one line summary?"},
    config=config))

I provided a one-line summary of our conversation because you requested it in your previous message. If you need more detailed information or have another question, feel free to ask!

In [32]:
printmd(chain_with_history.invoke(
    {"question": "why not 2?"},
    config=config))

I apologize for any confusion. If you would like a more detailed summary, here it is in two lines:

We discussed various use cases of reinforcement learning, including applications in games like backgammon and chess, robotics such as juggling robots and mobile robots, healthcare through epidemic modeling, and adaptive control systems. These examples illustrate the versatility of reinforcement learning in solving complex problems across various fields.

Feel free to ask if you need more information or have any other questions!

#### Let's check our Azure CosmosDB to see the whole conversation


![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to hold a conversation. However, memory is not something that comes with the LLM; we must provide it to the LLM as context alongside the question.

We added persistent memory using CosmosDB.

We can also notice that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input. This is not efficient, but we are nonetheless very close to finishing our first RAG "talk to your data" bot.


## <u>Important Note</u>:<br>
As we proceed, while all the code will remain compatible with GPT-3.5 models (1106 or newer), we highly recommend transitioning to GPT-4. Here's why:

**GPT-3.5-Turbo** can be likened to a 7-year-old child. You can give it concise instructions, but it sometimes struggles to follow them accurately (it is not very reliable). Additionally, its limited "memory" (token context) can make sustained conversations challenging. Its responses are also simple rather than deep.

**GPT-4** exhibits the capabilities of a 10-12-year-old child. It has enhanced reasoning skills, adheres to instructions consistently, and gives better answers. It has extended memory retention (a larger context size) for instructions, and it excels at following them. Its responses are deep and thorough.


# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

In the next notebook (Notebook 6), we are going to build our first RAG bot. To do this, we will introduce the concept of Agents.