# Understanding Memory in LLMs

In the previous notebooks, we explored how OpenAI models can enhance the results of Azure AI Search queries.

However, we have yet to discover how to engage in a conversation with the LLM. With [Bing Chat](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that LLMs (Large Language Models) have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from typing import List

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import CustomAzureSearchRetriever, get_answer
from common.prompts import DOCSEARCH_PROMPT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "Tell me some use cases for reinforcement learning"
FOLLOW_UP_QUESTION = "What was my prior question?"

In [4]:
COMPLETION_TOKENS = 1000
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that gives thorough responses to users."),
    ("user", "{input}")
])

In [6]:
# Let's see what the GPT model responds
chain = prompt | llm | output_parser
response_to_initial_question = chain.invoke({"input": QUESTION})
display(Markdown(response_to_initial_question))

Reinforcement learning is a type of machine learning that involves an agent learning to make decisions in an environment in order to maximize a reward. It has a wide range of use cases across various domains. Here are a few examples:

1. Game playing: Reinforcement learning has been successfully applied to games like chess, Go, and poker, where the agent learns to make optimal moves and strategies by playing against itself or human opponents.

2. Robotics: Reinforcement learning can be used to train robots to perform complex tasks, such as grasping objects, walking, or flying. The agent learns through trial and error, receiving rewards or penalties based on its actions.

3. Autonomous vehicles: Reinforcement learning can help train self-driving cars to make decisions in real-time, such as navigating through traffic, following traffic rules, and avoiding accidents.

4. Recommendation systems: Reinforcement learning can be used to personalize recommendations for users based on their preferences and feedback. The agent learns to optimize the recommendations by observing user behavior and feedback.

5. Supply chain management: Reinforcement learning can optimize decision-making in supply chain management, such as inventory management, pricing, and demand forecasting, to maximize profitability and customer satisfaction.

6. Healthcare: Reinforcement learning can assist in personalized treatment planning, drug dosage optimization, and clinical decision-making. The agent learns from patient data and medical guidelines to make informed decisions.

7. Finance: Reinforcement learning can be used for algorithmic trading, portfolio management, and risk assessment. The agent learns to make trading decisions based on market conditions and historical data.

8. Energy management: Reinforcement learning can optimize energy consumption and distribution in smart grids, helping to balance supply and demand, reduce costs, and improve efficiency.

These are just a few examples, and the applications of reinforcement learning are continually expanding as researchers and practitioners explore its potential in various fields.

In [7]:
#Now let's ask a follow up question
printmd(chain.invoke({"input": FOLLOW_UP_QUESTION}))

Your prior question was "What was my prior question?"

As you can see, the model doesn't remember what it just answered. Sometimes it responds based only on the system prompt, and sometimes seemingly at random. This demonstrates that the LLM does NOT have memory, and that we need to supply the memory as conversation history within the prompt, like this:

In [8]:
hist_prompt = ChatPromptTemplate.from_template(
"""
    {history}
    Human: {question}
    AI:
"""
)
chain = hist_prompt | llm | output_parser

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response_to_initial_question)

In [10]:
printmd(chain.invoke({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Your prior question was "Tell me some use cases for reinforcement learning."

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
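The idea above can be sketched without calling any model. In this minimal sketch, `fake_llm` is a hypothetical stand-in for the LLM (it can only look at the prompt text it receives) and `chat` is an illustrative helper, not part of LangChain. The only point it makes is that the caller owns the history and resends it on every turn.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs kept by the caller


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: it only sees the prompt text."""
    # Collect every human turn found in the prompt; the last one is the current question.
    human_turns = [line[len("Human: "):] for line in prompt.splitlines()
                   if line.startswith("Human: ")]
    previous = human_turns[:-1]
    if previous:
        return f'Your prior question was "{previous[-1]}"'
    return "I have no record of a prior question."


def chat(llm: Callable[[str], str], history: History, question: str) -> str:
    """Render the stored turns into the prompt, call the model, then store the new turn."""
    rendered = "".join(f"Human: {q}\nAI: {a}\n" for q, a in history)
    prompt = f"{rendered}Human: {question}\nAI:"
    answer = llm(prompt)
    history.append((question, answer))  # the state lives here, not in the model
    return answer


history: History = []
chat(fake_llm, history, "Tell me some use cases for reinforcement learning")
print(chat(fake_llm, history, "What was my prior question?"))
```

If the `history` list is emptied between the two calls, the second answer falls back to "I have no record of a prior question.", which mirrors the behavior seen earlier with the plain chain.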

## Now that we understand the concept of memory by adding history as context, let's go back to our GPT Smart Search engine

From the LangChain website:

A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can come from memory. A chain will interact with its memory system twice in a given run.

1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.
    
So this process adds delays to the response, but it is a necessary delay :)
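The READ and WRITE steps quoted above can be sketched in plain Python. This is only an illustration of the order of operations, not LangChain's implementation: `SimpleMemory`, `run_with_memory`, and `echo_logic` are hypothetical names, and the "core logic" is a toy function instead of an LLM call.

```python
from typing import Callable, Dict, List, Tuple


class SimpleMemory:
    """Hypothetical in-process memory holding (input, output) pairs."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def read(self) -> List[Tuple[str, str]]:
        # Return a copy so the core logic cannot mutate stored turns.
        return list(self.turns)

    def write(self, user_input: str, output: str) -> None:
        self.turns.append((user_input, output))


def run_with_memory(core_logic: Callable[[Dict[str, object]], str],
                    memory: SimpleMemory, user_input: str) -> str:
    # READ: after receiving the user input, before executing the core logic.
    inputs = {"history": memory.read(), "question": user_input}
    output = core_logic(inputs)
    # WRITE: after executing the core logic, before returning the answer.
    memory.write(user_input, output)
    return output


def echo_logic(inputs: Dict[str, object]) -> str:
    """Toy core logic: reports how many prior turns it was given."""
    turn_number = len(inputs["history"]) + 1
    return f"turn {turn_number}: {inputs['question']}"


memory = SimpleMemory()
print(run_with_memory(echo_logic, memory, "hello"))
print(run_with_memory(echo_logic, memory, "again"))
```

With a persistent store in place of `SimpleMemory`, both steps become network calls, which is where the extra latency mentioned above comes from.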

![image](https://python.langchain.com/assets/images/memory_diagram-0627c68230aa438f9b5419064d63cbbc.png)

In [13]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
index3_name = "cogsrch-index-books"
indexes = [index1_name, index2_name, index3_name]

In [15]:
# Create the function to retrieve the conversation

def get_session_history(session_id: str, user_id: str) -> CosmosDBChatMessageHistory:
    cosmos = CosmosDBChatMessageHistory(
        cosmos_endpoint=os.environ['AZURE_COSMOSDB_ENDPOINT'],
        cosmos_database=os.environ['AZURE_COSMOSDB_NAME'],
        cosmos_container=os.environ['AZURE_COSMOSDB_CONTAINER_NAME'],
        connection_string=os.environ['AZURE_COMOSDB_CONNECTION_STRING'],
        session_id=session_id,
        user_id=user_id
        )

    # prepare the cosmosdb instance
    cosmos.prepare_cosmos()
    return cosmos
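
The function above hands back a separate history object for each `(user_id, session_id)` pair. For local experiments, the same contract can be sketched in memory. The names `InMemoryHistory`, `_store`, and `get_local_history` are hypothetical, and this is not a substitute for the CosmosDB-backed class: nothing here survives a kernel restart.

```python
from typing import Dict, List, Tuple


class InMemoryHistory:
    """Hypothetical stand-in for a chat history: a list of (role, content) pairs."""

    def __init__(self) -> None:
        self.messages: List[Tuple[str, str]] = []

    def add_user_message(self, content: str) -> None:
        self.messages.append(("human", content))

    def add_ai_message(self, content: str) -> None:
        self.messages.append(("ai", content))


# One history per (user_id, session_id), so conversations never mix.
_store: Dict[Tuple[str, str], InMemoryHistory] = {}


def get_local_history(session_id: str, user_id: str) -> InMemoryHistory:
    """Return the existing history for this user and session, creating it on first use."""
    key = (user_id, session_id)
    if key not in _store:
        _store[key] = InMemoryHistory()
    return _store[key]


first = get_local_history("session1", "user1")
first.add_user_message("Tell me some use cases for reinforcement learning")
# The same IDs return the same object; different IDs start with an empty history.
print(get_local_history("session1", "user1") is first)
print(get_local_history("session2", "user1").messages)
```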


In [16]:
chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="user_id",
            annotation=str,
            name="User ID",
            description="Unique identifier for the user.",
            default="",
            is_shared=True,
        ),
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation.",
            default="",
            is_shared=True,
        ),
    ],
) | output_parser

In [17]:
# This is where we configure the session id and user id
random_session_id = "session" + str(random.randint(1, 1000))
random_user_id = "user" + str(random.randint(1, 1000))

config = {"configurable": {"session_id": random_session_id, "user_id": random_user_id}}

In [22]:
config

{'configurable': {'session_id': 'session996', 'user_id': 'user213'}}
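
Because the IDs are drawn from 1 to 1000, two runs can land on the same pair, and with a persistent store the new run may then pick up an earlier conversation. If that matters, one option is to derive the IDs from UUIDs. The following is a sketch; `new_id` and `example_config` are hypothetical names, not something the notebook defines.

```python
import uuid


def new_id(prefix: str) -> str:
    """Build an identifier such as 'session-<32 hex chars>' that is very unlikely to repeat."""
    return f"{prefix}-{uuid.uuid4().hex}"


# Same shape as the config used in this notebook, with UUID-based IDs.
example_config = {
    "configurable": {
        "session_id": new_id("session"),
        "user_id": new_id("user"),
    }
}
print(example_config)
```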

In [18]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

- Game playing: Reinforcement learning has been successfully applied to various games, such as chess, Go, and poker. It can learn optimal strategies and improve its performance through trial and error.
- Robotics: Reinforcement learning can be used to train robots to perform complex tasks, such as grasping objects, walking, or flying. The robots can learn through interaction with the environment and receive rewards or penalties based on their actions.
- Autonomous vehicles: Reinforcement learning can be used to train self-driving cars or drones. The vehicles can learn to navigate through traffic, make decisions, and avoid accidents by receiving rewards or penalties based on their actions.
- Recommendation systems: Reinforcement learning can be used to personalize recommendations for users. It can learn from user feedback and adapt the recommendations to improve user satisfaction.
- Resource management: Reinforcement learning can be used to optimize the allocation of resources in various domains, such as energy management, traffic control, or supply chain management. It can learn to make decisions that maximize efficiency and minimize costs.
- Healthcare: Reinforcement learning can be used to optimize treatment plans for patients. It can learn from patient data and medical guidelines to make personalized recommendations for treatments.
- Finance: Reinforcement learning can be used to optimize trading strategies in financial markets. It can learn to make decisions based on market data and maximize profits.
- Advertising: Reinforcement learning can be used to optimize online advertising campaigns. It can learn from user interactions and feedback to improve targeting and increase conversion rates.
- Control systems: Reinforcement learning can be used to optimize control systems in various domains, such as industrial processes, power grids, or smart buildings. It can learn to make decisions that optimize performance and energy efficiency.
- Natural language processing: Reinforcement learning can be used to train dialogue systems or chatbots. It can learn to generate responses that maximize user satisfaction and engagement.

In [19]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was "Tell me some use cases for reinforcement learning".

In [20]:
# Remembers
printmd(chain_with_history.invoke(
    {"question": "Can you tell me a one line summary of our conversation?"},
    config=config))

Sure! We discussed various use cases for reinforcement learning, including game playing, robotics, autonomous vehicles, recommendation systems, resource management, healthcare, finance, advertising, control systems, and natural language processing.

In [21]:
try:
    printmd(chain_with_history.invoke(
        {"question": "Thank you very much!"},
        config=config))
except Exception as e:
    print(e)

You're welcome! If you have any more questions, feel free to ask.

#### Let's check our Azure CosmosDB to see the whole conversation


![CosmosDB Memory](./images/cosmos-chathistory.png)

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something we must provide to the LLM in the form of context for the question.

We added persistent memory using CosmosDB.

We can also notice that the chain we are using is smart, but only up to a point. Although we have given it memory, it searches for similar documents every time, regardless of the input. This isn't efficient, but regardless, we are very close to finishing our first RAG "talk to your data" bot.




# NEXT
We now know how to build a Smart Search Engine that can power a chatbot. Great!

In the next notebook (Notebook 6), we are going to build our first RAG bot. To do this, we will introduce the concept of Agents.