# Understanding Memory in LLMs

In the previous Notebooks, we successfully explored how OpenAI models can enhance the results from Azure AI Search queries. 

However, we have yet to discover how to engage in a conversation with the LLM. With [Microsoft Copilot](http://chat.bing.com/), for example, this is possible, as it can understand and reference the previous responses.

There is a common misconception that LLMs (Large Language Models) have memory. This is not true. While they possess knowledge, they do not retain information from previous questions asked to them.

In this Notebook, our goal is to illustrate how we can effectively "endow the LLM with memory" by employing prompts and context.

In [1]:
import os
import random
from langchain_community.chat_message_histories import ChatMessageHistory, CosmosDBChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from typing import List

from IPython.display import Markdown, HTML, display  

def printmd(string):
    display(Markdown(string))

#custom libraries that we will use later in the app
from common.utils import CustomAzureSearchRetriever, get_answer
from common.prompts import DOCSEARCH_PROMPT_TEXT

from dotenv import load_dotenv
load_dotenv("credentials.env")

import logging

# Get the root logger
logger = logging.getLogger()
# Set the logging level to a higher level to ignore INFO messages
logger.setLevel(logging.WARNING)

In [2]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

### Let's start with the basics
Let's use a very simple example to see whether the Azure OpenAI GPT model has memory. We will again use LangChain to simplify our code.

In [3]:
QUESTION = "tell me chinese medicines that help fight covid-19"
FOLLOW_UP_QUESTION = "What was my prior question?"

In [4]:
COMPLETION_TOKENS = 1000
# Create an OpenAI instance
llm = AzureChatOpenAI(deployment_name=os.environ["GPT4o_DEPLOYMENT_NAME"], 
                      temperature=0.5, max_tokens=COMPLETION_TOKENS)

In [5]:
# We create a very simple prompt template, just the question as is:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that give thorough responses to users."),
    ("user", "{input}")
])

In [6]:
# Let's see what the GPT model responds
chain = prompt | llm | output_parser
response_to_initial_question = chain.invoke({"input": QUESTION})
display(Markdown(response_to_initial_question))

During the COVID-19 pandemic, several Traditional Chinese Medicine (TCM) formulations were studied and used in China as part of an integrated approach to treatment, often alongside conventional Western medicine. It is important to note that while some Chinese medicines showed potential in alleviating symptoms and supporting recovery, **no herbal remedy has been proven to cure or prevent COVID-19**. Their use should always be under the guidance of a qualified healthcare professional, and not as a substitute for vaccination or evidence-based medical care.

### Commonly Used Chinese Medicines for COVID-19 (Based on Chinese Guidelines and Studies)

#### 1. **Lianhua Qingwen (连花清瘟胶囊/颗粒)**
- **Composition:** A patented formula with ingredients like Forsythia, Honeysuckle, Ephedra, Isatis root, and others.
- **Reported Effects:** May help relieve fever, cough, and fatigue; has some anti-inflammatory and antiviral properties.
- **Evidence:** Some clinical studies in China suggested symptom improvement, but robust international evidence is still limited.

#### 2. **Jinhua Qinggan Granules (金花清感颗粒)**
- **Composition:** Comprised of herbs such as Honeysuckle, Forsythia, Ephedra, and Licorice root.
- **Reported Effects:** Used for fever, cough, and sore throat in mild cases.
- **Evidence:** Included in Chinese national guidelines for mild COVID-19; some small studies suggest benefit for symptom relief.

#### 3. **Xuebijing Injection (血必净注射液)**
- **Composition:** An injectable TCM preparation made from Carthamus, Paeonia, Salvia, Angelica, and Chuanxiong.
- **Reported Effects:** Used in hospitals for severe cases to reduce inflammation and improve organ function.
- **Evidence:** Some studies suggest it may help reduce severity in critically ill patients when used alongside standard care.

#### 4. **Qingfei Paidu Decoction (清肺排毒汤)**
- **Composition:** A complex formula based on four classic TCM prescriptions, containing over 20 herbs such as Ephedra, Licorice, Almond, and Gypsum.
- **Reported Effects:** Used in China for a range of COVID-19 severities, especially in early and moderate stages.
- **Evidence:** Some observational studies in China reported symptom improvement and lower progression to severe disease, but high-quality trials are lacking.

#### 5. **Huashi Baidu Formula (化湿败毒方)**
- **Composition:** Contains herbs like Ephedra, Atractylodes, Pogostemon, and Licorice.
- **Reported Effects:** Used for moderate to severe cases, especially with symptoms of dampness and toxin accumulation (in TCM theory).
- **Evidence:** Used in designated hospitals; evidence is mainly from Chinese clinical experience and small studies.

---

### Important Notes

- **Scientific Evidence:** Most studies on these medicines are from China, often with small sample sizes and limited controls. They are not widely endorsed by international health authorities such as the WHO.
- **Regulation and Safety:** Some formulas may interact with other medications or have side effects. Always consult a qualified practitioner.
- **Prevention:** These medicines are *not* substitutes for vaccination, masks, or other proven preventive measures.

### References
- National Health Commission of the People’s Republic of China: Diagnosis and Treatment Protocol for COVID-19 (various editions)
- Review articles in journals such as *Phytomedicine* and *Frontiers in Pharmacology*
- World Health Organization statements on traditional medicine and COVID-19

---

**If you are considering using any Chinese medicine for COVID-19, consult a licensed healthcare provider and do not rely on herbal remedies alone for prevention or treatment.**

In [7]:
#Now let's ask a follow up question
printmd(chain.invoke({"input": FOLLOW_UP_QUESTION}))

I don’t have access to your previous questions or any prior conversation unless it’s included in this current chat session. If you’d like to refer to a previous question, please restate it or provide more details, and I’ll be happy to help!

As you can see, the model does not remember what it just answered. Depending on the run, it may respond based only on the system prompt or give an unrelated answer. This shows that the LLM does NOT have memory, and that we need to supply that memory ourselves by including the conversation history in the prompt, like this:

In [8]:
hist_prompt = ChatPromptTemplate.from_template(
"""
    {history}
    Human: {question}
    AI:
"""
)
chain = hist_prompt | llm | output_parser

In [9]:
Conversation_history = """
Human: {question}
AI: {response}
""".format(question=QUESTION, response=response_to_initial_question)

In [10]:
printmd(chain.invoke({"history":Conversation_history, "question": FOLLOW_UP_QUESTION}))

Your prior question was: "tell me chinese medicines that help fight covid-19"

**Bingo!** We now know how to create a chatbot using LLMs: we just need to keep the state/history of the conversation and pass it as context every time.
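The same idea can be sketched as a small loop that stores every turn and resends the full transcript with each new question. This is only an illustration under stated assumptions: `fake_llm` and `Chat` are hypothetical names, and `fake_llm` is a stand-in for a stateless model so that the example runs without Azure credentials. It can only "remember" what appears in the prompt it receives.

```python
from typing import Callable, List, Tuple


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a stateless model: it sees only `prompt`."""
    # Collect every "Human:" line in the prompt. The last one is the current
    # question, so the one before it (if any) is the prior question.
    human_lines = [
        line[len("Human: "):]
        for line in prompt.splitlines()
        if line.startswith("Human: ")
    ]
    if len(human_lines) < 2:
        return "I don't have access to any prior question."
    return f'Your prior question was: "{human_lines[-2]}"'


class Chat:
    """Keeps the transcript and resends it with every new question."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self.llm = llm
        self.turns: List[Tuple[str, str]] = []  # (question, answer) pairs

    def ask(self, question: str) -> str:
        # Rebuild the history from all stored turns and prepend it to the question
        history = "".join(f"Human: {q}\nAI: {a}\n" for q, a in self.turns)
        prompt = f"{history}Human: {question}\nAI:"
        answer = self.llm(prompt)
        # Store the new turn so the next call can include it
        self.turns.append((question, answer))
        return answer


chat = Chat(fake_llm)
first = chat.ask("tell me chinese medicines that help fight covid-19")
second = chat.ask("What was my prior question?")
print(first)
print(second)
```

The model function itself never changes between calls; only the prompt grows. In a real application the transcript would eventually need to be trimmed or summarized to fit the model's context window.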

## Now that we understand the concept of memory via adding history as a context, let's go back to our GPT Smart Search engine

From the LangChain website:

A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can come from memory. A chain will interact with its memory system twice in a given run.

1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the current run to memory, so that they can be referred to in future runs.

So this process adds delays to the response, but it is a necessary delay :)

![image](./images/memory_diagram.png)
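The read-then-write cycle described above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `run_with_memory` and `count_messages` are hypothetical names, and the "core logic" is a toy function standing in for the prompt and LLM call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # e.g. {"role": "human", "content": "..."}


def run_with_memory(
    core_logic: Callable[[List[Message]], str],
    memory: List[Message],
    user_input: str,
) -> str:
    # READ: augment the user input with whatever is already in memory
    messages = list(memory) + [{"role": "human", "content": user_input}]

    # Execute the core logic (in the notebook this is prompt | llm)
    answer = core_logic(messages)

    # WRITE: store the input and output of this run for future runs
    memory.append({"role": "human", "content": user_input})
    memory.append({"role": "ai", "content": answer})
    return answer


def count_messages(messages: List[Message]) -> str:
    """Hypothetical core logic: reports how many messages it was shown."""
    return f"I can see {len(messages)} message(s)."


memory: List[Message] = []
a1 = run_with_memory(count_messages, memory, "Hello")
a2 = run_with_memory(count_messages, memory, "Still there?")
print(a1)
print(a2)
```

On the second call the core logic is shown three messages: the two stored from the first run plus the new input. That growth is the effect of the READ step.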

In [11]:
index1_name = "srch-index-files"
index2_name = "srch-index-csv"
index3_name = "srch-index-books"
indexes = [index1_name, index2_name, index3_name]

In [12]:
# Initialize our custom retriever 
retriever = CustomAzureSearchRetriever(indexes=indexes, topK=50, reranker_threshold=1)

**Prompt Template Definition**

If you look closely below, there is an optional variable in `DOCSEARCH_PROMPT` called `history`. It is a placeholder where we inject the conversation into the prompt so that the LLM is aware of it before it answers.


In [13]:

DOCSEARCH_PROMPT = ChatPromptTemplate.from_messages(
    [
        ("system", DOCSEARCH_PROMPT_TEXT + "\n\nCONTEXT:\n{context}\n\n"),
        MessagesPlaceholder(variable_name="history", optional=True),
        ("human", "{question}"),
    ]
)


**Now let's add memory to it:**

In [14]:
store = {} # Our first memory will be a dictionary in memory

# We have to define a custom function that takes a session_id and looks somewhere
# (in this case in a dictionary in memory) for the conversation
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
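To see why the lookup is keyed by `session_id`, here is a minimal sketch of the same pattern using only the standard library. `SimpleHistory`, `sessions`, and `get_history` are hypothetical stand-ins for `ChatMessageHistory`, `store`, and `get_session_history`; the point is only that each session ID maps to its own, independent history object.

```python
from typing import Dict, List


class SimpleHistory:
    """Hypothetical stand-in for a chat history object: just a list of messages."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def add_message(self, message: str) -> None:
        self.messages.append(message)


sessions: Dict[str, SimpleHistory] = {}


def get_history(session_id: str) -> SimpleHistory:
    # Create the history on first use, then always return the same object
    if session_id not in sessions:
        sessions[session_id] = SimpleHistory()
    return sessions[session_id]


# Two users talk to the bot at the same time, each with their own session ID
get_history("abc123").add_message("tell me chinese medicines that help fight covid-19")
get_history("xyz789").add_message("hello")
get_history("abc123").add_message("What was my prior question?")

print(get_history("abc123").messages)
print(get_history("xyz789").messages)
```

Because the dictionary lives in the Python process, every history disappears when the kernel restarts.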


In [15]:
# We use our original chain with the retriever but removing the StrOutputParser
chain = (
    {
        "context": itemgetter("question") | retriever, 
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | DOCSEARCH_PROMPT
    | llm
)

## Then we pass the above chain to another chain that adds memory to it

output_parser = StrOutputParser()

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
) | output_parser

In [16]:
# This is where we configure the session id
config={"configurable": {"session_id": "abc123"}}

In [17]:
printmd(chain_with_history.invoke({"question": QUESTION}, config=config))

A variety of traditional Chinese medicines (TCM) and herbal formulas have been recommended and used to help fight COVID-19, often in combination with standard care. Some of the most frequently mentioned and studied Chinese medicines and formulas include:

1. **Qingfei Paidu Decoction** – This formula was recommended by the National Health Commission of China for the treatment of COVID-19 and has shown good clinical efficacy and potential in reducing symptoms, shortening treatment duration, and reducing complications[[7](https://doi.org/10.19540/j.cnki.cjcmm.20200219.501; https://www.ncbi.nlm.nih.gov/pubmed/32281335/)].

2. **Lianhua Qingwen Capsule** – A Chinese patent medicine, often included in treatment protocols for COVID-19, and is among the most commonly used Chinese patent drugs[[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

3. **Maxing Shigan Decoction** – Frequently used in various treatment plans and considered a basic formulation for certain COVID-19 syndromes[[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

4. **Yin Qiao Powder** – Another basic formulation for early-stage (Weifen syndrome) COVID-19[[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

5. **Xuanbai Chengqi Decoction** – Commonly used in clinical practice for COVID-19[[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

6. **Angong Niuhuang Pill** and **Xuebijing Injection** – Frequently used Chinese patent medicines for COVID-19[[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

7. **Ma Xing Shi Gan Decoction (MXSGD)** – Widely applied in the clinical treatment of COVID-19, with mechanisms including reducing inflammation, suppressing cytokine storms, and regulating immune response[[22](https://doi.org/10.26355/eurrev_202003_20704; https://www.ncbi.nlm.nih.gov/pubmed/32271454/)].

8. **Shuang Huang Lian Kou Fu Ye** and other TCM combinations such as Bu Huan Jin Zheng Qi San with Da Yuan Yin, Xue Bi Jing Injection, and Qing Fei Pai Du Tang – These have been used and studied for their potential to inhibit viral proliferation and modulate immune responses[[1](https://doi.org/10.1101/2020.04.10.20060376)].

9. **High-frequency single herbs** – Some of the most frequently used herbs include Astragalus membranaceus, Lonicera japonica, Glycyrrhizae Radix et Rhizoma (licorice root), Armeniacae Semen Amarum (bitter apricot seed), Gypsum Fibrosum, Scutellariae Radix, and others[[6](https://doi.org/10.19540/j.cnki.cjcmm.20200220.502; https://www.ncbi.nlm.nih.gov/pubmed/32281332/)], [[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

These Chinese medicines and formulas are typically used based on syndrome differentiation and stage of disease, with different prescriptions for prevention, early intervention, severe cases, and recovery. TCM aims to strengthen body resistance, clear heat, remove toxins, resolve phlegm, and support lung function[[9](https://doi.org/10.19540/j.cnki.cjcmm.20200225.501; https://www.ncbi.nlm.nih.gov/pubmed/32281333/)], [[14](https://doi.org/10.7501/j.issn.0253-2670.2020.04.007)], [[25](https://www.ncbi.nlm.nih.gov/pubmed/32268018/)].

It is important to note that while TCM has played a positive role in fighting COVID-19 in China, further large-scale clinical studies are needed to fully verify their effectiveness[[10](https://doi.org/10.21037/apm.2020.03.27; https://www.ncbi.nlm.nih.gov/pubmed/32233641/)].

**References:**
- [[1](https://doi.org/10.1101/2020.04.10.20060376)]
- [[6](https://doi.org/10.19540/j.cnki.cjcmm.20200220.502; https://www.ncbi.nlm.nih.gov/pubmed/32281332/)]
- [[7](https://doi.org/

In [18]:
# Remembers
printmd(chain_with_history.invoke({"question": FOLLOW_UP_QUESTION},config=config))

Your prior question was: "tell me chinese medicines that help fight covid-19"

In [19]:
# Remembers
printmd(chain_with_history.invoke({"question": "Thank you! Good bye"},config=config))

You're welcome! Good bye[[9](https://doi.org/10.19540/j.cnki.cjcmm.20200225.501; https://www.ncbi.nlm.nih.gov/pubmed/32281333/)].

## Using CosmosDB as persistent memory

Previously, we added local RAM memory to our chatbot. However, it is not persistent: it is deleted once the app user's session is terminated. We therefore need a database to persistently store each user's conversations, not only for analytics and auditing, but also if we wish to provide recommendations in the future.

In the next notebook we are going to explain how to use an external Database (CosmosDB) to keep the state of the conversation.
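To make the idea of persistence concrete, here is a minimal sketch that stores messages per session in a SQLite file from the Python standard library. SQLite is used here only as a local stand-in for an external database; this is not the CosmosDB integration, and the table layout and the helper names `save_message` and `load_messages` are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from typing import List, Tuple


def _connect(db_path: str) -> sqlite3.Connection:
    # Open the database file and make sure the (hypothetical) table exists
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT, role TEXT, content TEXT)"
    )
    return conn


def save_message(db_path: str, session_id: str, role: str, content: str) -> None:
    with closing(_connect(db_path)) as conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        conn.commit()


def load_messages(db_path: str, session_id: str) -> List[Tuple[str, str]]:
    # Return (role, content) pairs for one session, oldest first
    with closing(_connect(db_path)) as conn:
        return conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "chat_history.db")
save_message(db_path, "abc123", "human", "What was my prior question?")
save_message(db_path, "abc123", "ai", "You have not asked one yet.")

# Each call opens a fresh connection, so this read relies only on the file on disk
restored = load_messages(db_path, "abc123")
print(restored)
```

Nothing is kept in a Python variable between calls, so the history survives a kernel restart as long as the file remains.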

# Summary
##### Adding memory to our application allows the user to have a conversation. However, this feature does not come with the LLM; memory is something that we must provide to the LLM as context alongside the question.

We added non-persistent memory using local RAM.

We can also notice that the current chain is smart, but only up to a point. Although we have given it memory, it searches for similar docs on every turn, regardless of the input. This doesn't seem efficient, but regardless, we are very close to finishing our first RAG talk-to-your-data bot.

Note: The use of `RunnableWithMessageHistory` in this notebook is for example purposes. We will see in later notebooks that we recommend using memory state and graphs to inject memory into a bot.

# NEXT
We now know how to build a retrieval system that can power a decent chatbot. Great!

In the next notebook 6, we are going to build our first reasoning RAG bot. In order to do this we will introduce the concept of Agents.