By default, an LLM doesn't remember previous messages: each call is independent of the last.
We can work around this by adding `memory`.
`LangChain` offers several types of `memory`.

# `ConversationBufferMemory`

In [1]:
from langchain.chains import ConversationChain
from langchain.chat_models import AzureChatOpenAI
from langchain.memory import ConversationBufferMemory

We initialize our `model` as usual, then create a `ConversationChain` with the model and a `memory` object.

In [2]:
api_version = "2023-12-01-preview"
deployment_id = "gpt-35-turbo-16k"

In [3]:
chat = AzureChatOpenAI(model=deployment_id, temperature=0.0, api_version=api_version)
memory = ConversationBufferMemory()
convo = ConversationChain(
    memory=memory,
    llm=chat,
)

Note that we call the `predict` method with the keyword argument `input`.
This is because we are now using a `chain` rather than calling the `chat` model directly as in the first lesson.

In [4]:
convo.predict(input="Hi, my name is Ian")

"Hello Ian! It's nice to meet you. How can I assist you today?"

In [5]:
convo.predict(input="What is 1+1?")

'1+1 is equal to 2.'

In [6]:
convo.predict(input="What is my name?")

'Your name is Ian.'

Because we are using `memory`, the model can remember our prior messages.
If we set `verbose=True` in the `ConversationChain` we can see more of what is happening under the hood.

In [7]:
memory = ConversationBufferMemory()
convo = ConversationChain(
    memory=memory,
    verbose=True,
    llm=chat,
)

In [8]:
convo.predict(input="Hi, my name is Ian")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Ian
AI:

> Finished chain.


"Hello Ian! It's nice to meet you. How can I assist you today?"

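In the notebook, everything rendered in <span style="color:green">green</span> (the text after `Prompt after formatting:`) is the prompt sent to the model: the internal instructions plus whatever the memory contributes.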

In [9]:
convo.predict(input="What is 1+1?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Ian
AI: Hello Ian! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.


'1+1 is equal to 2.'

As the conversation grows, you can see the <span style="color:green">Current conversation</span> being updated with prior messages.
The prompt tells the LLM what has already been said, giving it context to answer follow-up questions.
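
To make the mechanism concrete, here is a minimal pure-Python sketch of how a buffer-style memory could fold prior turns into the prompt. `ToyBufferMemory`, `build_prompt`, and the shortened `TEMPLATE` are hypothetical names made up for illustration; this is not LangChain's implementation.

```python
# Hypothetical stand-ins for illustration only; not LangChain classes.
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI.\n\n"
    "Current conversation:\n{history}\nHuman: {input}\nAI:"
)


class ToyBufferMemory:
    """Stores every exchange and renders it as plain text."""

    def __init__(self):
        self.turns = []  # list of (human_text, ai_text) pairs

    def save_context(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    def history(self):
        lines = []
        for human_text, ai_text in self.turns:
            lines.append(f"Human: {human_text}")
            lines.append(f"AI: {ai_text}")
        return "\n".join(lines)


def build_prompt(memory, user_input):
    """Insert the stored history and the new input into the template."""
    return TEMPLATE.format(history=memory.history(), input=user_input)


memory = ToyBufferMemory()
memory.save_context("Hi, my name is Ian", "Hello Ian!")
memory.save_context("What is 1+1?", "1+1 is equal to 2.")
prompt = build_prompt(memory, "What is my name?")
print(prompt)
```

The model itself stays stateless in this sketch; the only thing that changes between calls is the text of the prompt.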

In [10]:
convo.predict(input="What is my name?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Ian
AI: Hello Ian! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI:

> Finished chain.


'Your name is Ian.'

This is why it knows my name 🙂.

## Side Notes

Here is what it looks like if we exclude `memory`.

In [11]:
chat = AzureChatOpenAI(model=deployment_id, temperature=0.0, api_version=api_version)
convo = ConversationChain(
    verbose=True,
    llm=chat,
)

In [12]:
convo.predict(input="Hi, my name is Ian")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Ian
AI:

> Finished chain.


"Hello Ian! It's nice to meet you. How can I assist you today?"

In [13]:
convo.predict(input="What is 1+1?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Ian
AI: Hello Ian! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.


'1+1 is equal to 2.'

In [14]:
convo.predict(input="What is my name?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Ian
AI: Hello Ian! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI:

> Finished chain.


'Your name is Ian.'

It actually looks like the `memory` argument didn't need to be set 🤔.

In [15]:
ConversationChain?

Init signature:
ConversationChain(
    *,
    name: Optional[str] = None,
    memory: langchain_core.memory.BaseMemory = None,
    callbacks: Union[List[langchain_core.callbacks.base.BaseCallbackHandler], langchain_core.callbacks.base.BaseCallbackManager, NoneType] = None,
    callback_manager: Optional[langchain_core.callbacks.base.

The signature says that the `memory` argument should be a subclass of `langchain_core.memory.BaseMemory`, but that it defaults to `None`.

In
```python
from inspect import getsource

print(getsource(ConversationChain))
```

Out
```python
class ConversationChain(LLMChain):
    """Chain to have a conversation and load context from memory.

    Example:
        .. code-block:: python

            from langchain.chains import ConversationChain
            from langchain_community.llms import OpenAI

            conversation = ConversationChain(llm=OpenAI())
    """

    memory: BaseMemory = Field(default_factory=ConversationBufferMemory)
    """Default memory store."""
    prompt: BasePromptTemplate = PROMPT
    """Default conversation prompt to use."""

    input_key: str = "input"  #: :meta private:
    output_key: str = "response"  #: :meta private:

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid
        arbitrary_types_allowed = True

    @classmethod
    def is_lc_serializable(cls) -> bool:
        return False

    @property
    def input_keys(self) -> List[str]:
        """Use this since so some prompt vars come from history."""
        return [self.input_key]

    @root_validator()
    def validate_prompt_input_variables(cls, values: Dict) -> Dict:
        """Validate that prompt input variables are consistent."""
        memory_keys = values["memory"].memory_variables
        input_key = values["input_key"]
        if input_key in memory_keys:
            raise ValueError(
                f"The input key {input_key} was also found in the memory keys "
                f"({memory_keys}) - please provide keys that don't overlap."
            )
        prompt_variables = values["prompt"].input_variables
        expected_keys = memory_keys + [input_key]
        if set(expected_keys) != set(prompt_variables):
            raise ValueError(
                "Got unexpected prompt input variables. The prompt expects "
                f"{prompt_variables}, but got {memory_keys} as inputs from "
                f"memory, and {input_key} as the normal input key."
            )
        return values
```

Looking at the source code, `memory` is set to:

```python
memory: BaseMemory = Field(default_factory=ConversationBufferMemory)
```

This means that if we don't provide an argument for `memory`, a `ConversationBufferMemory` instance is used by default.
Here is an example class that shows how `default_factory` works.

In [16]:
from collections import Counter

from pydantic import BaseModel, Field

class Example(BaseModel):
    """This is an example class.
    
    It highlights how the `default_factory` argument works in `Field`.
    """

    counter: dict = Field(default_factory=Counter)


eg = Example()
print(eg.counter)  # Counter()

Counter()
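
The same pattern exists in the standard library's `dataclasses`, where `field(default_factory=...)` calls the factory once per instance. The sketch below uses hypothetical `ToyMemory` and `ToyChain` classes (not LangChain or pydantic classes) to illustrate why that matters for something like a conversation memory: each chain created without an explicit `memory` gets its own fresh one rather than sharing a single object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical stand-in for a memory object; not a LangChain class."""

    messages: list = field(default_factory=list)


@dataclass
class ToyChain:
    """Hypothetical stand-in for a chain whose memory is created on demand."""

    memory: ToyMemory = field(default_factory=ToyMemory)


first = ToyChain()
second = ToyChain()
first.memory.messages.append("Human: Hi")

print(first.memory.messages)   # only the first chain saw the message
print(second.memory.messages)  # the second chain's memory is untouched
print(first.memory is second.memory)
```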


## Continuing

If we look at the `memory` instance we can view what has been added to its `buffer`.

In [17]:
memory = ConversationBufferMemory()
convo = ConversationChain(
    memory=memory,
    llm=chat,
)
convo.predict(input="Hi, my name is Ian")
convo.predict(input="What is 1+1?")
convo.predict(input="What is my name?")

print(memory.buffer)

Human: Hi, my name is Ian
AI: Hello Ian! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI: Your name is Ian.


We can also view the memory as a dictionary of variables, with `history` holding the prior messages in string format.

In [18]:
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Ian\nAI: Hello Ian! It's nice to meet you. How can I assist you today?\nHuman: What is 1+1?\nAI: 1+1 is equal to 2.\nHuman: What is my name?\nAI: Your name is Ian."}

The `memory` doesn't have to be modified by an LLM -- we can update it ourselves.

In [19]:
memory = ConversationBufferMemory()
memory.save_context(inputs={"input": "Hi"}, outputs={"output": "What's up"})

print(memory.buffer)

Human: Hi
AI: What's up


In [20]:
memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up"}

As we add more inputs and outputs to the context, the buffer is updated.

In [21]:
memory.save_context(inputs={"input": "Not much, just hanging"}, outputs={"output": "Cool"})

memory.load_memory_variables({})

{'history': "Human: Hi\nAI: What's up\nHuman: Not much, just hanging\nAI: Cool"}

An LLM is stateless and doesn't remember the conversation by default.
The `memory` object works around this by inserting the chat history into the context sent to the LLM on every call.
As the conversation gets longer, each prompt contains more tokens, so the cost per call increases.
`LangChain` provides other `memory` types that limit how much history is kept.
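
A rough sketch of why the cost grows: with a full buffer, every prompt carries all prior turns. The `rough_token_count` helper below is a made-up proxy that counts whitespace-separated words; real tokenizers produce different numbers, so treat the output as a trend only.

```python
def rough_token_count(text):
    """Crude proxy for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


turns = [
    ("Hi, my name is Ian", "Hello Ian! It's nice to meet you."),
    ("What is 1+1?", "1+1 is equal to 2."),
    ("What is my name?", "Your name is Ian."),
]

history = []
prompt_sizes = []
for human_text, ai_text in turns:
    # The prompt for this turn carries all prior turns plus the new input.
    prompt = "\n".join(history + [f"Human: {human_text}", "AI:"])
    prompt_sizes.append(rough_token_count(prompt))
    # After the model answers, both sides of the exchange join the history.
    history += [f"Human: {human_text}", f"AI: {ai_text}"]

print(prompt_sizes)  # grows with every turn
```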

# `ConversationBufferWindowMemory`

The `ConversationBufferWindowMemory` stores context within a "window".

In [22]:
from langchain.memory import ConversationBufferWindowMemory

In [23]:
memory = ConversationBufferWindowMemory(k=1)

The parameter `k` tells the `memory` how many of the most recent exchanges (one human input plus the AI response) to keep in context.

In [24]:
memory.save_context(inputs={"input": "Hi"}, outputs={"output": "What's up"})

In [25]:
print(memory.buffer)

Human: Hi
AI: What's up


In [26]:
memory.save_context(inputs={"input": "Not much, just hanging"}, outputs={"output": "Cool"})

In [27]:
print(memory.buffer)

Human: Not much, just hanging
AI: Cool


Notice that we lose the prior message inputs.
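
The windowing behaviour can be sketched in a few lines with `collections.deque`. `ToyWindowMemory` is a hypothetical class written for illustration, assuming the window is measured in whole exchanges; it is not how LangChain implements `ConversationBufferWindowMemory`.

```python
from collections import deque


class ToyWindowMemory:
    """Hypothetical sketch: keep only the last k exchanges."""

    def __init__(self, k):
        # A bounded deque discards the oldest pair once it is full.
        self.turns = deque(maxlen=k)

    def save_context(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    @property
    def buffer(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = ToyWindowMemory(k=1)
memory.save_context("Hi", "What's up")
memory.save_context("Not much, just hanging", "Cool")
print(memory.buffer)
```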

In [28]:
memory = ConversationBufferWindowMemory(k=1)
convo = ConversationChain(
    memory=memory,
    llm=chat,
)

In [29]:
convo.predict(input="Hi, my name is Ian")

"Hello Ian! It's nice to meet you. How can I assist you today?"

In [30]:
convo.predict(input="What is 1+1?")

'1+1 is equal to 2.'

In [31]:
convo.predict(input="What is my name?")

"I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation."

With `k=1` the memory keeps only the most recent exchange, so the introduction has already dropped out of the context by the time we ask. It makes sense that the LLM can't tell me my name.

Running with `verbose=True` we can see the context provided.

In [32]:
memory = ConversationBufferWindowMemory(k=1)
convo = ConversationChain(
    memory=memory,
    verbose=True,
    llm=chat,
)

In [33]:
convo.predict(input="Hi, my name is Ian")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Hi, my name is Ian
AI:

> Finished chain.


"Hello Ian! It's nice to meet you. How can I assist you today?"

In [34]:
convo.predict(input="What is 1+1?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Ian
AI: Hello Ian! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI:

> Finished chain.


'1+1 is equal to 2.'

In [35]:
convo.predict(input="What is my name?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: What is 1+1?
AI: 1+1 is equal to 2.
Human: What is my name?
AI:

> Finished chain.


"I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation."

# `ConversationTokenBufferMemory`

The `ConversationTokenBufferMemory` operates similarly to the `ConversationBufferWindowMemory`, but limits the number of tokens instead of the number of messages.
This is beneficial because we are charged by token count rather than message count.
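
As a sketch of the idea, assuming the oldest messages are dropped first until the total fits the limit: `rough_token_count` and `prune_to_token_limit` are hypothetical helpers, and the word count stands in for a real tokenizer, so the numbers are illustrative only.

```python
def rough_token_count(text):
    """Crude proxy for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def prune_to_token_limit(messages, max_token_limit):
    """Drop the oldest messages until the total fits within the limit."""
    kept = list(messages)
    while kept and sum(rough_token_count(m) for m in kept) > max_token_limit:
        kept.pop(0)  # the oldest message goes first
    return kept


messages = [
    "Human: AI is what?!",
    "AI: Amazing!",
    "Human: Backpropagation is what?",
    "AI: Beautiful!",
    "Human: Chatbots are what?",
    "AI: Charming!",
]
kept = prune_to_token_limit(messages, max_token_limit=8)
print(kept)
```

Note that pruning message by message, as this sketch does, can leave half of an exchange behind (here an AI reply without the question that prompted it).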

In [36]:
!pip show tiktoken

Name: tiktoken
Version: 0.5.1
Summary: tiktoken is a fast BPE tokeniser for use with OpenAI's models
Home-page: 
Author: Shantanu Jain
Author-email: shantanu@openai.com
License: MIT License


In [37]:
from langchain.memory import ConversationTokenBufferMemory

In [38]:
import langchain

langchain.__version__

'0.0.354'

In [39]:
memory = ConversationTokenBufferMemory(llm=chat, max_token_limit=50)

We have to specify the `llm` argument because it determines how tokens are counted.

In [40]:
memory.save_context(inputs={"input": "AI is what?!"}, outputs={"output": "Amazing!"})
memory.save_context(inputs={"input": "Backpropagation is what?"}, outputs={"output": "Beautiful!"})
memory.save_context(inputs={"input": "Chatbots are what?"}, outputs={"output": "Charming!"})

memory.load_memory_variables({})

NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-35-turbo-16k.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.

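The error says that token counting isn't implemented for the model name `"gpt-35-turbo-16k"`, possibly because this Azure-style name isn't one the token counter recognizes. Either way, `ConversationTokenBufferMemory` doesn't currently work with this setup, so I'm moving on.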

# `ConversationSummaryBufferMemory`

Instead of imposing a hard cutoff on the LLM context, we can summarize prior messages. This gives us "infinite" memory, though more detail is lost the longer the conversation goes on.
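
A minimal sketch of the idea, under stated assumptions: recent messages are kept verbatim, and whatever overflows the token limit is folded into a running summary. `ToySummaryBufferMemory` and `toy_summarize` are hypothetical; in a real setup the summarizer would be an LLM call, whereas here it is a placeholder that keeps the first three words of each dropped message.

```python
def rough_token_count(text):
    """Crude proxy for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def toy_summarize(previous_summary, dropped_messages):
    """Placeholder for an LLM call: keeps the first three words of each message."""
    snippets = [" ".join(m.split()[:3]) for m in dropped_messages]
    parts = ([previous_summary] if previous_summary else []) + snippets
    return " | ".join(parts)


class ToySummaryBufferMemory:
    """Hypothetical sketch: verbatim recent messages plus a running summary."""

    def __init__(self, max_token_limit):
        self.max_token_limit = max_token_limit
        self.summary = ""
        self.messages = []

    def save_context(self, human_text, ai_text):
        self.messages += [f"Human: {human_text}", f"AI: {ai_text}"]
        dropped = []
        # Move the oldest messages out of the buffer until it fits the limit.
        while (
            self.messages
            and sum(map(rough_token_count, self.messages)) > self.max_token_limit
        ):
            dropped.append(self.messages.pop(0))
        if dropped:
            self.summary = toy_summarize(self.summary, dropped)


memory = ToySummaryBufferMemory(max_token_limit=6)
memory.save_context("Hello", "What's up")
memory.save_context("Not much, just hanging", "Cool")
print(memory.summary)
print(memory.messages)
```

The trade-off is visible even in this toy: the summary keeps a trace of older turns, but only as much detail as the summarizer preserves.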

In [41]:
from langchain.memory import ConversationSummaryBufferMemory

We highlight the power of summarization by using a "long" string as our initial context.

In [42]:
schedule = """There is a meeting at 8am with your product team.
You will need your powerpoint presentation prepared.
9am-12pm have time to work on your LangChain
project which will go quickly because LangChain is such a powerful tool
At Noon, lunch at the italian restaurant with a customer who is driving
from over an hour away to meet you to understand the latest in AI.
Be sure to bring your laptop to show the latest LLM demo."""

In [43]:
memory = ConversationSummaryBufferMemory(llm=chat, max_token_limit=100)

memory.save_context(inputs={"input": "Hello"}, outputs={"output": "What's up"})
memory.save_context(inputs={"input": "Not much, just hanging"}, outputs={"output": "Cool"})
memory.save_context(inputs={"input": "What is on the schedule today?"}, outputs={"output": f"{schedule}"})

memory.load_memory_variables({})

NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-35-turbo-16k.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.

Apparently `ConversationSummaryBufferMemory` doesn't work with `"gpt-35-turbo-16k"` either 😑.