In [None]:
import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

## ConversationBufferMemory

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

Let's create an OpenAI-based chat model.

In [None]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0.0)

In [None]:


memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In [None]:
conversation.predict(input="Hi, my name is Andrew")

In [None]:
conversation.predict(input="What is 1+1?")

In [None]:
conversation.predict(input="What is my name?")

You can rerun the previous conversation with verbose=True to see the prompt that LangChain generates.

Let's check the content of the memory buffer:

In [None]:
print(memory.buffer)

Large Language Models are 'stateless'

Each transaction is independent

Chatbots appear to have memory by providing the full conversation as 'context'

As the conversation gets longer, the context gets longer as well, so each call consumes more computational resources. Most LLM providers charge by token, which also makes long conversations more expensive.

LangChain provides several kinds of 'memory' to store and accumulate the conversation.
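To make the statelessness concrete, here is a minimal sketch in plain Python with no LangChain calls. `fake_llm` and `BufferChat` are hypothetical stand-ins, and counting whitespace-separated words is only a rough proxy for real tokens. It illustrates that the model "remembers" only because the whole transcript is resent on every call, and that the prompt grows with each turn.

```python
def fake_llm(prompt: str) -> str:
    """Hypothetical stateless model: it sees only the prompt it is given."""
    return f"(reply to a prompt of {len(prompt.split())} words)"


class BufferChat:
    """Toy buffer memory: keeps every turn and resends all of them."""

    def __init__(self):
        self.turns = []          # list of (speaker, text) pairs
        self.prompt_sizes = []   # rough prompt size per call, in words

    def predict(self, user_input: str) -> str:
        self.turns.append(("Human", user_input))
        # The full transcript is rebuilt and sent on every call.
        prompt = "\n".join(f"{who}: {text}" for who, text in self.turns)
        self.prompt_sizes.append(len(prompt.split()))
        reply = fake_llm(prompt)
        self.turns.append(("AI", reply))
        return reply


chat = BufferChat()
for question in ["Hi, my name is Andrew", "What is 1+1?", "What is my name?"]:
    chat.predict(question)

print(chat.prompt_sizes)  # each prompt is larger than the one before
```

If billing is per token, every extra turn makes all later calls more expensive, which is what motivates the bounded memories.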

Let's look at a different memory provided by LangChain, the ConversationBufferWindowMemory.

In [None]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=1)

Manually adding conversation context to the memory:

In [None]:
memory.save_context({"input": "Hi"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool"})

In [None]:
memory.load_memory_variables({})

That's a nice feature, because it lets you keep track of only the last few exchanges. In practice you would use a higher k value to keep more context.

It also prevents the memory from growing without limit.
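The windowing idea can be sketched in plain Python. This is not how LangChain implements it; `WindowMemory` is a hypothetical class that only mimics the behavior described above, using a `deque` with a maximum length so the oldest exchange is discarded automatically.

```python
from collections import deque


class WindowMemory:
    """Toy window memory: keeps only the last k (input, output) exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen drops the oldest item when a new one is added.
        self.exchanges = deque(maxlen=k)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.exchanges.append((user_input, ai_output))

    def history(self) -> str:
        lines = []
        for user_input, ai_output in self.exchanges:
            lines.append(f"Human: {user_input}")
            lines.append(f"AI: {ai_output}")
        return "\n".join(lines)


window = WindowMemory(k=1)
window.save_context("Hi", "What's up")
window.save_context("Not much, just hanging", "Cool")
print(window.history())  # only the second exchange is left
```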

Let's re-run the previous conversation using the new memory

In [None]:
llm = ChatOpenAI(temperature=0.0)
memory = ConversationBufferWindowMemory(k=1)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=False
)

In [None]:
conversation.predict(input="Hi, my name is Andrew")

In [None]:
conversation.predict(input="What is 1+1?")

In [None]:
conversation.predict(input="What is my name?")

Because we used k=1, the chain remembers only the last exchange (one input and its response), so the name from the first message is no longer in the context. Set verbose=True to see the conversation context.

Let's experiment with another memory, the ConversationTokenBufferMemory. It keeps only the most recent messages that fit within max_token_limit tokens. Because many LLM providers charge by token, this memory maps more directly onto what each call will cost.

In [None]:
from langchain.memory import ConversationTokenBufferMemory
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0.0)

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=50)

memory.save_context({"input": "AI is what?"}, {"output": "Amazing!"})
memory.save_context({"input": "Backpropagation is what?"}, {"output": "Beautiful!"})
memory.save_context({"input": "Chatbots are what?"}, {"output": "Charming!"})

Each statement above starts with A, B, or C to make the exchanges easier to keep track of. If we check what was stored, we see almost the full conversation:

In [None]:
memory.load_memory_variables({})
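To see how a token limit trims the history, here is a rough sketch in plain Python. `TokenWindowMemory` and `count_tokens` are hypothetical; the tokenizer simply counts whitespace-separated words, whereas the real memory counts tokens with the model's own tokenizer. The limit of 10 is an arbitrary value chosen so that pruning is visible on the A/B/C exchanges.

```python
def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class TokenWindowMemory:
    """Toy token-limited memory: drops the oldest messages to fit a budget."""

    def __init__(self, max_token_limit: int):
        self.max_token_limit = max_token_limit
        self.messages = []  # strings such as "Human: ..." or "AI: ..."

    def total_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages.append(f"Human: {user_input}")
        self.messages.append(f"AI: {ai_output}")
        # Prune from the front until the buffer fits the token budget.
        while self.messages and self.total_tokens() > self.max_token_limit:
            self.messages.pop(0)

    def history(self) -> str:
        return "\n".join(self.messages)


token_memory = TokenWindowMemory(max_token_limit=10)
token_memory.save_context("AI is what?", "Amazing!")
token_memory.save_context("Backpropagation is what?", "Beautiful!")
token_memory.save_context("Chatbots are what?", "Charming!")
print(token_memory.history())
```

With this small budget only the tail of the conversation survives; a larger limit keeps more of it.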

The last type of memory we will see is the ConversationSummaryBufferMemory. Instead of keeping a fixed number of statements or tokens, it creates a summary of the conversation so far and sends it as the context. Let's see an example:

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

# Adjacent string literals are concatenated, so no indentation leaks into the text.
schedule = (
    "There is a meeting at 8am with your product team. "
    "You will need your powerpoint presentation prepared. "
    "9am-12pm have time to work on your LangChain "
    "project which will go quickly because LangChain is such a powerful tool. "
    "At Noon, lunch at the italian restaurant with a customer who is interested "
    "in implementing an LLM application and drove "
    "from over an hour away to meet you to understand the latest updates. "
    "Be sure to bring your laptop to show the latest LLM demo."
)

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=100)

memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"}, {"output": "Cool!"})
memory.save_context({"input": "What is on the schedule today?"}, {"output": f"{schedule}"})

Let's take a look at the memory variables.

In [None]:
memory.load_memory_variables({})

Let's create a conversation as before

In [None]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

In [None]:
conversation.predict(input="What would be a good demo to show?")

In [None]:
memory.load_memory_variables({})
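The summary-buffer idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not LangChain's implementation: `SummaryBufferSketch`, `toy_summarize`, and `count_tokens` are hypothetical, the tokenizer counts whitespace-separated words, and the summarizer only records how many messages were folded away, where a real one would ask an LLM to condense them.

```python
def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def toy_summarize(previous_summary: str, dropped: list) -> str:
    """Hypothetical summarizer; a real one would call an LLM here."""
    note = f"[{len(dropped)} earlier message(s) condensed]"
    return f"{previous_summary} {note}".strip()


class SummaryBufferSketch:
    """Toy memory: recent messages verbatim, older ones folded into a summary."""

    def __init__(self, max_token_limit: int, summarize=toy_summarize):
        self.max_token_limit = max_token_limit
        self.summarize = summarize
        self.summary = ""
        self.messages = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        self.messages.append(f"Human: {user_input}")
        self.messages.append(f"AI: {ai_output}")
        dropped = []
        # Move the oldest messages out of the buffer until it fits the budget.
        while self.messages and sum(map(count_tokens, self.messages)) > self.max_token_limit:
            dropped.append(self.messages.pop(0))
        if dropped:
            self.summary = self.summarize(self.summary, dropped)

    def history(self) -> str:
        head = [f"System: {self.summary}"] if self.summary else []
        return "\n".join(head + self.messages)


sketch = SummaryBufferSketch(max_token_limit=8)
sketch.save_context("Hello", "What's up")
sketch.save_context("Not much, just hanging", "Cool!")
print(sketch.history())
```

The design choice is a trade-off: the most recent messages stay word for word, while older ones cost only a short summary instead of their full token count.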

## Additional Memory Types

* Vector data memory: stores text (from the conversation or elsewhere) in a vector database and retrieves the most relevant blocks of text.
* Entity memories: use an LLM to remember details about specific entities.
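As a rough illustration of the vector memory idea, here is a plain-Python sketch. Everything in it is an assumption made for the example: `VectorMemorySketch` is hypothetical, the "embedding" is just a bag of lowercase word counts rather than a learned vector, and a Python list stands in for the vector database. It shows only the retrieval step: rank the stored texts by cosine similarity to the query and return the closest ones.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical embedding: lowercase word counts, not a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemorySketch:
    """Toy vector store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self.entries = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, top_k: int = 1) -> list:
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


store = VectorMemorySketch()
store.add("The meeting with the product team is at 8am")
store.add("Lunch with a customer is at noon")
store.add("My favorite color is blue")
print(store.retrieve("when is the meeting with the product team"))
```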

You can also use multiple memories at one time, e.g. conversation memory + entity memory to recall individuals.

You can also store the conversation in a conventional database (such as a key-value store or SQL).
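For the SQL option, a minimal sketch using Python's built-in `sqlite3` module could look like this. The table layout and the helper names `save_message` and `load_history` are assumptions made for the example, not part of LangChain, and an in-memory database is used so the cell leaves no file behind.

```python
import sqlite3

# In-memory database for the example; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT NOT NULL, "
    "role TEXT NOT NULL, "
    "content TEXT NOT NULL)"
)


def save_message(session_id: str, role: str, content: str) -> None:
    """Append one message to the given conversation session."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def load_history(session_id: str) -> str:
    """Return the session's messages in insertion order, one per line."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
        (session_id,),
    ).fetchall()
    return "\n".join(f"{role}: {content}" for role, content in rows)


save_message("demo", "Human", "Hi, my name is Andrew")
save_message("demo", "AI", "Hello Andrew!")
print(load_history("demo"))
```

The loaded history string can then be placed in the prompt in the same way as the in-memory buffers.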
