<a href="https://colab.research.google.com/github/rastringer/promptcraft/blob/main/langchain_memory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Memory


In many applications, LLMs must remember prior interactions and context.

LangChain provides several memory utilities to manage and manipulate previous chat messages.


In [None]:
! pip install --upgrade google-cloud-aiplatform
# Quote the requirement so the shell does not treat "<" as input redirection
! pip install "shapely<2.0.0"
! pip install langchain
! pip install pypdf
! pip install pydantic==1.10.8
! pip install "langchain[docarray]"
! pip install typing-inspect==0.8.0 typing_extensions==4.5.0
# Hugging Face transformers necessary for ConversationTokenBufferMemory
! pip install transformers

This optional cell wraps outputs, which can make them easier to digest.

In [None]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

If you're on Colab, authenticate by running the following cell.

In [None]:
from google.colab import auth
auth.authenticate_user()

### Initialize the SDK

In [None]:
# Add your project id and the project's region
PROJECT_ID = "<...>"
REGION = "<...>"

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

In [None]:
# Utils
import time
from typing import List

# Langchain
import langchain
from pydantic import BaseModel

print(f"LangChain version: {langchain.__version__}")

# Vertex AI
from google.cloud import aiplatform
from langchain.chat_models import ChatVertexAI
from langchain.llms import VertexAI
from langchain.schema import HumanMessage, SystemMessage
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory


print(f"Vertex AI SDK version: {aiplatform.__version__}")

In [None]:
# LLM model
llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=256,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)

### ConversationBufferWindowMemory
Keeps a list of the conversation's interactions over time but only uses the last `k` of them. This gives a sliding window over the most recent interactions, so the buffer does not grow too large.

In [None]:
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=3)

memory.save_context({"input": "Hi"},
                    {"output": "How are you?"})
memory.save_context({"input": "Fine thanks"},
                    {"output": "Great"})

memory.load_memory_variables({})
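
The cell above saves only two exchanges, so nothing falls outside the `k=3` window yet. As a rough illustration of what happens once the window overflows, here is a minimal pure-Python sketch. `WindowMemorySketch` is a hypothetical stand-in written for this notebook, not LangChain's implementation, and it only mimics the "keep the last k exchanges" idea.

```python
from collections import deque


class WindowMemorySketch:
    """Hypothetical stand-in that keeps only the last k exchanges."""

    def __init__(self, k):
        # A deque with maxlen discards the oldest item automatically.
        self.exchanges = deque(maxlen=k)

    def save_context(self, inputs, outputs):
        self.exchanges.append((inputs["input"], outputs["output"]))

    def load_memory_variables(self):
        lines = []
        for human, ai in self.exchanges:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        return {"history": "\n".join(lines)}


sketch = WindowMemorySketch(k=2)
sketch.save_context({"input": "Hi"}, {"output": "How are you?"})
sketch.save_context({"input": "Fine thanks"}, {"output": "Great"})
sketch.save_context({"input": "Any plans?"}, {"output": "Just chatting"})

# With k=2, the first exchange ("Hi" / "How are you?") is no longer present.
window_history = sketch.load_memory_variables()["history"]
print(window_history)
```

Saving a third exchange with `k=2` pushes the first one out of the window, which is the behavior to expect from the real class once more than `k` exchanges have been saved.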

### ConversationTokenBufferMemory
Keeps a buffer of recent interactions in memory, and uses token length rather than number of interactions to determine when to flush interactions.

In [None]:
from langchain.memory import ConversationTokenBufferMemory

memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=100)
memory.save_context({"input": "All alone, she dreams of the stars!"},
                    {"output": "As she should!"})
memory.save_context({"input": "Baking cookies today?"},
                    {"output": "Behold the cookies!"})
memory.save_context({"input": "Chatbots everywhere?"},
                    {"output": "Certainly!"})

In [None]:
memory.load_memory_variables({})
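
To make the token-based flushing concrete, here is a minimal pure-Python sketch of the idea. It is an illustration under stated assumptions rather than LangChain's code: `count_tokens` and `prune_to_token_limit` are hypothetical helpers, and whitespace-separated words stand in for real model tokens.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def prune_to_token_limit(messages, max_token_limit):
    """Drop the oldest messages until the total token count fits the limit."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_token_limit:
        kept.pop(0)
    return kept


demo_messages = [
    "Human: All alone, she dreams of the stars!",
    "AI: As she should!",
    "Human: Baking cookies today?",
    "AI: Behold the cookies!",
    "Human: Chatbots everywhere?",
    "AI: Certainly!",
]

# A deliberately small limit so that pruning is visible.
pruned = prune_to_token_limit(demo_messages, max_token_limit=10)
print(pruned)
```

Unlike the window approach, the number of surviving messages depends on their length: one long message can displace several short ones.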

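In this example, we use `ConversationTokenBufferMemory` inside a `ConversationChain`. Once the history exceeds `max_token_limit`, the oldest messages are dropped from the buffer rather than summarized.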

In [None]:
from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
    llm=llm,
    # We set a very low max_token_limit for the purposes of testing.
    memory=ConversationTokenBufferMemory(llm=llm, max_token_limit=60),
    verbose=True,
)
conversation_with_summary.predict(input="Hi, how are you?")

The next cells continue the same conversation with the token-buffer chain. With `verbose=True`, each prompt shows which earlier messages are still in the buffer.

In [None]:
conversation_with_summary.predict(input="I'm working on learning C++")


In [None]:
conversation_with_summary.predict(input="What's the best book to help me?")

In [None]:
# Notice that the buffer shown in the prompt has been updated
# and no longer includes the earliest exchanges
conversation_with_summary.predict(input="Wish me luck!")

In [None]:
conversation_with_summary.predict(input="Would knowing C help me?")

### ConversationSummaryBufferMemory

Keeps recent interactions verbatim and, rather than discarding older ones, compiles them into a running summary that informs later turns. It uses token length, not the number of interactions, to determine when to flush interactions from the buffer into the summary.

In [None]:
from langchain.memory import ConversationSummaryBufferMemory

# create a long string
activities = "I'm due at the pool for a training session \
with the swim coach. \
Then it's straight out on the bike into the mountains for a 60-miler. \
There will be speed reps in between the mountain climbs. \
The p.m. workout will be ten miles @ 60-70% effort. \
I need to check the bike tyres and sleep well tonight to prepare for \
the training session."

memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=30)
memory.save_context({"input": "Hello"}, {"output": "What's up"})
memory.save_context({"input": "Not much, just hanging"},
                    {"output": "Cool"})
memory.save_context({"input": "What training is on today?"},
                    {"output": f"{activities}"})

In [None]:
memory.load_memory_variables({})

In [None]:
messages = memory.chat_memory.messages
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)
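
The interplay between the buffer and the running summary can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `stub_summarize` is a hypothetical placeholder for the LLM summarization call (it just keeps the first four words of each flushed message), and whitespace-separated words stand in for tokens.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def stub_summarize(previous_summary, pruned):
    # Hypothetical placeholder for an LLM call: keeps the first four
    # words of each flushed message and appends them to the old summary.
    parts = [previous_summary] if previous_summary else []
    parts += [" ".join(message.split()[:4]) for message in pruned]
    return " / ".join(parts)


def save_with_summary(buffer, summary, new_message, max_token_limit):
    """Append a message, then fold overflowing old messages into the summary."""
    buffer = buffer + [new_message]
    pruned = []
    while buffer and sum(count_tokens(m) for m in buffer) > max_token_limit:
        pruned.append(buffer.pop(0))
    if pruned:
        summary = stub_summarize(summary, pruned)
    return buffer, summary


buffer, summary = [], ""
for message in [
    "Human: Hello",
    "AI: What's up",
    "Human: Not much, just hanging",
    "AI: Cool",
    "Human: What training is on today?",
]:
    buffer, summary = save_with_summary(buffer, summary, message, max_token_limit=10)

print("Summary:", summary)
print("Buffer:", buffer)
```

The key design point is that flushed messages are not lost: they leave the verbatim buffer but survive, in compressed form, in the summary that is prepended to later prompts.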

In [None]:
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True,
)

In [None]:
conversation.predict(input="Hi, what's up?")

In [None]:
conversation.predict(input="Not much, resting while I can")

In [None]:
conversation.predict(input="What should I do to prepare for the training session?")

In [None]:
conversation.predict(input="What does the run session look like?")

In [None]:
# The memory keeps recent messages verbatim up to the specified
# 30-token limit; older messages are folded into the summary
memory.load_memory_variables({})

### Summary

In this notebook, we explored various approaches to memory in conversations.

* ConversationBufferWindowMemory

* ConversationSummaryBufferMemory

* ConversationTokenBufferMemory
