# MemGPT implementation  

A MemGPT-style implementation based on the LangGraph tutorials and inspired by the paper cited in the [README](../README.md).

In [14]:
from langgraph.store.memory import InMemoryStore
in_memory_store = InMemoryStore()

In [None]:
import uuid
# Memories are namespaced by a tuple; in this example it is (<user_id>, "memories")
user_id = '1'
namespace_for_memory = (user_id, 'memories')
# store.put saves a memory to our namespace in the store. It takes the namespace
# and a key-value pair: the key is a unique identifier, and the value is the
# memory itself (a dictionary).
memory_id = str(uuid.uuid4())
print(memory_id)
memory_1 = {"food_preference: " : "I like pizza"}
# Pitfall: both memories below are stored under the same key (memory_id), so the
# second put() overwrites the first. Generate a new id for each memory to keep both.
memory_2 = {"Best activity: ": "I like to play basketball"}
in_memory_store.put(namespace_for_memory, memory_id, memory_1)
in_memory_store.put(namespace_for_memory, memory_id, memory_2)

e4d1d95c-0c2e-493a-a067-0a4393475f3b


In [None]:
# Read the memories in our namespace with store.search.
# It takes the namespace and returns a list of memories for that user.
memories = in_memory_store.search(namespace_for_memory)
for mem in memories:
    print(mem.dict())

[Item(namespace=['1', 'memories'], key='e4d1d95c-0c2e-493a-a067-0a4393475f3b', value={'Best activity: ': 'I like to play basketball'}, created_at='2025-03-21T10:35:44.279694+00:00', updated_at='2025-03-21T10:35:44.279694+00:00', score=None)]

# Semantic search

In [19]:
from dotenv import load_dotenv
load_dotenv()


from langchain.embeddings import init_embeddings

store = InMemoryStore(
    index={
        "embed": init_embeddings("openai:text-embedding-3-small"),  # Embedding provider
        "dims": 1536,                              # Embedding dimensions
        "fields": ["food_preference", "$"]              # Fields to embed
    }
)

In [None]:
# Find memories about food preferences
# (This can be done after putting memories into the store)
memories = store.search(
    namespace_for_memory,
    query="What does the user like to eat?",
    limit=3  # Return top 3 matches
)
memories  # Empty: the store was configured for embedding, but no memory has been put into it yet

[]

In [None]:
# Store with specific fields to embed
store.put(
    namespace_for_memory,
    str(uuid.uuid4()),
    {
        "food_preference": "I love Italian cuisine",
        "context": "Discussing dinner plans"
    },
    index=["food_preference"]  # Embed only the "food_preference" field; "context" cannot be searched semantically
)
# To embed the context as well, pass index=["food_preference", "context"].
# Whether that helps depends on the data: very generic context statements add little.

# Store without embedding (still retrievable, but not searchable)
store.put(
    namespace_for_memory,
    str(uuid.uuid4()),
    {"system_info": "Last updated: 2024-01-01"},
    index=False
)

# Deploy Semantic search in a graph  

This is still a chatbot-style graph; I have not yet looked at how to make it iterative. For now, the goal is only to introduce memory and a token-aware context window.

[Semantic search](https://langchain-ai.github.io/langgraph/cloud/deployment/semantic_search/)  

[Deploy application](https://langchain-ai.github.io/langgraph/cloud/deployment/setup_pyproject/#specify-dependencies)  

[Move to LangSmith](https://langchain-ai.github.io/langgraph/cloud/deployment/cloud/#create-new-deployment)  

[Save memories](https://python.langchain.com/docs/versions/migrating_memory/long_term_memory_agent/)  

[How to use tools](https://js.langchain.com/docs/how_to/tool_calling/#:~:text=Chat%20models%20that%20support%20tool%20calling%20features%20implement,tool%20schemas%20in%20its%20calls%20to%20the%20LLM.)


When creating any LangGraph graph, you can make it persist its state by passing a checkpointer when compiling the graph. A graph compiled with a checkpointer saves a checkpoint of its state at every super-step. With a chat-style state, this means **the entire history** is passed to the LLM, so it is easy to exceed the context length.

What we need instead is a token-aware prompt combined with semantic search. To save memories we can use `InMemoryStore`, which keeps data in main memory (RAM); a database-backed store is also possible but not really necessary here.

We can give the LLM a tool to store relevant memories whenever it judges this necessary. Then, when answering a new prompt, it can run a semantic search over them.   
Following the MemGPT approach, we divide the context window into three sections:   
1. System instructions;
2. Working context;
3. FIFO Queue;
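
As a rough illustration of this layout, the sketch below keeps the three sections in a hypothetical `ContextWindow` container and joins them in a fixed order to build the prompt. The class, the section labels, and the sample strings are assumptions made for this example, not part of LangGraph or of the MemGPT paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    """Hypothetical container for the three MemGPT-style sections."""

    system_instructions: str
    working_context: list[str] = field(default_factory=list)  # facts pulled from storage
    fifo_queue: deque = field(default_factory=deque)           # recent (role, text) messages

    def render(self) -> str:
        # Concatenate the sections in a fixed order to build the prompt.
        parts = ["[SYSTEM]", self.system_instructions, "[WORKING CONTEXT]"]
        parts.extend(self.working_context)
        parts.append("[FIFO QUEUE]")
        parts.extend(f"{role}: {text}" for role, text in self.fifo_queue)
        return "\n".join(parts)


window = ContextWindow(system_instructions="You are a helpful assistant.")
window.working_context.append("User prefers Italian cuisine.")
window.fifo_queue.append(("user", "Any dinner ideas?"))

prompt = window.render()
print(prompt)
```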

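Part of the system instructions is handled automatically by the framework when a tool is bound to the model: the tool's name, description, and parameter schema are sent along with each request. Conceptually, the LLM receives something like:   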

```
You have access to the following tools:

Tool name: search_memory
Description: Look up relevant memories using semantic search.

Input parameters:
- query (string): 
- limit (integer): 

You can call a tool by responding with a JSON object like:
{
  "tool": "search_memory",
  "tool_input": {
    "query": "your question",
    "limit": 3
  }
}
```  
First we need to know the size of the context window, which depends on the model we choose. The size of the system instructions can then be measured precisely once all the tools (web search, memory, and PCAP reader) are defined. After that, we have to decide what percentage of the remaining context window goes to the FIFO queue, which holds the conversation history, and what percentage goes to data retrieved from storage. **I still have to understand how to limit the semantic search that the LLM performs in memory.**  
One possible approach:  
1. Measure the system information once all tools are available; it is fixed, and we call it *total info*.
2. Initially, the FIFO queue has a capacity of (context window) - (total info).
3. When the FIFO reaches 85-90% of its capacity, flush it back to 60%, save everything removed from it to storage, and reserve 20% of the context window for data retrieved from storage.
4. Repeat the process when the FIFO again reaches 80% of capacity.
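
The flush policy in the steps above can be sketched as follows. This is only an illustration under stated assumptions: `count_tokens` is a crude word count standing in for a real tokenizer, `archive` is a plain list standing in for the store, the context-window and *total info* sizes are made-up numbers, and a single 90% trigger is used. The 20% reserved for retrieved data is not modeled.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


class FifoQueue:
    """Hypothetical token-aware FIFO that archives what it evicts."""

    def __init__(self, capacity: int, flush_at_pct: int = 90, flush_to_pct: int = 60):
        self.capacity = capacity
        self.flush_at_pct = flush_at_pct
        self.flush_to_pct = flush_to_pct
        self.messages: deque[str] = deque()
        self.used = 0
        self.archive: list[str] = []  # stand-in for the external store

    def append(self, message: str) -> None:
        self.messages.append(message)
        self.used += count_tokens(message)
        if self.used * 100 >= self.flush_at_pct * self.capacity:
            self._flush()

    def _flush(self) -> None:
        # Evict the oldest messages until usage is back at the target, archiving each one.
        while self.messages and self.used * 100 > self.flush_to_pct * self.capacity:
            oldest = self.messages.popleft()
            self.used -= count_tokens(oldest)
            self.archive.append(oldest)


context_window = 120  # hypothetical model limit, in tokens
total_info = 20       # hypothetical size of system instructions plus tool schemas
queue = FifoQueue(capacity=context_window - total_info)

for turn in range(12):
    queue.append(f"turn {turn}: " + "word " * 8)  # 10 tokens per message

print(queue.used, len(queue.messages), len(queue.archive))
```

Because eviction is driven by the queue's own token count, the LLM never has to decide when to save history; it only needs a way to search what was archived.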

Problem: how do I manage this? I have to save the information myself, without asking the LLM to do it, because the trigger is the capacity of the FIFO. I then have to tell the LLM to retrieve data from the database if necessary, but ONLY WHEN SOMETHING IS STORED. If I bind a semantic search tool unconditionally, the LLM will always be able to search the database, even when it is empty.

One option is to bind tools to the LLM dynamically:   
```python  
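# `llm`, `search_tool`, and `memory_store` are assumed to be defined elsewhere.
# Expose the search tool only when the store already holds at least one memory.
tools = []
if memory_store.search(namespace_for_memory, limit=1):
    tools.append(search_tool)

# Bind tools only if there are any; otherwise use the plain model.
llm_with_tools = llm.bind_tools(tools) if tools else llm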
```  
