# 1.3 Long-term memories

Nova has a built-in memory system that uses retrieval-augmented generation (RAG) to give the model additional information it has previously memorized. This notebook shows how to give the model access to these memories.

Run this cell so Python can find the Nova scripts. This step is not required when importing Nova from outside the root folder.

In [None]:
import sys
from pathlib import Path
module_path = Path().absolute().parent.parent
if str(module_path) not in sys.path:
    sys.path.append(str(module_path))

As usual, the first step is to set up the LLM.

In [None]:
from nova import *

nova = Nova()

inference_engine = InferenceEngineLlamaCPP()
conditioning = LLMConditioning(
    model="bartowski/Qwen2.5-7B-Instruct-1M-GGUF",
    file="*Q8_0.gguf"
)

nova.configure_llm(inference_engine=inference_engine, conditioning=conditioning)
nova.apply_config_llm()

nova.load_tools()

To give the LLM access to memories, we need to create a MemoryConfig object. Similar to LLMConditioning, it stores all parameters required to access memories.

In [None]:
memory_config = MemoryConfig(
    retrieve_memories=True
)

Now, when running inference on the LLM, pass the memory_config object to give the LLM access to memories.

In [None]:
conversation = Conversation()

message = "Your message here"
user_message = Message(author="user", content=message)
conversation.add_message(user_message)

llm_response = nova.run_llm(conversation=conversation, memory_config=memory_config)

Storing memories is enabled by default because it is one of the default tools the LLM has access to. Note that the LLM cannot store new memories if the tools were not loaded or if "load_default_tools" is False. More on tools [here](1.2%20tool%20use.ipynb).
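Conceptually, storing a memory means turning its text into a vector (a text embedding) and appending that vector to a database. The sketch below illustrates only that idea in plain Python; it is not Nova's memory tool. The `embed` function (a hashed bag-of-words vector) and the `MemoryStore` class are hypothetical stand-ins for a real embedding model and database.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 64  # hypothetical embedding size; real embedding models use far more dimensions


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: a hashed bag-of-words vector scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        # crc32 is deterministic, so the same word always lands in the same dimension
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


@dataclass
class MemoryStore:
    """Hypothetical in-memory 'database' that keeps entries in insertion order."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def store(self, text: str) -> int:
        """Embed a new memory, append it, and return its index."""
        self.texts.append(text)
        self.vectors.append(embed(text))
        return len(self.texts) - 1


store = MemoryStore()
store.store("The user's name is Alex.")
store.store("Alex prefers tea over coffee.")
print(len(store.texts), len(store.vectors[0]))  # prints: 2 64
```

In this sketch, keeping entries in insertion order is what makes it meaningful to talk about the entries "before and after" a given memory.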

### How memory retrieval works

Memories use a technique called retrieval-augmented generation (RAG). If the LLM decides to store a new memory, the memory is first converted into a text embedding, which is simply a high-dimensional vector. You can think of a text embedding as representing the meaning of a text. This embedding is then stored in a database.

When the user prompts the LLM, their message is split into sentences and each sentence is converted into its own text embedding. A similarity search then looks for stored entries that are close in meaning to each sentence. If an entry's score surpasses the threshold, that entry and its surrounding entries are passed to the LLM. This is commonly known as "neighbourhood retrieval", and it gives the LLM additional context that may be relevant.

Knowing this makes the parameters of the MemoryConfig object easier to understand:
- retrieve_memories: Whether to retrieve memories at all.
- num_results: The maximum number of results above the threshold to use.
- search_area: How many entries before and after each result are also passed to the LLM.
- cosine_threshold: The similarity score an entry must surpass to be considered a result.
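The retrieval steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Nova's implementation: the hashed bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, sentences are split with a naive regex, and `retrieve` only borrows the parameter names num_results, search_area and cosine_threshold from MemoryConfig. Its default values are assumptions, not Nova's defaults.

```python
import math
import re
import zlib

DIM = 64  # hypothetical embedding size


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: a hashed bag-of-words vector scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length (or zero), so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def retrieve(message: str, memories: list[str], num_results: int = 3,
             search_area: int = 1, cosine_threshold: float = 0.5) -> list[str]:
    vectors = [embed(m) for m in memories]
    # Split the message into sentences (naive split after '.', '!' or '?').
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", message.strip()) if s]
    hits: dict[int, float] = {}  # memory index -> best score across all sentences
    for sentence in sentences:
        query = embed(sentence)
        for i, vec in enumerate(vectors):
            score = cosine(query, vec)
            if score > cosine_threshold:
                hits[i] = max(score, hits.get(i, score))
    # Keep only the num_results best-scoring hits.
    best = sorted(hits, key=hits.get, reverse=True)[:num_results]
    # Neighbourhood retrieval: also include search_area entries before and after each hit.
    selected: set[int] = set()
    for i in best:
        selected.update(range(max(0, i - search_area), min(len(memories), i + search_area + 1)))
    return [memories[i] for i in sorted(selected)]


memories = [
    "The user's name is Alex.",
    "Alex works as a nurse.",
    "Alex prefers tea over coffee.",
    "The user's cat is called Miso.",
    "Miso is afraid of the vacuum cleaner.",
]
print(retrieve("Remind me, what is my cat called?", memories,
               num_results=1, search_area=1, cosine_threshold=0.3))
```

With a real embedding model, sentences that share meaning but not words would also score highly; the toy embedding here only rewards shared words.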