## Usage for MemoRAG

In [None]:
# initialize MemoRAG

from memorag import MemoRAG

pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-mistral-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3",
    gen_model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2",
    cache_dir="path_to_model_cache",  # to specify local model cache directory (optional)
    access_token="hugging_face_access_token",  # Hugging Face access token (optional)
)


This code block initializes the MemoRAG pipeline with three models: a memory model (`mem_model_name_or_path`), a retrieval model (`ret_model_name_or_path`), and a generation model (`gen_model_name_or_path`). It also sets a local cache directory for the model weights and passes a Hugging Face access token used to download the models.

### Initialize Memory
The next cell demonstrates how to use the `memorize` method of the MemoRAG pipeline. It loads the contents of a text file (here, `harry_potter.txt`), memorizes the text, and stores the results in the specified directory. Three key files are created in this directory:

- **memory.bin**: This file stores the key-value (KV) cache of the memory model, which enables fast retrieval of previously processed information.
- **index.bin**: This file contains the dense embeddings for the text corpus, facilitating efficient retrieval of relevant passages.
- **chunks.json**: This file holds the passages or chunks derived from the input context, which are used during retrieval.

If the `save_dir` parameter is set, the method saves the preprocessed data (i.e., memory, embeddings, and chunks) to disk. This allows for much faster future operations on the same context, as loading the cached data from disk is significantly more efficient than reprocessing and encoding the context from scratch. This caching mechanism is particularly useful when working with large texts or datasets that are frequently accessed.
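The caching behavior described above can be wrapped in a small helper that loads a saved memory when all three files are present and builds it otherwise. This is a minimal sketch, not part of MemoRAG: `cache_is_complete` and `memorize_or_load` are hypothetical names, and the sketch only assumes the `memorize(text, save_dir=..., print_stats=...)` and `load(save_dir, print_stats=...)` calls used in this notebook. Whether MemoRAG already performs a similar check internally is not stated here.

```python
from pathlib import Path

# File names taken from the list above; the helpers themselves are hypothetical.
CACHE_FILES = ("memory.bin", "index.bin", "chunks.json")


def cache_is_complete(save_dir):
    """Return True only if all three cache files exist in save_dir."""
    root = Path(save_dir)
    return all((root / name).is_file() for name in CACHE_FILES)


def memorize_or_load(pipe, text, save_dir, print_stats=True):
    """Load a saved memory if present, otherwise build and save it.

    `pipe` is expected to expose `memorize` and `load` as used in this
    notebook. Returns "loaded" or "memorized" so the caller can tell
    which path ran.
    """
    if cache_is_complete(save_dir):
        pipe.load(save_dir, print_stats=print_stats)
        return "loaded"
    pipe.memorize(text, save_dir=save_dir, print_stats=print_stats)
    return "memorized"
```

With the pipeline from the first cell, this could be called as `memorize_or_load(pipe, test_txt, "cache/harry_potter_mistral/")`.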

In [2]:
import time
start = time.time()
test_txt = open("harry_potter.txt").read()
pipe.memorize(test_txt, save_dir="cache/harry_potter_mistral/", print_stats=True)
print(f"Prefilling takes {round(time.time()-start,2)} second for the full book.")

Memory file size: 4.77 GB
Encoded context length: 122591 tokens
Number of chunks in retrieval corpus: 268
Prefilling takes 86.35 second for the full book.


In [3]:
start = time.time()
test_txt = open("harry_potter.txt").read()
pipe.load("cache/harry_potter_mistral/", print_stats=True)
print(f"Loading from cache takes {round(time.time()-start,2)} for the full book.")

Memory file size: 4.77 GB
Number of chunks in retrieval corpus: 268
Loading from cache takes 3.83 for the full book.
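The `start = time.time()` pattern repeated in the cells above could also be factored into a context manager. The sketch below uses only the standard library; `timed` is a hypothetical helper name, not something MemoRAG provides.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the `with` block.

    Yields a dict that receives the elapsed time under the key
    "seconds" once the block has finished.
    """
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label} takes {result['seconds']:.2f} seconds.")


# Example usage with the pipeline from this notebook (not executed here):
# with timed("Loading from cache"):
#     pipe.load("cache/harry_potter_mistral/", print_stats=True)
```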


### Performing tasks
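Currently, MemoRAG focuses on two tasks: question answering (QA) and summarization. In the cells below, `task_type="qa"` answers from the memory alone, `task_type="memorag"` uses the memory to guide retrieval before answering, and `task_type="summarize"` summarizes the context.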


In [5]:
query = "how many times does the chamber be opened in Harry Potter?"
res = pipe(context=test_txt, query=query, task_type="qa", max_new_tokens=256)
print(f"Using memory to produce the answer: \n{res} \n\n")
res = pipe(context=test_txt, query=query, task_type="memorag", max_new_tokens=256)
print(f"Using MemoRAG to produce the answer: \n{res}")

Using memory to produce the answer: 
The article does not provide information on how many times the Chamber of Secrets has been opened in Harry Potter's case. 


Using MemoRAG to produce the answer: 
The Chamber of Secrets was opened at least twice in Harry Potter's world, once fifty years prior when a monster attacked students, killing one, and again during Harry's time at Hogwarts.


As the two answers above show, relying on memory alone can produce an inferior response, because the memory is a compact and somewhat imprecise representation of the context. With MemoRAG, the memory model instead recalls key answer clues that guide the retriever to more relevant and precise evidence in the original context. The response is of much higher quality because the retrieved evidence is more directly aligned with the query.
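To illustrate why recalled clues can help retrieval, here is a toy sketch. It swaps the dense retriever for a plain token-overlap score, and the passages and the clue are invented for illustration (they are not taken from the book or produced by the memory model). It is meant only to show the idea of retrieving with the query plus clues, not how MemoRAG implements it.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score(query, passage):
    """Fraction of the query's tokens that also occur in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def retrieve(queries, passages, top_k=1):
    """Rank passages by their best score against any of the queries."""
    ranked = sorted(
        passages,
        key=lambda p: max(score(q, p) for q in queries),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data, invented for this sketch.
passages = [
    "Harry asked how many times the team had opened the season with a win.",
    "Fifty years ago a monster attacked students and one of them died.",
]
query = "how many times does the chamber be opened in Harry Potter?"
clue = "a monster attacked students fifty years ago"  # hypothetical recalled clue

query_only = retrieve([query], passages)
with_clue = retrieve([query, clue], passages)
```

In this toy setup the query alone ranks the lexically similar but irrelevant first passage highest, while adding the clue brings the second passage to the top.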

In [6]:
res = pipe(context=test_txt, task_type="summarize", max_new_tokens=512)
print(f"Using MemoRAG to summarize the full book:\n {res}")

Using MemoRAG to summarize the full book:
 In "Harry Potter and the Chamber of Secrets" by J.K. Rowling, Harry Potter is spending his summer break at the Dursleys' house, feeling isolated and unwanted. On his twelfth birthday, he is ignored by his family and longs to return to Hogwarts. One day, while in the garden, Harry encounters a house-elf named Dobby who warns him not to go back to school as there is a plot to cause terrible things at Hogwarts. Harry is skeptical but Dobby shows him a wad of his unopened letters from friends Ron and Hermione, which Dobby had been keeping to prevent Harry from feeling forgotten. Harry becomes angry and insists on reading his letters, causing Dobby to flee.

Harry manages to sneak away from the Dursleys and meets Ron and Hermione at the Burrow, the Weasley family home. They plan to visit Hagrid, the Hogwarts gamekeeper, on the weekend. However, Harry is haunted by the memory of Tom Riddle, the past Dark Lord, who had left a diary with Harry and had

### Using APIs as generator

In [1]:
from memorag import Agent, MemoRAG
api_dict = {
    "endpoint": "",
    "api_version": "2024-02-15-preview",
    "api_key": ""
}
model = "gpt-35-turbo-16k"
source = "azure"

## using deepseek models
# model = ""
# source = "deepseek"
# api_dict = {
#     "base_url": "",
#     "api_key": ""
# }

## using openai models
# model = ""
# source = "openai"
# api_dict = {
#     "api_key": ""
# }


agent = Agent(model, source, api_dict)
print(agent.generate("hi!")) #  test API

pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-qwen2-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3",
    cache_dir="path_to_model_cache",  # to specify local model cache directory (optional)
    customized_gen_model=agent,
)
pipe.load("cache/harry_potter_qwen/", print_stats=True)


You are using gpt-35-turbo-16k from azure


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


['Hello! How can I assist you today?']
[2024-09-05 16:47:28,238] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)


Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00,  2.47s/it]


Memory file size: 1.66 GB
Number of chunks in retrieval corpus: 268
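Rather than pasting credentials into the notebook, the `api_dict` could be assembled from environment variables. The following is a sketch under assumptions: `build_api_dict` is a hypothetical helper and the environment variable names are chosen for this example only. The dictionary keys per source mirror the ones used in the cell above.

```python
import os

# Maps each api_dict field to an environment variable.
# The variable names are assumptions made for this sketch.
_ENV_KEYS = {
    "azure": {
        "endpoint": "AZURE_OPENAI_ENDPOINT",
        "api_version": "AZURE_OPENAI_API_VERSION",
        "api_key": "AZURE_OPENAI_API_KEY",
    },
    "deepseek": {
        "base_url": "DEEPSEEK_BASE_URL",
        "api_key": "DEEPSEEK_API_KEY",
    },
    "openai": {
        "api_key": "OPENAI_API_KEY",
    },
}


def build_api_dict(source, env=None):
    """Build the api_dict for `source` from environment variables.

    Raises ValueError for an unknown source and KeyError if a required
    variable is unset or empty.
    """
    env = os.environ if env is None else env
    if source not in _ENV_KEYS:
        raise ValueError(f"Unknown source: {source!r}")
    mapping = _ENV_KEYS[source]
    missing = [var for var in mapping.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {field: env[var] for field, var in mapping.items()}
```

The result could then be passed on as in the cell above, for example `Agent(model, source, build_api_dict(source))`.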


In [2]:
query = "How are the mutual relationships between the main characters? "
test_txt = open("harry_potter.txt").read()

res = pipe(context=test_txt, query=query, task_type="memorag", max_new_tokens=256)
print(f"Using MemoRAG with GPT-3.5 to produce the answer: \n{res}")



Using MemoRAG with GPT-3.5 to produce the answer: 
The mutual relationships between the main characters are supportive and loyal. They care for each other and work together to solve problems and overcome challenges.


## Usage for Memory model

In [11]:
from memorag import Memory

memo_model = Memory(
    "TommyChien/memorag-qwen2-7b-inst",
    cache_dir="path_to_model_cache",  # to specify local model cache directory (optional)
    beacon_ratio=4)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.66s/it]


In [12]:
import time
start = time.time()
context = open("harry_potter.txt").read()
memo_model.memorize(context)
memo_model.save("cache/harry_potter_qwen/memory.bin")
print(f"Prefilling takes {round(time.time()-start,2)} second for the full book.")

Prefilling takes 20.06 second for the full book.


In [13]:
memo_model.reset()  # delete memory
start = time.time()
memo_model.load("cache/harry_potter_qwen/memory.bin")
print(f"Loading from cache takes {round(time.time()-start,2)} for the full book.")

Loading from cache takes 0.8 for the full book.


In [17]:
query = "How are the mutual relationships between the main characters? "

res = memo_model.answer(query)
print("Using memory to answer the query:\n", res)

Using memory to answer the query:
 Harry, Ron, and Hermione have strong friendships, with Harry and Ron being particularly close. They support each other through difficult times, such as when they are trying to solve the mystery of the Chamber of Secrets. Hermione is also shown to be fiercely loyal to Harry and Ron, often going out of her way to help them.


In [28]:
res = memo_model.recall(query)
res = [line for line in res.split("\n")[:-1] if line]
res = [f"{i+1}: {line}" for i,line in enumerate(res)]
res = "\n".join(res)
print("Using memory to recall text clues to support the evidence retrieval:\n", res)

Using memory to recall text clues to support the evidence retrieval:
 1: Harry Potter and Ron Weasley are best friends. They have been through many adventures together and are always there for each other.
2: Hermione Granger is Harry's best friend. She has helped him countless times and they share a strong bond.
3: Ron Weasley is Harry's cousin and they are like brothers. They have a close relationship and often tease each other.
4: Hogwarts is their home away from home and they miss it dearly when they are apart.
5: Harry Potter, Ron Weasley, and Hermione Granger are the main trio in the Harry Potter series. They are loyal to each other and often work together to solve problems.
6: Harry Potter and Ron Weasley are like brothers. They have a close relationship and often tease each other.
7: Hermione Granger is Harry's best friend. She has helped him countless times and they share a strong bond.
8: Ron Weasley is Harry's cousin and they are like brothers. They have a close relationship 

In [27]:
res = memo_model.rewrite(query)
res = [line for line in res.split("\n")[:-1] if line]  # drop empty lines before numbering
res = [f"{i+1}: {line}" for i, line in enumerate(res)]
res = "\n".join(res)
print("Using memory to rewrite the input query into more specific surrogate queries:\n", res)

Using memory to rewrite the input query into more specific surrogate queries:
 1: What are the interactions and relationships between Harry Potter, Ron Weasley, Hermione Granger, and Draco Malfoy?
2: How do Harry Potter and Ron Weasley support each other during challenging situations?
3: What role does Hermione Granger play in solving mysteries and puzzles?


## Usage for Memory-augmented retrieval

In [None]:
from memorag import MemoRAG

pipe = MemoRAG(
    mem_model_name_or_path="TommyChien/memorag-qwen2-7b-inst",
    ret_model_name_or_path="BAAI/bge-m3",
    cache_dir="path_to_model_cache",  # to specify local model cache directory (optional)
    access_token="hugging_face_access_token",  # Hugging Face access token (optional)
)

In [2]:
import time
start = time.time()
test_txt = open("harry_potter.txt").read()
pipe.memorize(test_txt, save_dir="cache/harry_potter_qwen/", print_stats=True)
print(f"Prefilling takes {round(time.time()-start,2)} second for the full book.")

Memory file size: 1.66 GB
Encoded context length: 122591 tokens
Number of chunks in retrieval corpus: 268
Prefilling takes 23.85 second for the full book.


In [4]:
start = time.time()
test_txt = open("harry_potter.txt").read()
pipe.load("cache/harry_potter_qwen/", print_stats=True)
print(f"Loading from cache takes {round(time.time()-start,2)} for the full book.")

Memory file size: 1.66 GB
Number of chunks in retrieval corpus: 268
Loading from cache takes 1.3 for the full book.


In [7]:
query = "How are the mutual relationships between the main characters? "

clues = pipe.mem_model.recall(query).split("\n")
clues = [q for q in clues if len(q.split()) > 3]
print(clues)



['Harry Potter and Ron Weasley are best friends. They have been through many adventures together and are always there for each other.', "Hermione Granger is Harry's best friend. She has helped him countless times and they share a strong bond.", "Ron Weasley is Harry's cousin and they are like brothers. They have a close relationship and often tease each other.", 'Hogwarts is their home away from home and they miss it dearly when they are apart.', 'Harry Potter, Ron Weasley, and Hermione Granger are the main trio in the Harry Potter series. They are loyal to each other and often work together to solve problems.', 'Harry Potter and Ron Weasley are like brothers. They have a close relationship and often tease each other.', "Hermione Granger is Harry's best friend. She has helped him countless times and they share a strong bond.", "Ron Weasley is Harry's cousin and they are like brothers. They have a close relationship and often tease each other.", 'Harry Potter, Ron Weasley, and Hermione 
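The recalled clues above contain repeated lines. A small post-processing step could drop short lines and duplicates before the clues are passed to the retriever. This is a sketch, not part of MemoRAG: `clean_clues` is a hypothetical helper that keeps the same "more than three words" filter as the cell above and adds order-preserving deduplication.

```python
def clean_clues(raw, min_words=4, drop_last=False):
    """Split recalled text into clues, dropping short and repeated lines.

    raw:        newline-separated text, e.g. the string returned by recall().
    min_words:  keep only lines with at least this many words.
    drop_last:  discard the final line, which may be cut off mid-sentence.
    """
    lines = [line.strip() for line in raw.split("\n")]
    if drop_last and lines:
        lines = lines[:-1]
    seen = set()
    clues = []
    for line in lines:
        if len(line.split()) < min_words or line in seen:
            continue
        seen.add(line)
        clues.append(line)
    return clues
```

With the pipeline above this might be used as `clues = clean_clues(pipe.mem_model.recall(query))`.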

In [12]:
retrieved_passages = pipe._retrieve(clues)
print("\n======\n".join(retrieved_passages[:3]))

He missed Hogwarts so much it was like having a constant
stomachache. He missed the castle, with its secret passageways and
ghosts, his classes (though perhaps not Snape, the Potions master), the
mail arriving by owl, eating banquets in the Great Hall, sleeping in his
four-poster bed in the tower dormitory, visiting the gamekeeper,
Hagrid, in his cabin next to the Forbidden Forest in the grounds, and,
especially, Quidditch, the most popular sport in the wizarding world
(six tall goal posts, four flying balls, and fourteen players on
broomsticks).

All Harry's spellbooks, his wand, robes, cauldron, and top-of-the-line
Nimbus Two Thousand broomstick had been locked in a cupboard
under the stairs by Uncle Vernon the instant Harry had come home.
What did the Dursleys care if Harry lost his place on the House
Quidditch team because he hadn't practiced all summer? What was it
to the Dursleys if Harry went back to school without any of his
homework done? The Dursleys were what wizards called 