# Long-term Conversation without GPU

**Long-term, open-domain conversations** over multiple sessions present significant challenges, as they require the system to retain past events and user preferences to deliver coherent and personalized responses. We explore the impact of different **memory granularities** and present two key findings:

- *Turn-level*, *session-level*, and *summarization-based* methods all exhibit limitations. 

- The redundancy in natural language introduces noise, hindering precise retrieval. *LLMLingua-2*, originally designed for prompt compression, can serve as an effective denoising method to enhance memory retrieval accuracy.

Building on these insights, we propose constructing the memory bank at the segment level: a conversation **Se**gmentation model partitions long-term conversations into topically coherent segments, and **Com**pression-based denoising is applied to the memory units to improve retrieval.
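
To make the three granularities concrete, here is a minimal sketch of how the same conversation could be cut into turn-level, session-level, and segment-level memory units. The helper names, the toy sessions, and the hand-written segment boundaries are all assumptions for illustration; in SeCom the boundaries come from the segmentation model, which is not reproduced here.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (human utterance, bot reply)


def format_turn(turn: Turn) -> str:
    """Render one human-bot exchange as a single text block."""
    human, bot = turn
    return f"[human]: {human}\n[bot]: {bot}"


def turn_level(sessions: List[List[Turn]]) -> List[str]:
    """One memory unit per human-bot exchange."""
    return [format_turn(t) for session in sessions for t in session]


def session_level(sessions: List[List[Turn]]) -> List[str]:
    """One memory unit per whole session."""
    return ["\n".join(format_turn(t) for t in session) for session in sessions]


def segment_level(
    sessions: List[List[Turn]], segment_starts: List[List[int]]
) -> List[str]:
    """One memory unit per topical segment.

    segment_starts[i] lists the turn indices where a new segment begins in
    session i. These boundaries are supplied by hand here (assumption).
    """
    units = []
    for session, starts in zip(sessions, segment_starts):
        edges = sorted(set(starts) | {0}) + [len(session)]
        for lo, hi in zip(edges, edges[1:]):
            if lo < hi:  # skip empty or out-of-range segments
                units.append("\n".join(format_turn(t) for t in session[lo:hi]))
    return units


# Hypothetical toy data: two sessions, the first one switches topic at turn 2.
demo_sessions = [
    [
        ("How many full binary trees have n vertices?", "Use Catalan numbers."),
        ("Can you implement it?", "Here is a dynamic-programming version."),
        ("Any pasta recipes?", "Try cacio e pepe."),
    ],
    [
        ("What is BM25?", "A lexical ranking function."),
        ("How is it tuned?", "Through the k1 and b parameters."),
    ],
]
demo_starts = [[0, 2], [0]]

print(len(turn_level(demo_sessions)))                  # 5 units
print(len(session_level(demo_sessions)))               # 2 units
print(len(segment_level(demo_sessions, demo_starts)))  # 3 units
```

Segment-level units sit between the other two: the two tree-counting turns stay together, while the unrelated pasta turn ends up in its own unit.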

In [7]:
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

In [8]:
import json

# Use a context manager so the file handle is closed after loading.
with open("example.json", "r") as f:
    conversation_history = json.load(f)

requests = [
    'Implement a function to find the number of "lucky" ternary tree using the given definition.'
]

In [9]:
from secom import SeCom

# For lightweight memory management without a GPU, use the fast config, which uses a BM25 retriever and no compressor.
memory_manager_seg = SeCom(
    granularity="segment", config_path="../secom/configs/fast.yaml"
)
memory_seg = memory_manager_seg.get_memory(
    requests, conversation_history, retrieve_topk=1
)

print(memory_seg["retrieved_texts"][0])

0-th session is segmented to 9 segments
1-th session is segmented to 8 segments
2-th session is segmented to 4 segments
3-th session is segmented to 4 segments
4-th session is segmented to 8 segments
Compressor not initialized, reload compress_rate to 1.0
<Turn 0>: [human]: A binary tree is full if all of its vertices have either zero or two children. Let B_n denote the number of full binary trees with n vertices. Implement a function to find B_n.
[bot]: To find the number of full binary trees with \(n\) vertices, denoted as \(B_n\), we can use a dynamic programming approach or directly apply the formula derived from Catalan numbers, given that \(n\) must be odd for a full binary tree (since every node has either 0 or 2 children, and thus, for \(n\) nodes, there are \(n-1\) edges, leading to an even number of total nodes and edges, which means \(n\) itself must be odd).

The \(n\)th Catalan number is given by:
\[ C_n = \frac{1}{n+1}\binom{2n}{n} \]

Since a full binary tree with \(n\) 

In [10]:
memory_manager_turn = SeCom(
    granularity="turn", config_path="../secom/configs/fast.yaml"
)
memory_turn = memory_manager_turn.get_memory(
    requests, conversation_history, retrieve_topk=1
)
print(memory_turn["retrieved_texts"][0])

Compressor not initialized, reload compress_rate to 1.0
<Turn 0>: [human]: A binary tree is full if all of its vertices have either zero or two children. Let B_n denote the number of full binary trees with n vertices. Implement a function to find B_n.
[bot]: To find the number of full binary trees with \(n\) vertices, denoted as \(B_n\), we can use a dynamic programming approach or directly apply the formula derived from Catalan numbers, given that \(n\) must be odd for a full binary tree (since every node has either 0 or 2 children, and thus, for \(n\) nodes, there are \(n-1\) edges, leading to an even number of total nodes and edges, which means \(n\) itself must be odd).

The \(n\)th Catalan number is given by:
\[ C_n = \frac{1}{n+1}\binom{2n}{n} \]

Since a full binary tree with \(n\) vertices corresponds to the \((n-1)/2\)th Catalan number (because a full binary tree with \(n\) vertices has \((n-1)/2\) internal nodes, each having exactly 2 children), we can find \(B_n\) using the 
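
Both retrieval cells above use the fast config, which the notebook describes as a BM25 retriever without a compressor. As a rough illustration of what lexical top-k retrieval over memory units involves, the sketch below scores units with a common BM25 variant. The tokenizer, the `k1` and `b` defaults, the function names, and the toy memory units are assumptions; SeCom's own retriever may differ in all of these.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (simplistic)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(
    query: str, docs: List[str], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Score every document against the query with a BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / max(n_docs, 1)
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = set(tokenize(query))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def retrieve_topk(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k highest-scoring documents, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:k]]


# Hypothetical memory units, not taken from example.json.
demo_docs = [
    "A binary tree is full if every vertex has zero or two children. "
    "Implement a function to find B_n.",
    "What is a good recipe for pasta with cheese?",
    "Which advances in molecular biology are expected?",
]
demo_query = "Implement a function to count ternary trees"

print(retrieve_topk(demo_query, demo_docs, k=1)[0])
```

Because BM25 matches on surface tokens only, redundant wording in a memory unit inflates its length and dilutes its score, which is one way to read the notebook's point about denoising helping retrieval.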

## Start a conversation with SeCom

SeCom can be adapted to different models and memory-management requirements: users can specify the model names and configurations that suit their use case.

In [11]:
import sys

sys.path.append("../secom")
from utils import OpenAILLM

llm = OpenAILLM()
prompt_template = """
Respond to the request based on the conversation history.

Conversation history:
{conversation_history}

Request:
{request}

"""

In [12]:
# using SeCom
relevant_history = memory_seg["retrieved_texts"][0]
prompt = prompt_template.format(
    conversation_history=relevant_history, request=requests[0]
)
response_seg = llm(prompt)
print(response_seg)

Given the definition of a "lucky" ternary tree where each vertex has either zero or one child, we can conclude that the structure remains similar to that of a "lucky" binary tree. In this case, a "lucky" ternary tree would also behave like a linked list, where each node has at most one child.

Therefore, just like with the "lucky" binary trees, the number of "lucky" ternary trees with \(n\) vertices, denoted as \(T_n\), is also straightforward to determine. For any given \(n\), there is exactly one way to construct a "lucky" ternary tree under this definition: start with a root node and add one child successively until \(n-1\) children have been added (for a total of \(n\) vertices). Consequently, \(T_n = 1\) for all \(n \geq 1\).

Here is a simple Python function to return the number of "lucky" ternary trees with \(n\) vertices:

```python
def T_n(n):
    if n < 1:
        return "n must be at least 1"
    return 1  # There is exactly one "lucky" ternary tree for any n >= 1.

# Exampl

In [13]:
# using Turn-level memory management
relevant_history = memory_turn["retrieved_texts"][0]
prompt = prompt_template.format(
    conversation_history=relevant_history, request=requests[0]
)
response_turn = llm(prompt)
print(response_turn)

To define a "lucky" ternary tree, we should first clarify the properties of such a tree. A ternary tree is a tree in which each node has either zero or three children. If we assume a similar approach to find the number of lucky ternary trees as we did for full binary trees, we can derive a formula based on the properties of a ternary tree.

For a ternary tree with \(n\) vertices, we can relate it to the concept of Catalan numbers as well. A lucky ternary tree with \(n\) vertices has \((n-1)/3\) internal nodes, assuming \(n\) must be congruent to \(1 \mod 3\) (since each internal node contributes three children).

The number of lucky ternary trees can be derived using a modified form of Catalan numbers. Specifically, if \(n\) is the number of vertices, the formula can be expressed as:

\[ T_n = C_{\frac{n-1}{3}} = \frac{1}{\frac{n+2}{3}} \binom{n-1}{\frac{n-1}{3}} \]

Here's a Python function to calculate the number of lucky ternary trees \(T_n\):

```python
from math import comb

def T

## Incorporate newly evolved interaction turns into SeCom

Use `update_memory()` to incorporate each newly evolved user-bot interaction turn into the memory bank, so that the memory bank keeps growing with the conversation and can support conversations of arbitrary length.

In [None]:
import json
from secom import SeCom

memory_manager = SeCom(granularity="segment", config_path="../secom/configs/fast.yaml")

# Use a context manager so the file handle is closed after loading.
with open("example.json", "r") as f:
    conversation_history = json.load(f)
requests = [
    "What advancements in molecular biology are expected to occur in the next century?"
]

memory = memory_manager.get_memory(requests, conversation_history, retrieve_topk=1)

relevant_history = memory["retrieved_texts"][0]
prompt = prompt_template.format(
    conversation_history=relevant_history, request=requests[0]
)
response = llm(prompt)

# update memory bank
new_turn = f"[Human]: {requests[0]}\n\n[Bot]: {response}"
memory_manager.update_memory(new_turn)
print(memory_manager.memory_bank[-1])

requests = [
    'Implement a function to find the number of "lucky" ternary tree using the given definition.'
]
memory = memory_manager.get_memory(requests, retrieve_topk=1)

relevant_history = memory["retrieved_texts"][0]
prompt = prompt_template.format(
    conversation_history=relevant_history, request=requests[0]
)
response = llm(prompt)

# update memory bank
new_turn = f"[Human]: {requests[0]}\n\n[Bot]: {response}"
memory_manager.update_memory(new_turn)
print(memory_manager.memory_bank[-1])
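
The retrieve, respond, update cycle in the cell above repeats for every request, so it can be wrapped in a small helper. The sketch below is one possible shape, not part of SeCom: `chat_turn`, `PROMPT_TEMPLATE`, and the `_StubManager` stand-in are hypothetical names, and the stub only mimics the `get_memory`, `update_memory`, and `memory_bank` usage shown in this notebook so that the example runs without SeCom or an API key.

```python
from typing import Any, Callable, Optional

PROMPT_TEMPLATE = """Respond to the request based on the conversation history.

Conversation history:
{conversation_history}

Request:
{request}
"""


def chat_turn(
    memory_manager: Any,
    llm: Callable[[str], str],
    request: str,
    conversation_history: Optional[Any] = None,
    topk: int = 1,
) -> str:
    """Retrieve relevant memory, answer the request, then store the new turn."""
    if conversation_history is not None:
        # First call: build the memory bank from an existing history.
        memory = memory_manager.get_memory(
            [request], conversation_history, retrieve_topk=topk
        )
    else:
        # Later calls: reuse the memory bank the manager already holds.
        memory = memory_manager.get_memory([request], retrieve_topk=topk)

    prompt = PROMPT_TEMPLATE.format(
        conversation_history=memory["retrieved_texts"][0], request=request
    )
    response = llm(prompt)
    memory_manager.update_memory(f"[Human]: {request}\n\n[Bot]: {response}")
    return response


class _StubManager:
    """Hypothetical stand-in for a memory manager; not SeCom's implementation."""

    def __init__(self):
        self.memory_bank = []

    def get_memory(self, requests, conversation_history=None, retrieve_topk=1):
        if conversation_history is not None:
            self.memory_bank = list(conversation_history)
        latest = self.memory_bank[-1] if self.memory_bank else ""
        return {"retrieved_texts": [latest for _ in requests]}

    def update_memory(self, new_turn):
        self.memory_bank.append(new_turn)


stub = _StubManager()
first = chat_turn(stub, lambda p: "ok", "hello", conversation_history=["old turn"])
second = chat_turn(stub, lambda p: "again", "second request")
print(stub.memory_bank[-1])
```

With a real `SeCom` instance and the `OpenAILLM` wrapper from the earlier cells, the same helper would be called once with `conversation_history` and afterwards without it, mirroring the two `get_memory` calls above.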