# Example usage of the **chat** and **service** interfaces
This notebook demonstrates basic usage of MemoryScope's **chat** and **service** interfaces and walks through its main features.

To run this notebook, follow the **Installation** guidelines in the README and start the Docker image.
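
Before running the cells below, it can help to confirm that the prerequisites are in place. The following is a minimal pre-flight sketch and is not part of MemoryScope: the helper names are hypothetical, and it assumes that the DashScope backends read the API key from the `DASHSCOPE_API_KEY` environment variable and that the Docker image exposes Elasticsearch at `http://localhost:9200` (the `es_url` value that appears in the initialization log). Adjust both to your own setup.

```python
import os
import socket
from urllib.parse import urlparse


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def port_is_open(url, timeout=1.0):
    """Return True if a TCP connection to the host and port in `url` succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed names: change these to match the backends and URL you configure.
print("missing env vars:", missing_env_vars(["DASHSCOPE_API_KEY"]))
print("Elasticsearch reachable:", port_is_open("http://localhost:9200"))
```

An open port only shows that something is listening; it does not prove that the service behind it is healthy.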


## Initiate a MemoryScope instance
First, we specify a configuration and create a MemoryScope instance.

In [1]:
import sys
sys.path.append(".")
from memoryscope import MemoryScope, Arguments
arguments = Arguments(
    language="cn",
    human_name="用户",
    assistant_name="AI",
    logger_to_screen=True,
    memory_chat_class="api_memory_chat",
    generation_backend="dashscope_generation",
    generation_model="qwen2-72b-instruct",
    embedding_backend="dashscope_embedding",
    embedding_model="text-embedding-v2",
    rank_backend="dashscope_rank",
    rank_model="gte-rerank",
    enable_ranker=False,
    worker_params={"get_reflection_subject": {"reflect_num_questions": 3}}
)

ms = MemoryScope(arguments=arguments)


2024-08-02 18:28:24 INFO MainThread logger:64] logger=memoryscope_20240802_182824 is inited.
2024-08-02 18:28:24 INFO MainThread config_manager:39] init by arguments mode: {'language': 'cn', 'thread_pool_max_workers': 5, 'logger_name': 'memoryscope', 'logger_name_time_suffix': '%Y%m%d_%H%M%S', 'logger_to_screen': True, 'memory_chat_class': 'api_memory_chat', 'chat_stream': None, 'human_name': '用户', 'assistant_name': 'AI', 'consolidate_memory_interval_time': 1, 'reflect_and_reconsolidate_interval_time': 15, 'worker_params': {'get_reflection_subject': {'reflect_num_questions': 3}}, 'generation_backend': 'dashscope_generation', 'generation_model': 'qwen2-72b-instruct', 'generation_params': {}, 'embedding_backend': 'dashscope_embedding', 'embedding_model': 'text-embedding-v2', 'embedding_params': {}, 'rank_backend': 'dashscope_rank', 'rank_model': 'gte-rerank', 'rank_params': {}, 'es_index_name': 'memory_index', 'es_url': 'http://localhost:9200', 'retrieve_mode': 'dense', 'enable_ranker': 

## Chat without memory
`ms` comes with a default **chat** interface, so you can start chatting right away, just as you would with any LLM chatbot.

In [2]:
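memory_chat = ms.default_memory_chat
# Clear anything stored by earlier runs so the demo starts from an empty state.
memory_chat.run_service_operation("delete_all")
# Query: "My hobby is playing the piano."
response = memory_chat.chat_with_memory(query="我的爱好是弹琴。")
print("回答1：\n" + response.message.content)  # "回答" means "Answer"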

2024-08-02 18:28:26 INFO MainThread base_workflow:95] ----- workflow.read_message.print.begin -----
2024-08-02 18:28:26 INFO MainThread base_workflow:101] stage0: read_message
2024-08-02 18:28:26 INFO MainThread base_workflow:112] ----- workflow.read_message.print.end -----
2024-08-02 18:28:26 INFO MainThread memory_scope_service:83] service=MemoryScopeService init operation=read_message
2024-08-02 18:28:26 INFO MainThread base_workflow:95] ----- workflow.retrieve_memory.print.begin -----
2024-08-02 18:28:26 INFO MainThread base_workflow:101] stage0: set_query
2024-08-02 18:28:26 INFO MainThread base_workflow:106] stage1: extract_time | retrieve_obs_ins
2024-08-02 18:28:26 INFO MainThread base_workflow:106] stage2: - | semantic_rank
2024-08-02 18:28:26 INFO MainThread base_workflow:101] stage3: fuse_rerank
2024-08-02 18:28:26 INFO MainThread base_workflow:112] ----- workflow.retrieve_memory.print.end -----
2024-08-02 18:28:26 INFO MainThread memory_scope_service:83] service=MemoryScope

回答1：
那真是一个优雅的爱好！弹琴不仅能够陶冶情操，还能提升音乐素养和手指灵活性。你最喜欢弹奏哪种类型的曲子呢？


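You can chat with or without multi-round conversation context. However, since *Memory Consolidation* has not been called yet, there are no memory pieces in the system. Answer 2 below therefore relies on the conversation history alone, while answer 3, which disables the history, cannot recall the hobby.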

In [3]:
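# Query: "Do you know which instrument I like to play?"
# Default call: earlier turns are sent along as conversation context.
response = memory_chat.chat_with_memory(query="你知道我有什么乐器爱好吗？")
print("回答2：\n" + response.message.content)
# Same query with history_message_strategy=None: no conversation history is sent.
response = memory_chat.chat_with_memory(query="你知道我有什么乐器爱好吗？",
                                        history_message_strategy=None)
print("回答3：\n" + response.message.content)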

2024-08-02 18:28:29 INFO MainThread timer:75] ----- workflow.retrieve_memory.begin -----
2024-08-02 18:28:29 INFO MainThread timer:75] ----- worker.set_query.begin -----
2024-08-02 18:28:29 INFO MainThread base_worker:145] ----- worker.set_query.end ----- cost=0.6740ms
2024-08-02 18:28:30 INFO MainThread timer:75] ----- worker.fuse_rerank.begin -----
2024-08-02 18:28:30 INFO MainThread base_worker:145] ----- worker.fuse_rerank.end ----- cost=0.8800ms
2024-08-02 18:28:30 INFO MainThread base_workflow:167] ----- workflow.retrieve_memory.end ----- cost=384.5589ms
2024-08-02 18:28:30 INFO MainThread timer:75] ----- workflow.read_message.begin -----
2024-08-02 18:28:30 INFO MainThread timer:75] ----- worker.read_message.begin -----
2024-08-02 18:28:30 INFO MainThread base_worker:145] ----- worker.read_message.end ----- cost=0.2897ms
2024-08-02 18:28:30 INFO MainThread base_workflow:167] ----- workflow.read_message.end ----- cost=1.0922ms
2024-08-02 18:28:30 INFO MainThread api_memory_chat:1

回答2：
您提到过的爱好是弹琴，所以我了解到您对弹奏乐器，特别是钢琴有兴趣。是否还有其他乐器爱好呢？


2024-08-02 18:28:32 INFO MainThread timer:75] ----- worker.fuse_rerank.begin -----
2024-08-02 18:28:32 INFO MainThread base_worker:145] ----- worker.fuse_rerank.end ----- cost=0.8247ms
2024-08-02 18:28:32 INFO MainThread base_workflow:167] ----- workflow.retrieve_memory.end ----- cost=303.6480ms
2024-08-02 18:28:32 INFO MainThread api_memory_chat:186] chat_messages=[Message(role='system', role_name='', content='你是一个名为MemoryScope的智能助理，请用中文简洁地回答问题。当前时间是2024-08-02 18:28:32 周五。', time_created=1722594512, memorized=False, meta_data={}), Message(role='user', role_name='用户', content='你知道我有什么乐器爱好吗？', time_created=1722594512, memorized=False, meta_data={})]


回答3：
作为MemoryScope，我无法直接获取或记忆您的个人信息，包括您的乐器爱好，除非您之前已告知我。如果您愿意分享，我可以帮助您记录或提供与乐器爱好相关的信息。
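
The contrast between answers 2 and 3 comes down to which messages reach the model. The log for answer 3 shows only a system message and the current user message, whereas answer 2 could draw on earlier turns. The toy below illustrates that difference; `Msg` and `build_messages` are hypothetical names for this sketch and do not reflect how MemoryScope assembles its prompts internally.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str
    content: str


def build_messages(system_prompt, history, query, use_history=True):
    """Assemble the message list sent to the model for one turn."""
    messages = [Msg("system", system_prompt)]
    if use_history:
        # Multi-round context: earlier turns travel with the new query.
        messages.extend(history)
    messages.append(Msg("user", query))
    return messages


history = [
    Msg("user", "My hobby is playing the piano."),
    Msg("assistant", "What an elegant hobby!"),
]
question = "Do you know which instrument I like to play?"

with_ctx = build_messages("You are a helpful assistant.", history, question)
without_ctx = build_messages("You are a helpful assistant.", history, question,
                             use_history=False)
print(len(with_ctx), len(without_ctx))  # 4 2
```

Without history and without consolidated memories, the model sees nothing about the hobby, which matches answer 3.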


## Memory Consolidation
Now, we do a bit more chatting and try out **Memory Consolidation**.

In [None]:
# Query: "I work at Alibaba."
response = memory_chat.chat_with_memory(query="我在阿里巴巴干活")
print("回答4：\n" + response.message.content)
# Query: "What fruit would be good to eat this afternoon?"
response = memory_chat.chat_with_memory(query="今天下午吃什么水果好？")
print("回答5：\n" + response.message.content)
# Query: "I like eating watermelon."
response = memory_chat.chat_with_memory(query="我喜欢吃西瓜。")
print("回答6：\n" + response.message.content)
# Query: "Write a short birthday greeting for my friend."
response = memory_chat.chat_with_memory(query="帮我写一句给朋友的生日祝福语，简短一点。")
print("回答7：\n" + response.message.content)

2024-08-02 18:28:35 INFO MainThread timer:75] ----- workflow.retrieve_memory.begin -----
2024-08-02 18:28:35 INFO MainThread timer:75] ----- worker.set_query.begin -----
2024-08-02 18:28:35 INFO MainThread base_worker:145] ----- worker.set_query.end ----- cost=0.4082ms
2024-08-02 18:28:36 INFO MainThread timer:75] ----- worker.fuse_rerank.begin -----
2024-08-02 18:28:36 INFO MainThread base_worker:145] ----- worker.fuse_rerank.end ----- cost=1.5481ms
2024-08-02 18:28:36 INFO MainThread base_workflow:167] ----- workflow.retrieve_memory.end ----- cost=384.7449ms
2024-08-02 18:28:36 INFO MainThread timer:75] ----- workflow.read_message.begin -----
2024-08-02 18:28:36 INFO MainThread timer:75] ----- worker.read_message.begin -----
2024-08-02 18:28:36 INFO MainThread base_worker:145] ----- worker.read_message.end ----- cost=0.9732ms
2024-08-02 18:28:36 INFO MainThread base_workflow:167] ----- workflow.read_message.end ----- cost=1.9441ms
2024-08-02 18:28:36 INFO MainThread api_memory_chat:1

回答4：
太好了，阿里巴巴是一家知名的国际企业，涉及电子商务、零售、互联网和技术等多个领域。在阿里巴巴工作，您可能参与到了推动数字经济发展和创新的前沿工作中。希望您在那边的工作经历丰富且充满成就感！如果有任何职业发展或相关问题，欢迎随时探讨。


2024-08-02 18:28:42 INFO MainThread timer:75] ----- worker.fuse_rerank.begin -----
2024-08-02 18:28:42 INFO MainThread base_worker:145] ----- worker.fuse_rerank.end ----- cost=1.0219ms
2024-08-02 18:28:42 INFO MainThread base_workflow:167] ----- workflow.retrieve_memory.end ----- cost=1879.3590ms
2024-08-02 18:28:42 INFO MainThread timer:75] ----- workflow.read_message.begin -----
2024-08-02 18:28:42 INFO MainThread timer:75] ----- worker.read_message.begin -----
2024-08-02 18:28:42 INFO MainThread base_worker:145] ----- worker.read_message.end ----- cost=0.2019ms
2024-08-02 18:28:42 INFO MainThread base_workflow:167] ----- workflow.read_message.end ----- cost=0.8912ms
2024-08-02 18:28:42 INFO MainThread api_memory_chat:186] chat_messages=[Message(role='system', role_name='', content='你是一个名为MemoryScope的智能助理，请用中文简洁地回答问题。当前时间是2024-08-02 18:28:42 周五。', time_created=1722594522, memorized=False, meta_data={}), Message(role='user', role_name='用户', content='我的爱好是弹琴。', time_created=1722594506,

In [None]:
memory_service = ms.default_memory_service
memory_service.init_service()
result = memory_service.consolidate_memory()
print(f"consolidate_memory result={result}")

**Memory Consolidation** extracted 3 *observations* from the user's 7 chat messages and filtered out the uninformative ones.
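
The logged `Message` objects carry a `memorized` field, which suggests that the system tracks which messages have already been processed. The sketch below shows one way such bookkeeping could work so that a second consolidation call only considers new user messages. It is an assumption for illustration, not MemoryScope's implementation: the class mirrors only some of the logged fields, and `pending_user_messages` and `mark_memorized` are hypothetical helpers.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str
    time_created: int
    memorized: bool = False
    meta_data: dict = field(default_factory=dict)


def pending_user_messages(messages):
    """Select user messages that have not been consolidated yet."""
    return [m for m in messages if m.role == "user" and not m.memorized]


def mark_memorized(messages):
    """Flag messages as processed so later passes skip them."""
    for m in messages:
        m.memorized = True


chat = [
    Message("user", "I work at Alibaba.", 1722594515),
    Message("assistant", "Great!", 1722594516),
    Message("user", "I like eating watermelon.", 1722594520),
]
first = pending_user_messages(chat)
mark_memorized(first)

chat.append(Message("user", "Tomorrow is my birthday.", 1722594600))
second = pending_user_messages(chat)
print(len(first), len(second))  # 2 1
```

Deciding which of the pending messages are informative enough to become observations is the part the library delegates to its own workers, and this sketch does not attempt it.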

Next, we try more cases to test its time awareness and its ability to filter out fictitious content from the user.

In [None]:
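# Query (hypothetical): "Suppose I went to work at JD.com; what would the prospects be?"
response = memory_chat.chat_with_memory(query="假如我去京东工作，前景怎么样？")
print("回答8：\n" + response.message.content)
# Query: "Make a note: I plan to go on a business trip to Beijing next week."
response = memory_chat.chat_with_memory(query="记一下，下周我准备去北京出差")
print("回答9：\n" + response.message.content)
# Query: "My classmate Li Yaping works at Amazon; he returns to Shanghai next month and I will have a meal with him."
response = memory_chat.chat_with_memory(query="我同学李亚平现在在亚马逊工作，他下个月回上海，我要和他吃个饭")
print("回答10：\n" + response.message.content)
# Query (fiction): "Xiao Liang is my best friend; he decided to attend university in Shanxi. Write an 80-character micro-script starting with this."
response = memory_chat.chat_with_memory(query="小亮是我最好的朋友，他决定去山西上大学。以这个为开头写一个80字的微剧本。")
print("回答11：\n" + response.message.content)
# Query: "What kind of company is SMCI, and what does it do?"
response = memory_chat.chat_with_memory(query="SMCI是什么公司，做什么的？")
print("回答12：\n" + response.message.content)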

In [None]:
result = memory_service.consolidate_memory()
print(f"consolidate_memory result={result}")

We can see that **Memory Consolidation** successfully filtered out fictitious content and shows good time sensitivity.
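
Time sensitivity matters because expressions such as "next week" or "next month" only make sense relative to when they were said. The sketch below resolves two such expressions into absolute dates given the date of the message. It is a simplified illustration with hypothetical helper names, and it assumes weeks start on Monday; MemoryScope's own time handling may differ.

```python
from datetime import date, timedelta


def next_week_range(said_on):
    """Return (Monday, Sunday) of the calendar week after `said_on`."""
    days_to_next_monday = 7 - said_on.weekday()  # Monday is weekday 0
    monday = said_on + timedelta(days=days_to_next_monday)
    return monday, monday + timedelta(days=6)


def next_month(said_on):
    """Return (year, month) of the calendar month after `said_on`."""
    if said_on.month == 12:
        return said_on.year + 1, 1
    return said_on.year, said_on.month + 1


said_on = date(2024, 8, 2)  # a Friday, matching the log timestamps
start, end = next_week_range(said_on)
print(start.isoformat(), end.isoformat())  # 2024-08-05 2024-08-11
print(next_month(said_on))  # (2024, 9)
```

Storing the resolved dates alongside an observation keeps it meaningful when it is retrieved weeks later.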

Next, we test how it resolves conflicting content.

In [None]:
# Query: "What fruit would be good to eat this afternoon?"
response = memory_chat.chat_with_memory(query="今天下午吃什么水果好？")
print("回答13：\n" + response.message.content)
# Query: "Watermelon is nice, but I also like mango. Today I feel like eating mango."
response = memory_chat.chat_with_memory(query="西瓜确实不错，但是我也喜欢吃芒果。我今天想吃芒果。")
print("回答14：\n" + response.message.content)
# Query: "I recently switched jobs to Meituan."
response = memory_chat.chat_with_memory(query="我最近跳槽去了美团。")
print("回答15：\n" + response.message.content)
# Query: "I also like peaches and apples."
response = memory_chat.chat_with_memory(query="我还喜欢吃桃子和苹果。")
print("回答16：\n" + response.message.content)
# Query: "I don't like coconut."
response = memory_chat.chat_with_memory(query="我不喜欢吃椰子。")
print("回答17：\n" + response.message.content)
# Query: "I plan to go surfing in Hainan next month."
response = memory_chat.chat_with_memory(query="我准备下个月去海南冲浪。")
print("回答18：\n" + response.message.content)
# Query: "Tomorrow is my birthday."
response = memory_chat.chat_with_memory(query="明天是我生日。")
print("回答19：\n" + response.message.content)

In [None]:
result = memory_service.consolidate_memory()
print(f"consolidate_memory result={result}")

## Reflection and Re-Consolidation
Now that enough new *observations* have accumulated in the system, we can call **Reflection and Re-Consolidation** and see what it produces.

In [None]:
result = memory_service.reflect_and_reconsolidate()
print(f"reflect_and_reconsolidate result={result}")

## Low response-time (RT) for the user
Here we test the response time (RT) that the user experiences with MemoryScope. Specifically, we compare the RT when responding with and without retrieving memory pieces from the system.

In [None]:
import time

# Part 1: memories exist, so retrieval has something to return.
# The print labels read "With memory retrieval", "Answer N", "elapsed: N seconds".
# Query: "Do you know what my instrument hobby is?"
start_time = time.perf_counter()
response = memory_chat.chat_with_memory(query="你知道我的乐器爱好是什么吗？",
                                        history_message_strategy=None)
end_time = time.perf_counter()
total_time = end_time - start_time
print("使用记忆检索\n回答20：\n" + response.message.content + f"\n 耗时：{total_time}秒\n")

# Query: "Do you know what plans I have for the coming month?"
start_time = time.perf_counter()
response = memory_chat.chat_with_memory(query="你知道我接下去的一个月内有什么计划吗？",
                                        history_message_strategy=None)
end_time = time.perf_counter()
total_time = end_time - start_time
print("使用记忆检索\n回答21：\n" + response.message.content + f"\n 耗时：{total_time}秒\n")

# Part 2: delete all memories, then repeat the same two queries.
# The print labels read "Without memory retrieval".
memory_chat.run_service_operation("delete_all")
start_time = time.perf_counter()
response = memory_chat.chat_with_memory(query="你知道我的乐器爱好是什么吗？",
                                        history_message_strategy=None)
end_time = time.perf_counter()
total_time = end_time - start_time
print("不使用记忆检索\n回答20：\n" + response.message.content + f"\n 耗时：{total_time}秒\n")

# The query string now matches Part 1 exactly (a stray trailing newline was removed).
start_time = time.perf_counter()
response = memory_chat.chat_with_memory(query="你知道我接下去的一个月内有什么计划吗？",
                                        history_message_strategy=None)
end_time = time.perf_counter()
total_time = end_time - start_time
print("不使用记忆检索\n回答21：\n" + response.message.content + f"\n 耗时：{total_time}秒")

We can see that responding with memory retrieval does not increase RT.
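
The start/stop timing pattern used in the cell above can be factored into a small helper. The sketch below is one way to do that; `timed` and `fake_chat` are hypothetical names, and `fake_chat` merely stands in for `memory_chat.chat_with_memory` so that the example runs without a backend. It uses `time.perf_counter`, a clock intended for measuring short intervals.

```python
import time


def timed(fn, *args, **kwargs):
    """Call `fn` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_chat(query, delay=0.01):
    # Stand-in for a real chat call; sleeps to simulate latency.
    time.sleep(delay)
    return f"echo: {query}"


answer, seconds = timed(fake_chat, "hello")
print(f"{answer} ({seconds:.3f}s)")
```

With a real instance, the call would look like `timed(memory_chat.chat_with_memory, query=..., history_message_strategy=None)`. A single measurement per query is noisy, so repeating each query several times and comparing medians would give a more reliable picture.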