
Chroma memory resets when store is initialized #1289

@Josh-XT


Describe the bug
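Every time ChromaMemoryStore is initialized with a persist_directory, the memories already persisted there are not loaded. The store reports 0 collections, so an existing "memories" collection appears to be wiped.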

To Reproduce

```python
import os

from chromadb.config import Settings
from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore

# Uses top-level await, so run this in a notebook cell (Jupyter / VS Code).
agent_name = "OpenAI"
memories_dir = os.path.join(os.getcwd(), "agents", agent_name, "memories")
chroma_client = ChromaMemoryStore(
    persist_directory=memories_dir,
    client_settings=Settings(
        chroma_db_impl="chromadb.db.duckdb.PersistentDuckDB",  # also tried 'duckdb+parquet', but it did not work at all
        persist_directory=memories_dir,
        anonymized_telemetry=False,
    ),
)

# After ChromaMemoryStore is initialized, all memories for this agent are wiped
# instead of being loaded from the persist directory.

memories_exist = await chroma_client.does_collection_exist_async("memories")
print(f"Memories Exist? {memories_exist}")
if not memories_exist:
    print("Creating memories collection")
    await chroma_client.create_collection_async(collection_name="memories")
    memories = await chroma_client.get_collection_async(collection_name="memories")
else:
    memories = await chroma_client.get_collection_async(collection_name="memories")
    print("Retrieved memories collection")  # <- This never happens on the first attempt
```

I also tried adding a memory before reinitializing and got the same result. As soon as I initialize ChromaMemoryStore, it logs:

```
2023-05-31 09:55:59,483 | INFO | loaded in 0 embeddings
2023-05-31 09:55:59,484 | INFO | loaded in 0 collections
2023-05-31 09:55:59,484 | INFO | Persisting DB to disk, putting it in the save folder: /home/josh/josh/Repos/AGiXT/agixt/agents/OpenAI/memories
```
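To tell whether the data is actually deleted on disk or simply not read back, one option is to snapshot the persist directory before and after constructing ChromaMemoryStore. The helpers below (snapshot_dir and diff_snapshots are hypothetical names, standard library only) record each file's size and modification time. With the DuckDB + Parquet backend I would expect Chroma's Parquet files to live in memories_dir, but the exact file names are an assumption.

```python
from pathlib import Path


def snapshot_dir(path: str) -> dict[str, tuple[int, float]]:
    """Map each file under `path` (relative name) to (size in bytes, mtime).

    Hypothetical diagnostic helper; returns {} if the directory does not exist.
    """
    root = Path(path)
    if not root.exists():
        return {}
    return {
        str(p.relative_to(root)): (p.stat().st_size, p.stat().st_mtime)
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_snapshots(before, after):
    """Return (removed, added, changed) file names between two snapshots."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(n for n in set(before) & set(after) if before[n] != after[n])
    return removed, added, changed
```

Call snapshot_dir(memories_dir) right before and right after the ChromaMemoryStore(...) call and print diff_snapshots(before, after). If the persisted files are removed or shrink, the data is being overwritten on disk; if nothing changes, the files are intact and the store is just not loading them.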

Expected behavior
It should initialize the store from the data already persisted in the persist directory, instead of loading 0 collections and only reporting that it is persisting.
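For comparison, here is a minimal sketch of the expected behavior using a hypothetical JSON-backed toy store. JsonMemoryStore is not part of Semantic Kernel or Chroma; its async method names only mirror the ones used in the reproduction. It models just the property at issue: a store re-opened on the same directory still sees collections created earlier.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class JsonMemoryStore:
    """Hypothetical toy store: loads persisted collections on init instead of resetting them."""

    def __init__(self, persist_directory: str) -> None:
        self._file = Path(persist_directory) / "collections.json"
        # Load whatever is already on disk; start empty only if nothing was persisted.
        if self._file.exists():
            self._collections = json.loads(self._file.read_text())
        else:
            self._collections = {}

    def _persist(self) -> None:
        self._file.parent.mkdir(parents=True, exist_ok=True)
        self._file.write_text(json.dumps(self._collections))

    async def does_collection_exist_async(self, collection_name: str) -> bool:
        return collection_name in self._collections

    async def create_collection_async(self, collection_name: str) -> None:
        self._collections.setdefault(collection_name, {})
        self._persist()


async def demo(path: str) -> bool:
    store = JsonMemoryStore(path)
    if not await store.does_collection_exist_async("memories"):
        await store.create_collection_async("memories")
    # Re-initialize against the same directory, as in the reproduction above.
    reopened = JsonMemoryStore(path)
    return await reopened.does_collection_exist_async("memories")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(asyncio.run(demo(d)))  # True
```

In the report, the equivalent check after re-running the ChromaMemoryStore cell comes back False, which is the discrepancy.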

Screenshots
On the first run after initialization, it behaves as if the memories collection never existed.

If I rerun await chroma_client.does_collection_exist_async("memories") after running the three cells, it returns True. But as soon as I rerun the cell that constructs ChromaMemoryStore, the persisted data is not loaded.

Desktop:

  • OS: Linux (Pop!_OS 22.04)
  • IDE: VSCode

Additional context
I may be doing this wrong; I'm happy to be corrected if anyone has an idea.

Add a memory:

```python
from datetime import datetime
from hashlib import sha256

from semantic_kernel.memory.memory_record import MemoryRecord

description = "Test memory"
external_source_name = "Test memory"
content = "Test memory"
timestamp = datetime.now().isoformat()  # computed once so the id and timestamp match

record = MemoryRecord(
    is_reference=False,
    id=sha256((content + timestamp).encode()).hexdigest(),
    text=content,
    timestamp=timestamp,
    description=description,
    external_source_name=external_source_name,  # URL or file path
    embedding=await embedder(content),  # `embedder` is shown further down
)

await chroma_client.upsert_async(
    collection_name="memories",
    record=record,
)
```
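A side note on the id scheme: because the id hashes the content together with datetime.now(), re-running the "Add a memory" cell produces a new id every time. Assuming upsert_async matches records by id, that adds a second record instead of updating the first. It does not explain the reset, but it can make persistence tests confusing. A quick standard-library check, where content_id is a hypothetical alternative that hashes the content alone:

```python
from datetime import datetime, timedelta
from hashlib import sha256


def timestamped_id(content: str, when: datetime) -> str:
    # Same scheme as the snippet above: SHA-256 of content + ISO timestamp.
    return sha256((content + when.isoformat()).encode()).hexdigest()


def content_id(content: str) -> str:
    # Hypothetical alternative: SHA-256 of the content only, stable across runs.
    return sha256(content.encode()).hexdigest()


now = datetime(2023, 5, 31, 9, 55, 59)
later = now + timedelta(seconds=1)
print(timestamped_id("Test memory", now) == timestamped_id("Test memory", later))  # False
print(content_id("Test memory") == content_id("Test memory"))  # True
```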

Find the memory:

```python
results = await chroma_client.get_nearest_matches_async(
    collection_name="memories",
    embedding=await embedder(content),
    limit=1,
    min_relevance_score=0.1,
)
context = []
for memory, score in results:
    context.append(memory._text)
print(context)
```
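An empty context list can mean either that the collection really is empty or that nothing cleared min_relevance_score. The sketch below assumes the relevance score is cosine similarity (how the Chroma connector actually turns Chroma's distances into scores is not confirmed here) and uses plain lists instead of the real store; cosine_similarity and nearest_matches are hypothetical helpers. Under that assumption, querying with the embedding of the exact stored text should score close to 1.0, well above 0.1, so an empty result right after re-initialization points to lost data rather than the threshold.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_matches(query, records, limit, min_relevance_score):
    """Return up to `limit` (text, score) pairs with score >= min_relevance_score, best first."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in records]
    kept = [pair for pair in scored if pair[1] >= min_relevance_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


records = [("Test memory", [1.0, 0.0]), ("Unrelated", [0.0, 1.0])]
print(nearest_matches([1.0, 0.0], records, limit=1, min_relevance_score=0.1))
# [('Test memory', 1.0)]
```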

My embedder function is just:

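```python
import logging

from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding

# This runs inside the agent class, hence self.AGENT_CONFIG for the API key.
embedder = OpenAITextEmbedding(
    model_id="text-embedding-ada-002",
    api_key=self.AGENT_CONFIG["settings"]["OPENAI_API_KEY"],
    log=logging,
).generate_embeddings_async
```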
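To reproduce the reset without calling OpenAI, a deterministic offline embedder could stand in for generate_embeddings_async. fake_embedder is hypothetical: like the real method it takes a list of texts, but it returns plain Python lists, whereas the real method's return type may differ (for example a NumPy array), so convert if MemoryRecord expects one. Because it takes a list, call it as await fake_embedder([content]) and take element [0] for a single text.

```python
import asyncio
import hashlib


async def fake_embedder(texts: list[str], dim: int = 8) -> list[list[float]]:
    """Hypothetical offline stand-in for generate_embeddings_async.

    Derives a deterministic vector per text from its SHA-256 digest, so the
    same text always maps to the same embedding.
    """
    if not 0 < dim <= 32:  # a SHA-256 digest has 32 bytes
        raise ValueError("dim must be between 1 and 32")
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode()).digest()
        # Scale the first `dim` bytes into the range [0, 1].
        vectors.append([b / 255 for b in digest[:dim]])
    return vectors


if __name__ == "__main__":
    print(asyncio.run(fake_embedder(["Test memory"])))
```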

Labels: python (Pull requests for the Python Semantic Kernel)
