Memcortex is a lightweight semantic memory layer for LLMs, using an open-source embedding model and Weaviate for vector storage and retrieval.


Memcortex

Persistent Memory Layer for LLMs (Memory-RAG)

Memcortex is a Proof of Concept (PoC) designed to equip conversational agents and LLM applications with persistent, long-term memory. By implementing a Memory-RAG (Retrieval-Augmented Generation) architecture, Memcortex allows agents to transcend context-window limitations, enabling them to recall past interactions and specific data points indefinitely.

Read the Deep Dive on Medium


Contents

  • README (this file)
  • Architectural diagram (Mermaid + ASCII)
  • Package structure
  • How to start

Project Overview

Memcortex stores user/application memories as both text and vectors in Weaviate and exposes a memory manager + middleware that:

  1. Embeds incoming text using nomic-embed-text embeddings.
  2. Stores memories in a Memory_idx class on Weaviate.
  3. Runs vector searches to retrieve top‑K relevant memories for a user.
  4. Injects retrieved memories into the prompt before it reaches the LLM.
  5. Optionally persists new memories asynchronously.

This pattern is ideal for building chatbots, agents, and personalization layers that must "remember" details across sessions.
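
Conceptually, steps 1–4 above reduce to "embed, search, inject". Below is a minimal sketch of that flow in Go; the Embedder, Store and Augment names are illustrative, not the actual package API.

// memory_flow.go — illustrative sketch of the Memory-RAG request path.
package memory

import (
	"context"
	"fmt"
	"strings"
)

// Embedder turns text into a vector (e.g. via Ollama's nomic-embed-text model).
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// Store wraps Weaviate: vector search over the user's Memory_idx objects.
type Store interface {
	Search(ctx context.Context, userID string, vector []float32, topK int) ([]string, error)
}

// Augment retrieves the top-K memories for a user and prepends them to the prompt.
func Augment(ctx context.Context, e Embedder, s Store, userID, message string, topK int) (string, error) {
	vec, err := e.Embed(ctx, message)
	if err != nil {
		return "", fmt.Errorf("embed: %w", err)
	}
	memories, err := s.Search(ctx, userID, vec, topK)
	if err != nil {
		return "", fmt.Errorf("search: %w", err)
	}
	if len(memories) == 0 {
		return message, nil // nothing relevant stored yet, pass the prompt through
	}
	return "Relevant memories:\n- " + strings.Join(memories, "\n- ") +
		"\n\nUser message: " + message, nil
}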


Architecture (Mermaid)

flowchart LR
  A[User] -->|POST /chat| B(API Server)
  B --> C{Memory Middleware}
  C -->|retrieve| D[Weaviate Vector Store]
  D -->|top-K| C
  C -->|inject memories| E[LLM Handler]
  E -->|call LLM API| F[Ollama / Custom LLM]
  F -->|response| E
  E -->|save message| G(Background Save Worker)
  G --> D
  subgraph Infra
    D
    F
  end

ASCII Diagram

User -> API Server (/chat)
      -> Memory Middleware:
           - Embed user query via Ollama
           - Query Weaviate vector index (top-K)
           - Re-rank / filter / format
           - Inject into prompt
      -> LLM Handler -> Local or remote LLM
      -> Return response
Background worker: saves new user messages into Weaviate (embedding -> object)
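
Step 5 above (asynchronous persistence) can be as simple as a buffered channel drained by a goroutine, so the /chat handler never blocks on embedding or Weaviate writes. A hedged sketch follows; the type and function names are illustrative, not the repository's actual queue implementation.

// queue_sketch.go — illustrative async save worker (channel + goroutine).
package memory

import (
	"context"
	"log"
)

// saveRequest carries one user message to be embedded and written to Weaviate.
type saveRequest struct {
	UserID string
	Text   string
}

// Saver abstracts "embed the text, then store it as a Memory_idx object".
type Saver interface {
	Save(ctx context.Context, userID, text string) error
}

// StartSaveWorker drains the queue in the background until the context is cancelled.
func StartSaveWorker(ctx context.Context, s Saver, queue <-chan saveRequest) {
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case req := <-queue:
				if err := s.Save(ctx, req.UserID, req.Text); err != nil {
					log.Printf("memcortex: failed to save memory: %v", err)
				}
			}
		}
	}()
}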

Quickstart (developer)

Prereqs:

  • Go 1.20+
  • Docker & Docker Compose

  1. Clone the repo and set the module path (or run go mod init github.com/yourname/memcortex).
  2. Create .env (see .env.example).
  3. Build docker image & run server:
docker-compose up -d --build
  4. Example request:
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -H "X-User-ID: memcortex-user-x" \
  -d '{"message":"My preferred memory layer is memcortex."}'

You can send these requests from any API client of your choice (e.g. Thunder Client); remember to set the X-User-ID header on every request.


The first request will save the memory asynchronously. Later requests will retrieve and inject the memory.
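
A later request from the same user can then ask about the stored fact and the middleware will surface it, for example (the question text is just an illustration):

curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -H "X-User-ID: memcortex-user-x" \
  -d '{"message":"Which memory layer do I prefer?"}'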


Package structure

memcortex/
├─ cmd/server/main.go          # App entry point
├─ internal/
│  ├─ embedder/
│  │  └─ ollama.go             # Ollama embedder logic (nomic-embed-text)
│  ├─ handlers/
│  │  └─ chat.go               # Chat endpoint handler
│  ├─ memory/
│  │  ├─ manager.go            # High-level RAG orchestration
│  │  ├─ queue.go              
│  │  └─ store.go              # Weaviate storage wrapper
│  └─ middleware/
│     └─ memory_middleware.go  # Context injection middleware
├─ .env.example                # Example environment file
├─ docker-compose.yml
├─ Dockerfile
├─ Dockerfile.ollama
├─ go.mod
├─ go.sum
└─ README.md
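
internal/embedder/ollama.go talks to Ollama's embeddings endpoint. A minimal sketch of what such a call can look like is shown below; the OllamaEmbedder type and its fields are illustrative, while the /api/embeddings request and response shape is Ollama's.

// ollama_sketch.go — illustrative call to Ollama's /api/embeddings endpoint.
package embedder

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type OllamaEmbedder struct {
	Host  string // e.g. http://ollama:11434 (OLLAMA_HOST)
	Model string // e.g. nomic-embed-text (EMBEDDING_MODEL)
}

// Embed returns the embedding vector for a single piece of text.
func (o *OllamaEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
	body, _ := json.Marshal(map[string]string{"model": o.Model, "prompt": text}) // marshal of a string map cannot fail
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, o.Host+"/api/embeddings", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama embeddings: unexpected status %s", resp.Status)
	}

	var out struct {
		Embedding []float32 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}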

.env.example

EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768
SERVER_ADDR=:8080
# Host port mapped to the Ollama container (see docker-compose.yml)
OLLAMA_ADDR=11434
# Maximum vector distance for a memory to be considered relevant
MAX_MEMORY_DISTANCE=0.5
TOP_K_MEMORIES=10
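
These are plain environment variables, so the server can read them with the standard library. A hedged sketch of such a loader is shown below; the Config type and Load function are illustrative, not the repository's actual configuration code.

// config_sketch.go — illustrative loading of the .env values above.
package config

import (
	"os"
	"strconv"
)

type Config struct {
	EmbeddingModel    string  // EMBEDDING_MODEL
	EmbeddingDim      int     // EMBEDDING_DIM
	ServerAddr        string  // SERVER_ADDR, e.g. ":8080"
	MaxMemoryDistance float64 // MAX_MEMORY_DISTANCE, vector search distance cutoff
	TopKMemories      int     // TOP_K_MEMORIES
}

// getenv returns the value of key, or fallback if it is unset or empty.
func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// Load reads the environment with the defaults from .env.example.
func Load() Config {
	dim, _ := strconv.Atoi(getenv("EMBEDDING_DIM", "768"))
	topK, _ := strconv.Atoi(getenv("TOP_K_MEMORIES", "10"))
	dist, _ := strconv.ParseFloat(getenv("MAX_MEMORY_DISTANCE", "0.5"), 64)
	return Config{
		EmbeddingModel:    getenv("EMBEDDING_MODEL", "nomic-embed-text"),
		EmbeddingDim:      dim,
		ServerAddr:        getenv("SERVER_ADDR", ":8080"),
		MaxMemoryDistance: dist,
		TopKMemories:      topK,
	}
}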

docker-compose.yml

services:
  ollama:
    build:
      context: .
      dockerfile: Dockerfile.ollama
    container_name: ollama
    ports:
      - "${OLLAMA_ADDR}:11434"
    restart: unless-stopped
    entrypoint: ["/bin/sh", "-c"]
    command: >
      "ollama serve & 
      sleep 5 && 
      ollama pull ${EMBEDDING_MODEL} && 
      wait"
    volumes:
      - /root/.ollama
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 10s
      timeout: 5s
      retries: 5

  weaviate:
    image: semitechnologies/weaviate:1.25.3
    container_name: weaviate
    ports:
      - "6379:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"
      CLUSTER_HOSTNAME: "node1"
    volumes:
      - /var/lib/weaviate
    restart: unless-stopped
  go-server:
    build:
      context: ./
      dockerfile: Dockerfile
    container_name: go-server
    ports:
      - "${SERVER_ADDR}:8080"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - EMBEDDING_MODEL=nomic-embed-text
      - WEAVIATE_HOST=http://weaviate:8080
    depends_on:
      ollama:
        condition: service_healthy
      weaviate:
        condition: service_started
    restart: unless-stopped
