Persistent Memory Layer for LLMs (Memory-RAG)
Memcortex is a Proof of Concept (PoC) designed to equip conversational agents and LLM applications with persistent, long-term memory. By implementing a Memory-RAG (Retrieval-Augmented Generation) architecture, Memcortex allows agents to transcend context-window limitations, enabling them to recall past interactions and specific data points indefinitely.
This repository includes:

- README (this file)
- Architectural diagram (Mermaid + ASCII)
- Package structure
- How to start
Memcortex stores user/application memories as both text and vectors in Weaviate and exposes a memory manager + middleware that:
- Embeds incoming text using `nomic-embed-text` embeddings (see the embedding sketch below).
- Stores memories in a `Memory_idx` class on Weaviate.
- Runs vector searches to retrieve the top-K relevant memories for a user.
- Injects retrieved memories into the prompt before it reaches the LLM.
- Optionally persists new memories asynchronously.
This pattern is ideal for building chatbots, agents, and personalization layers that must "remember" details across sessions.
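As a rough illustration of the embedding step, the sketch below calls Ollama's `/api/embeddings` endpoint directly. The struct and function names are illustrative only; they are not the repo's actual internals in `internal/embedder/ollama.go`.

```go
// embed_sketch.go — a minimal sketch of the embedding step, assuming Ollama is
// reachable at the given host and serves the nomic-embed-text model.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// embedText posts the text to Ollama's /api/embeddings endpoint and returns the vector.
func embedText(ollamaHost, model, text string) ([]float32, error) {
	body, _ := json.Marshal(embedRequest{Model: model, Prompt: text})
	resp, err := http.Post(ollamaHost+"/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

func main() {
	vec, err := embedText("http://localhost:11434", "nomic-embed-text", "My preferred memory layer is memcortex.")
	if err != nil {
		panic(err)
	}
	fmt.Println("embedding dimensions:", len(vec)) // nomic-embed-text produces 768-dim vectors
}
```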
```mermaid
flowchart LR
A[User] -->|POST /chat| B(API Server)
B --> C{Memory Middleware}
C -->|retrieve| D[Weaviate Vector Store]
D -->|top-K| C
C -->|inject memories| E[LLM Handler]
E -->|call LLM API| F[Ollama / Custom LLM]
F -->|response| E
E -->|save message| G(Background Save Worker)
G --> D
subgraph Infra
D
F
end
```
```
User -> API Server (/chat)
-> Memory Middleware:
- Embed user query via Ollama
- Query Weaviate vector index (top-K)
- Re-rank / filter / format
- Inject into prompt
-> LLM Handler -> Local or remote LLM
-> Return response
Background worker: saves new user messages into Weaviate (embedding -> object)
```
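The background worker can be as simple as a buffered channel drained by a single goroutine. The sketch below is a minimal illustration of that idea; the `memoryStore` interface and all names are assumptions, not the actual API of `internal/memory/queue.go`.

```go
// queue_sketch.go — a minimal sketch of the asynchronous save worker.
package main

import (
	"context"
	"log"
	"time"
)

// pendingMemory is a message waiting to be embedded and persisted.
type pendingMemory struct {
	UserID string
	Text   string
}

// memoryStore abstracts "embed + write to Weaviate" for the worker.
type memoryStore interface {
	Save(ctx context.Context, userID, text string) error
}

// startSaveWorker drains the queue in the background so the chat handler
// never blocks on Weaviate writes.
func startSaveWorker(ctx context.Context, store memoryStore, queue <-chan pendingMemory) {
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case m := <-queue:
				if err := store.Save(ctx, m.UserID, m.Text); err != nil {
					log.Printf("memory save failed for user %s: %v", m.UserID, err)
				}
			}
		}
	}()
}

// logStore is a stand-in store used only to make this sketch runnable.
type logStore struct{}

func (logStore) Save(ctx context.Context, userID, text string) error {
	log.Printf("saving memory for %s: %q", userID, text)
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	queue := make(chan pendingMemory, 100)
	startSaveWorker(ctx, logStore{}, queue)

	queue <- pendingMemory{UserID: "memcortex-user-x", Text: "My preferred memory layer is memcortex."}
	time.Sleep(100 * time.Millisecond) // give the worker a moment before exiting
}
```

A buffered channel keeps the /chat handler from blocking on Weaviate writes, which is the point of persisting memories asynchronously.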
Prereqs:
- Go 1.20+
- Docker & Docker Compose
- Copy the repo and set the module path (or run `go mod init github.com/yourname/memcortex`).
- Create a `.env` file (see `.env.example`).
- Build the Docker image and run the server:

```bash
docker-compose up -d --build
```
- Example request:
```bash
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -H "X-User-ID: memcortex-user-x" \
  -d '{"message":"My preferred memory layer is memcortex."}'
```

You can also send these requests with Thunder Client or any other API client of your choice; just remember to set the X-User-ID header on every request.
The first request will save the memory asynchronously. Later requests will retrieve and inject the memory.
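How the retrieved memories are spliced into the prompt is up to the middleware. One plausible format is sketched below; it is purely illustrative and may differ from what `memory_middleware.go` actually emits.

```go
// prompt_sketch.go — an illustrative prompt-injection format.
package main

import (
	"fmt"
	"strings"
)

// buildPrompt prepends the retrieved memories to the user's message so the
// LLM can answer with long-term context.
func buildPrompt(memories []string, userMessage string) string {
	if len(memories) == 0 {
		return userMessage
	}
	var b strings.Builder
	b.WriteString("Relevant memories about this user:\n")
	for _, m := range memories {
		b.WriteString("- " + m + "\n")
	}
	b.WriteString("\nUser message: " + userMessage)
	return b.String()
}

func main() {
	prompt := buildPrompt(
		[]string{"My preferred memory layer is memcortex."},
		"Which memory layer do I prefer?",
	)
	fmt.Println(prompt)
}
```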
```
memcortex/
├─ cmd/server/main.go # App entry point
├─ internal/
│ ├─ embedder/
│ │ └─ ollama.go # Ollama embedder logic (nomic-embed-text)
│ ├─ handlers/
│ │ └─ chat.go # Chat endpoint handler
│ ├─ memory/
│ │ ├─ manager.go # High-level RAG orchestration
│ │ ├─ queue.go # Async background save queue
│ │ └─ store.go # Weaviate storage wrapper (sketched below)
│ └─ middleware/
│ └─ memory_middleware.go # Context injection middleware
├─ .env.example # Environment file
├─ docker-compose.yml
├─ Dockerfile
├─ Dockerfile.ollama
├─ go.mod
├─ go.sum
└─ README.md
```
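Saving a memory with its vector ultimately boils down to a single write against Weaviate's `/v1/objects` REST endpoint. The sketch below illustrates that; the property names (`user_id`, `text`) are assumptions, and the actual `store.go` may use the official Weaviate Go client instead of raw HTTP.

```go
// store_sketch.go — a minimal sketch of writing a Memory_idx object to Weaviate
// over its REST API; property names are assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type weaviateObject struct {
	Class      string                 `json:"class"`
	Properties map[string]interface{} `json:"properties"`
	Vector     []float32              `json:"vector"`
}

// saveMemory stores a pre-embedded memory under the Memory_idx class.
func saveMemory(weaviateHost, userID, text string, vector []float32) error {
	obj := weaviateObject{
		Class: "Memory_idx",
		Properties: map[string]interface{}{
			"user_id": userID,
			"text":    text,
		},
		Vector: vector,
	}
	body, _ := json.Marshal(obj)
	resp, err := http.Post(weaviateHost+"/v1/objects", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("weaviate returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// A placeholder 768-dim vector stands in for a real nomic-embed-text embedding.
	vec := make([]float32, 768)
	// Port 6379 is the host mapping for Weaviate's HTTP API in docker-compose.yml.
	if err := saveMemory("http://localhost:6379", "memcortex-user-x", "My preferred memory layer is memcortex.", vec); err != nil {
		fmt.Println("save failed:", err)
	}
}
```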
```env
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_DIM=768
SERVER_ADDR=:8080
OLLAMA_ADDR=11434
MAX_MEMORY_DISTANCE=0.5   # maximum vector search distance for retrieved memories
TOP_K_MEMORIES=10
```
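These knobs are plain environment variables; a minimal sketch of how they might be parsed at startup is shown below. The `config` struct and helper names are assumptions, not the repo's actual configuration code.

```go
// config_sketch.go — an illustrative way to read the tuning knobs above.
package main

import (
	"fmt"
	"os"
	"strconv"
)

type config struct {
	EmbeddingModel    string
	TopKMemories      int
	MaxMemoryDistance float64
}

// loadConfig reads the environment, falling back to the defaults shown in .env.example.
func loadConfig() config {
	cfg := config{
		EmbeddingModel:    getenv("EMBEDDING_MODEL", "nomic-embed-text"),
		TopKMemories:      10,
		MaxMemoryDistance: 0.5,
	}
	if v, err := strconv.Atoi(os.Getenv("TOP_K_MEMORIES")); err == nil {
		cfg.TopKMemories = v
	}
	if v, err := strconv.ParseFloat(os.Getenv("MAX_MEMORY_DISTANCE"), 64); err == nil {
		cfg.MaxMemoryDistance = v
	}
	return cfg
}

func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	fmt.Printf("%+v\n", loadConfig())
}
```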
```yaml
services:
  ollama:
    build:
      context: .
      dockerfile: Dockerfile.ollama
    container_name: ollama
    ports:
      - "${OLLAMA_ADDR}:11434"
    restart: unless-stopped
    entrypoint: ["/bin/sh", "-c"]
    command: >
      "ollama serve &
      sleep 5 &&
      ollama pull ${EMBEDDING_MODEL} &&
      wait"
    volumes:
      - /root/.ollama
    healthcheck:
      test: ["CMD", "ollama", "list"]
      interval: 10s
      timeout: 5s
      retries: 5

  weaviate:
    image: semitechnologies/weaviate:1.25.3
    container_name: weaviate
    ports:
      - "6379:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"
      CLUSTER_HOSTNAME: "node1"
    volumes:
      - /var/lib/weaviate
    restart: unless-stopped

  go-server:
    build:
      context: ./
      dockerfile: Dockerfile
    container_name: go-server
    ports:
      - "${SERVER_ADDR}:8080"
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - EMBEDDING_MODEL=nomic-embed-text
      - WEAVIATE_HOST=http://weaviate:8080
    depends_on:
      ollama:
        condition: service_healthy
      weaviate:
        condition: service_started
    restart: unless-stopped
```