v0.1.0-rc1 — first public release candidate
Status: Release Candidate, open for feedback. v0.1.0 stable will be cut after at least one week of green CI on Windows, macOS, and Linux, plus a non-trivial third-party install report.
memexa (镜我) is a self-hosted, Chinese-first personal memory graph. It ingests six categories of everyday data, runs them through a two-LLM gate-extract pipeline, stores entities + relationships + temporal evidence in PostgreSQL + pgvector, and exposes a 14-subcommand CLI plus a five-phase state inference workflow.
The Pensieve, in code. Pour the memories scattered across your life into a basin, rearrange them around the question you have right now, walk away with something usable.
What you get in rc1
Six ingestion sources
WeChat · QQ (experimental — see below) · Email · Browser history · Claude Code chats · Audio (microphone + ASR)
Two-LLM gate-extract pipeline
- Stage A: gatekeeper LLM (HIGH / MEDIUM / LOW filter)
- Stage B: extractor LLM (V2 envelope JSON)
- Stage C: BGE-M3 quorum + DeepSeek-style arbiter
- Stage D: POST → memory_full_v5 PostgreSQL bank
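The stages above read as a filter-then-extract flow. The following is a minimal sketch of that shape, not memexa's actual code: the names `Verdict`, `Envelope`, and `run_pipeline`, the toy keyword heuristics, and the choice to drop LOW items are all assumptions made for illustration. The real gatekeeper and extractor are LLM calls, and Stage C (quorum + arbiter) is left out entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Verdict(Enum):
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


@dataclass
class Envelope:
    """Hypothetical stand-in for the V2 envelope JSON produced by Stage B."""
    source_text: str
    entities: List[str]


def run_pipeline(
    messages: Iterable[str],
    gatekeeper: Callable[[str], Verdict],
    extractor: Callable[[str], Envelope],
    sink: Callable[[Envelope], None],
) -> int:
    """Gate each message (Stage A), extract survivors (Stage B), store them (Stage D)."""
    stored = 0
    for text in messages:
        if gatekeeper(text) is Verdict.LOW:  # assumption: LOW items are dropped
            continue
        sink(extractor(text))
        stored += 1
    return stored


# Toy stand-ins so the sketch runs without any LLM or database.
def toy_gatekeeper(text: str) -> Verdict:
    return Verdict.HIGH if "meeting" in text else Verdict.LOW


def toy_extractor(text: str) -> Envelope:
    names = [word for word in text.split() if word.istitle()]
    return Envelope(source_text=text, entities=names)


bank: List[Envelope] = []
count = run_pipeline(
    ["Alice moved the meeting to Friday", "lol ok"],
    toy_gatekeeper,
    toy_extractor,
    bank.append,
)
```

Passing the gatekeeper and extractor as callables keeps the two model calls independent, which is the property the two-LLM split relies on: the cheap filter runs on everything, the expensive extractor only on what survives.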
14 query subcommands, three tiers
- Basic: quick, topic, arc, timeline, person, project, pending, reflect
- Advanced: types, graph-walk, summary, trends
- Composite: session-context, task-brief
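For anyone scripting against the CLI, the three tiers can be held in a small table and used to validate a subcommand before invoking it. This is a sketch only: `TIERS` and `build_argv` are hypothetical helper names, and the `memexa <subcommand> "<query>"` shape is taken from the `memexa quick "Alice"` example in the installation steps. Whether every subcommand accepts a single positional query is an assumption.

```python
import shlex
from typing import Dict, List

# Subcommand names as listed in these release notes.
TIERS: Dict[str, List[str]] = {
    "basic": ["quick", "topic", "arc", "timeline", "person", "project", "pending", "reflect"],
    "advanced": ["types", "graph-walk", "summary", "trends"],
    "composite": ["session-context", "task-brief"],
}

ALL_SUBCOMMANDS = {name for names in TIERS.values() for name in names}


def build_argv(subcommand: str, query: str) -> List[str]:
    """Return an argv list for subprocess.run, rejecting unknown subcommands."""
    if subcommand not in ALL_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return ["memexa", subcommand, query]


argv = build_argv("quick", "Alice")
command_line = shlex.join(argv)  # shell-safe string form, useful for logging
```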
Five-phase state inference
A query workflow (docs/5_phase_query.md) for going from fuzzy human input ("what was that thing my advisor mentioned in March?") to a concrete answer in <60 seconds.
Live dashboard on :8765
7 panels: Win/Mac/server CPU+GPU, API usage, memory system health, cron health, graph queries, audio pipeline progress.
Cron orchestrator
- Per-source drivers (backfill_v5_<source>_driver.py)
- Dead-letter queue with retry budgets
- PG-aware pending detector (skips already-processed batches)
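One way to picture how a retry budget, a dead-letter queue, and a pending detector fit together is the sketch below. It is an illustration under assumptions, not the orchestrator's implementation: `Batch`, `drain`, the default budget of 3, and the in-memory `already_processed` set (standing in for a PostgreSQL lookup) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class Batch:
    batch_id: str
    attempts: int = 0


def drain(
    batches: Iterable[Batch],
    process: Callable[[Batch], None],
    already_processed: Set[str],
    retry_budget: int = 3,
) -> Tuple[List[str], List[Batch]]:
    """Process pending batches; park a batch once it exhausts its retry budget."""
    # Pending detector: skip batches the store already knows about.
    queue = deque(b for b in batches if b.batch_id not in already_processed)
    completed: List[str] = []
    dead_letters: List[Batch] = []
    while queue:
        batch = queue.popleft()
        try:
            process(batch)
        except Exception:
            batch.attempts += 1
            if batch.attempts >= retry_budget:
                dead_letters.append(batch)  # budget exhausted, stop retrying
            else:
                queue.append(batch)  # retry after the remaining batches
        else:
            completed.append(batch.batch_id)
    return completed, dead_letters


def flaky(batch: Batch) -> None:
    """Toy driver that always fails on one batch."""
    if batch.batch_id == "b2":
        raise RuntimeError("simulated driver failure")


completed, dead = drain(
    [Batch("b1"), Batch("b2"), Batch("b3")],
    flaky,
    already_processed={"b1"},
)
```

Requeueing a failed batch at the back of the queue means one bad batch cannot block the batches behind it, and the budget bounds the total work spent on it.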
AI-agent first-class (the differentiator)
Most real users invoke memexa through Claude Code, Cursor, or Cline rather than typing subcommands by hand. The 14 subcommands are a small protocol; the agent contract is docs/for_agents.md:
- 7 hard rules for query selection
- Decision table mapping user intent → subcommand
- Composition patterns for multi-source questions
- Common pitfalls catalog (e.g. don't use topic on names, use arc)
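The pitfall quoted above (names go to arc, not topic) is the kind of rule an agent can encode directly. The sketch below shows only that one rule; `choose_subcommand` and the `known_names` set are hypothetical, and the full decision table lives in docs/for_agents.md rather than here.

```python
from typing import Set


def choose_subcommand(query: str, known_names: Set[str]) -> str:
    """Route name-like queries to arc and everything else to topic."""
    if query.strip() in known_names:
        return "arc"
    return "topic"


# Names from the synthetic demo dataset mentioned in these notes.
names = {"Alice", "Bob", "Carol"}
for_person = choose_subcommand("Alice", names)
for_subject = choose_subcommand("thesis deadline", names)
```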
Engineering polish
- Apache 2.0
- 3-layer abstraction (
_path_resolver/_user_aliases/_user_identity) — clean enough that no PII leaks survived the sanitizer - 50 passing tests (2 skipped due to known prompt drift, scheduled for v0.2)
- 309/309 .py syntax PASS
- CI: pytest × Python 3.10 / 3.11 / 3.12 + CodeQL + Security + dependabot + Release Drafter
- Pre-commit PII scan hook
- Branch protection on main (required CI + 1 review)
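As an illustration of what a pre-commit PII scan like the one listed above might check, here is a minimal sketch. The two patterns (email addresses and mainland-China mobile numbers) and the function names are assumptions; the hook shipped in the repository may look for different things and is not reproduced here.

```python
import re
from typing import Dict, List

# Hypothetical patterns; a real hook would likely carry a longer, tuned list.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "cn_mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
}


def scan_text(text: str) -> Dict[str, List[str]]:
    """Return every match per pattern label; an empty dict means clean."""
    hits: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits


def exit_code_for(texts: List[str]) -> int:
    """Pre-commit convention: a non-zero exit code blocks the commit."""
    return 1 if any(scan_text(t) for t in texts) else 0


dirty = scan_text("contact alice@example.com or 13800138000")
clean = scan_text("def add(a, b): return a + b")
```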
Documentation
- Bilingual everywhere: every user-facing .md ships with a .zh.md mirror
- 5 walkthroughs against a synthetic demo dataset (Alice / Bob / Carol)
- 2 case studies (lab-report pipeline, 5-minute meeting brief)
- 6 lessons-learned narratives (~2.7k lines bilingual)
- Per-OS deployment guides (Windows, macOS, Linux, docker-compose)
Installation (rc1)
```bash
# Not yet on PyPI; install from GitHub:
pip install git+https://github.com/labazhou2024/memexa.git@v0.1.0-rc1

# Or clone for the demo dataset + walkthroughs:
git clone https://github.com/labazhou2024/memexa.git
cd memexa
pip install -e ".[dev]"
memexa init
docker compose -f docker-compose.example.yml up -d
python -m examples.demo_dataset.ingest --dry-run
memexa doctor
memexa quick "Alice"
```

Full quickstart: docs/quickstart.md
Known limitations (be honest)
| # | Limitation |
|---|---|
| 1 | PyPI not yet wired — pip install memexa returns nothing; install from GitHub for now. |
| 2 | QQ source: dual-track. The recommended db-only path (direct nt_msg.db SQLCipher v4 read — sends no protocol packets, indistinguishable from any chat-history backup tool) is LIVE in upstream JARVIS (jarvis/qq_db.py, 762 lines, standard library only) and scheduled for OSS migration in v0.2. The NapCat / OneBot adapter shipped with rc1 is deprecated and disabled by default after a 2025-09-05 NapCat public-OneBot incident caused Tencent to batch-ban every QQ that ever ran NapCat / LiteLoaderQQNT — the maintainer's own QQ was caught in this wave on 2026-05-14. Until v0.2 lands, OSS v0.1.x users either (a) copy jarvis/qq_db.py in by hand or (b) use the clipboard fallback (also LIVE upstream, also v0.2). See docs/integrations/qq.md. |
| 3 | Mac / Linux fresh-clone smoke test is not in CI on every push yet. Scheduled for v0.1.0 stable. |
| 4 | Deliverable factory (auto lab report / action card / weekly brief) is on the v0.2 roadmap. v0.1 ships the building blocks (case studies show the manual composition). |
| 5 | Two test files skipped — prompt drift after rewrites; queued for v0.2 prompt-maintenance pass. |
| 6 | CLI ruff coverage narrow (E9 only for rc1) — full ruleset gates on a dedicated make fmt pass scheduled for v0.2. |
Why this exists
The Chinese personal-data ecosystem is a six-silo patchwork (WeChat + QQ + email + browser + AI chats + audio), and every existing personal-knowledge tool either (a) demands you re-enter everything by hand, (b) only handles one silo, or (c) ships English-first prompts that mis-extract Chinese context.
memexa was built bottom-up to extract from real Chinese chat / email / voice data, with prompts tuned for the way Chinese people actually write. The two-LLM gate-extract is not a marketing line — without it the gatekeeper either lets too much trash through (single-LLM cheap) or rejects good evidence (single-LLM strict). The 14 subcommands are not a SQL DSL — they are six months of "what did the user actually ask?" condensed into a small protocol an LLM agent can compose.
Roadmap
See ROADMAP.md. Highlights:
- v0.2: Deliverable factory (memexa lab-report, memexa action-card, memexa brief <person>, memexa weekly). PyPI release. Full CI matrix (Win + Mac + Linux × Py 3.10–3.12). Prompt-maintenance pass.
- v0.5: Optional paid API endpoint (OpenAI-style, billed per token). The OSS core remains fully usable forever; paid is an upgrade path, never a gate.
- v1.0: Schema commitment + QQ db-only path productionized.
How to help
Star, file an issue (even if it's "your README has a typo"), open a discussion, or push a PR. The .audit/PLACEHOLDER_INVENTORY.md lists the 25 modules still flagged TODO(memgraph-oss) for _path_resolver migration — easy first contributions. See CONTRIBUTING.md.
If you ship an AI agent that needs a Chinese-data memory layer, docs/for_agents.md is your starting point — drop in your own subcommand list and we'll co-evolve the protocol.
Acknowledgements
memexa stands on pgvector, BGE-M3, DeepSeek, Hindsight, Silero VAD, mlx-whisper, SenseVoice / FunASR, and the engineering tradition of Pensieve-shaped tools that came before it (Logseq, mem.ai, dnote, recoll, Notational Velocity). Without prior work in retrieval-augmented memory + Chinese NLP this would not exist.
Full Changelog: https://github.com/labazhou2024/memexa/commits/v0.1.0-rc1