v0.1.0-rc1 — first public release candidate

Status: Release Candidate, open for feedback. v0.1.0 stable will be cut after at least one week of green CI on Windows, macOS, and Linux, plus at least one non-trivial third-party install report.

memexa (镜我) is a self-hosted, Chinese-first personal memory graph. It ingests six categories of everyday data, runs them through a two-LLM gate-extract pipeline, stores entities + relationships + temporal evidence in PostgreSQL + pgvector, and exposes a 14-subcommand CLI plus a five-phase state inference workflow.

The Pensieve, in code. Pour the memories scattered across your life into a basin, rearrange them around the question you have right now, walk away with something usable.

What you get in rc1

Six ingestion sources

WeChat · QQ (experimental — see below) · Email · Browser history · Claude Code chats · Audio (microphone + ASR)

Two-LLM gate-extract pipeline

Stage A — gatekeeper LLM (HIGH / MEDIUM / LOW filter)
Stage B — extractor LLM (V2 envelope JSON)
Stage C — BGE-M3 quorum + DeepSeek-style arbiter
Stage D — POST → memory_full_v5 PostgreSQL bank
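The four stages can be read as a short-circuiting chain. Each message either drops out early or is written to the bank. Below is a minimal sketch of that control flow, assuming each stage is an injected callable.

- The stage signatures, the `PipelineStats` counters and the rule that HIGH and MEDIUM pass the gate while LOW is dropped are illustrative assumptions.
- `verify` is a hypothetical stand-in for the BGE-M3 quorum plus arbiter, whose rules these notes do not describe.
- `store` stands in for the POST to memory_full_v5.
- The lambdas are toy stubs, not real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical stage signatures: the release notes name the stages, not their interfaces.
Gate = Callable[[str], str]        # Stage A: returns "HIGH", "MEDIUM" or "LOW"
Extract = Callable[[str], dict]    # Stage B: returns a V2-envelope-like dict
Verify = Callable[[dict], bool]    # Stage C: stand-in for BGE-M3 quorum + arbiter
Store = Callable[[dict], None]     # Stage D: stand-in for the POST to memory_full_v5


@dataclass
class PipelineStats:
    dropped_by_gate: int = 0
    rejected_by_verify: int = 0
    stored: int = 0


def run_pipeline(messages: Iterable[str], gate: Gate, extract: Extract,
                 verify: Verify, store: Store,
                 keep: tuple = ("HIGH", "MEDIUM")) -> PipelineStats:
    stats = PipelineStats()
    for msg in messages:
        # Stage A: the cheap gatekeeper decides whether the message is worth extracting.
        if gate(msg) not in keep:
            stats.dropped_by_gate += 1
            continue
        # Stage B: only gated messages reach the (more expensive) extractor.
        envelope = extract(msg)
        # Stage C: an independent check before anything is written.
        if not verify(envelope):
            stats.rejected_by_verify += 1
            continue
        # Stage D: persist the accepted envelope.
        store(envelope)
        stats.stored += 1
    return stats


# Toy stubs in place of real LLM calls.
bank: list = []
stats = run_pipeline(
    ["ok", "Alice 周五交实验报告", "Bob: 导师三月说的那篇论文", "hmm... 也许吧"],
    gate=lambda m: "LOW" if len(m) < 5 else "HIGH",
    extract=lambda m: {"text": m, "entities": [m.split()[0].rstrip(":")]},
    verify=lambda env: env["entities"][0][:1].isupper(),
    store=bank.append,
)
print(stats)  # PipelineStats(dropped_by_gate=1, rejected_by_verify=1, stored=2)
```

The point of the ordering is cost: the extractor and the verification step only ever see what survived the gate.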

14 query subcommands, three tiers

Basic: quick, topic, arc, timeline, person, project, pending, reflect
Advanced: types, graph-walk, summary, trends
Composite: session-context, task-brief

Five-phase state inference

A query workflow (docs/5_phase_query.md) for going from fuzzy human input ("what was that thing my advisor mentioned in March?") to a concrete answer in <60 seconds.

Live dashboard on `:8765`

7 panels: Win/Mac/server CPU+GPU, API usage, memory system health, cron health, graph queries, audio pipeline progress.

Cron orchestrator

Per-source drivers (backfill_v5_<source>_driver.py)
Dead-letter queue with retry budgets
PG-aware pending detector (skips already-processed batches)
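Here is a minimal sketch of how a pending detector, per-batch retries and a dead-letter queue fit together. It rests on several assumptions:

- `driver` is a plain callable standing in for a `backfill_v5_<source>_driver.py` script.
- The "already processed" lookup is an in-memory set rather than a PostgreSQL query.
- The retry budget of 3 and the retry-at-back-of-queue policy are illustrative, not memexa's actual settings.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Job:
    batch_id: str
    attempts: int = 0


class Orchestrator:
    def __init__(self, processed_ids: Iterable[str], retry_budget: int = 3):
        # Stand-in for the PG-aware check: "already processed" is an in-memory set here.
        self.processed = set(processed_ids)
        self.retry_budget = retry_budget
        self.queue: deque = deque()
        self.dead_letter: list = []

    def enqueue_pending(self, batch_ids: Iterable[str]) -> None:
        # Pending detector: only batches not yet recorded as processed are queued.
        for batch_id in batch_ids:
            if batch_id not in self.processed:
                self.queue.append(Job(batch_id))

    def run(self, driver: Callable[[str], None]) -> None:
        while self.queue:
            job = self.queue.popleft()
            job.attempts += 1
            try:
                driver(job.batch_id)
            except Exception:
                # Failed: retry at the back of the queue until the budget is spent,
                # then park the job in the dead-letter queue instead of looping forever.
                if job.attempts >= self.retry_budget:
                    self.dead_letter.append(job)
                else:
                    self.queue.append(job)
                continue
            self.processed.add(job.batch_id)


calls: list = []


def flaky_driver(batch_id: str) -> None:
    calls.append(batch_id)
    if batch_id == "wechat-003":
        raise RuntimeError("simulated extractor timeout")


orch = Orchestrator(processed_ids={"wechat-001"}, retry_budget=3)
orch.enqueue_pending(["wechat-001", "wechat-002", "wechat-003"])
orch.run(flaky_driver)
print(calls)             # ['wechat-002', 'wechat-003', 'wechat-003', 'wechat-003']
print(orch.dead_letter)  # [Job(batch_id='wechat-003', attempts=3)]
```

In this sketch, `wechat-001` is never re-run because it is already recorded as processed. `wechat-003` burns its whole budget and ends up in the dead-letter queue rather than blocking the rest of the cron run.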

AI-agent first-class (the differentiator)

Most real users invoke memexa through Claude Code, Cursor, or Cline rather than typing subcommands by hand. The 14 subcommands are a small protocol; the agent contract is docs/for_agents.md:

7 hard rules for query selection
Decision table mapping user intent → subcommand
Composition patterns for multi-source questions
Common pitfalls catalog (e.g. use arc, not topic, on names)
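To make the idea of a decision table concrete, here is a toy router that maps an intent plus a query to a `memexa` command line. Only a few parts come from these notes:

- the subcommand names and tiers;
- the "names go to arc, not topic" pitfall;
- the `memexa quick "<query>"` form from the install steps.

Everything else is a hypothetical illustration, not the contents of docs/for_agents.md:

- the intent labels;
- the `KNOWN_NAMES` set;
- the topic and quick rules;
- the assumption that `arc` and `topic` take a positional query.

```python
import shlex

# Subcommand names and tiers as listed in these release notes.
SUBCOMMANDS = {
    "basic": ["quick", "topic", "arc", "timeline", "person", "project", "pending", "reflect"],
    "advanced": ["types", "graph-walk", "summary", "trends"],
    "composite": ["session-context", "task-brief"],
}
ALL_SUBCOMMANDS = {name for tier in SUBCOMMANDS.values() for name in tier}

# Hypothetical: a known-names set standing in for a real alias lookup.
KNOWN_NAMES = {"Alice", "Bob", "Carol"}


def route(intent: str, query: str) -> list:
    """Map a (hypothetical) intent label plus query text to a memexa argv list."""
    if query in KNOWN_NAMES:
        sub = "arc"      # pitfall from the catalog: names go to arc, not topic
    elif intent == "about":
        sub = "topic"    # illustrative rule, not taken from docs/for_agents.md
    else:
        sub = "quick"    # illustrative fallback
    if sub not in ALL_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {sub}")
    return ["memexa", sub, query]


for intent, query in [("about", "Alice"), ("about", "lab report"), ("lookup", "March meeting")]:
    print(shlex.join(route(intent, query)))
# memexa arc Alice
# memexa topic 'lab report'
# memexa quick 'March meeting'
```

Returning an argv list instead of a shell string keeps the agent side free of quoting bugs. `shlex.join` is used only to display the command.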

Engineering polish

Apache 2.0
3-layer abstraction (_path_resolver / _user_aliases / _user_identity) — clean enough that no PII leaks survived the sanitizer
50 passing tests (2 skipped due to known prompt drift, scheduled for v0.2)
309/309 .py files pass the syntax check
CI: pytest × Python 3.10 / 3.11 / 3.12 + CodeQL + Security + Dependabot + Release Drafter
Pre-commit PII scan hook
Branch protection on main (required CI + 1 review)
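A pre-commit PII scan boils down to "regex over staged files, non-zero exit on a hit." Below is a minimal sketch of that shape. The two patterns (email address, 11-digit mainland mobile number) and the `ALLOWLIST` for synthetic demo identities are illustrative assumptions; the real hook's rules are not described here. The pre-commit framework passes staged file names as arguments by default, so `main(paths)` is the entry point it would call.

```python
import re
from pathlib import Path

# Illustrative patterns only; the actual hook's rules are not described in the release notes.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "cn_mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),  # 11-digit mainland mobile number
}
# Hypothetical allowlist for synthetic demo-dataset identities.
ALLOWLIST = {"alice@example.com"}


def scan_text(text: str) -> list:
    """Return (line_number, kind, match) for every suspected PII hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                if m.group(0) not in ALLOWLIST:
                    hits.append((lineno, kind, m.group(0)))
    return hits


def main(paths: list) -> int:
    """Hook entry point: pre-commit passes staged file names; non-zero blocks the commit."""
    failed = False
    for p in paths:
        text = Path(p).read_text(encoding="utf-8", errors="ignore")
        for lineno, kind, match in scan_text(text):
            print(f"{p}:{lineno}: possible {kind}: {match}")
            failed = True
    return 1 if failed else 0


sample = "contact alice@example.com\n电话 13812345678\nbob@lab.edu.cn"
print(scan_text(sample))  # [(2, 'cn_mobile', '13812345678'), (3, 'email', 'bob@lab.edu.cn')]
```

The digit lookarounds keep the mobile pattern from firing inside longer numbers such as order IDs.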

Documentation

Bilingual everywhere — every user-facing .md ships with a .zh.md mirror
5 walkthroughs against a synthetic demo dataset (Alice / Bob / Carol)
2 case studies (lab-report pipeline, 5-minute meeting brief)
6 lessons-learned narratives (~2.7k lines bilingual)
Per-OS deployment guides (Windows, macOS, Linux, docker-compose)

Installation (rc1)

```bash
# pip install not yet on PyPI; install from GitHub:
pip install git+https://github.com/labazhou2024/memexa.git@v0.1.0-rc1

# Or clone for the demo dataset + walkthroughs:
git clone https://github.com/labazhou2024/memexa.git
cd memexa
pip install -e ".[dev]"
memexa init
docker compose -f docker-compose.example.yml up -d
python -m examples.demo_dataset.ingest --dry-run
memexa doctor
memexa quick "Alice"
```

Full quickstart: docs/quickstart.md

Known limitations (be honest)

| # | Limitation |
|---|---|
| 1 | PyPI is not wired up yet: `pip install memexa` finds no package, so install from GitHub for now. |
| 2 | QQ source: dual-track. The recommended db-only path is a direct read of `nt_msg.db` (SQLCipher v4). It sends no protocol packets and is indistinguishable from any chat-history backup tool. It is LIVE in upstream JARVIS (`jarvis/qq_db.py`, 762 lines, standard library only) and scheduled for OSS migration in v0.2. The NapCat / OneBot adapter shipped with rc1 is deprecated and disabled by default. A 2025-09-05 NapCat public-OneBot incident prompted Tencent to batch-ban every QQ account that had ever run NapCat / LiteLoaderQQNT, and the maintainer's own account was caught in this wave on 2026-05-14. Until v0.2 lands, OSS v0.1.x users can either (a) copy `jarvis/qq_db.py` in by hand or (b) use the clipboard fallback (also LIVE upstream, also landing in v0.2). See docs/integrations/qq.md. |
| 3 | The Mac / Linux fresh-clone smoke test does not yet run in CI on every push; scheduled for v0.1.0 stable. |
| 4 | The deliverable factory (auto lab report / action card / weekly brief) is on the v0.2 roadmap. v0.1 ships the building blocks; the case studies show the manual composition. |
| 5 | Two test files are skipped because of prompt drift after rewrites; queued for the v0.2 prompt-maintenance pass. |
| 6 | CLI ruff coverage is narrow (rule E9 only for rc1). The full ruleset becomes a gate after a dedicated `make fmt` pass scheduled for v0.2. |

Why this exists

The Chinese personal-data ecosystem is a six-silo patchwork (WeChat + QQ + email + browser + AI chats + audio), and every existing personal-knowledge tool either (a) demands you re-enter everything by hand, (b) only handles one silo, or (c) ships English-first prompts that mis-extract Chinese context.

memexa was built bottom-up to extract from real Chinese chat / email / voice data, with prompts tuned for the way Chinese people actually write. The two-LLM gate-extract is not a marketing line. With a single LLM, the filter either lets too much junk through (cheap setting) or rejects good evidence (strict setting). The 14 subcommands are not a SQL DSL; they are six months of "what did the user actually ask?" condensed into a small protocol an LLM agent can compose.

Roadmap

See ROADMAP.md. Highlights:

v0.2 — Deliverable factory: memexa lab-report, memexa action-card, memexa brief <person>, memexa weekly. PyPI release. Full CI matrix (Win + Mac + Linux × Py 3.10–3.12). Prompt-maintenance pass.
v0.5 — Optional paid API endpoint (OpenAI-style, billed per token). The OSS core remains fully usable forever — paid is an upgrade path, never a gate.
v1.0 — Schema commitment + QQ db-only path productionized.

How to help

Star, file an issue (even if it's "your README has a typo"), open a discussion, or push a PR. The .audit/PLACEHOLDER_INVENTORY.md lists the 25 modules still flagged TODO(memgraph-oss) for _path_resolver migration — easy first contributions. See CONTRIBUTING.md.
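If you want to locate candidate modules straight from a checkout, a quick scan for the marker works. This is a minimal sketch, assuming the `TODO(memgraph-oss)` text appears verbatim in the `.py` sources. `.audit/PLACEHOLDER_INVENTORY.md` remains the canonical list, and a raw scan may not match its count exactly. `find_flagged_modules` is a hypothetical helper, not part of memexa.

```python
import tempfile
from pathlib import Path

MARKER = "TODO(memgraph-oss)"


def find_flagged_modules(root: str = ".") -> list:
    """Return sorted, root-relative paths of .py files that contain MARKER."""
    base = Path(root)
    flagged = []
    for path in base.rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable files rather than abort the scan
        if MARKER in text:
            flagged.append(path.relative_to(base).as_posix())
    return sorted(flagged)


# Self-contained demo on a throwaway tree; in a real checkout, call find_flagged_modules(".").
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "pkg"
    pkg.mkdir()
    (pkg / "a.py").write_text("# TODO(memgraph-oss): route through _path_resolver\n", encoding="utf-8")
    (pkg / "b.py").write_text("x = 1\n", encoding="utf-8")
    demo = find_flagged_modules(tmp)

print(demo)  # ['pkg/a.py']
```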

If you ship an AI agent that needs a Chinese-data memory layer, docs/for_agents.md is your starting point — drop in your own subcommand list and we'll co-evolve the protocol.

Acknowledgements

memexa stands on pgvector, BGE-M3, DeepSeek, Hindsight, Silero VAD, mlx-whisper, SenseVoice / FunASR, and the engineering tradition of Pensieve-shaped tools that came before it (Logseq, mem.ai, dnote, recoll, Notational Velocity). Without prior work in retrieval-augmented memory + Chinese NLP this would not exist.

Full Changelog: https://github.com/labazhou2024/memexa/commits/v0.1.0-rc1
