v0.1.0-rc1 — first public release candidate

Pre-release

labazhou2024 released this 14 May 15:24
· 19 commits to main since this release
c583d3a

Status: Release Candidate. Open for feedback. It will be cut to v0.1.0 stable after at least one week of green CI on Windows, macOS, and Linux, plus at least one non-trivial third-party install report.

memexa (镜我) is a self-hosted, Chinese-first personal memory graph. It ingests six categories of everyday data, runs them through a two-LLM gate-extract pipeline, stores entities + relationships + temporal evidence in PostgreSQL + pgvector, and exposes a 14-subcommand CLI plus a five-phase state inference workflow.

The Pensieve, in code. Pour the memories scattered across your life into a basin, rearrange them around the question you have right now, walk away with something usable.


What you get in rc1

Six ingestion sources

WeChat · QQ (experimental — see below) · Email · Browser history · Claude Code chats · Audio (microphone + ASR)

Two-LLM gate-extract pipeline

  • Stage A — gatekeeper LLM (HIGH / MEDIUM / LOW filter)
  • Stage B — extractor LLM (V2 envelope JSON)
  • Stage C — BGE-M3 quorum + DeepSeek-style arbiter
  • Stage D — POST → memory_full_v5 PostgreSQL bank
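The four stages above can be sketched as a single function that wires pluggable callables together. This is only an illustrative outline: `Envelope`, `run_pipeline`, and the callable parameters are hypothetical names, not memexa's actual API, and dropping LOW-rated items at the gate is an assumption about how the filter is used.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# All names here are hypothetical; memexa's real modules and signatures may differ.


@dataclass
class Envelope:
    """Stand-in for the 'V2 envelope JSON' produced by Stage B."""
    entities: list[str]
    relations: list[tuple[str, str, str]]
    source_text: str


def run_pipeline(
    text: str,
    gate: Callable[[str], str],             # Stage A: returns "HIGH" / "MEDIUM" / "LOW"
    extract: Callable[[str], Envelope],     # Stage B: builds the envelope
    arbitrate: Callable[[Envelope], bool],  # Stage C: quorum + arbiter verdict
    store: Callable[[Envelope], None],      # Stage D: writes to the bank
) -> Optional[Envelope]:
    # Assumption: LOW items are discarded before the extractor is ever called.
    if gate(text) == "LOW":
        return None
    envelope = extract(text)
    # Assumption: a rejected envelope is simply not stored.
    if not arbitrate(envelope):
        return None
    store(envelope)
    return envelope


# Usage with stub stages standing in for the real LLM and database calls.
stored: list[Envelope] = []
result = run_pipeline(
    "Alice will send Bob the draft on Friday.",
    gate=lambda t: "HIGH" if "Alice" in t else "LOW",
    extract=lambda t: Envelope(["Alice", "Bob"], [("Alice", "sends_draft_to", "Bob")], t),
    arbitrate=lambda e: len(e.entities) > 0,
    store=stored.append,
)
```

Passing the stages in as callables keeps the cheap gate in front of the more expensive extractor, which is the point of splitting the work across two models.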

14 query subcommands, three tiers

  • Basic: quick, topic, arc, timeline, person, project, pending, reflect
  • Advanced: types, graph-walk, summary, trends
  • Composite: session-context, task-brief
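For an agent or script that needs to validate a subcommand before shelling out, the tier list above can be encoded as a small lookup table. This is a sketch only: `TIERS`, `tier_of`, and `build_argv` are hypothetical helpers, and the `memexa <subcommand> <query>` argument shape is an assumption extrapolated from the `memexa quick "Alice"` example in the installation steps. Other subcommands may take different arguments.

```python
# Hypothetical helper table; the subcommand names come from the list above.
TIERS: dict[str, list[str]] = {
    "basic": ["quick", "topic", "arc", "timeline", "person", "project", "pending", "reflect"],
    "advanced": ["types", "graph-walk", "summary", "trends"],
    "composite": ["session-context", "task-brief"],
}


def tier_of(subcommand: str) -> str:
    """Return the tier a subcommand belongs to, or raise if it is unknown."""
    for tier, names in TIERS.items():
        if subcommand in names:
            return tier
    raise ValueError(f"unknown memexa subcommand: {subcommand!r}")


def build_argv(subcommand: str, query: str) -> list[str]:
    """Build an argv list for subprocess.run without invoking anything.

    Assumption: a single positional query argument, as in `memexa quick "Alice"`.
    """
    tier_of(subcommand)  # validates the name; raises ValueError on typos
    return ["memexa", subcommand, query]


argv = build_argv("quick", "Alice")
```

Building an argv list rather than a shell string avoids quoting problems when the query contains spaces or Chinese punctuation.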

Five-phase state inference

A query workflow (docs/5_phase_query.md) for going from fuzzy human input ("what was that thing my advisor mentioned in March?") to a concrete answer in <60 seconds.

Live dashboard on :8765

7 panels: Win/Mac/server CPU+GPU, API usage, memory system health, cron health, graph queries, audio pipeline progress.

Cron orchestrator

  • Per-source drivers (backfill_v5_<source>_driver.py)
  • Dead-letter queue with retry budgets
  • PG-aware pending detector (skips already-processed batches)
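The interaction between the pending detector and the dead-letter queue can be illustrated with an in-memory sketch. Everything here is hypothetical (`RetryQueue`, `max_attempts`, the `already_done` set): the real orchestrator checks PostgreSQL rather than a Python set, and its retry policy may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class RetryQueue:
    """Hypothetical dead-letter queue with a per-batch retry budget."""
    max_attempts: int = 3
    attempts: dict[str, int] = field(default_factory=dict)
    dead_letters: list[str] = field(default_factory=list)

    def run(
        self,
        batch_ids: Iterable[str],
        process: Callable[[str], None],
        already_done: set[str],  # stand-in for the PG-aware pending check
    ) -> None:
        for batch_id in batch_ids:
            # Skip batches that are already processed or already dead-lettered.
            if batch_id in already_done or batch_id in self.dead_letters:
                continue
            try:
                process(batch_id)
            except Exception:
                count = self.attempts.get(batch_id, 0) + 1
                self.attempts[batch_id] = count
                if count >= self.max_attempts:
                    # Budget exhausted: park the batch instead of retrying forever.
                    self.dead_letters.append(batch_id)
            else:
                already_done.add(batch_id)
                self.attempts.pop(batch_id, None)


# Usage: "b1" is already done, "b2" succeeds, "b3" always fails.
calls: list[str] = []


def process(batch_id: str) -> None:
    calls.append(batch_id)
    if batch_id == "b3":
        raise RuntimeError("simulated driver failure")


done = {"b1"}
queue = RetryQueue(max_attempts=2)
for _ in range(3):  # three simulated cron ticks
    queue.run(["b1", "b2", "b3"], process, done)
```

After the three ticks, `b3` has been tried twice and parked, while `b1` was never reprocessed.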

AI-agent first-class (the differentiator)

Most real users invoke memexa through Claude Code, Cursor, or Cline rather than typing subcommands by hand. The 14 subcommands form a small protocol, and the agent contract in docs/for_agents.md specifies:

  • 7 hard rules for query selection
  • Decision table mapping user intent → subcommand
  • Composition patterns for multi-source questions
  • Common pitfalls catalog (e.g. don't use topic on names; use arc instead)
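As an illustration of how an agent wrapper might enforce one such pitfall, the sketch below rewrites a `topic` call into an `arc` call when the query is a known person name. The function name, the `known_names` parameter, and the exact-match rule are assumptions for illustration; the authoritative rules are the ones in docs/for_agents.md.

```python
def apply_name_pitfall(subcommand: str, query: str, known_names: set[str]) -> str:
    """Return the subcommand to run, swapping `topic` for `arc` on person names.

    Hypothetical guard: exact matching against a name set is a simplification.
    """
    if subcommand == "topic" and query.strip() in known_names:
        return "arc"
    return subcommand


# Names taken from the synthetic demo dataset mentioned under Documentation.
demo_names = {"Alice", "Bob", "Carol"}

chosen_for_name = apply_name_pitfall("topic", "Alice", demo_names)
chosen_for_subject = apply_name_pitfall("topic", "lab report", demo_names)
```

A guard like this would sit between the agent's intent classification and the actual CLI call, so a wrong first guess is corrected before any query runs.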

Engineering polish

  • Apache 2.0
  • 3-layer abstraction (_path_resolver / _user_aliases / _user_identity) — clean enough that no PII leaks survived the sanitizer
  • 50 passing tests (2 skipped due to known prompt drift, scheduled for v0.2)
  • 309/309 .py syntax PASS
  • CI: pytest × Python 3.10 / 3.11 / 3.12 + CodeQL + Security + dependabot + Release Drafter
  • Pre-commit PII scan hook
  • Branch protection on main (required CI + 1 review)

Documentation

  • Bilingual everywhere — every user-facing .md ships with a .zh.md mirror
  • 5 walkthroughs against a synthetic demo dataset (Alice / Bob / Carol)
  • 2 case studies (lab-report pipeline, 5-minute meeting brief)
  • 6 lessons-learned narratives (~2.7k lines bilingual)
  • Per-OS deployment guides (Windows, macOS, Linux, docker-compose)

Installation (rc1)

# memexa is not yet on PyPI; install from GitHub:
pip install git+https://github.com/labazhou2024/memexa.git@v0.1.0-rc1

# Or clone for the demo dataset + walkthroughs:
git clone https://github.com/labazhou2024/memexa.git
cd memexa
pip install -e ".[dev]"
memexa init
docker compose -f docker-compose.example.yml up -d
python -m examples.demo_dataset.ingest --dry-run
memexa doctor
memexa quick "Alice"

Full quickstart: docs/quickstart.md


Known limitations (be honest)

# Limitation
1 PyPI not yet wired: pip install memexa returns nothing; install from GitHub for now.
2 QQ source: dual-track. The recommended db-only path reads nt_msg.db directly (SQLCipher v4); it sends no protocol packets, so it is indistinguishable from any chat-history backup tool. It is LIVE in upstream JARVIS (jarvis/qq_db.py, 762 lines, standard library only) and scheduled for OSS migration in v0.2. The NapCat / OneBot adapter shipped with rc1 is deprecated and disabled by default: after a NapCat public-OneBot incident on 2025-09-05, Tencent batch-banned every QQ account that had ever run NapCat / LiteLoaderQQNT, and the maintainer's own QQ was caught in this wave on 2026-05-14. Until v0.2 lands, OSS v0.1.x users can either (a) copy jarvis/qq_db.py in by hand or (b) use the clipboard fallback (also LIVE upstream, also scheduled for v0.2). See docs/integrations/qq.md.
3 Mac / Linux fresh-clone smoke test is not in CI on every push yet. Scheduled for v0.1.0 stable.
4 Deliverable factory (auto lab report / action card / weekly brief) is on the v0.2 roadmap. v0.1 ships the building blocks (case studies show the manual composition).
5 Two test files skipped — prompt drift after rewrites; queued for v0.2 prompt-maintenance pass.
6 CLI ruff coverage narrow (E9 only for rc1) — full ruleset gates on a dedicated make fmt pass scheduled for v0.2.

Why this exists

The Chinese personal-data ecosystem is a six-silo patchwork (WeChat + QQ + email + browser + AI chats + audio), and every existing personal-knowledge tool either (a) demands you re-enter everything by hand, (b) only handles one silo, or (c) ships English-first prompts that mis-extract Chinese context.

memexa was built bottom-up to extract from real Chinese chat / email / voice data, with prompts tuned for the way Chinese people actually write. The two-LLM gate-extract is not a marketing line: with a single LLM, the filter either lets too much trash through (a cheap, permissive setup) or rejects good evidence (a strict setup). The 14 subcommands are not a SQL DSL; they are six months of "what did the user actually ask?" condensed into a small protocol an LLM agent can compose.


Roadmap

See ROADMAP.md. Highlights:

  • v0.2 — Deliverable factory: memexa lab-report, memexa action-card, memexa brief <person>, memexa weekly. PyPI release. Full CI matrix (Win + Mac + Linux × Py 3.10–3.12). Prompt-maintenance pass.
  • v0.5 — Optional paid API endpoint (OpenAI-style, billed per token). The OSS core remains fully usable forever — paid is an upgrade path, never a gate.
  • v1.0 — Schema commitment + QQ db-only path productionized.

How to help

Star, file an issue (even if it's "your README has a typo"), open a discussion, or push a PR. The .audit/PLACEHOLDER_INVENTORY.md lists the 25 modules still flagged TODO(memgraph-oss) for _path_resolver migration — easy first contributions. See CONTRIBUTING.md.

If you ship an AI agent that needs a Chinese-data memory layer, docs/for_agents.md is your starting point — drop in your own subcommand list and we'll co-evolve the protocol.


Acknowledgements

memexa stands on pgvector, BGE-M3, DeepSeek, Hindsight, Silero VAD, mlx-whisper, SenseVoice / FunASR, and the engineering tradition of Pensieve-shaped tools that came before it (Logseq, mem.ai, dnote, recoll, Notational Velocity). Without prior work in retrieval-augmented memory + Chinese NLP this would not exist.


Full Changelog: https://github.com/labazhou2024/memexa/commits/v0.1.0-rc1