Releases: labazhou2024/memexa
v0.1.1 — onboarding wizards + 13 fresh-user fixes
v0.1.1 — onboarding wizards + 13 fresh-user fixes
v0.1.0 shipped the new project name (Memexa) on PyPI but the email
ingest path turned out to be broken (hard-coded to maintainer-specific
account names that did not exist in the OSS package). v0.1.1 fixes
that and rewrites onboarding around three interactive wizards, then
adds 13 more fixes caught when the whole flow was re-played from a
fresh-user perspective across Win 11 + Mac Studio + USTC Linux, with
real IMAP credentials (QQ + USTC Exmail-reverse-proxy) and real
WeChatMsg-schema JSON.
TL;DR for a brand-new user
pip install memexa==0.1.1
memexa demo # Tier 0 — see it works
memexa init # scaffold ~/.memexa/
memexa init llm # 4 providers; DeepSeek / OpenAI / Qwen / custom
memexa init email # 12+ IMAP providers auto-detected
memexa backend up # docker compose pg + Hindsight
memexa ingest email # IMAP → batch → LLM extract → POST
memexa quick "<question>" # see your own messages backNew CLI (public)
memexa init # legacy scaffold (templates ship in wheel)
memexa init llm # LLM provider wizard (4 providers)
memexa init email # IMAP wizard (12+ providers auto-detected)
memexa init wechat # WeChatMsg export wizard (Windows-only)
memexa backend up # docker compose -f ~/.memexa/docker-compose.yml up -d
memexa backend status # docker ps + curl /health
memexa backend down # compose down
memexa ingest email # fetch IMAP for all configured accounts
memexa ingest wechat # read WeChatMsg export dir → builder → extract → POST
docs/quickstart.md walks through all of these end-to-end.
Highlights
Critical fix carried from v0.1.0
memexa/extraction/email_history_fetcher.py was hard-coded to two
maintainer-specific account names (qq_email, ustc_email) and tried
to import memexa.qq_email / memexa.ustc_email — modules that
do not exist in the OSS package. v0.1.0 PyPI users who followed
the docs got ModuleNotFoundError. v0.1.1 rewrites the fetcher as a
generic IMAP client (stdlib imaplib + email.parser), reads
email.accounts.<name> from ~/.memexa/identity.yaml, supports
multiple accounts.
13 fresh-user blockers (re-verify pass)
Onboarding — install + init
memexa initshipped without the example templates in the wheel.
Fresh user got 3[warn] template missingwarnings and an empty
~/.memexa/. Templates now ship undermemexa/templates/.memexa init llmcrashed on Chinese-locale Windows console with
UnicodeEncodeError: 'gbk' codec can't encode character '\xa5'
(the¥symbol in the DeepSeek provider note). CLI entry now
reconfigures stdio to UTF-8.memexa init emailfor USTC mail printed the wrong host hint
(imap.exmail.qq.com). That endpoint rate-limits / locks new
logins. The right host ismail.ustc.edu.cn:993, LIVE-verified.
Backend — memexa backend up / memexa doctor
memexa doctorprobed/healthzbut Hindsight serves/health.memexa doctorLLM probe double-prefixed/v1, hitting
/v1/v1/chat/completions→ 404. Now detects/v1already present.memexa doctorread a non-existentnodesfield, always
reported "0 nodes" on a populated bank.memexa backend uppolled with a 60s timeout — too short for a
cold BGE-M3 load. Bumped to 180s.docker-compose.ymlroutedHINDSIGHT_API_LLM_MODELto the
EXTRACT model. Reasoning-class models (deepseek-v4-flash-ascend,
qwen-reasoner) emit content inreasoning_contentand leave
contentempty on Hindsight's strict-JSON prompts, so
fact-extraction silently failed andtotal_nodesstayed at 0.
Default switched to the GATE model.docker-compose.ymlsubstituted${HF_ENDPOINT:-}into the
container env. Empty-string substitution made huggingface_hub
raisehttpx.UnsupportedProtocol: Request URL is missing a protocolon cold start. Now loaded viaenv_file:so absent
stays absent (huggingface_hub falls back to its built-in
default). China users opt in by adding
HF_ENDPOINT=https://hf-mirror.comto~/.memexa/.env.memexa backend upno longer leaks staleHINDSIGHT_API_LLM_*
shell exports into the compose process — they were silently
shadowing~/.memexa/.env.
Ingestion — extract → POST → query
_normalize_llm_cardnow enum-coerces everyconfidencevalue
(numeric0.85, English"high"/"low", Chinese"确定"/
"模糊",bool,None) to a canonical Hindsight enum. The
four confidence fields (TimeResolution.confidence+
Entity.resolution_confidence4-value;
IdentityAssertion.confidence+RelationAssertion.confidence
3-value) are all handled. Demo dataset went from 6/18 POST OK
→ 18/18._normalize_llm_cardISO-coercesanchor_message_ts(LLM
sometimes emits a bare date"2026-05-13"). The
when_start_defaultcapture also runs after coercion so the
fallback no longer inherits a bare date._build_wechat_prompt_from_messagesno longer silently drops
every non-text message in a real WeChatMsg export. Image /
sticker / video / voice / location / appmsg / sysmsg all
survive as[图片]/[表情]/[视频]/ etc. placeholders,
with<title>extracted fromType=49appmsg XML. 30-60% of
a real chat history that was being lost is now preserved._build_wechat_prompt_from_messagespropagatesIsSender=1→
per-msgis_self_messagehint + batch-leveln_self_msgs/
n_other_msgs/is_solo_self/is_self_chat. Downstream
§SELF_NOTE_MODE now anchors commitments to the user's own
utterances.
Tests
- 25 new unit tests added:
tests/unit/test_confidence_sanitizer.py
(57 parametrised cases covering numeric / English / Chinese /
bool / None / canonical-4 / canonical-3 enum semantics);
tests/unit/test_wechat_msg_adapter.py(25 cases across 10
msg-type codes +IsSendersemantics + field aliases). - Full pytest: 140 passed, 2 skipped (both pre-existing
prompt-drift tests, queued for the next prompt-maintenance pass).
LIVE verification matrix
| Win 11 (Docker Desktop) | Mac Studio (OrbStack) | USTC Ubuntu 22.04 | |
|---|---|---|---|
| pip install | ✅ | ✅ | ✅ |
memexa demo |
✅ | ✅ | ✅ |
memexa init + wizards |
✅ | ✅ | ✅ |
memexa backend up |
✅ | ✅ (mihomo + HF mirror) | ❌ docker.io blocked by campus firewall (env, not memexa) |
| Real QQ IMAP ingest | ✅ N=10 queryable cards | (same code path) | blocked above |
| Real USTC IMAP ingest | ✅ N=10 (deduped w/ QQ) | (same code path) | blocked above |
| WeChat demo ingest | ✅ 18/18 POST | ✅ 12/12 | blocked above |
| WeChat real-schema | ✅ 7 msgs → 3 cards | (same code path) | blocked above |
| Mixed msg types | ✅ 6/6 survive | (same code path) | blocked above |
memexa doctor |
✅ 4 green | ✅ 4 green | ✅ graceful w/o backend |
Known gaps deferred to v0.1.2
- Other IMAP providers (Gmail / Outlook / iCloud / 163 / 126 / Yeah /
Hotmail / Sina / Live) — wizards have correct host/port/auth-type
hints, but only QQ and USTC have been LIVE-fetched against. Other
providers should work but are not LIVE-attested for this release. - Large-volume ingest (>500 batches) — small dataset proven; rate-
limit / dead-letter back-pressure / memory behavior at scale not
yet stress-tested. - WeChatMsg-from-the-real-tool — adapter is reverse-engineered from
WeChatMsg's documented schema (Type/IsSender/StrContent/
CreateTime/NickName/StrTalker) but a real WeChatMsg
release binary was not in this verification pass. Field aliases
cover the common emissions; truly novel field names would need a
v0.1.2 adapter patch. - Hindsight async-consolidation transparency —
memexa ingestshows
dead-letter: Nwhen verify-after-POST seestotal=0, but the
cards are usually in the document store and just waiting for the
background consolidator. v0.1.2 should auto-trigger
/consolidateand poll.
Install & upgrade
# Fresh install:
pip install memexa==0.1.1
# Upgrade from v0.1.0:
pip install -U memexaFor full quickstart (Tier 0 / Tier 1 / Tier 2), see
docs/quickstart.md.
— feat/v0.1.1-onboarding PR #18 · 15 commits · 140 pytest pass · 18/18 CI green
v0.1.0 — first stable release
v0.1.0 — first stable release
Install: pip install memexa (no more --pre flag).
One-minute demo (no Docker, no LLM key, no config):
pip install memexa
memexa demoWhat's in this release
Stable cut from rc4. The single goal of this release was to close the
six say-do gaps that the post-rc4 audit found between what the README
/ ROADMAP / CHANGELOG said the project did and what it actually
did in a fresh-install venv on three platforms.
What works today on your own data
| Source | Path | Status |
|---|---|---|
| IMAP — 10 min setup | ✅ Win/macOS/Linux | |
| Browser | Chrome / Firefox SQLite history | ✅ Win/macOS/Linux |
| Claude Code | ~/.claude/projects/*/conversations.jsonl |
✅ Win/macOS/Linux |
| Audio | Recorder → Whisper / SenseVoice → JSON | ✅ Win/macOS/Linux |
| WeChatMsg / wechatDataBackup / PyWxDump export → builder | ⚠ Windows only (upstream tool constraint) | |
| NapCat disabled (Tencent ban wave); db-only adapter pending v0.2 migration | ⚠ manual file copy required |
Full per-source status + recommended first-day order: see
docs/quickstart.md Tier 3.
Bugs fixed since rc4
memexa quick "X" --json(subcommand-level position) now works.
rc4 was rejecting it at argparse withunrecognized arguments: --json. Reproduced live on Win 11, macOS 13.12, Ubuntu 22.04
before the fix landed.memexa quick "X"now exits 1 with an English stderr hint when
the backend is unreachable, instead of silently returningN=0- exit 0. Agents subprocess-invoking memexa rely on exit codes.
Documentation corrections since rc4
docs/quickstart.mdTier 0 expected demo output now matches what
memexa demoactually prints, byte-for-byte (per-source counts
were all wrong in rc4).- macOS Python 3.9 install gap (stock macOS Python is below 3.10
project minimum) now surfaced as an explicit warning above the
install command, withbrew install python@3.11instructions. - ROADMAP / CHANGELOG aligned to actual state.
- All 49 English docs now have Chinese mirrors (
.zh.md).
Stable-cut transparency
Two of three ROADMAP §[0.1.0] release gates are unmet at the time of
this cut (no non-author PR yet, rc4 critical-fix cooldown not yet 7
days old). Cutting anyway because all six say-do gaps are closed and
LIVE-verified on three platforms; the two unmet gates are de-facto
signals (community velocity, soak time) rather than de-jure
correctness signals. v0.1.0 ships with a documented "Known
limitations" section in both quickstart Tier 3 and ROADMAP Shipped
so users hit the honesty wall on the docs page, not in their
workflow. See CHANGELOG.md §0.1.0
for full rationale.
Verification at cut time
- Win 11 + Python 3.11.9 fresh venv:
pip install memexa+memexa demo→ exit 0 - macOS + Python 3.13.12 miniforge fresh venv: same
- Linux Ubuntu 22.04 + Python 3.10.12 fresh venv: same
- pytest: 58 passed / 2 skipped (prompt-drift; tracked)
- CI matrix: 8/8 OS × Python combinations green
- Bilingual coverage: 49 / 49 / 0 missing
What's next
- v0.1.x patch line: opens when the first non-author issue or
PR lands. - v0.2: workflow-spec templates + QQ db-only adapter migration
from upstream JARVIS. - v0.3: 飞书 / 钉钉 / WeChat PC backup adapters; local document
source;--embeddedbackend mode.
Full roadmap: ROADMAP.md.
🤖 Released via Claude Code
v0.1.0-rc4 — bundle demo dataset + dynamic __version__
Patch on top of rc3 fixing two release-blocker bugs discovered by fresh-venv install verification.
What's fixed since rc3
- `memexa demo` no longer fails with `ModuleNotFoundError: No module named 'examples.demo_dataset'`. The pyproject wheel-include rule now bundles `examples/demo_dataset/` (the ingest module + 6 source JSON/JSONL files).
- `memexa version` now reports `0.1.0rc4` instead of the stale `0.1.0a0`. The version is read dynamically from installed-package metadata so it cannot drift again.
Both bugs were caused by configurations that worked in editable / source-tree installs but failed in fresh-wheel installs from PyPI — the exact scenario every new memexa user encounters. The internal release pre-flight checklist now requires a fresh-venv wheel install + run smoke test as Item 7.
Recommended install
```bash
pip install --pre memexa==0.1.0rc4
memexa demo # 30-second onboarding
memexa version # → memexa 0.1.0rc4
```
rc3 disposition
rc3 remains on PyPI (cannot be deleted). `pip install --pre memexa` resolves to rc4 automatically. Direct `pip install memexa==0.1.0rc3` still installs rc3 with the broken `memexa demo` — please use rc4.
Other content
All v0.1.0-rc3 content (agent-first brand consolidation, `--json` mode, 3-tier quickstart, docs/why.md + docs/cost.md) is included unchanged. See v0.1.0-rc3 release notes for the feature list.
Pull request: #16.
🤖 Generated with Claude Code
v0.1.0-rc3 — agent-first brand consolidation + memexa demo + --json mode
Third release candidate on the v0.1.0 line. Closes out the first-experience path for both human users and AI agents.
Note on release timing: An initial v0.1.0-rc3 release on 2026-05-16 13:59 UTC failed to publish to PyPI because pyproject.toml version had not been bumped from rc2 to rc3 — twine silently skipped the upload via `--skip-existing`. PR #15 corrected the version field and added a pre-flight checklist to internal protocol to prevent recurrence. This release supersedes that one.
Highlights
Agent-first positioning clarified. memexa is a memory backend that AI agents (Claude Code, Cursor, Cline) invoke as a subprocess on a human user's behalf. Direct CLI use is secondary. See docs/why.md for the per-capability comparison vs OpenHuman / MemPalace / ReMe and the agent-first design rationale.
New features
- `memexa demo` — thirty-second onboarding. `pip install --pre memexa && memexa demo` ingests a synthetic dataset with the stub extractor and runs five sample queries in the terminal. No Docker, no LLM API key, no configuration.
- `--json` output mode for all fourteen query subcommands. Top-level flag short-circuits text rendering and emits the raw return value as JSON. Designed for AI agents invoking memexa via shell subprocess.
New docs
- docs/why.md (bilingual) — comparison matrix, agent-first design rationale, glossary covering V2 envelope / reflow / Chinese-IM reflow / audio + voice reflow / workflow spec terms.
- docs/cost.md (bilingual) — DeepSeek / GPT-4o / Claude monthly cost estimates with three-tier user profiles. Recommended Flash-gate + Pro-extract combination at ¥0.30 per 1 000 messages.
Brand and roadmap consolidation
- README first screen restored to include the AI-agent compatible by design paragraph and a dual-path Quickstart (humans run `memexa demo`, agents call subcommands with `--json`).
- `docs/quickstart.md` Tier 0 expanded with the agent subprocess path.
- ROADMAP v0.2 redefined from "Python deliverable code + CLI subcommands" to Markdown workflow specs under `docs/templates/`. v0.5 promotes the subprocess path to a native MCP server. v0.7 formalises user-authored specs. New v0.8+ section for optional desktop GUI exploration.
- Makefile lint / format paths corrected to `memexa tests` (residue from PR #9's src→memexa rename).
Tests
- 8 new tests across two files (`tests/integration/test_demo_subcommand.py` and `tests/integration/test_subcmd_json_output.py`). Full pytest passes 58 / 2 pre-existing skips.
Stats
Pull requests: #14 (feature content, 16 files / +1994 / -1177) and #15 (version bump fix, 1 file / +1 / -1).
Install
```bash
pip install --pre memexa==0.1.0rc3
memexa demo
```
🤖 Generated with Claude Code
v0.1.0-rc2 — install bug fix
Patch over v0.1.0-rc1
Fixes the v0.1.0-rc1 install bug where import memexa raised ModuleNotFoundError because the package was installed under a literal src/ namespace.
Changed
- Full source-tree rename
src/→memexa/(289 files, 1234 substitutions) - pyproject.toml:
packages.find include = ["memexa*"](was["src*"]) - pyproject.toml: add
authors = [{name = "labazhou2024"}]+maintainers(was empty in rc1, madepip show memexalook orphan) - Bump version 0.1.0rc1 → 0.1.0rc2
Verification
pip install --pre memexa==0.1.0rc2
python -c "import memexa; print(memexa.__file__)" # OK
python -c "from memexa.core import memory_query" # OK
memexa --help # OKWhy use this over rc1
- rc1 has the
import srcanti-pattern that will not be fixed in place; rc2 supersedes it - All future development continues on the
memexapackage layout
For the full project pitch, scope, limitations, and roadmap, see v0.1.0-rc1 release notes (still applies — only the install layout changed).
v0.1.0-rc1 — first public release candidate
v0.1.0-rc1 — first public release candidate
Status: Release Candidate. Open for feedback. Cut to v0.1.0 stable after ≥1 week of green CI on Win + macOS + Linux + non-trivial third-party install report.
memexa (镜我) is a self-hosted, Chinese-first personal memory graph. It ingests six categories of everyday data, runs them through a two-LLM gate-extract pipeline, stores entities + relationships + temporal evidence in PostgreSQL + pgvector, and exposes a 14-subcommand CLI plus a five-phase state inference workflow.
The Pensieve, in code. Pour the memories scattered across your life into a basin, rearrange them around the question you have right now, walk away with something usable.
What you get in rc1
Six ingestion sources
WeChat · QQ (experimental — see below) · Email · Browser history · Claude Code chats · Audio (microphone + ASR)
Two-LLM gate-extract pipeline
- Stage A — gatekeeper LLM (HIGH / MEDIUM / LOW filter)
- Stage B — extractor LLM (V2 envelope JSON)
- Stage C — BGE-M3 quorum + DeepSeek-style arbiter
- Stage D — POST →
memory_full_v5PostgreSQL bank
14 query subcommands, three tiers
- Basic:
quick,topic,arc,timeline,person,project,pending,reflect - Advanced:
types,graph-walk,summary,trends - Composite:
session-context,task-brief
Five-phase state inference
A query workflow (docs/5_phase_query.md) for going from fuzzy human input ("what was that thing my advisor mentioned in March?") to a concrete answer in <60 seconds.
Live dashboard on :8765
7 panels: Win/Mac/server CPU+GPU, API usage, memory system health, cron health, graph queries, audio pipeline progress.
Cron orchestrator
- Per-source drivers (
backfill_v5_<source>_driver.py) - Dead-letter queue with retry budgets
- PG-aware pending detector (skips already-processed batches)
AI-agent first-class (the differentiator)
Most real users invoke memexa through Claude Code, Cursor, or Cline rather than typing subcommands by hand. The 14 subcommands are a small protocol; the agent contract is docs/for_agents.md:
- 7 hard rules for query selection
- Decision table mapping user intent → subcommand
- Composition patterns for multi-source questions
- Common pitfalls catalog (e.g. don't use
topicon names, usearc)
Engineering polish
- Apache 2.0
- 3-layer abstraction (
_path_resolver/_user_aliases/_user_identity) — clean enough that no PII leaks survived the sanitizer - 50 passing tests (2 skipped due to known prompt drift, scheduled for v0.2)
- 309/309 .py syntax PASS
- CI: pytest × Python 3.10 / 3.11 / 3.12 + CodeQL + Security + dependabot + Release Drafter
- Pre-commit PII scan hook
- Branch protection on
main(required CI + 1 review)
Documentation
- Bilingual everywhere — every user-facing
.mdships with a.zh.mdmirror - 5 walkthroughs against a synthetic demo dataset (Alice / Bob / Carol)
- 2 case studies (lab-report pipeline, 5-minute meeting brief)
- 6 lessons-learned narratives (~2.7k lines bilingual)
- Per-OS deployment guides (Windows, macOS, Linux, docker-compose)
Installation (rc1)
# pip install not yet on PyPI; install from GitHub:
pip install git+https://github.com/labazhou2024/memexa.git@v0.1.0-rc1
# Or clone for the demo dataset + walkthroughs:
git clone https://github.com/labazhou2024/memexa.git
cd memexa
pip install -e ".[dev]"
memexa init
docker compose -f docker-compose.example.yml up -d
python -m examples.demo_dataset.ingest --dry-run
memexa doctor
memexa quick "Alice"Full quickstart: docs/quickstart.md
Known limitations (be honest)
| # | Limitation |
|---|---|
| 1 | PyPI not yet wired — pip install memexa returns nothing; install from GitHub for now. |
| 2 | QQ source: dual-track. The recommended db-only path (direct nt_msg.db SQLCipher v4 read — sends no protocol packets, indistinguishable from any chat-history backup tool) is LIVE in upstream JARVIS (jarvis/qq_db.py, 762 lines, standard library only) and scheduled for OSS migration in v0.2. The NapCat / OneBot adapter shipped with rc1 is deprecated and disabled by default after a 2025-09-05 NapCat public-OneBot incident caused Tencent to batch-ban every QQ that ever ran NapCat / LiteLoaderQQNT — the maintainer's own QQ was caught in this wave on 2026-05-14. Until v0.2 lands, OSS v0.1.x users either (a) copy jarvis/qq_db.py in by hand or (b) use the clipboard fallback (also LIVE upstream, also v0.2). See docs/integrations/qq.md. |
| 3 | Mac / Linux fresh-clone smoke test is not in CI on every push yet. Scheduled for v0.1.0 stable. |
| 4 | Deliverable factory (auto lab report / action card / weekly brief) is on the v0.2 roadmap. v0.1 ships the building blocks (case studies show the manual composition). |
| 5 | Two test files skipped — prompt drift after rewrites; queued for v0.2 prompt-maintenance pass. |
| 6 | CLI ruff coverage narrow (E9 only for rc1) — full ruleset gates on a dedicated make fmt pass scheduled for v0.2. |
Why this exists
The Chinese personal-data ecosystem is a six-silo patchwork (WeChat + QQ + email + browser + AI chats + audio), and every existing personal-knowledge tool either (a) demands you re-enter everything by hand, (b) only handles one silo, or (c) ships English-first prompts that mis-extract Chinese context.
memexa was built bottom-up to extract from real Chinese chat / email / voice data, with prompts tuned for the way Chinese people actually write. The two-LLM gate-extract is not a marketing line — without it the gatekeeper either lets too much trash through (single-LLM cheap) or rejects good evidence (single-LLM strict). The 14 subcommands are not a SQL DSL — they are six months of "what did the user actually ask?" condensed into a small protocol an LLM agent can compose.
Roadmap
See ROADMAP.md. Highlights:
- v0.2 — Deliverable factory:
memexa lab-report,memexa action-card,memexa brief <person>,memexa weekly. PyPI release. Full CI matrix (Win + Mac + Linux × Py 3.10–3.12). Prompt-maintenance pass. - v0.5 — Optional paid API endpoint (OpenAI-style, billed per token). The OSS core remains fully usable forever — paid is an upgrade path, never a gate.
- v1.0 — Schema commitment + QQ db-only path productionized.
How to help
Star, file an issue (even if it's "your README has a typo"), open a discussion, or push a PR. The .audit/PLACEHOLDER_INVENTORY.md lists the 25 modules still flagged TODO(memgraph-oss) for _path_resolver migration — easy first contributions. See CONTRIBUTING.md.
If you ship an AI agent that needs a Chinese-data memory layer, docs/for_agents.md is your starting point — drop in your own subcommand list and we'll co-evolve the protocol.
Acknowledgements
memexa stands on pgvector, BGE-M3, DeepSeek, Hindsight, Silero VAD, mlx-whisper, SenseVoice / FunASR, and the engineering tradition of Pensieve-shaped tools that came before it (Logseq, mem.ai, dnote, recoll, Notational Velocity). Without prior work in retrieval-augmented memory + Chinese NLP this would not exist.
Full Changelog: https://github.com/labazhou2024/memexa/commits/v0.1.0-rc1