Releases: labazhou2024/memexa

v0.1.1 — onboarding wizards + 13 fresh-user fixes

17 May 15:20
88fdf6a

v0.1.0 shipped the new project name (Memexa) on PyPI but the email
ingest path turned out to be broken (hard-coded to maintainer-specific
account names that did not exist in the OSS package). v0.1.1 fixes
that and rewrites onboarding around three interactive wizards, then
adds 13 more fixes caught when the whole flow was re-played from a
fresh-user perspective across Win 11 + Mac Studio + USTC Linux, with
real IMAP credentials (QQ + USTC Exmail-reverse-proxy) and real
WeChatMsg-schema JSON.

TL;DR for a brand-new user

pip install memexa==0.1.1
memexa demo                                 # Tier 0 — see it works
memexa init                                 # scaffold ~/.memexa/
memexa init llm                             # 4 providers; DeepSeek / OpenAI / Qwen / custom
memexa init email                           # 12+ IMAP providers auto-detected
memexa backend up                           # docker compose pg + Hindsight
memexa ingest email                         # IMAP → batch → LLM extract → POST
memexa quick "<question>"                   # see your own messages back

New CLI (public)

memexa init                  # legacy scaffold (templates ship in wheel)
memexa init llm              # LLM provider wizard (4 providers)
memexa init email            # IMAP wizard (12+ providers auto-detected)
memexa init wechat           # WeChatMsg export wizard (Windows-only)
memexa backend up            # docker compose -f ~/.memexa/docker-compose.yml up -d
memexa backend status        # docker ps + curl /health
memexa backend down          # compose down
memexa ingest email          # fetch IMAP for all configured accounts
memexa ingest wechat         # read WeChatMsg export dir → builder → extract → POST

docs/quickstart.md walks through all of these end-to-end.

Highlights

Critical fix carried from v0.1.0

memexa/extraction/email_history_fetcher.py was hard-coded to two
maintainer-specific account names (qq_email, ustc_email) and tried
to import memexa.qq_email / memexa.ustc_email — modules that
do not exist in the OSS package. v0.1.0 PyPI users who followed
the docs got ModuleNotFoundError. v0.1.1 rewrites the fetcher as a
generic IMAP client (stdlib imaplib + email.parser), reads
email.accounts.<name> from ~/.memexa/identity.yaml, supports
multiple accounts.
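
A generic fetcher of this kind might look roughly like the sketch below. It uses only `imaplib` and `email.parser` from the standard library. The account keys (`host`, `port`, `user`, `password`), the function names, and the `INBOX` / `ALL` search are illustrative assumptions, not memexa's actual schema or code.

```python
import imaplib
from email import policy
from email.parser import BytesParser


def parse_message(raw: bytes) -> dict:
    """Parse one RFC 822 message into a small dict (subject, sender, text body)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "text": body.get_content().strip() if body is not None else "",
    }


def fetch_recent(account: dict, limit: int = 10) -> list[dict]:
    """Fetch the newest `limit` messages for one account (hypothetical keys)."""
    with imaplib.IMAP4_SSL(account["host"], account.get("port", 993)) as conn:
        conn.login(account["user"], account["password"])
        conn.select("INBOX", readonly=True)
        _, data = conn.search(None, "ALL")
        ids = data[0].split()[-limit:]
        out = []
        for msg_id in ids:
            _, parts = conn.fetch(msg_id, "(RFC822)")
            out.append(parse_message(parts[0][1]))
        return out


def fetch_all(accounts: dict[str, dict], limit: int = 10) -> dict[str, list[dict]]:
    """Loop over every configured account instead of hard-coding two names."""
    return {name: fetch_recent(cfg, limit) for name, cfg in accounts.items()}
```

Keeping `parse_message` separate from the network code means the parsing half can be exercised against canned bytes without real IMAP credentials.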

13 fresh-user blockers (re-verify pass)

Onboarding — install + init

  • memexa init shipped without the example templates in the wheel.
    Fresh user got 3 [warn] template missing warnings and an empty
    ~/.memexa/. Templates now ship under memexa/templates/.
  • memexa init llm crashed on Chinese-locale Windows console with
    UnicodeEncodeError: 'gbk' codec can't encode character '\xa5'
    (the ¥ symbol in the DeepSeek provider note). CLI entry now
    reconfigures stdio to UTF-8.
  • memexa init email for USTC mail printed the wrong host hint
    (imap.exmail.qq.com). That endpoint rate-limits / locks new
    logins. The right host is mail.ustc.edu.cn:993, LIVE-verified.
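
The UTF-8 stdio fix in the second bullet can be sketched with `TextIOWrapper.reconfigure()` (Python 3.7+). The helper names below are hypothetical and the GBK console is simulated with an in-memory stream; this is an illustration of the technique, not the code memexa ships.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure(); return it."""
    reconfigure = getattr(stream, "reconfigure", None)
    if callable(reconfigure):
        reconfigure(encoding="utf-8", errors="replace")
    return stream


def configure_stdio() -> None:
    """Call once at CLI entry, before any output is printed."""
    for stream in (sys.stdout, sys.stderr):
        force_utf8(stream)


# Simulate a GBK console: writing the yen sign U+00A5 fails before the fix.
console = io.TextIOWrapper(io.BytesIO(), encoding="gbk")
try:
    console.write("\xa5 per 1k tokens")
    console.flush()
    failed_before = False
except UnicodeEncodeError:
    failed_before = True

# Same kind of stream, reconfigured to UTF-8 first: the write now succeeds.
console = force_utf8(io.TextIOWrapper(io.BytesIO(), encoding="gbk"))
console.write("\xa5 per 1k tokens")
console.flush()
```

The `getattr` guard matters because a redirected or replaced `sys.stdout` is not always a `TextIOWrapper`.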

Backend — memexa backend up / memexa doctor

  • memexa doctor probed /healthz but Hindsight serves /health.
  • memexa doctor LLM probe double-prefixed /v1, hitting
    /v1/v1/chat/completions → 404. Now detects /v1 already present.
  • memexa doctor read a non-existent nodes field, always
    reported "0 nodes" on a populated bank.
  • memexa backend up polled with a 60s timeout — too short for a
    cold BGE-M3 load. Bumped to 180s.
  • docker-compose.yml routed HINDSIGHT_API_LLM_MODEL to the
    EXTRACT model. Reasoning-class models (deepseek-v4-flash-ascend,
    qwen-reasoner) emit content in reasoning_content and leave
    content empty on Hindsight's strict-JSON prompts, so
    fact-extraction silently failed and total_nodes stayed at 0.
    Default switched to the GATE model.
  • docker-compose.yml substituted ${HF_ENDPOINT:-} into the
    container env. Empty-string substitution made huggingface_hub
    raise httpx.UnsupportedProtocol: Request URL is missing a
    protocol on cold start. Now loaded via env_file: so absent
    stays absent (huggingface_hub falls back to its built-in
    default). China users opt in by adding
    HF_ENDPOINT=https://hf-mirror.com to ~/.memexa/.env.
  • memexa backend up no longer leaks stale HINDSIGHT_API_LLM_*
    shell exports into the compose process — they were silently
    shadowing ~/.memexa/.env.
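
Two of the fixes above, the `/v1` double-prefix check and the longer start-up poll, follow common patterns that can be sketched as below. Function names, parameters, and the 2-second interval are assumptions for illustration; only the 180s timeout and the `/v1/chat/completions` path come from the notes.

```python
import time
from typing import Callable


def chat_completions_url(base_url: str) -> str:
    """Append /v1/chat/completions without doubling an existing /v1 suffix."""
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        return f"{base}/chat/completions"
    return f"{base}/v1/chat/completions"


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 180.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice `probe` would be a small function that requests the backend's `/health` endpoint and returns whether it answered successfully; passing it in as a callable keeps the polling loop testable without a running container.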

Ingestion — extract → POST → query

  • _normalize_llm_card now enum-coerces every confidence value
    (numeric 0.85, English "high" / "low", Chinese "确定" /
    "模糊", bool, None) to a canonical Hindsight enum. The
    four confidence fields (TimeResolution.confidence +
    Entity.resolution_confidence 4-value;
    IdentityAssertion.confidence + RelationAssertion.confidence
    3-value) are all handled. Demo dataset went from 6/18 POST OK
    → 18/18.
  • _normalize_llm_card ISO-coerces anchor_message_ts (LLM
    sometimes emits a bare date "2026-05-13"). The
    when_start_default capture also runs after coercion so the
    fallback no longer inherits a bare date.
  • _build_wechat_prompt_from_messages no longer silently drops
    every non-text message in a real WeChatMsg export. Image /
    sticker / video / voice / location / appmsg / sysmsg all
    survive as [图片] / [表情] / [视频] / etc. placeholders,
    with <title> extracted from Type=49 appmsg XML. 30-60% of
    a real chat history that was being lost is now preserved.
  • _build_wechat_prompt_from_messages now propagates IsSender=1 as
    a per-message is_self_message hint, plus batch-level
    n_self_msgs / n_other_msgs / is_solo_self / is_self_chat
    fields. Downstream §SELF_NOTE_MODE now anchors commitments to
    the user's own utterances.
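
To make the confidence coercion concrete, here is a minimal sketch of mapping mixed inputs onto a fixed enum. The enum labels, the numeric thresholds, and the function name are all hypothetical; the notes do not list Hindsight's actual enum values, so treat this as the shape of the idea rather than the behaviour of _normalize_llm_card.

```python
from typing import Any

# Hypothetical enum labels; the real Hindsight enums are not given in these notes.
FOUR_VALUE = ("high", "medium", "low", "unknown")
THREE_VALUE = ("high", "medium", "low")

# Free-form labels an LLM might emit, mapped to a canonical value.
ALIASES = {
    "high": "high", "medium": "medium", "low": "low", "unknown": "unknown",
    "确定": "high",   # "certain"
    "模糊": "low",    # "vague"
}


def coerce_confidence(value: Any, allowed: tuple[str, ...] = FOUR_VALUE) -> str:
    """Map numeric / string / bool / None confidence onto one of `allowed`."""
    fallback = "unknown" if "unknown" in allowed else "low"
    if value is None:
        return fallback
    if isinstance(value, bool):          # check bool before int/float
        return "high" if value else "low"
    if isinstance(value, (int, float)):  # illustrative thresholds
        if value >= 0.8:
            return "high"
        return "medium" if value >= 0.5 else "low"
    label = ALIASES.get(str(value).strip().lower())
    return label if label in allowed else fallback
```

The `bool` check has to come before the numeric check because `bool` is a subclass of `int` in Python, so `True` would otherwise be treated as `1.0`.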

Tests

  • 82 new unit tests added across two files:
    tests/unit/test_confidence_sanitizer.py (57 parametrised cases
    covering numeric / English / Chinese / bool / None /
    canonical-4 / canonical-3 enum semantics);
    tests/unit/test_wechat_msg_adapter.py (25 cases across 10
    msg-type codes + IsSender semantics + field aliases).
  • Full pytest: 140 passed, 2 skipped (both pre-existing
    prompt-drift tests, queued for the next prompt-maintenance pass).

LIVE verification matrix

Win 11 (Docker Desktop) Mac Studio (OrbStack) USTC Ubuntu 22.04
pip install
memexa demo
memexa init + wizards
memexa backend up ✅ (mihomo + HF mirror) ❌ docker.io blocked by campus firewall (env, not memexa)
Real QQ IMAP ingest ✅ N=10 queryable cards (same code path) blocked above
Real USTC IMAP ingest ✅ N=10 (deduped w/ QQ) (same code path) blocked above
WeChat demo ingest ✅ 18/18 POST ✅ 12/12 blocked above
WeChat real-schema ✅ 7 msgs → 3 cards (same code path) blocked above
Mixed msg types ✅ 6/6 survive (same code path) blocked above
memexa doctor ✅ 4 green ✅ 4 green ✅ graceful w/o backend

Known gaps deferred to v0.1.2

  • Other IMAP providers (Gmail / Outlook / iCloud / 163 / 126 / Yeah /
    Hotmail / Sina / Live) — wizards have correct host/port/auth-type
    hints, but only QQ and USTC have been LIVE-fetched against. Other
    providers should work but are not LIVE-attested for this release.
  • Large-volume ingest (>500 batches) — small dataset proven; rate-
    limit / dead-letter back-pressure / memory behavior at scale not
    yet stress-tested.
  • WeChatMsg-from-the-real-tool — adapter is reverse-engineered from
    WeChatMsg's documented schema (Type / IsSender / StrContent /
    CreateTime / NickName / StrTalker) but a real WeChatMsg
    release binary was not in this verification pass. Field aliases
    cover the common emissions; truly novel field names would need a
    v0.1.2 adapter patch.
  • Hindsight async-consolidation transparency — memexa ingest shows
    dead-letter: N when verify-after-POST sees total=0, but the
    cards are usually in the document store and just waiting for the
    background consolidator. v0.1.2 should auto-trigger
    /consolidate and poll.

Install & upgrade

# Fresh install:
pip install memexa==0.1.1

# Upgrade from v0.1.0:
pip install -U memexa

For full quickstart (Tier 0 / Tier 1 / Tier 2), see
docs/quickstart.md.

feat/v0.1.1-onboarding PR #18 · 15 commits · 140 pytest pass · 18/18 CI green

v0.1.0 — first stable release

17 May 04:08
7b6f796

Install: pip install memexa (no more --pre flag).

One-minute demo (no Docker, no LLM key, no config):

pip install memexa
memexa demo

What's in this release

Stable cut from rc4. The single goal of this release was to close the
six say-do gaps that the post-rc4 audit found between what the README
/ ROADMAP / CHANGELOG said the project did and what it actually did
in a fresh-install venv on three platforms.

What works today on your own data

| Source | Path | Status |
| --- | --- | --- |
| Email | IMAP — 10 min setup | ✅ Win/macOS/Linux |
| Browser | Chrome / Firefox SQLite history | ✅ Win/macOS/Linux |
| Claude Code | ~/.claude/projects/*/conversations.jsonl | ✅ Win/macOS/Linux |
| Audio | Recorder → Whisper / SenseVoice → JSON | ✅ Win/macOS/Linux |
| WeChat | WeChatMsg / wechatDataBackup / PyWxDump export → builder | ⚠ Windows only (upstream tool constraint) |
| QQ | NapCat disabled (Tencent ban wave); db-only adapter pending v0.2 migration | ⚠ manual file copy required |

Full per-source status + recommended first-day order: see
docs/quickstart.md Tier 3.

Bugs fixed since rc4

  • memexa quick "X" --json (subcommand-level position) now works.
    rc4 rejected it at argparse with unrecognized arguments: --json.
    Reproduced live on Win 11, macOS 13.12, Ubuntu 22.04 before the
    fix landed.
  • memexa quick "X" now exits 1 with an English stderr hint when
    the backend is unreachable, instead of silently returning N=0
    + exit 0. Agents subprocess-invoking memexa rely on exit codes.
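
One common way to accept a flag both before and after a subcommand in `argparse` is a shared parent parser with `default=argparse.SUPPRESS`. The sketch below shows that pattern together with a non-zero exit on an unreachable backend. The parser layout, the `backend_up` callable, and the hint text are assumptions; the notes do not describe how memexa implemented the fix.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Shared parent so --json is accepted before or after the subcommand.
    # SUPPRESS keeps the subparser from overwriting a top-level --json.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--json", action="store_true", default=argparse.SUPPRESS)

    parser = argparse.ArgumentParser(prog="memexa-sketch", parents=[common])
    sub = parser.add_subparsers(dest="command", required=True)
    quick = sub.add_parser("quick", parents=[common])
    quick.add_argument("question")
    return parser


def wants_json(argv: list[str]) -> bool:
    """True if --json was given in either position."""
    args = build_parser().parse_args(argv)
    return getattr(args, "json", False)


def main(argv: list[str], backend_up=lambda: True) -> int:
    """Return the process exit code: 1 if the backend cannot be reached."""
    build_parser().parse_args(argv)
    if not backend_up():
        print("error: backend unreachable; is it running?", file=sys.stderr)
        return 1
    return 0
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the behaviour easy to assert on in tests.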

Documentation corrections since rc4

  • docs/quickstart.md Tier 0 expected demo output now matches what
    memexa demo actually prints, byte-for-byte (per-source counts
    were all wrong in rc4).
  • macOS Python 3.9 install gap (stock macOS Python is below 3.10
    project minimum) now surfaced as an explicit warning above the
    install command, with brew install python@3.11 instructions.
  • ROADMAP / CHANGELOG aligned to actual state.
  • All 49 English docs now have Chinese mirrors (.zh.md).

Stable-cut transparency

Two of three ROADMAP §[0.1.0] release gates are unmet at the time of
this cut (no non-author PR yet, rc4 critical-fix cooldown not yet 7
days old). Cutting anyway because all six say-do gaps are closed and
LIVE-verified on three platforms; the two unmet gates are de-facto
signals (community velocity, soak time) rather than de-jure
correctness signals. v0.1.0 ships with a documented "Known
limitations" section in both quickstart Tier 3 and ROADMAP Shipped
so users hit the honesty wall on the docs page, not in their
workflow. See CHANGELOG.md §0.1.0 for full rationale.

Verification at cut time

  • Win 11 + Python 3.11.9 fresh venv: pip install memexa + memexa demo → exit 0
  • macOS + Python 3.13.12 miniforge fresh venv: same
  • Linux Ubuntu 22.04 + Python 3.10.12 fresh venv: same
  • pytest: 58 passed / 2 skipped (prompt-drift; tracked)
  • CI matrix: 8/8 OS × Python combinations green
  • Bilingual coverage: 49 / 49 / 0 missing

What's next

  • v0.1.x patch line: opens when the first non-author issue or
    PR lands.
  • v0.2: workflow-spec templates + QQ db-only adapter migration
    from upstream JARVIS.
  • v0.3: 飞书 / 钉钉 / WeChat PC backup adapters; local document
    source; --embedded backend mode.

Full roadmap: ROADMAP.md.

🤖 Released via Claude Code

v0.1.0-rc4 — bundle demo dataset + dynamic __version__

16 May 14:57
d0e37b2

Patch on top of rc3 fixing two release-blocker bugs discovered by fresh-venv install verification.

What's fixed since rc3

  • `memexa demo` no longer fails with `ModuleNotFoundError: No module named 'examples.demo_dataset'`. The pyproject wheel-include rule now bundles `examples/demo_dataset/` (the ingest module + 6 source JSON/JSONL files).
  • `memexa version` now reports `0.1.0rc4` instead of the stale `0.1.0a0`. The version is read dynamically from installed-package metadata so it cannot drift again.

Both bugs were caused by configurations that worked in editable / source-tree installs but failed in fresh-wheel installs from PyPI — the exact scenario every new memexa user encounters. The internal release pre-flight checklist now requires a fresh-venv wheel install + run smoke test as Item 7.

Recommended install

```bash
pip install --pre memexa==0.1.0rc4
memexa demo # 30-second onboarding
memexa version # → memexa 0.1.0rc4
```

rc3 disposition

rc3 remains on PyPI (cannot be deleted). `pip install --pre memexa` resolves to rc4 automatically. Direct `pip install memexa==0.1.0rc3` still installs rc3 with the broken `memexa demo` — please use rc4.

Other content

All v0.1.0-rc3 content (agent-first brand consolidation, `--json` mode, 3-tier quickstart, docs/why.md + docs/cost.md) is included unchanged. See v0.1.0-rc3 release notes for the feature list.

Pull request: #16.

🤖 Generated with Claude Code

v0.1.0-rc3 — agent-first brand consolidation + memexa demo + --json mode

16 May 14:21
9c8119a

Third release candidate on the v0.1.0 line. Closes out the first-experience path for both human users and AI agents.

Note on release timing: An initial v0.1.0-rc3 release on 2026-05-16 13:59 UTC failed to publish to PyPI because pyproject.toml version had not been bumped from rc2 to rc3 — twine silently skipped the upload via `--skip-existing`. PR #15 corrected the version field and added a pre-flight checklist to internal protocol to prevent recurrence. This release supersedes that one.

Highlights

Agent-first positioning clarified. memexa is a memory backend that AI agents (Claude Code, Cursor, Cline) invoke as a subprocess on a human user's behalf. Direct CLI use is secondary. See docs/why.md for the per-capability comparison vs OpenHuman / MemPalace / ReMe and the agent-first design rationale.

New features

  • `memexa demo` — thirty-second onboarding. `pip install --pre memexa && memexa demo` ingests a synthetic dataset with the stub extractor and runs five sample queries in the terminal. No Docker, no LLM API key, no configuration.
  • `--json` output mode for all fourteen query subcommands. Top-level flag short-circuits text rendering and emits the raw return value as JSON. Designed for AI agents invoking memexa via shell subprocess.
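
From the agent side, consuming `--json` output over a shell subprocess could look like the sketch below. The wrapper name is hypothetical, and a stand-in command is used so the example runs without memexa installed; the shape of memexa's actual JSON payload is not specified in these notes.

```python
import json
import subprocess
import sys


def run_json_command(argv: list[str], timeout_s: float = 60.0):
    """Run a CLI command, fail loudly on non-zero exit, and parse stdout as JSON."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} exited {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Stand-in command so the sketch runs without memexa installed.
fake_cli = [sys.executable, "-c", "import json; print(json.dumps({'hits': 2}))"]
result = run_json_command(fake_cli)
```

An agent would pass something like `["memexa", "quick", "Alice", "--json"]` instead of `fake_cli` and branch on the parsed value rather than scraping terminal text.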

New docs

  • docs/why.md (bilingual) — comparison matrix, agent-first design rationale, glossary covering V2 envelope / reflow / Chinese-IM reflow / audio + voice reflow / workflow spec terms.
  • docs/cost.md (bilingual) — DeepSeek / GPT-4o / Claude monthly cost estimates with three-tier user profiles. Recommended Flash-gate + Pro-extract combination at ¥0.30 per 1 000 messages.

Brand and roadmap consolidation

  • README first screen restored to include the AI-agent compatible by design paragraph and a dual-path Quickstart (humans run `memexa demo`, agents call subcommands with `--json`).
  • `docs/quickstart.md` Tier 0 expanded with the agent subprocess path.
  • ROADMAP v0.2 redefined from "Python deliverable code + CLI subcommands" to Markdown workflow specs under `docs/templates/`. v0.5 promotes the subprocess path to a native MCP server. v0.7 formalises user-authored specs. New v0.8+ section for optional desktop GUI exploration.
  • Makefile lint / format paths corrected to `memexa tests` (residue from PR #9's src→memexa rename).

Tests

  • 8 new tests across two files (`tests/integration/test_demo_subcommand.py` and `tests/integration/test_subcmd_json_output.py`). Full pytest run: 58 passed, 2 skipped (pre-existing).

Stats

Pull requests: #14 (feature content, 16 files / +1994 / -1177) and #15 (version bump fix, 1 file / +1 / -1).

Install

```bash
pip install --pre memexa==0.1.0rc3
memexa demo
```

🤖 Generated with Claude Code

v0.1.0-rc2 — install bug fix

14 May 15:56
1e57aef

Pre-release

Patch over v0.1.0-rc1

Fixes the v0.1.0-rc1 install bug where import memexa raised ModuleNotFoundError because the package was installed under a literal src/ namespace.

Changed

  • Full source-tree rename src/ → memexa/ (289 files, 1234 substitutions)
  • pyproject.toml: packages.find include = ["memexa*"] (was ["src*"])
  • pyproject.toml: add authors = [{name = "labazhou2024"}] + maintainers (empty in rc1, which made pip show memexa look orphaned)
  • Bump version 0.1.0rc1 → 0.1.0rc2

Verification

pip install --pre memexa==0.1.0rc2
python -c "import memexa; print(memexa.__file__)"   # OK
python -c "from memexa.core import memory_query"    # OK
memexa --help                                       # OK

Why use this over rc1

  • rc1 has the import src anti-pattern that will not be fixed in place; rc2 supersedes it
  • All future development continues on the memexa package layout

For the full project pitch, scope, limitations, and roadmap, see v0.1.0-rc1 release notes (still applies — only the install layout changed).

v0.1.0-rc1 — first public release candidate

14 May 15:24
c583d3a

Status: Release Candidate. Open for feedback. Cut to v0.1.0 stable after ≥1 week of green CI on Win + macOS + Linux + non-trivial third-party install report.

memexa (镜我) is a self-hosted, Chinese-first personal memory graph. It ingests six categories of everyday data, runs them through a two-LLM gate-extract pipeline, stores entities + relationships + temporal evidence in PostgreSQL + pgvector, and exposes a 14-subcommand CLI plus a five-phase state inference workflow.

The Pensieve, in code. Pour the memories scattered across your life into a basin, rearrange them around the question you have right now, walk away with something usable.


What you get in rc1

Six ingestion sources

WeChat · QQ (experimental — see below) · Email · Browser history · Claude Code chats · Audio (microphone + ASR)

Two-LLM gate-extract pipeline

  • Stage A — gatekeeper LLM (HIGH / MEDIUM / LOW filter)
  • Stage B — extractor LLM (V2 envelope JSON)
  • Stage C — BGE-M3 quorum + DeepSeek-style arbiter
  • Stage D — POST → memory_full_v5 PostgreSQL bank
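
The gate-then-extract idea in Stages A and B can be sketched with two plain callables. Everything here is illustrative: the function name, the stub gate and extractor, and the choice to keep HIGH and MEDIUM batches are assumptions, and Stages C and D are omitted.

```python
from typing import Callable, Iterable

Gate = Callable[[str], str]          # returns "HIGH", "MEDIUM", or "LOW"
Extractor = Callable[[str], dict]    # returns an envelope-like dict


def gate_then_extract(
    batches: Iterable[str],
    gate: Gate,
    extract: Extractor,
    keep: tuple[str, ...] = ("HIGH", "MEDIUM"),
) -> list[dict]:
    """Run the cheap gate first; call the costlier extractor only on kept batches."""
    cards = []
    for text in batches:
        if gate(text) in keep:
            cards.append(extract(text))
    return cards


# Stub gate and extractor standing in for the two LLM calls.
grades = {"lunch?": "LOW", "submit report by Friday": "HIGH"}
cards = gate_then_extract(
    list(grades),
    gate=lambda text: grades[text],
    extract=lambda text: {"text": text},
)
```

The point of the split is cost: the extractor only sees batches the gate considers worth the more expensive call.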

14 query subcommands, three tiers

  • Basic: quick, topic, arc, timeline, person, project, pending, reflect
  • Advanced: types, graph-walk, summary, trends
  • Composite: session-context, task-brief

Five-phase state inference

A query workflow (docs/5_phase_query.md) for going from fuzzy human input ("what was that thing my advisor mentioned in March?") to a concrete answer in <60 seconds.

Live dashboard on :8765

7 panels: Win/Mac/server CPU+GPU, API usage, memory system health, cron health, graph queries, audio pipeline progress.

Cron orchestrator

  • Per-source drivers (backfill_v5_<source>_driver.py)
  • Dead-letter queue with retry budgets
  • PG-aware pending detector (skips already-processed batches)
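
A dead-letter queue with a retry budget can be sketched in a few lines. The class and field names are hypothetical and the queue is held in memory; the orchestrator's real storage and budget values are not described in these notes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RetryQueue:
    """Retry failed batches up to `budget` times, then park them as dead letters."""
    budget: int = 3
    pending: deque = field(default_factory=deque)
    attempts: dict = field(default_factory=dict)
    dead: list = field(default_factory=list)

    def submit(self, batch_id: str) -> None:
        self.pending.append(batch_id)

    def drain(self, process: Callable[[str], bool]) -> None:
        """Run `process` on each pending batch; requeue failures within budget."""
        while self.pending:
            batch_id = self.pending.popleft()
            if process(batch_id):
                continue
            self.attempts[batch_id] = self.attempts.get(batch_id, 0) + 1
            if self.attempts[batch_id] >= self.budget:
                self.dead.append(batch_id)
            else:
                self.pending.append(batch_id)
```

A batch that keeps failing ends up in `dead` after `budget` attempts instead of blocking the batches behind it.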

AI-agent first-class (the differentiator)

Most real users invoke memexa through Claude Code, Cursor, or Cline rather than typing subcommands by hand. The 14 subcommands are a small protocol; the agent contract is docs/for_agents.md:

  • 7 hard rules for query selection
  • Decision table mapping user intent → subcommand
  • Composition patterns for multi-source questions
  • Common pitfalls catalog (e.g. don't use topic on names, use arc)

Engineering polish

  • Apache 2.0
  • 3-layer abstraction (_path_resolver / _user_aliases / _user_identity) — clean enough that no PII leaks survived the sanitizer
  • 50 passing tests (2 skipped due to known prompt drift, scheduled for v0.2)
  • 309/309 .py syntax PASS
  • CI: pytest × Python 3.10 / 3.11 / 3.12 + CodeQL + Security + dependabot + Release Drafter
  • Pre-commit PII scan hook
  • Branch protection on main (required CI + 1 review)

Documentation

  • Bilingual everywhere — every user-facing .md ships with a .zh.md mirror
  • 5 walkthroughs against a synthetic demo dataset (Alice / Bob / Carol)
  • 2 case studies (lab-report pipeline, 5-minute meeting brief)
  • 6 lessons-learned narratives (~2.7k lines bilingual)
  • Per-OS deployment guides (Windows, macOS, Linux, docker-compose)

Installation (rc1)

# pip install not yet on PyPI; install from GitHub:
pip install git+https://github.com/labazhou2024/memexa.git@v0.1.0-rc1

# Or clone for the demo dataset + walkthroughs:
git clone https://github.com/labazhou2024/memexa.git
cd memexa
pip install -e ".[dev]"
memexa init
docker compose -f docker-compose.example.yml up -d
python -m examples.demo_dataset.ingest --dry-run
memexa doctor
memexa quick "Alice"

Full quickstart: docs/quickstart.md


Known limitations (be honest)

# Limitation
1 PyPI not yet wired: pip install memexa returns nothing; install from GitHub for now.
2 QQ source: dual-track. The recommended db-only path (direct nt_msg.db SQLCipher v4 read — sends no protocol packets, indistinguishable from any chat-history backup tool) is LIVE in upstream JARVIS (jarvis/qq_db.py, 762 lines, standard library only) and scheduled for OSS migration in v0.2. The NapCat / OneBot adapter shipped with rc1 is deprecated and disabled by default after a 2025-09-05 NapCat public-OneBot incident caused Tencent to batch-ban every QQ that ever ran NapCat / LiteLoaderQQNT — the maintainer's own QQ was caught in this wave on 2026-05-14. Until v0.2 lands, OSS v0.1.x users either (a) copy jarvis/qq_db.py in by hand or (b) use the clipboard fallback (also LIVE upstream, also v0.2). See docs/integrations/qq.md.
3 Mac / Linux fresh-clone smoke test is not in CI on every push yet. Scheduled for v0.1.0 stable.
4 Deliverable factory (auto lab report / action card / weekly brief) is on the v0.2 roadmap. v0.1 ships the building blocks (case studies show the manual composition).
5 Two test files skipped — prompt drift after rewrites; queued for v0.2 prompt-maintenance pass.
6 CLI ruff coverage narrow (E9 only for rc1) — full ruleset gates on a dedicated make fmt pass scheduled for v0.2.

Why this exists

The Chinese personal-data ecosystem is a six-silo patchwork (WeChat + QQ + email + browser + AI chats + audio), and every existing personal-knowledge tool either (a) demands you re-enter everything by hand, (b) only handles one silo, or (c) ships English-first prompts that mis-extract Chinese context.

memexa was built bottom-up to extract from real Chinese chat / email / voice data, with prompts tuned for the way Chinese people actually write. The two-LLM gate-extract is not a marketing line — without it the gatekeeper either lets too much trash through (single-LLM cheap) or rejects good evidence (single-LLM strict). The 14 subcommands are not a SQL DSL — they are six months of "what did the user actually ask?" condensed into a small protocol an LLM agent can compose.


Roadmap

See ROADMAP.md. Highlights:

  • v0.2 — Deliverable factory: memexa lab-report, memexa action-card, memexa brief <person>, memexa weekly. PyPI release. Full CI matrix (Win + Mac + Linux × Py 3.10–3.12). Prompt-maintenance pass.
  • v0.5 — Optional paid API endpoint (OpenAI-style, billed per token). The OSS core remains fully usable forever — paid is an upgrade path, never a gate.
  • v1.0 — Schema commitment + QQ db-only path productionized.

How to help

Star, file an issue (even if it's "your README has a typo"), open a discussion, or push a PR. The .audit/PLACEHOLDER_INVENTORY.md lists the 25 modules still flagged TODO(memgraph-oss) for _path_resolver migration — easy first contributions. See CONTRIBUTING.md.

If you ship an AI agent that needs a Chinese-data memory layer, docs/for_agents.md is your starting point — drop in your own subcommand list and we'll co-evolve the protocol.


Acknowledgements

memexa stands on pgvector, BGE-M3, DeepSeek, Hindsight, Silero VAD, mlx-whisper, SenseVoice / FunASR, and the engineering tradition of Pensieve-shaped tools that came before it (Logseq, mem.ai, dnote, recoll, Notational Velocity). Without prior work in retrieval-augmented memory + Chinese NLP this would not exist.


Full Changelog: https://github.com/labazhou2024/memexa/commits/v0.1.0-rc1