17 May 15:20

labazhou2024

88fdf6a

v0.1.1 — onboarding wizards + 13 fresh-user fixes Latest

v0.1.0 shipped the new project name (Memexa) on PyPI but the email
ingest path turned out to be broken (hard-coded to maintainer-specific
account names that did not exist in the OSS package). v0.1.1 fixes
that and rewrites onboarding around three interactive wizards, then
adds 13 more fixes caught when the whole flow was re-played from a
fresh-user perspective across Win 11 + Mac Studio + USTC Linux, with
real IMAP credentials (QQ + USTC Exmail-reverse-proxy) and real
WeChatMsg-schema JSON.

TL;DR for a brand-new user

pip install memexa==0.1.1
memexa demo                                 # Tier 0 — see it works
memexa init                                 # scaffold ~/.memexa/
memexa init llm                             # 4 providers; DeepSeek / OpenAI / Qwen / custom
memexa init email                           # 12+ IMAP providers auto-detected
memexa backend up                           # docker compose pg + Hindsight
memexa ingest email                         # IMAP → batch → LLM extract → POST
memexa quick "<question>"                   # see your own messages back

New CLI (public)

memexa init                  # legacy scaffold (templates ship in wheel)
memexa init llm              # LLM provider wizard (4 providers)
memexa init email            # IMAP wizard (12+ providers auto-detected)
memexa init wechat           # WeChatMsg export wizard (Windows-only)
memexa backend up            # docker compose -f ~/.memexa/docker-compose.yml up -d
memexa backend status        # docker ps + curl /health
memexa backend down          # compose down
memexa ingest email          # fetch IMAP for all configured accounts
memexa ingest wechat         # read WeChatMsg export dir → builder → extract → POST

docs/quickstart.md walks through all of these end-to-end.

Highlights

Critical fix carried from v0.1.0

memexa/extraction/email_history_fetcher.py was hard-coded to two
maintainer-specific account names (qq_email, ustc_email) and tried
to import memexa.qq_email / memexa.ustc_email, modules that
do not exist in the OSS package. v0.1.0 PyPI users who followed
the docs got ModuleNotFoundError. v0.1.1 rewrites the fetcher as a
generic IMAP client (stdlib imaplib + email.parser) that reads
email.accounts.<name> from ~/.memexa/identity.yaml and supports
multiple accounts.
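
A minimal sketch of what a generic stdlib IMAP fetch can look like, assuming an account mapping with `host`, `port`, `user`, and `password` keys. Those key names, the `summarize` and `fetch_recent` helpers, and the `INBOX` folder are illustrative assumptions, not memexa's actual schema or API.

```python
import imaplib
from email import policy
from email.parser import BytesParser


def summarize(raw: bytes) -> dict:
    """Parse raw RFC 822 bytes into sender, subject, and plain-text body."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "text": body.get_content().strip() if body is not None else "",
    }


def fetch_recent(account: dict, limit: int = 10) -> list[dict]:
    """Fetch the newest `limit` messages from INBOX over IMAP-over-TLS."""
    # `account` keys are hypothetical; adapt them to your own config layout.
    with imaplib.IMAP4_SSL(account["host"], account.get("port", 993)) as conn:
        conn.login(account["user"], account["password"])
        conn.select("INBOX", readonly=True)  # read-only: do not mark as seen
        _, data = conn.search(None, "ALL")
        ids = data[0].split()[-limit:]
        results = []
        for msg_id in ids:
            _, parts = conn.fetch(msg_id, "(RFC822)")
            results.append(summarize(parts[0][1]))
        return results
```

Calling `fetch_recent` once per configured account is one way to support several mailboxes with the same code path; the parsing half (`summarize`) can be exercised offline with raw message bytes.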

13 fresh-user blockers (re-verify pass)

Onboarding — install + init

- memexa init shipped without the example templates in the wheel.
  Fresh user got 3 [warn] template missing warnings and an empty
  ~/.memexa/. Templates now ship under memexa/templates/.
- memexa init llm crashed on Chinese-locale Windows console with
  UnicodeEncodeError: 'gbk' codec can't encode character '\xa5'
  (the ¥ symbol in the DeepSeek provider note). CLI entry now
  reconfigures stdio to UTF-8.
- memexa init email for USTC mail printed the wrong host hint
  (imap.exmail.qq.com). That endpoint rate-limits / locks new
  logins. The right host is mail.ustc.edu.cn:993, LIVE-verified.
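
The GBK console crash above can be reproduced and avoided with the standard `reconfigure()` method on text streams. This is a sketch of the general technique, not the actual memexa entry point; `force_utf8` is a hypothetical helper name, and the in-memory `console` stream only stands in for a GBK terminal.

```python
import io
import sys


def force_utf8(stream) -> None:
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8", errors="replace")


# At CLI start-up, apply it to both standard streams.
for std_stream in (sys.stdout, sys.stderr):
    force_utf8(std_stream)

# Simulated GBK console: writing the yen sign would raise UnicodeEncodeError
# here if force_utf8() were not called first.
console = io.TextIOWrapper(io.BytesIO(), encoding="gbk")
force_utf8(console)
console.write("¥")
console.flush()
```

The `getattr` guard keeps the helper safe when stdout has been replaced by an object without `reconfigure()`, for example under some test runners.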

Backend — memexa backend up / memexa doctor

- memexa doctor probed /healthz but Hindsight serves /health.
- memexa doctor LLM probe double-prefixed /v1, hitting
  /v1/v1/chat/completions → 404. Now detects /v1 already present.
- memexa doctor read a non-existent nodes field, always
  reported "0 nodes" on a populated bank.
- memexa backend up polled with a 60s timeout, too short for a
  cold BGE-M3 load. Bumped to 180s.
- docker-compose.yml routed HINDSIGHT_API_LLM_MODEL to the
  EXTRACT model. Reasoning-class models (deepseek-v4-flash-ascend,
  qwen-reasoner) emit content in reasoning_content and leave
  content empty on Hindsight's strict-JSON prompts, so
  fact-extraction silently failed and total_nodes stayed at 0.
  Default switched to the GATE model.
- docker-compose.yml substituted ${HF_ENDPOINT:-} into the
  container env. Empty-string substitution made huggingface_hub
  raise httpx.UnsupportedProtocol: Request URL is missing a
  protocol on cold start. Now loaded via env_file: so absent
  stays absent (huggingface_hub falls back to its built-in
  default). China users opt in by adding
  HF_ENDPOINT=https://hf-mirror.com to ~/.memexa/.env.
- memexa backend up no longer leaks stale HINDSIGHT_API_LLM_*
  shell exports into the compose process; they were silently
  shadowing ~/.memexa/.env.
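
Two of the backend fixes above come down to small normalisation rules: add `/v1` only when the base URL lacks it, and treat an empty environment value as absent. The sketch below illustrates both under that reading; `chat_completions_url` and `drop_empty_env` are hypothetical names, not memexa functions.

```python
def chat_completions_url(base_url: str) -> str:
    """Join a provider base URL with the chat path, adding /v1 only once."""
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"


def drop_empty_env(values: dict[str, str]) -> dict[str, str]:
    """Remove empty-string entries so an unset variable stays absent."""
    return {key: value for key, value in values.items() if value}


# Both spellings of the base URL resolve to the same endpoint.
url_a = chat_completions_url("https://api.example.com")
url_b = chat_completions_url("https://api.example.com/v1/")

# An empty HF_ENDPOINT is dropped instead of being passed through as "".
env = drop_empty_env({"HF_ENDPOINT": "", "OTHER": "x"})
```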

Ingestion — extract → POST → query

- _normalize_llm_card now enum-coerces every confidence value
  (numeric 0.85, English "high" / "low", Chinese "确定" /
  "模糊", bool, None) to a canonical Hindsight enum. The
  four confidence fields (TimeResolution.confidence +
  Entity.resolution_confidence 4-value;
  IdentityAssertion.confidence + RelationAssertion.confidence
  3-value) are all handled. Demo dataset went from 6/18 POST OK
  → 18/18.
- _normalize_llm_card ISO-coerces anchor_message_ts (LLM
  sometimes emits a bare date "2026-05-13"). The
  when_start_default capture also runs after coercion so the
  fallback no longer inherits a bare date.
- _build_wechat_prompt_from_messages no longer silently drops
  every non-text message in a real WeChatMsg export. Image /
  sticker / video / voice / location / appmsg / sysmsg all
  survive as [图片] / [表情] / [视频] / etc. placeholders,
  with <title> extracted from Type=49 appmsg XML. 30-60% of
  a real chat history that was being lost is now preserved.
- _build_wechat_prompt_from_messages propagates IsSender=1 →
  per-msg is_self_message hint + batch-level n_self_msgs /
  n_other_msgs / is_solo_self / is_self_chat. Downstream
  §SELF_NOTE_MODE now anchors commitments to the user's own
  utterances.
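
The two normalisation fixes above can be pictured with a small sketch. Everything specific here is an assumption made for illustration: the three labels in `LEVELS`, the numeric thresholds, the mapping of "确定" and "模糊", and the helper names. The real Hindsight enum values and the internals of `_normalize_llm_card` are not given in these notes.

```python
from datetime import datetime

# Hypothetical 3-value label set; the real enum values may differ.
LEVELS = ("high", "medium", "low")
# Assumed mapping for the two Chinese inputs mentioned in the notes.
ALIASES = {"确定": "high", "模糊": "low"}


def coerce_confidence(value, default: str = "medium") -> str:
    """Map numeric, English, Chinese, bool, or None inputs onto LEVELS."""
    if value is None:
        return default
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "high" if value else "low"
    if isinstance(value, (int, float)):
        # Thresholds are illustrative assumptions.
        if value >= 0.8:
            return "high"
        return "medium" if value >= 0.5 else "low"
    text = str(value).strip().lower()
    text = ALIASES.get(text, text)
    return text if text in LEVELS else default


def coerce_iso_timestamp(value: str) -> str:
    """Expand a bare date such as "2026-05-13" into a full ISO 8601 timestamp."""
    return datetime.fromisoformat(value.strip()).isoformat()
```

Coercing to a known label and falling back to a default, rather than rejecting the card, matches the goal described above of getting every card through the POST step.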

Tests

- 82 new unit tests added: tests/unit/test_confidence_sanitizer.py
  (57 parametrised cases covering numeric / English / Chinese /
  bool / None / canonical-4 / canonical-3 enum semantics);
  tests/unit/test_wechat_msg_adapter.py (25 cases across 10
  msg-type codes + IsSender semantics + field aliases).
- Full pytest: 140 passed, 2 skipped (both pre-existing
  prompt-drift tests, queued for the next prompt-maintenance pass).

LIVE verification matrix

| Step | Win 11 (Docker Desktop) | Mac Studio (OrbStack) | USTC Ubuntu 22.04 |
| --- | --- | --- | --- |
| pip install | ✅ | ✅ | ✅ |
| `memexa demo` | ✅ | ✅ | ✅ |
| `memexa init` + wizards | ✅ | ✅ | ✅ |
| `memexa backend up` | ✅ | ✅ (mihomo + HF mirror) | ❌ docker.io blocked by campus firewall (env, not memexa) |
| Real QQ IMAP ingest | ✅ N=10 queryable cards | (same code path) | blocked above |
| Real USTC IMAP ingest | ✅ N=10 (deduped w/ QQ) | (same code path) | blocked above |
| WeChat demo ingest | ✅ 18/18 POST | ✅ 12/12 | blocked above |
| WeChat real-schema | ✅ 7 msgs → 3 cards | (same code path) | blocked above |
| Mixed msg types | ✅ 6/6 survive | (same code path) | blocked above |
| `memexa doctor` | ✅ 4 green | ✅ 4 green | ✅ graceful w/o backend |

Known gaps deferred to v0.1.2

- Other IMAP providers (Gmail / Outlook / iCloud / 163 / 126 / Yeah /
  Hotmail / Sina / Live): wizards have correct host/port/auth-type
  hints, but only QQ and USTC have been LIVE-fetched against. Other
  providers should work but are not LIVE-attested for this release.
- Large-volume ingest (>500 batches): small dataset proven;
  rate-limit / dead-letter back-pressure / memory behavior at scale
  not yet stress-tested.
- WeChatMsg-from-the-real-tool: adapter is reverse-engineered from
  WeChatMsg's documented schema (Type / IsSender / StrContent /
  CreateTime / NickName / StrTalker) but a real WeChatMsg
  release binary was not in this verification pass. Field aliases
  cover the common emissions; truly novel field names would need a
  v0.1.2 adapter patch.
- Hindsight async-consolidation transparency: memexa ingest shows
  dead-letter: N when verify-after-POST sees total=0, but the
  cards are usually in the document store and just waiting for the
  background consolidator. v0.1.2 should auto-trigger
  /consolidate and poll.
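
The "trigger, then poll" idea in the last item (and the readiness wait in memexa backend up) follows a common poll-until-deadline pattern. The sketch below shows only that generic pattern; `poll_until` is a hypothetical helper, and in practice `check` would wrap whatever request reports that consolidation or start-up has finished.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Stand-in check that succeeds on its third call.
attempts = []


def ready() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


became_ready = poll_until(ready, timeout=5.0, interval=0.0)
```

Using a monotonic clock for the deadline avoids surprises when the wall clock is adjusted during a long cold start.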

Install & upgrade

# Fresh install:
pip install memexa==0.1.1

# Upgrade from v0.1.0:
pip install -U memexa

For full quickstart (Tier 0 / Tier 1 / Tier 2), see
docs/quickstart.md.

— feat/v0.1.1-onboarding PR #18 · 15 commits · 140 pytest pass · 18/18 CI green


17 May 04:08

labazhou2024

v0.1.0

7b6f796

v0.1.0 — first stable release

Install: pip install memexa (no more --pre flag).

One-minute demo (no Docker, no LLM key, no config):

pip install memexa
memexa demo

What's in this release

Stable cut from rc4. The single goal of this release was to close the
six say-do gaps that the post-rc4 audit found between what the README
/ ROADMAP / CHANGELOG said the project did and what it actually
did in a fresh-install venv on three platforms.

What works today on your own data

| Source | Path | Status |
| --- | --- | --- |
| Email | IMAP, 10 min setup | ✅ Win/macOS/Linux |
| Browser | Chrome / Firefox SQLite history | ✅ Win/macOS/Linux |
| Claude Code | `~/.claude/projects/*/conversations.jsonl` | ✅ Win/macOS/Linux |
| Audio | Recorder → Whisper / SenseVoice → JSON | ✅ Win/macOS/Linux |
| WeChat | WeChatMsg / wechatDataBackup / PyWxDump export → builder | ⚠ Windows only (upstream tool constraint) |
| QQ | NapCat disabled (Tencent ban wave); db-only adapter pending v0.2 migration | ⚠ manual file copy required |

Full per-source status + recommended first-day order: see
docs/quickstart.md Tier 3.

Bugs fixed since rc4

- memexa quick "X" --json (subcommand-level position) now works.
  rc4 was rejecting it at argparse with unrecognized arguments:
  --json. Reproduced live on Win 11, macOS 13.12, Ubuntu 22.04
  before the fix landed.
- memexa quick "X" now exits 1 with an English stderr hint when
  the backend is unreachable, instead of silently returning N=0
  with exit 0. Agents subprocess-invoking memexa rely on exit codes.
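
To show why the exit code matters to an agent, here is a sketch of a subprocess wrapper that treats any non-zero exit as a failure and parses stdout as JSON otherwise. `run_json` is a hypothetical helper, the shape of memexa's JSON output is not assumed, and the stand-in commands use the Python interpreter so the sketch runs without memexa installed.

```python
import json
import subprocess
import sys


def run_json(cmd: list[str], timeout: float = 60.0):
    """Run a CLI command, fail loudly on non-zero exit, parse stdout as JSON."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Surface the stderr hint instead of treating empty output as "no results".
        raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Real use (needs memexa on PATH and a reachable backend):
#   result = run_json(["memexa", "quick", "Alice", "--json"])

# Stand-in for a successful call.
ok = run_json([sys.executable, "-c", "print('{\"n\": 1}')"])

# Stand-in for an unreachable backend: exit code 1 becomes an exception.
try:
    run_json([sys.executable, "-c", "import sys; sys.exit(1)"])
    failed = False
except RuntimeError:
    failed = True
```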

Documentation corrections since rc4

- docs/quickstart.md Tier 0 expected demo output now matches what
  memexa demo actually prints, byte-for-byte (per-source counts
  were all wrong in rc4).
- The macOS Python 3.9 install gap (stock macOS Python is below the
  3.10 project minimum) is now surfaced as an explicit warning above
  the install command, with brew install python@3.11 instructions.
- ROADMAP / CHANGELOG aligned to actual state.
- All 49 English docs now have Chinese mirrors (.zh.md).

Stable-cut transparency

Two of three ROADMAP §[0.1.0] release gates are unmet at the time of
this cut (no non-author PR yet, rc4 critical-fix cooldown not yet 7
days old). Cutting anyway because all six say-do gaps are closed and
LIVE-verified on three platforms; the two unmet gates are de-facto
signals (community velocity, soak time) rather than de-jure
correctness signals. v0.1.0 ships with a documented "Known
limitations" section in both quickstart Tier 3 and ROADMAP Shipped
so users hit the honesty wall on the docs page, not in their
workflow. See CHANGELOG.md §0.1.0
for full rationale.

Verification at cut time

Win 11 + Python 3.11.9 fresh venv: pip install memexa + memexa demo → exit 0
macOS + Python 3.13.12 miniforge fresh venv: same
Linux Ubuntu 22.04 + Python 3.10.12 fresh venv: same
pytest: 58 passed / 2 skipped (prompt-drift; tracked)
CI matrix: 8/8 OS × Python combinations green
Bilingual coverage: 49 / 49 / 0 missing

What's next

- v0.1.x patch line: opens when the first non-author issue or
  PR lands.
- v0.2: workflow-spec templates + QQ db-only adapter migration
  from upstream JARVIS.
- v0.3: 飞书 / 钉钉 / WeChat PC backup adapters; local document
  source; --embedded backend mode.

Full roadmap: ROADMAP.md.

🤖 Released via Claude Code


16 May 14:57

labazhou2024

v0.1.0-rc4

d0e37b2

v0.1.0-rc4 — bundle demo dataset + dynamic __version__ Pre-release

Patch on top of rc3 fixing two release-blocker bugs discovered by fresh-venv install verification.

What's fixed since rc3

`memexa demo` no longer fails with `ModuleNotFoundError: No module named 'examples.demo_dataset'`. The pyproject wheel-include rule now bundles `examples/demo_dataset/` (the ingest module + 6 source JSON/JSONL files).
`memexa version` now reports `0.1.0rc4` instead of the stale `0.1.0a0`. The version is read dynamically from installed-package metadata so it cannot drift again.

Both bugs were caused by configurations that worked in editable / source-tree installs but failed in fresh-wheel installs from PyPI — the exact scenario every new memexa user encounters. The internal release pre-flight checklist now requires a fresh-venv wheel install + run smoke test as Item 7.

Recommended install

```bash
pip install --pre memexa==0.1.0rc4
memexa demo # 30-second onboarding
memexa version # → memexa 0.1.0rc4
```

rc3 disposition

rc3 remains on PyPI (cannot be deleted). `pip install --pre memexa` resolves to rc4 automatically. Direct `pip install memexa==0.1.0rc3` still installs rc3 with the broken `memexa demo` — please use rc4.

Highlights

Agent-first positioning clarified. memexa is a memory backend that AI agents (Claude Code, Cursor, Cline) invoke as a subprocess on a human user's behalf. Direct CLI use is secondary. See docs/why.md for the per-capability comparison vs OpenHuman / MemPalace / ReMe and the agent-first design rationale.

New features

`memexa demo` — thirty-second onboarding. `pip install --pre memexa && memexa demo` ingests a synthetic dataset with the stub extractor and runs five sample queries in the terminal. No Docker, no LLM API key, no configuration.
`--json` output mode for all fourteen query subcommands. Top-level flag short-circuits text rendering and emits the raw return value as JSON. Designed for AI agents invoking memexa via shell subprocess.

New docs

docs/why.md (bilingual) — comparison matrix, agent-first design rationale, glossary covering V2 envelope / reflow / Chinese-IM reflow / audio + voice reflow / workflow spec terms.
docs/cost.md (bilingual) — DeepSeek / GPT-4o / Claude monthly cost estimates with three-tier user profiles. Recommended Flash-gate + Pro-extract combination at ¥0.30 per 1 000 messages.

Brand and roadmap consolidation

README first screen restored to include the AI-agent compatible by design paragraph and a dual-path Quickstart (humans run `memexa demo`, agents call subcommands with `--json`).
`docs/quickstart.md` Tier 0 expanded with the agent subprocess path.
ROADMAP v0.2 redefined from "Python deliverable code + CLI subcommands" to Markdown workflow specs under `docs/templates/`. v0.5 promotes the subprocess path to a native MCP server. v0.7 formalises user-authored specs. New v0.8+ section for optional desktop GUI exploration.
Makefile lint / format paths corrected to `memexa tests` (residue from PR #9's src→memexa rename).

Tests

8 new tests across two files (`tests/integration/test_demo_subcommand.py` and `tests/integration/test_subcmd_json_output.py`). Full pytest passes 58 / 2 pre-existing skips.

Stats

Pull requests: #14 (feature content, 16 files / +1994 / -1177) and #15 (version bump fix, 1 file / +1 / -1).

Install

```bash
pip install --pre memexa==0.1.0rc3
memexa demo
```

🤖 Generated with Claude Code


14 May 15:56

labazhou2024

v0.1.0-rc2

1e57aef

v0.1.0-rc2 — install bug fix Pre-release

Patch over v0.1.0-rc1

Fixes the v0.1.0-rc1 install bug where import memexa raised ModuleNotFoundError because the package was installed under a literal src/ namespace.

Changed

Full source-tree rename src/ → memexa/ (289 files, 1234 substitutions)
pyproject.toml: packages.find include = ["memexa*"] (was ["src*"])
pyproject.toml: add authors = [{name = "labazhou2024"}] + maintainers (was empty in rc1, made pip show memexa look orphan)
Bump version 0.1.0rc1 → 0.1.0rc2

Verification

pip install --pre memexa==0.1.0rc2
python -c "import memexa; print(memexa.__file__)"   # OK
python -c "from memexa.core import memory_query"    # OK
memexa --help                                       # OK

Why use this over rc1

rc1 has the import src anti-pattern that will not be fixed in place; rc2 supersedes it
All future development continues on the memexa package layout

For the full project pitch, scope, limitations, and roadmap, see v0.1.0-rc1 release notes (still applies — only the install layout changed).


14 May 15:24

labazhou2024

v0.1.0-rc1

c583d3a

v0.1.0-rc1 — first public release candidate Pre-release

Status: Release Candidate. Open for feedback. Cut to v0.1.0 stable after ≥1 week of green CI on Win + macOS + Linux + non-trivial third-party install report.

memexa (镜我) is a self-hosted, Chinese-first personal memory graph. It ingests six categories of everyday data, runs them through a two-LLM gate-extract pipeline, stores entities + relationships + temporal evidence in PostgreSQL + pgvector, and exposes a 14-subcommand CLI plus a five-phase state inference workflow.

The Pensieve, in code. Pour the memories scattered across your life into a basin, rearrange them around the question you have right now, walk away with something usable.

What you get in rc1

Six ingestion sources

WeChat · QQ (experimental — see below) · Email · Browser history · Claude Code chats · Audio (microphone + ASR)

Two-LLM gate-extract pipeline

Stage A — gatekeeper LLM (HIGH / MEDIUM / LOW filter)
Stage B — extractor LLM (V2 envelope JSON)
Stage C — BGE-M3 quorum + DeepSeek-style arbiter
Stage D — POST → memory_full_v5 PostgreSQL bank

14 query subcommands, three tiers

Basic: quick, topic, arc, timeline, person, project, pending, reflect
Advanced: types, graph-walk, summary, trends
Composite: session-context, task-brief

Five-phase state inference

A query workflow (docs/5_phase_query.md) for going from fuzzy human input ("what was that thing my advisor mentioned in March?") to a concrete answer in <60 seconds.

Live dashboard on `:8765`

7 panels: Win/Mac/server CPU+GPU, API usage, memory system health, cron health, graph queries, audio pipeline progress.

Cron orchestrator

Per-source drivers (backfill_v5_<source>_driver.py)
Dead-letter queue with retry budgets
PG-aware pending detector (skips already-processed batches)

AI-agent first-class (the differentiator)

Most real users invoke memexa through Claude Code, Cursor, or Cline rather than typing subcommands by hand. The 14 subcommands are a small protocol; the agent contract is docs/for_agents.md:

7 hard rules for query selection
Decision table mapping user intent → subcommand
Composition patterns for multi-source questions
Common pitfalls catalog (e.g. don't use topic on names, use arc)

Engineering polish

Apache 2.0
3-layer abstraction (_path_resolver / _user_aliases / _user_identity) — clean enough that no PII leaks survived the sanitizer
50 passing tests (2 skipped due to known prompt drift, scheduled for v0.2)
309/309 .py syntax PASS
CI: pytest × Python 3.10 / 3.11 / 3.12 + CodeQL + Security + dependabot + Release Drafter
Pre-commit PII scan hook
Branch protection on main (required CI + 1 review)

Documentation

Bilingual everywhere — every user-facing .md ships with a .zh.md mirror
5 walkthroughs against a synthetic demo dataset (Alice / Bob / Carol)
2 case studies (lab-report pipeline, 5-minute meeting brief)
6 lessons-learned narratives (~2.7k lines bilingual)
Per-OS deployment guides (Windows, macOS, Linux, docker-compose)

Installation (rc1)

# pip install not yet on PyPI; install from GitHub:
pip install git+https://github.com/labazhou2024/memexa.git@v0.1.0-rc1

# Or clone for the demo dataset + walkthroughs:
git clone https://github.com/labazhou2024/memexa.git
cd memexa
pip install -e ".[dev]"
memexa init
docker compose -f docker-compose.example.yml up -d
python -m examples.demo_dataset.ingest --dry-run
memexa doctor
memexa quick "Alice"

Full quickstart: docs/quickstart.md

Known limitations (be honest)

| # | Limitation |
| --- | --- |
| 1 | PyPI not yet wired: `pip install memexa` returns nothing; install from GitHub for now. |
| 2 | QQ source: dual-track. The recommended db-only path (direct `nt_msg.db` SQLCipher v4 read; sends no protocol packets, indistinguishable from any chat-history backup tool) is LIVE in upstream JARVIS (`jarvis/qq_db.py`, 762 lines, standard library only) and scheduled for OSS migration in v0.2. The NapCat / OneBot adapter shipped with rc1 is deprecated and disabled by default after a 2025-09-05 NapCat public-OneBot incident caused Tencent to batch-ban every QQ that ever ran NapCat / LiteLoaderQQNT; the maintainer's own QQ was caught in this wave on 2026-05-14. Until v0.2 lands, OSS v0.1.x users either (a) copy `jarvis/qq_db.py` in by hand or (b) use the clipboard fallback (also LIVE upstream, also v0.2). See docs/integrations/qq.md. |
| 3 | Mac / Linux fresh-clone smoke test is not in CI on every push yet. Scheduled for v0.1.0 stable. |
| 4 | Deliverable factory (auto lab report / action card / weekly brief) is on the v0.2 roadmap. v0.1 ships the building blocks (case studies show the manual composition). |
| 5 | Two test files skipped: prompt drift after rewrites; queued for v0.2 prompt-maintenance pass. |
| 6 | CLI ruff coverage narrow (E9 only for rc1); full ruleset gates on a dedicated `make fmt` pass scheduled for v0.2. |

Why this exists

The Chinese personal-data ecosystem is a six-silo patchwork (WeChat + QQ + email + browser + AI chats + audio), and every existing personal-knowledge tool either (a) demands you re-enter everything by hand, (b) only handles one silo, or (c) ships English-first prompts that mis-extract Chinese context.

memexa was built bottom-up to extract from real Chinese chat / email / voice data, with prompts tuned for the way Chinese people actually write. The two-LLM gate-extract is not a marketing line — without it the gatekeeper either lets too much trash through (single-LLM cheap) or rejects good evidence (single-LLM strict). The 14 subcommands are not a SQL DSL — they are six months of "what did the user actually ask?" condensed into a small protocol an LLM agent can compose.

Roadmap

See ROADMAP.md. Highlights:

v0.2 — Deliverable factory: memexa lab-report, memexa action-card, memexa brief <person>, memexa weekly. PyPI release. Full CI matrix (Win + Mac + Linux × Py 3.10–3.12). Prompt-maintenance pass.
v0.5 — Optional paid API endpoint (OpenAI-style, billed per token). The OSS core remains fully usable forever — paid is an upgrade path, never a gate.
v1.0 — Schema commitment + QQ db-only path productionized.

How to help

Star, file an issue (even if it's "your README has a typo"), open a discussion, or push a PR. The .audit/PLACEHOLDER_INVENTORY.md lists the 25 modules still flagged TODO(memgraph-oss) for _path_resolver migration — easy first contributions. See CONTRIBUTING.md.

If you ship an AI agent that needs a Chinese-data memory layer, docs/for_agents.md is your starting point — drop in your own subcommand list and we'll co-evolve the protocol.

Acknowledgements

memexa stands on pgvector, BGE-M3, DeepSeek, Hindsight, Silero VAD, mlx-whisper, SenseVoice / FunASR, and the engineering tradition of Pensieve-shaped tools that came before it (Logseq, mem.ai, dnote, recoll, Notational Velocity). Without prior work in retrieval-augmented memory + Chinese NLP this would not exist.

Full Changelog: https://github.com/labazhou2024/memexa/commits/v0.1.0-rc1

