v0.1.1 — onboarding wizards + 13 fresh-user fixes

@labazhou2024 labazhou2024 released this 17 May 15:20
· 7 commits to main since this release
88fdf6a

v0.1.0 shipped the new project name (Memexa) on PyPI but the email
ingest path turned out to be broken (hard-coded to maintainer-specific
account names that did not exist in the OSS package). v0.1.1 fixes
that and rewrites onboarding around three interactive wizards, then
adds 13 more fixes caught when the whole flow was re-played from a
fresh-user perspective across Win 11 + Mac Studio + USTC Linux, with
real IMAP credentials (QQ + USTC Exmail-reverse-proxy) and real
WeChatMsg-schema JSON.

TL;DR for a brand-new user

pip install memexa==0.1.1
memexa demo                                 # Tier 0 — see it works
memexa init                                 # scaffold ~/.memexa/
memexa init llm                             # 4 providers; DeepSeek / OpenAI / Qwen / custom
memexa init email                           # 12+ IMAP providers auto-detected
memexa backend up                           # docker compose pg + Hindsight
memexa ingest email                         # IMAP → batch → LLM extract → POST
memexa quick "<question>"                   # see your own messages back

New CLI (public)

memexa init                  # legacy scaffold (templates ship in wheel)
memexa init llm              # LLM provider wizard (4 providers)
memexa init email            # IMAP wizard (12+ providers auto-detected)
memexa init wechat           # WeChatMsg export wizard (Windows-only)
memexa backend up            # docker compose -f ~/.memexa/docker-compose.yml up -d
memexa backend status        # docker ps + curl /health
memexa backend down          # compose down
memexa ingest email          # fetch IMAP for all configured accounts
memexa ingest wechat         # read WeChatMsg export dir → builder → extract → POST

docs/quickstart.md walks through all of these end-to-end.

Highlights

Critical fix carried from v0.1.0

memexa/extraction/email_history_fetcher.py was hard-coded to two
maintainer-specific account names (qq_email, ustc_email) and tried
to import memexa.qq_email / memexa.ustc_email — modules that
do not exist in the OSS package. v0.1.0 PyPI users who followed
the docs got ModuleNotFoundError. v0.1.1 rewrites the fetcher as a
generic IMAP client (stdlib imaplib + email.parser) that reads
email.accounts.<name> from ~/.memexa/identity.yaml and supports
multiple accounts.
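
As a rough illustration of what a generic stdlib fetcher can look like, the sketch below pairs imaplib with email.parser. It is not the code that ships in memexa: the function names and the account keys (host, port, user, password) are assumptions made for this example, and the real layout under email.accounts.<name> may differ.

```python
import imaplib
from email import policy
from email.parser import BytesParser


def parse_message(raw: bytes) -> dict:
    """Turn one raw RFC 822 message into a small dict of headers plus plain text."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "subject": str(msg["subject"] or ""),
        "from": str(msg["from"] or ""),
        "date": str(msg["date"] or ""),
        "text": body.get_content().strip() if body is not None else "",
    }


def fetch_recent(account: dict, limit: int = 10) -> list:
    """Fetch the newest `limit` INBOX messages for one account.

    `account` is a hypothetical dict with host / port / user / password keys.
    """
    with imaplib.IMAP4_SSL(account["host"], account.get("port", 993)) as conn:
        conn.login(account["user"], account["password"])
        conn.select("INBOX", readonly=True)  # read-only: do not mark as seen
        _, data = conn.search(None, "ALL")
        newest_ids = data[0].split()[-limit:]
        messages = []
        for msg_id in newest_ids:
            _, parts = conn.fetch(msg_id, "(RFC822)")
            messages.append(parse_message(parts[0][1]))
        return messages
```

Supporting multiple accounts then reduces to looping over the configured accounts and calling fetch_recent once per entry.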

13 fresh-user blockers (re-verify pass)

Onboarding — install + init

  • memexa init shipped without the example templates in the wheel.
    A fresh user got three "[warn] template missing" warnings and an
    empty ~/.memexa/. Templates now ship under memexa/templates/.
  • memexa init llm crashed on Chinese-locale Windows console with
    UnicodeEncodeError: 'gbk' codec can't encode character '\xa5'
    (the ¥ symbol in the DeepSeek provider note). CLI entry now
    reconfigures stdio to UTF-8.
  • memexa init email for USTC mail printed the wrong host hint
    (imap.exmail.qq.com). That endpoint rate-limits / locks new
    logins. The right host is mail.ustc.edu.cn:993, LIVE-verified.
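
The GBK console crash above can be reproduced and worked around with the standard library alone. The following is a minimal sketch, assuming Python 3.7+ (where TextIOWrapper.reconfigure exists); force_utf8 and demo are hypothetical names, not memexa's actual CLI entry code.

```python
import io
import sys


def force_utf8(streams=None):
    """Switch text streams to UTF-8 wherever the stream supports reconfigure()."""
    if streams is None:
        streams = (sys.stdout, sys.stderr)
    for stream in streams:
        reconfigure = getattr(stream, "reconfigure", None)
        if reconfigure is not None:  # replaced or wrapped streams may lack it
            reconfigure(encoding="utf-8", errors="replace")


def demo():
    """Simulate a GBK console, show the failure, then apply the fix."""
    buf = io.BytesIO()
    console = io.TextIOWrapper(buf, encoding="gbk")
    try:
        console.write("\xa5")  # the yen sign from the provider note
        failed = False
    except UnicodeEncodeError:
        failed = True
    force_utf8([console])
    console.write("\xa5 ok")
    console.flush()
    return failed, buf.getvalue()
```

Calling force_utf8() with no arguments at the top of a CLI entry point applies the same change to sys.stdout and sys.stderr.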

Backend — memexa backend up / memexa doctor

  • memexa doctor probed /healthz but Hindsight serves /health.
  • memexa doctor LLM probe double-prefixed /v1, hitting
    /v1/v1/chat/completions → 404. Now detects /v1 already present.
  • memexa doctor read a non-existent nodes field, always
    reported "0 nodes" on a populated bank.
  • memexa backend up polled with a 60s timeout — too short for a
    cold BGE-M3 load. Bumped to 180s.
  • docker-compose.yml routed HINDSIGHT_API_LLM_MODEL to the
    EXTRACT model. Reasoning-class models (deepseek-v4-flash-ascend,
    qwen-reasoner) emit content in reasoning_content and leave
    content empty on Hindsight's strict-JSON prompts, so
    fact-extraction silently failed and total_nodes stayed at 0.
    Default switched to the GATE model.
  • docker-compose.yml substituted ${HF_ENDPOINT:-} into the
    container env. Empty-string substitution made huggingface_hub
    raise httpx.UnsupportedProtocol: Request URL is missing a
    protocol on cold start. HF_ENDPOINT is now loaded via env_file:,
    so an absent value stays absent (huggingface_hub falls back to
    its built-in default). China users opt in by adding
    HF_ENDPOINT=https://hf-mirror.com to ~/.memexa/.env.
  • memexa backend up no longer leaks stale HINDSIGHT_API_LLM_*
    shell exports into the compose process — they were silently
    shadowing ~/.memexa/.env.
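
Three of the backend fixes above come down to small, testable helpers: avoiding the doubled /v1, keeping empty or stale variables out of the compose environment, and polling health with a longer deadline. The sketch below shows one possible shape for each. All three function names are hypothetical and the logic is an assumption about the approach, not memexa's actual code.

```python
import time


def chat_completions_url(base_url: str) -> str:
    """Append /v1 only when the configured base URL does not already end with it."""
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"


def compose_env(shell_env, stale_prefix="HINDSIGHT_API_LLM_"):
    """Build the environment for the compose process.

    Drops empty-string values (so "absent stays absent") and any stale
    exports that would shadow the values in the .env file.
    """
    return {
        key: value
        for key, value in shell_env.items()
        if value != "" and not key.startswith(stale_prefix)
    }


def wait_until_healthy(probe, timeout=180.0, interval=2.0):
    """Call probe() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In use, something like subprocess.run([...], env=compose_env(os.environ)) would hand the cleaned environment to docker compose, and probe would be a small callable that requests /health and returns True on success.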

Ingestion — extract → POST → query

  • _normalize_llm_card now enum-coerces every confidence value
    (numeric 0.85, English "high" / "low", Chinese "确定" /
    "模糊", bool, None) to a canonical Hindsight enum. The
    four confidence fields (TimeResolution.confidence +
    Entity.resolution_confidence 4-value;
    IdentityAssertion.confidence + RelationAssertion.confidence
    3-value) are all handled. Demo dataset went from 6/18 POST OK
    → 18/18.
  • _normalize_llm_card ISO-coerces anchor_message_ts (LLM
    sometimes emits a bare date "2026-05-13"). The
    when_start_default capture also runs after coercion so the
    fallback no longer inherits a bare date.
  • _build_wechat_prompt_from_messages no longer silently drops
    every non-text message in a real WeChatMsg export. Image /
    sticker / video / voice / location / appmsg / sysmsg all
    survive as [图片] / [表情] / [视频] / etc. placeholders,
    with <title> extracted from Type=49 appmsg XML. 30-60% of
    a real chat history that was being lost is now preserved.
  • _build_wechat_prompt_from_messages propagates IsSender=1 as a
    per-message is_self_message hint, plus batch-level n_self_msgs /
    n_other_msgs / is_solo_self / is_self_chat fields. Downstream
    §SELF_NOTE_MODE now anchors commitments to the user's own
    utterances.
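
The two _normalize_llm_card coercions above can be sketched as follows. The enum labels (low / medium / high), the numeric thresholds, the fallback default, and the choice of UTC midnight for bare dates are all assumptions made for this example. The release notes do not spell out Hindsight's canonical values, and only the 3-value case is shown.

```python
import re
from datetime import date, datetime, timezone

# Hypothetical 3-value mapping; the real canonical enum may use other labels.
ALIASES = {
    "high": "high", "确定": "high",
    "medium": "medium",
    "low": "low", "模糊": "low",
}


def coerce_confidence(value, default="medium"):
    """Map numeric / English / Chinese / bool / None inputs onto one enum."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "high" if value else "low"
    if isinstance(value, (int, float)):
        if value >= 0.8:  # thresholds are illustrative
            return "high"
        if value >= 0.5:
            return "medium"
        return "low"
    if isinstance(value, str):
        return ALIASES.get(value.strip().lower(), default)
    return default  # None and anything unrecognised


def coerce_iso_timestamp(value: str) -> str:
    """Expand a bare date such as "2026-05-13" into a full ISO 8601 timestamp."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        day = date.fromisoformat(value)
        midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
        return midnight.isoformat()
    return value  # already carries a time component; leave untouched
```

Running the date coercion before any fallback value is captured keeps a bare date from leaking into later defaults, which matches the ordering fix described for when_start_default.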

Tests

  • 25 new unit tests added: tests/unit/test_confidence_sanitizer.py
    (57 parametrised cases covering numeric / English / Chinese /
    bool / None / canonical-4 / canonical-3 enum semantics);
    tests/unit/test_wechat_msg_adapter.py (25 cases across 10
    msg-type codes + IsSender semantics + field aliases).
  • Full pytest: 140 passed, 2 skipped (both pre-existing
    prompt-drift tests, queued for the next prompt-maintenance pass).

LIVE verification matrix

Win 11 (Docker Desktop) Mac Studio (OrbStack) USTC Ubuntu 22.04
pip install
memexa demo
memexa init + wizards
memexa backend up ✅ (mihomo + HF mirror) ❌ docker.io blocked by campus firewall (env, not memexa)
Real QQ IMAP ingest ✅ N=10 queryable cards (same code path) blocked above
Real USTC IMAP ingest ✅ N=10 (deduped w/ QQ) (same code path) blocked above
WeChat demo ingest ✅ 18/18 POST ✅ 12/12 blocked above
WeChat real-schema ✅ 7 msgs → 3 cards (same code path) blocked above
Mixed msg types ✅ 6/6 survive (same code path) blocked above
memexa doctor ✅ 4 green ✅ 4 green ✅ graceful w/o backend

Known gaps deferred to v0.1.2

  • Other IMAP providers (Gmail / Outlook / iCloud / 163 / 126 / Yeah /
    Hotmail / Sina / Live): the wizards have correct host/port/auth-type
    hints, but only QQ and USTC have been fetched against LIVE. Other
    providers should work but are not LIVE-attested for this release.
  • Large-volume ingest (>500 batches) — small dataset proven; rate-
    limit / dead-letter back-pressure / memory behavior at scale not
    yet stress-tested.
  • WeChatMsg-from-the-real-tool — adapter is reverse-engineered from
    WeChatMsg's documented schema (Type / IsSender / StrContent /
    CreateTime / NickName / StrTalker) but a real WeChatMsg
    release binary was not in this verification pass. Field aliases
    cover the common emissions; truly novel field names would need a
    v0.1.2 adapter patch.
  • Hindsight async-consolidation transparency — memexa ingest shows
    dead-letter: N when verify-after-POST sees total=0, but the
    cards are usually in the document store and just waiting for the
    background consolidator. v0.1.2 should auto-trigger
    /consolidate and poll.
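
To make the WeChatMsg adapter gap concrete, here is a minimal sketch of alias-tolerant field lookup combined with the placeholder handling described under Ingestion. The [图片] / [表情] / [视频] placeholders and the Type=49 title extraction come from the notes above. The numeric type codes for image, sticker and video, the alternate field spellings, and the fallback label are assumptions for illustration, not the shipped adapter.

```python
import xml.etree.ElementTree as ET

# Assumed type codes: 1 text, 3 image, 43 video, 47 sticker, 49 appmsg.
PLACEHOLDERS = {3: "[图片]", 43: "[视频]", 47: "[表情]"}

# Hypothetical alias table: canonical WeChatMsg name first, then variants.
FIELD_ALIASES = {
    "Type": ("Type", "type"),
    "IsSender": ("IsSender", "is_sender"),
    "StrContent": ("StrContent", "content"),
}


def pick(msg: dict, field: str, default=None):
    """Return the first alias of `field` present in the message."""
    for name in FIELD_ALIASES[field]:
        if name in msg:
            return msg[name]
    return default


def appmsg_title(xml_text: str) -> str:
    """Extract <title> from an appmsg XML payload, or "" if it cannot be parsed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return ""
    return (root.findtext(".//title") or "").strip()


def render_message(msg: dict) -> dict:
    """Render one exported message as prompt text plus a self-message hint."""
    msg_type = int(pick(msg, "Type", 1))
    content = pick(msg, "StrContent", "") or ""
    if msg_type == 1:
        text = content
    elif msg_type == 49:
        title = appmsg_title(content)
        text = f"[appmsg] {title}" if title else "[appmsg]"
    else:
        # Unknown types still survive as a generic placeholder.
        text = PLACEHOLDERS.get(msg_type, f"[type {msg_type}]")
    return {"text": text, "is_self_message": int(pick(msg, "IsSender", 0)) == 1}
```

With a table like FIELD_ALIASES, a genuinely new field name from a real export becomes a one-line addition rather than a change to the rendering logic.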

Install & upgrade

# Fresh install:
pip install memexa==0.1.1

# Upgrade from v0.1.0:
pip install -U memexa

For full quickstart (Tier 0 / Tier 1 / Tier 2), see
docs/quickstart.md.

feat/v0.1.1-onboarding PR #18 · 15 commits · 140 pytest pass · 18/18 CI green