Add data import toolkit (ChatGPT, Obsidian, Readwise) #58

Merged: imonroe merged 2 commits into main from claude/ob1-import-toolkit on Jun 4, 2026

imonroe (Owner) commented on Jun 4, 2026

Summary

Adds a data import toolkit so a fresh memory store doesn't have to start empty. These are standalone REST clients of POST /api/v1/memories — adapted from the OB1 / Open Brain project's import recipes (backlog issue #49) and reworked for this server's single-user REST surface. Nothing here ships in the app image.

Three sources to start:

| Source | Script | What it sends |
| --- | --- | --- |
| ChatGPT export (conversations.json) | scripts/import_chatgpt.py | One messages payload per conversation (user/assistant turns, ordered) |
| Obsidian vault (folder of .md) | scripts/import_obsidian.py | One memory per note, YAML frontmatter stripped, dotfolders skipped |
| Readwise highlights (CSV export) | scripts/import_readwise.py | One memory per highlight (+ attached note) |

All three share the same options: path, --base-url/--api-key (default $MEM0_URL/$MEM0_API_KEY), --source, --limit (trial runs), and --dry-run (preview without sending). Each imported memory is tagged agent_id=import:<source> and carries source (+ title/path/book/author) in metadata so imported memories stay distinguishable from session-written ones.
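
The shared option set described above could be wired up roughly as follows. This is a minimal sketch, not the code in importers/cli.py: the names `build_parser` and `provenance` are hypothetical, and the exact shape of the tagged payload is an assumption based on the description.

```python
import argparse
import os


def build_parser(default_source: str) -> argparse.ArgumentParser:
    """Build the shared options (hypothetical helper, not the PR's code)."""
    parser = argparse.ArgumentParser(description="Import data into the memory store")
    parser.add_argument("path", help="export file or folder to import")
    # Credentials fall back to the environment variables named in the PR.
    parser.add_argument("--base-url", default=os.environ.get("MEM0_URL"))
    parser.add_argument("--api-key", default=os.environ.get("MEM0_API_KEY"))
    parser.add_argument("--source", default=default_source)
    parser.add_argument("--limit", type=int, default=None, help="stop after N records")
    parser.add_argument("--dry-run", action="store_true", help="preview without sending")
    return parser


def provenance(source: str, **extra: str) -> dict:
    """Tag a record so imported memories stay distinguishable (assumed shape)."""
    return {"agent_id": f"import:{source}", "metadata": {"source": source, **extra}}


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser("obsidian").parse_args(["vault/", "--limit", "5", "--dry-run"])
print(args.path, args.source, args.limit, args.dry_run)
print(provenance(args.source, title="Daily note"))
```

Reading the environment variables as argparse defaults keeps the flags optional while still letting a later check reject a run where neither the flag nor the variable is set.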

Structure

  • importers/ package: client.py (retrying REST client — backoff on transport errors and 5xx, no retry on 4xx), pure parsers chatgpt.py / obsidian.py / readwise.py (each a generator yielding MemoryClient.add kwargs), and cli.py (shared argparse + run loop).
  • scripts/import_*.py: thin CLI wrappers that add the repo root to sys.path, so they run from a checkout with no packaging step.
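
The retry policy described for client.py (back off on transport errors and 5xx, never retry 4xx) can be illustrated without any HTTP library. The sketch below is an assumption-laden stand-in: `post_with_retry`, `TransportError`, and the injectable `send`/`sleep` callables are hypothetical names, and the real client uses httpx rather than this abstraction.

```python
import time
from typing import Callable


class TransportError(Exception):
    """Stand-in for a network-level failure (hypothetical name)."""


def post_with_retry(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it yields a non-5xx status or attempts run out.

    send() returns an HTTP status code or raises TransportError.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be >= 1")
    last_error = None
    for attempt in range(max_retries):
        try:
            status = send()
        except TransportError as exc:
            last_error = exc  # network failure: worth another attempt
        else:
            if status < 500:
                return status  # 2xx success, or 4xx where a retry cannot help
            last_error = RuntimeError(f"server error {status}")
        if attempt < max_retries - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
    raise last_error


# Demo: two 503 responses followed by a 201, with sleeps recorded instead of run.
responses = iter([503, 503, 201])
recorded = []
print(post_with_retry(lambda: next(responses), sleep=recorded.append), recorded)
```

Passing `sleep` as a parameter is a design choice for this sketch only; it lets a test observe the backoff schedule without waiting.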

Tests

  • tests/test_importers.py (11 cases): ChatGPT turn ordering / system-message drop / wrapped-key / custom source; Obsidian frontmatter stripping, dotfolder skip, top-only frontmatter; Readwise highlight+note+metadata; client bearer header, dry-run, no-retry on 4xx, backoff+retry on 5xx (via respx).
  • Full suite: 84 passed, ruff check clean.
  • Also verified the CLIs end-to-end with --dry-run against sample exports and confirmed the missing-credentials guard exits non-zero.
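
The Obsidian behaviors named in the test list (frontmatter stripping, dotfolder skip, top-only frontmatter) might look like the generator below. It is a sketch under stated assumptions: `strip_frontmatter` and `iter_notes` are hypothetical names, and the `text` key is a guess at what `MemoryClient.add` accepts, since the PR text does not show the signature.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def strip_frontmatter(text: str) -> str:
    """Remove a YAML frontmatter block only when it opens the file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[i + 1:]).lstrip("\n")
    return text  # no closing delimiter: leave the note untouched


def iter_notes(vault: Path, source: str = "obsidian") -> Iterator[dict]:
    """Yield one kwargs dict per Markdown note, skipping dot-folders."""
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault)
        if any(part.startswith(".") for part in rel.parts[:-1]):
            continue  # e.g. .obsidian/ or .trash/
        body = strip_frontmatter(path.read_text(encoding="utf-8")).strip()
        if not body:
            continue  # nothing left to import after stripping
        yield {
            "text": body,  # assumed parameter name
            "agent_id": f"import:{source}",
            "metadata": {"source": source, "title": path.stem, "path": rel.as_posix()},
        }


# Demo against a throwaway vault with one real note and one hidden file.
with tempfile.TemporaryDirectory() as tmp:
    demo_vault = Path(tmp)
    (demo_vault / ".obsidian").mkdir()
    (demo_vault / ".obsidian" / "hidden.md").write_text("skip me", encoding="utf-8")
    (demo_vault / "note.md").write_text(
        "---\ntags: [a]\n---\nBody\n\n---\nmore\n", encoding="utf-8"
    )
    demo_notes = list(iter_notes(demo_vault))
print(demo_notes)
```

Because only a delimiter on the first line opens a frontmatter block, a later `---` used as a horizontal rule stays in the note body.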

Architecture notes

  • No server code touched; no new runtime dependency (httpx is already pinned). Respects the single-user model — everything imports under the server's default user, distinguished only by the import:<source> provenance tag.
  • Docs: new "Importing existing data" section in the User Guide (with the LLM fact-extraction cost note and --dry-run/--limit guidance) and a project-layout entry in the Developer Guide.

Follow-ups

Easy to extend with more sources (Gmail, X/Twitter, Perplexity, Grok) by adding a parser + a thin script. Server-side byte-identical dedup is tracked separately in #48.

Relates to #49.

https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue


Generated by Claude Code

Standalone importer scripts that bulk-load existing personal data into the
memory store over the REST API, so a fresh server doesn't start empty. Adapts
the OB1 project's import recipes to this server's single-user REST surface.

- importers/: a retrying REST client (client.py), pure per-source parsers
  (chatgpt.py, obsidian.py, readwise.py), and a shared CLI runner (cli.py).
  Each parser yields MemoryClient.add kwargs; nothing here ships in the app
  image. Imported memories are tagged agent_id=import:<source> with source/
  title/path/book/author metadata for later provenance.
- scripts/import_{chatgpt,obsidian,readwise}.py: thin CLIs with --dry-run,
  --limit, --source, and $MEM0_URL/$MEM0_API_KEY support.
- tests/test_importers.py: parser cases + client behavior (bearer header,
  dry-run, no-retry on 4xx, backoff+retry on 5xx) via respx.
- docs: USER_GUIDE "Importing existing data" section (incl. LLM cost note),
  DEVELOPER_GUIDE project-layout entry.

https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue

Copilot AI left a comment

Pull request overview

Adds a standalone data import toolkit that seeds a fresh memserv memory store by parsing common exports and POSTing them to the existing POST /api/v1/memories REST endpoint (no server/runtime behavior changes).

Changes:

  • Introduces an importers/ package with a retrying REST client, shared CLI wiring, and pure parsers for ChatGPT/Obsidian/Readwise exports.
  • Adds thin scripts/import_*.py entrypoints that run from a repo checkout (no packaging step).
  • Adds importer-focused unit tests and updates docs to describe the new import workflow and repo layout.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_importers.py | New test suite covering parsers and client retry/dry-run behavior. |
| scripts/import_chatgpt.py | CLI entrypoint for importing ChatGPT conversations.json. |
| scripts/import_obsidian.py | CLI entrypoint for importing an Obsidian vault directory. |
| scripts/import_readwise.py | CLI entrypoint for importing Readwise CSV highlights. |
| importers/__init__.py | Package docstring and high-level intent for importer tooling. |
| importers/chatgpt.py | Parser/loader for ChatGPT exports producing messages payloads. |
| importers/obsidian.py | Parser for Obsidian vaults (frontmatter stripping, skip dirs). |
| importers/readwise.py | Parser/loader for Readwise CSV highlights (+ optional notes/metadata). |
| importers/client.py | MemoryClient REST client with retry/backoff and dry-run support. |
| importers/cli.py | Shared argparse + import run loop used by the scripts. |
| docs/USER_GUIDE.md | Documents how to run the import scripts and expected behavior/cost. |
| docs/DEVELOPER_GUIDE.md | Adds importers/ + scripts/import_*.py to the project layout map. |

Comment thread importers/cli.py Outdated
Comment thread importers/client.py
…_retries

- importers/cli.py: run() now returns a non-zero exit code whenever any record
  fails, even if others succeeded, so automation can detect partial failures.
- importers/client.py: validate max_retries >= 1 in __init__ and raise a clear
  ValueError, instead of running zero attempts and hitting an AssertionError.
- tests: cover both (cli.run exit codes for all-success vs partial-failure, and
  the max_retries guard).
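
The exit-code change in the first bullet can be pictured with a small run loop. This is a hedged sketch rather than the actual importers/cli.py: the `run` signature, the `add` callable, and the printed summary format are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Optional


def run(
    records: Iterable[dict],
    add: Callable[..., None],
    limit: Optional[int] = None,
) -> int:
    """Send each record; return 0 only if every attempted record succeeded."""
    sent = failed = 0
    for index, record in enumerate(records):
        if limit is not None and index >= limit:
            break
        try:
            add(**record)
            sent += 1
        except Exception as exc:  # keep going so one bad record does not stop the import
            failed += 1
            print(f"record {index} failed: {exc}")
    print(f"sent={sent} failed={failed}")
    # Any failure makes the whole run non-zero, even when other records succeeded.
    return 1 if failed else 0


def picky_add(**kwargs):
    """Hypothetical sender that rejects one specific record."""
    if kwargs["text"] == "bad":
        raise RuntimeError("rejected by server")


exit_code = run([{"text": "good"}, {"text": "bad"}, {"text": "good"}], picky_add)
print("exit code:", exit_code)
```

A thin script could pass the returned value to `sys.exit`, which is what lets automation distinguish a clean import from a partial one.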

https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue
@imonroe imonroe merged commit 4579731 into main Jun 4, 2026
1 check passed