Feature/validate stale bookmarks#9
Conversation
There was a problem hiding this comment.
Pull request overview
Adds URL health validation and index reset capabilities to the Mindmark CLI, including supporting index APIs, docs, and tests.
Changes:
- Introduces
mindmark validateto concurrently probe indexed bookmark URLs and report stale/unreachable links. - Introduces
mindmark drop-indexto delete the local SQLite index (with a “clear contents” fallback when deletion fails). - Extends the
IndexAPI (all_bookmarks,remove_urls) and adds CLI tests for the new validate flow.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
src/mindmark/cli.py |
Adds validate/drop-index, URL probing logic, and refactors parts of CLI behavior. |
src/mindmark/index.py |
Adds APIs to enumerate bookmarks and delete by URL(s) to support validation/management workflows. |
tests/test_validate_cli.py |
Adds unit tests covering validate behavior and CLI dispatch/arg rejection. |
README.md |
Documents new CLI commands and usage examples. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if error == "skipped (non-http URL)": | ||
| skipped += 1 |
There was a problem hiding this comment.
_cmd_validate relies on a string literal ("skipped (non-http URL)") to detect the skipped case. This is brittle—if the message changes, the logic breaks. Consider returning a dedicated sentinel/enum/boolean for skipped URLs (or treat code is None + a structured reason) and keep the human-readable message separate.
| | `mindmark open "query"` | Search and open the best match in your default browser | | ||
| | `mindmark stats` | Show index size, model info, top domains, and top folders | | ||
| | `mindmark index <file>` | Import bookmarks from an exported HTML file (legacy workflow) | | ||
| | `mindmark validate` | Check indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them | | ||
| | `mindmark drop-index` | Delete the local SQLite index database (with confirmation unless `--yes`) | |
There was a problem hiding this comment.
The usage table still documents a mindmark open "query" command, but the CLI parser no longer defines an open subcommand (opening is now only available via mindmark find --open N). Update the README to match the actual CLI surface to avoid broken docs.
|
|
||
|
|
||
| def _cmd_drop_index(args): | ||
| db_path = Path(args.db).expanduser() if args.db else default_db_path() |
There was a problem hiding this comment.
drop-index expands ~ in --db via Path(args.db).expanduser(), but other commands pass args.db straight into Index, which does not expand ~. This can make mindmark drop-index --db ~/index.db target a different path than mindmark sync/index/find --db ~/index.db used. Consider normalizing --db consistently (e.g., expanduser in main() for all commands, or remove expanduser here and rely on a shared path-normalization helper).
| db_path = Path(args.db).expanduser() if args.db else default_db_path() | |
| db_path = Path(args.db) if args.db else default_db_path() |
| print(f"[1/2] collecting bookmarks from {', '.join(b.browser_name for b in browsers)}") | ||
| bookmarks = []; [bookmarks.extend(parse_browser_bookmarks(b)) for b in browsers] | ||
| if not bookmarks: |
There was a problem hiding this comment.
In _cmd_sync, bookmarks are collected via a list-comprehension used only for side effects. This is hard to read and also bypasses the per-profile error handling that collect_all_bookmarks() provides (src/mindmark/browsers/init.py:31-56), so a single failing browser/profile can crash the entire sync. Replace this with an explicit loop (or reuse collect_all_bookmarks) and handle parse errors per profile so sync can continue with the remaining browsers.
| if not bookmarks: | ||
| print("no bookmarks found.") | ||
| return 0 | ||
| print(f" found {len(bookmarks)} unique bookmarks") |
There was a problem hiding this comment.
This log message claims the collected bookmarks are "unique", but the code currently just concatenates results from each browser/profile without deduplication. Either dedupe (e.g., by URL) before reporting, or change the message to avoid implying uniqueness.
| print(f" found {len(bookmarks)} unique bookmarks") | |
| print(f" found {len(bookmarks)} bookmarks") |
| res = idx.sync(bookmarks) | ||
|
|
||
| print(f"done. added={res.added} updated={res.updated} removed={res.removed}") | ||
| return 0 |
There was a problem hiding this comment.
_cmd_sync opens an Index but never closes it. On some platforms (notably Windows), leaving the SQLite connection open can keep the DB file locked and can interfere with follow-up operations like drop-index. Wrap the sync in try/finally (or similar) to ensure idx.close() is always called.
| res = idx.sync(bookmarks) | |
| print(f"done. added={res.added} updated={res.updated} removed={res.removed}") | |
| return 0 | |
| try: | |
| res = idx.sync(bookmarks) | |
| print(f"done. added={res.added} updated={res.updated} removed={res.removed}") | |
| return 0 | |
| finally: | |
| idx.close() |
| con = sqlite3.connect(str(db_path), timeout=1.0) | ||
| cur = con.cursor() | ||
| cur.execute("DELETE FROM bookmark_sources") | ||
| cur.execute("DELETE FROM bookmarks") | ||
| cur.execute("DELETE FROM meta") | ||
| con.commit() | ||
| con.close() |
There was a problem hiding this comment.
_clear_index_contents opens an SQLite connection but only closes it on the success path. If any statement raises sqlite3.Error after connecting, the connection can be left open. Use a context manager (with sqlite3.connect(...) as con:) or a finally block to guarantee con.close() runs.
| con = sqlite3.connect(str(db_path), timeout=1.0) | |
| cur = con.cursor() | |
| cur.execute("DELETE FROM bookmark_sources") | |
| cur.execute("DELETE FROM bookmarks") | |
| cur.execute("DELETE FROM meta") | |
| con.commit() | |
| con.close() | |
| with sqlite3.connect(str(db_path), timeout=1.0) as con: | |
| cur = con.cursor() | |
| cur.execute("DELETE FROM bookmark_sources") | |
| cur.execute("DELETE FROM bookmarks") | |
| cur.execute("DELETE FROM meta") | |
| con.commit() |
| if not _is_http_url(url): | ||
| return url, None, "skipped (non-http URL)" | ||
|
|
||
| headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"} |
There was a problem hiding this comment.
The User-Agent header is hardcoded to mindmark/0.x, which can become misleading as versions change. Consider using the actual package version (e.g., __version__) in the header so requests are correctly identifiable.
| headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"} | |
| headers = {"User-Agent": f"mindmark/{__version__} (+bookmark-validation)"} |
What this PR does
This pull request adds two new CLI commands to Mindmark for improved index management and bookmark health checking, updates the CLI and documentation, and introduces tests for the new validation feature. The most important changes are summarized below.
New CLI features:
validatecommand to check all indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them without modifying the index. This includes parallelized URL checking and options for timeout and worker count. (src/mindmark/cli.py,README.md, [1] [2] [3]drop-indexcommand to delete the local SQLite index database, with confirmation prompts and a fallback to clearing index contents if the file is locked. (src/mindmark/cli.py,README.md, [1] [2] [3]Index management enhancements:
all_bookmarks()andremove_urls()methods to theIndexclass to support bookmark validation and future removal operations. (src/mindmark/index.py, src/mindmark/index.pyR410-R451)CLI and workflow improvements:
sync,stats, and result display logic for clarity and consistency. Deprecated unused or redundant subcommands and arguments. (src/mindmark/cli.py, [1] [2] [3]src/mindmark/cli.py, [1] [2]Testing:
validatecommand, covering healthy URLs, detection of stale links, CLI dispatch, and argument validation. (tests/test_validate_cli.py, tests/test_validate_cli.pyR1-R124)These changes together make it easier to maintain a clean and healthy bookmark index and lay the groundwork for more advanced management features.
How to test
See updates to the README Usage section to exercise the new commands.
Checklist
pytest -q)