Feature/validate stale bookmarks by knepprath · Pull Request #9 · sukanth/mindmark

knepprath · 2026-04-22T22:39:12Z

What this PR does

This pull request adds two new CLI commands to Mindmark for improved index management and bookmark health checking, updates the CLI and documentation, and introduces tests for the new validation feature. The most important changes are summarized below.

New CLI features:

Added a validate command to check all indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them without modifying the index. This includes parallelized URL checking and options for timeout and worker count. (src/mindmark/cli.py, README.md, [1] [2] [3]
Added a drop-index command to delete the local SQLite index database, with confirmation prompts and a fallback to clearing index contents if the file is locked. (src/mindmark/cli.py, README.md, [1] [2] [3]

Index management enhancements:

Added all_bookmarks() and remove_urls() methods to the Index class to support bookmark validation and future removal operations. (src/mindmark/index.py, src/mindmark/index.pyR410-R451)

CLI and workflow improvements:

Refactored the sync, stats, and result display logic for clarity and consistency. Deprecated unused or redundant subcommands and arguments. (src/mindmark/cli.py, [1] [2] [3]
Updated the argument parser to support the new commands and improved error handling for command-line arguments. (src/mindmark/cli.py, [1] [2]

Testing:

Added a new test module for the validate command, covering healthy URLs, detection of stale links, CLI dispatch, and argument validation. (tests/test_validate_cli.py, tests/test_validate_cli.pyR1-R124)

These changes together make it easier to maintain a clean and healthy bookmark index and lay the groundwork for more advanced management features.

How to test

See updates to the README Usage section to exercise the new commands.

Checklist

Tests pass (pytest -q)
New behavior is covered by tests (if applicable)
README or docs updated (if applicable)

…nvironments

index

Copilot

Pull request overview

Adds URL health validation and index reset capabilities to the Mindmark CLI, including supporting index APIs, docs, and tests.

Changes:

Introduces mindmark validate to concurrently probe indexed bookmark URLs and report stale/unreachable links.
Introduces mindmark drop-index to delete the local SQLite index (with a “clear contents” fallback when deletion fails).
Extends the Index API (all_bookmarks, remove_urls) and adds CLI tests for the new validate flow.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File	Description
`src/mindmark/cli.py`	Adds `validate`/`drop-index`, URL probing logic, and refactors parts of CLI behavior.
`src/mindmark/index.py`	Adds APIs to enumerate bookmarks and delete by URL(s) to support validation/management workflows.
`tests/test_validate_cli.py`	Adds unit tests covering validate behavior and CLI dispatch/arg rejection.
`README.md`	Documents new CLI commands and usage examples.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-23T02:19:01Z

+                if error == "skipped (non-http URL)":
+                    skipped += 1


_cmd_validate relies on a string literal ("skipped (non-http URL)") to detect the skipped case. This is brittle—if the message changes, the logic breaks. Consider returning a dedicated sentinel/enum/boolean for skipped URLs (or treat code is None + a structured reason) and keep the human-readable message separate.

Copilot · 2026-04-23T02:19:02Z

 | `mindmark open "query"` | Search and open the best match in your default browser |
 | `mindmark stats` | Show index size, model info, top domains, and top folders |
 | `mindmark index <file>` | Import bookmarks from an exported HTML file (legacy workflow) |
+| `mindmark validate` | Check indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them |
+| `mindmark drop-index` | Delete the local SQLite index database (with confirmation unless `--yes`) |


The usage table still documents a mindmark open "query" command, but the CLI parser no longer defines an open subcommand (opening is now only available via mindmark find --open N). Update the README to match the actual CLI surface to avoid broken docs.

Copilot · 2026-04-23T02:19:02Z

+
+
+def _cmd_drop_index(args):
+    db_path = Path(args.db).expanduser() if args.db else default_db_path()


drop-index expands ~ in --db via Path(args.db).expanduser(), but other commands pass args.db straight into Index, which does not expand ~. This can make mindmark drop-index --db ~/index.db target a different path than mindmark sync/index/find --db ~/index.db used. Consider normalizing --db consistently (e.g., expanduser in main() for all commands, or remove expanduser here and rely on a shared path-normalization helper).

Suggested change

db_path = Path(args.db).expanduser() if args.db else default_db_path()

db_path = Path(args.db) if args.db else default_db_path()

Copilot · 2026-04-23T02:19:02Z

+    print(f"[1/2] collecting bookmarks from {', '.join(b.browser_name for b in browsers)}")
+    bookmarks = []; [bookmarks.extend(parse_browser_bookmarks(b)) for b in browsers]
+    if not bookmarks:


In _cmd_sync, bookmarks are collected via a list-comprehension used only for side effects. This is hard to read and also bypasses the per-profile error handling that collect_all_bookmarks() provides (src/mindmark/browsers/init.py:31-56), so a single failing browser/profile can crash the entire sync. Replace this with an explicit loop (or reuse collect_all_bookmarks) and handle parse errors per profile so sync can continue with the remaining browsers.

Copilot · 2026-04-23T02:19:02Z

+    if not bookmarks:
+        print("no bookmarks found.")
+        return 0
+    print(f"      found {len(bookmarks)} unique bookmarks")


This log message claims the collected bookmarks are "unique", but the code currently just concatenates results from each browser/profile without deduplication. Either dedupe (e.g., by URL) before reporting, or change the message to avoid implying uniqueness.

Suggested change

print(f" found {len(bookmarks)} unique bookmarks")

print(f" found {len(bookmarks)} bookmarks")

Copilot · 2026-04-23T02:19:03Z

+    res = idx.sync(bookmarks)
+
+    print(f"done. added={res.added} updated={res.updated} removed={res.removed}")
    return 0


_cmd_sync opens an Index but never closes it. On some platforms (notably Windows), leaving the SQLite connection open can keep the DB file locked and can interfere with follow-up operations like drop-index. Wrap the sync in try/finally (or similar) to ensure idx.close() is always called.

Suggested change

res = idx.sync(bookmarks)

print(f"done. added={res.added} updated={res.updated} removed={res.removed}")

return 0

try:

res = idx.sync(bookmarks)

print(f"done. added={res.added} updated={res.updated} removed={res.removed}")

return 0

finally:

idx.close()

Copilot · 2026-04-23T02:19:03Z

+        con = sqlite3.connect(str(db_path), timeout=1.0)
+        cur = con.cursor()
+        cur.execute("DELETE FROM bookmark_sources")
+        cur.execute("DELETE FROM bookmarks")
+        cur.execute("DELETE FROM meta")
+        con.commit()
+        con.close()


_clear_index_contents opens an SQLite connection but only closes it on the success path. If any statement raises sqlite3.Error after connecting, the connection can be left open. Use a context manager (with sqlite3.connect(...) as con:) or a finally block to guarantee con.close() runs.

Suggested change

con = sqlite3.connect(str(db_path), timeout=1.0)

cur = con.cursor()

cur.execute("DELETE FROM bookmark_sources")

cur.execute("DELETE FROM bookmarks")

cur.execute("DELETE FROM meta")

con.commit()

con.close()

with sqlite3.connect(str(db_path), timeout=1.0) as con:

cur = con.cursor()

cur.execute("DELETE FROM bookmark_sources")

cur.execute("DELETE FROM bookmarks")

cur.execute("DELETE FROM meta")

con.commit()

Copilot · 2026-04-23T02:19:03Z

+    if not _is_http_url(url):
+        return url, None, "skipped (non-http URL)"
+
+    headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"}


The User-Agent header is hardcoded to mindmark/0.x, which can become misleading as versions change. Consider using the actual package version (e.g., __version__) in the header so requests are correctly identifiable.

Suggested change

headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"}

headers = {"User-Agent": f"mindmark/{__version__} (+bookmark-validation)"}

David Knepprath added 3 commits April 22, 2026 13:35

feat(cli): add --validate URL checker with stale-link trim

18721d7

fix(cli): handle OSError from input when running in non-interactive e…

1b4d2b3

…nvironments

cleanup the validate command and add command for dropping the local

1dba25f

index

sukanth requested review from Copilot and sukanth April 23, 2026 02:14

Copilot started reviewing on behalf of sukanth April 23, 2026 02:14 View session

sukanth approved these changes Apr 23, 2026

View reviewed changes

sukanth merged commit 28880d5 into sukanth:master Apr 23, 2026
11 checks passed

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/validate stale bookmarks#9

Feature/validate stale bookmarks#9
sukanth merged 3 commits intosukanth:masterfrom
knepprath:feature/validate-stale-bookmarks

knepprath commented Apr 22, 2026 •

edited

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Copilot AI Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants



		def _cmd_drop_index(args):
		db_path = Path(args.db).expanduser() if args.db else default_db_path()

	db_path = Path(args.db).expanduser() if args.db else default_db_path()
	db_path = Path(args.db) if args.db else default_db_path()

	print(f" found {len(bookmarks)} unique bookmarks")
	print(f" found {len(bookmarks)} bookmarks")

	headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"}
	headers = {"User-Agent": f"mindmark/{__version__} (+bookmark-validation)"}

Conversation

knepprath commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

How to test

Checklist

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

knepprath commented Apr 22, 2026 •

edited

Loading