Skip to content

Feature/validate stale bookmarks#9

Merged
sukanth merged 3 commits intosukanth:masterfrom
knepprath:feature/validate-stale-bookmarks
Apr 23, 2026
Merged

Feature/validate stale bookmarks#9
sukanth merged 3 commits intosukanth:masterfrom
knepprath:feature/validate-stale-bookmarks

Conversation

@knepprath
Copy link
Copy Markdown

@knepprath knepprath commented Apr 22, 2026

What this PR does

This pull request adds two new CLI commands to Mindmark for improved index management and bookmark health checking, updates the CLI and documentation, and introduces tests for the new validation feature. The most important changes are summarized below.

New CLI features:

  • Added a validate command to check all indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them without modifying the index. This includes parallelized URL checking and options for timeout and worker count. (src/mindmark/cli.py, README.md, [1] [2] [3]
  • Added a drop-index command to delete the local SQLite index database, with confirmation prompts and a fallback to clearing index contents if the file is locked. (src/mindmark/cli.py, README.md, [1] [2] [3]

Index management enhancements:

  • Added all_bookmarks() and remove_urls() methods to the Index class to support bookmark validation and future removal operations. (src/mindmark/index.py, src/mindmark/index.pyR410-R451)

CLI and workflow improvements:

  • Refactored the sync, stats, and result display logic for clarity and consistency. Deprecated unused or redundant subcommands and arguments. (src/mindmark/cli.py, [1] [2] [3]
  • Updated the argument parser to support the new commands and improved error handling for command-line arguments. (src/mindmark/cli.py, [1] [2]

Testing:

  • Added a new test module for the validate command, covering healthy URLs, detection of stale links, CLI dispatch, and argument validation. (tests/test_validate_cli.py, tests/test_validate_cli.pyR1-R124)

These changes together make it easier to maintain a clean and healthy bookmark index and lay the groundwork for more advanced management features.

How to test

See updates to the README Usage section to exercise the new commands.

Checklist

  • Tests pass (pytest -q)
  • New behavior is covered by tests (if applicable)
  • README or docs updated (if applicable)

@sukanth sukanth merged commit 28880d5 into sukanth:master Apr 23, 2026
11 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds URL health validation and index reset capabilities to the Mindmark CLI, including supporting index APIs, docs, and tests.

Changes:

  • Introduces mindmark validate to concurrently probe indexed bookmark URLs and report stale/unreachable links.
  • Introduces mindmark drop-index to delete the local SQLite index (with a “clear contents” fallback when deletion fails).
  • Extends the Index API (all_bookmarks, remove_urls) and adds CLI tests for the new validate flow.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
src/mindmark/cli.py Adds validate/drop-index, URL probing logic, and refactors parts of CLI behavior.
src/mindmark/index.py Adds APIs to enumerate bookmarks and delete by URL(s) to support validation/management workflows.
tests/test_validate_cli.py Adds unit tests covering validate behavior and CLI dispatch/arg rejection.
README.md Documents new CLI commands and usage examples.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/mindmark/cli.py
Comment on lines +76 to +77
if error == "skipped (non-http URL)":
skipped += 1
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_cmd_validate relies on a string literal ("skipped (non-http URL)") to detect the skipped case. This is brittle—if the message changes, the logic breaks. Consider returning a dedicated sentinel/enum/boolean for skipped URLs (or treat code is None + a structured reason) and keep the human-readable message separate.

Copilot uses AI. Check for mistakes.
Comment thread README.md
Comment on lines 43 to +47
| `mindmark open "query"` | Search and open the best match in your default browser |
| `mindmark stats` | Show index size, model info, top domains, and top folders |
| `mindmark index <file>` | Import bookmarks from an exported HTML file (legacy workflow) |
| `mindmark validate` | Check indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them |
| `mindmark drop-index` | Delete the local SQLite index database (with confirmation unless `--yes`) |
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage table still documents a mindmark open "query" command, but the CLI parser no longer defines an open subcommand (opening is now only available via mindmark find --open N). Update the README to match the actual CLI surface to avoid broken docs.

Copilot uses AI. Check for mistakes.
Comment thread src/mindmark/cli.py


def _cmd_drop_index(args):
db_path = Path(args.db).expanduser() if args.db else default_db_path()
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop-index expands ~ in --db via Path(args.db).expanduser(), but other commands pass args.db straight into Index, which does not expand ~. This can make mindmark drop-index --db ~/index.db target a different path than mindmark sync/index/find --db ~/index.db used. Consider normalizing --db consistently (e.g., expanduser in main() for all commands, or remove expanduser here and rely on a shared path-normalization helper).

Suggested change
db_path = Path(args.db).expanduser() if args.db else default_db_path()
db_path = Path(args.db) if args.db else default_db_path()

Copilot uses AI. Check for mistakes.
Comment thread src/mindmark/cli.py
Comment on lines +254 to +256
print(f"[1/2] collecting bookmarks from {', '.join(b.browser_name for b in browsers)}")
bookmarks = []; [bookmarks.extend(parse_browser_bookmarks(b)) for b in browsers]
if not bookmarks:
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _cmd_sync, bookmarks are collected via a list-comprehension used only for side effects. This is hard to read and also bypasses the per-profile error handling that collect_all_bookmarks() provides (src/mindmark/browsers/init.py:31-56), so a single failing browser/profile can crash the entire sync. Replace this with an explicit loop (or reuse collect_all_bookmarks) and handle parse errors per profile so sync can continue with the remaining browsers.

Copilot uses AI. Check for mistakes.
Comment thread src/mindmark/cli.py
if not bookmarks:
print("no bookmarks found.")
return 0
print(f" found {len(bookmarks)} unique bookmarks")
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log message claims the collected bookmarks are "unique", but the code currently just concatenates results from each browser/profile without deduplication. Either dedupe (e.g., by URL) before reporting, or change the message to avoid implying uniqueness.

Suggested change
print(f" found {len(bookmarks)} unique bookmarks")
print(f" found {len(bookmarks)} bookmarks")

Copilot uses AI. Check for mistakes.
Comment thread src/mindmark/cli.py
Comment on lines +263 to 266
res = idx.sync(bookmarks)

print(f"done. added={res.added} updated={res.updated} removed={res.removed}")
return 0
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_cmd_sync opens an Index but never closes it. On some platforms (notably Windows), leaving the SQLite connection open can keep the DB file locked and can interfere with follow-up operations like drop-index. Wrap the sync in try/finally (or similar) to ensure idx.close() is always called.

Suggested change
res = idx.sync(bookmarks)
print(f"done. added={res.added} updated={res.updated} removed={res.removed}")
return 0
try:
res = idx.sync(bookmarks)
print(f"done. added={res.added} updated={res.updated} removed={res.removed}")
return 0
finally:
idx.close()

Copilot uses AI. Check for mistakes.
Comment thread src/mindmark/cli.py
Comment on lines +154 to +160
con = sqlite3.connect(str(db_path), timeout=1.0)
cur = con.cursor()
cur.execute("DELETE FROM bookmark_sources")
cur.execute("DELETE FROM bookmarks")
cur.execute("DELETE FROM meta")
con.commit()
con.close()
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_clear_index_contents opens an SQLite connection but only closes it on the success path. If any statement raises sqlite3.Error after connecting, the connection can be left open. Use a context manager (with sqlite3.connect(...) as con:) or a finally block to guarantee con.close() runs.

Suggested change
con = sqlite3.connect(str(db_path), timeout=1.0)
cur = con.cursor()
cur.execute("DELETE FROM bookmark_sources")
cur.execute("DELETE FROM bookmarks")
cur.execute("DELETE FROM meta")
con.commit()
con.close()
with sqlite3.connect(str(db_path), timeout=1.0) as con:
cur = con.cursor()
cur.execute("DELETE FROM bookmark_sources")
cur.execute("DELETE FROM bookmarks")
cur.execute("DELETE FROM meta")
con.commit()

Copilot uses AI. Check for mistakes.
Comment thread src/mindmark/cli.py
if not _is_http_url(url):
return url, None, "skipped (non-http URL)"

headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The User-Agent header is hardcoded to mindmark/0.x, which can become misleading as versions change. Consider using the actual package version (e.g., __version__) in the header so requests are correctly identifiable.

Suggested change
headers = {"User-Agent": "mindmark/0.x (+bookmark-validation)"}
headers = {"User-Agent": f"mindmark/{__version__} (+bookmark-validation)"}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants