refactor: per-lib database artifacts with consolidate pipeline #56

Merged
laradji merged 1 commit into main from emdash/refactor-per-lib-database-artifacts-62s
Apr 11, 2026


@laradji commented Apr 11, 2026

Summary

  • Per-lib artifact databases: The scraper now writes one .db file per library into ./artifacts/ instead of inserting directly into the main database. Each artifact carries its own lib_id in the meta table and is self-contained.
  • New cmd/consolidate command: Merges all per-lib artifact .db files from ./artifacts/ into the main deadzone.db in a single atomic transaction. The merge is idempotent: for each lib_id it deletes the existing rows and reinserts them from the artifact, all within that one transaction.
  • OpenArtifact in internal/db: New entry point for opening per-lib databases with lib_id identity tracking, validation, and mismatch detection (ErrArtifactLibIDMissing, ErrArtifactLibIDMismatch).
  • Schema version bump to v2: Adds schema_version to the meta table and cross-checks it on open, surfacing ErrSchemaMismatch for incompatible databases.
  • Justfile recipes: Added scrape, consolidate, serve, and clean recipes for the new two-stage pipeline.

Motivation

Decouples per-lib scraping from the shared database so individual libraries can be re-scraped independently (just scrape lib=/org/project) without touching other libs' data. The consolidate step is explicit and atomic: if any artifact fails to merge, the transaction rolls back and the main DB is left untouched.

Test plan

  • just test — new consolidate_test.go covers validation pass (embedder mismatch, schema mismatch, missing lib_id), merge pass (single/multi artifact, idempotent re-merge), and rollback on mid-merge failure
  • just build — all three commands compile
  • just scrape lib=/some/lib && just consolidate — end-to-end artifact creation and merge

Fixes #28

Development

Successfully merging this pull request may close these issues.

Per-lib database artifacts: split scraping output for isolated updates
