minne v1.0.5

github-actions released this 26 Jun 12:52

1.0.5 (2026-06-24)

Highlights:

  • No more headless_chrome or chromium dependency. We now use servo-fetch and pdfium-render for the previous use cases. This means the total install size is lighter, although the binaries themselves are a bit bigger because the new dependencies are bundled into them. It should use fewer system resources as well.
  • Another big win is that we now rebuild the indexes once a day instead of on every ingestion. Depending on the size of the db, this saves a significant amount of time.
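
For readers wondering how the daily rebuild schedule behaves, here is a minimal sketch, not the actual worker code. It assumes only what these notes state: index_rebuild_interval_secs defaults to 24h and a value of 0 disables the scheduled rebuild. The helper names rebuild_interval and rebuild_due are hypothetical and exist only for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: turns the configured seconds into an optional interval.
/// Per the release notes, a value of 0 disables scheduled rebuilds.
pub fn rebuild_interval(index_rebuild_interval_secs: u64) -> Option<Duration> {
    match index_rebuild_interval_secs {
        0 => None,
        secs => Some(Duration::from_secs(secs)),
    }
}

/// Hypothetical check a worker loop could run: is a rebuild due at `now`?
/// Returns false when scheduling is disabled.
pub fn rebuild_due(last_rebuild: Instant, now: Instant, interval: Option<Duration>) -> bool {
    match interval {
        None => false,
        // saturating_duration_since yields zero if `now` is earlier than `last_rebuild`.
        Some(every) => now.saturating_duration_since(last_rebuild) >= every,
    }
}
```

With the default of 86400 seconds, rebuild_due stays false for an ingestion that finishes a minute after the last rebuild, which is where the time saving comes from.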

All changes:

  • Infra: CI workflow fixes. CI is now a nix flake check that covers compilation, caching, running tests, clippy, fmt, and validation of the ort version.
  • Docker-compose: The example now references the ghcr image so that we can remove the Dockerfile and reduce maintenance scope.
  • Refactor: web scraping now uses servo-fetch (pure-Rust Servo engine) and PDF rendering uses pdfium-render (direct PDFium bindings). This reduces Docker image size by ~300MB, improves startup time significantly for PDF rendering, and provides more stable output
  • Fix: added pkgs.libglvnd to LD_LIBRARY_PATH in devenv so Servo engine can find libEGL.so at runtime
  • Fix: updated Dockerfile to add libegl1 libegl-mesa0 libgles2 libfontconfig1 libfreetype6 runtime dependencies for servo-fetch
  • Docs: updated architecture, features, and installation docs to reflect the new web processing stack
  • Fix: added pre-commit hooks to further maintain code consistency.
  • Security: updated some deps because dependabot told me, good bot.
  • Refactor: deduplicated test database setup across common/src/storage/.
  • Refactor: split knowledge-graph.js monolith into focused functions.
  • Evaluations: simplified crate layout (linear pipeline, sharded-only converted store, in-memory ingestion, db/ and cli/ modules); namespace reuse state in corpus manifest (removed cache/snapshots/); no legacy JSON/history compatibility (re-run --warm after upgrade)
  • Performance: ingestion skips per-task index rebuild; worker runs scheduled REBUILD INDEX (default every 24h via index_rebuild_interval_secs, 0 disables)
  • Performance: ingestion persists all artifacts in a single SurrealDB transaction per task (atomic replace by task id)
  • Performance: entity embeddings during ingestion use batched embed_batch, matching chunk embedding
  • Fix: ingestion reclaims tasks after a successful persist without re-running the pipeline when mark_succeeded failed
  • Fix: content deletion clears graph relationships via shared TextContent::clear_ingested_children
  • Fix: regression in the suggestion of relationships
  • Internal: extracted duplicate entity+embedding patterns into HasEmbedding and EmbeddingRecord traits with generic store_with_embedding, delete_by_source_id, and vector_search on SurrealDbClient.
  • Infra: ort-version file removed; the version is now inlined in flake.nix and devenv.nix, and release.yml reads it via nix eval .#lib.ortVersion from the plan job
  • Infra: screenshot-graph.webp and .dockerignore deleted
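
To illustrate the idea behind the HasEmbedding refactor listed above, here is a rough sketch under stated assumptions. The real trait definitions and the SurrealDbClient signatures are not shown in these notes, so everything below is hypothetical: the trait methods, the in-memory MemoryStore stand-in (used instead of a database), the Note example type, and the cosine scoring. Only the names HasEmbedding, store_with_embedding, delete_by_source_id, and vector_search come from the changelog.

```rust
use std::collections::HashMap;

/// Assumed trait shape: an entity exposes its own id and the id of its source.
pub trait HasEmbedding {
    fn id(&self) -> &str;
    fn source_id(&self) -> &str;
}

/// Hypothetical example entity.
pub struct Note {
    id: String,
    source_id: String,
}

impl Note {
    pub fn new(id: &str, source_id: &str) -> Self {
        Self { id: id.to_string(), source_id: source_id.to_string() }
    }
}

impl HasEmbedding for Note {
    fn id(&self) -> &str { &self.id }
    fn source_id(&self) -> &str { &self.source_id }
}

/// In-memory stand-in for the database client; one generic impl serves every entity type.
pub struct MemoryStore<T> {
    rows: HashMap<String, (T, Vec<f32>)>,
}

impl<T: HasEmbedding> MemoryStore<T> {
    pub fn new() -> Self {
        Self { rows: HashMap::new() }
    }

    /// Stores the entity together with its embedding, replacing any row with the same id.
    pub fn store_with_embedding(&mut self, item: T, embedding: Vec<f32>) {
        self.rows.insert(item.id().to_string(), (item, embedding));
    }

    /// Removes every row that came from `source_id` and returns how many were removed.
    pub fn delete_by_source_id(&mut self, source_id: &str) -> usize {
        let before = self.rows.len();
        self.rows.retain(|_, (item, _)| item.source_id() != source_id);
        before - self.rows.len()
    }

    /// Returns up to `limit` entities, ordered by descending cosine similarity to `query`.
    pub fn vector_search(&self, query: &[f32], limit: usize) -> Vec<&T> {
        let mut scored: Vec<(f32, &T)> = self
            .rows
            .values()
            .map(|(item, emb)| (cosine(query, emb), item))
            .collect();
        scored.sort_by(|a, b| b.0.total_cmp(&a.0));
        scored.into_iter().take(limit).map(|(_, item)| item).collect()
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

The point of the pattern is that the store, delete, and search logic is written once against the trait, so each new entity type only needs a small trait impl rather than its own copy of the embedding plumbing.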