Semantra 0.2.0

A redesign of Semantra to make it more efficient and versatile. With these changes, Semantra will be easy to install and able to run stand-alone, with documents added and removed through the UI. The changes will likely not be backwards-compatible with previously stored embeddings, but they are a strong step towards stability.

Robustness

Unit tests
Linting
Pre-commit hooks
GitHub actions, including to deploy to PyPI

Faster document storage and retrieval

Use annlite and docarray
Deprecate Annoy, which scales poorly to large document collections and causes installation problems

Additional formats

Rewrite the frontend PDF renderer to use PDF.js, removing the need for backend PDF rendering
CSV, with indexing of selected columns
Audio and video, transcribed using faster-whisper
Ability to specify different processing options per file and memoize the results (may require a central SQLite database)
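
The per-file options and memoization item could work roughly as follows. This is a minimal sketch, assuming results are cached in a single SQLite table keyed by a SHA-256 hash of the file contents plus a canonical JSON encoding of that file's processing options. The table layout and the names `process_memoized` and `extract_csv_columns` are hypothetical, not part of Semantra; the CSV processor, which indexes only selected columns, stands in for any per-file processing step.

```python
import csv
import hashlib
import io
import json
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical schema: one cached result per (file content, processing options) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processed (
    file_hash TEXT NOT NULL,
    options_json TEXT NOT NULL,
    result_json TEXT NOT NULL,
    PRIMARY KEY (file_hash, options_json)
)
"""


def cache_key(content: bytes, options: dict) -> Tuple[str, str]:
    """Hash the file bytes and serialize the options canonically (sorted keys)."""
    file_hash = hashlib.sha256(content).hexdigest()
    options_json = json.dumps(options, sort_keys=True)
    return file_hash, options_json


def process_memoized(
    db: sqlite3.Connection,
    content: bytes,
    options: dict,
    process: Callable[[bytes, dict], List[str]],
) -> List[str]:
    """Return the cached result for (content, options), computing it on a miss."""
    file_hash, options_json = cache_key(content, options)
    row = db.execute(
        "SELECT result_json FROM processed WHERE file_hash = ? AND options_json = ?",
        (file_hash, options_json),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    result = process(content, options)
    db.execute(
        "INSERT INTO processed (file_hash, options_json, result_json) VALUES (?, ?, ?)",
        (file_hash, options_json, json.dumps(result)),
    )
    db.commit()
    return result


def extract_csv_columns(content: bytes, options: dict) -> List[str]:
    """Hypothetical processor: join only the CSV columns listed in options['columns']."""
    reader = csv.DictReader(io.StringIO(content.decode("utf-8")))
    return [" ".join(row[col] for col in options["columns"]) for row in reader]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    data = b"title,body,id\nHello,World,1\nFoo,Bar,2\n"
    print(process_memoized(db, data, {"columns": ["title", "body"]}, extract_csv_columns))
```

Because the options are part of the key, re-indexing the same CSV with a different set of columns computes and stores a new entry rather than returning a stale one, while repeating an earlier request is served from the cache.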

Ease of installation

Use PyInstaller to build a standalone package that non-technical users can install and run
Ability to export document collections as demos that run entirely in the browser using Transformers.js

Website

A dedicated documentation and demo website at semantra.ai (already registered)

Extensibility and documentation

A plug-in system for building additional document loaders and frontend document renderers
Well-documented APIs
Welcoming to contributors
Additional guides (contributing, installing, deploying on a server, recipes, how embeddings are stored/cached)
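
The roadmap does not say how plug-ins will be discovered or what a loader receives. The following is a minimal sketch of the document-loader side, assuming a decorator-based registry keyed by file extension where each loader turns raw bytes into text chunks. `register_loader`, `load_document`, and `load_plain_text` are hypothetical names, not existing Semantra APIs.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical plug-in interface: a loader turns raw file bytes into text chunks.
Loader = Callable[[bytes], List[str]]

# Registry mapping a lowercase file extension (e.g. ".txt") to its loader.
_LOADERS: Dict[str, Loader] = {}


def register_loader(*extensions: str) -> Callable[[Loader], Loader]:
    """Decorator that registers a loader for one or more file extensions."""
    def decorator(func: Loader) -> Loader:
        for ext in extensions:
            _LOADERS[ext.lower()] = func
        return func
    return decorator


def load_document(path: str, content: bytes) -> List[str]:
    """Dispatch to the loader registered for the file's extension."""
    ext = Path(path).suffix.lower()
    try:
        loader = _LOADERS[ext]
    except KeyError:
        raise ValueError(f"No loader registered for {ext!r}") from None
    return loader(content)


@register_loader(".txt", ".md")
def load_plain_text(content: bytes) -> List[str]:
    """Example built-in loader: split UTF-8 text into non-empty paragraphs."""
    text = content.decode("utf-8")
    return [p.strip() for p in text.split("\n\n") if p.strip()]


if __name__ == "__main__":
    print(load_document("notes.md", b"First paragraph.\n\nSecond paragraph.\n"))
```

A decorator registry only sees loaders defined in code that has already been imported; a fuller plug-in system might instead discover third-party loaders through Python package entry points.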

Probably not for this release

Add a terminal-only search UI using Textual
