Semantra 0.2.0
No due date
0% complete
A redesign of Semantra to make it more efficient and versatile. With these changes, Semantra will be easy to install and able to run standalone, with documents added and removed through the UI. The changes will likely not be backward-compatible with previously stored embeddings, but they will be a strong step toward stability.
Robustness
- Unit tests
- Linting
- Pre-commit hooks
- GitHub actions, including to deploy to PyPI
Faster document storage and retrieval
- Using annlite and docarray
- Deprecate Annoy, as it doesn't scale well to large document collections and causes installation problems
Additional formats
- Rewrite the frontend PDF renderer to use PDF.js, removing the need for backend PDF rendering
- CSV, with indexing of selected columns
- Audio and video, with transcription using faster-whisper
- Ability to set different processing options per file and memoize the results (may require a central SQLite database)
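Per-file processing options with memoized results could work roughly as in the following minimal sketch. It assumes a single central SQLite database keyed by a hash of the file contents combined with that file's JSON-serialized options, so changing an option (for example, which CSV columns to index) produces a new cache entry, while re-adding an unchanged file reuses the stored result. The names ProcessingCache, cache_key, extract_columns and the csv_columns option are hypothetical and only illustrate the idea; they are not part of Semantra.

```python
import csv
import hashlib
import io
import json
import sqlite3
from typing import Any, Callable


def cache_key(file_bytes: bytes, options: dict) -> str:
    """Hash file contents together with its processing options (hypothetical helper)."""
    # Hash the file first so the combined input has a fixed-length prefix.
    file_digest = hashlib.sha256(file_bytes).digest()
    # sort_keys makes logically identical option dicts serialize identically.
    opts = json.dumps(options, sort_keys=True).encode("utf-8")
    return hashlib.sha256(file_digest + opts).hexdigest()


class ProcessingCache:
    """Hypothetical central SQLite store mapping (file, options) -> result."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (key TEXT PRIMARY KEY, result TEXT NOT NULL)"
        )

    def get_or_compute(
        self,
        file_bytes: bytes,
        options: dict,
        compute: Callable[[bytes, dict], Any],
    ) -> Any:
        key = cache_key(file_bytes, options)
        row = self.conn.execute(
            "SELECT result FROM results WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: reuse the stored result
        result = compute(file_bytes, options)  # cache miss: run the expensive step
        with self.conn:  # commits the insert as one transaction
            self.conn.execute(
                "INSERT INTO results (key, result) VALUES (?, ?)",
                (key, json.dumps(result)),
            )
        return result


def extract_columns(file_bytes: bytes, options: dict) -> list:
    """Example processing step: join the selected CSV columns of each row into one text."""
    reader = csv.DictReader(io.StringIO(file_bytes.decode("utf-8")))
    cols = options["csv_columns"]
    return [" ".join(row[c] for c in cols) for row in reader]
```

Results are stored as JSON here for simplicity, which assumes each processing step returns JSON-serializable data; embeddings themselves would more likely live in the vector store.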
Ease of installation
- Use PyInstaller to package Semantra as a standalone application that non-technical users can install and run
- Ability to export document collections as entirely web-runnable demos using Transformers.js
Website
- A dedicated documentation and demo website at semantra.ai (already registered)
Extensibility and documentation
- A plug-in system for building additional document loaders and frontend document renderers
- Well-documented APIs
- Welcoming to contributors
- Additional guides (contributing, installing, deploying on a server, recipes, how embeddings are stored/cached)
Probably not for this release
- Add a terminal-only search UI using Textual