mabo-du/lore

Local First · Python 3.12+ · PyQt6 · MIT License

Lore 🎙️

Privacy-First, Local-Only Oral History Transcription & Archiving


Lore is a desktop application designed for historians, archivists, and researchers. It provides state-of-the-art AI transcription, speaker diarization, named entity recognition, and translation—100% offline, on your own hardware.

No data leaves your computer. No cloud subscriptions. Just powerful, open-source AI packaged into a clean, intuitive PyQt6 interface.

✨ Features

  • 🎧 Offline Transcription: Powered by faster-whisper, optimized for CPU inference with low memory overhead (under 8 GB of RAM).
  • 🗣️ Speaker Diarization: Automatically identifies and labels different speakers using pyannote.audio.
  • 🔍 Word-Level Confidence: Low-confidence words are visually highlighted so you can quickly spot potential hallucinations.
  • 🌍 Local Translation: Translate transcripts to over 200 languages completely offline using Meta's NLLB-200 model.
  • 📖 Custom Vocabulary: Provide local jargon, proper nouns, and historical terms to bias Whisper's decoding toward the spellings you expect.
  • 🏷️ Named Entity Recognition: Uses GLiNER to automatically extract people, organizations, dates, and locations.
  • 📦 Archival Exporting: Export your work to the OHMS XML format or create an RFC 8493 BagIt archival package with SHA-256 checksum verification.
  • 🔎 Global Archive Search: A unified SQLite database (FTS5 + sqlite-vec) lets you instantly search across all your past projects using keyword or semantic/conceptual search.
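
The keyword half of the archive search can be illustrated with a minimal sketch using Python's built-in sqlite3 module and an FTS5 virtual table. This is not Lore's actual schema: the table name `segments`, its columns, and the sample rows are assumptions made for illustration, and the semantic (sqlite-vec) half is left out. It also assumes your SQLite build includes FTS5, which most Python distributions do.

```python
import sqlite3


def build_index(conn, segments):
    """Create a hypothetical FTS5 table and load (project, speaker, text) rows."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS segments "
        "USING fts5(project, speaker, text)"
    )
    conn.executemany(
        "INSERT INTO segments (project, speaker, text) VALUES (?, ?, ?)",
        segments,
    )
    conn.commit()


def search(conn, query, limit=10):
    """Return matching rows, best match first (FTS5's built-in rank column)."""
    rows = conn.execute(
        "SELECT project, speaker, text FROM segments "
        "WHERE segments MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Illustrative data only; real rows would come from saved transcripts.
conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("interview_01", "SPEAKER_00", "We moved to the mill town after the flood."),
    ("interview_02", "SPEAKER_01", "My father worked at the harbour."),
    ("interview_03", "SPEAKER_00", "The flood took the bridge in spring."),
])
hits = search(conn, "flood")
for project, speaker, text in hits:
    print(project, speaker, text)
```

Because the index lives in a single table, one query covers every project at once, which is the property the feature description relies on.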

🚀 Installation

Lore requires Python 3.12+ and is cross-platform (Windows, macOS, Linux).

  1. Clone the repository:

    git clone https://github.com/mabo-du/lore.git
    cd lore
  2. Set up a virtual environment (uv or venv is recommended):

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install the application:

    pip install -e .

🎮 Usage

Start the Lore application from your terminal:

    lore

  1. Select an Audio File: Click "Browse" to select any standard audio format (WAV, MP3, M4A).
  2. Configure Settings: Click the ⚙️ Settings icon to set your Custom Vocabulary.
  3. Transcribe & Diarize: Click "Transcribe" on the toolbar. If you want speaker labels, check the "Diarize" box.
  4. Edit & Review: Play the audio, click on segments to edit them, and review any low-confidence words highlighted in red.
  5. Export: Open the Metadata panel, fill out the project details, and export to OHMS XML or an Archival BagIt Package.
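
To show what the BagIt export step produces, here is a minimal sketch of an RFC 8493 style bag: a `data/` payload directory, a `bagit.txt` declaration, and a `manifest-sha256.txt` listing one checksum per payload file. The helper names (`make_bag`, `verify_bag`) and the sample transcript are hypothetical, and Lore's own exporter may include additional tag files and metadata.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large audio files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_files, bag_dir):
    """Copy files into bag_dir/data and write the declaration and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    lines = []
    for src in source_files:
        dest = data_dir / src.name
        shutil.copy2(src, dest)
        # Manifest paths are relative to the bag root.
        lines.append(f"{sha256_of(dest)}  data/{dest.name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir):
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        if sha256_of(bag_dir / rel_path) != expected:
            return False
    return True


# Illustrative run with a placeholder transcript in a temporary directory.
work = Path(tempfile.mkdtemp())
transcript = work / "transcript.txt"
transcript.write_text("Interview transcript placeholder\n", encoding="utf-8")
bag = work / "bag"
make_bag([transcript], bag)
print(verify_bag(bag))
```

Re-running `verify_bag` after the package has been copied or stored lets an archive confirm that no payload file changed in transit.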

🏗️ Architecture

Lore is designed with strict sequential memory management to run on older hardware.

  • Models are loaded into memory one at a time: for example, Whisper loads, transcribes, and unloads before NLLB loads, translates, and unloads.
  • Heavy use of CTranslate2 with INT8 quantization keeps inference fast on the CPU, so no dedicated GPU is needed.
  • Long-running work runs off the UI thread using PyQt6's QThread and signals, keeping the interface responsive during heavy AI workloads.
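
The one-model-at-a-time pattern can be sketched in plain Python as follows. This is an illustration rather than Lore's actual code: `run_stage` and `FakeModel` are hypothetical names, and the fake model stands in for real loaders such as a Whisper or NLLB model so the example runs without any downloads.

```python
import gc


def run_stage(load_model, work, payload):
    """Load one model, run one stage, then release the model before returning."""
    model = load_model()
    try:
        return work(model, payload)
    finally:
        # Drop the only reference so the memory can be reclaimed
        # before the next stage loads its own model.
        del model
        gc.collect()


class FakeModel:
    """Stand-in for a heavy model; counts how many instances are alive."""

    live = 0
    peak = 0

    def __init__(self, name):
        self.name = name
        FakeModel.live += 1
        FakeModel.peak = max(FakeModel.peak, FakeModel.live)

    def __del__(self):
        FakeModel.live -= 1

    def run(self, text):
        return f"{self.name}({text})"


# Stage 1: "transcribe"; stage 2: "translate". Each stage owns its model briefly.
transcript = run_stage(
    lambda: FakeModel("whisper"), lambda m, x: m.run(x), "audio.wav"
)
translated = run_stage(
    lambda: FakeModel("nllb"), lambda m, x: m.run(x), transcript
)
print(translated)
print("peak models in memory:", FakeModel.peak)
```

Keeping the model reference local to `run_stage` matters: if the caller held on to the model object, it would stay in memory alongside the next stage's model and defeat the purpose.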

🤝 Contributing

Lore is an open-source project. We welcome pull requests, bug reports, and feature requests. Please see our User Guide for more detailed workflows and documentation on the codebase.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.