Transcriptum

A surgical tool for transforming image-only historical manuscripts into fully searchable PDFs, with a manual transcription editor and multi-format export.

What is Transcriptum?

Transcriptum is a self-hosted web application designed for historians, archivists, and genealogical researchers working with scanned historical documents. It runs entirely on your local network — no cloud, no accounts, no subscriptions.

The core workflow:

Upload a scanned PDF (image-only, or one that already has a text layer)
Navigate page by page using the high-resolution viewer
Manually transcribe the handwritten content into the sidebar editor
Inject the transcription as an invisible text layer (PDF Text Rendering Mode 3) into the PDF
Download a fully searchable PDF that works with Cmd+F / Ctrl+F in any standard viewer
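
As a rough illustration of step 4, the sketch below builds the PDF content-stream operators that draw a string in text rendering mode 3 (`3 Tr`), which is what makes the text searchable but invisible. It is not Transcriptum's actual code (the backend uses PyMuPDF for this). The function names, the font resource name `/F1`, and the coordinates are assumptions made for the example, and it only handles simple Latin text.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes stay intact
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def invisible_text_ops(text: str, x: float, y: float, font_size: float = 12) -> str:
    """Return content-stream operators that place `text` invisibly at (x, y)."""
    return "\n".join(
        [
            "BT",                                # begin text object
            f"/F1 {font_size:g} Tf",             # select font /F1 (assumed to be a page resource)
            "3 Tr",                              # rendering mode 3: neither fill nor stroke
            f"{x:g} {y:g} Td",                   # move to the text position
            f"({escape_pdf_string(text)}) Tj",   # show the string
            "ET",                                # end text object
        ]
    )


if __name__ == "__main__":
    print(invisible_text_ops("Baptised 3 May (1784)", 72, 720))
```

A PDF viewer never paints glyphs drawn this way, but text extraction and Cmd+F / Ctrl+F still see them, which is why the scanned image stays visually unchanged.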

Features

Feature	Description
PDF Viewer	High-resolution rendering via PDF.js with selectable text layer
Per-page transcription	Each page has its own independent text editor
Invisible text injection	Injects text at render mode 3 (invisible, fully searchable — PDF spec §9.3.6)
Smart text detection	Detects existing text layers on upload and pre-fills the editor automatically
Selective injection	Build searchable PDF for the full document or just the active page
Zoom with margin scroll	Zoom in and scroll to document margins for edge annotations
Reference marker	Static horizontal guide line for tracking your reading position
Document search	Internal search across all transcribed pages with highlighted snippets
Export .txt	Full transcription with APA-style header, organised by page number
Export .docx	Word-compatible export with the same APA header and page sections
Work Notes	Ephemeral browser-local scratchpad (never sent to server)
Dark & Sepia themes	Eye-friendly themes for long research sessions
Persistence	All transcription data survives page refresh and browser restarts
Deep file deletion	Removes source, all outputs, and metadata in one action
No login required	Single-user local tool — open and use immediately
Docker-ready	Runs on TrueNAS SCALE, Proxmox LXC, Unraid, or any Linux host
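
To make the document search row more concrete, here is a minimal sketch of a case-insensitive search over per-page transcriptions that returns short snippets around each hit. The function name, the `pages` mapping, and the square brackets used to mark a match are assumptions for illustration; the real app highlights matches in the UI and may work differently.

```python
import re


def search_pages(pages: dict[int, str], query: str, context: int = 30) -> list[tuple[int, str]]:
    """Return (page_number, snippet) pairs for case-insensitive matches of `query`."""
    if not query:
        return []
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    results = []
    for page_number in sorted(pages):
        text = pages[page_number]
        for match in pattern.finditer(text):
            # Keep up to `context` characters on each side of the match.
            start = max(match.start() - context, 0)
            end = min(match.end() + context, len(text))
            snippet = (
                text[start:match.start()]
                + "[" + match.group(0) + "]"   # brackets stand in for UI highlighting
                + text[match.end():end]
            )
            results.append((page_number, snippet))
    return results


if __name__ == "__main__":
    demo = {1: "Baptism of Mary Smith", 2: "John Smith, farmer"}
    for page, snippet in search_pages(demo, "smith", context=5):
        print(page, snippet)
```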

Project Structure

transcriptum/
├── app.py                  ← Flask backend (all API routes)
├── requirements.txt        ← Python dependencies
├── Dockerfile              ← Container build instructions
├── docker-compose.yml      ← Service definition with volume mounts
├── .gitignore
├── README.md
├── templates/
│   └── index.html          ← Complete single-page UI
├── static/                 ← Reserved for future static assets
└── data/                   ← Created at runtime (not committed)
    ├── uploads/            ← Source PDFs uploaded by the user
    ├── outputs/            ← Generated searchable PDFs
    └── meta/               ← Per-document transcription JSON files
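
The per-document JSON files in data/meta/ could be read and written along the following lines. This is only a sketch: the schema (`doc_id` plus a `pages` mapping), the file naming, and the function names are assumptions, not the format app.py actually uses.

```python
import json
import os
import tempfile
from pathlib import Path


def save_meta(meta_dir: Path, doc_id: str, pages: dict[int, str]) -> Path:
    """Write one document's per-page transcriptions to <meta_dir>/<doc_id>.json."""
    meta_dir.mkdir(parents=True, exist_ok=True)
    target = meta_dir / f"{doc_id}.json"
    # JSON object keys must be strings, so page numbers are stored as text.
    payload = {"doc_id": doc_id, "pages": {str(n): t for n, t in pages.items()}}
    # Write to a temporary file, then rename it into place, so a crash
    # mid-write cannot leave a truncated JSON file behind.
    fd, tmp_name = tempfile.mkstemp(dir=meta_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, ensure_ascii=False, indent=2)
    os.replace(tmp_name, target)
    return target


def load_meta(meta_dir: Path, doc_id: str) -> dict[int, str]:
    """Load per-page transcriptions, or an empty dict if none were saved."""
    path = meta_dir / f"{doc_id}.json"
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        payload = json.load(handle)
    return {int(n): t for n, t in payload["pages"].items()}
```

Keeping one small JSON file per document is what makes data/meta/ cheap to back up independently of the much larger PDFs.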

Installation

Prerequisites

Docker and Docker Compose installed on the host
A folder on persistent storage for the data/ directory

Option 1 — Any Linux Host (simplest)

# 1. Clone the repository
git clone https://github.com/YOUR_USERNAME/transcriptum.git
cd transcriptum

# 2. Create data directories
mkdir -p data/uploads data/outputs data/meta

# 3. Build and start
docker compose up -d --build

# 4. Open in your browser
http://localhost:5000

To stop: docker compose down
To update: git pull && docker compose up -d --build

Option 2 — TrueNAS SCALE

TrueNAS SCALE supports Docker Compose via the Shell. The data directories should live on a dataset so they survive container recreation.

# 1. SSH into your TrueNAS machine (or open the Shell in the UI)

# 2. Create a dataset for the app (via UI or CLI)
#    Example path: /mnt/tank/apps/transcriptum

# 3. Navigate to the dataset
cd /mnt/tank/apps/transcriptum

# 4. Clone the repository
git clone https://github.com/YOUR_USERNAME/transcriptum.git .

# 5. Create data directories on the dataset (persistent storage)
mkdir -p data/uploads data/outputs data/meta

# 6. Build and start
docker compose up -d --build

# 7. Access from any device on your network
http://TRUENAS_IP:5000

Important on TrueNAS SCALE:

Use a ZFS dataset (not the boot pool) for the data/ folder
The docker-compose.yml already maps ./data/* to /data/* inside the container
If you use a custom dataset path, edit the volumes: section of docker-compose.yml:

volumes:
  - /mnt/tank/apps/transcriptum/data/uploads:/data/uploads
  - /mnt/tank/apps/transcriptum/data/outputs:/data/outputs
  - /mnt/tank/apps/transcriptum/data/meta:/data/meta

Option 3 — Proxmox (LXC Container)

Recommended: create an Ubuntu 24.04 LXC container with Docker installed.

# Inside the LXC container:

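# 1. Install Docker (if not already present)
#    On stock Ubuntu 24.04 the Compose v2 package is docker-compose-v2;
#    docker-compose-plugin is only available from Docker's own apt repository.
apt update && apt install -y docker.io docker-compose-v2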
systemctl enable --now docker

# 2. Clone the repository
git clone https://github.com/YOUR_USERNAME/transcriptum.git
cd transcriptum

# 3. Create data directories
mkdir -p data/uploads data/outputs data/meta

# 4. Build and start
docker compose up -d --build

# 5. Access from the Proxmox network
http://LXC_IP:5000

To expose the app on a different host port, change the ports mapping in docker-compose.yml (the left-hand number is the host port):

ports:
  - "8080:5000"   # access on port 8080 instead

Option 4 — Unraid

Install the Community Applications plugin if not present
Install the Docker Compose Manager plugin
Create a new compose stack, paste the contents of docker-compose.yml
Update the volume paths to point to your Unraid array:

volumes:
  - /mnt/user/appdata/transcriptum/uploads:/data/uploads
  - /mnt/user/appdata/transcriptum/outputs:/data/outputs
  - /mnt/user/appdata/transcriptum/meta:/data/meta

Start the stack and access at http://UNRAID_IP:5000

Folder Permissions

The container runs as root internally. The data/ directories need to be readable and writable by the Docker process.

# Set permissions on the data directories (run on the host)
chmod -R 755 data/
# or if you encounter permission errors:
chown -R 1000:1000 data/

On TrueNAS SCALE, if you get permission errors:

Go to Storage → Datasets → transcriptum/data
Edit Permissions → set User to root, Group to root
Check "Apply permissions recursively"
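
Before starting the container, a quick check like the one below can show which data directories are missing or not writable. It is a sketch under assumptions: the function name is hypothetical, and it tests access as the user running the script on the host, which is not necessarily the user the container runs as.

```python
import os
from pathlib import Path

# The three subdirectories the README expects under data/.
DATA_SUBDIRS = ("uploads", "outputs", "meta")


def unwritable_dirs(data_root: Path) -> list[str]:
    """Return the data subdirectories that are missing or not writable."""
    problems = []
    for name in DATA_SUBDIRS:
        path = data_root / name
        # A directory needs write and execute (traverse) permission to create files in it.
        if not path.is_dir() or not os.access(path, os.W_OK | os.X_OK):
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = unwritable_dirs(Path("data"))
    if bad:
        print("Check permissions on: " + ", ".join(bad))
    else:
        print("All data directories are writable.")
```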

Updating

# Pull latest changes
git pull

# Rebuild and restart (data is preserved in the mounted volumes)
docker compose up -d --build

Data Backup

All user data lives in three folders:

Folder	Contents	Priority
`data/meta/`	JSON transcription files (your work)	Critical
`data/uploads/`	Original source PDFs	High
`data/outputs/`	Generated searchable PDFs	Recoverable

Minimum backup: data/meta/ alone. It holds all your transcription text; combined with the source PDFs in data/uploads/, it can be used to regenerate the outputs at any time.
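
A backup of data/meta/ can be as simple as a timestamped archive. The sketch below uses only the Python standard library; the function name, the archive naming scheme, and the backup location are assumptions for the example, not part of Transcriptum.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def backup_meta(data_root: Path, backup_dir: Path) -> Path:
    """Archive data_root/meta into a timestamped .tar.gz inside backup_dir."""
    meta_dir = data_root / "meta"
    if not meta_dir.is_dir():
        raise FileNotFoundError(f"No meta directory at {meta_dir}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"transcriptum-meta-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to "meta/" so the archive restores cleanly anywhere.
        tar.add(meta_dir, arcname="meta")
    return archive


if __name__ == "__main__":
    print(backup_meta(Path("data"), Path("backups")))
```

Run it from the repository root (next to data/), for example from a nightly cron job on the host.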

Tech Stack

Layer	Technology
Backend	Python 3.11 + Flask
PDF processing	PyMuPDF (fitz)
PDF viewer	PDF.js v3.11
Image import	Pillow
Word export	python-docx
Container	Docker + Docker Compose

License

MIT License — free to use, modify, and distribute.

Screenshots

Contributing

This project was built for personal archival research. Pull requests and issues are welcome.
