A self-hosted Gmail archiver that syncs emails via IMAP to .eml files on a PVC, with a three-pane web UI. Runs on any Kubernetes cluster with persistent storage.
- Full-text search across all folders using a Whoosh index β searches subject, from, to, and body with relevance ranking and body snippets
- Attachment download β click any attachment chip in the email detail pane to open or download it directly
- Persistent header cache β email list cache is saved to disk as
.cache.jsonper folder, surviving pod restarts with no rebuild needed - Search index progress bar β top bar shows indexing progress as a percentage with folder count while the Whoosh index builds on first startup
- Bulk export β two-stage export button: click once to build the zip on the PVC (with progress bar), then click Download Ready to stream it to your browser
- Syncs all Gmail labels to
.emlfiles on a persistent volume - Incremental sync β only fetches new emails on subsequent runs (state tracked per folder)
- 24-hour automatic sync schedule, with manual full sync and per-folder sync available from the UI
- Three-pane UI (folder list / email list / email detail) mimicking a traditional email client
- Virtual scrolling on the email list β handles folders with thousands of emails without lag
- Persistent disk-backed header cache β folders load instantly after the first access, survives pod restarts
- Full-text search across all folders via Whoosh index stored on the PVC
- Export any individual email as a
.emlfile directly from the UI - Download email attachments directly from the detail pane
- Bulk export of all emails as a zip file, preserving folder structure
- HTML email rendering in a sandboxed iframe, with plaintext fallback
- IMAP modified UTF-7 encoding β correctly handles label names containing
&(e.g. Work & Projects)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Browser β
β ββββββββββββββββ¬βββββββββββββββββββ β
β β Folder List β Email List β β top 42% β
β ββββββββββββββββ΄βββββββββββββββββββ€ β
β β Email Detail (HTML rendered) β β bottom 58% β
β βββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββ ββββββββββββββββββββββββ
β Frontend (nginx) ββββββΆβ Backend (FastAPI) β
β React SPA β β IMAP sync engine β
β mailarchive-frontendβ β Whoosh search index β
ββββββββββββββββββββββββ β APScheduler (24hr) β
ββββββββββββ¬ββββββββββββ
β
βββββββββββΌβββββββββββ
β PVC 50Gi β
β /mail/ β
β .sync_state.json β
β .index/ β
β INBOX/ β
β *.eml β
β .cache.json β
β Work/ β
β Personal/ β
β ... (your labels)β
βββββββββββββββββββββββ
Before deploying, you need a Gmail App Password:
- Go to your Google Account β Security
- Enable 2-Step Verification (required for app passwords)
- Go to Security β App passwords
- Create a new app password (name it "MailArchive")
- Copy the 16-character password β you'll need it below
# Backend
cd backend/
docker build -t your-registry/mailarchive-backend:latest .
docker push your-registry/mailarchive-backend:latest
# Frontend
cd ../frontend/
docker build -t your-registry/mailarchive-frontend:latest .
docker push your-registry/mailarchive-frontend:latestEdit k8s/secret.yaml with your Gmail address and app password:
stringData:
IMAP_USER: "yourname@gmail.com"
IMAP_PASS: "xxxx xxxx xxxx xxxx" # 16-char app password (spaces are fine)Edit k8s/backend-deployment.yaml and k8s/frontend-deployment.yaml and replace the image field:
image: your-registry/mailarchive-backend:latest
image: your-registry/mailarchive-frontend:latestEdit k8s/frontend-deployment.yaml, find the Ingress section and set your hostname:
host: mail.example.comkubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/pvc.yaml
kubectl apply -f k8s/backend-deployment.yaml
kubectl apply -f k8s/frontend-deployment.yaml# Check all pods are running
kubectl get pods -n mailarchive
# Watch backend logs (first sync starts automatically if no state file exists)
kubectl logs -n mailarchive -l component=backend -f
# Check sync and index status
kubectl port-forward -n mailarchive svc/mailarchive-backend 8000:8000
curl http://localhost:8000/api/status
curl http://localhost:8000/api/search/statusAll backend config is via environment variables in k8s/backend-deployment.yaml:
| Variable | Default | Description |
|---|---|---|
IMAP_HOST |
imap.gmail.com |
IMAP server hostname |
IMAP_PORT |
993 |
IMAP SSL port |
IMAP_USER |
(from secret) | Gmail address |
IMAP_PASS |
(from secret) | 16-character app password |
MAIL_ROOT |
/mail |
PVC mount path |
SYNC_INTERVAL_HOURS |
24 |
Hours between automatic syncs |
- First run: No
.sync_state.jsonis present, so a full sync of all labels kicks off automatically on pod startup. This will take a while for large mailboxes β watch the logs. - Subsequent runs: Only fetches emails with UIDs higher than the last seen UID per folder. Progress is saved after each folder completes, so a mid-sync pod restart won't lose work.
- State file: Stored at
/mail/.sync_state.jsonon the PVC. Delete this file to force a full re-sync on next startup. - Manual full sync: Click "Sync Now" in the top bar, or
POST /api/sync. - Per-folder sync: Hover over any folder in the sidebar and click the β³ icon to sync just that folder immediately, outside the normal 24-hour cycle.
- During sync: You can browse folders and read emails normally. The UI polls status every 3 seconds and updates folder counts as each label completes.
Search is powered by a Whoosh full-text index stored at /mail/.index/ on the PVC.
- On first startup, a background thread walks all existing
.emlfiles and indexes any not yet in the index. The top bar shows a progress bar while this runs. - New emails are indexed immediately as they are downloaded during sync.
- The index persists across pod restarts β it only rebuilds entries that are missing.
- Search covers subject (2x boost), from (1.5x boost), to, and full body text.
- Whoosh query syntax is supported:
from:amazon receipt,subject:"order confirmation",holiday OR travel,"exact phrase" - Results are displayed inline in the email list pane, replacing the folder view. Each result shows the source folder, date, from, subject, and a body snippet.
- Clear the search with Escape or the β button to return to normal folder browsing.
To check index progress or document count:
kubectl exec -n mailarchive <pod-name> -- python3 -c \
"import urllib.request; print(urllib.request.urlopen('http://localhost:8000/api/search/status').read().decode())"Each folder has a .cache.json file stored alongside its .eml files on the PVC. This contains the pre-sorted email header list (date, from, subject) used to populate the email list pane.
- Built on first access to a folder, then loaded from disk on all subsequent accesses including after pod restarts
- Invalidated automatically when a sync downloads new emails into that folder
- Eliminates the need to scan
.emlheaders on every folder open
The export feature builds a zip of all .eml files preserving the folder structure, then makes it available for download.
The export button has three states:
- β¬ Export All (grey) β click to start building the zip on the PVC
- Preparing X% (green progress bar) β zip is being built in the background, polling every 2 seconds
- β¬ Download Ready (yellow) β zip is complete, click to download to your browser
The zip is stored as a temp file on the PVC (/mail/.export-TIMESTAMP.zip) and is cleaned up automatically at the start of the next export build. It remains available for download until then, so you can close the browser and come back.
The zip structure mirrors the PVC folder layout:
mailarchive-export-20260407-120000.zip
INBOX/
1_abc123.eml
...
Work/
...
Personal/
...
This format is directly importable into Thunderbird (via ImportExportTools NG), Apple Mail, or any IMAP client. See the Importing Emails section below.
The .eml format is universally supported. Options for importing into a new provider:
Thunderbird β install the ImportExportTools NG add-on, then use Tools β ImportExportTools NG β Import all messages from a directory. Point it at each folder from the zip.
Apple Mail β drag and drop .eml files onto a mailbox, or use File β Import Mailboxes.
Any IMAP provider β use a script with imaplib to APPEND each .eml back to the correct folder on the new server. This is the most universal approach and works with any provider.
Convert to Mbox β if your new provider prefers Mbox format, each folder's .eml files can be concatenated into a single .mbox file with a simple Python script.
/mail/
.sync_state.json β last synced UID per folder
.index/ β Whoosh full-text search index
INBOX/
.cache.json β persistent header cache for this folder
1_abc123def456.eml
2_fed654cba321.eml
...
Work/
.cache.json
...
Work & Projects/ β spaces and & in folder names are supported
.cache.json
...
The PVC is provisioned at 50Gi with no explicit StorageClass set, so it will use your cluster's default. If you need more space, provided your StorageClass supports online expansion:
kubectl patch pvc mailarchive-pvc -n mailarchive \
-p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'Because all state is stored on the PVC, you can safely scale down, update the image, and scale back up:
kubectl scale deployment mailarchive-backend -n mailarchive --replicas=0
# push new image
kubectl scale deployment mailarchive-backend -n mailarchive --replicas=1The new pod will read .sync_state.json from the PVC and continue from where it left off. The search index and all header caches are also preserved. No automatic sync is triggered on startup when the state file exists β use Sync Now if you want one immediately.
Note: The deployment uses
strategy: Recreaterather thanRollingUpdatebecause RWO volumes can only be mounted by one pod at a time.
| Endpoint | Method | Description |
|---|---|---|
GET /api/status |
GET | Sync status, last run time, emails downloaded, index status |
POST /api/sync |
POST | Trigger a full sync of all folders |
POST /api/sync/folder/{folder} |
POST | Trigger immediate sync of a single folder |
GET /api/folders |
GET | List all folders with email counts |
GET /api/emails/{folder} |
GET | List emails in a folder (?skip=0&limit=5000) |
GET /api/email/{folder}/{filename} |
GET | Full email detail β body, headers, attachments |
GET /api/email/{folder}/{filename}/download |
GET | Download the raw .eml file |
GET /api/email/{folder}/{filename}/attachment/{index} |
GET | Download a specific attachment by index |
GET /api/search?q={query} |
GET | Full-text search across all folders |
GET /api/search/status |
GET | Search index document count and status |
POST /api/export/build |
POST | Start building the export zip in the background |
GET /api/export/status |
GET | Export build progress β state, folders done/total, size |
GET /api/export/download |
GET | Download the completed export zip |
Edit the GMAIL_LABELS list near the top of backend/main.py, then rebuild and push the backend image. New labels will have their folders created automatically on the next sync.
# Backend
cd backend/
pip install -r requirements.txt
IMAP_USER=you@gmail.com IMAP_PASS="xxxx xxxx xxxx xxxx" MAIL_ROOT=./mail python main.py
# Frontend (in a separate terminal)
cd frontend/
npm install
npm run dev # Vite proxies /api calls to localhost:8000