Tools for extracting and formatting bibliographic data from Zotero 8 databases.
Reads directly from Zotero's local SQLite database — no API key needed. Export collections as CSV or APA-formatted references, generate PDF cover images with thumbnails, and extract annotated PDFs with baked-in highlights and markdown notes. Includes a CLI for common workflows and a Python API for custom pipelines. Built for Zotero 8; older versions are untested and likely incompatible due to schema differences.
zotlib/
├── zotlib/                        # Python library
│   ├── cli.py                     # CLI commands
│   ├── config.py                  # Database path discovery
│   ├── database.py                # SQLite interface
│   ├── extractors.py              # Data extraction functions
│   ├── exporters.py               # Collection export (annotations + PDFs)
│   ├── backup.py                  # Zotero directory backup
│   ├── tables.py                  # Zotero database table definitions
│   ├── covers.py                  # PDF cover generation
│   ├── paths.py                   # Path resolution and filename utilities
│   └── formatters/apa.py          # APA citation formatter
├── scripts/                       # Utility scripts
│   ├── extract-annotations.js     # Annotation extractor (interactive + headless)
│   ├── create-parent-item.js      # Create parents for standalone PDFs
│   └── run-extract.sh             # Shell wrapper for headless extraction
├── tests/                         # Test suite
└── pyproject.toml                 # Project configuration
uv add zotlib

From source:
git clone https://github.com/gitronald/zotlib.git
cd zotlib
uv sync

From a specific branch:

uv add git+https://github.com/gitronald/zotlib.git@dev

Run zotlib init to auto-discover Zotero paths and save them to zotlib.toml:

zotlib init

database: /mnt/c/Users/rer/Zotero/zotero.sqlite
pdfs_dir: /mnt/i/My Drive/zotero-pdfs
Saved to zotlib.toml
The config file stores the database and linked PDFs directory:
[zotlib]
database = "/path/to/zotero.sqlite"
pdfs_dir = "/path/to/linked-pdfs"

Path resolution priority (for both database and PDFs dir):
- CLI flag: --database, --pdfs-dir
- Environment variable: ZOTERO_DATABASE
- Config file: zotlib.toml
- Auto-discovery: Checks common locations (Linux, WSL, macOS)
Browse collections and inspect database schema. The show-tables command documents Zotero's largely undocumented SQLite table structure, including column descriptions and types.
# List available collections
zotlib show-collections
# Show database tables
zotlib show-tables
zotlib show-tables items

Export collection data in multiple formats. Supports linked attachments via --pdfs-dir for PDFs stored outside Zotero's default storage.
# Export all tables as CSV
zotlib export-csv
# Export a collection as CSV
zotlib export-csv -c publications
# Format a collection as APA references
zotlib export-apa -c publications
# Generate cover images and thumbnails
zotlib export-covers -c publications
# Export annotated PDFs and markdown notes
zotlib export-annotations -c mycollection

Archive the entire Zotero data directory as a compressed .tar.bz2 file with a progress bar. Saves to data/backups/zotero-YYYY-MM-DD.tar.bz2 by default. Use -o to specify a custom output path or -d to point to a different database.
zotlib backup
zotlib backup -o ~/backups/zotero-2026-03-21.tar.bz2

output/
├── export-csv/                 # Bibliographic metadata
│   └── publications.csv
├── export-apa/                 # APA-formatted references
│   └── publications.md
├── export-covers/              # PDF cover images
│   └── publications/
│       ├── fullsize/
│       └── thumbnails/
└── export-annotations/         # Annotated PDFs + notes
    └── mycollection/
        └── author-year-title/
            ├── paper.pdf
            └── annotations.md
- Multi-attachment support: each PDF gets only its own annotations
- Standalone attachment support: PDFs added directly to a collection
- Linked attachment resolution via --pdfs-dir
- "REVIEW: " prefix stripping from titles
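
As a small illustration of the prefix stripping in the last item, a sketch might look like the following. The helper name strip_review_prefix is hypothetical, and whether zotlib matches the prefix case-sensitively or only at the start of the title is an assumption here.

```python
REVIEW_PREFIX = "REVIEW: "  # prefix quoted in the feature list above


def strip_review_prefix(title: str) -> str:
    """Drop a leading "REVIEW: " marker; other titles pass through unchanged.

    Hypothetical helper; assumes an exact, case-sensitive match at the start.
    """
    # str.removeprefix is available from Python 3.9.
    return title.removeprefix(REVIEW_PREFIX)
```

Only a leading marker is removed, so a title that merely contains "REVIEW: " somewhere in the middle is left untouched.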
from zotlib import ZoteroDatabase, extract_cv_items, format_cv_as_apa
db = ZoteroDatabase("/path/to/zotero.sqlite")
items = extract_cv_items(db, collection_name="mypapers")
apa_output = format_cv_as_apa(items, output_path="output/apa.md")

WIP: Utilities for Zotero's JavaScript console (Tools > Developer > Run JavaScript). The Zotero SQLite database should never be modified directly via Python; use these JS scripts (which run through Zotero's API) for any write operations.
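
One way to guard against accidental writes from custom Python code is to open the database in SQLite's read-only mode. The sketch below uses only the standard library; open_read_only is a hypothetical helper, and nothing here claims that zotlib's own ZoteroDatabase class opens the file this way.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file so that any write attempt raises an error.

    Hypothetical helper, not part of zotlib's documented API.
    """
    # as_uri() builds a percent-encoded file: URI; mode=ro asks SQLite
    # to refuse writes on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection opened this way, SELECT queries work as usual, while INSERT, UPDATE, or DELETE statements fail with sqlite3.OperationalError instead of altering Zotero's data.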
scripts/create-parent-item.js creates parent document items for standalone PDF attachments in a collection. This is useful when PDFs were added directly without metadata: the script creates a parent item using the filename as the title and re-parents the attachment.
scripts/extract-annotations.js extracts annotations from the selected item's PDFs as markdown. It auto-detects its context:
- Interactive (Tools > Developer > Run JavaScript): shows a file save dialog
- Headless (via HTTP debug API): writes to ~/Desktop/zotero-annotations/
To run headlessly:
./scripts/run-extract.sh

Requires: Settings > Advanced > "Allow other applications to communicate with Zotero"
The shell script should work on macOS where Zotero and the terminal share the same localhost. On WSL, the script calls Zotero's debug HTTP endpoint on 127.0.0.1:23119, but localhost does not bridge to the Windows host by default. You may need to use the Windows host IP or run the curl command from PowerShell instead.
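
For the WSL case, one common approach is to read the Windows host address from /etc/resolv.conf. The sketch below rests on the assumption that WSL 2 is using its default NAT networking, where the nameserver entry usually points at the Windows host; this does not hold under mirrored networking or a custom DNS setup. Both function names are hypothetical, and whether Zotero accepts connections on that address rather than only on loopback is not confirmed here.

```python
from typing import Optional

ZOTERO_PORT = 23119  # port named in the paragraph above


def host_ip_from_resolv_conf(text: str) -> Optional[str]:
    """Return the first nameserver address in resolv.conf-style text."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            return parts[1]
    return None


def zotero_base_url(resolv_conf_text: str) -> str:
    """Build a base URL for the Windows host, falling back to 127.0.0.1."""
    host = host_ip_from_resolv_conf(resolv_conf_text) or "127.0.0.1"
    return f"http://{host}:{ZOTERO_PORT}"
```

Passing the contents of /etc/resolv.conf to zotero_base_url gives a candidate address to try in place of 127.0.0.1 when the default does not reach Zotero.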
| Type | Extracted Data |
|---|---|
| Highlight | Text + comment + color |
| Note | Comment text |
| Underline | Text + comment |
| Image | Comment only (image not exported) |
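
The table above can be read as a rule for which fields each annotation type contributes. The following sketch applies that rule in Python for illustration; the function name, the lowercase type strings, and the markdown layout are all assumptions, and the actual script is JavaScript and may format its output differently.

```python
from typing import Optional


def annotation_to_markdown(
    kind: str,
    text: Optional[str] = None,
    comment: Optional[str] = None,
    color: Optional[str] = None,
) -> str:
    """Render one annotation using only the fields listed for its type.

    Hypothetical layout: quoted text, then color, then the comment.
    """
    lines = []
    # Highlights and underlines carry the selected text.
    if kind in ("highlight", "underline") and text:
        lines.append(f"> {text}")
    # Only highlights report a color in the table above.
    if kind == "highlight" and color:
        lines.append(f"Color: {color}")
    # Every type, including notes and images, may carry a comment.
    if comment:
        lines.append(comment)
    return "\n".join(lines)
```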
| Hex Code | Label |
|---|---|
| #ffd400 | yellow |
| #ff6666 | red |
| #5fb236 | green |
| #2ea8e5 | blue |
| #a28ae5 | purple |
| #e56eee | magenta |
| #f19837 | orange |
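
The color table maps directly onto a lookup dictionary. A minimal sketch follows, assuming that unknown colors should fall back to the raw hex code; that fallback and the helper name color_label are illustrative choices, not documented behavior.

```python
# Hex codes and labels copied from the table above.
COLOR_LABELS = {
    "#ffd400": "yellow",
    "#ff6666": "red",
    "#5fb236": "green",
    "#2ea8e5": "blue",
    "#a28ae5": "purple",
    "#e56eee": "magenta",
    "#f19837": "orange",
}


def color_label(hex_code: str) -> str:
    """Return the label for a hex code, or the code itself if unknown."""
    # Normalize case and whitespace so "#FFD400" matches "#ffd400".
    return COLOR_LABELS.get(hex_code.strip().lower(), hex_code)
```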