- Tracks announcements of published archives.
- Serves archive list to requesting clients.
- CIDs are derived from archives.
- Indexes CIDs on IPFS by converting their payload to text with pdftotext (poppler-utils) and sending the text to an LLM, which returns a title, field, topic, niche, and 10 keywords.

Build from source:

    go build -o ipfs-archive-tracker .

A pre-built image is published to GHCR on every push to main and on version tags:

    docker pull ghcr.io/gipplab/ipfs-archive-tracker:main

Or build locally:

    docker build -t ipfs-archive-tracker .

Run the container (persist data and expose only the public API):

    docker run -d --name ipfs-archive-tracker \
      -p 8385:8385 \
      -v "$(pwd)/tracker-data:/data" \
      -v "$(pwd)/.api_key:/data/.api_key:ro" \
      ghcr.io/gipplab/ipfs-archive-tracker:main \
      -o /data -public-port 8385 -port 8384

See docker-compose.yml for a full Compose example. If Kubo runs in another container or network, set -kubo to that service URL instead.

Indexing needs an API key. Put it in a .api_key file in the -o directory or the current working directory, or set the SAIA_API_KEY environment variable.
Two servers run by default: an internal one (Web UI, 127.0.0.1:8384) and a public one (Archives API, 0.0.0.0:8385). Expose only the public port externally.

    # Start both servers; CIDs are taken from archives.json (from announce / IPNS refresh):
    ./ipfs-archive-tracker
    # CLI: index pending CIDs from archives and exit (no web UI):
    ./ipfs-archive-tracker -cli
    # Custom ports:
    ./ipfs-archive-tracker -o ./index-data -port 9000 -public-port 9001

All settings can be provided as CLI flags or environment variables. Flags take precedence.

| Flag | Env var | Default | Description |
|---|---|---|---|
| `-o` | `TRACKER_DATA_DIR` | `.` | Output directory for index files |
| `-gateway` | `TRACKER_GATEWAY` | `https://ipfs.io` | IPFS gateway base URL |
| `-kubo` | `KUBO_API` | `http://localhost:5001` | Kubo API URL for IPNS resolution |
| `-workers` | `TRACKER_WORKERS` | `4` | Number of concurrent processing workers |
| `-model` | `TRACKER_MODEL` | `meta-llama-3.1-8b-instruct` | LLM model for keyword extraction |
| `-fallback-model` | `TRACKER_FALLBACK_MODEL` | `llama-3.3-70b-instruct` | Model to try if primary returns 429 |
| `-api-base` | `TRACKER_API_BASE` | `https://chat-ai.academiccloud.de/v1` | OpenAI-compatible API base URL |
| `-spacing` | `TRACKER_SPACING` | `100ms` | Minimum delay between dispatching CIDs |
| `-cli` | `TRACKER_CLI` | `false` | Index pending CIDs from archives and exit (no web UI) |
| `-port` | `TRACKER_PORT` | `8384` | Web UI port (localhost only) |
| `-public-port` | `TRACKER_PUBLIC_PORT` | `8385` | Public API port (archives only, bind all interfaces) |
| `-refresh` | `TRACKER_REFRESH` | `10m` | Interval to refresh IPNS for all archives (0 to disable) |

With the default -api-base (Chat AI / Academic Cloud), rate limits from the API are 1000 req/min, 10000/hour, 50002/day. Current models and exact API IDs: see docs/chat-ai-api.md or GET https://chat-ai.academiccloud.de/v1/models (with your API key).
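
When a rate limit is hit, the API answers with HTTP 429, which is the case `-fallback-model` is meant for. The following sketch shows one way such a fallback could work against an OpenAI-compatible `/chat/completions` endpoint. All function and type names are hypothetical, the single-retry policy is an assumption, and `main` talks to a local stub server rather than the real API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// complete posts one user prompt and returns the HTTP status and raw body.
func complete(apiBase, apiKey, model, prompt string) (int, []byte, error) {
	payload, err := json.Marshal(chatRequest{Model: model, Messages: []chatMessage{{Role: "user", Content: prompt}}})
	if err != nil {
		return 0, nil, err
	}
	req, err := http.NewRequest(http.MethodPost, apiBase+"/chat/completions", bytes.NewReader(payload))
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// completeWithFallback tries the primary model and, on HTTP 429, retries once
// with the fallback model. It returns the model that produced the answer.
func completeWithFallback(apiBase, apiKey, primary, fallback, prompt string) (string, []byte, error) {
	model := primary
	status, body, err := complete(apiBase, apiKey, model, prompt)
	if err != nil {
		return "", nil, err
	}
	if status == http.StatusTooManyRequests && fallback != "" {
		model = fallback
		if status, body, err = complete(apiBase, apiKey, model, prompt); err != nil {
			return "", nil, err
		}
	}
	if status != http.StatusOK {
		return "", nil, fmt.Errorf("model %s: unexpected status %d", model, status)
	}
	return model, body, nil
}

// newStubServer returns a local test server that rate-limits one model name.
func newStubServer(limited string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req chatRequest
		_ = json.NewDecoder(r.Body).Decode(&req)
		if req.Model == limited {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		fmt.Fprint(w, `{"choices":[]}`)
	}))
}

func main() {
	srv := newStubServer("meta-llama-3.1-8b-instruct")
	defer srv.Close()
	model, _, err := completeWithFallback(srv.URL, "test-key",
		"meta-llama-3.1-8b-instruct", "llama-3.3-70b-instruct", "hello")
	fmt.Println("answered by:", model, "error:", err)
}
```
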
PDF→text runs in a subprocess (pdftotext from poppler-utils); only one conversion runs at a time so peak RAM stays bounded. Optionally set GOMEMLIMIT=8GiB.

| Method | Path | Description |
|---|---|---|
| POST | `/api/archives/announce` | Send a new archive. Response: `{ "status": "ok" }`. Use `GET /api/archives` to fetch the list. |
| GET | `/api/archives` | Get the full list of archive IDs and CID counts. Response: the archives array. |

Announce body: either { "archive_id": "k51..." } (the tracker resolves the ID via IPNS and reads the CIDs from the resolved document), or { "archive_id": "...", "cids": ["Qm...", ...] }, or the same with a rich cids array of objects. The gateway used is the one set by -gateway. Example (public port):

    curl -s -X POST http://localhost:8385/api/archives/announce \
      -H "Content-Type: application/json" \
      -d '{"archive_id":"k51qzi5uqu5dkq7ek83z2tb3muanwx7y59e5ixuk0mhume92aq98dnystqo5ih"}'

Expose only port 8385 (not 8384).

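
The same announce call can be made from Go. This is a sketch based only on the request and response shapes shown above. The type and function names are hypothetical, the rich object form of `cids` is not modelled, and accepting any 2xx status is an assumption.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// announceRequest mirrors the two simple body forms: archive_id alone, or
// archive_id plus a list of CID strings.
type announceRequest struct {
	ArchiveID string   `json:"archive_id"`
	CIDs      []string `json:"cids,omitempty"`
}

type announceResponse struct {
	Status string `json:"status"`
}

// announceBody builds the JSON body; cids is omitted when nil or empty.
func announceBody(archiveID string, cids []string) ([]byte, error) {
	return json.Marshal(announceRequest{ArchiveID: archiveID, CIDs: cids})
}

// announce posts the archive to the tracker's public API and checks for
// the { "status": "ok" } response.
func announce(baseURL, archiveID string, cids []string) error {
	body, err := announceBody(archiveID, cids)
	if err != nil {
		return err
	}
	resp, err := http.Post(baseURL+"/api/archives/announce", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("announce: unexpected status %s", resp.Status)
	}
	var out announceResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return fmt.Errorf("announce: decoding response: %w", err)
	}
	if out.Status != "ok" {
		return fmt.Errorf("announce: tracker replied with status %q", out.Status)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: announce <archive_id>")
		return
	}
	// Assumes a tracker listening on the default public port.
	if err := announce("http://localhost:8385", os.Args[1], nil); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("announced", os.Args[1])
}
```
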
| File | Description |
|---|---|
| `keyword_index.json` | Indexed metadata keyed by CID (only persisted file) |

Example entry in keyword_index.json:

    {
      "bafyrei...": {
        "cid": "bafyrei...",
        "title": "Attention Is All You Need",
        "broad_field": "Computer Science",
        "sub_topic": "Machine Learning",
        "research_niche": "Transformer Architectures for Sequence Modeling",
        "keywords": ["transformer", "attention mechanism", "..."],
        "indexed_at": "2026-03-04T14:30:00Z"
      }
    }
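
To consume the index from Go, the entry shape above maps onto a struct like the one below. The type name `indexEntry` and the helper `parseIndex` are illustrative; only the JSON field names come from the example entry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// indexEntry mirrors one value in keyword_index.json.
type indexEntry struct {
	CID           string    `json:"cid"`
	Title         string    `json:"title"`
	BroadField    string    `json:"broad_field"`
	SubTopic      string    `json:"sub_topic"`
	ResearchNiche string    `json:"research_niche"`
	Keywords      []string  `json:"keywords"`
	IndexedAt     time.Time `json:"indexed_at"` // RFC 3339 timestamp
}

// parseIndex decodes the whole file into a map keyed by CID.
func parseIndex(data []byte) (map[string]indexEntry, error) {
	var idx map[string]indexEntry
	if err := json.Unmarshal(data, &idx); err != nil {
		return nil, err
	}
	return idx, nil
}

func main() {
	// Assumes the tracker was started with the default -o (current directory).
	data, err := os.ReadFile("keyword_index.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	idx, err := parseIndex(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for cid, e := range idx {
		fmt.Printf("%s\t%s\t%s / %s\n", cid, e.Title, e.BroadField, e.SubTopic)
	}
}
```
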