Inspired by Komga-Enhanced’s integration of download capabilities, Archiflow reimagines the relationship between content storage and retrieval. Instead of bolting a downloader onto an existing reader, this repository provides a self-orchestrating archival pipeline that treats every file as a node in a knowledge graph, not just a static asset.
The internet’s current model is a firehose: data pours in, sits in silos, and decays. Archiflow reframes that paradigm—imagine a library where every book not only sits on the shelf but also knows where it came from, how it connects to other books, and can hand-deliver itself to you in the format you need, when you need it. This is archive-as-process, archive-as-organism.
“Stop storing. Start orchestrating.”
Archiflow is an intelligent archival middleware that bridges source detection, format normalization, semantic indexing, and adaptive distribution. It is not a downloader. It is a curation engine that watches, captures, structures, and redistributes content without requiring constant manual intervention.
Designed for archivists, researchers, media collectors, and digital historians, Archiflow transforms passive cold storage into a living, queryable, streamable system. You define the rules—it handles the rest.
| Traditional Approach | Archiflow Approach |
|---|---|
| Manual file organization | Rule-based auto-classification |
| Siloed storage | Graph-linked retrieval |
| Rigid formats | On-the-fly transformation |
| Static archives | Self-updating datasets |
Archiflow mirrors the neuron: each piece of content is a synapse, connected to others via tags, metadata, and lineage. Downloads are not endpoints—they are temperature events within a larger circuit.
Detect and capture content from heterogeneous sources without hardcoded parsers. The system learns source signatures over time and adjusts its extraction strategies accordingly.
- Source Fingerprinting – Recognizes patterns across platforms
- Smart Retry Logic – Exponential backoff with contextual awareness
- Integrity Verification – SHA-256 checksumming with Merkle tree validation
- Rate-Limit Compliance – Built-in politeness zones to avoid flagging
Converts raw incoming data into a canonical form. Whether you receive a compressed archive, a streaming video chunk, or a structured document, Archiflow unifies the representation.
- Format Unification – Auto-converts between 40+ formats
- Metadata Enrichment – Extracts and infers tags, descriptions, and relationships
- Deduplication – Content-level hashing, not just filename matching
- Language Detection – Identifies multilingual content and associates locale markers
No deep folder hierarchies. Every item becomes a node with typed edges.
- Relationship Mapping – Series, volumes, chapters, editions, derivatives
- Version Tracking – Audit trail for every transformation event
- Queryable API – SPARQL-like queries with RESTful endpoints
- Temporal Tagging – “Harvested at,” “Last accessed,” “Derived from” timestamps
The system does not merely serve files—it prepares them for consumption across devices and contexts.
- Streaming Ready – On-the-fly transcoding for bandwidth-constrained clients
- Multilingual Delivery – Serves localized versions if multiple language tracks exist
- Adaptive Resolution – Scalable output for mobile, tablet, or 4K environments
- Offline Bundles – Distills entire collections into portable archives
Humans are not required in the loop. Archiflow runs as a daemon, self-healing when network interruptions occur.
- Scheduled Scanning – Cron-free, human-readable timing DSL
- Retry Queues – Dead-letter buckets with manual review triggers
- Health Dashboard – Real-time web UI with websocket updates
- Alerts via Webhook – Integration with Slack, Discord, email, or custom endpoints
┌──────────────────┐ ┌─────────────────────┐ ┌────────────────┐
│ Ingest Sink │────▶│ Normalization Pipe │────▶│ Graph Store │
│ (listeners) │ │ (parsers/cleaners) │ │ (nodes+edges) │
└──────────────────┘ └─────────────────────┘ └───────┬────────┘
│
▼
┌──────────────────┐ ┌─────────────────────┐ ┌────────────────┐
│ Distribution │◀────│ Query Resolver │◀────│ Index Engine │
│ (stream/bundle) │ │ (API + UI) │ │ (Lucene) │
└──────────────────┘ └─────────────────────┘ └────────────────┘
Each module runs as an independent process communicating over gRPC. Failure in one does not crash the system—the graph remains intact.
- Digital Library Management – Organize research papers, books, and datasets with automatic citation extraction
- Media Collection Curation – Keep movie series, anthology books, and soundtracks linked by edition and release year
- Historical Web Archiving – Capture evolving web resources with change detection and rollback snapshots
- Enterprise Document Flow – Route ingested PDFs through OCR, summarization, and compliance tagging
- Personal Knowledge Base – Sync clippings, highlights, and annotations across devices using semantic search
- A POSIX-compatible environment (macOS, Linux, WSL2)
- Python 3.11+ runtime
- 4 GB RAM recommended for full graph indexing
- Port 8080 available for web interface
- Download the latest release bundle from this repository.
- Run the initialization script:
./archiflow init - Point the configuration wizard at your source directories.
- Access the control panel at
http://localhost:8080.
No package managers, no cloud dependencies, no external databases. The entire system runs locally with an embedded graph engine.
# archiflow.yml
ingest:
sources:
- type: local
path: /data/incoming
- type: rss
url: "https://example.com/feed.xml"
interval: 3600
normalization:
converters:
- from: *.cbr
to: cbz
- from: *.epub
to: html-single
graph:
auto_link:
- field: series
threshold: 0.85
- field: author
exact: true
distribution:
web:
port: 8080
cors_origins: ["http://localhost:3000"]Plugins allow you to add custom ingesters, normalizers, exporters, and even graph rules.
| Plugin Type | Hook | Example |
|---|---|---|
| Ingester | on_receive |
Monitor S3 bucket |
| Normalizer | on_parse |
OCR scanned PDFs |
| Exporter | on_serve |
Generate OPDS catalog |
| Rule | on_link |
Auto-tag by language family |
Plugin directory: ~/.archiflow/plugins/
- All content stays local unless explicitly configured for remote sync.
- No telemetry, no anonymous usage stats, no external analytics.
- Optional encryption-at-rest using age encryption keys.
- Access tokens (not passwords) for API authentication.
This project is released under the MIT License. You are free to use, modify, and distribute it, provided the original copyright notice is included.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
View Full License — © 2026 The Archiflow Contributors
Archiflow is a framework for organizing, transforming, and distributing your own content. Users are solely responsible for ensuring they have the legal right to archive, modify, or redistribute any material processed through this system. The authors assume no liability for misuse, copyright infringement, or unauthorized access facilitated by configuration errors. This software is provided “as is” without warranty of any kind, express or implied.
- Issues: Use the GitHub issue tracker for bugs and feature requests.
- Discussions: Start a Q&A thread for configuration help.
- Wiki: Contribute guides, recipes, and plugin documentation.