TopicKnowledgeCrawler

TopicKnowledgeCrawler is the crawler and preparation layer behind infl0. It discovers content from sources such as RSS feeds, HTML listings, and podcast feeds, turns that content into normalized content items, and prepares payloads for infl0.

The canonical implementation is the portable step layer in tkcrawler.steps. n8n uses these steps in Python Code nodes, but the same steps can be composed from plain Python or another workflow system.
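As a rough illustration of what composing steps from plain Python could look like, the sketch below chains two locally defined stand-in functions. The names `normalize_source_item`, `plan_dispatch_item` and `run_steps`, the `(item, context)` signature, and the `dispatch` field are assumptions made for this example; they are not taken from the actual tkcrawler.steps API.

```python
from typing import Any, Callable

Item = dict[str, Any]
Step = Callable[[Item, dict[str, Any]], Item]


def normalize_source_item(item: Item, context: dict[str, Any]) -> Item:
    """Stand-in step (hypothetical): trim the URL and lowercase the type."""
    return {
        **item,
        "url": item["url"].strip(),
        "type": item["type"].strip().lower(),
    }


def plan_dispatch_item(item: Item, context: dict[str, Any]) -> Item:
    """Stand-in step (hypothetical): mark sources with status "ready" for dispatch."""
    return {**item, "dispatch": item.get("source_status") == "ready"}


def run_steps(item: Item, context: dict[str, Any], steps: list[Step]) -> Item:
    """Apply each step in order, feeding its output into the next step."""
    for step in steps:
        item = step(item, context)
    return item


source = {
    "crawl_key": "https://example.com/feed.xml",
    "type": " RSS ",
    "url": "https://example.com/feed.xml ",
    "source_status": "ready",
}
context = {"now": "2026-05-13T10:00:00+00:00"}
result = run_steps(source, context, [normalize_source_item, plan_dispatch_item])
```

Keeping each step a plain function over an item and a context is what would let the same code run inside an n8n Code node, a CLI call, or a Python loop.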

What This Project Does

Normalize source rows from infl0 or local tables.
Inspect source policies such as HTTP cache headers, RSS TTL and retry hints.
Plan crawl dispatch based on source status and policy.
List candidates without fetching every detail page.
Filter candidates against history and refresh windows.
Fetch and finalize articles or podcast episodes.
Build POST /api/crawler/ingest payloads for infl0.
Build POST /api/crawler/source-status payloads for infl0 operator/source health views.
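
The policy inspection and candidate filtering items in the list above can be pictured with a small sketch. Everything in it is illustrative: `refresh_interval`, `filter_candidates`, the `history` mapping, and the one-hour default are hypothetical, and tkcrawler may derive and apply refresh windows differently.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def refresh_interval(cache_control: Optional[str],
                     rss_ttl_minutes: Optional[int]) -> timedelta:
    """Pick a refresh window from HTTP Cache-Control max-age or RSS <ttl>.

    Hypothetical policy: use the longer of the two hints, default to one hour.
    """
    hints = []
    if cache_control:
        match = re.search(r"max-age=(\d+)", cache_control, re.IGNORECASE)
        if match:
            hints.append(timedelta(seconds=int(match.group(1))))
    if rss_ttl_minutes is not None:
        hints.append(timedelta(minutes=rss_ttl_minutes))
    return max(hints) if hints else timedelta(hours=1)


def filter_candidates(candidates, history, now, window):
    """Keep URLs never fetched before, or last fetched at least one window ago."""
    return [
        url for url in candidates
        if url not in history or now - history[url] >= window
    ]


now = datetime(2026, 5, 13, 10, 0, tzinfo=timezone.utc)
window = refresh_interval("public, max-age=600", rss_ttl_minutes=30)
history = {
    "https://example.com/a": now - timedelta(minutes=5),  # still inside the window
    "https://example.com/b": now - timedelta(hours=2),    # window has elapsed
}
due = filter_candidates(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    history, now, window,
)
```

Filtering at the candidate stage is what makes it possible to skip detail fetches for items that were seen recently.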

All crawler implementation lives under tkcrawler. New integrations should use tkcrawler.steps for explicit orchestration or tkcrawler.pipeline for the reference Python flow.

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .

Run tests:

TLDEXTRACT_CACHE=/private/tmp/tkcrawler-tldextract-cache .venv/bin/python3.13 -m pytest

Use the Python version available in your environment if it differs from .venv/bin/python3.13.

Step CLI

Every portable step can be called through the CLI runner:

python -m tkcrawler.cli.run_step normalize_source < input.json
python -m tkcrawler.cli.run_step plan_dispatch < input.json
python -m tkcrawler.cli.run_step list_candidates < input.json
python -m tkcrawler.cli.run_step fetch_detail < input.json
python -m tkcrawler.cli.run_step finalize_item < input.json
python -m tkcrawler.cli.run_step build_ingest_body < input.json

CLI input accepts either a raw item object or an envelope:

{
  "item": {
    "crawl_key": "https://example.com/feed.xml",
    "type": "rss",
    "url": "https://example.com/feed.xml"
  },
  "context": {
    "now": "2026-05-13T10:00:00+00:00"
  }
}
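
One way a runner could tell the two input shapes apart is to check for a top-level `item` key. The sketch below is an assumption about that logic, built around a hypothetical `unwrap_input` helper; the actual behaviour of tkcrawler.cli.run_step may differ.

```python
import json
from typing import Any


def unwrap_input(raw: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (item, context) for either a raw item or an envelope (hypothetical)."""
    data = json.loads(raw)
    if isinstance(data, dict) and isinstance(data.get("item"), dict):
        # Envelope form: {"item": {...}, "context": {...}}
        return data["item"], data.get("context") or {}
    # Raw form: the whole object is the item and there is no context.
    return data, {}


envelope = json.dumps({
    "item": {
        "crawl_key": "https://example.com/feed.xml",
        "type": "rss",
        "url": "https://example.com/feed.xml",
    },
    "context": {"now": "2026-05-13T10:00:00+00:00"},
})
raw_item = json.dumps({
    "crawl_key": "https://example.com/feed.xml",
    "type": "rss",
    "url": "https://example.com/feed.xml",
})

item_a, context_a = unwrap_input(envelope)
item_b, context_b = unwrap_input(raw_item)
```

Both calls yield the same item; only the envelope form carries a context such as a fixed `now` timestamp.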

Python Flow Example

For simple Python usage, compose the step flow through tkcrawler.pipeline:

from tkcrawler.pipeline import crawl_source_ingest_bodies

source = {
    "crawl_key": "https://example.com/feed.xml",
    "type": "rss",
    "url": "https://example.com/feed.xml",
    "source_status": "ready",
}

payloads = crawl_source_ingest_bodies(source)

A runnable no-network example is available at examples/python_step_flow.py.

n8n

n8n orchestration is documented in n8n/README.md. The n8n workflows use the same *_item(...) functions from tkcrawler.steps that a plain Python flow would use.

Development

Local environment template: .env.example
Contributor guide: CONTRIBUTING.md

Workflow Diagram

The crawler workflow is documented as domain steps so it can be implemented by n8n, the CLI or another orchestrator. See docs/WORKFLOW_DIAGRAM.md for the Mermaid diagram.

infl0 Contracts

Ingest payload: docs/INGEST_API.md
Source health payload: docs/SOURCE_STATUS_API.md
Content item model: docs/CONTENT_ITEM_MODEL.md

Architecture Notes

Target architecture: docs/TARGET_ARCHITECTURE.md
Current migration notes: docs/IMPLEMENTATION_PLAN.md
Planned changes: docs/PLANNED_CHANGES.md
Architecture decisions: docs/adr/
