source feeds

GitHub Actions edited this page Jun 2, 2026 · 3 revisions

Source Feeds

This page explains how to configure a source feed: a direct upstream feed that is downloaded over HTTP or HTTPS, or read from a local file.

What a source feed is

A source feed fetches content from an external URL on a fixed cadence, processes the raw download through a pipeline of transformations, and produces a normalized IP set.

The current feed pipeline is IPv4-oriented. Use ipv: ipv4 for source feeds in normal operation. The standalone iprange CLI has an IPv6 mode, but public feed search, enrichment, and critical-infrastructure overlap are IPv4-only in this release.

Key fields

| Field | Required | Description |
|---|---|---|
| name | yes (YAML key) | Unique feed identifier; used as filename, URL slug, and reference key |
| url | yes | Download URL: https://, http://, or file:/// |
| frequency | no | Minutes between automatic checks. If omitted or 0, the source is not auto-scheduled. |
| output | yes | ipset (one IP per line) or netset (one CIDR per line) |
| category | yes for normal public feeds | Category key from categories.yaml; required for normal public taxonomy participation. |
| processor | yes | List of transformation steps applied to the download |
| info | recommended | Markdown description shown on the public feed-detail page |
| license | recommended | SPDX identifier or free-text license |
| maintainer | recommended | Feed maintainer name |

URL types

| URL form | Behavior |
|---|---|
| https://... | Standard HTTPS download |
| http://... | HTTP download (use HTTPS when available) |
| file:///absolute/path | Read a local file |

Environment variable interpolation is supported in URLs:

url: https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-ASN&license_key=${MAXMIND_LICENSE_KEY}&suffix=tar.gz

If the real URL contains a secret, set attributes.public_url to the sanitized URL you want users and APIs to see.
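
As an illustration of how the two fields might sit together, the fragment below pairs an interpolated download URL with a sanitized public URL. The feed name example_geo_asn and the public_url value are assumptions made for this sketch; only the url line and the attributes.public_url field name come from this page.

```yaml
sources:
  example_geo_asn:   # hypothetical feed name
    # Real download URL; ${MAXMIND_LICENSE_KEY} is filled in from the environment.
    url: https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-ASN&license_key=${MAXMIND_LICENSE_KEY}&suffix=tar.gz
    attributes:
      # Sanitized URL shown to users and APIs (illustrative value, secret removed).
      public_url: https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-ASN&suffix=tar.gz
    # output, category, processor, and the other fields are omitted here
```

Keeping the license key in an environment variable means the catalog YAML itself never contains the secret.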

Custom download options

Most feeds use the default HTTP downloader. For form-style exports or APIs that require headers, set curl-like options under attributes.downloader_options:

attributes:
  downloader_options: >-
    --data 'export_type=text'
    --header 'Accept: text/plain'

The downloader_options value is parsed literally. The url field supports environment variable templates, but downloader option values are not environment-expanded, so a ${VAR} placeholder there is not replaced. Do not put real secrets in catalog YAML.

Supported options are:

  • --data, --data-raw, or -d
  • --request or -X
  • --referer
  • --user or -u
  • --header or -H

The --data=..., --request=..., --referer=..., and --user=... forms are also accepted. Header values must use the separated form, for example --header 'Accept: text/plain'.
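
A sketch that mixes the two accepted forms is shown below. The POST method and the example.org referer are placeholder values chosen for illustration, not settings taken from a real feed.

```yaml
attributes:
  downloader_options: >-
    --request=POST
    --referer=https://example.org/export
    --data 'export_type=text'
    --header 'Accept: text/plain'
```

The folded scalar (>-) joins the four lines into a single option string separated by spaces. The header keeps the separated form because, per the rule above, header values cannot use the = form.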

Frequency

frequency sets the number of minutes between automatic checks. Common values:

| Value | Meaning |
|---|---|
| 5 | Check every 5 minutes (aggressive, for fast-changing feeds) |
| 30 | Every 30 minutes |
| 1440 | Daily |
| 10080 | Weekly |
| 0 | No auto-scheduling (static feeds, artifact children) |

A frequency of 0 does not mean the feed is dead. The scheduler still detects configuration changes and queues reprocessing when the source definition changes.
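
For instance, a static list read from a local file could be declared along the following lines. The feed name, file path, and category key are hypothetical placeholders; a real definition needs a category key that exists in categories.yaml.

```yaml
sources:
  local_static_list:                              # hypothetical feed name
    url: file:///srv/feeds/local_static_list.txt  # hypothetical absolute path
    frequency: 0                                  # not auto-scheduled
    ipv: ipv4
    output: ipset
    processor:
      - remove_comments
    category: example_category                    # placeholder category key
```

Editing this definition later, for example by changing the processor list, would still queue reprocessing even though no periodic checks run.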

Processors

Processors form a pipeline in which each step transforms the data before passing it to the next. Common processors:

| Processor | Purpose |
|---|---|
| remove_comments | Strip comment lines |
| extract_ipv4_cidr | Extract IPv4 CIDRs from structured text |
| dshield_format | Parse DShield block.txt format |
| torproject_exits | Parse Tor exit-addresses format |
| passthrough | No transformation |

The processor field defines the normalized-output pipeline. When processor is present, it is the pipeline the daemon runs. processor_raw is a legacy catalog field: if processor is omitted, the daemon treats processor_raw as a single legacy processor name; otherwise processor_raw is only preserved as metadata for compatibility and auditing. It does not define a separate raw-archive pipeline.

See Processor Reference for the full supported processor list and arguments.
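
To make the step-by-step behavior concrete, here is a minimal sketch of a two-step pipeline. The feed name is hypothetical, and chaining these two particular processors is an assumption for illustration; whether a given upstream needs both depends on its format.

```yaml
sources:
  example_feed:             # hypothetical feed name
    processor:
      - remove_comments     # step 1: strip comment lines from the download
      - extract_ipv4_cidr   # step 2: extract IPv4 CIDRs from the output of step 1
    # url, output, category, and the other fields are omitted here
```

Steps run in list order, so the second processor only ever sees what the first one emitted.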

Simple source example

sources:
  feodo:
    license: CC0 1.0
    url: https://feodotracker.abuse.ch/downloads/ipblocklist_recommended.txt
    frequency: 30
    ipv: ipv4
    output: ipset
    processor:
      - remove_comments
    processor_raw: remove_comments
    category: malware_infrastructure
    info: '[Abuse.ch Feodo tracker](https://feodotracker.abuse.ch) trojan IP blocklist'
    maintainer: Abuse.ch
    maintainer_url: https://feodotracker.abuse.ch/

Complex source example

sources:
  dshield:
    license: CC BY-NC-SA 4.0
    url: https://feeds.dshield.org/block.txt
    frequency: 10
    history:
      - 1440
      - 10080
      - 43200
    ipv: ipv4
    output: netset
    processor:
      - dshield_format
    processor_raw: dshield_parser
    category: intrusion
    info: '[DShield.org](https://dshield.org/) top 20 attacking class C subnets'
    maintainer: DShield.org
    maintainer_url: https://dshield.org/

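This source checks every 10 minutes, produces a netset, declares history windows of 1440, 10080, and 43200 minutes (1 day, 7 days, 30 days), and parses the download with the dshield_format processor. Because processor is present, processor_raw: dshield_parser is kept only as legacy metadata.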

File location

Each source feed lives in sources/<category>/<name>.yaml. The category subdirectory must match the category: field in the source definition.
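
Applied to the feodo feed defined on this page, the rule would look roughly like this. The path in the comment is derived from the sources/<category>/<name>.yaml pattern and has not been checked against a real catalog.

```yaml
# File: sources/malware_infrastructure/feodo.yaml
sources:
  feodo:                               # <name>: matches the file name feodo.yaml
    category: malware_infrastructure   # <category>: matches the subdirectory name
    # remaining fields as in the simple source example
```

By the same pattern, the dshield feed (category: intrusion) would live in sources/intrusion/dshield.yaml.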
