Skip to content

purijs/blazerules

Repository files navigation

BlazeRules

Docs GitHub C++20 Python Build License

BlazeRules evaluates YAML rules over high-volume event batches. Use it from Python, embed it in C++, or run the local agent to read logs and event streams from HTTP, stdin, file tails, Kafka, Arrow, Avro, Protobuf, S3, or local files.

Repository: github.com/purijs/blazerules · Documentation: blazerules.readme.io

License: source-available under FSL-1.1-ALv2; commercial/OEM licensing is available for managed-service, competing SaaS, and enterprise use.

The engine is batch-first internally. Ingestion adapters collect events into microbatches, infer or bind a schema, evaluate rules, and emit compact decisions or dead-letter records.

Install

pip install blazerules

The Python package exposes blazerules and blazerules_io. It includes the core rule engine, IO helpers, ONNX scoring, the local ingest agent, and the local dashboard. numpy and pyarrow are installed as Python dependencies.

python -c "import blazerules, blazerules_io; print(blazerules.__version__, blazerules.simd_backend())"

What BlazeRules Can Ingest

Input How to use it Typical use
JSON / NDJSON bytes RuleEngine.evaluate_ndjson(...) API payloads, application events, log lines already formatted as JSON.
Python lists of JSON strings RuleEngine.evaluate_messages(...) Small integrations and local scripts.
PyArrow / Arrow batches RuleEngine.evaluate_batch(...) Typed pipelines, Parquet/Arrow data, high-throughput paths.
Kafka blazerules_io.KafkaConsumer or run_stream(...) Microbatch consume → evaluate → produce decisions.
HTTP logs/events blazerules_agent --input http or instances[].input.type: http Apps POST NDJSON to /v1/logs.
stdin blazerules_agent --input stdin Pipe terminal output or process logs into BlazeRules.
File tail blazerules_agent --input file_tail --path app.log Pod logs, stdout/stderr files, node-local log files.
Plain text logs wrap each line as JSON first Unstructured terminal/stdout/stderr text.
Kubernetes logs Helm chart / DaemonSet file-tail mode Tail /var/log/containers/... and write decisions/DLQ.
Debezium CDC blazerules_io.unwrap_debezium(...) Evaluate database change events.
Arrow IPC blazerules_io.ArrowIpcDecoder Binary columnar frames.
Avro blazerules_io.AvroDecoder Schema-based binary events.
Protobuf blazerules_io.ProtobufDecoder Descriptor-backed binary events.
S3 / local files read_ndjson_bytes(...), read_record_batches(...) Offline jobs, backtests, lookup/model/rule loading.

All paths converge on the same batch evaluation engine. The adapters differ in how they collect and decode records; rule execution stays the same.

Quick Python Example

import blazerules

engine = blazerules.RuleEngine()
engine.load_rules("rules.yaml")

payload = b"""
{"event_id":"e1","card_token":"card_1","amount":2500.0,"device_type":"emulator","country_code":"US"}
{"event_id":"e2","card_token":"card_2","amount":50.0,"device_type":"ios","country_code":"GB"}
"""

result = engine.evaluate_ndjson(payload)
print(result.n_records, result.n_matched)
print(result.decisions)
print(result.match_counts)

Rules can be loaded before a schema exists. The first evaluated batch samples rule-referenced fields and binds the inferred schema. You can still pass an explicit schema when you need strict control.

Local Agent For Logs And HTTP Events

Run an HTTP ingest endpoint:

blazerules_agent \
  --rules rules.yaml \
  --input http \
  --host 127.0.0.1 \
  --port 9480 \
  --batch-size 4096 \
  --flush-ms 50 \
  --output ndjson \
  --output-path decisions.ndjson

curl -X POST http://127.0.0.1:9480/v1/logs \
  --data-binary $'{"event_id":"e1","message":"payment error","amount":99.5}\n'

Pipe stdin:

journalctl -u checkout -f -o json | \
  blazerules_agent --rules rules.yaml --input stdin --output stdout

Tail a file:

blazerules_agent \
  --rules rules.yaml \
  --input file_tail \
  --path /var/log/containers/checkout.log \
  --output ndjson \
  --output-path decisions.ndjson

Each agent input batches records by batch_size or flush_ms, evaluates the batch, and writes compact decision events. Bad records can be counted, skipped, or written to a dead-letter NDJSON file depending on ingest settings.

Decisions, DLQ, And Dashboard

BlazeRules returns per-record decisions directly in Python/C++. The agent can also write an NDJSON decision log for downstream routing:

{"ts_ms":1782150000000,"batch_row":0,"decision":"REVIEW","score":72.0,"risk_band":"HIGH","winning_rule_id":"high_risk_payment"}

Dead-letter records keep malformed or type-bad input out of the hot path while preserving enough context to debug the producer. The dashboard reads decision logs, dead-letter logs, metrics, benchmark output, and rule summaries.

BlazeRules dashboard overview

Documentation

Start here:

Build From Source

Most users start with pip install blazerules. Build from source when you need to change native flags, embed the C++ library directly, or produce your own platform image.

cmake --preset linux-x86_64-release-dispatch
cmake --build --preset linux-x86_64-release-dispatch -j

Build details, CMake options, C++ embedding, and architecture-specific notes are kept together in the documentation instead of spread through the getting-started path.

Arrow Evaluation

Use Arrow when upstream data is already typed or when JSON parsing is not what you want to measure.

import pyarrow as pa
import blazerules

batch = pa.record_batch({
    "card_token": pa.array(["card_1", "card_2"]),
    "amount": pa.array([2500.0, 50.0], type=pa.float32()),
    "device_type": pa.array(["emulator", "ios"]),
    "country_code": pa.array(["US", "GB"]),
    "account_age_days": pa.array([2, 400], type=pa.int32()),
    "hour_of_day": pa.array([1.5, 12.0], type=pa.float32()),
})

engine = blazerules.RuleEngine()
engine.load_rules("rules.yaml")
result = engine.evaluate_batch(batch)

Arrow batches may contain extra columns or different physical column order. BlazeRules projects rule-referenced columns by name. Nested Arrow struct fields use the same dotted names as JSON.

YAML Rule Format

Minimal shape:

schema_version: "2.1"

fields:
  card_token: {type: entity_key, nullable: false}
  amount: {type: float32, nullable: false}
  device_type:
    type: categorical
    values: [ios, android, web, emulator]

ruleset:
  name: Fraud Rules
  version: "1.0.0"
  rules:
    - id: high_amount_emulator
      action: block
      severity: HIGH
      weight: 40
      conditions:
        and:
          - field: amount
            op: gt
            value: 2000
          - field: device_type
            op: eq
            value: emulator

Top-level fields are optional hints, not a mandatory user schema. They are useful for entity keys, timestamps, nullability, and closed categorical values. Without hints, BlazeRules infers referenced fields from the first batch.

Logical forms:

conditions:
  and:
    - field: amount
      op: gt
      value: 1000
    - or:
        - field: country_code
          op: in
          values: [US, GB]
        - not:
            field: device_type
            op: eq
            value: ios

SQL expression form:

conditions:
  sql: "amount > 1000 AND any_match(items, x -> x.price > 100)"

See rules.yaml for a compact file covering every operator family supported by the parser, plus a top-level instances section for the local agent.

Operator Summary

Numeric:

gt lt gte lte eq neq
between_including between_excluding
gt_field lt_field gte_field lte_field eq_field neq_field

Categorical/entity:

eq neq in not_in

Null and empty:

is_null is_not_null is_empty is_not_empty

Strings and regex:

contains starts_with ends_with ci_eq
length_gt length_lt length_eq
regex not_regex

Arrays and flags:

contains_any contains_all intersects not_intersects
array_len_gt array_len_lt array_len_eq
flags_any flags_all flags_none
array_any

Network, temporal, geo:

ip_in_subnet ip_not_in_subnet
before after within_last day_of_week_in time_of_day_between
distance_gt distance_lt

Lookups, windows, derived values:

in_lookup not_in_lookup
window: count sum avg ratio min max
expr arithmetic: + - * /
vector_distance: cosine l2 dot
model_score

Nested Records And Arrays Of Objects

Nested JSON:

{"merchant":{"risk":{"score":91}}}

Rule:

conditions:
  field: merchant.risk.score
  op: gt
  value: 50

Array-of-object same-element semantics:

conditions:
  array_any:
    path: items
    where:
      and:
        - field: price
          op: gt
          value: 100
        - field: category
          op: eq
          value: electronics

This matches only when one item has both price > 100 and category == electronics.

Lookups

Rule files can reference CSV lookup sets:

lookups:
  blocked_merchants:
    type: string_set
    path: lookups/blocked_merchants.csv
  risky_bins:
    type: int_set
    path: lookups/risky_bins.csv
  vpn_ranges:
    type: ipv4_cidr_set
    path: lookups/vpn_ranges.csv

Supported lookup CSV columns:

Type Column
string_set value
int_set value
ipv4_cidr_set cidr

Relative lookup paths resolve relative to the rules file. Missing or invalid lookup files fail rule loading and do not replace an active hot-reloaded ruleset.

Decisions And Routing

Use decision groups instead of Python loops over every row:

result = engine.evaluate_ndjson(payload)

approved = result.indices_for_decision("APPROVE")
needs_review = result.indices_for_not_decision("APPROVE")
groups = result.grouped_decision_indices()

Useful result fields:

n_records
n_matched
decisions
decision_codes
scores
risk_bands
winning_rule_ids
match_counts
matched_indices
timing_ms
messages_processed
messages_skipped
error_counts
error_samples

Use OutputDetail.DECISIONS for routing and OutputDetail.BITMASKS only when downstream code needs per-rule bitmasks.

Windows

Window rules read prior batch history, inject derived window columns, evaluate the current batch, then write the current batch for future batches. This means batch N sees state committed by earlier batches. Same-batch repeated entity rows do not see earlier rows from that same batch by default.

Supported window functions:

count sum avg ratio min max

Hot Reload

engine.load_rules("rules.yaml")
engine.enable_hot_reload("rules.yaml", poll_interval_seconds=5)
status = engine.hot_reload_status()

Reload compiles and validates the new YAML/lookups off the hot path, then swaps atomically only on success. Failed reloads keep the previous ruleset active. Batches keep the ruleset observed at batch start.

Error Handling

Rules and schema activation are strict. Bad YAML, unknown fields, duplicate rule IDs, invalid regex, bad lookup files, and type/operator mismatches fail before activation.

Ingest defaults are tolerant:

config.ingest_error_mode = blazerules.IngestErrorMode.SKIP_AND_COUNT
config.type_mismatch_mode = blazerules.TypeMismatchMode.NULL_ON_TYPE_ERROR

Other modes:

SKIP_TO_DEAD_LETTER
HARD_FAIL
COERCE
HARD_FAIL_TYPE

SIMD Diagnostics

import blazerules

print(blazerules.simd_backend())
print(blazerules.cpu_features_summary())

cfg = blazerules.EngineConfig()
cfg.simd_backend_override = "auto"
cfg.enable_avx512 = False

AVX-512 is disabled for auto-selection unless explicitly enabled because some server CPUs reduce frequency under wide vectors. Measure before enabling.

IO Module

The full wheel and default source build include blazerules_io. If you maintain a custom lean build, keep -DBLAZERULES_IO=ON and enable the matching decoder flags:

BLAZERULES_IO_AVRO=ON
BLAZERULES_IO_PROTOBUF=ON

The IO module supports:

  • Kafka source/sink through librdkafka.
  • Debezium CDC unwrap.
  • Arrow IPC frames.
  • Avro binary records.
  • Protobuf binary records with descriptor sets.
  • Local and exact-object s3:// file reads.

Binary decoders produce Arrow RecordBatch objects and call evaluate_batch; they do not need to convert through JSON.

S3 Resources

Rules, lookup CSVs, ONNX models, and files can be loaded from exact-object s3://bucket/key URIs through the AWS CLI cache path.

import blazerules

blazerules.set_aws_profile("personal")
blazerules.set_aws_region("us-east-1")
blazerules.set_aws_endpoint_url("http://127.0.0.1:9000")

engine = blazerules.RuleEngine()
engine.load_rules("s3://bucket/rules/fraud.yaml")

Equivalent environment variables:

export BLAZERULES_AWS_PROFILE=personal
export BLAZERULES_AWS_REGION=us-east-1
export BLAZERULES_AWS_ENDPOINT_URL=http://127.0.0.1:9000

Dashboard And Agent

Dashboard:

cmake --build cmake-build-release --target blazerules_dashboard -j
./cmake-build-release/blazerules_dashboard --host 127.0.0.1 --port 9470 --rules rules.yaml

Agent:

cmake --build cmake-build-release --target blazerules_agent -j

The dashboard is read-only and unauthenticated. Bind to localhost unless you add your own network controls.

Performance Guidance

  • Use Release builds.
  • Batch records; do not call the engine per record.
  • Prefer Arrow when upstream data is already typed.
  • Use evaluate_ndjson(bytes_blob) for JSON streams.
  • Use evaluate_ndjson_padded(...) or evaluate_ndjson_file(...) when input is already simdjson-padded or memory-mapped.
  • Keep streaming batches sized for latency, commonly 2K-64K rows.
  • Use larger batches for throughput benchmarks.
  • Use OutputDetail.DECISIONS unless per-rule masks are required.
  • Keep partition/entity affinity for window-heavy streaming workloads.
  • Avoid huge unused JSON fields when chasing JSON throughput; skipped bytes are still bytes the parser must scan.

Compatibility

  • Library version: blazerules.__version__ / blazerules.BLAZERULES_VERSION.
  • YAML compatibility: blazerules.RULE_YAML_COMPATIBILITY.
  • Public API follows semantic versioning.
  • Rule operator behavior is stable within a compatible YAML major version.

License And Commercial Use

BlazeRules is source-available under the Functional Source License 1.1, Apache 2.0 Future License (FSL-1.1-ALv2).

You may use, modify, and redistribute BlazeRules for permitted purposes under the FSL. Offering BlazeRules or substantially similar functionality as a commercial hosted service, managed service, SaaS, API, platform, or competing product requires a commercial license.

The intended product split is:

  • public/source-available data plane: local engine, SDKs, local agent, local dashboard, examples, and integrations;
  • commercial/SaaS control plane: hosted rule editor, approval workflows, versioning, fleet management, audit logs, registry, enterprise connectors, certified builds, support, and BYOC/BYOL management.

For commercial, OEM, managed-service, enterprise, or BYOC/BYOL licensing, contact licensing@blazerules.dev.

See LICENSE, COMMERCIAL.md, TRADEMARKS.md, and THIRD_PARTY_NOTICES.md.

About

A SIMD-accelerated and arrow-native C++ rule engine, embeddable in pipelines. Evaluates declarative rules over event batches at memory-bandwidth speed

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages