Releases: ryan-evans-git/ematix-flow
v0.7.0 — Workflow trigger model + AllOf/AnyOf composite triggers
Workflow trigger model. v0.6.0's centralised depends_on={dict} is replaced
by a richer trigger surface on the workflow plus per-job depends_on for the
within-workflow DAG. Hard break, no backwards compat (we're in alpha).
Highlights
-
triggered_by=accepts a workflow/job name, a list (AND-conjoined), or a
boolean expression built fromAllOf(...)/AnyOf(...)for arbitrarily
nested AND/OR composition. Example:ematix.workflow( name="combined_report", triggered_by=AllOf("workflow_A", AnyOf("workflow_B", "workflow_C")), schedule="0 21 * * *", timezone="America/New_York", jobs=[...], )
-
schedule=+timezone=on the workflow (not per-job). Cron tick is one
trigger condition; AND-conjoined withtriggered_by/on_message. -
on_message=<source>for per-message firing (mutually exclusive with
triggered_by/schedule). -
Per-job
depends_on=[…]declares the within-workflow DAG.
Web UI
- Per-element trigger pills on every Workflows card — colored dot per leaf
AND per composite (ALL/ANY) showing the rolled-up state
(🟢 ready · 🟡 pending · 🔴 failed). ▶ Run nowbutton on every workflow card (with job-subset checkboxes)
and every job card (with optional cascade-downstream toggle).- Endpoints:
POST /api/workflows/{name}/run-now,POST /api/jobs/{name}/run-now.
Migration from v0.6.0
# old
ematix.workflow(name="W", jobs=["a","b","c"], depends_on={"b":["a"], "c":["b"]})
# new
@ematix.job(name="b", depends_on=["a"], ...)
@ematix.job(name="c", depends_on=["b"], ...)
ematix.workflow(name="W", jobs=["a","b","c"], schedule="0 * * * *")register_workflow raises with a pointer to the new model if it sees the
old shape.
Docs
- USER_GUIDE: trigger surface + AllOf/AnyOf composite section.
- ematix.dev: scheduling page rewritten with 6 real-world scenarios + honest
competitive framing (dbt / Airflow Datasets / Prefect Automations / Dagster
sensors).
Full changelog: https://github.com/ryan-evans-git/ematix-flow/blob/v0.7.0/CHANGELOG.md
v0.6.0 — Workflow + Job model
What's new
The previous flat Pipelines page becomes Workflows (user-named groupings) on top of Jobs (individual tasks). The DAG between jobs lives on the workflow declaration, not on individual jobs. Existing @ematix.pipeline code keeps working as single-job workflows-of-one, so this is a non-breaking model upgrade.
Added
ematix.workflow(name=..., jobs=[...], depends_on={...})— new declaration.depends_onreads as{downstream: [upstream, ...]}. Edges are mirrored into the existing per-job depends-on table so the scheduler's freshness gating keeps working.@ematix.job— alias for@ematix.pipeline. New code should prefer.job./api/workflows— endpoint returning declared workflows + their jobs/edges, plus synthetic single-job workflows for any job not in a declared workflow.flow web --module <name>— pre-imports a pipelines module so the UI can render schedule, next-run, and DAG before any scheduler tick.- Pipelines API now forecasts
next_run_atfor batch jobs from the registered cron + timezone when no scheduler record exists yet.
Changed
- Web UI restructured to Workflows | Jobs | Runs | DAG tabs.
- DAG view is now an SVG flowchart with cubic-Bézier arrows replacing the rank-as-column layout.
- New shared
DagFlowchart.svelteused by both the Workflows card preview and the full DAG view. - Loopback bind no longer requires a bearer token by default.
See CHANGELOG.md for the full set of additions + migration notes, and docs/USER_GUIDE.md for the new Workflows section.
Install
pip install ematix-flow==0.6.0
Wheels published: linux-x86_64 (py3.11/3.12/3.13/3.14), macos-arm64 (py3.11/3.12/3.13/3.14), plus sdist.
v0.5.0 — operational milestone
[0.5.0] — 2026-05-21
Operational milestone — adds the user-facing surface (CLIs, Web UI,
alerters, observability) on top of v0.4.0's backend matrix. Same
query-execution surface as v0.4.0; per-query TPC-H times unchanged.
Highlights: four new CLI subcommands (flow doctor / init / logs
/ secrets test), bearer-token Web UI auth + cross-pipeline DAG
view, email + PagerDuty alerters, OTEL trace spans + a starter
Grafana dashboard, AWS Glue Schema Registry end-to-end Kafka
dispatch, Arrow-native warehouse adapters, streaming pipeline live
throughput in the Web UI, and the Rust executor for
@ematix.warehouse_pipeline via the new PyO3 callback bridge.
Added
@ematix.warehouse_pipelinedecorator (Phase 2d slice 1, #125).
WiresWarehouseSource/WarehouseTargetinto the
scheduler-registered pipeline registry so warehouse-shaped pipelines
participate in cron scheduling, retries,depends_onDAG, and
flow run-duethe same way DB-backed@ematix.pipelinepipelines
do. The wrapped function is zero-arg; returning astrforwards
it astransform_sql=torun_warehouse_pipeline(DuckDB transform
in-flight on the Arrow table). Slice 2 ships in v0.5.0 across
#135 (PyO3 callback bridge) + #136 (Rustinvoke_warehouse_pipeline
executor); slice 3 adds warehouse-side watermark cursors.- AWS Glue Schema Registry — end-to-end (#126 + #135). Slice 1
(#126) shipped the typedGlueSchemaRegistryConnection(kind
glue_schema_registry, fieldsregistry_name/region/ auth
viaaws_profile=/ explicit static creds / boto3 default chain),
the Rustglue_schema_registrymodule with the Glue wire format
(0x03header + 16-byte UUID + 1-byte compression byte) exposed as
parse_glue_frame/build_glue_frame/GlueFrame/GlueCodec,
and the[schema-registry-glue]extra (boto3 +
aws-glue-schema-registry). Slice 2 (#135) wired the Rust Kafka
backend to dispatch on registry kind for both consumer and
producer paths via aSchemaRegistryKind::{Confluent, Glue {…}}
enum, added per-backend schema caches (one boto3 round-trip per
UUID / per schema name), zlib codec (GlueCodec::Zlib,
byte 0x05) using flate2, producer-side encode
(encode_batch_as_glue_avro) via Arrow → JSONL → Avro, kafka
connection-time validation (Glue + non-Avro fails at construction),
and a LocalStack integration suite at
tests/python/integration/test_glue_localstack.py(gated on
EMATIX_FLOW_LOCALSTACK_ENDPOINT). Confluent path
(kind = "schema_registry") unchanged. - PyO3 callback bridge (#135, task #559 slice 2). Process-global
registry of named Rust callbacks at
ematix_flow_core::py_callbacks(register/unregister/
is_registered/invoke, JSON-in / JSON-out adapter). Concrete
Python wiring inematix-flow-py::py_callbacksexposes
register_python_callback/unregister_python_callback/
is_python_callback_registered/invoke_python_callback. The
Glue Kafka backend routes schema lookups (by-UUID for consumers,
by-name for producers) through this primitive; same primitive
carries the warehouse-pipeline executor in #136. - Rust executor for
@ematix.warehouse_pipeline(#136,
task #559 slice 2 final piece). New
ematix_flow_core::warehouse_executor::invoke_warehouse_pipeline(name)
dispatches a registered warehouse pipeline by name through
py_callbacks, no subprocess. Python side: the
@ematix.warehouse_pipelinedecorator now registers every wrapped
function as a callback at
ematix_flow.warehouse_pipeline:<pipeline_name>so the Rust
scheduler / worker can drive it directly. Three error variants
surface common failure modes (NotRegistered/
CallbackFailed/BadResponseShape); response shape is the
same dict the in-process scheduler builds (status/pipeline/
rows_read/rows_written/duration_ms/watermark). flow initproject scaffold (#136).flow init <dir>writes
a runnable starter project:pipelines.py,connections.toml,
Dockerfile,flow.service(systemd unit),.gitignore,
README.md. Refuses to overwrite without--force. Maven-archetype
shape — one command to a workingflow run-dueloop.flow logs <run_id>(#136). Tails the captured stdout / stderr
for a given run. Capture is opt-in via the
EMATIX_FLOW_CAPTURE_LOGS=1env var (so existing deployments see
no disk / latency change). Logs land at
$EMATIX_FLOW_LOGS_DIR/<run_id>.log
(default~/.ematix-flow/logs/);run_idis pinned to
<pipeline>-<UTC ts>-<attempt>so the same record matches the
RunLog. Tee-based capture (original stream still gets the bytes);
atomic write (tmp + rename) + 30-day prune helper.flow doctor(#136). Connection health probes by kind:
postgres (SELECT 1), kafka (TCP bootstrap probe), glue
(list_registries), pubsub (get_topic), rabbitmq (AMQP
handshake), s3 (head_bucket), snowflake / bigquery (SELECT 1).
Renders a one-pass status table; non-zero exit on any failure so it
fits CI / pre-deploy checks.flow secrets test(#136). Resolves every${…}placeholder
inconnections.toml(or whichever path is passed) and reports
per-secret outcome (provider, key, OK / missing / error) without
printing the resolved value. Useful for validating Vault / AWS /
GCP secret-store wiring before a deploy.- Web UI bearer-token auth (#136). New
--token <value>flag on
flow web(andbearer_token=kwarg oncreate_app/
run_server) gates every/api/*route except/api/health
behindAuthorization: Bearer <token>, compared with
hmac.compare_digest./api/healthstays open for load-balancer
probes. When a token is set, the "non-loopback bind without auth"
warning at startup is suppressed. - Cross-pipeline DAG view in the Web UI (#136). New
/api/dag
endpoint returns{nodes, edges}from thedepends_onregistry
(each node carriesname/schedule/timezone); new Svelte
route#/daglays nodes out in topological-rank columns
(upstreams always left-of downstreams) with fan-out counts. - Email + PagerDuty alerters (#136). Two new alerter URL
schemes register through the same--alerter <url>flag the
Slack / webhook / stdout alerters use:email://user:pass@host:port?from=...&to=...&starttls=1— stdlib
smtplib; default port 587 (STARTTLS) / 465 (implicit SSL).
Errors are caught + logged (an alerter outage never breaks the
pipeline run).pagerduty://<routing_key>?service=...&severity=...— Events
API v2 trigger / resolve, withdedup_key = "<service>:<pipeline>"
so a recovered pipeline auto-resolves its open incident. Maps
failed → trigger(error),gave_up → trigger(critical),
recovered → resolve.
- OpenTelemetry trace spans for pipeline runs (#136). New
ematix_flow.tracingmodule:pipeline_run_span(name, attempt)
context manager wraps every@ematix.pipeline/
@ematix.warehouse_pipelineexecution; configure once via
configure_tracer_from_url(...)withotel://stdout,
otel+otlp+grpc://collector:4317, or
otel+otlp+http://collector:4318. Span attributes include
pipeline name, attempt number, run_id, status, and exception
on failure. Sits alongside the existing OTel-metrics export — same
collector can receive both. - Streaming-pipeline live stats in the Web UI (#136). Streaming
pipelines used to show a useless "Median duration: —" on the
Pipelines view (one open-ended record, no duration). The
flow consumedaemon now opens an optional RunLog
(--run-log-url/$EMATIX_FLOW_RUN_LOG_URL) and spawns a
background thread that scrapes its own/metricsendpoint every
~30s, computing rolling 1m + 5m windows from
ematix_streaming_rows_consumed_total/
ematix_streaming_rows_written_total/
ematix_streaming_batches_total/
ematix_streaming_errors_totaland writing the result into the
running record'sextras./api/pipelinessurfaces these
fields;Pipelines.svelterenders "Throughput: X rps in (1m) /
Y rps in (5m) · Batch cycle: A ms avg (1m)" in place of the median
footer whenkind == "streaming". SqliteRunLog now also
implements the rich-history protocol
(record_run_record/list_runs/get_runagainst a
newrun_recordstable), so the same SQLite file backs both the
flow consumedaemon andflow web. - Grafana dashboard JSON (#136). New
examples/grafana/ematix-flow-dashboard.json— 6-panel starter
board driven by the Prometheus metrics ematix-flow already
exports: runs/min by outcome, success-rate stat, in-flight retries,
p50 / p95 / p99 duration, per-pipeline run rate, top-20 failure
counts.$pipelinetemplated variable. Import in Grafana via the
JSON Model field. - Cron schedule timezone support (#127, task #558 slice 1).
is_due()now accepts a keyword-onlytz=argument that, when
set, interprets the cron expression in that timezone instead of
whatever timezonenowcarries (effectively UTC for
flow run-duetoday). Accepts azoneinfo.ZoneInfoinstance or a
tz name string. DST transitions are honored via croniter's
zoneinfo-aware path.tz=None(the default) preserves today's
behavior bit-for-bit. Slice 2 wirestimezone=into
@ematix.pipeline/@ematix.warehouse_pipeline; slice 3 surfaces
the configured tz in the Web UI's "Next: …" rendering.
Design
- Arrow-native warehouse adapters (#128, task #557). Three-slice
plan to drop pandas from the Snowflake / BigQuery / Redshift
adapter paths. Slice 1 = Snowflake PUT + COPY INTO via parquet
staging; slice 2 = Redshift S3 + COPY (extend merge-mode path to
append); slice 3 = BigQuery GCS +load_table_from_uri. Open
questions on type fidelity, Storage Write API vs GCS staging, and
the backward-compat shim policy captured for review before slice...