Self-hosted data pipeline with Prefect (orchestration), dbt (transformation), Postgres (warehouse), and Metabase (BI dashboards). Deploy once, run forever.
```
                   ┌────────────────────────────────────────┐
                   │ Railway Project                        │
                   │                                        │
Source APIs ──────►│ Prefect Worker                         │
                   │   ├── ETL flow ──► raw schema          │
                   │   ├── dbt flow ──► analytics schema    │
                   │   └── full-pipeline (daily)            │
                   │                                        │
                   │ Prefect Server (:4200)  [UI + API]     │
                   │                                        │
                   │ Postgres  [warehouse + Prefect state]  │
                   │                                        │
Browser ──────────►│ Metabase (:3000)  [dashboards]         │
                   └────────────────────────────────────────┘
```
| Service | Image | Purpose |
|---|---|---|
| prefect-server/ | prefecthq/prefect:3-latest | Orchestration API + UI |
| worker/ | Custom Python image | Flow execution + dbt runner |
| Metabase | metabase/metabase | BI dashboards |
| Postgres | Railway plugin | Warehouse + Prefect state |
| Flow | Schedule | What it does |
|---|---|---|
| etl-hourly | Every hour | Fetches data from APIs → raw schema |
| dbt-hourly | Every hour at :30 | Transforms raw → staging + analytics |
| full-pipeline-daily | 3am UTC | ETL + dbt end-to-end |
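The dbt flows above ultimately shell out to the dbt CLI. As a rough sketch of how a flow task can do that (the `run_cli` wrapper and the exact dbt invocation are illustrative, not the repo's actual code):

```python
import subprocess

def run_cli(cmd: list[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

# A dbt task might then invoke (paths illustrative):
# run_cli(["dbt", "run", "--project-dir", "worker/dbt_project"])
```

Raising on a non-zero exit code matters here: it is what lets Prefect mark the flow run as failed when a dbt model errors.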
Prefect Server service variables:

| Variable | Value |
|---|---|
| PREFECT_API_DATABASE_CONNECTION_URL | ${{Postgres.DATABASE_URL}} |
| PREFECT_UI_ENABLED | true |
Worker service variables:

| Variable | Value |
|---|---|
| PREFECT_API_URL | http://${{PrefectServer.RAILWAY_PRIVATE_DOMAIN}}:4200/api |
| DATABASE_URL | ${{Postgres.DATABASE_URL}} |
| PGHOST | ${{Postgres.PGHOST}} |
| PGPORT | ${{Postgres.PGPORT}} |
| PGUSER | ${{Postgres.PGUSER}} |
| PGPASSWORD | ${{Postgres.PGPASSWORD}} |
| PGDATABASE | ${{Postgres.PGDATABASE}} |
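The PG* variables give flows direct access to the warehouse. A minimal sketch of how a flow could assemble them into a connection string (`pg_dsn` is a hypothetical helper; the worker may simply read DATABASE_URL instead):

```python
import os

def pg_dsn() -> str:
    """Build a Postgres connection URL from the PG* variables Railway injects."""
    return (
        f"postgresql://{os.environ['PGUSER']}:{os.environ['PGPASSWORD']}"
        f"@{os.environ['PGHOST']}:{os.environ['PGPORT']}/{os.environ['PGDATABASE']}"
    )
```

Keeping the individual PG* variables around (rather than only DATABASE_URL) is also what lets dbt's `profiles.yml` pick up host, user, and database separately via `env_var`.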
Metabase service variables:

| Variable | Value |
|---|---|
| MB_DB_TYPE | postgres |
| MB_DB_HOST | ${{Postgres.PGHOST}} |
| MB_DB_PORT | ${{Postgres.PGPORT}} |
| MB_DB_USER | ${{Postgres.PGUSER}} |
| MB_DB_PASS | ${{Postgres.PGPASSWORD}} |
| MB_DB_DBNAME | ${{Postgres.PGDATABASE}} |
- Prefect UI: generate a Railway domain for the prefect-server service → port 4200
- Metabase: generate a Railway domain for the metabase service → port 3000
After the first pipeline run, Postgres will contain:

```
raw.posts                     ← raw JSON from API
raw.users                     ← raw JSON from API
staging.stg_posts             ← parsed, typed posts (view)
staging.stg_users             ← parsed, typed users (view)
analytics.mart_user_activity  ← per-user activity summary (table)
analytics.mart_content_stats  ← aggregate content stats (table)
```

Connect Metabase to the analytics schema to build dashboards.
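One plausible shape for the `load_raw` helper the ETL tasks use to land records (hypothetical sketch: the repo's real implementation and column names may differ; this assumes each raw table has `payload` and `_loaded_at` columns and a DB-API connection such as psycopg2's):

```python
import json

def load_raw(conn, table: str, records: list[dict], loaded_at: str) -> int:
    """Insert API records as JSON rows into the raw schema.

    Each record is serialized whole into a payload column, tagged with the
    load timestamp; returns the number of rows written.
    """
    cur = conn.cursor()
    cur.executemany(
        f"INSERT INTO raw.{table} (payload, _loaded_at) VALUES (%s, %s)",
        [(json.dumps(r), loaded_at) for r in records],
    )
    conn.commit()
    return cur.rowcount
```

Storing raw JSON unparsed is deliberate: the staging views own all parsing and typing, so a schema change upstream breaks a dbt model, not the ingest.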
- Add an extract task in worker/flows/etl.py:

  ```python
  @task(retries=3)
  def extract_orders() -> list[dict]:
      # fetch from your API/DB/file
      return [...]

  @task
  def load_raw_orders(records):
      load_raw("orders", records, datetime.utcnow().isoformat())
  ```

- Add staging + mart models in worker/dbt_project/models/
- Add a schedule in worker/flows/entrypoint.py
Access the Prefect UI and trigger any flow run on demand, or use the CLI:

```bash
# From your local machine (set PREFECT_API_URL first)
prefect deployment run etl-extract-load/etl-hourly
prefect deployment run dbt-transform/dbt-hourly
```

Troubleshooting:

- Worker keeps restarting: Prefect Server isn't ready yet. The worker waits up to 5 minutes; check the prefect-server service logs.
- dbt fails with "relation does not exist": run the ETL flow first to populate the raw schema before running dbt.
- Metabase can't connect to the DB: verify the MB_DB_* vars point at Postgres. Metabase needs its own internal DB (the same Postgres is fine).
- Prefect UI shows no flows: the worker deploys flows on startup; check the worker logs for [startup] Deployed messages.
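The worker's five-minute wait is a readiness poll against the server. A generic sketch of such a poll (`wait_for` and `server_ready` are illustrative, not the repo's actual startup code; the health URL assumes the server's /api/health endpoint):

```python
import time
import urllib.request
from typing import Callable

def server_ready(api_url: str) -> bool:
    """True if the Prefect API health endpoint responds with 200."""
    try:
        with urllib.request.urlopen(f"{api_url}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def wait_for(check: Callable[[], bool], timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll `check` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Usage sketch: wait_for(lambda: server_ready(os.environ["PREFECT_API_URL"]))
```

If the poll never succeeds the worker exits non-zero and the platform restarts it, which is exactly the restart loop described above.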