Skip to content

furelid/railway-data-stack

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Railway Modern Data Stack

Self-hosted data pipeline with Prefect (orchestration), dbt (transformation), Postgres (warehouse), and Metabase (BI dashboards). Deploy once, run forever.

Deploy on Railway

Architecture

                    ┌────────────────────────────────────────┐
                    │          Railway Project                │
                    │                                        │
 Source APIs ──────►│  Prefect Worker                        │
                    │    ├── ETL flow  ──► raw schema        │
                    │    ├── dbt flow  ──► analytics schema  │
                    │    └── full-pipeline (daily)           │
                    │                                        │
                    │  Prefect Server (:4200) [UI + API]     │
                    │                                        │
                    │  Postgres [warehouse + Prefect state]  │
                    │                                        │
 Browser ──────────►│  Metabase (:3000) [dashboards]        │
                    └────────────────────────────────────────┘

Services

Service Image Purpose
prefect-server/ prefecthq/prefect:3-latest Orchestration API + UI
worker/ Custom Python image Flow execution + dbt runner
Metabase metabase/metabase BI dashboards
Postgres Railway plugin Warehouse + Prefect state

What runs on schedule

Flow Schedule What it does
etl-hourly Every hour Fetches data from APIs → raw schema
dbt-hourly Every hour +30m Transforms raw → staging + analytics
full-pipeline-daily 3am UTC ETL + dbt end-to-end

Environment Variables

Prefect Server

Variable Value
PREFECT_API_DATABASE_CONNECTION_URL ${{Postgres.DATABASE_URL}}
PREFECT_UI_ENABLED true

Worker

Variable Value
PREFECT_API_URL http://${{PrefectServer.RAILWAY_PRIVATE_DOMAIN}}:4200/api
DATABASE_URL ${{Postgres.DATABASE_URL}}
PGHOST ${{Postgres.PGHOST}}
PGPORT ${{Postgres.PGPORT}}
PGUSER ${{Postgres.PGUSER}}
PGPASSWORD ${{Postgres.PGPASSWORD}}
PGDATABASE ${{Postgres.PGDATABASE}}

Metabase

Variable Value
MB_DB_TYPE postgres
MB_DB_HOST ${{Postgres.PGHOST}}
MB_DB_PORT ${{Postgres.PGPORT}}
MB_DB_USER ${{Postgres.PGUSER}}
MB_DB_PASS ${{Postgres.PGPASSWORD}}
MB_DB_DBNAME ${{Postgres.PGDATABASE}}

Accessing the UIs

  • Prefect UI: generate a Railway domain for the prefect-server service → port 4200
  • Metabase: generate a Railway domain for the metabase service → port 3000

Data model

After the first pipeline run, Postgres will contain:

raw.posts          ← raw JSON from API
raw.users          ← raw JSON from API
staging.stg_posts  ← parsed, typed posts (view)
staging.stg_users  ← parsed, typed users (view)
analytics.mart_user_activity  ← per-user activity summary (table)
analytics.mart_content_stats  ← aggregate content stats (table)

Connect Metabase to the analytics schema to build dashboards.

Adding your own data sources

  1. Add an extract task in worker/flows/etl.py:
@task(retries=3)
def extract_orders() -> list[dict]:
    # fetch from your API/DB/file
    return [...]

@task
def load_raw_orders(records):
    load_raw("orders", records, datetime.utcnow().isoformat())
  1. Add staging + mart models in worker/dbt_project/models/

  2. Add a schedule in worker/flows/entrypoint.py

Running flows manually

Access the Prefect UI and trigger any flow run on demand, or use the CLI:

# From your local machine (set PREFECT_API_URL first)
prefect deployment run etl-extract-load/etl-hourly
prefect deployment run dbt-transform/dbt-hourly

Troubleshooting

Worker keeps restarting: Prefect Server isn't ready yet. Worker waits up to 5 minutes — check prefect-server service logs.

dbt fails with "relation does not exist": Run the ETL flow first to populate raw schema before running dbt.

Metabase can't connect to DB: Verify MB_DB_* vars point to Postgres. Metabase needs its own internal DB (same Postgres is fine).

Prefect UI shows no flows: Worker deploys flows on startup — check worker logs for [startup] Deployed messages.

About

Modern data stack with Prefect, dbt, Metabase, and Postgres — Railway template

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors