The DWE CLI (dwe) is the orchestration brain of the Data Warehouse Ecosystem. It takes a blank or existing client Git repository and injects a fully working Adapter — infrastructure, application config, CI/CD pipelines, and local dev commands — in a single command.
dwe create-service test_adapter --git-repo https://github.com/client/repo --envs dev --envs prod
Internally this does:
1. Clone: GitPython clones the client repo to a temp directory
2. Hydrate: Copier renders the adapter template into the clone
3. State: the CLI writes dwe-state.json
4. CI/CD: the CLI renders per-environment GitHub Actions / GitLab CI files
5. Branch: the initial-commit branch is created and committed
6. Env branches: the dev and prod branches are created from initial-commit
7. Push: all branches are pushed to the remote
8. Secrets: the GitHub/GitLab API uploads secrets to the repository settings
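As a rough illustration of steps 4 to 6, the sketch below derives the branches and workflow files that a run would produce for a given list of environments. The `HydrationPlan` class and `plan_hydration` function are hypothetical names used only for this example; they are not part of the dwe CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HydrationPlan:
    """Hypothetical summary of what one create-service run produces."""
    branches: list
    workflow_files: list


def plan_hydration(envs):
    # The base branch comes first; each environment branch is cut from it.
    branches = ["initial-commit", *envs]
    # One GitHub Actions workflow file is rendered per environment.
    workflow_files = [f".github/workflows/deploy-{env}.yaml" for env in envs]
    return HydrationPlan(branches=branches, workflow_files=workflow_files)


plan = plan_hydration(["dev", "prod"])
print(plan.branches)
print(plan.workflow_files)
```

For `--envs dev --envs prod` this yields three branches and two workflow files.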
The result is a client repo that already has working infrastructure code, a justfile with just up / just deploy-prod, and CI/CD that deploys to the right environment when you push to its branch.
pip install poetry # if not already installed
poetry install # from dwe-core source (creates venv, installs deps)
# or once published:
pip install dwe-core
Verify:
dwe --help
dwe list-adapters
dwe create-service <adapter_name> \
--git-repo <url> \
[--envs <name>]... \ # default: development, main
[--secrets <json>] \ # e.g. '{"AWS_KEY":"abc"}'
[--tag <version>] \ # adapter git tag, e.g. v1.2.0
[--token <api-token>] \ # or set GITHUB_TOKEN / GITLAB_TOKEN
[--aws-region <region>] \
[--instance-type <type>] \
[--clone-dir <path>] # default: temp dir
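The `--secrets` and `--token` options could be handled along the following lines. This is a minimal sketch, not the CLI's actual code: `parse_secrets` and `resolve_token` are hypothetical helpers, and the precedence between `GITHUB_TOKEN` and `GITLAB_TOKEN` is an assumption (a real implementation would more likely choose based on the repository host).

```python
import json
from typing import Dict, Optional


def parse_secrets(raw: str) -> Dict[str, str]:
    """Parse the --secrets JSON string into a name -> value mapping."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("--secrets must be a JSON object")
    for name, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"secret {name!r} must be a string")
    return data


def resolve_token(cli_token: Optional[str], env: Dict[str, str]) -> Optional[str]:
    """Prefer --token; otherwise fall back to the environment (assumed order)."""
    return cli_token or env.get("GITHUB_TOKEN") or env.get("GITLAB_TOKEN")


print(parse_secrets('{"AWS_KEY":"abc"}'))
print(resolve_token(None, {"GITHUB_TOKEN": "ghp_xxxx"}))
```

Validating the JSON before cloning anything means a malformed `--secrets` value fails before any branches are pushed.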
Example — full run:
export GITHUB_TOKEN=ghp_xxxx
dwe create-service test_adapter \
--git-repo https://github.com/acme/data-platform \
--envs development \
--envs staging \
--envs main \
--secrets '{"PULUMI_ACCESS_TOKEN":"pul-xxx","AWS_ACCESS_KEY_ID":"AKI...","AWS_SECRET_ACCESS_KEY":"..."}' \
--tag v1.0.0 \
--aws-region eu-west-1 \
--instance-type t3.small
After this runs, the data-platform repo has:
.github/workflows/
  deploy-development.yaml
  deploy-staging.yaml
  deploy-main.yaml
blueprint/
  html/index.html
  instance-setup.sh
docker-compose.yml
docker-compose.prod.yml
.env.example
justfile
infrastructure/
  __main__.py          <- project_name, instance_type already substituted
  Pulumi.yaml
  requirements.txt
dwe-state.json
.copier-answers.yml    <- Copier's internal state (enables future updates)
dwe update-service <adapter_name> <local_path> [--tag <version>]
Example:
dwe update-service test_adapter ./data-platform --tag v1.2.0
Internally:
- Reads dwe-state.json and validates that the adapter name matches
- Creates a branch dwe-update-20260322-1.2.0
- Runs copier.run_update(), a smart merge that preserves your customisations
- Updates dwe-state.json with the new version
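The first two steps can be sketched as follows. The branch-name format is inferred from the single example above (date as YYYYMMDD, tag without its leading `v`), and `update_branch_name` and `check_adapter` are hypothetical helpers rather than the CLI's real functions.

```python
import json
from datetime import date


def update_branch_name(tag: str, today: date) -> str:
    # "v1.2.0" -> "1.2.0", matching the example branch name above
    version = tag.removeprefix("v")
    return f"dwe-update-{today:%Y%m%d}-{version}"


def check_adapter(state_json: str, adapter_name: str) -> None:
    """Raise if dwe-state.json records a different adapter."""
    recorded = json.loads(state_json)["adapter"]["name"]
    if recorded != adapter_name:
        raise ValueError(
            f"repo was created from {recorded!r}, not {adapter_name!r}"
        )


state = '{"adapter": {"name": "test_adapter", "version": "v1.0.0"}}'
check_adapter(state, "test_adapter")  # passes silently
print(update_branch_name("v1.2.0", date(2026, 3, 22)))
```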
Review the diff on the branch, then merge into your environment branches to trigger deployments.
dwe list-adapters
Shows all adapters registered in adapters.json.
{
  "test_adapter": {
    "path": "/absolute/path/to/dwe_test_adapter",
    "type": "local",
    "description": "Test adapter: AWS EC2 instance via Pulumi"
  },
  "superset_adapter": {
    "url": "https://github.com/hipposys/dwe-superset-adapter",
    "type": "git",
    "description": "Apache Superset on ECS"
  }
}
An adapter is a real, runnable project that also serves as a Copier template. The guiding principle:
The adapter must work locally as-is. A developer should be able to git clone the adapter, run just up, and have a working service without running the DWE CLI at all.
mkdir my_adapter && cd my_adapter
git init
Build your service as a real project before adding any template variables. For example, if you're building a Superset adapter:
# Make it work locally first
docker compose up # verify it runs
Only once everything works locally do you introduce {{ variables }}.
my_adapter/
├── copier.yml # Copier config + question definitions
│
├── docker-compose.yml # Real, runnable. Uses ${ENV_VAR:-default} for runtime values.
├── docker-compose.prod.yml # Production overrides (restart policy, logging)
├── .env.example # Template for secrets — committed; .env is git-ignored
├── .gitignore
│
├── justfile # Dev commands (just up, just deploy-prod, just infra-up)
│
├── blueprint/ # Application-level config files
│ ├── html/ # or nginx.conf, superset_config.py, etc.
│ └── instance-setup.sh # EC2 user-data bootstrap script
│
├── infrastructure/ # Pulumi IaC — only files here use .jinja
│ ├── __main__.py.jinja # <- .jinja because it embeds {{ project_name }}
│ ├── Pulumi.yaml.jinja # <- .jinja because it embeds {{ project_name }}
│ └── requirements.txt
│
└── ci-templates/ # Jinja2 templates rendered by the CLI (not Copier)
    └── deploy.yaml # Uses {@ ENV_NAME @}, {@ AWS_REGION @}
copier.yml controls how Copier processes the adapter. Key settings:
_templates_suffix: .jinja   # ONLY files ending in .jinja are treated as templates
                            # Everything else is copied verbatim
_exclude:
  - copier.yml              # Don't copy Copier's own config
  - ci-templates            # CLI handles this separately
  - README.md               # Adapter's README is not for client repos
  - .git
  - .env                    # Never copy actual secrets
  - __pycache__
  - "*.pyc"
_skip_if_exists:
  - .env.example            # Preserve user customisations on updates
# Questions (answered non-interactively by the dwe CLI):
project_name:
  type: str
  help: "Client project name (used for cloud resource naming)"
adapter_name:
  type: str
  default: "my_adapter"
  when: false               # always set programmatically
adapter_version:
  type: str
  default: "v1.0.0"
  when: false               # always set programmatically
environments:
  type: yaml
  default: "[development, main]"
aws_region:
  type: str
  default: "us-east-1"
Apply this rule: if the value changes per client, use {{ variable }}. If it changes per deployment environment, use a .env variable.
| File | Approach | Reason |
|---|---|---|
| docker-compose.yml | .env interpolation (${VAR:-default}) | Works locally without any substitution; runtime config |
| infrastructure/__main__.py | Jinja2 (.jinja extension) | Cloud resource names must be unique per client at provision time |
| infrastructure/Pulumi.yaml | Jinja2 (.jinja extension) | Stack name must be unique per client |
| justfile | Verbatim copy (no .jinja) | Commands are identical across clients |
| blueprint/instance-setup.sh | Verbatim copy | Generic bootstrap, no client-specific values |
| .env.example | Verbatim copy | Users fill in real values after cloning |
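The effect of `_templates_suffix: .jinja` on each file can be pictured with a small sketch. The `classify` helper is hypothetical and only mirrors the rule described here (render files ending in `.jinja` and drop the suffix, copy everything else unchanged); it does not call Copier.

```python
TEMPLATE_SUFFIX = ".jinja"


def classify(source_path: str):
    """Return (action, destination path) for one adapter file."""
    if source_path.endswith(TEMPLATE_SUFFIX):
        # Rendered through Jinja2; the suffix is dropped in the client repo.
        return "render", source_path[: -len(TEMPLATE_SUFFIX)]
    # Everything else is copied byte-for-byte.
    return "copy", source_path


for path in ("infrastructure/__main__.py.jinja", "docker-compose.yml", "justfile"):
    print(path, "->", classify(path))
```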
Jinja2 syntax in .jinja files:
# infrastructure/__main__.py.jinja
instance = aws.ec2.Instance(
    "{{ project_name }}-instance",  # <- substituted by Copier
    instance_type="{{ instance_type }}",
    ...
)
After dwe create-service this becomes:
instance = aws.ec2.Instance(
    "acme-data-platform-instance",
    instance_type="t3.small",
    ...
)
ci-templates/deploy.yaml is a Jinja2 file rendered by the dwe CLI (not by Copier) to generate one workflow file per environment. The CLI uses {@ @} as variable delimiters (not {{ }}), so GitHub Actions ${{ secrets.X }} syntax passes through untouched and needs no escaping.
name: Deploy to {@ ENV_NAME @}
on:
  push:
    branches:
      - {@ ENV_NAME @}
  pull_request:
    branches:
      - {@ ENV_NAME @}
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: {@ ENV_NAME @}
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: just deploy-prod
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}   # passes through unchanged
          AWS_REGION: {@ AWS_REGION @}                          # substituted by dwe CLI
Available variables: {@ ENV_NAME @}, {@ AWS_REGION @}.
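To see why the custom delimiters avoid escaping, here is a small stand-in renderer. It uses a regular expression from the standard library rather than Jinja2, so it is only an approximation of what the CLI does; with Jinja2 itself, the same delimiters would presumably be set through the `variable_start_string` and `variable_end_string` arguments of `jinja2.Environment`. The function name `render_ci_template` is hypothetical.

```python
import re

# Matches {@ NAME @} only; ${{ ... }} never matches this pattern.
_VAR = re.compile(r"\{@\s*(\w+)\s*@\}")


def render_ci_template(template: str, variables: dict) -> str:
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined CI template variable: {name}")
        return variables[name]

    return _VAR.sub(substitute, template)


template = (
    "name: Deploy to {@ ENV_NAME @}\n"
    "env:\n"
    "  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}\n"
    "  AWS_REGION: {@ AWS_REGION @}\n"
)
print(render_ci_template(template, {"ENV_NAME": "staging", "AWS_REGION": "eu-west-1"}))
```

The GitHub Actions expression survives unchanged because it never matches the `{@ ... @}` pattern.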
Add an entry to dwe-core/adapters.json:
Local (development):
{
  "my_adapter": {
    "path": "/absolute/path/to/my_adapter",
    "type": "local",
    "description": "My adapter description"
  }
}
Remote Git (production):
{
  "my_adapter": {
    "url": "https://github.com/your-org/my-adapter",
    "type": "git",
    "description": "My adapter description"
  }
}
Test locally first (without DWE CLI):
cd my_adapter
cp .env.example .env
just up # docker compose up; must work here
Test Copier rendering in isolation:
pip install copier
copier copy /path/to/my_adapter /tmp/test-output \
--data project_name=testproject \
--data aws_region=us-east-1 \
--defaults --overwrite --trust
# Inspect the output
ls /tmp/test-output
cat /tmp/test-output/infrastructure/Pulumi.yaml # should have project_name substituted
cat /tmp/test-output/docker-compose.yml # should be identical to source
cd /tmp/test-output && docker compose up # should still work
Test via dwe CLI:
dwe create-service my_adapter \
--git-repo https://github.com/test-org/empty-repo \
--envs development \
--envs main
Tag your adapter repository with semantic version tags. The DWE CLI and Copier use these tags for update-service:
cd my_adapter
git add -A && git commit -m "feat: add postgres service"
git tag v1.1.0
git push origin v1.1.0
When a client wants to update:
dwe update-service my_adapter ./client-repo --tag v1.1.0
Copier reads the source URL from .copier-answers.yml in the client repo, checks out v1.1.0, and runs a 3-way merge. Files the user has customised are preserved where possible; conflicts surface as standard git merge conflicts.
What gets updated:
- infrastructure/: Pulumi code (Jinja2 re-rendered with the new template)
- blueprint/: application config files
- justfile: dev commands
What is NOT updated (protected):
- .env.example: skipped if it already exists (_skip_if_exists in copier.yml)
- .copier-answers.yml: managed by Copier internally
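The protection rule for `.env.example` can be pictured as a simple check made per file. This sketch uses `fnmatch` as a simplification of Copier's pattern matching, and `should_write` is a hypothetical helper, not a Copier function.

```python
from fnmatch import fnmatch


def should_write(rel_path: str, exists: bool, skip_if_exists: list) -> bool:
    """Return False when an existing file is protected from being overwritten."""
    if exists and any(fnmatch(rel_path, pattern) for pattern in skip_if_exists):
        return False
    return True


skip = [".env.example"]
print(should_write(".env.example", True, skip))   # protected on update
print(should_write(".env.example", False, skip))  # still created on first run
print(should_write("justfile", True, skip))       # updated normally
```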
dwe-state.json is written by the dwe CLI after copier.run_copy(). It tracks DWE-specific metadata:
{
  "dwe_version": "1.0.0",
  "adapter": {
    "name": "test_adapter",
    "version": "v1.0.0",
    "last_update": "2026-03-22"
  },
  "environments": ["development", "main"],
  "infrastructure": "pulumi"
}
.copier-answers.yml is written by Copier. It tracks the template source, version, and question answers. Do not edit it manually. This file is what lets copier.run_update() know where the template came from.
# Changes here will be overwritten by copier
_commit: v1.0.0
_src_path: /path/to/my_adapter
project_name: acme-data-platform
aws_region: eu-west-1
instance_type: t3.small
Both files coexist. dwe-state.json is for DWE tooling; .copier-answers.yml is for Copier's update machinery.
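For reference, the dwe-state.json document shown earlier could be assembled like this. The field names come from that example; the `build_state` function itself, its parameters, and the fixed `"pulumi"` value are assumptions made for illustration.

```python
import json
from datetime import date


def build_state(adapter_name, adapter_version, environments, today, dwe_version="1.0.0"):
    """Assemble the metadata the CLI records after hydration (illustrative)."""
    return {
        "dwe_version": dwe_version,
        "adapter": {
            "name": adapter_name,
            "version": adapter_version,
            "last_update": today.isoformat(),
        },
        "environments": list(environments),
        "infrastructure": "pulumi",
    }


state = build_state("test_adapter", "v1.0.0", ["development", "main"], date(2026, 3, 22))
print(json.dumps(state, indent=2))
```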
Once the client repo is hydrated, the full developer loop is:
1. Local development (laptop):
git clone https://github.com/client/data-platform
cd data-platform
cp .env.example .env # fill in local values (no real AWS keys needed)
just up # docker compose up; app is running at localhost:8080
2. Provision cloud infrastructure (once):
# Fill in real AWS keys in .env
just install-infra # pip install pulumi pulumi-aws
just infra-preview # see what Pulumi will create
just infra-up # provision the EC2 instance
3. Deploy to EC2 (SSH into the instance, then):
git clone https://github.com/client/data-platform /srv/app
cd /srv/app
cp .env.example .env # fill in production values
just deploy-prod # docker compose -f ... up -d
4. CI/CD (automatic after push):
Pushing to development or main triggers the corresponding GitHub Actions workflow. See the CI/CD Workflow Design section below for the full two-path logic.
The generated CI/CD workflow (.github/workflows/deploy-{env}.yaml) implements a two-path logic inspired by the Superset production setup. The key insight: infrastructure changes and application changes require completely different responses.
Push to branch
│
▼
Detect changes
(dorny/paths-filter)
│
├─── infrastructure/** changed?
│ │
│ ├─ Pull Request → pulumi preview (validate, no apply)
│ └─ Push → pulumi up --yes (apply infra changes)
│
└─── docker-compose / blueprint changed?
AND infrastructure NOT changed?
│
└─ Push → SSM: git pull + just deploy-prod
(redeploy app on the live EC2 instance)
Why skip deploy when infra also changed? When pulumi up replaces the EC2 instance, the new instance already pulls the latest code via its user-data script. Running the app deploy on top of that would be redundant and potentially racy.
| Job | Trigger | What it does |
|---|---|---|
| pulumi-preview | PR, infrastructure/** changed | Runs pulumi preview: shows what would change, no side effects |
| pulumi-apply | Push, infrastructure/** changed | Runs pulumi up --yes: applies infra changes |
| deploy-app | Push, app files changed, infra NOT changed | AWS SSM command: git pull && just deploy-prod on live EC2 |
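The job table reduces to a small decision function. The sketch below restates it in Python for clarity; in the real workflow this logic lives in GitHub Actions `if:` conditions, and `select_jobs` is a hypothetical name.

```python
def select_jobs(event: str, infra_changed: bool, app_changed: bool) -> list:
    """Return the jobs that run for one workflow trigger."""
    if infra_changed:
        # Infrastructure changes take priority; the app deploy is skipped.
        if event == "pull_request":
            return ["pulumi-preview"]
        if event == "push":
            return ["pulumi-apply"]
        return []
    if app_changed and event == "push":
        return ["deploy-app"]
    return []


print(select_jobs("push", infra_changed=True, app_changed=True))
print(select_jobs("pull_request", infra_changed=True, app_changed=False))
print(select_jobs("push", infra_changed=False, app_changed=True))
```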
Set these via dwe create-service --secrets '{...}' or manually in GitHub repository settings:
| Secret | Description |
|---|---|
| AWS_ACCESS_KEY_ID | AWS credentials for Pulumi and SSM |
| AWS_SECRET_ACCESS_KEY | AWS credentials |
| PULUMI_ACCESS_TOKEN | Pulumi Cloud token |
| PULUMI_CONFIG_PASSPHRASE | Pulumi stack encryption passphrase |
| PULUMI_STACK | Pulumi stack reference, e.g. myorg/myproject/development |
| EC2_INSTANCE_ID | Instance ID from pulumi stack output instance_id, e.g. i-0abc1234 |
The deploy-app job uses AWS Systems Manager (SSM) instead of SSH — no port 22, no SSH key stored as a secret.
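As a sketch of what the SSM call might look like, the helper below builds (but does not run) an `aws ssm send-command` invocation using the standard `AWS-RunShellScript` document. The `/srv/app` checkout path is taken from the manual deploy steps and is an assumption here, as is the helper name `ssm_send_command_argv`; the generated workflow may construct the command differently.

```python
import json


def ssm_send_command_argv(instance_id: str, app_dir: str = "/srv/app") -> list:
    """Build the argv for redeploying the app on the instance via SSM."""
    shell = f"cd {app_dir} && git pull && just deploy-prod"
    return [
        "aws", "ssm", "send-command",
        "--instance-ids", instance_id,
        "--document-name", "AWS-RunShellScript",
        # JSON form of --parameters; the document expects a list of commands.
        "--parameters", json.dumps({"commands": [shell]}),
    ]


print(ssm_send_command_argv("i-0abc1234567890def"))
```

Passing the result to `subprocess.run` would issue the command, given valid AWS credentials in the environment.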
To enable SSM on the EC2 instance:
1. IAM instance profile: attach a role containing this policy statement to the EC2 instance:
{
  "Effect": "Allow",
  "Action": [
    "ssm:UpdateInstanceInformation",
    "ssmmessages:CreateControlChannel",
    "ssmmessages:OpenControlChannel",
    "ec2messages:GetMessages",
    "ec2messages:SendReply"
  ],
  "Resource": "*"
}
Or simply attach the AWS managed policy AmazonSSMManagedInstanceCore.
2. SSM agent — Amazon Linux 2023 ships with it pre-installed. The blueprint/instance-setup.sh bootstrap script ensures it's running:
systemctl enable amazon-ssm-agent
systemctl start amazon-ssm-agent
3. Store the instance ID: after running just infra-up, get the instance ID and store it as a secret:
cd infrastructure && pulumi stack output instance_id
# → i-0abc1234567890def
# Add this to GitHub repository secrets as EC2_INSTANCE_ID
Scenario 1: you edited blueprint/html/index.html:
Push to development branch
↓
detect-changes: infrastructure=false, app=true
↓
deploy-app runs:
aws ssm send-command "git pull && just deploy-prod"
polls every 10s until success
prints stdout from EC2 instance
↓
New HTML is live ~30 seconds after push
Scenario 2 — you changed infrastructure/__main__.py.jinja (e.g. bigger instance type):
Push to development branch
↓
detect-changes: infrastructure=true, app=false
↓
pulumi-apply runs:
pulumi up --yes
Pulumi modifies the EC2 instance type in-place (or replaces it)
↓
Infrastructure updated. New instance pulls latest code via user-data.
Scenario 3 — you opened a PR with Pulumi changes:
Pull Request to development
↓
detect-changes: infrastructure=true
↓
pulumi-preview runs:
pulumi preview
Output shown in CI logs — no changes applied
↓
Reviewer can see exactly what Pulumi will do before merging.
The same two-path logic works for GitLab CI. The Superset setup's .gitlab-ci.yml uses:
# Skip deploy if terraform changed
- if: $CI_COMMIT_BRANCH == "main"
  changes:
    - terraform_scalling/**/*
  when: never
# Only deploy if docker/compose changed
- if: $CI_COMMIT_BRANCH == "main"
  changes:
    - docker/**/*
    - docker-compose.yml
For your adapter's GitLab template, mirror this pattern with pulumi instead of terraform and infrastructure/** instead of terraform_scalling/**.
Environments are set up at create-service time. To add one later:
# Create the branch
git checkout initial-commit
git checkout -b staging
git push origin staging
# Generate the workflow file
cp .github/workflows/deploy-development.yaml .github/workflows/deploy-staging.yaml
# Edit deploy-staging.yaml: change all occurrences of "development" to "staging"
git add .github/workflows/deploy-staging.yaml
git commit -m "chore: add staging environment"
git push origin staging
Two workflows handle the full release lifecycle:
bump version in pyproject.toml → merge to main
│
▼
tag-version.yml triggers on: push to main, pyproject.toml changed
reads Poetry version creates git tag vX.Y.Z automatically
│
▼
(go to GitHub → Releases → Draft a new release → publish it)
│
▼
pypi-publish.yml triggers on: release published
poetry build + publish pushes to PyPI via PYPI_TOKEN
Add PYPI_TOKEN to the repository secrets (Settings → Secrets → Actions):
- Go to https://pypi.org/manage/account/token/ and create an API token scoped to dwe-core
- In GitHub: Settings → Secrets and variables → Actions → New repository secret
  - Name: PYPI_TOKEN
  - Value: the token from PyPI (starts with pypi-)
Step 1 — bump the version and merge to main:
# Run ONE of the following:
poetry version patch # 1.0.0 → 1.0.1
poetry version minor # 1.0.0 → 1.1.0
poetry version major # 1.0.0 → 2.0.0
poetry version prerelease # 1.0.0 → 1.0.1a0
poetry version 1.2.0 # set explicit version
git add pyproject.toml
git commit -m "chore: bump version to $(poetry version -s)"
git push origin main
tag-version.yml fires on the push, reads the version from pyproject.toml, and pushes tag vX.Y.Z. No manual tagging needed, and it only runs on main.
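The relationship between the bump commands and the resulting tag can be sketched as below. This mirrors `poetry version` only for plain X.Y.Z versions and the three simple rules (patch, minor, major); `bump` and `release_tag` are hypothetical helpers, not part of Poetry or the workflow.

```python
def bump(version: str, part: str) -> str:
    """Apply a patch/minor/major bump to a plain X.Y.Z version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported bump rule: {part}")


def release_tag(version: str) -> str:
    # The tag is the pyproject.toml version prefixed with "v".
    return f"v{version}"


new_version = bump("1.0.0", "minor")
print(new_version, release_tag(new_version))
```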
Step 2 — publish the GitHub Release:
Go to github.com/<org>/dwe-core/releases, click Draft a new release, select the tag just created, and click Publish release.
pypi-publish.yml fires on the publish event: runs poetry install, poetry build, then poetry publish -u __token__ -p $PYPI_TOKEN.
| Concern | Library |
|---|---|
| CLI framework | Typer |
| Template engine | Copier |
| Git operations | GitPython |
| GitHub secrets | PyGithub |
| GitLab variables | python-gitlab |
| Runtime templating | Jinja2 (for CI templates) |
| Infrastructure | Pulumi |
| Task runner | Just |