Commit 5a46f15

feat: Airflow demo - add DBT job (#342)
* wip
* working dag
* working dag: cleanup
* comment
* working with 3.0.6 and necessary rbac
* wip: tls almost working
* demo setup
* demo setup II
* minor cleanup
* reduce size of demo and prepare for public use
* revert changes
* added back in admin credentials
* renamed stack, remove logging config, correct password
* working dbt test
* working dbt DAG in original airflow stack
* remove project
* use secret for env-var
* demo cleanup
* add workflow for demo image, move dbt folder to demos, complete docs
* correct paths
* add clarifying comments
* corrected link
* re-set demos/stacks links
1 parent 95bb4e2 commit 5a46f15

18 files changed: +1298 -46 lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -6,7 +6,7 @@ exclude: '(stacks/_templates/minio-.*/rendered-chart\.yaml|\.svg)$'
 
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: 2c9f875913ee60ca25ce70243dc24d5b6415598c # 4.6.0
+    rev: v6.0.0 # 4.6.0
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
@@ -16,22 +16,22 @@ repos:
         exclude: '^stacks/argo-cd-git-ops/secrets/sealed-secrets-key\.yaml$'
 
   - repo: https://github.com/adrienverge/yamllint
-    rev: 81e9f98ffd059efe8aa9c1b1a42e5cce61b640c6 # 1.35.1
+    rev: v1.37.1 # 1.35.1
     hooks:
       - id: yamllint
 
   - repo: https://github.com/igorshubovych/markdownlint-cli
-    rev: f295829140d25717bc79368d3f966fc1f67a824f # 0.41.0
+    rev: v0.46.0 # 0.41.0
     hooks:
       - id: markdownlint
 
   - repo: https://github.com/koalaman/shellcheck-precommit
-    rev: 2491238703a5d3415bb2b7ff11388bf775372f29 # 0.10.0
+    rev: v0.11.0 # 0.10.0
     hooks:
       - id: shellcheck
         args: ["--severity=info"]
 
   - repo: https://github.com/rhysd/actionlint
-    rev: 62dc61a45fc95efe8c800af7a557ab0b9165d63b # 1.7.1
+    rev: v1.7.9 # 1.7.1
     hooks:
       - id: actionlint
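
Note on the hunk above: each `rev` moves from a pinned commit hash to a tag, but the trailing comment still names the previous version (for example `rev: v6.0.0 # 4.6.0`). A minimal sketch of a check that flags such mismatches follows. The function name `stale_rev_comments` and the regular expression are illustrative assumptions, not part of this commit or of pre-commit itself.

```python
import re

# Matches lines such as "rev: v6.0.0 # 4.6.0" and captures the pinned ref
# and the trailing version comment.
REV_LINE = re.compile(r"^\s*rev:\s*(?P<rev>\S+)\s*#\s*(?P<comment>\S+)\s*$")


def stale_rev_comments(config_text):
    """Return (rev, comment) pairs where a tag-style rev disagrees with its comment."""
    stale = []
    for line in config_text.splitlines():
        match = REV_LINE.match(line)
        if match is None:
            continue
        rev, comment = match["rev"], match["comment"]
        # A 40-character hex rev is a commit hash; the comment is then the only
        # human-readable version, so there is nothing to compare it against.
        if re.fullmatch(r"[0-9a-f]{40}", rev):
            continue
        if rev.lstrip("v") != comment.lstrip("v"):
            stale.append((rev, comment))
    return stale


sample = """
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0 # 4.6.0
  - repo: https://github.com/rhysd/actionlint
    rev: 62dc61a45fc95efe8c800af7a557ab0b9165d63b # 1.7.1
"""
print(stale_rev_comments(sample))
```

Run against the new side of this hunk, a check like this would report all five `rev` lines, since every comment still carries the old version number.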
demos/airflow-scheduled-job/create-trino-tables.yaml

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: create-tables-in-trino
+spec:
+  template:
+    spec:
+      serviceAccountName: demo-serviceaccount
+      containers:
+        - name: create-tables-in-trino
+          image: oci.stackable.tech/sdp/testing-tools:0.2.0-stackable0.0.0-dev
+          command: ["bash", "-c", "python -u /tmp/script/script.py"]
+          volumeMounts:
+            - name: script
+              mountPath: /tmp/script
+            - name: trino-users
+              mountPath: /trino-users
+      volumes:
+        - name: script
+          configMap:
+            name: create-tables-in-trino-script
+        - name: trino-users
+          secret:
+            secretName: trino-users
+      restartPolicy: OnFailure
+  backoffLimit: 50
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: create-tables-in-trino-script
+data:
+  script.py: |
+    import sys
+    import trino
+
+    if not sys.warnoptions:
+        import warnings
+        warnings.simplefilter("ignore")
+
+    def get_connection():
+        connection = trino.dbapi.connect(
+            host="trino-coordinator",
+            port=8443,
+            user="admin",
+            http_scheme='https',
+            auth=trino.auth.BasicAuthentication("admin", open("/trino-users/admin").read()),
+        )
+        connection._http_session.verify = False
+        return connection
+
+    def run_query(connection, query):
+        print(f"[DEBUG] Executing query {query}")
+        cursor = connection.cursor()
+        cursor.execute(query)
+        return cursor.fetchall()
+
+    connection = get_connection()
+
+    run_query(connection, "CREATE SCHEMA iceberg.dbt_schema WITH (location = 's3a://demo/dbt_schema')")
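
Two details of the script above may be worth a second look: the password is passed on exactly as read from `/trino-users/admin`, and the `CREATE SCHEMA` statement fails if the schema already exists. The sketch below shows one possible hardening, using only the standard library so it can be tried without a Trino cluster. The helper names `read_password` and `create_schema_statement` are hypothetical, and whether the mounted secret actually ends in a newline is an assumption. `IF NOT EXISTS` is standard Trino `CREATE SCHEMA` syntax.

```python
from pathlib import Path


def read_password(path):
    """Read a mounted secret file and drop surrounding whitespace.

    Assumption: the password itself has no leading or trailing whitespace,
    so stripping only removes artifacts such as a trailing newline.
    """
    return Path(path).read_text(encoding="utf-8").strip()


def create_schema_statement(catalog, schema, location):
    """Build a CREATE SCHEMA statement that is a no-op if the schema exists.

    The values are interpolated directly, so this is only suitable for
    trusted, hard-coded names like the ones used in this demo.
    """
    return (
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema} "
        f"WITH (location = '{location}')"
    )


print(create_schema_statement("iceberg", "dbt_schema", "s3a://demo/dbt_schema"))
```

In the script these would slot in as `BasicAuthentication("admin", read_password("/trino-users/admin"))` and `run_query(connection, create_schema_statement(...))`, which would let the Job succeed when it is run again against a cluster where the schema already exists.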

demos/airflow-scheduled-job/dbt/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ RUN python -m venv /opt/venv
 ENV PATH="/opt/venv/bin:$PATH"
 
 # Install Python packages
-COPY requirements.txt .
+COPY demos/airflow-scheduled-job/dbt/requirements.txt .
 RUN pip install --no-cache-dir --upgrade pip && \
     pip install --no-cache-dir -r requirements.txt
 
@@ -30,7 +30,7 @@ ENV PATH="/opt/venv/bin:$PATH"
 
 WORKDIR /dbt
 
-COPY dbt_test ./dbt_test
+COPY demos/airflow-scheduled-job/dbt/dbt_test ./dbt_test
 
 # Security: non-root user
 RUN useradd -m -u 1000 dbt && chown -R dbt:dbt /dbt
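
The new `COPY` sources are written relative to the repository root, which suggests the image is now built with the root as its build context. That is an inference, since the build workflow is not shown in this view. The sketch below illustrates the consequence: the same Dockerfile resolves its sources from one context directory and not from another. `missing_copy_sources` is a hypothetical helper that handles only simple shell-form `COPY` lines, without wildcards, JSON form or line continuations.

```python
import tempfile
from pathlib import Path


def missing_copy_sources(dockerfile_text, context_dir):
    """List COPY sources that do not exist under the given build context."""
    missing = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "COPY":
            continue
        args = parts[1:]
        # COPY --from=<stage> reads from another build stage, not the context.
        if any(arg.startswith("--from=") for arg in args):
            continue
        args = [arg for arg in args if not arg.startswith("--")]
        # The last argument is the destination; everything before it is a source.
        for source in args[:-1]:
            if not (Path(context_dir) / source).exists():
                missing.append(source)
    return missing


dockerfile = """
COPY demos/airflow-scheduled-job/dbt/requirements.txt .
COPY demos/airflow-scheduled-job/dbt/dbt_test ./dbt_test
"""

with tempfile.TemporaryDirectory() as repo_root:
    dbt_dir = Path(repo_root) / "demos/airflow-scheduled-job/dbt"
    (dbt_dir / "dbt_test").mkdir(parents=True)
    (dbt_dir / "requirements.txt").write_text("")
    # With the repository root as context, both sources resolve.
    print(missing_copy_sources(dockerfile, repo_root))
    # With the dbt folder as context, neither does.
    print(missing_copy_sources(dockerfile, dbt_dir))
```

Under that assumption, building from inside `demos/airflow-scheduled-job/dbt` would no longer find either source, so the build has to be started from the repository root.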
demos/airflow-scheduled-job/serviceaccount.yaml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: demo-serviceaccount
+  namespace: default
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: demo-clusterrolebinding
+subjects:
+  - kind: ServiceAccount
+    name: demo-serviceaccount
+    namespace: default
+roleRef:
+  kind: ClusterRole
+  name: demo-clusterrole
+  apiGroup: rbac.authorization.k8s.io
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: demo-clusterrole
+rules:
+  - apiGroups:
+      - ""
+    resources:
+      - pods
+    verbs:
+      - get
+      - list
+      - watch
+  - apiGroups:
+      - apps
+    resources:
+      - statefulsets
+    verbs:
+      - get
+      - list
+      - watch
+  - apiGroups:
+      - batch
+    resources:
+      - jobs
+    verbs:
+      - get
+      - list
+      - watch
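
The ClusterRole above grants read-only access (`get`, `list`, `watch`) to pods, statefulsets and jobs. As a rough illustration of what that allows, the sketch below restates the three rules as Python data and checks a request against them. It is a deliberately simplified model, not how the Kubernetes authorizer is implemented: wildcards, `resourceNames`, subresources and non-resource URLs are ignored, and `is_allowed` is a hypothetical name.

```python
# The three rules from demo-clusterrole, restated as plain data.
RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["apps"], "resources": ["statefulsets"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": ["batch"], "resources": ["jobs"], "verbs": ["get", "list", "watch"]},
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if any single rule covers the group, resource and verb together."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


print(is_allowed(RULES, "", "pods", "list"))        # reading pods
print(is_allowed(RULES, "batch", "jobs", "create"))  # creating jobs
```

Against a real cluster, `kubectl auth can-i list pods --as=system:serviceaccount:default:demo-serviceaccount` asks the API server the same kind of question directly.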

demos/demos-v2.yaml

Lines changed: 2 additions & 0 deletions
@@ -52,6 +52,8 @@ demos:
       - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/airflow-scheduled-job/04-enable-and-run-date-dag.yaml
       - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/airflow-scheduled-job/05-enable-and-run-kafka-dag.yaml
       - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/airflow-scheduled-job/06-create-opa-users.yaml
+      - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/airflow-scheduled-job/serviceaccount.yaml
+      - plainYaml: https://raw.githubusercontent.com/stackabletech/demos/main/demos/airflow-scheduled-job/create-trino-tables.yaml
     supportedNamespaces: []
     resourceRequests:
       cpu: 2401m