Commit

Removed AirflowDB (#322)
* Removed AirflowDB

* Formatting fixes

* Updated changelog

* Removed "databaseInitialization" option

* Update rust/operator-binary/src/airflow_controller.rs

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>

---------

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
dervoeti and maltesander committed Sep 20, 2023
1 parent 35e2c46 commit dbcfb2f
Showing 14 changed files with 45 additions and 1,279 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@
- [BREAKING] Consolidated `spec.clusterConfig.authenticationConfig` to `spec.clusterConfig.authentication` which now takes a vector of AuthenticationClass references ([#303]).
- `vector` `0.26.0` -> `0.31.0` ([#308]).
- `operator-rs` `0.44.0` -> `0.45.1` ([#308]).
- [BREAKING] Removed AirflowDB object, since it created some problems when reinstalling or upgrading an Airflow cluster. Instead, the initialization of the database was moved to the startup phase of each scheduler pod. To make sure the initialization does not run in parallel, the `PodManagementPolicy` of the scheduler StatefulSet was set to `OrderedReady`. The `.spec.clusterConfig.databaseInitialization` option was removed from the CRD, since it was just there to enable logging for the database initialization Job, which doesn't exist anymore ([#322]).
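The `PodManagementPolicy` change described in this entry can be sketched as a minimal StatefulSet fragment. This is an illustration only, not the operator's actual generated manifest; the resource name, labels, and image are assumptions:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: airflow-scheduler-default   # assumed name, for illustration only
spec:
  # OrderedReady makes Kubernetes start pod 0 and wait until it is Ready
  # before starting pod 1, so the database initialization that now runs at
  # scheduler startup never executes in parallel.
  podManagementPolicy: OrderedReady
  replicas: 2
  serviceName: airflow-scheduler
  selector:
    matchLabels:
      app.kubernetes.io/name: airflow
  template:
    metadata:
      labels:
        app.kubernetes.io/name: airflow
    spec:
      containers:
        - name: airflow-scheduler
          image: airflow:2.6.1   # placeholder image
```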

### Fixed

263 changes: 0 additions & 263 deletions deploy/helm/airflow-operator/crds/crds.yaml
@@ -6937,86 +6937,6 @@ spec:
- repo
type: object
type: array
databaseInitialization:
nullable: true
properties:
logging:
default:
enableVectorAgent: null
containers: {}
properties:
containers:
additionalProperties:
anyOf:
- required:
- custom
- {}
description: Fragment derived from `ContainerLogConfigChoice`
properties:
console:
nullable: true
properties:
level:
description: Log levels
enum:
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- NONE
nullable: true
type: string
type: object
custom:
description: Custom log configuration provided in a ConfigMap
properties:
configMap:
nullable: true
type: string
type: object
file:
nullable: true
properties:
level:
description: Log levels
enum:
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- NONE
nullable: true
type: string
type: object
loggers:
additionalProperties:
properties:
level:
description: Log levels
enum:
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- NONE
nullable: true
type: string
type: object
default: {}
type: object
type: object
type: object
enableVectorAgent:
nullable: true
type: boolean
type: object
type: object
exposeConfig:
nullable: true
type: boolean
@@ -24873,186 +24793,3 @@ spec:
storage: true
subresources:
status: {}
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: airflowdbs.airflow.stackable.tech
annotations:
helm.sh/resource-policy: keep
spec:
group: airflow.stackable.tech
names:
categories: []
kind: AirflowDB
plural: airflowdbs
shortNames: []
singular: airflowdb
scope: Namespaced
versions:
- additionalPrinterColumns: []
name: v1alpha1
schema:
openAPIV3Schema:
description: Auto-generated derived type for AirflowDBSpec via `CustomResource`
properties:
spec:
properties:
config:
properties:
logging:
default:
enableVectorAgent: null
containers: {}
properties:
containers:
additionalProperties:
anyOf:
- required:
- custom
- {}
description: Fragment derived from `ContainerLogConfigChoice`
properties:
console:
nullable: true
properties:
level:
description: Log levels
enum:
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- NONE
nullable: true
type: string
type: object
custom:
description: Custom log configuration provided in a ConfigMap
properties:
configMap:
nullable: true
type: string
type: object
file:
nullable: true
properties:
level:
description: Log levels
enum:
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- NONE
nullable: true
type: string
type: object
loggers:
additionalProperties:
properties:
level:
description: Log levels
enum:
- TRACE
- DEBUG
- INFO
- WARN
- ERROR
- FATAL
- NONE
nullable: true
type: string
type: object
default: {}
type: object
type: object
type: object
enableVectorAgent:
nullable: true
type: boolean
type: object
type: object
credentialsSecret:
type: string
image:
anyOf:
- required:
- custom
- productVersion
- required:
- productVersion
description: The Airflow image to use
properties:
custom:
description: Overwrite the docker image. Specify the full docker image name, e.g. `docker.stackable.tech/stackable/superset:1.4.1-stackable2.1.0`
type: string
productVersion:
description: Version of the product, e.g. `1.4.1`.
type: string
pullPolicy:
default: Always
description: '[Pull policy](https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy) used when pulling the Images'
enum:
- IfNotPresent
- Always
- Never
type: string
pullSecrets:
description: '[Image pull secrets](https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod) to pull images from a private registry'
items:
description: LocalObjectReference contains enough information to let you locate the referenced object inside the same namespace.
properties:
name:
description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names'
type: string
type: object
nullable: true
type: array
repo:
description: Name of the docker repo, e.g. `docker.stackable.tech/stackable`
nullable: true
type: string
stackableVersion:
description: Stackable version of the product, e.g. `23.4`, `23.4.1` or `0.0.0-dev`. If not specified, the operator will use its own version, e.g. `23.4.1`. When using a nightly operator or a pr version, it will use the nightly `0.0.0-dev` image.
nullable: true
type: string
type: object
vectorAggregatorConfigMapName:
nullable: true
type: string
required:
- config
- credentialsSecret
- image
type: object
status:
nullable: true
properties:
condition:
enum:
- Pending
- Initializing
- Ready
- Failed
type: string
startedAt:
description: Time is a wrapper around time.Time which supports correct marshaling to YAML and JSON. Wrappers are provided for many of the factory methods that the time package offers.
format: date-time
nullable: true
type: string
required:
- condition
type: object
required:
- spec
title: AirflowDB
type: object
served: true
storage: true
subresources:
status: {}
@@ -71,23 +71,6 @@ echo "Creating Airflow cluster"
kubectl apply -f airflow.yaml
# end::install-airflow[]

for (( i=1; i<=15; i++ ))
do
echo "Waiting for AirflowDB to appear ..."
if eval kubectl get airflowdb airflow; then
break
fi

sleep 1
done

echo "Waiting on AirflowDB to become ready ..."
# tag::wait-airflowdb[]
kubectl wait airflowdb/airflow \
--for jsonpath='{.status.condition}'=Ready \
--timeout 300s
# end::wait-airflowdb[]

sleep 5

echo "Awaiting Airflow rollout finish ..."
@@ -71,23 +71,6 @@ echo "Creating Airflow cluster"
kubectl apply -f airflow.yaml
# end::install-airflow[]

for (( i=1; i<=15; i++ ))
do
echo "Waiting for AirflowDB to appear ..."
if eval kubectl get airflowdb airflow; then
break
fi

sleep 1
done

echo "Waiting on AirflowDB to become ready ..."
# tag::wait-airflowdb[]
kubectl wait airflowdb/airflow \
--for jsonpath='{.status.condition}'=Ready \
--timeout 300s
# end::wait-airflowdb[]

sleep 5

echo "Awaiting Airflow rollout finish ..."
29 changes: 1 addition & 28 deletions docs/modules/airflow/pages/getting_started/first_steps.adoc
@@ -67,30 +67,7 @@ It should generally be safe to simply use the latest image version that is available.

This will create the actual Airflow cluster.

== Initialization of the Airflow database

When creating an Airflow cluster, a database-initialization job is first started to ensure that the database schema is present and correct (i.e. populated with an admin user). A Kubernetes job is created which starts a pod to initialize the database. This can take a while.

You can use kubectl to wait on the resource, although the cluster itself will not be created until this step is complete:

[source,bash]
include::example$getting_started/code/getting_started.sh[tag=wait-airflowdb]

The job status can be inspected and verified like this:

[source,bash]
----
kubectl get jobs
----

which will show something like this:

----
NAME COMPLETIONS DURATION AGE
airflow 1/1 85s 11m
----

Then, make sure that all the Pods in the StatefulSets are ready:
After a while, all the Pods in the StatefulSets should be ready:

[source,bash]
----
@@ -109,10 +86,6 @@ airflow-webserver-default 1/1 11m
airflow-worker-default 2/2 11m
----

The completed set of pods for the Airflow cluster will look something like this:

image::getting_started/airflow_pods.png[Airflow pods]

When the Airflow cluster has been created and the database is initialized, Airflow can be opened in the
browser: the webserver UI port (which defaults to `8080`) can be forwarded to the local host:

3 changes: 0 additions & 3 deletions docs/modules/airflow/pages/usage-guide/logging.adoc
@@ -8,9 +8,6 @@ ConfigMap for the aggregator and by enabling the log agent:
spec:
clusterConfig:
vectorAggregatorConfigMapName: vector-aggregator-discovery
databaseInitialization:
logging:
enableVectorAgent: true
webservers:
config:
logging:
