Adapt getting started to executors instead of workers (#351)
* adapt getting started to executors instead of workers

* fix cluster config and version references
maltesander committed Dec 1, 2023
1 parent 4437c3a commit c038e9e
Showing 1 changed file with 17 additions and 18 deletions: docs/modules/airflow/pages/getting_started/first_steps.adoc
@@ -26,7 +26,7 @@ The `connections.secretKey` will be used for securely signing the session cookie

`connections.celeryResultBackend` must contain the connection string to the SQL database storing the job metadata (in the example above we are using the same postgresql database for both).

-`connections.celeryBrokerUrl` must contain the connection string to the Redis instance used for queuing the jobs submitted to the airflow worker(s).
+`connections.celeryBrokerUrl` must contain the connection string to the Redis instance used for queuing the jobs submitted to the Airflow executor(s).

The `adminUser` fields are used to create an admin user.
Please note that the admin user will be disabled if you use a non-default authentication mechanism like LDAP.
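The credentials discussed above are supplied to the operator through a Kubernetes Secret. As a rough sketch of what such a Secret could look like — the key names mirror the fields discussed above, while the Secret name, hostnames, and passwords are illustrative placeholders rather than values from the original guide:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: simple-airflow-credentials   # hypothetical name, referenced later from the cluster spec
type: Opaque
stringData:
  adminUser.username: airflow
  adminUser.firstname: Airflow
  adminUser.lastname: Admin
  adminUser.email: airflow@example.com
  adminUser.password: airflow              # placeholder; use a strong password
  connections.secretKey: thisISaSECRET     # signs the webserver session cookies
  # The same PostgreSQL instance serves as metadata store and Celery result backend here:
  connections.sqlalchemyDatabaseUri: postgresql+psycopg2://airflow:airflow@airflow-postgresql/airflow
  connections.celeryResultBackend: db+postgresql://airflow:airflow@airflow-postgresql/airflow
  connections.celeryBrokerUrl: redis://:airflow@airflow-redis-master:6379/0
```

Using `stringData` (rather than `data`) lets the values be written in plain text; Kubernetes base64-encodes them on admission.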
@@ -36,7 +36,7 @@ Please note that the admin user will be disabled if you use a non-default authentication mechanism like LDAP.
An Airflow cluster is made up of three components:

- `webserver`: this provides the main UI for user-interaction
-- `workers`: the nodes over which the job workload will be distributed by the scheduler
+- `executors`: the `CeleryExecutor` or `KubernetesExecutor` nodes over which the job workload will be distributed by the scheduler
- `scheduler`: responsible for triggering jobs and persisting their metadata to the backend database

Create a file named `airflow.yaml` with the following contents:
@@ -54,16 +54,15 @@ include::example$getting_started/code/getting_started.sh[tag=install-airflow]

Where:

-- `metadata.name` contains the name of the Airflow cluster
-- the label of the Docker image provided by Stackable must be set in `spec.version`
-- `spec.celeryExecutors`: deploy workers managed by Airflow's Celery engine. Alternatively you can use `kuberenetesExectors` that will use Airflow's Kubernetes engine for worker management. For more information see https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types).
-- the `spec.loadExamples` key is optional and defaults to `false`. It is set to `true` here as the example DAGs will be used when verifying the installation.
-- the `spec.exposeConfig` key is optional and defaults to `false`. It is set to `true` only as an aid to verify the configuration and should never be used as such in anything other than test or demo clusters.
-- the previously created secret must be referenced in `spec.credentialsSecret`
+- `metadata.name` contains the name of the Airflow cluster.
+- the product version of the Docker image provided by Stackable must be set in `spec.image.productVersion`.
+- `spec.celeryExecutors`: deploys executors managed by Airflow's Celery engine. Alternatively you can use `kubernetesExecutors`, which uses Airflow's Kubernetes engine for executor management. For more information see https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types.
+- the `spec.clusterConfig.loadExamples` key is optional and defaults to `false`. It is set to `true` here because the example DAGs are used when verifying the installation.
+- the `spec.clusterConfig.exposeConfig` key is optional and defaults to `false`. It is set to `true` only as an aid to verify the configuration and should never be used in anything other than test or demo clusters.
+- the previously created secret must be referenced in `spec.clusterConfig.credentialsSecret`.
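Pulling the fields above together, the `airflow.yaml` described by the updated text might look roughly like the following sketch. The product version, replica counts, and Secret name are illustrative assumptions, not values taken from the referenced example file:

```yaml
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow                     # the name of the Airflow cluster
spec:
  image:
    productVersion: 2.7.2           # assumed Airflow version; stackableVersion is left to the operator default
  clusterConfig:
    loadExamples: true              # ship the example DAGs used to verify the installation
    exposeConfig: true              # test/demo clusters only, never production
    credentialsSecret: simple-airflow-credentials   # hypothetical Secret name created earlier
  webservers:
    roleGroups:
      default:
        replicas: 1
  celeryExecutors:                  # alternatively: kubernetesExecutors
    roleGroups:
      default:
        replicas: 2
  schedulers:
    roleGroups:
      default:
        replicas: 1
```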

-NOTE: Please note that the version you need to specify for `spec.version` is not only the version of Apache Airflow which you want to roll out, but has to be amended with a Stackable version as shown. This Stackable version is the version of the underlying container image which is used to execute the processes. For a list of available versions please check our
-https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fairflow%2Ftags[image registry].
-It should generally be safe to simply use the latest image version that is available.
+NOTE: The version you specify in `spec.image.productVersion` is the desired version of Apache Airflow. You can optionally pin `spec.image.stackableVersion` to a specific release such as `23.11.0`, but it is recommended to leave it out and use the default provided by the operator. For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fairflow%2Ftags[image registry].
+It should generally be safe to simply use the latest version that is available.

This will create the actual Airflow cluster.

@@ -77,13 +76,13 @@ kubectl get statefulset
The output should show all pods ready, including the external dependencies:

----
-NAME                        READY   AGE
-airflow-postgresql          1/1     16m
-airflow-redis-master        1/1     16m
-airflow-redis-replicas      1/1     16m
-airflow-scheduler-default   1/1     11m
-airflow-webserver-default   1/1     11m
-airflow-worker-default      2/2     11m
+NAME                              READY   AGE
+airflow-postgresql                1/1     16m
+airflow-redis-master              1/1     16m
+airflow-redis-replicas            1/1     16m
+airflow-scheduler-default         1/1     11m
+airflow-webserver-default         1/1     11m
+airflow-celery-executor-default   2/2     11m
----
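Rather than polling `kubectl get statefulset` by hand, the wait can be scripted. The resource names below follow the listing above and the timeout is an arbitrary assumption; adjust both to your cluster:

```shell
#!/usr/bin/env bash
# Block until each Stackable-managed StatefulSet reports all replicas ready.
# Names taken from the `kubectl get statefulset` output shown above.
for sts in airflow-webserver-default airflow-scheduler-default airflow-celery-executor-default; do
  kubectl rollout status statefulset/"$sts" --timeout=300s
done
```

`kubectl rollout status` exits non-zero if the timeout elapses, which makes the loop usable in CI or install scripts.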

When the Airflow cluster has been created and the database is initialized, Airflow can be opened in the
