---
title: Monitor Apache Airflow with OpenTelemetry
metaDescription: Monitor Airflow data with New Relic using OpenTelemetry.
freshnessValidatedDate: 2023-11-16
---
import opentelemetryAirflow01 from 'images/opentelemetry_screenshot_airflow_01.webp'
import opentelemetryAirflow02 from 'images/opentelemetry_screenshot_airflow_02.webp'
Monitor Apache Airflow data by configuring OpenTelemetry to send data to New Relic, where you can visualize tasks, operators, and DAG executions as metrics.
Before enabling OpenTelemetry in Apache Airflow, you'll need to install the Airflow package with the `otel` extra. The installation method depends on your Airflow deployment approach:

- Follow the installation instructions from Airflow's documentation.

- When installing with pip, add the `otel` extra to the command. For example:

  ```shell
  pip install "apache-airflow[otel]"
  ```

- Set up the Airflow Docker image using instructions from Airflow's documentation.

- Extend the pre-built Docker image by using a Dockerfile to install the `otel` extra. You can replace the `latest` tag with your desired version of the image:

  ```dockerfile
  FROM apache/airflow:latest
  RUN pip install --no-cache-dir "apache-airflow[otel]==$AIRFLOW_VERSION"
  ```
To send Airflow metrics to New Relic, configure OpenTelemetry metrics to export data to an OpenTelemetry collector, which then forwards the data to a New Relic OTLP endpoint using your license key.

Because Airflow doesn't currently support sending OpenTelemetry data with authentication headers, the OpenTelemetry collector is essential for authenticating with New Relic.

1. Follow the basic collector example to set up your OpenTelemetry collector.
2. Configure the collector with your appropriate OTLP endpoint, such as `https://otlp.nr-data.net:4317`.
3. For authentication, add your license key to the environment variable `NEW_RELIC_LICENSE_KEY` so that it populates the `api-key` header.
4. Ensure port `4318` on the collector is reachable from the running Airflow instance. (For Docker, you may need to use a Docker network.)
5. Launch the collector.
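Putting the steps above together, a minimal collector configuration might look like the following sketch. The receiver listens on port `4318` for Airflow's OTLP-over-HTTP metrics, and the exporter forwards them to New Relic with the `api-key` header populated from `NEW_RELIC_LICENSE_KEY`. Treat this as a starting point, not a complete production config; the `${env:...}` substitution syntax assumes a recent collector release.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # where Airflow sends OTLP over HTTP

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}   # populated from the environment

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```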
Airflow sends metrics using OTLP over HTTP, which uses port `4318`. Airflow has multiple methods of setting configuration options. Choose one of the following methods to set the required options for Airflow.
- Set the required options in the `airflow.cfg` file:

  ```ini
  [metrics]
  otel_on = True
  otel_host = localhost
  otel_port = 4318
  otel_ssl_active = False
  ```

- Or, set the required options as environment variables:

  ```shell
  export AIRFLOW__METRICS__OTEL_ON=True
  export AIRFLOW__METRICS__OTEL_HOST=localhost
  export AIRFLOW__METRICS__OTEL_PORT=4318
  export AIRFLOW__METRICS__OTEL_SSL_ACTIVE=False
  ```
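If you run Airflow with Docker Compose, the same environment variables can go in the service definition. This is a sketch: the service and collector hostnames (`airflow-scheduler`, `otel-collector`) are assumptions for illustration, and both containers must share a Docker network so the collector's hostname resolves.

```yaml
services:
  airflow-scheduler:
    environment:
      AIRFLOW__METRICS__OTEL_ON: "True"
      AIRFLOW__METRICS__OTEL_HOST: otel-collector   # collector's hostname on the shared network
      AIRFLOW__METRICS__OTEL_PORT: "4318"
      AIRFLOW__METRICS__OTEL_SSL_ACTIVE: "False"
```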
To confirm New Relic is collecting your Airflow data, run a DAG or pipeline:
- Log in to Airflow.
- Click the run button on one of the existing tutorial DAGs, or your own.
- Wait for the pipeline to finish running.
- Go to one.newrelic.com > All capabilities > APM & services > Services - OpenTelemetry > Airflow.
- Click Metrics Explorer to visualize metrics for pipeline executions.
With Airflow metrics, you can build dashboards around individual pipelines or overall performance, or compare different pipelines. Click here to learn more about querying your metrics.
This query retrieves a list of all reported metrics for Airflow:

```sql
SELECT uniques(metricName) FROM Metric WHERE entity.name = 'Airflow' AND metricName LIKE 'airflow.%' SINCE 30 MINUTES AGO LIMIT 100
```

Make sure to change the limit (`100`) if the number of your metric names exceeds it.
This query shows a comparison of completion times for successful runs of different DAGs:

```sql
SELECT latest(airflow.dagrun.duration.success) FROM Metric FACET dag_id WHERE entity.name = 'Airflow' SINCE 30 MINUTES AGO TIMESERIES
```
This query shows counts of failed DAG runs, which you can use to build alerts for critical pipelines:

```sql
SELECT count(airflow.dagrun.duration.failed) FROM Metric FACET dag_id WHERE entity.name = 'Airflow' SINCE 30 MINUTES AGO TIMESERIES
```