Skip to content

Latest commit

 

History

History
151 lines (108 loc) · 7.36 KB

File metadata and controls

151 lines (108 loc) · 7.36 KB
title tags metaDescription redirects freshnessValidatedDate
Monitor Apache Airflow with OpenTelemetry
Integrations
Open source telemetry integrations
OpenTelemetry
Airflow
Quickstart
Monitor Airflow data with New Relic using OpenTelemetry.
/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/airflow/monitoring-airflow-ot
2023-11-16

import opentelemetryAirflow01 from 'images/opentelemetry_screenshot_airflow_01.webp'

import opentelemetryAirflow02 from 'images/opentelemetry_screenshot_airflow_02.webp'

Monitor Apache Airflow data by configuring OpenTelemetry to send data to New Relic, where you can visualize tasks, operators, and DAG executions as metrics.

Screenshot showing sample Airflow DAG runs in New Relic

Prerequisites [#prerequisites]

Before enabling OpenTelemetry in Apache Airflow, you'll need to install the Airflow package with the otel extra. The installation method depends on your Airflow deployment approach:

Option 1: Installing from PyPi [#install-pypi]

  1. Follow the installation instructions from Airflow's Documentation.

  2. When installing with pip, add the otel extra to the command. For example:

    pip install "apache-airflow[otel]"

Option 2: Installing from Docker [#install-docker]

  1. Set up the Airflow Docker image using instructions from Airflow's documentation.

  2. Extend the pre-built Docker image by using a Dockerfile to install the otel extra. You can replace the latest tag with your desired version of the image.

    FROM apache/airflow:latest
    RUN pip install --no-cache-dir "apache-airflow[otel]==$AIRFLOW_VERSION"
    
`$AIRFLOW_VERSION` is already set by the apache/airflow container, but can be replaced with a version number for other base images.

Configuration [#configuration]

To send Airflow metrics to New Relic, configure the OpenTelemetry metrics to export data to an OpenTelemetry collector, which will then forward the data to a New Relic OTLP endpoint using a .

Due to Airflow's current lack of support for sending OpenTelemetry data with authentication headers, the OpenTelemetry collector is essential for authenticating with New Relic.

Configure the OpenTelemetry collector [#configuration-collector]

  1. Follow the basic collector example to set up your OpenTelemetry collector.
  2. Configure the collector with your appropriate OTLP endpoint, such as https://otlp.nr-data.net:4317.
  3. For authentication, add your to the environment variable NEW_RELIC_LICENSE_KEY so that it populates the api-key header.
  4. Ensure port 4318 on the collector is reachable from the running Airflow instance. (For docker, you may need to use a docker network.)
  5. Launch the collector.

Configure Airflow metrics [#configuration-airflow]

Airflow sends metrics using OTLP over HTTP, which uses port 4318. Airflow has multiple methods of setting configuration options.

If your environment has Airflow running in a docker container alongside the OpenTelemetry Collector, you will need to change the `otel_host` setting from `localhost` to the container address of the collector.

Choose one of the following methods to set the required options for Airflow.

  1. Set the required options in the airflow.cfg file.

    [metrics]
    otel_on = True
    otel_host = localhost
    otel_port = 4318
    otel_ssl_active = False
    
  2. Or, set the required options as environment variables.

    export AIRFLOW__METRICS__OTEL_ON=True
    export AIRFLOW__METRICS__OTEL_HOST=localhost
    export AIRFLOW__METRICS__OTEL_PORT=4318
    export AIRFLOW__METRICS__OTEL_SSL_ACTIVE=False
    
Airflow has [additional settings](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#setup-opentelemetry) for metrics that may be useful. This includes the ability to [rename metrics](https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#rename-metrics) before sending, which is helpful if metric names exceed the 63 byte limit for OpenTelemetry.

Validate data is sent to New Relic [#validation]

To confirm New Relic is collecting your Airflow data, run a DAG or pipeline:

  1. Login to Airflow.
  2. Click the run button on one of the existing tutorial DAGs, or your own.
  3. Wait for the pipeline to finish running.
  4. Go to one.newrelic.com > All capabilities > APM & services > Services - OpenTelemetry > Airflow.
  5. Click Metrics Explorer to visualize metrics for pipeline executions.

Building dashboards [#building-dashboards]

With Airflow metrics, you can build dashboards around individual pipelines, overall performance, or view a comparison between different pipelines. Click here to learn more about querying your metrics.

This query retrieves a list of all reported metrics for Airflow:

SELECT uniques(metricName) FROM Metric WHERE entity.name = 'Airflow' AND metricName LIKE 'airflow.%' SINCE 30 MINUTES AGO LIMIT 100

Make sure to change the limit (100) if your metric names exceed it.

This query shows a comparison of different completion times for successful runs of different DAGs:

SELECT latest(airflow.dagrun.duration.success) FROM Metric FACET dag_id WHERE entity.name = 'Airflow' SINCE 30 minutes AGO TIMESERIES

Screenshot showing sample Airflow DAG runs in New Relic

This query shows counts of failed DAG runs, which can be used to build for critical pipelines:

SELECT count(airflow.dagrun.duration.failed) FROM Metric FACET dag_id WHERE entity.name = 'Airflow' SINCE 30 minutes AGO TIMESERIES

Screenshot showing sample Airflow failures in New Relic

Airflow's OpenTelemetry metrics are not maintained by New Relic, so if you have any issues with the instrumentation, [create a new issue in Airflow's GitHub repo](https://github.com/apache/airflow/issues).