joednkn1/airflow-prometheus

Airflow Prometheus (0.4.2)

This is an Airflow extension that adds support for generating Prometheus metrics. The package extends the airflow-prometheus-exporter project by Robinhood.

Installation

Install the package with pip:

  $ python3 -m pip install "airflow-prometheus==0.4.2"

Or if you are using Poetry to run Apache Airflow:

  $ poetry add apache-airflow@latest
  $ poetry add "airflow-prometheus@0.4.2"

What does this package provide?

  • Support for exporting Prometheus metrics
  • Support for exporting additional data into Grafana

Prometheus metrics

Metrics are exported on the /metrics endpoint:

| Metric | Labels | Description |
|---|---|---|
| dag_bag_stats | property | Statistics for the DAG bag. property=loaded_dags_count: number of loaded DAGs |
| airflow_dag_status | dag_id, owner, status | Shows the number of DAG starts with this status |
| airflow_dag_run_duration | dag_id | Duration of successful DAG runs in seconds |
| airflow_dag_scheduler_delay | dag_id | Airflow DAG scheduling delay |
| airflow_task_status | dag_id, task_id, operator_name, owner, state | Shows the number of task instances with a particular status |
| airflow_task_duration | aggregation, operator_name, task_id, dag_id | Durations of tasks in seconds by operator. aggregation is one of max, min, avg |
| airflow_task_max_tries | operator_name, task_id, dag_id | Max tries for tasks |
| airflow_last_dag_run | status, task_id, dag_id | Task statuses for the latest DAG run |
| airflow_successful_task_duration | task_id, dag_id, execution_date | Duration of successful tasks in seconds |
| airflow_task_fail_count | dag_id, task_id | Count of failed tasks |
| airflow_xcom_parameter | dag_id, task_id | Airflow XCom parameter |
| airflow_task_scheduler_delay | queue | Airflow task scheduling delay |
| airflow_num_queued_tasks | - | Number of queued Airflow tasks |
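
To check what the endpoint returns without setting up Prometheus, the text served at /metrics can be parsed by hand. The following is a minimal sketch, assuming the endpoint serves the standard Prometheus text exposition format. The sample payload, its DAG ids, owners, and values are made up for illustration, and `parse_metrics` and `dag_runs_by_status` are hypothetical helpers, not part of this package.

```python
import re
from collections import defaultdict

# Made-up sample in the Prometheus text exposition format.
# DAG ids, owners, and values are illustrative, not real output.
SAMPLE = '''\
# HELP airflow_dag_status Shows the number of dag starts with this status
airflow_dag_status{dag_id="etl_daily",owner="airflow",status="success"} 12.0
airflow_dag_status{dag_id="etl_daily",owner="airflow",status="failed"} 1.0
airflow_dag_status{dag_id="reports",owner="data",status="success"} 7.0
'''

# name{labels} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = dict(LABEL.findall(match.group('labels') or ''))
        samples.append((match.group('name'), labels, float(match.group('value'))))
    return samples


def dag_runs_by_status(samples):
    """Sum airflow_dag_status values across DAGs, grouped by the status label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == 'airflow_dag_status':
            totals[labels.get('status', 'unknown')] += value
    return dict(totals)


samples = parse_metrics(SAMPLE)
totals = dag_runs_by_status(samples)
```

In practice the text would come from an HTTP GET against your own Airflow webserver's /metrics endpoint rather than from the inline sample.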

JSON metadata

You can use the SimpleJson datasource to display the states of DAGs in Grafana. Install the plugin with the following command or from grafana.com:

    $ sudo grafana-cli plugins install grafana-simple-json-datasource

Now create a JSON datasource and point it to /metrics/json/. The trailing slash is important, and you may need to enable "Skip TLS Verify" for it to work.
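
Since the trailing slash is easy to lose when copying URLs around, a small helper can build the datasource URL consistently. This is a sketch only: `json_datasource_url` is a hypothetical function, and the base URLs in the usage lines are placeholders for your own Airflow webserver address.

```python
from urllib.parse import urlsplit, urlunsplit


def json_datasource_url(airflow_base_url):
    """Build the datasource URL, always ending in /metrics/json/."""
    parts = urlsplit(airflow_base_url)
    # Drop any trailing slash on the base path, then append the
    # endpoint path with its required trailing slash.
    path = parts.path.rstrip('/') + '/metrics/json/'
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


# Placeholder addresses; substitute your own webserver URL.
plain = json_datasource_url('http://localhost:8080')
prefixed = json_datasource_url('http://localhost:8080/airflow/')
```

The second call shows that a webserver mounted under a path prefix keeps that prefix in the result.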

Next, add an ad-hoc variable to the dashboard.

The ad-hoc filter now appears at the top of the dashboard, and you can use it to select DAGs. The next step is to add some visualizations.

Add a new panel and select the newly created JSON datasource. Select dags as the metric and Node Graph as the visualization type.

The node graph shows the dependencies between tasks and their statuses for the latest instance of the DAG. DAGs can be selected with the ad-hoc variable you created. You can remove the ad-hoc filter to show all DAGs, but this is not recommended because the Node Graph panel handles zooming and panning of large diagrams poorly.
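
For orientation, Grafana's Node Graph panel works from two sets of records: nodes (each with an `id`) and edges (each with an `id`, `source`, and `target`). The sketch below shows how task dependencies and states could map onto that shape. It is an assumption for illustration, not necessarily the payload this package returns; the DAG, task ids, states, and `to_node_graph` are all hypothetical.

```python
def to_node_graph(task_states, downstream):
    """Map task states and dependencies to Node Graph style records.

    task_states: dict of task_id -> state string
    downstream:  dict of task_id -> list of downstream task_ids
    """
    # One node per task; the state is shown as the node's main statistic.
    nodes = [
        {"id": task_id, "title": task_id, "mainstat": state}
        for task_id, state in task_states.items()
    ]
    # One edge per dependency, pointing from upstream to downstream task.
    edges = [
        {"id": f"{src}->{dst}", "source": src, "target": dst}
        for src, targets in downstream.items()
        for dst in targets
    ]
    return nodes, edges


# Hypothetical three-task DAG: extract -> transform -> load
task_states = {"extract": "success", "transform": "running", "load": "queued"}
downstream = {"extract": ["transform"], "transform": ["load"]}
nodes, edges = to_node_graph(task_states, downstream)
```

Filtering to a single DAG, as the ad-hoc variable does, keeps the node and edge lists small enough for the panel to lay out readably.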

Example dashboard

The example dashboard is available here: example/dashboard.json