# What is Airflow?

*Airflow* is a workflow management system (WMS) that handles workflows ranging from simple to complex. The sample workflow below aggregates daily revenue from different ad networks and produces a revenue forecast. Airflow is designed to handle workflows like this one.

<img src="https://miro.medium.com/max/1382/1*ytMWtgd5h-1EiGiCe_D3ew.png" width="750">


[Reference](https://towardsdatascience.com/why-quizlet-chose-apache-airflow-for-executing-data-workflows-3f97d40e9571)

# Use Cases 

* Extract, Transform, Load (ETL) jobs - extracting data from multiple sources, transforming it for analysis, and loading it into another data store
* Machine Learning pipelines
* Data warehousing
* Automated testing
* Performing data backups


# Core Concepts

## Airflow DAG
DAG is short for *Directed Acyclic Graph*. It is a collection of the tasks you want to run, organized to reflect their dependencies. A DAG doesn't do any processing itself. Its job is to make sure that tasks run at the right time and in the right order. Airflow DAGs are defined in standard Python files, and in general one DAG file should correspond to a single workflow.

<img src="https://www.polidea.com/static/bce5fcc8a3c0ead34ab459d243a26349/331ea/image2.png"></img>
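The "right order" idea can be illustrated without Airflow at all. The following is a minimal sketch using Python's standard-library `graphlib`, not Airflow's own scheduler; the task names are hypothetical and loosely follow the revenue example from the introduction.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract_network_a": set(),
    "extract_network_b": set(),
    "aggregate_revenue": {"extract_network_a", "extract_network_b"},
    "forecast_revenue": {"aggregate_revenue"},
}

# static_order() yields every task only after all of its upstream tasks.
# It raises graphlib.CycleError if the graph contains a cycle, which is
# why the graph has to be acyclic.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

The two extract tasks may appear in either order, since neither depends on the other, but aggregation always follows both and the forecast always comes last.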



## Operators
While DAGs describe how to run a workflow, *Operators* determine what actually gets done. An operator describes a single task in a workflow. 

**Note**: Operators usually stand on their own and don't need to share resources with other operators. If two operators need to share information, such as a filename or a small amount of data, consider combining them into a single operator.

### Types of Operators

* BashOperator - executes a bash command
* PythonOperator - calls an arbitrary Python function
* EmailOperator - sends an email
* SimpleHttpOperator - sends an HTTP request
* MySqlOperator, SqliteOperator, PostgresOperator, etc. - execute a SQL command
* **Sensor** - waits for a certain time, file, database row, or S3 key
* [Many more ...](https://airflow.apache.org/_api/airflow/operators/index.html)
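Sensors differ from the other operators in that they wait rather than act. The waiting pattern can be sketched in plain Python as a polling loop. The function `wait_for` and the condition `file_has_arrived` are hypothetical names for illustration, not Airflow APIs; the parameter names `poke_interval` and `timeout` are borrowed from Airflow's sensor settings.

```python
import time


def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Call condition() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True          # the awaited event happened
        if time.monotonic() >= deadline:
            return False         # gave up waiting
        time.sleep(poke_interval)


# Hypothetical condition: becomes true on the third check.
checks = {"count": 0}


def file_has_arrived():
    checks["count"] += 1
    return checks["count"] >= 3


arrived = wait_for(file_has_arrived)
print(arrived, checks["count"])
```

In a real pipeline, downstream tasks would start only once the sensor reports success.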

## Tasks
Once an operator is instantiated, it is referred to as a **task**. Instantiation supplies specific values for the operator's parameters, and the parameterized task becomes a node in a DAG.
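The relationship between operator and task can be sketched with plain Python classes. `ToyBashOperator` and `ToyDag` below are hypothetical stand-ins written for this example, not Airflow's classes: the class plays the role of the abstract operator, and each instance with concrete parameter values plays the role of a task.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBashOperator:
    """Hypothetical stand-in for an operator: a template for one kind of work."""
    task_id: str
    bash_command: str

    def execute(self) -> str:
        # A real operator would run the command; this sketch only describes it.
        return f"[{self.task_id}] would run: {self.bash_command}"


@dataclass
class ToyDag:
    """Hypothetical stand-in for a DAG: a named container of tasks."""
    dag_id: str
    tasks: dict = field(default_factory=dict)

    def add(self, task):
        # Registering the instantiated operator makes it a node of this DAG.
        self.tasks[task.task_id] = task
        return task


dag = ToyDag(dag_id="daily_revenue")

# Same operator class, different parameter values: two distinct tasks.
print_date = dag.add(ToyBashOperator(task_id="print_date", bash_command="date"))
say_hello = dag.add(ToyBashOperator(task_id="say_hello", bash_command="echo hello"))

print(print_date.execute())
```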

## Task Instances
A task instance represents a specific run of a task and is characterized by the combination of a DAG, a task, and a point in time. A task instance can be in any of the states shown below:
<img src="https://airflow.apache.org/_images/task_lifecycle.png"></img>

[Reference](https://airflow.apache.org/concepts.html)
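As a rough illustration of "a DAG, a task, and a point in time", the sketch below models a task instance as a small dataclass. `ToyTaskInstance` is a hypothetical class for this example, not Airflow's own, and the state strings are only a small subset of the states in the lifecycle diagram.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ToyTaskInstance:
    """Hypothetical model: one run of one task in one DAG at one point in time."""
    dag_id: str
    task_id: str
    run_time: datetime
    state: str = "scheduled"

    @property
    def key(self):
        # The three parts together identify a single task instance.
        return (self.dag_id, self.task_id, self.run_time)


# The same task on two different days gives two separate task instances.
day_one = ToyTaskInstance("daily_revenue", "aggregate_revenue", datetime(2024, 1, 1))
day_two = ToyTaskInstance("daily_revenue", "aggregate_revenue", datetime(2024, 1, 2))

# Each instance tracks its own state independently.
day_one.state = "success"
day_two.state = "failed"

print(day_one.key != day_two.key)
```

Because each run is tracked separately, a failure on one day can be retried without touching the successful run from another day.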




# Core Benefits

* Smart Scheduling
* Dependency Management
* Resilience
* Scalability
* Flexibility
* Monitoring & Interaction
* Programmatic Pipeline Definition