# Summary of Airflow chapter 02

### Anatomy of an Airflow DAG

In this chapter we implement a simple pipeline that downloads rocket launch data, saves the rocket images, and prints a notification from Bash.

<img src="./pic/CH_02_rocket_download_pipeline.png" width="800">

One possible solution using an Airflow DAG:

```python
import json
import pathlib
import airflow
import requests
import requests.exceptions as requests_exceptions
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

dag = DAG(  #A
    dag_id="download_rocket_launches",  #B
    start_date=airflow.utils.dates.days_ago(14),  #C
    schedule_interval=None,  #D
)

download_launches = BashOperator(  #E
    task_id="download_launches",  #F
    bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'",
    dag=dag,
)

def _get_pictures():  #G
    # Ensure directory exists
    pathlib.Path("/tmp/images").mkdir(parents=True, exist_ok=True)

    # Download all pictures in launches.json
    with open("/tmp/launches.json") as f:
        launches = json.load(f)
        image_urls = [launch["image"] for launch in launches["results"]]
        for image_url in image_urls:
            try:
                response = requests.get(image_url)
                # Use the last path segment of the URL as the file name
                image_filename = image_url.split("/")[-1]
                target_file = f"/tmp/images/{image_filename}"

                with open(target_file, "wb") as f:
                    f.write(response.content)
                print(f"Downloaded {image_url} to {target_file}")

            except requests_exceptions.MissingSchema:
                print(f"{image_url} appears to be an invalid URL.")
            except requests_exceptions.ConnectionError:
                print(f"Could not connect to {image_url}.")

            
get_pictures = PythonOperator(  #H
    task_id="get_pictures",
    python_callable=_get_pictures,  #H
    dag=dag,
)

notify = BashOperator(
    task_id="notify",
    bash_command='echo "There are now $(ls /tmp/images/ | wc -l) images."',
    dag=dag,
)

download_launches >> get_pictures >> notify #I    
```

And the explanation of the structure:

- #A Instantiate a DAG object - this is the starting point of any workflow 
- #B The name of the DAG
- #C The date at which the DAG should first start running
- #D At what interval the DAG should run
- #E Apply Bash to download the URL response with curl
- #F The name of the task
- #G A Python function will parse the response and download all rocket pictures
- #H Call the Python function in the DAG with a PythonOperator
- #I Set the order of execution of tasks
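The `>>` chaining in #I can be illustrated with a small sketch in plain Python. This is not Airflow's implementation: the `Task` class and the `execution_order` helper are hypothetical names made up for this example. They only show how overloading `__rshift__` lets `a >> b >> c` record a left-to-right dependency chain.

```python
class Task:
    """Toy stand-in for an operator (hypothetical, not an Airflow class)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that must run after this one

    def __rshift__(self, other):
        # `a >> b` records that b runs after a, then returns b
        # so that `a >> b >> c` chains from left to right.
        self.downstream.append(other)
        return other


def execution_order(first):
    """Follow a linear chain from the first task and return the task ids in order."""
    order = []
    current = first
    while current is not None:
        order.append(current.task_id)
        current = current.downstream[0] if current.downstream else None
    return order


download_launches = Task("download_launches")
get_pictures = Task("get_pictures")
notify = Task("notify")

download_launches >> get_pictures >> notify

print(execution_order(download_launches))
# prints ['download_launches', 'get_pictures', 'notify']
```

Returning `other` from `__rshift__` is the design choice that makes chaining work, because Python evaluates `a >> b >> c` as `(a >> b) >> c`.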

Tasks in Airflow manage the execution of an Operator; they can be thought of as a small “wrapper” or “manager” around an operator that ensures the operator executes correctly. The user can focus on the work to be done by using operators, while Airflow ensures correct execution of the work via tasks:

<img src="./pic/CH02_taskoperators.png" width="800">
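The split between "the work" and "managing the work" can be sketched conceptually in plain Python. The classes `ToyOperator` and `ToyTask` and the state strings are assumptions invented for this illustration, not Airflow's own classes; the sketch only shows a wrapper that runs an operator and records whether it succeeded.

```python
class ToyOperator:
    """Describes the work to do (hypothetical stand-in, not an Airflow class)."""

    def __init__(self, task_id, work):
        self.task_id = task_id
        self.work = work  # a zero-argument callable

    def execute(self):
        return self.work()


class ToyTask:
    """Manages one run of an operator and records how it went."""

    def __init__(self, operator):
        self.operator = operator
        self.state = "none"

    def run(self):
        self.state = "running"
        try:
            self.operator.execute()
        except Exception:
            # The wrapper, not the operator, decides what a failure means
            self.state = "failed"
        else:
            self.state = "success"
        return self.state


ok = ToyTask(ToyOperator("say_hello", lambda: print("hello")))
broken = ToyTask(ToyOperator("divide", lambda: 1 / 0))

print(ok.run())
print(broken.run())
```

The operator only knows how to do its job; the wrapper around it handles bookkeeping such as state, which is the role the text assigns to tasks.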

The PythonOperator in Airflow is responsible for running any Python code. Just like the BashOperator used before, this and all other operators require a task_id. The task_id is referenced when running a task and displayed in the UI. The use of a PythonOperator is always twofold:
- We define the operator itself (get_pictures) and
- The python_callable argument points to a callable, typically a function (_get_pictures)

<img src="./pic/CH02_operator_callable.png" width="800">
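Because `python_callable` points to an ordinary Python function, its logic can be tried outside Airflow before wiring it into a DAG. The sketch below isolates the URL-to-filename step of `_get_pictures`. The helper name `image_targets` and the sample payload (with `example.com` URLs) are assumptions made for this example; only the `results` and `image` keys come from the DAG code.

```python
import json


def image_targets(launches, target_dir="/tmp/images"):
    """Map each launch image URL to the local path it would be saved to."""
    targets = []
    for launch in launches["results"]:
        image_url = launch["image"]
        # Same rule as in _get_pictures: last path segment becomes the file name
        image_filename = image_url.split("/")[-1]
        targets.append((image_url, f"{target_dir}/{image_filename}"))
    return targets


# Hypothetical sample payload that mimics the shape of launches.json
sample = json.loads(
    '{"results": ['
    '{"image": "https://example.com/media/falcon9.jpg"},'
    '{"image": "https://example.com/media/electron.png"}'
    "]}"
)

for url, path in image_targets(sample):
    print(url, "->", path)
```

A function like this could then be called from the callable passed to the `PythonOperator`, which keeps the parsing step easy to check on its own.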

#### Installation process and running the DAG 

Now that we have our basic rocket launch DAG, let’s get it up and running and view it in the Airflow UI. A bare-minimum Airflow setup consists of three core components: (1) a scheduler, (2) a webserver, and (3) a database. To get Airflow up and running, you can either install it in your Python environment or run it in a Docker container. To install it in your Python environment, run:

```bash
pip install apache-airflow
```

After that we need to initialize the database, create an admin user, copy the DAG file into the DAGs folder, and start the webserver and the scheduler:

```bash
airflow db init
airflow users create --username admin --password admin --firstname Anonymous --lastname Admin --role Admin --email admin@example.org
cp download_rocket_launches.py ~/airflow/dags/
airflow webserver &
airflow scheduler &
```

After that we can open http://localhost:8080 in a browser and log in with username “admin” and password “admin” to view Airflow.

The remaining sections of this chapter present the Airflow web interface.