In this notebook we will:

* Install Airflow
* Run the individual Airflow components
* Write our first DAG

## Installing Airflow

### Step One: Getting Ready
Open a terminal and navigate to the directory for this notebook:

```{shell}
cd {your path}/python101/airflow101
```

Next, create and enter a new virtual environment:

```
python3 -m venv .venv-airflow
source .venv-airflow/bin/activate
```
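
To confirm the new environment is active, one option is to ask Python itself. The sketch below assumes Python 3.3 or later, where `sys.base_prefix` exists. The helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python install.
    return sys.prefix != sys.base_prefix


print(sys.prefix)
print("virtual environment active:", in_virtualenv())
```

If the environment is active, the first line printed should end in `.venv-airflow`.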

### Step Two: Install Airflow

```
pip install apache-airflow==1.10.15
pip install SQLAlchemy==1.3.23 
pip install Flask-SQLAlchemy==2.4.4
```
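
If you want to double-check that the pinned versions were installed, a small script along these lines may help. It is a sketch that assumes Python 3.8 or later (for `importlib.metadata`); the `EXPECTED` dict and the `installed_version` helper are names invented for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# The pins used in the install step above.
EXPECTED = {
    "apache-airflow": "1.10.15",
    "SQLAlchemy": "1.3.23",
    "Flask-SQLAlchemy": "2.4.4",
}


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package, expected in EXPECTED.items():
    found = installed_version(package)
    status = "ok" if found == expected else "MISMATCH"
    print(f"{package}: expected {expected}, found {found} [{status}]")
```

Run it with the virtual environment active, otherwise it reports on whatever Python it happens to find.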

### Step Three: Configure Airflow

Set Airflow environment variables for this shell session by sourcing a provided file:
```
source localAirflowEnv.sh
```

This tells Airflow where to look for DAGs and where to store its data. Next, initialise the metadata database:

```
airflow initdb
```

This creates and initialises a local Airflow metadata database using SQLite.
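
To peek inside the new database, you could list its tables with Python's built-in `sqlite3` module. This is a sketch that assumes the database sits at Airflow's default location, `$AIRFLOW_HOME/airflow.db`. If `localAirflowEnv.sh` points Airflow at a different database, adjust `db_path` accordingly. The `list_tables` helper is a name invented for this example.

```python
import os
import sqlite3


def list_tables(db_path):
    """Return the sorted table names in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return sorted(name for (name,) in rows)


# Assumption: the database lives at Airflow's default location.
airflow_home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))
db_path = os.path.join(airflow_home, "airflow.db")

# Check first: sqlite3.connect would silently create a missing file.
if os.path.exists(db_path):
    print(list_tables(db_path))
else:
    print(f"No database found at {db_path}")
```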

## Running Airflow Components

### Webserver

In your current terminal run:

```
airflow webserver
```

This should start a webserver listening on http://localhost:8080/
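
Besides opening the URL in a browser, you can check from Python whether something is answering on that port. The sketch below uses only the standard library; the `is_up` helper is a hypothetical name, and it only tells you that some HTTP server responded, not that it is Airflow.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=2):
    """Return True if an HTTP server answers at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not reachable.
        return False


print(is_up("http://localhost:8080/"))
```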

### Scheduler

Open another terminal and get it ready to run Airflow:

1. `cd` to your `airflow101` directory.
2. Activate your Airflow virtual environment (`source .venv-airflow/bin/activate`).
3. Source your Airflow environment variables (`source localAirflowEnv.sh`).

Then run:

```
airflow scheduler
```

## Hello World DAG

We're now going to create our first DAG. It will have two tasks: say hello and then say goodbye.

### Step One: Create the Python DAG file

Create a file called `hello_world.py` in the directory `python101/airflow101/dags` and open it in your preferred editor.

Add the following content. I recommend typing it out rather than copy-pasting, because it forces you to pay closer attention to the details.

```python
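from datetime import datetime

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator


def say_something(phrase, ds, **kwargs):
    # `ds` is the run's execution date (YYYY-MM-DD). Airflow passes it in
    # from the task context because provide_context=True is set below.
    print(f'{phrase} {ds}')


# Schedule a run every minute, starting from 17 May 2021.
dag = DAG('hello_dag',
          schedule_interval='* * * * *',
          start_date=datetime(2021, 5, 17))


hello_task = PythonOperator(dag=dag,
                            task_id='say_hello',
                            python_callable=say_something,
                            provide_context=True,
                            op_kwargs={
                                'phrase': 'hello world!'
                            })

goodbye_task = PythonOperator(dag=dag,
                              task_id='say_goodbye',
                              python_callable=say_something,
                              provide_context=True,
                              op_kwargs={
                                  'phrase': 'see you later!'
                              })

# say_hello must finish before say_goodbye starts.
hello_task >> goodbye_task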
```
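
The last line, `hello_task >> goodbye_task`, declares that `say_goodbye` runs after `say_hello`. Airflow operators overload Python's `>>` operator to make this possible. The toy class below sketches the idea in plain Python. `Task` and its `downstream` list are hypothetical names for illustration, not Airflow's implementation.

```python
class Task:
    """Toy stand-in for an Airflow operator (not Airflow's real class)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # `a >> b` records b as running after a, then returns b
        # so that chains such as `a >> b >> c` work left to right.
        self.downstream.append(other)
        return other


hello = Task('say_hello')
goodbye = Task('say_goodbye')

hello >> goodbye

print([t.task_id for t in hello.downstream])  # ['say_goodbye']
```

Returning `other` from `__rshift__` is what lets longer chains read naturally from left to right.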

### Step Two: Start your DAG and explore in the Webserver

By default your DAG starts paused. Click the On/Off toggle next to the DAG's name to unpause it, then give it time to schedule and run a few times.

Explore your DAG in the Webserver:

* Find the Graph View (a view of the dependencies between your tasks).
* Find the Tree View (a view of the status of previous runs of the DAG and its tasks).
* View the logs for individual tasks.

See if you can find log output in the Scheduler terminal window.

### Step Three: Let's examine the code in closer detail

```python
def keyword_args(salutation='hello', item='world', **kwargs):
    print(f'{salutation} {item}')


keyword_args(salutation='goodbye', item='world')
```

Output:

```
goodbye world
```
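
The `**kwargs` parameter collects any keyword arguments that do not match a named parameter, and a dict can be unpacked into keyword arguments with `**`. `PythonOperator` passes `op_kwargs` to your callable in this way. The sketch below is a variant of `keyword_args` that returns its values instead of printing them, so the leftovers are easy to inspect. The `options` dict and its `extra` key are made up for the example.

```python
def keyword_args(salutation='hello', item='world', **kwargs):
    # Named parameters are filled first; anything left over lands in kwargs.
    return f'{salutation} {item}', kwargs


# Unpack a dict into keyword arguments with **.
options = {'salutation': 'goodbye', 'extra': 42}
message, leftover = keyword_args(**options)

print(message)   # goodbye world
print(leftover)  # {'extra': 42}
```

Because `item` is not in `options`, it keeps its default value, while the unrecognised `extra` key ends up in `kwargs` rather than raising an error.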
