# Getting Started with Prefect

### Introduction

Prefect is a popular workflow manager that lets us schedule different services and monitor their execution.

Let's dive in so we can begin to see how it works.

### Installing Prefect

Our first task is to install Prefect.  Move into the `src` directory, where we'll install the packages listed in the `requirements.txt` file.

> **Note**: ideally, we should create a new environment first.

* `python -m venv ./venv`
* `source ./venv/bin/activate`

Then run the following to install the Prefect library.

`pip install -r requirements.txt`

Then take a look at the `index.py` file.  There you will see the following:

```python
import requests
import pandas as pd
from prefect import flow, task

# @task
def find_receipts(name):
    url = "https://data.texas.gov/resource/naix-2893.json"
    response = requests.get(url, params = {'taxpayer_name': name})
    return response.json()

# @task
def write_to_csv(data):
    df = pd.DataFrame(data)
    df.to_csv('./data/receipts.csv')
    # Return the DataFrame so the caller receives the data rather than None
    return df

# @flow
def get_and_write_data(name):
    receipts = find_receipts(name)
    df = write_to_csv(receipts)
    return df

name = 'HONDURAS MAYA CAFE & BAR LLC'
get_and_write_data(name)
```

Ignoring the commented-out decorators, this is ordinary Python code.  Execution starts at the bottom with a call to `get_and_write_data`, which in turn calls `find_receipts` and `write_to_csv`.  If you run the file with `python3 index.py`, it requests the receipts and writes them to `./data/receipts.csv`.
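The `params` argument in `find_receipts` is what filters the dataset by taxpayer name. As a rough sketch of what the resulting request URL looks like, the snippet below builds a comparable query string with the standard library's `urlencode`, without making any network request. The `build_query_url` helper is hypothetical and exists only for illustration; `requests` does its own encoding internally.

```python
from urllib.parse import urlencode

URL = "https://data.texas.gov/resource/naix-2893.json"

def build_query_url(url, params):
    """Hypothetical helper: append URL-encoded query parameters to a base URL."""
    return f"{url}?{urlencode(params)}"

# Same endpoint and parameter name as in index.py
query_url = build_query_url(URL, {"taxpayer_name": "HONDURAS MAYA CAFE & BAR LLC"})
print(query_url)
```

Characters such as spaces and `&` are escaped in the query string, which is why a name containing an ampersand can be passed as a parameter value safely.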

### Moving to Prefect

To turn this into Prefect code, just uncomment the decorators (the `@flow` and `@task` lines).


```python
from prefect import flow, task

@flow
def get_and_write_data(name):
    ...

@task
def find_receipts(name):
    ...
    
@task
def write_to_csv(data):
    ...
    
name = 'HONDURAS MAYA CAFE & BAR LLC'
print(get_and_write_data(name))
```

With the decorators in place, the `get_and_write_data` function is now a **flow**, which calls the `find_receipts` and `write_to_csv` **tasks**.

So a **flow** is really a workflow.  A flow has many **tasks**, where a task is a discrete unit of work in that workflow.

> **Comparing to Airflow**: In Airflow the organization is very similar, except that the workflow is a DAG (directed acyclic graph), and a DAG has many tasks.  So both tools use the word task, while the overall workflow is called a DAG in Airflow and a flow in Prefect.

Now let's run this code again.  We can do so like we did before.

```bash
python3 index.py
```

<img src="./flow-run.png" width="100%">

We can see that, unlike a plain Python script, our flows and tasks log a lot of information.  Running the flow creates a **flow run**, named `hypnotica-jackal` above, which is just a randomly generated name.  And that flow run has a **task run** of the `find-receipts` task.

> So to review, a **flow** has many **tasks**.  
> And an individual run of a flow is called a `flow-run`, which has many `task-runs`.

These flow runs and task runs are logged, and this logging is an essential part of what Prefect offers.  The logs make it easier to see what happened when something inevitably goes wrong in our data pipeline.
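To make the relationship between a flow run and its task runs concrete, here is a toy sketch in plain Python. It is not Prefect's implementation and uses none of Prefect's API: the `tracked` decorator, the `RUNS` list, and the `toy_flow`/`toy_task` names are all hypothetical, and canned data stands in for the HTTP request so the sketch runs offline. It only illustrates the idea that each decorated call produces a recorded run.

```python
import functools
import uuid

# Every finished run is appended here. This list is a toy stand-in for the
# run records a workflow manager keeps; it is NOT how Prefect stores them.
RUNS = []

def tracked(kind):
    """Build a decorator that records one run per call (hypothetical helper)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run_id = uuid.uuid4().hex[:8]  # stand-in for a generated run name
            state = "Failed"
            try:
                result = fn(*args, **kwargs)
                state = "Completed"
                return result
            finally:
                # Record the run whether the function succeeded or raised.
                RUNS.append({"kind": kind, "name": fn.__name__,
                             "id": run_id, "state": state})
                print(f"{kind} run {run_id} for '{fn.__name__}': {state}")
        return wrapper
    return decorator

toy_flow = tracked("flow")
toy_task = tracked("task")

@toy_task
def find_receipts(name):
    # Canned data instead of an HTTP request, so the sketch runs offline.
    return [{"taxpayer_name": name}]

@toy_task
def count_receipts(receipts):
    return len(receipts)

@toy_flow
def get_and_count(name):
    return count_receipts(find_receipts(name))

total = get_and_count("HONDURAS MAYA CAFE & BAR LLC")
```

Calling `get_and_count` once leaves three records in `RUNS`: two task runs and the flow run that contains them, mirroring the flow-run and task-run relationship described above.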

It turns out that Prefect also lets us *view* these runs in a dashboard, much like Airflow does.  But we'll see that in the next lesson.

### Summary

In this lesson, we learned about Prefect, a workflow manager.  The first concept we learned about is a flow, which has many tasks.  A flow and its tasks are defined like so.

```python
from prefect import flow, task

@flow
def get_and_write_data(name):
    pass

@task
def find_receipts(name):
    pass

name = 'HONDURAS MAYA CAFE & BAR LLC'
print(get_and_write_data(name))
```

We can run this workflow by just running the file.

`python3 index.py`

This will create a `flow-run`, and a `flow-run` has many `task-runs`.  Prefect will log each of these runs in its database.

### Resources

[Prefect with Lambda and Snowflake](https://www.dataknowsall.com/prefectintro.html)