# Orchestration and ML Pipelines

![Status](https://img.shields.io/static/v1.svg?label=Status&message=Ongoing&color=orange)



In the previous module, we learned about experiment tracking and the model registry.
In particular, we discussed how to pick a candidate model and promote it from staging to production.
In this module, we learn how to automate this process and run it on a schedule using workflow orchestration, specifically with [Prefect 2.0](https://orion-docs.prefect.io/).

Prefect allows us to programmatically author, schedule, and monitor workflows. It also lets us minimize the time spent on **negative engineering**, i.e. coding against all possible causes of failure. This is a Sisyphean task, as there are endless ways in which the elements of a data pipeline can fail. In practical terms, Prefect provides retries, concurrency, logging, a nice UI, dependency tracking, a database, caching and serialization, parameterization of scheduled tasks, and more. As we shall see later, this adds observability to the whole data pipeline.
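To see what a feature like retries saves us from writing by hand, here is a minimal pure-Python sketch of a retry decorator. The names `with_retries` and `flaky` are hypothetical and this is not Prefect's implementation; it only illustrates the kind of negative-engineering code that `@task(retries=3)` replaces.

```python
import functools
import time


def with_retries(retries: int, delay_seconds: float = 0.0):
    """Hypothetical decorator: re-run `fn` up to `retries` extra times if it raises."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    # Re-raise once all retries are exhausted
                    if attempt == retries:
                        raise
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


calls = {"n": 0}


@with_retries(retries=3)
def flaky():
    # Simulated unreliable service: fails on the first two calls
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Our unreliable service failed.")
    return {"data": 42}


print(flaky(), calls["n"])  # {'data': 42} 3
```

Even this toy version has to decide how many attempts to make, whether to wait between them, and when to give up; Prefect handles these concerns (and logs each attempt) for us.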

```{margin}
⚠️ **Attribution:** These are notes for [Module 3](https://github.com/DataTalksClub/mlops-zoomcamp/tree/main/03-orchestration) of the [MLOps Zoomcamp](https://github.com/DataTalksClub/mlops-zoomcamp). The MLOps Zoomcamp is a free course from [DataTalks.Club](https://github.com/DataTalksClub).
```

## Prefect flows

A **flow** in Prefect is simply a Python function decorated with `@flow`. A flow is made up of **tasks**, which can be thought of as the atoms of observability in Prefect. In practice, to create a flow, you convert the functions that make it up into tasks by decorating them with `@task`. Consider the following example from the [*Getting Started with Prefect 2.0*](https://www.prefect.io/guide/blog/getting-started-prefect-2/#Makingourflowsbetterwithtasks) blog post. Here, we simulate getting data from an unreliable API, augmenting the fetched data, and writing the result into a database.

```python
import random
from prefect import flow, task


@task(retries=3)
def call_unreliable_api():
    choices = [{"data": 42}, "Failure"]
    res = random.choice(choices)
    if res == "Failure":
        raise Exception("Our unreliable service failed.")
    else:
        return res

@task
def augment_data(data: dict, msg: str):
    data["message"] = msg
    return data

@task
def write_to_database(data: dict):
    print(f"Wrote {data} to database successfully!")
    return "Success!"

@flow
def pipeline(msg: str):
    api_result = call_unreliable_api()
    augmented_data = augment_data(data=api_result, msg=msg)
    write_to_database(augmented_data)


pipeline(0)  # Augment data with zero.
```

```
17:26:54.662 | INFO    | prefect.engine - Created flow run 'juicy-platypus' for flow 'pipeline'
17:26:54.664 | INFO    | Flow run 'juicy-platypus' - Using task runner 'ConcurrentTaskRunner'
17:26:54.708 | INFO    | Flow run 'juicy-platypus' - Created task run 'call_unreliable_api-48f93715-0' for task 'call_unreliable_api'
17:26:54.729 | INFO    | Flow run 'juicy-platypus' - Created task run 'augment_data-505b3e0c-0' for task 'augment_data'
17:26:54.741 | ERROR   | Task run 'call_unreliable_api-48f93715-0' - Encountered exception during execution:
Traceback (most recent call last):
  File "/Users/particle1331/miniforge3/envs/prefect/lib/python3.9/site-packages/prefect/engine.py", line 798, in orchestrate_task_run
    result = await run_sync_in_worker_thread(task.fn, *args, **kwargs)
  File "/Users/particle1331/miniforge3/envs/prefect/lib/python3.9/site-packages/prefect/utilities/asyncio.py", line 54, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellabl

Wrote {'data': 42, 'message': '0'} to database successfully!

Completed(message='All states completed.', type=COMPLETED, result=[Completed(message=None, type=COMPLETED, result={'data': 42, 'message': '0'}, task_run_id=72365071-5085-4158-a391-eeddede5ca75), Completed(message=None, type=COMPLETED, result={'data': 42, 'message': '0'}, task_run_id=e7028889-7db0-435f-896c-03e6ff7bb733), Completed(message=None, type=COMPLETED, result='Success!', task_run_id=9ec022e6-39e8-41ea-98b2-b0a14df9eaff)], flow_run_id=be507fa9-289c-4ff1-a7eb-fd2aa29ab969)
```

## Prefect Orion UI

Notice from the logs that the API call failed before a retry got through. We can start the UI by running `prefect orion start` from any directory (Prefect keeps its data in `~/.prefect`, in the user's home directory). This starts the Prefect Orion server on port 4200.

```bash
❯ prefect orion start
Starting...

 ___ ___ ___ ___ ___ ___ _____    ___  ___ ___ ___  _  _
| _ \ _ \ __| __| __/ __|_   _|  / _ \| _ \_ _/ _ \| \| |
|  _/   / _|| _|| _| (__  | |   | (_) |   /| | (_) | .` |
|_| |_|_\___|_| |___\___| |_|    \___/|_|_\___\___/|_|\_|

Configure Prefect to communicate with the server with:

    prefect config set PREFECT_API_URL=http://127.0.0.1:4200/api

Check out the dashboard at http://127.0.0.1:4200



INFO:     Started server process [20557]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:4200 (Press CTRL+C to quit)
```

Navigating the UI, we find the `pipeline` flow and its most recent run which, as we saw in the logs, completed its execution. This flow run started on `2022/06/10 11:05:51 PM` and ended on `2022/06/10 11:05:52 PM`. The logs also contain the details of the exception raised when the API call failed. The second tab lists the tasks that make up this flow. There is also a subflows tab, since flows can be called from a parent flow.

```{figure} ../../../img/hello-world-2.png
---
---
```

One of the more interesting features of the dashboard is **Radar**, on the right. It shows the dependencies between tasks. Notice that the tasks form a chain: `write_to_database` depends directly on the `augment_data` task but not on `call_unreliable_api`. Hovering over a task shows its backward and forward data dependencies. Arranging tasks in concentric circles encodes a hierarchy of dependence. Note that the runtime of each task is also conveniently displayed.

```{figure} ../../../img/hello-world-1.png
---
---
```
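One plausible reading of the concentric layout is that each task sits on a ring whose index is the length of the longest chain of upstream tasks leading to it. The sketch below computes such ring indices for the `pipeline` flow using `graphlib.TopologicalSorter` (Python 3.9+). The function `ring_levels` is hypothetical and is not Radar's actual layout algorithm.

```python
from graphlib import TopologicalSorter

# Upstream data dependencies of each task in the `pipeline` flow above
upstream = {
    "call_unreliable_api": set(),
    "augment_data": {"call_unreliable_api"},
    "write_to_database": {"augment_data"},
}


def ring_levels(deps):
    """Ring 0 for tasks without upstream tasks, else 1 + the deepest upstream ring."""
    levels = {}
    # static_order() yields every task after all of its upstream tasks
    for task in TopologicalSorter(deps).static_order():
        levels[task] = 1 + max((levels[u] for u in deps.get(task, ())), default=-1)
    return levels


print(ring_levels(upstream))
```

For this linear pipeline the three tasks land on rings 0, 1, and 2, matching the chain described above.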

Finally, let us look at a flow run which failed to complete all its tasks. Here all calls to the API failed despite the retries. The radar plot nicely shows where the flow failed. This is really useful, especially when there are dozens of tasks and multiple subflows in our data pipeline.


```{figure} ../../../img/hello-world-3.png
```

<br>

**Remark.** Geometrically, there is more space available to grow the dependency tree, since nodes separated by a fixed angle move farther apart as we go outward radially. This also allows Radar to minimize edge crossings by combining radial and circumferential movement for the edges between task nodes, in contrast to the traditional top-down or left-right approaches to drawing graphs. Furthermore, Radar updates dynamically as tasks complete (or fail). The mini-map, edge tracing, and node selection tools make workflow inspection feasible even for highly complex graphs. See [*Introducing Radar*](https://www.prefect.io/guide/blog/introducing-radar/) by Bill Palombi for further reading.
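The geometric claim can be checked with a little arithmetic: two points on a circle of radius $r$ separated by an angle $\theta$ are a chord length $2r\sin(\theta/2)$ apart, which grows linearly with $r$. The sketch below assumes, purely for illustration, 8 nodes spread evenly around each ring; `chord_length` is a hypothetical helper.

```python
import math


def chord_length(radius: float, angle_rad: float) -> float:
    """Straight-line distance between two points on a circle separated by `angle_rad`."""
    return 2 * radius * math.sin(angle_rad / 2)


# Assumption: 8 nodes spread evenly around each ring
angle = 2 * math.pi / 8
for r in (1, 2, 3):
    print(f"ring radius {r}: spacing between adjacent nodes = {chord_length(r, angle):.3f}")
```

Doubling the radius doubles the spacing between neighbouring nodes, so outer rings can hold more tasks without crowding.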



## MLflow runs as a flow

In this section, we rewrite the code from the previous module for running modelling experiments as Prefect flows. The idea is to have two flows: one that preprocesses the datasets, and a second one that performs the experiment runs, all of which use the same preprocessed datasets.

```{margin}
[`utils.py`](https://github.com/particle1331/inefficient-networks/blob/57e38c5eb06ac3323035fb9f8d714870e397a39a/docs/notebooks/mlops/3-prefect/utils.py)
```
```python
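import joblib
import pandas as pd
from prefect import flow, task
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

# Assumed to be defined elsewhere in the project (not shown here):
#   - the custom transformers AddPickupDropoffPair, SelectFeatures,
#     ConvertToString, and ConvertToDict
#   - `artifacts`, a pathlib.Path to the directory where artifacts are saved


@task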
def load_training_dataframe(file_path, y_min=1, y_max=60):
    """Load data from disk and preprocess for training."""
    
    # Load data from disk
    data = pd.read_parquet(file_path)

    # Create target column and filter outliers
    data['duration'] = data.lpep_dropoff_datetime - data.lpep_pickup_datetime
    data['duration'] = data.duration.dt.total_seconds() / 60
    data = data[(data.duration >= y_min) & (data.duration <= y_max)]

    return data


@task
def fit_preprocessor(train_data):
    """Fit and save preprocessing pipeline."""

    # Unpack passed data
    y_train = train_data.duration.values
    X_train = train_data.drop('duration', axis=1)    

    # Initialize pipeline
    cat_features = ['PU_DO']
    num_features = ['trip_distance']

    preprocessor = make_pipeline(
        AddPickupDropoffPair(),
        SelectFeatures(cat_features + num_features),
        ConvertToString(cat_features),
        ConvertToDict(),
        DictVectorizer(),
    )

    # Fit only on train set
    preprocessor.fit(X_train, y_train)
    joblib.dump(preprocessor, artifacts / 'preprocessor.pkl')
    
    return preprocessor


@task
def create_model_features(preprocessor, train_data, valid_data):
    """Transform train and validation dataframes with the fitted preprocessor."""

    # Unpack passed data
    y_train = train_data.duration.values
    y_valid = valid_data.duration.values
    X_train = train_data.drop('duration', axis=1)
    X_valid = valid_data.drop('duration', axis=1)
    
    # Feature engineering
    X_train = preprocessor.transform(X_train)
    X_valid = preprocessor.transform(X_valid)

    return X_train, y_train, X_valid, y_valid


@flow
def preprocess_data(train_data_path, valid_data_path):
    """Preprocess data for model training."""

    train_data = load_training_dataframe(train_data_path)
    valid_data = load_training_dataframe(valid_data_path)
    
    preprocessor = fit_preprocessor(train_data)
    
    return create_model_features(preprocessor, train_data, valid_data).result()
```

Here we can see that the `preprocess_data` flow loads the datasets from disk, fits and saves a preprocessor, and then creates the transformed features and targets for training the machine learning model. Note that if we fit multiple preprocessors, we have to be careful not to use concurrent execution: every fit saves to the same `preprocessor.pkl`, so an experiment run may end up logging a different preprocessor from the one it was trained with. Here, a single preprocessor is used for all experiments, so this concern does not materialize.
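If several preprocessors ever need to be fitted concurrently, one way around the overwriting problem, sketched here under our own assumptions, is to save each fitted object under a unique file name. The helper `save_artifact` is hypothetical, and it uses `pickle` instead of `joblib` only to stay within the standard library; each run would then log its own path (for instance with `mlflow.log_artifact`).

```python
import pickle
import tempfile
import uuid
from pathlib import Path


def save_artifact(obj, artifacts_dir: Path, stem: str = "preprocessor") -> Path:
    """Pickle `obj` under a unique file name so concurrent runs never overwrite each other."""
    path = artifacts_dir / f"{stem}-{uuid.uuid4().hex}.pkl"
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


artifacts_dir = Path(tempfile.mkdtemp())

# Two stand-in "preprocessors" fitted by different (hypothetical) runs
path_a = save_artifact({"features": ["PU_DO"]}, artifacts_dir)
path_b = save_artifact({"features": ["PU_DO", "trip_distance"]}, artifacts_dir)
print(path_a.name != path_b.name)
```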

Next, we create a flow for executing the experiment runs. Note that in the `main` flow we pass around a [`PrefectFuture`](https://orion-docs.prefect.io/api-ref/prefect/futures/) object instead of plain Python objects. Futures represent the execution of a task and allow retrieval of the task run's state. Passing futures is what allows Prefect to track data dependencies between tasks; converting them to Python objects, i.e. calling `.result()`, breaks this lineage. Once a future has been passed into a task, however, we can treat it as a usual Python object inside the function body, because the `task` decorator resolves the future into its underlying value before the function runs. For example, instead of defining:

```python
@task
def f(X, y):
    ...
```

We do:

```python
@task
def f(future):
    # `future` arrives already resolved to its value, here an (X, y) tuple
    X, y = future
```
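The unpacking behaviour can be mimicked with the standard library's `concurrent.futures`. The sketch below is only an analogy: `resolve_futures` is a hypothetical decorator that swaps any `Future` argument for its result before calling the wrapped function, which is roughly the convenience described above; it is not how Prefect implements `PrefectFuture` handling.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor


def resolve_futures(fn):
    """Toy decorator: replace any Future argument with its result before calling `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = [a.result() if isinstance(a, Future) else a for a in args]
        kwargs = {k: v.result() if isinstance(v, Future) else v for k, v in kwargs.items()}
        return fn(*args, **kwargs)
    return wrapper


@resolve_futures
def summarize(training_packet):
    # Inside the function body, the packet is already a plain tuple
    X_train, y_train, X_valid, y_valid = training_packet
    return len(X_train), len(X_valid)


with ThreadPoolExecutor() as pool:
    future = pool.submit(lambda: ([1, 2, 3], [0, 1, 0], [4, 5], [1, 1]))
    print(summarize(future))  # (3, 2)
```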

You will notice this pattern in the `xgboost_runs` and `linreg_runs` tasks below. The `main` flow executes the following steps sequentially: setting up the connection to the experiment (not a task), a subflow run for preprocessing the datasets for modelling, one run of the linear regression baseline model, and multiple runs of XGBoost hyperparameter optimization using the TPE algorithm. Sequential execution ensures that all resources are allocated to a single learning algorithm at each point in the flow run.

```{margin}
[`main.py`](https://github.com/particle1331/inefficient-networks/blob/fd937c097b9f59e171f263f0208b2407bb22efde/docs/notebooks/mlops/3-prefect/main.py)
```
```python
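from functools import partial

import mlflow
import xgboost as xgb
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from hyperopt.pyll import scope
from prefect import flow, task
from prefect.task_runners import SequentialTaskRunner
from sklearn.linear_model import LinearRegression

# Assumed: `preprocess_data` is the flow defined in utils.py above
from utils import preprocess_data


def objective(params, xgb_train, y_train, xgb_valid, y_valid):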
    """Compute validation RMSE (one trial = one run)."""

    with mlflow.start_run():
        
        model = xgb.train(
            params=params,
            dtrain=xgb_train,
            num_boost_round=100,
            evals=[(xgb_valid, 'validation')],
            early_stopping_rounds=5
        )

        # MLflow logging
        ...

    return {'loss': rmse_valid, 'status': STATUS_OK}


@task
def xgboost_runs(num_runs, training_packet):
    """Run TPE algorithm on search space to minimize objective."""

    X_train, y_train, X_valid, y_valid = training_packet
    Xgb_train = xgb.DMatrix(X_train, label=y_train)
    Xgb_valid = xgb.DMatrix(X_valid, label=y_valid)


    search_space = {
        'max_depth': scope.int(hp.quniform('max_depth', 4, 100, 1)),
        'learning_rate': hp.loguniform('learning_rate', -3, 0),
        'reg_alpha': hp.loguniform('reg_alpha', -5, -1),
        'reg_lambda': hp.loguniform('reg_lambda', -6, -1),
        'min_child_weight': hp.loguniform('min_child_weight', -1, 3),
        'objective': 'reg:squarederror',
        'seed': 42
    }

    best_result = fmin(
        fn=partial(
            objective, 
            xgb_train=Xgb_train, y_train=y_train, 
            xgb_valid=Xgb_valid, y_valid=y_valid,
        ),
        space=search_space,
        algo=tpe.suggest,
        max_evals=num_runs,
        trials=Trials()
    )


@task
def linreg_runs(training_packet):
    """Run linear regression training."""

    X_train, y_train, X_valid, y_valid = training_packet
    
    with mlflow.start_run():

        model = LinearRegression()
        model.fit(X_train, y_train)

        # MLflow logging
        ...

        
@flow(task_runner=SequentialTaskRunner())
def main(train_data_path, valid_data_path, num_xgb_runs=1):

    # Set and run experiment
    mlflow.set_tracking_uri("sqlite:///mlflow.db")
    mlflow.set_experiment("nyc-taxi-experiment")

    future = preprocess_data(train_data_path, valid_data_path)
    linreg_runs(future)
    xgboost_runs(num_xgb_runs, future)

```


If we look at the dashboard, we can see a run of the `main` flow named `utopian-rat`. As expected, it consists of 3 tasks (one of which is a subflow) that are executed sequentially, as indicated in the timeline. The preprocessing subflow, on the other hand, consists of four tasks, and the overlapping lines in its timeline show that some of them execute concurrently.

```{figure} ../../../img/mlflow-runs-dashboard.png
---
---
```

```{figure} ../../../img/radar_preprocessing.png
---
---
```

The radar of the `main` flow is shown below. This is an earlier screenshot, taken while the `xgboost_runs` task was still running (1 minute and 14 seconds in). Both runs depend on the preprocessing subflow. We can drill down into the radar of the preprocessing subflow by clicking on the `4 task runs` button.


```{figure} ../../../img/radar_xgb.png
---
---
```

Hovering over each task in the radar plot shows its data dependencies on other tasks. The figure below shows the dependency lines for each task (you may want to open it in a new tab to see it better):

```{figure} ../../../img/radar.png
---
---
```

The `load_training_dataframe` task on the left must be the one that loads the validation dataset, since its only forward dependency is `create_model_features`. The `fit_preprocessor` task trains the preprocessor and therefore depends on the task that loads the training dataframe. Finally, the `load_training_dataframe` task that loads the training dataset sends data both to the preprocessor and to the final task, `create_model_features`, which returns all the processed data for modelling.
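The reasoning above amounts to reading off forward and backward dependencies from the edges of the `preprocess_data` flow. The sketch below writes those edges down from the code and recovers each task's neighbours; the `[train]`/`[valid]` suffixes are our own labels for the two calls of `load_training_dataframe`.

```python
from collections import defaultdict

# (upstream, downstream) data dependencies in the `preprocess_data` flow
edges = [
    ("load_training_dataframe[train]", "fit_preprocessor"),
    ("load_training_dataframe[train]", "create_model_features"),
    ("load_training_dataframe[valid]", "create_model_features"),
    ("fit_preprocessor", "create_model_features"),
]

forward = defaultdict(set)   # task -> tasks that consume its output
backward = defaultdict(set)  # task -> tasks whose output it consumes
for up, down in edges:
    forward[up].add(down)
    backward[down].add(up)

for task in sorted(set(forward) | set(backward)):
    print(f"{task}: forward={sorted(forward[task])} backward={sorted(backward[task])}")
```

Only the validation loader has `create_model_features` as its single forward dependency, which is how it can be identified in the radar.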

## Deployment

In this section, we deploy a workflow that transitions a performant model to staging in MLflow's model registry. This can be useful for regularly staging candidate models trained on new data. The staged models can then be checked further before deciding whether they should be deployed to production. Refer to the code in [Module 2](https://particle1331.github.io/inefficient-networks/notebooks/mlops/2-mlflow/2-mlflow.html#api-workflows) to understand the next few code cells.

### Model staging review

First, we check the connection to MLflow:

```python
import mlflow
from mlflow.tracking import MlflowClient
from mlflow.entities import ViewType

MLFLOW_TRACKING_URI = "sqlite:///mlflow.db"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
client = MlflowClient(tracking_uri=MLFLOW_TRACKING_URI)


def print_experiment(experiment):
    print("(Experiment)")
    print(f"    experiment_id={experiment.experiment_id}")
    print(f"    name='{experiment.name}'")
    print(f"    artifact_location='{experiment.artifact_location}'")
    print()

for experiment in client.list_experiments():
    print_experiment(experiment)
```

```
(Experiment)
    experiment_id=0
    name='Default'
    artifact_location='./mlruns/0'

(Experiment)
    experiment_id=1
    name='nyc-taxi-experiment'
    artifact_location='./mlruns/1'
```



Recall that we ran a flow which performs HPO for XGBoost with 10 runs, so we expect to find runs in our tracker. We keep only the runs with validation RMSE below `6.5` and inference time below `2e-5` (written as `20e-6` in the filter string). This can be done through the client as follows:

```python
candidates = client.search_runs(
    experiment_ids=["1"],
    filter_string='metrics.rmse_valid < 6.5 and metrics.inference_time < 20e-6',
    run_view_type=ViewType.ACTIVE_ONLY,
    max_results=5,
    order_by=["metrics.rmse_valid ASC"]
)

for run in candidates:
    print(
        f"run_id: {run.info.run_id}   "
        f"rmse_valid: {run.data.metrics['rmse_valid']:.3f}   "
        f"inference_time: {run.data.metrics['inference_time']:.4e}"
    )
```

```
run_id: 4659603b9b674df59319ae6d4b67890a   rmse_valid: 6.404   inference_time: 1.1171e-05
run_id: d2baeedede3545c7a12f858c86e01605   rmse_valid: 6.415   inference_time: 7.7727e-06
run_id: 4a627ff6420549e5a0dbaf41fef5795b   rmse_valid: 6.440   inference_time: 7.2689e-06
run_id: 391212c00c67495fbfcf6e5abc0c8a9d   rmse_valid: 6.442   inference_time: 6.0649e-06
run_id: f3fc839c1036469290c309afa47e3d3b   rmse_valid: 6.470   inference_time: 8.2583e-06
```
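To make the semantics of the filter string, ordering, and `max_results` explicit, here is a plain-Python sketch of the same selection over in-memory records. The first two records are taken from the output above; the other two are hypothetical runs added to show what gets excluded. `select_candidates` is our own helper, not part of MLflow.

```python
runs = [
    {"run_id": "4659603b9b674df59319ae6d4b67890a", "rmse_valid": 6.404, "inference_time": 1.1171e-05},
    {"run_id": "d2baeedede3545c7a12f858c86e01605", "rmse_valid": 6.415, "inference_time": 7.7727e-06},
    # Hypothetical runs: one too slow, one not accurate enough
    {"run_id": "hypothetical-slow-run", "rmse_valid": 6.300, "inference_time": 3.0e-05},
    {"run_id": "hypothetical-inaccurate-run", "rmse_valid": 6.900, "inference_time": 5.0e-06},
]


def select_candidates(runs, max_rmse=6.5, max_inference_time=20e-6, max_results=5):
    """Keep runs under both thresholds, best validation RMSE first, at most `max_results`."""
    kept = [
        r for r in runs
        if r["rmse_valid"] < max_rmse and r["inference_time"] < max_inference_time
    ]
    return sorted(kept, key=lambda r: r["rmse_valid"])[:max_results]


print([r["run_id"] for r in select_candidates(runs)])
```

Note that the list can come back empty if no run passes both thresholds, which is worth guarding against before indexing `candidates[0]`.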


```python
model_to_stage = candidates[0]
model_to_stage.info.run_id, model_to_stage.data.metrics['rmse_valid']
```

```
('4659603b9b674df59319ae6d4b67890a', 6.40444573825302)
```

Now that we have our model, we register it and transition it to `Staging`:

```python
registered_model = mlflow.register_model(
    model_uri=f"runs:/{model_to_stage.info.run_id}/model", 
    name='NYCRideDurationModel'
)

client.transition_model_version_stage(
    name='NYCRideDurationModel',
    version=registered_model.version, 
    stage='Staging',
)
```

```
Successfully registered model 'NYCRideDurationModel'.
2022/06/11 23:25:29 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: NYCRideDurationModel, version 1
Created version '1' of model 'NYCRideDurationModel'.

<ModelVersion: creation_timestamp=1654961129078, current_stage='Staging', description=None, last_updated_timestamp=1654961129083, name='NYCRideDurationModel', run_id='4659603b9b674df59319ae6d4b67890a', run_link=None, source='./mlruns/1/4659603b9b674df59319ae6d4b67890a/artifacts/model', status='READY', status_message=None, tags={}, user_id=None, version=1>
```

```{figure} ../../../img/mlflow-automatic-staging.png
---
---
Staged model from code cells above.
```

### MLflow Staging flow

This looks good, so we now collect the above code cells, along with the code for the experiment runs, into a workflow which creates a new experiment, performs the experiment runs, and stages the best model. We then schedule this workflow to run at fixed intervals using Prefect.

```{margin}
[`mlflow_deploy.py`](https://github.com/particle1331/inefficient-networks/blob/fbd70bedc69e86a76b722c64cf53a6885b85d2ba/docs/notebooks/mlops/3-prefect/mlflow_deploy.py#L259-L306)
```
```python
import datetime
from datetime import timedelta

from prefect.deployments import DeploymentSpec
from prefect.orion.schemas.schedules import IntervalSchedule
from prefect.flow_runners import SubprocessFlowRunner

...  # tasks, flows, and remaining imports from the sections above

@flow(task_runner=SequentialTaskRunner())
def deploy_main(train_data_path, valid_data_path, num_xgb_runs=1):

    # Set and run experiment
    MLFLOW_TRACKING_URI = "sqlite:///mlflow.db"
    EXPERIMENT_NAME = f"nyc-taxi-experiment-{str(datetime.datetime.now())}"
    mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
    mlflow.set_experiment(EXPERIMENT_NAME)

    future = preprocess_data(train_data_path, valid_data_path)
    linreg_runs(future)
    xgboost_runs(num_xgb_runs, future)

    # Register best model staging
    client = MlflowClient(tracking_uri=MLFLOW_TRACKING_URI)
    candidates = client.search_runs(
        experiment_ids=client.get_experiment_by_name(EXPERIMENT_NAME).experiment_id,
        filter_string='metrics.rmse_valid < 6.5 and metrics.inference_time < 20e-6',
        run_view_type=ViewType.ACTIVE_ONLY,
        max_results=5,
        order_by=["metrics.rmse_valid ASC"]
    )

    # Nothing to stage if no run passes the thresholds
    if not candidates:
        return

    model_to_stage = candidates[0]
    registered_model = mlflow.register_model(
        model_uri=f"runs:/{model_to_stage.info.run_id}/model", 
        name='NYCRideDurationModel'
    )

    client.transition_model_version_stage(
        name='NYCRideDurationModel',
        version=registered_model.version, 
        stage='Staging',
    )
```

With the deployment specification shown below, the `deploy_main` flow runs locally every 5 minutes. We also set the flow's parameters in the specification, since `deploy_main` takes arguments when it runs; this adds a bit of flexibility. Specifying `SubprocessFlowRunner()` as the flow runner means that the flow is executed locally in a subprocess, i.e. not on Kubernetes or in Docker containers.
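The arithmetic behind a fixed-interval schedule is simple: run $i$ is scheduled at the start time plus $i$ intervals. The sketch below generates run times with `datetime` and `timedelta`; the start time is an arbitrary example and `next_run_times` is a hypothetical helper, not Prefect's `IntervalSchedule` internals.

```python
from datetime import datetime, timedelta


def next_run_times(start: datetime, interval: timedelta, count: int):
    """Return the first `count` run times of a fixed-interval schedule beginning at `start`."""
    return [start + i * interval for i in range(count)]


# Example start time, 5-minute interval as in the deployment specification
times = next_run_times(datetime(2022, 6, 11, 23, 30), timedelta(minutes=5), 3)
for t in times:
    print(t.isoformat())
```

At a 5-minute interval this comes to 12 runs per hour and 288 per day, so scheduled runs pile up quickly if nothing is executing them.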


### Local storage setup

Before creating a deployment in Prefect, let us first set up local storage for persisting flow code for deployments, task results, and flow results. This is simple enough to do:

```bash
❯ prefect storage create
Found the following storage types:
0) Azure Blob Storage
    Store data in an Azure blob storage container.
1) File Storage
    Store data as a file on local or remote file systems.
2) Google Cloud Storage
    Store data in a GCS bucket.
3) Local Storage
    Store data in a run's local file system.
4) S3 Storage
    Store data in an AWS S3 bucket.
5) Temporary Local Storage
    Store data in a temporary directory in a run's local file system.
Select a storage type to create: 3
You've selected Local Storage. It has 1 option(s).
STORAGE PATH: ~/.prefect/local-storage
Choose a name for this storage configuration: local-storage
Validating configuration...
Registering storage with server...
Registered storage 'local-storage' with identifier '0e3f5d76-4058-4bc7-afc6-eb101f749139'.
```

### Deployment specification

Finally, the deployment specification below is placed in `mlflow_deploy.py` next to the `deploy_main` flow. To create the deployment in Prefect, we execute `prefect deployment create <deployment script (.py)>` in the terminal.

```python
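# `data_path` is assumed to be a pathlib.Path to the directory holding the
# NYC green taxi parquet files (defined elsewhere in the script)
DeploymentSpec(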
    flow=deploy_main,
    name="mlflow_staging",
    schedule=IntervalSchedule(interval=timedelta(minutes=5)),
    flow_runner=SubprocessFlowRunner(),
    parameters={
        "train_data_path": data_path / 'green_tripdata_2021-01.parquet',
        "valid_data_path": data_path / 'green_tripdata_2021-02.parquet',
        "num_xgb_runs": 10
    },
    tags=["ml"]
)
```

```bash
❯ prefect deployment create mlflow_deploy.py
Loading deployment specifications from python script at 'mlflow_deploy.py'...
Creating deployment 'mlflow_staging' for flow 'deploy-main'...
Deploying flow script from '/Users/particle1331/code/inefficient-networks/docs/notebooks/mlops/3-prefect/main.py' using Local Storage...
Created deployment 'deploy-main/mlflow_staging'.
View your new deployment with: 

    prefect deployment inspect 'deploy-main/mlflow_staging'
Created 1 deployments!
```


Note that relative imports fail for Prefect deployments, so for lack of time we pasted everything into the [`mlflow_deploy.py`](https://github.com/particle1331/inefficient-networks/blob/fbd70bedc69e86a76b722c64cf53a6885b85d2ba/docs/notebooks/mlops/3-prefect/mlflow_deploy.py) script. To do this properly, we would create a package for the project so that imports of our own modules work everywhere. Also notice that, for the sake of simplicity, every scheduled run uses the same dataset. Ideally, the data should depend on when the experiment is run; otherwise, we simply stage the same model on every scheduled run.
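One way to make the data depend on the run date is to derive the parquet file names from it. The sketch below assumes a hypothetical rule (train on the month two months before the run, validate on the previous month) and reuses the `green_tripdata_YYYY-MM.parquet` naming from the deployment parameters; `month_offset` and `data_paths_for` are our own helpers, and inside a flow `run_date` would typically be `date.today()`.

```python
from datetime import date
from pathlib import Path


def month_offset(d: date, months: int) -> date:
    """First day of the month `months` away from `d` (negative means earlier)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def data_paths_for(run_date: date, data_path: Path):
    """Hypothetical rule: train on month M-2, validate on month M-1 relative to run_date."""
    fmt = "green_tripdata_{:%Y-%m}.parquet"
    train_path = data_path / fmt.format(month_offset(run_date, -2))
    valid_path = data_path / fmt.format(month_offset(run_date, -1))
    return train_path, valid_path


train, valid = data_paths_for(date(2021, 3, 15), Path("data"))
print(train.name, valid.name)
```

For a run in March 2021 this picks the January and February 2021 files used above, while later runs would automatically move on to newer months.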

```{figure} ../../../img/late-runs.png
---
---
109 runs are now scheduled in Prefect.
```

Workflow runs are now scheduled in Prefect. Notice that some runs are late. This is because we haven't started any agent to execute them: unlike CI/CD platforms, Prefect does not provide the compute, so users have to supply it to run the scheduled workflows. We now create a **work queue** for our deployment and start an agent that pulls runs from it. This setup can also be done in the UI.
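Conceptually, an agent repeatedly polls its work queue and submits every run whose scheduled time has passed. The toy model below, which is not Prefect's agent implementation (`ScheduledRun` and `poll` are hypothetical names), shows why runs become late when no agent is polling: once an agent starts, all overdue runs are found at once.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduledRun:
    name: str
    scheduled_time: datetime


def poll(queue, now):
    """Toy agent step: split runs into those due (to submit) and those still pending."""
    due = [run for run in queue if run.scheduled_time <= now]
    pending = [run for run in queue if run.scheduled_time > now]
    return due, pending


start = datetime(2022, 6, 12, 4, 0)
queue = [ScheduledRun(f"run-{i}", start + i * timedelta(minutes=5)) for i in range(4)]

# The agent only starts polling at 04:12, so the 04:00, 04:05, and 04:10 runs are late
due, pending = poll(queue, now=datetime(2022, 6, 12, 4, 12))
print([r.name for r in due], [r.name for r in pending])
```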

```bash
❯ prefect deployment ls
                             Deployments
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name                       ┃ ID                                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ deploy-main/mlflow_staging │ 1747f0db-9292-4816-b3d4-21b7757e4ef7 │
└────────────────────────────┴──────────────────────────────────────┘
```


```bash
❯ prefect work-queue create \
    --deployment '1747f0db-9292-4816-b3d4-21b7757e4ef7' \
    --flow-runner subprocess \
    mlflow-deploy-runner
UUID('9aa4a3e0-9590-43fe-988d-956f652b0bc6')
```


```bash
❯ prefect work-queue preview 9aa4a3e0-9590-43fe-988d-956f652b0bc6
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Scheduled St… ┃ Run ID                  ┃ Name     ┃ Deployment ID           ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 2022-06-11 1… │ 82b034c5-1e08-47f9-a96… │ cinnamo… │ 1747f0db-9292-4816-b3d… │
│ 2022-06-11 1… │ f3383b88-ac26-44ce-949… │ vigorou… │ 1747f0db-9292-4816-b3d… │
│ 2022-06-11 1… │ 481e85dc-0cad-4145-8d7… │ aspirin… │ 1747f0db-9292-4816-b3d… │
│ 2022-06-11 1… │ 4a502fde-21bf-43ee-a09…
```

```bash
❯ prefect agent start 9aa4a3e0-9590-43fe-988d-956f652b0bc6
```

Starting agent with ephemeral API...

  ___ ___ ___ ___ ___ ___ _____     _   ___ ___ _  _ _____
 | _ \ _ \ __| __| __/ __|_   _|   /_\ / __| __| \| |_   _|
 |  _/   / _|| _|| _| (__  | |    / _ \ (_ | _|| .` | | |
 |_| |_|_\___|_| |___\___| |_|   /_/ \_\___|___|_|\_| |_|


Agent started! Looking for work from queue 
'9aa4a3e0-9590-43fe-988d-956f652b0bc6'...
04:08:50.216 | INFO    | prefect.agent - Submitting flow run '5f8c7d3c-8b41-4c22-b030-c8b65a871ea8'
04:08:55.230 | INFO    | prefect.agent - Submitting flow run '5f8c7d3c-8b41-4c22-b030-c8b65a871ea8'
04:08:59.266 | INFO    | prefect.flow_runner.subprocess - Opening subprocess for flow run '5f8c7d3c-8b41-4c22-b030-c8b65a871ea8'...
04:08:59.278 | INFO    | prefect.agent - Completed submission of flow run '5f8c7d3c-8b41-4c22-b030-c8b65a871ea8'
04:09:02.289 | INFO    | Flow run 'sociable-finch' - Using task runner 'SequentialTaskRunner'
2022/06/12 04:09:02 INFO mlflow.tracking.fluent: Experiment with name 'nyc-taxi-experiment-2022-06-1