***

# Taxi Trip Fare Prediction - Model 1

***

The goal of this example is to train and serve a taxi trip fare prediction model. We will:
- train an ML model based on historical taxi trip fare data
- serve the ML model to predict the trip fare for new trips


***

### Create project

We will create a project called `trip_fare` for this tutorial.

In [None]:
create project trip_fare

***

# Configure Data Sources

<html><img src="../../images/trip_fare_images/1_1.png"/></html>

We will use the `trip_table.csv` file stored in an S3 bucket as the source data. The data includes the pickup datetime, pickup latitude and longitude, dropoff latitude and longitude, pickup and dropoff zip codes, passenger count, and fare amount.

The first few lines of the CSV file are shown below:

    pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,pickup_zipcode,dropoff_zipcode,passenger_count,fare_amount
    2022-10-18 03:16:57,-73.9617,40.7628,-73.9748,40.7528,10065,10017,1,25.45
    2022-10-18 03:17:00,-73.9957,40.7594,-73.9758,40.7553,10199,10111,1,16.48
    2022-10-18 03:17:00,-73.9946,40.7259,-73.9915,40.7325,10003,10003,1,13.1
    2022-10-18 03:17:46,-73.9893,40.7419,-74.0018,40.7263,10010,10012,1,23.31
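To check the column layout before wiring up the data source, the sample rows above can be parsed locally. This is a minimal sketch using only Python's standard `csv` module: the `SAMPLE` string copies the rows shown above, `parse_trips` is a hypothetical helper, and the per-column type conversions are our assumption about how each column should be read (zip codes stay strings, matching the prediction request example later in this notebook).

```python
import csv
import io
from datetime import datetime

# Header and first rows of trip_table.csv, copied from the sample above.
SAMPLE = """pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,pickup_zipcode,dropoff_zipcode,passenger_count,fare_amount
2022-10-18 03:16:57,-73.9617,40.7628,-73.9748,40.7528,10065,10017,1,25.45
2022-10-18 03:17:00,-73.9957,40.7594,-73.9758,40.7553,10199,10111,1,16.48
2022-10-18 03:17:00,-73.9946,40.7259,-73.9915,40.7325,10003,10003,1,13.1
2022-10-18 03:17:46,-73.9893,40.7419,-74.0018,40.7263,10010,10012,1,23.31
"""


def parse_trips(text):
    """Parse trip rows into dicts with typed values (assumed column types)."""
    trips = []
    for row in csv.DictReader(io.StringIO(text)):
        trips.append({
            "pickup_datetime": datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S"),
            "pickup_longitude": float(row["pickup_longitude"]),
            "pickup_latitude": float(row["pickup_latitude"]),
            "dropoff_longitude": float(row["dropoff_longitude"]),
            "dropoff_latitude": float(row["dropoff_latitude"]),
            # Zip codes are identifiers, not quantities, so keep them as strings.
            "pickup_zipcode": row["pickup_zipcode"],
            "dropoff_zipcode": row["dropoff_zipcode"],
            "passenger_count": int(row["passenger_count"]),
            "fare_amount": float(row["fare_amount"]),
        })
    return trips


trips = parse_trips(SAMPLE)
```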
    
In this step, we will connect AWS S3 to Aizen as a data source. This allows Aizen to read the `trip_table.csv` file from the S3 bucket. Data sources are connected to Aizen via the `configure datasource` command, which prompts for various settings. The relevant information for this command is shown below. Enter this information in the prompts:
    
            Source: New                Source Name: trip_datasource
            Source Description: taxi trip data
            Source Type: aws
            Source Format: csv
            S3 Endpoint: https://s3.us-west-2.amazonaws.com
            S3 Bucket: s3a://aizen-public/trip_fare/trip_table.csv
            S3 Anon: checked (true)
            Credential File:
            Credential Key:

Leave the credential fields blank, since S3 Anon is checked. Click the `Get Columns` button and review the source column schema.
<br>Click the `Save Configuration` button to configure the datasource.

In [None]:
configure datasource
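Before configuring the source in Aizen, it can help to confirm that the file is readable without credentials. The sketch below runs outside Aizen and is built on assumptions: it uses `boto3` (not part of this tutorial) with unsigned requests, assumes the bucket allows anonymous reads as the checked S3 Anon setting implies, and maps the `s3a://` URI entered in the prompt to a bucket and key. `split_s3_uri` and `fetch_head` are hypothetical helper names.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// or s3a:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_head(uri, n_lines=5, region="us-west-2"):
    """Read the first n_lines of an S3 object without credentials.

    Requires boto3 and assumes the bucket permits anonymous (unsigned) reads.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, key = split_s3_uri(uri)
    s3 = boto3.client("s3", region_name=region,
                      config=Config(signature_version=UNSIGNED))
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    lines = []
    for raw in body.iter_lines():
        lines.append(raw.decode("utf-8"))
        if len(lines) >= n_lines:
            break
    return lines
```

Calling `fetch_head("s3a://aizen-public/trip_fare/trip_table.csv")` should return the CSV header followed by the first data rows if the bucket is publicly readable; an access error suggests credentials are needed after all.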

***

# Configure Data Sinks

<html><img src="../../images/trip_fare_images/1_2.png"/></html>

In this step, we will connect a data sink to the data source. The data sink defines the Aizen table that stores data from the data source. This is an Events Data Sink because the data is event-driven, with `pickup_datetime` as the event timestamp.

Data sinks are connected to data sources via the `configure datasink` command. This command will prompt for various settings. The relevant information for this command is shown below. Enter this information in the prompts:
    
            DataSink: New                
            DataSink Name: trip_datasink
            DataSink Type: Events
            Data Source: trip_datasource
            Primary Key Columns (multi-select): pickup_latitude, pickup_longitude, dropoff_latitude, dropoff_longitude, pickup_zipcode, dropoff_zipcode                                                           
            Timestamp Column: pickup_datetime
            Min Aggregation Interval: 1h
            Backfill Start: 10/18/2022 12:00 AM
            Backfill End: 11/28/2022 12:00 AM

Click the `Save Configuration` button to configure the data sink.

In [None]:
configure datasink
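To make the timestamp settings more concrete, here is a small Python sketch of how `pickup_datetime` values relate to the backfill window and a 1-hour interval. Two points are our assumptions, not documented Aizen behavior: that the backfill end bound is exclusive, and that the 1h interval aligns to clock hours. `in_backfill` and `hour_bucket` are hypothetical helpers.

```python
from datetime import datetime

# Values from the datasink prompts above.
BACKFILL_START = datetime(2022, 10, 18, 0, 0)
BACKFILL_END = datetime(2022, 11, 28, 0, 0)


def in_backfill(ts, start=BACKFILL_START, end=BACKFILL_END):
    """Assumed semantics: an event is backfilled if start <= ts < end."""
    return start <= ts < end


def hour_bucket(ts):
    """Floor a timestamp to the start of its 1-hour interval."""
    return ts.replace(minute=0, second=0, microsecond=0)


ts = datetime.strptime("2022-10-18 03:16:57", "%Y-%m-%d %H:%M:%S")
print(in_backfill(ts), hour_bucket(ts))  # True 2022-10-18 03:00:00
```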

***

# Create a Training Dataset

<html><img src="../../images/trip_fare_images/1_3.png"/></html>


In this step, we will create a training dataset from the data sink. The `pickup_zipcode`, `dropoff_zipcode`, and `passenger_count` columns are the input features for the ML model. The `fare_amount` column is the target (label) the model learns to predict, so it is added as a label feature. The `pickup_datetime` column is also added to the dataset, but it is removed from the model's input features later, when training is configured. All five features are basis features drawn from the Events Data Sink.
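The split between input features and label described above can be pictured with plain Python. This is an illustrative sketch, not how Aizen stores the dataset internally: `split_features_label` is a hypothetical helper, and the two rows are the first sample rows of `trip_table.csv` with zip codes kept as strings.

```python
# Column roles as configured for trip_dataset_1.
INPUT_FEATURES = ["pickup_zipcode", "dropoff_zipcode", "passenger_count"]
LABEL = "fare_amount"


def split_features_label(rows):
    """Split row dicts into input feature dicts (X) and label values (y).

    pickup_datetime is ignored because it is removed from the model's
    input features when training is configured.
    """
    X = [{name: row[name] for name in INPUT_FEATURES} for row in rows]
    y = [row[LABEL] for row in rows]
    return X, y


rows = [
    {"pickup_datetime": "2022-10-18 03:16:57", "pickup_zipcode": "10065",
     "dropoff_zipcode": "10017", "passenger_count": 1, "fare_amount": 25.45},
    {"pickup_datetime": "2022-10-18 03:17:00", "pickup_zipcode": "10199",
     "dropoff_zipcode": "10111", "passenger_count": 1, "fare_amount": 16.48},
]
X, y = split_features_label(rows)
print(X[0])  # {'pickup_zipcode': '10065', 'dropoff_zipcode': '10017', 'passenger_count': 1}
print(y)     # [25.45, 16.48]
```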

## Building Datasets from Data Sinks

<html><img src="../../images/trip_fare_images/1_4.png"/></html>

<br>Basis features are sourced from a single data sink.

Datasets are configured via the `configure dataset` command. This command will prompt for various settings. The relevant information for this command is shown below. Enter this information in the prompts:
    
            Dataset: New                  Dataset Name: trip_dataset_1
            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: pickup_datetime
            Is Label: unchecked (false)   Materialize: checked (true)
           
Click the `Add Feature` button to add the pickup_datetime input feature. Continue adding the remaining features using the following information in the prompts:

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: pickup_zipcode
            Is Label: unchecked (false)   Materialize: checked (true)

Click the `Add Feature` button to add the pickup_zipcode input feature.           

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: dropoff_zipcode
            Is Label: unchecked (false)   Materialize: checked (true)

Click the `Add Feature` button to add the dropoff_zipcode input feature.           

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: passenger_count
            Is Label: unchecked (false)   Materialize: checked (true)
            
Click the `Add Feature` button to add the passenger_count input feature.

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: fare_amount
            Is Label: checked (true)      Materialize: checked (true)

Click the `Add Feature` button to add the fare_amount output feature.
<br>Click the `Save Configuration` button followed by the `OK` button to configure the dataset.

In [None]:
configure dataset

### Create the dataset

Use the `start dataset` command to materialize the configured dataset into a training dataset table. The `status dataset` command shows the current status of dataset generation: "RUNNING", "COMPLETED" or "ERROR". The `list datasets` command lists the datasets created within a project, and the `display dataset` command displays the first few rows of the training dataset.

**This command may take up to 10 minutes due to the size of the dataset.**

In [None]:
start dataset trip_dataset_1

In [None]:
status dataset trip_dataset_1

In [None]:
list datasets

In [None]:
display dataset trip_dataset_1
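Because dataset generation can take several minutes, you may prefer to poll until a terminal state rather than re-running `status dataset` by hand. Below is a minimal, framework-agnostic Python sketch. `get_status` is a hypothetical callable standing in for however you retrieve the status (no Python API for Aizen is assumed here); it should return one of the states listed above. The poll interval and timeout are arbitrary choices.

```python
import time


def wait_for_status(get_status, poll_seconds=30.0, timeout_seconds=1800.0,
                    sleep=time.sleep):
    """Poll get_status() until it returns "COMPLETED" or "ERROR".

    Raises TimeoutError if the status is still non-terminal after
    timeout_seconds of waiting.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in ("COMPLETED", "ERROR"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

For example, with `states = iter(["RUNNING", "COMPLETED"])`, calling `wait_for_status(lambda: next(states), sleep=lambda s: None)` returns `"COMPLETED"`. Injecting `sleep` keeps the helper testable without real delays.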

### Data analysis on the dataset

Use the `loader` command to load the dataset for visual exploration. Run the `loader` command, click the `Datasets` button and select the `trip_dataset_1` table.

Click the `Load Table` button to load the dataset.

In [None]:
loader

In [None]:
show stats
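Exactly which statistics `show stats` reports is not spelled out here. As an illustrative assumption, typical per-column summaries look like the ones computed below for the four sample fares shown earlier, using only Python's `statistics` module.

```python
import statistics

# fare_amount values from the sample rows of trip_table.csv.
fares = [25.45, 16.48, 13.1, 23.31]

summary = {
    "count": len(fares),
    "mean": statistics.mean(fares),
    "median": statistics.median(fares),
    "stdev": statistics.stdev(fares),  # sample standard deviation
    "min": min(fares),
    "max": max(fares),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```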

Run the `plot` command to display various charts.
<br>As an example, select DataFrame `trip_dataset_1`, select Chart Type `heatmap` and Dimension Columns pickup_zipcode, dropoff_zipcode, passenger_count, fare_amount. Click the `Update Plot` button.

In [None]:
plot

***

# Train an ML Model

<html><img src="../../images/trip_fare_images/1_5.png"/></html>

In this step, we will train ML models using the training dataset created above. We will use `pickup_zipcode`, `dropoff_zipcode`, and `passenger_count` as input features to the ML models, and `fare_amount` as the target (label). There are two types of Training Experiments in Aizen:

1. Machine Learning: Uses machine learning models to accomplish supervised and unsupervised learning tasks with models like linear regression, logistic regression, random forests, etc. 

2. Deep Learning: Uses neural-network-based models such as CNNs, LSTMs, and RNNs.

A Training Experiment must be configured to train a model. Experiments are configured via the `configure training` command. This command will prompt for various settings. We will configure two experiments, one for Machine Learning and the other for Deep Learning. The relevant information for this command is shown below. Enter this information in the prompts:
    
            Training Experiment: New                  Experiment Name: trip_ml_exp_1        Model Name: trip_fare_1_ml_model
            Select "Machine Learning"                 Select "Basic Settings"               ML Type: regression
            Dataset: trip_dataset_1                   Select Column: pickup_datetime        Click Remove Input Feature
           
Click the `Save Configuration` button to save the Machine Learning experiment configuration.
<br>Execute the `configure training` cell again to configure the second experiment. Enter this information in the prompts:

            Training Experiment: New                  Experiment Name: trip_dl_exp_1        Model Name: trip_fare_1_dl_model
            Select "Deep Learning"                    Select "Basic Settings"
            Dataset: trip_dataset_1                   Select Column: pickup_datetime        Click Remove Input Feature
            Epochs: 15                                Early Stop: 1                         Batch Size: 2048
            
Click the `Save Configuration` button to save the Deep Learning experiment configuration.
<br>Execute `listconfig trainings` to list the configured training experiments.

In [None]:
configure training

In [None]:
listconfig trainings
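The Deep Learning settings above cap training at 15 epochs, use batches of 2048 rows, and enable early stopping. The Python sketch below illustrates how such settings typically interact. It assumes `Early Stop: 1` means a patience of one epoch (stop once validation loss fails to improve for one epoch), which is a common convention but not confirmed by this tutorial. `batches_per_epoch` and `epochs_run` are hypothetical helpers, not part of Aizen.

```python
import math


def batches_per_epoch(n_rows, batch_size=2048):
    """Number of mini-batches needed to cover n_rows once."""
    return math.ceil(n_rows / batch_size)


def epochs_run(val_losses, max_epochs=15, patience=1):
    """Return how many epochs run before early stopping triggers.

    val_losses[i] is the (hypothetical) validation loss after epoch i + 1.
    Training stops once the loss has not improved for `patience`
    consecutive epochs, or after max_epochs.
    """
    best = math.inf
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


print(batches_per_epoch(5000))            # 3
print(epochs_run([1.0, 0.8, 0.85, 0.7]))  # 3
```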

### Start ML model training

Use the `start training` command to run the training experiments. The `status training` command will show the status of the model training. 

### Machine Learning

When training a Machine Learning model to predict `fare_amount`, auto-ML runs through several regression algorithms and selects the best model.
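The selection loop can be pictured as: fit each candidate on training data, score it on held-out data, and keep the best. The Python sketch below shows only that loop, under heavy simplification. The two "models" are constant mean and median baselines standing in for the regression algorithms auto-ML actually tries, mean absolute error is an assumed metric, and the train/validation split of the four sample rows is arbitrary.

```python
import statistics

# Sample (features, fare) pairs from trip_table.csv; zip codes as strings.
train = [
    (("10065", "10017", 1), 25.45),
    (("10199", "10111", 1), 16.48),
    (("10003", "10003", 1), 13.1),
]
validation = [
    (("10010", "10012", 1), 23.31),
]


def fit_mean(rows):
    """Baseline that always predicts the mean training fare."""
    avg = statistics.mean(fare for _, fare in rows)
    return lambda features: avg


def fit_median(rows):
    """Baseline that always predicts the median training fare."""
    med = statistics.median(fare for _, fare in rows)
    return lambda features: med


def mean_absolute_error(model, rows):
    return statistics.mean(abs(model(x) - fare) for x, fare in rows)


# Stand-in candidates; real auto-ML would try regression algorithms instead.
CANDIDATES = {"mean_baseline": fit_mean, "median_baseline": fit_median}

scores = {name: mean_absolute_error(fit(train), validation)
          for name, fit in CANDIDATES.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```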

In [None]:
start training trip_ml_exp_1,limit=2000

**Click the URL shown in the output of the `status training` command to open an *MLflow* session that displays the training metrics.**

#### Wait for ML model training to complete

Use the `status training` command to check the status of the model training. Wait until the ML model training status shows COMPLETED.

**Training could take 10 minutes or more to complete.**

In [None]:
status training trip_ml_exp_1

### Deep Learning

Here we train a Deep Learning model to predict `fare_amount`. Behind the scenes, auto-ML selects a CNN architecture to build the model.

In [None]:
start training trip_dl_exp_1

**Click the URL shown in the output to open a *TensorBoard* session that displays the training progress and metrics.** After opening the *TensorBoard* URL, click the reload button at the top right of the *TensorBoard* page.

**Training could take 10 minutes or more to complete.**

In [None]:
status training trip_dl_exp_1

#### Note:
TensorBoard is only available for Deep Learning models. The command syntax is:

    list tensorboard <model_name>,<run_id>

In [None]:
list tensorboard trip_fare_1_dl_model,1

## Register a trained ML model
After training is complete, the `status training` command shows the COMPLETED status. A trained ML model must be registered before it can be used for predictions. The `list trained-models` command lists all trained models within a project, the `register model` command registers a trained model, and the `list registered-models` command lists all registered models within a project.

##### To list all the ML models that have been trained

In [None]:
list trained-models trip_fare_1_ml_model

##### To list all the DL models that have been trained

In [None]:
list trained-models trip_fare_1_dl_model

##### Run this cell to register the machine learning model

In [None]:
register model trip_fare_1_ml_model,1,PRODUCTION

##### Run this cell to register the deep learning model

In [None]:
register model trip_fare_1_dl_model,1,PRODUCTION

##### To list all registered models

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="../../images/trip_fare_images/1_6.png"/></html>

In this step, we will deploy the trained Machine Learning model to serve prediction requests. A prediction deployment must be configured to deploy a model. Deployments are configured via the `configure prediction` command, which prompts for various settings.

The relevant information for this command is shown below. Enter this information in the prompts:
    
            Prediction: New                  Prediction Name: trip_ml_deploy_1        Model Name: trip_fare_1_ml_model       Model Version: 1
            Source Type: http
           
Click the `Save Configuration` button to save the Machine Learning deployment.

In [None]:
configure prediction

### Deploy the model

Use the `start prediction` command to run the deployment. The `status prediction` command shows the status of the model serving. The URL shown in its output is the endpoint to which REST prediction requests can be sent via `curl` or any other HTTP client.

In [None]:
start prediction trip_ml_deploy_1

In [None]:
status prediction trip_ml_deploy_1

## Predict trip fare amounts

Use the `test prediction` command to send prediction requests to the deployed model. By default, the command takes the last 10 rows of the training dataset and sends them to the deployed model as curl prediction requests. The prediction responses are collected and displayed.

Note: running the `start prediction` command starts a prediction job that deploys the model. Use the URL from the `status prediction` output to send curl requests to the deployed model. The `test prediction` command also outputs an "Example Curl Request". Use this example to send data to the deployed model, or integrate the same request logic into applications that send prediction requests and interpret the prediction responses.

In [None]:
test prediction trip_ml_deploy_1
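To reproduce the "last 10 rows" behavior in your own code, the request body can be built from dataset rows. This is a sketch under assumptions: the record shape (a `rest_request_id` plus the basis input features, with the `fare_amount` label dropped) mirrors the curl example later in this notebook, but the exact format `test prediction` uses internally is not documented here. `build_payload` is a hypothetical helper.

```python
import json

# Fields sent per record, following the curl example in this notebook.
FEATURES = ["pickup_datetime", "pickup_zipcode", "dropoff_zipcode", "passenger_count"]


def build_payload(rows, n=10, prefix="prediction_test"):
    """Turn the last n dataset rows into a JSON prediction request body.

    The label column (fare_amount) is dropped, since the model returns it.
    """
    records = []
    for i, row in enumerate(rows[-n:], start=1):
        record = {"rest_request_id": f"{prefix}-{i}"}
        record.update({name: row[name] for name in FEATURES})
        records.append(record)
    return json.dumps(records)
```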

## Building Input Features for Predictions

<html><img src="../../images/trip_fare_images/1_7.png"/></html>

When an application sends a prediction request, the request contains the basis input features. The labels (output features) are returned in the prediction response.

The cell below is a Markdown cell that shows how to send a curl request to fetch predictions. Convert the cell to a Code cell, replace the placeholder with your prediction URL, and run the cell to get a prediction response.

!curl -X POST ">enter the prediction URL here<" -H "Content-Type: application/json" -d '[{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 07:15:00", "pickup_zipcode": "10010", "dropoff_zipcode": "10011", "passenger_count": 3}]'
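Applications that prefer Python to curl can send the same request with the standard library. This is a minimal sketch: `build_request` and `predict` are hypothetical helpers, the URL must be the one reported by `status prediction`, and the assumption that the response body is JSON should be checked against what your deployment actually returns.

```python
import json
import urllib.request


def build_request(url, records):
    """Create a JSON POST request equivalent to the curl example above."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, records, timeout=30):
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(url, records), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same record as the curl example above.
records = [{
    "rest_request_id": "prediction_test-1",
    "pickup_datetime": "2022-11-12 07:15:00",
    "pickup_zipcode": "10010",
    "dropoff_zipcode": "10011",
    "passenger_count": 3,
}]
```

Calling `predict("<prediction URL>", records)` sends the request. Inspect the returned structure before relying on specific fields, since the response format is not described in this tutorial.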

### Stop the deployed model

Use the `stop prediction` command to stop ML model serving when you have finished sending prediction requests. This step is optional; you may leave the model deployed.

In [None]:
stop prediction trip_ml_deploy_1