***

# Taxi Trip Fare Prediction - Model 2

***

This example builds on the Model 1 example to produce a more accurate ML model. We will:
- enhance the training dataset using contextual features
- train an ML model based on historical taxi trip fare data and contextual features
- serve the ML model to predict the trip fare for new trips

### Prepare your data

In the Model 1 example we configured a data source from a trip table csv file in an S3 bucket. In this example we will use one more csv file, geo_area_context.csv, to enrich the training dataset with additional features.

Let us look at the first few lines of the csv file that we will use to build Model 2.
   
##### geo_area_context.csv
    zipcode,geo_area
    10023,Commercial
    10021,Residential
    10002,Suburbs
    11201,Commercial


We will enhance the training dataset by using this new data source: a csv file in an S3 bucket containing a geo area table that maps a zipcode to a type of geo area.

The idea is that the types of the pickup and dropoff geo areas influence the trip fare amount, so adding them as features can yield a more accurate ML model.

***

**We will reuse the `trip_fare` project from Model 1 for this example.**

In [None]:
set project trip_fare

***

# Configure Data Sources

<html><img src="../../images/trip_fare_images/1_1.png"/></html>

In the Model 1 example we configured the trip table csv file from an S3 bucket as a data source for Aizen. In this step we will configure one more data source, corresponding to the new csv file in the S3 bucket. Data sources are connected to Aizen via the `configure datasource` command. This command will prompt for various settings.

The relevant information for this command is shown below. Enter this information in the prompts:
    
            Source: New                Source Name: geo_area_datasource
            Source Description: geo area data
            Source Type: aws
            Source Format: csv
            S3 Endpoint: https://s3.us-west-2.amazonaws.com
            S3 Bucket: s3a://aizen-public/trip_fare/geo_area_context.csv
            S3 Anon: check (true)                                                              
            Credential File:
            Credential Key:

Click the `Get Columns` button and review the source column schema. 
<br>Click the `Save Configuration` button to configure the geo_area datasource.


In [None]:
configure datasource
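
As a rough picture of what reviewing the column schema involves, the sketch below parses the sample rows shown earlier with Python's `csv` module and lists the column names. It runs outside Aizen and is not how Aizen reads the source; `SAMPLE_CSV` and `read_columns_and_rows` are illustrative names introduced here.

```python
import csv
import io

# Sample rows copied from the geo_area_context.csv excerpt in this example.
SAMPLE_CSV = """zipcode,geo_area
10023,Commercial
10021,Residential
10002,Suburbs
11201,Commercial
"""


def read_columns_and_rows(text):
    """Return (column names, list of row dicts) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows


columns, rows = read_columns_and_rows(SAMPLE_CSV)
print(columns)
print(rows[0])
```

Note that the `csv` module reads every value as a string, so the zipcodes here are strings rather than integers.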

***

# Configure Data Sinks

<html><img src="../../images/trip_fare_images/1_2.png"/></html>

In the Model 1 example we configured an Events Data Sink against the trip table data source. In this step we will connect another data sink to the data source that we just configured. This defines the Aizen table that stores data from the data source. The data sink is a Static Data Sink because the data source is not event driven.

Data sinks are connected to data sources via the `configure datasink` command. This command will prompt for various settings. The relevant information for this command is shown below. Enter this information in the prompts:
    
            DataSink: New                
            DataSink Name: geo_area_datasink
            DataSink Type: Static
            Data Source: geo_area_datasource
            Primary Key Columns: zipcode                                                           

Click the `Save Configuration` button to configure the geo_area static data sink.

In [None]:
configure datasink
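
One way to picture a static data sink with `zipcode` as its primary key is a lookup table holding one row per key. The sketch below is only a mental model under that assumption: `build_static_table` is a hypothetical helper, not an Aizen API, and rejecting duplicate keys is a choice made for this sketch rather than documented Aizen behavior.

```python
def build_static_table(rows, primary_key):
    """Index rows by their primary key value; reject duplicate keys."""
    table = {}
    for row in rows:
        key = row[primary_key]
        if key in table:
            # Assumption for this sketch: a primary key identifies exactly one row.
            raise ValueError(f"duplicate primary key: {key!r}")
        table[key] = row
    return table


# Rows taken from the geo_area_context.csv excerpt in this example.
rows = [
    {"zipcode": "10023", "geo_area": "Commercial"},
    {"zipcode": "10021", "geo_area": "Residential"},
    {"zipcode": "10002", "geo_area": "Suburbs"},
    {"zipcode": "11201", "geo_area": "Commercial"},
]

geo_area_table = build_static_table(rows, "zipcode")
print(geo_area_table["10021"]["geo_area"])
```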

***

# Create a Training Dataset

<html><img src="../../images/trip_fare_images/1_3.png"/></html>

In this step we will create a training dataset from the data sinks. As in Model 1, we will use pickup_zipcode, dropoff_zipcode and passenger_count as input features to the ML model and add these features to the training dataset. The fare_amount is the target, or label, for the ML model and will be added as a label feature. All four features are basis features drawn from the Events Data Sink. The configuration below also adds pickup_datetime as a basis feature; it is removed from the input features later, when the training experiment is configured.
<br>Additionally, we will use pickup_geo_area and dropoff_geo_area as input features to the ML model. These two features are contextual features drawn from the Static Data Sink.

## Building Datasets from Data Sinks

<html><img src="../../images/trip_fare_images/2_1_free.png"/></html>

<br>Basis features are sourced from a single data sink. Contextual features are retrieved from data sinks using join keys from the basis features.
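
The join described above can be sketched in plain Python. This is a minimal illustration of the idea, not Aizen's implementation: `add_contextual_feature` is a hypothetical helper, it supports a single join key only, and the trip rows (including the fare amounts and the unknown zipcode `99999`) are made-up values.

```python
def add_contextual_feature(basis_rows, context_rows, name, value,
                           join_key_map, fill_value=None):
    """Return new rows with one contextual feature appended to each basis row.

    join_key_map maps a key column of the context table to a basis feature,
    for example {"zipcode": "pickup_zipcode"}. Only one key is supported here.
    """
    context_key, basis_key = next(iter(join_key_map.items()))
    # Build a lookup from join key value to the contextual value.
    lookup = {row[context_key]: row[value] for row in context_rows}
    # Rows with no match receive fill_value.
    return [{**row, name: lookup.get(row[basis_key], fill_value)}
            for row in basis_rows]


geo_area_rows = [
    {"zipcode": "10023", "geo_area": "Commercial"},
    {"zipcode": "10021", "geo_area": "Residential"},
    {"zipcode": "10002", "geo_area": "Suburbs"},
    {"zipcode": "11201", "geo_area": "Commercial"},
]

# Hypothetical basis rows standing in for the trip table.
trips = [
    {"pickup_zipcode": "10023", "dropoff_zipcode": "10002",
     "passenger_count": 2, "fare_amount": 14.5},
    {"pickup_zipcode": "11201", "dropoff_zipcode": "99999",
     "passenger_count": 1, "fare_amount": 9.0},
]

enriched = add_contextual_feature(
    trips, geo_area_rows, "pickup_geo_area", "geo_area",
    {"zipcode": "pickup_zipcode"})
enriched = add_contextual_feature(
    enriched, geo_area_rows, "dropoff_geo_area", "geo_area",
    {"zipcode": "dropoff_zipcode"})

for row in enriched:
    print(row)
```

The same context table is joined twice, once per join key, which is why the two contextual features share a data sink and value column but differ in their join key map.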

Datasets are configured via the `configure dataset` command. This command will prompt for various settings. The relevant information for this command is shown below. Enter this information in the prompts:
    
            Dataset: New                  Dataset Name: trip_dataset_2
            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: pickup_datetime
            Is Label: unchecked (false)   Materialize: checked (true)
           
Click the `Add Feature` button to add the pickup_datetime input feature. Continue to add all features with the following information in the prompts:

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: pickup_zipcode
            Is Label: unchecked (false)   Materialize: checked (true)

Click the `Add Feature` button to add the pickup_zipcode input feature.           

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: dropoff_zipcode
            Is Label: unchecked (false)   Materialize: checked (true)

Click the `Add Feature` button to add the dropoff_zipcode input feature.           

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: passenger_count
            Is Label: unchecked (false)   Materialize: checked (true)
            
Click the `Add Feature` button to add the passenger_count input feature.

            Feature: Create New
            Feature Type: Basis
            Data Sink: trip_datasink
            Feature: fare_amount
            Is Label: checked (true)      Materialize: checked (true)

Click the `Add Feature` button to add the fare_amount output feature.

            Feature: Create New
            Feature Type: Contextual
            Name: pickup_geo_area
            Is Label: unchecked (false)   Materialize: checked (true)
            Expression: unchecked (false)
            Data Sink: geo_area_datasink
            Value: geo_area
    Join Key Map: 
            zipcode: pickup_zipcode
            Fillvalue:
            
Click the `Add Feature` button to add the pickup_geo_area input feature.

            Feature: Create New
            Feature Type: Contextual
            Name: dropoff_geo_area
            Is Label: unchecked (false)   Materialize: checked (true)
            Expression: unchecked (false)
            Data Sink: geo_area_datasink
            Value: geo_area
    Join Key Map: 
            zipcode: dropoff_zipcode
            Fillvalue:
            
Click the `Add Feature` button to add the dropoff_geo_area input feature.
<br>Click the `Save Configuration` button followed by the `OK` button to configure the dataset.

In [None]:
configure dataset

### Create the dataset

Use the `start dataset` command to materialize the configured dataset into a training dataset table. The `status dataset` command will show the current status of dataset generation: "RUNNING", "COMPLETED" or "ERROR". The `list datasets` command will list the created datasets within a project. The `display dataset` command will display the first few rows of the training dataset.

**This command may take up to 10 minutes due to the size of the dataset.**
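
Checking the status repeatedly until it reaches "COMPLETED" or "ERROR" is a polling loop. The sketch below shows that pattern in Python under stated assumptions: `get_status` is a hypothetical callable supplied by the caller, and nothing here invokes Aizen. In this notebook the equivalent is simply to re-run the `status dataset` cell.

```python
import time


def wait_for_completion(get_status, poll_seconds=30, timeout_seconds=900,
                        sleep=time.sleep):
    """Poll get_status() until it returns COMPLETED or ERROR.

    Raises TimeoutError if neither status is seen within timeout_seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if status in ("COMPLETED", "ERROR"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with a canned status sequence and a no-op sleep.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
result = wait_for_completion(lambda: next(fake_statuses),
                             sleep=lambda seconds: None)
print(result)
```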

In [None]:
start dataset trip_dataset_2

In [None]:
status dataset trip_dataset_2

In [None]:
list datasets

In [None]:
display dataset trip_dataset_2

***

# Train an ML Model

<html><img src="../../images/trip_fare_images/1_5.png"/></html>

In this step we will train an ML model using the training dataset that was created. We will use the pickup_zipcode, dropoff_zipcode, passenger_count, pickup_geo_area and dropoff_geo_area as input features to the ML model. The fare_amount will be the target or label for the ML model. 

A Training Experiment must be configured to train a model. Experiments are configured via the `configure training` command. This command will prompt for various settings. We will configure a Machine Learning experiment for Model 2. The relevant information for this command is shown below. Enter this information in the prompts:
    
            Training Experiment: New                  Experiment Name: trip_ml_exp_2        Model Name: trip_fare_2_ml_model
            Select "Machine Learning"                 Select "Basic Settings"               ML Type: regression
            Dataset: trip_dataset_2                   Select Column: pickup_datetime        Click Remove Input Feature
           
Click the `Save Configuration` button to save the Machine Learning experiment configuration.

In [None]:
configure training

### Start ML model training

Use the `start training` command to run the training experiment. The `status training` command will show the status of the model training. 

### Machine Learning

When training a Machine Learning model to predict the 'fare_amount', auto-ML selects the best model after running through different machine learning algorithms for regression tasks.
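
The general idea of fitting several candidates and keeping the one with the lowest validation error can be sketched as follows. This is not Aizen's auto-ML: the two candidate models (a global mean and a per-geo-area-pair mean), the RMSE metric, and all fare values are assumptions chosen to keep the example small.

```python
import math
from collections import defaultdict


def fit_global_mean(train):
    """Candidate 1: always predict the mean fare of the training rows."""
    mean = sum(r["fare_amount"] for r in train) / len(train)
    return lambda row: mean


def fit_group_mean(train):
    """Candidate 2: predict the mean fare per (pickup, dropoff) geo area pair."""
    fallback = fit_global_mean(train)
    totals = defaultdict(lambda: [0.0, 0])
    for r in train:
        key = (r["pickup_geo_area"], r["dropoff_geo_area"])
        totals[key][0] += r["fare_amount"]
        totals[key][1] += 1
    means = {key: total / count for key, (total, count) in totals.items()}

    def predict(row):
        key = (row["pickup_geo_area"], row["dropoff_geo_area"])
        # Unseen pairs fall back to the global mean.
        return means.get(key, fallback(row))

    return predict


def rmse(model, rows):
    """Root mean squared error of model predictions against fare_amount."""
    squared = sum((model(r) - r["fare_amount"]) ** 2 for r in rows)
    return math.sqrt(squared / len(rows))


def select_best(candidates, train, validation):
    """Fit every candidate on train and return (best name, all scores)."""
    scores = {name: rmse(fit(train), validation)
              for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


def trip(pickup, dropoff, fare):
    return {"pickup_geo_area": pickup, "dropoff_geo_area": dropoff,
            "fare_amount": fare}


# Made-up rows for illustration only.
train = [trip("Commercial", "Commercial", 10.0),
         trip("Commercial", "Commercial", 12.0),
         trip("Residential", "Suburbs", 30.0),
         trip("Residential", "Suburbs", 34.0)]
validation = [trip("Commercial", "Commercial", 11.0),
              trip("Residential", "Suburbs", 32.0)]

best, scores = select_best(
    {"global_mean": fit_global_mean, "group_mean": fit_group_mean},
    train, validation)
print(best, scores)
```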

In [None]:
start training trip_ml_exp_2,limit=2000

**Click the URL shown in the output of `status training` to open an *MLflow* session that displays the training metrics.**

#### Wait for ML model training to complete

Use the `status training` command to check the status of the model training. Wait until the status shows COMPLETED.

**Training could take 10 minutes or more to complete.**

In [None]:
status training trip_ml_exp_2

## Register a trained ML model

After the training is complete, the `status training` command will show COMPLETED status. The trained ML model must be registered before it can be used for predictions. The `list trained-models` command will list all the trained models within a project. The `register model` command will register a trained model. The `list registered-models` command will list all registered models within a project.

##### To list all the ML models that have been trained

In [None]:
list trained-models trip_fare_2_ml_model

##### Run this cell to register the machine learning model

In [None]:
register model trip_fare_2_ml_model,1,PRODUCTION

##### To list all registered models

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="../../images/trip_fare_images/1_6.png"/></html>

In this step we will deploy a trained ML model to serve prediction requests. We will deploy the Machine Learning model. A prediction deployment must be configured to deploy a model. Deployments are configured via the `configure prediction` command. This command will prompt for various settings.

The relevant information for this command is shown below. Enter this information in the prompts:
    
            Prediction: New                  Prediction Name: trip_ml_deploy_2        Model Name: trip_fare_2_ml_model       Model Version: 1
            Source Type: http
           
Click the `Save Configuration` button to save the Machine Learning deployment.

In [None]:
configure prediction

### Deploy the model

Use the `start prediction` command to run the deployment. The `status prediction` command will show the status of the model serving. The URL shown in the output is the endpoint to which REST prediction requests may be sent via `curl` or some other means.

In [None]:
start prediction trip_ml_deploy_2

In [None]:
status prediction trip_ml_deploy_2

## Predict trip fare amounts

Use the `test prediction` command to send prediction requests to the deployed model. By default, the command takes the last 10 rows from the training dataset and sends them as curl prediction requests to the deployed model. The prediction responses are collected and displayed.

Note: when you run the `start prediction` command, a prediction job starts and deploys the model. You can use the URL in the `status prediction` output to send curl requests to the deployed model. The `test prediction` command outputs an "Example Curl Request". Use this example to send data to the deployed model, or integrate the request logic into applications that send prediction requests and interpret prediction responses.

In [None]:
test prediction trip_ml_deploy_2

## Building Input Features for Predictions

<html><img src="../../images/trip_fare_images/2_2_free.png"/></html>

When an application sends a prediction request, the basis input features are present in the request. Any contextual features are fetched from data sinks and appended to the basis features before the model is called for a prediction. The labels or output features are returned in the prediction response.

The cell below is a Markdown cell showing how to run a Curl Request to fetch predictions. Convert the cell into the Code state, then replace the prediction URL in the text below and execute the cell to get a prediction response.

!curl -X POST ">enter the prediction URL here<" -H "Content-Type: application/json" -d '[{"rest_request_id": "prediction_test-1", "pickup_datetime": "2022-11-12 11:29:05", "pickup_zipcode": "10069", "dropoff_zipcode": "10107", "passenger_count": 3}]'
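
For applications written in Python, the same request can be built with the standard library. The sketch below mirrors the curl example above; `build_prediction_request` is a hypothetical helper and `PREDICTION_URL` is a placeholder, not a real endpoint. The sending step is left commented out, and the response is printed as raw text because its format is not described here.

```python
import json
import urllib.request


def build_prediction_request(url, rows):
    """Build a POST request carrying a JSON list of rows, as in the curl example."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder: replace with the URL shown by `status prediction`.
PREDICTION_URL = "http://example.invalid/predict"

# Only basis features are sent; the payload is copied from the curl example.
rows = [{
    "rest_request_id": "prediction_test-1",
    "pickup_datetime": "2022-11-12 11:29:05",
    "pickup_zipcode": "10069",
    "dropoff_zipcode": "10107",
    "passenger_count": 3,
}]

request = build_prediction_request(PREDICTION_URL, rows)
print(request.get_method(), request.full_url)

# Uncomment to send the request once PREDICTION_URL points at a deployed model:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.read().decode("utf-8"))
```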

### Stop the deployed model

Use the `stop prediction` command to stop ML model serving when you have completed the prediction requests. This step is optional; you may choose to leave the model deployed.

In [None]:
stop prediction trip_ml_deploy_2