***

# Taxi Trip Fare Prediction - Model 2

***

The goal of this example is to build on the Model 1 example and generate a better ML model. We will:
- enhance the training dataset using contextual features
- train an ML model based on historical taxi trip fare data and contextual features
- serve the ML model to predict the trip fare for new trips

### Prepare your data

The trip table csv file was uploaded to MySQL and connected as a data source in the Model 1 example. There is no need to redo this step.

### Prepare your static contextual feature data

We will enhance the data by adding three contextual feature tables:

- an hourly segment table that maps an hour to an hourly-segment. 
- a holiday weekend table that maps a date to a flag indicating whether that date was a holiday-or-weekend or neither.
- a geo area table that maps a zipcode to a type of geo area.

The idea is that the hourly-segment, the holiday-or-weekend flag and the type of pickup and dropoff geo areas have an influence on the trip fare amount. We can create a more accurate ML model with these additional features.

Each contextual feature table is a csv file containing the respective mapping. Each file pairs a lookup key (hour of day, calendar day or zipcode) with the corresponding feature value (hourly segment, holiday-or-weekend flag or geo area type). First we will download the csv files using `wget`, then print the first few lines of each file using the `head` command.

In [None]:
!wget http://<wget server address>:8011/hour_of_day_context.csv

In [None]:
!wget http://<wget server address>:8011/holiday_weekend_context2.csv

In [None]:
!wget http://<wget server address>:8011/geo_area_context2.csv

In [None]:
!head -n 5 hour_of_day_context.csv

In [None]:
!head -n 5 holiday_weekend_context2.csv

In [None]:
!head -n 5 geo_area_context2.csv
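
If you prefer to inspect the files from Python rather than with `head`, the following is a minimal sketch using only the standard library. The helper name `peek_csv` is an assumption made for illustration and is not part of the tutorial tooling.

```python
import csv
from itertools import islice


def peek_csv(path, n=5):
    """Return the first n rows of a csv file as lists of fields (header included)."""
    with open(path, newline="") as f:
        return list(islice(csv.reader(f), n))


# Example usage (assumes the file was downloaded by the wget cells above):
# for row in peek_csv("hour_of_day_context.csv"):
#     print(row)
```

Unlike `head`, this splits each row into fields, which makes it easier to spot a misplaced delimiter or an unexpected extra column.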

### Upload your contextual feature data

We will use MySQL as the data source for the three contextual feature tables. We will upload the csv files to a MySQL server and connect that MySQL server to the Elevo platform. Use `mysql-load-csv.py` to upload a csv file to the MySQL server. It accepts the following options:

- `-b` specifies the IP address of the MySQL server
- `-u` and `-p` specify the MySQL username and password
- `-i` specifies the input csv file name
- `-k` specifies the MySQL table name
- `-n` specifies the MySQL primary key column names
- `-g` obtains the MySQL server credentials
- `-h` displays help

Note the `mysql source meta` from the upload output. It will be used later to connect MySQL to the Elevo platform.
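
Because each table is uploaded with a primary key, it can help to confirm beforehand that the key column really is unique in the csv file. The following is a minimal sketch of such a check. The helper name `find_duplicate_keys` is hypothetical, and the check is only a local sanity test, not something `mysql-load-csv.py` is known to require.

```python
import csv
from collections import Counter


def find_duplicate_keys(path, key_columns):
    """Return the key tuples that occur more than once in the csv file at path."""
    with open(path, newline="") as f:
        counts = Counter(
            tuple(row[col] for col in key_columns) for row in csv.DictReader(f)
        )
    return [key for key, count in counts.items() if count > 1]


# Example usage with the file and key column of the hour_of_day_context upload:
# find_duplicate_keys("hour_of_day_context.csv", ["hour_of_day"])
```

An empty list means every key value appears exactly once, so the column is safe to use as a primary key.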

In [None]:
!mysql-load-csv.py -b <mysql host> -u '<mysql user>' -p '<mysql password>' -i hour_of_day_context.csv -k hour_of_day_context -n hour_of_day

In [None]:
!mysql-load-csv.py -b <mysql host> -u '<mysql user>' -p '<mysql password>' -i holiday_weekend_context2.csv -k holiday_weekend_context -n calendar_day

In [None]:
!mysql-load-csv.py -b <mysql host> -u '<mysql user>' -p '<mysql password>' -i geo_area_context2.csv -k geo_area_context -n zipcode

In [None]:
!mysql-load-csv.py -b <mysql host> -u '<mysql user>' -p '<mysql password>' -g

***

**We will reuse the `trip_fare` project from Model 1 for this example.**

In [None]:
set project trip_fare

***

# Connect your Data Sources

<html><img src="2_1.png"/></html>

In the Model 1 example we connected the MySQL data source to Elevo for the trip table. In this step we will connect the MySQL data source to Elevo for the three contextual feature tables. This will allow Elevo to read contextual features from the MySQL tables.

### Create a Foresight ML sources file

Data sources are connected to Elevo via a Foresight ML sources file. In the Model 1 example we created a Foresight ML sources file to connect the MySQL server to the Elevo platform for the trip table. Create another Foresight ML sources file to add the three new contextual feature sources. Use the templates and code snippets available at the icons to the left. Refer to the Elevo Foresight User Manual for help.
Alternatively, you may use the Foresight ML sources file from this tutorial.

**Make sure you update the Foresight ML sources file with the correct MySQL server URL and user credentials obtained from the *"Upload your contextual feature data"* step above.**
<br>Multiple sections need to be updated, one section per table. The relevant sections in the `trip_fare_data_sources_2.yml` file look like this:
    
            meta:
              source_type: mysql
              source_format: jdbc
              url: jdbc:mysql://<mysql host>:3306/tutorial_client_<xxxx_xxxxxx>       <<<
              user: <mysql user>                                                      <<<
              password: <mysql password>                                              <<<
              driver: com.mysql.jdbc.Driver

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_data_sources_2.yml
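
Since several sections of the sources file need the same edits, it is easy to leave a template placeholder such as `<mysql host>` behind. The sketch below scans text for leftover angle-bracket placeholders. The helper name and the regular expression are assumptions made for illustration. Any legitimate `<...>` value in the file would also be reported, so treat the result as a hint rather than a verdict.

```python
import re

# Matches template placeholders such as <mysql host> or <mysql user>.
# A bare run of "<<<" is not matched because the brackets must enclose text.
PLACEHOLDER = re.compile(r"<[^<>\n]+>")


def find_unfilled_placeholders(text):
    """Return (line_number, placeholder) pairs for placeholders left in text."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((number, match.group(0)))
    return found


# Example usage against the project copy of the sources file:
# from pathlib import Path
# path = Path.home() / "projects/trip_fare/trip_fare_data_sources_2.yml"
# print(find_unfilled_placeholders(path.read_text()))
```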

#### Add column schema to your data sources file

Foresight can automatically infer column schema from your data sources and update the ML sources file. Use the `add columns` command to automatically infer and update the ML sources file with the data source column schema. After this command completes, you must review the column schema for correctness and if necessary edit the ML sources file to fix column names or data types. Alternatively you may manually edit the ML sources file and add all the column names and data types to match your data source schema.

In [None]:
add columns trip_fare_data_sources_2.yml

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_data_sources_2.yml

If you are using the Foresight ML sources file from this tutorial, copy it to your project location using the `cp` command in the cell below.

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_data_sources_2.yml ~/projects/trip_fare/

***

# Create Feature Sets for contextual features

<html><img src="2_2.png"/></html>

In this step we will create three feature sets to generate and store the contextual feature tables in Elevo storage based on the three csv data sources. 

### Create Foresight ML job files to generate feature sets

The feature sets will be created using Foresight ML job files. The `using_elevo_options` section of the Foresight ML job file is where you specify the key entities for each feature set. Key entities indicate row uniqueness within a table, and they are used to look up the contextual feature. Create Foresight ML job files using the templates and code snippets available at the icons to the left. Refer to the Elevo Foresight User Manual for help.


Alternatively you may view and copy the Foresight ML job files from this tutorial to your project location using the `cp` command in the cells below.

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/hour_of_day_context.ml

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/holiday_weekend_context.ml

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/geo_area_context.ml

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/hour_of_day_context.ml ~/projects/trip_fare/

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/holiday_weekend_context.ml ~/projects/trip_fare/

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/geo_area_context.ml ~/projects/trip_fare/

### Create feature sets

Use the `start featureset` command to execute the Foresight ML job file to create the feature set. This command will start a job that creates the feature set tables within Elevo, and fetches data into the Elevo tables from the data source. The job continues to run until all the data has been fetched. The `status featureset` command will show the status of the feature set.


In [None]:
start featureset hour_of_day_context

In [None]:
start featureset holiday_weekend_context

In [None]:
start featureset geo_area_context

In [None]:
status featureset hour_of_day_context

In [None]:
status featureset holiday_weekend_context

In [None]:
status featureset geo_area_context

***

# Create a Feature View to serve contextual features

<html><img src="2_3.png"/></html>

In this step we will create a feature view to serve four contextual features from the three feature sets that we created in internal Elevo storage. The feature view will output the following contextual features:
- the hourly_segment for a given hour of day
- the holiday_or_weekend flag for a given date
- the pickup_geo_area for a given pickup zipcode
- the dropoff_geo_area for a given dropoff zipcode
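
Conceptually, the feature view performs four key-based lookups against three tables, with the geo area table consulted once for each zipcode. The sketch below illustrates that idea with plain Python dictionaries. It is not how Elevo implements feature views, and the sample segment and area values are hypothetical stand-ins for the contents of the csv files.

```python
from datetime import date

# Hypothetical sample rows; the real mappings come from the three csv files.
HOUR_OF_DAY_CONTEXT = {8: "morning_peak", 18: "evening_peak"}
HOLIDAY_WEEKEND_CONTEXT = {date(2022, 10, 27): False, date(2022, 10, 29): True}
GEO_AREA_CONTEXT = {"10001": "commercial", "10003": "residential"}


def lookup_contextual_features(hour_of_day, calendar_day, pickup_zipcode, dropoff_zipcode):
    """Look up the four contextual features; unknown keys yield None."""
    return {
        "hourly_segment": HOUR_OF_DAY_CONTEXT.get(hour_of_day),
        "holiday_or_weekend": HOLIDAY_WEEKEND_CONTEXT.get(calendar_day),
        # The geo area table is consulted twice, once per zipcode.
        "pickup_geo_area": GEO_AREA_CONTEXT.get(pickup_zipcode),
        "dropoff_geo_area": GEO_AREA_CONTEXT.get(dropoff_zipcode),
    }


features = lookup_contextual_features(8, date(2022, 10, 27), "10001", "10003")
print(features)
```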

### Create a Foresight ML job file to generate a feature view

The feature view will be created using a Foresight ML job file. The `using_elevo_options` section of the Foresight ML job file is where you specify the feature name and source for the feature. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Elevo Foresight User Manual for help. Make sure to update the models section of your Foresight ML sources file as well.


Alternatively you may view and copy the Foresight ML job file from this tutorial to your project location using the `cp` command in the cells below.

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_feature_view_2.ml

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/trip_feature_view_2.ml ~/projects/trip_fare/

### Start serving contextual features

Use the `start featureview` command to execute the Foresight ML job file to start serving contextual features for the feature view. This command starts a job to serve the feature view. Use the `offline` option to serve features for training dataset creation and the `online` option to serve features for prediction. 

The `status featureview` command will show the status of the feature view. The *`feature_status`* element indicates the availability of feature data. A feature status of "OK" indicates that feature data is available.

In [None]:
start featureview trip_feature_view_2,offline

In [None]:
start featureview trip_feature_view_2,online

In [None]:
status featureview trip_feature_view_2,offline

In [None]:
status featureview trip_feature_view_2,online

### Explore feature sets and feature views

Explore the feature sets and feature views that you created using `Foresight Explorer`. The `Foresight Explorer` tool can be opened by clicking on the following icon on the Launcher page.

<html><img src="2_7.png"/></html>

Navigate to the `Foresight Explorer` web page and open the `trip_fare` project. Explore the feature sets and feature views within that project.

***

# Create a Training Dataset

<html><img src="2_4.png"/></html>

In this step we will create a training dataset using the trip table data source and the contextual features. We will use the pickup_zipcode, dropoff_zipcode and passenger_count as input features to the ML model. We will use the ***contextual_feature_fetch*** UDF to fetch the hourly_segment, the is_holiday_or_weekend flag, the pickup_geo_area and the dropoff_geo_area from the feature view and use those as additional inputs to the ML model. The fare_amount will be the target or label for the ML model to train.
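
To make the roles of the columns concrete, the sketch below splits one training row into its input features and its label. The column names are the ones listed above. The helper name and the sample row values are hypothetical, and the real dataset is produced by the SQL command in the Foresight ML job file rather than by this code.

```python
FEATURE_COLUMNS = [
    "pickup_zipcode", "dropoff_zipcode", "passenger_count",
    "hourly_segment", "is_holiday_or_weekend",
    "pickup_geo_area", "dropoff_geo_area",
]
TARGET_COLUMN = "fare_amount"


def split_features_and_label(row):
    """Split one training row (a dict) into (features, label)."""
    missing = [c for c in FEATURE_COLUMNS + [TARGET_COLUMN] if c not in row]
    if missing:
        raise KeyError(f"row is missing columns: {missing}")
    features = {c: row[c] for c in FEATURE_COLUMNS}
    return features, row[TARGET_COLUMN]


# Hypothetical row, for illustration only.
sample_row = {
    "pickup_zipcode": "10001", "dropoff_zipcode": "10111", "passenger_count": 2,
    "hourly_segment": "morning_peak", "is_holiday_or_weekend": False,
    "pickup_geo_area": "commercial", "dropoff_geo_area": "commercial",
    "fare_amount": 9.5,
}
features, label = split_features_and_label(sample_row)
```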

### Create a Foresight ML job file to generate a training dataset

The training dataset will be created using a SQL command. SQL commands can be executed via Foresight ML job files. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Elevo Foresight User Manual for help.
Alternatively you may view and copy the Foresight ML job file from this tutorial to your project location using the `cp` command in the cells below.

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_train_dataset_2.ml

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_train_dataset_2.ml ~/projects/trip_fare/

### Create the dataset

Use the `create dataset` command to execute the Foresight ML job file to create the training dataset in Elevo. The `list datasets` command will list the created datasets within a project. The `display dataset` command will display the first few rows of the training dataset.

**This command may take up to 10 minutes due to the size of the dataset.**

In [None]:
create dataset trip_fare_train_dataset_2

In [None]:
list datasets

In [None]:
display dataset trip_fare_train_dataset_2

### Explore the dataset

Use the `explore dataset` command to visually explore the dataset using the Elevo Foresight data explorer. The `target_column` is the target or label for ML training. Click on the output url to visualize the dataset.

**This command may take a few minutes due to the size of the dataset.**

In [None]:
explore dataset trip_fare_train_dataset_2,datetime_column=pickup_datetime,target_column=fare_amount

***

# Train an ML Model

<html><img src="2_5.png"/></html>

In this step we will train an ML model using the training dataset that was created. We will use the pickup_zipcode, dropoff_zipcode, passenger_count, hourly_segment, is_holiday_or_weekend, pickup_geo_area and dropoff_geo_area as input features to the ML model. The fare_amount will be the target or label for the ML model to train.

### Create a Foresight ML job file for model training

ML model training is initiated via a Foresight ML job file which specifies the ML training parameters. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Elevo Foresight User Manual for help.
Alternatively you may view and copy the Foresight ML job file from this tutorial to your project location using the `cp` command in the cells below.

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_model_train_2.ml

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_model_train_2.ml ~/projects/trip_fare/

### Start ML model training

Use the `start training` command to execute the Foresight ML job file to start the model training in Elevo. The `status training` command will show the status of the model training. 

**Click the url shown in the output to open a *TensorBoard* session that displays the training progress and metrics.** After opening the *TensorBoard* url, click the reload button at the top right of the *TensorBoard* page.

In [None]:
start training trip_fare_model_train_2

In [None]:
list tensorboard trip_fare_model_2,1

#### Wait for ML model training to complete

Use the `status training` command to check the status of the model training. Wait until the status shows COMPLETED.

**Training could take 10 minutes or more to complete.**

In [None]:
status training trip_fare_model_train_2

## Register a trained ML model

After the training is complete, the `status training` command will show COMPLETED status. The trained ML model must be registered before it can be used for predictions. The `list trained-models` command will list all the trained models within a project. The `register model` command will register a trained model. The `list registered-models` command will list all registered models within a project.

In [None]:
list trained-models trip_fare_model_2

In [None]:
register model trip_fare_model_2,1,PRODUCTION

In [None]:
list registered-models

***

# Serve an ML Model

<html><img src="2_6.png"/></html>

In this step we will deploy the trained ML model to serve prediction requests. 

### Create a Foresight ML job file for model serving

ML models are deployed via a Foresight ML job file which specifies the ML serving options. Create a Foresight ML job file using the templates and code snippets available at the icons to the left. Refer to the Elevo Foresight User Manual for help.
Make sure to create another prediction Foresight ML sources file to match your ML job file. You will need to add two REST sources, one for the prediction REST request and one for the prediction REST response. You will need to add a prediction log table definition.

Alternatively you may view and copy the Foresight ML job file and ML sources file from this tutorial to your project location using the `cp` command in the cells below.

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_model_serve_2.ml

In [None]:
!cat ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_prediction_sources_2.yml

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_model_serve_2.ml ~/projects/trip_fare/

In [None]:
!cp ~/tutorial/examples/trip_fare_prediction_model_2/trip_fare_prediction_sources_2.yml ~/projects/trip_fare/

### Deploy the model

Use the `start prediction` command to execute the Foresight ML job file to deploy a model in Elevo. The `status prediction` command will show the status of the model serving. The url shown in the output is the endpoint to which REST prediction requests may be sent via `curl` or some other means.

In [None]:
start prediction trip_fare_model_serve_2

In [None]:
status prediction trip_fare_model_serve_2

## Predict trip fare amounts

Use the `curl` command to send prediction requests to the deployed model via the serving url shown above. Change the http url in the two cells below to match the url shown above and execute the `curl` commands.

For predictions, get the current datetime by executing the cell below and use that datetime as the pickup_datetime value in the prediction curl request.

In [None]:
!date -u +'"pickup_datetime":"%Y-%m-%d %H:%M:%S", "hour_of_day":"%H", "calendar_day":"%Y-%m-%d"'

In [None]:
!curl -X GET http://<use url info from above status prediction cmd> -H "Content-Type: application/json" -d \
'[{"pickup_datetime": "2022-10-27 08:39:00", "hour_of_day": 8, "calendar_day": "2022-10-27", "pickup_latitude": "40.7514", "pickup_longitude": "-73.994", "dropoff_latitude": "40.7599", "dropoff_longitude": "-73.9795", "pickup_zipcode": "10001", "dropoff_zipcode": "10111", "passenger_count": 2}]'

In [None]:
!curl -X GET http://<use url info from above status prediction cmd> -H "Content-Type: application/json" -d \
'[{"pickup_datetime": "2022-10-27 18:57:00", "hour_of_day": 18, "calendar_day": "2022-10-27", "pickup_latitude": "40.754", "pickup_longitude": "-73.9721", "dropoff_latitude": "40.7296", "dropoff_longitude": "-73.987", "pickup_zipcode": "10017", "dropoff_zipcode": "10003", "passenger_count": 1}]'
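
The request body can also be assembled programmatically, so that `hour_of_day` and `calendar_day` always agree with `pickup_datetime`. The sketch below is a minimal example that mirrors the field names and value types of the curl cells above, with `hour_of_day` as an integer and coordinates as strings. The helper name `build_prediction_request` is an assumption, and the code only builds the JSON body without sending it.

```python
import json
from datetime import datetime, timezone


def build_prediction_request(pickup_dt, pickup, dropoff, pickup_zipcode,
                             dropoff_zipcode, passenger_count):
    """Build the JSON body for one trip.

    pickup and dropoff are (latitude, longitude) pairs of strings.
    """
    record = {
        "pickup_datetime": pickup_dt.strftime("%Y-%m-%d %H:%M:%S"),
        # Both lookup keys are derived from the pickup datetime.
        "hour_of_day": pickup_dt.hour,
        "calendar_day": pickup_dt.strftime("%Y-%m-%d"),
        "pickup_latitude": pickup[0],
        "pickup_longitude": pickup[1],
        "dropoff_latitude": dropoff[0],
        "dropoff_longitude": dropoff[1],
        "pickup_zipcode": pickup_zipcode,
        "dropoff_zipcode": dropoff_zipcode,
        "passenger_count": passenger_count,
    }
    # The endpoint expects a JSON array of trip records.
    return json.dumps([record])


# Example usage with the current UTC time, as in the `date -u` cell:
body = build_prediction_request(
    datetime.now(timezone.utc),
    ("40.7514", "-73.994"), ("40.7599", "-73.9795"),
    "10001", "10111", 2,
)
print(body)
```

The resulting string can be passed to `curl` with the `-d` option in place of the hand-written body.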

### Stop the deployed model

Use the `stop prediction` command to stop ML model serving when you have completed the prediction requests. This step is optional; you may choose to leave the model deployed.

In [None]:
stop prediction trip_fare_model_serve_2