<div class="alert alert-block alert-info">

<h1>Lab Center – Hands-on Lab</h1>

<h2>Session <font color=red>7461</font></h2>
<h2>Session Title  <font color=red>Add Intelligence to Business Automation with IBM Business Automation Insights</font></h2>

<b>Christophe Jolif</b>, IBM Digital Business Automation, Architecture & Development, christophe.jolif@fr.ibm.com<br>
<b>Sebastian Carbajales</b>, IBM Digital Business Workflow, Architecture & Development, sebastia@ca.ibm.com

</div>

You have completed <b>Section 1</b> in the lab by inspecting the as-is process.  You are now ready to work with the historical data generated by the process and create a machine-learning model that will provide a recommendation to approve or reject a loan request.

<b>You will use this Python Jupyter notebook to accomplish this goal.</b>

<h2>Jupyter Notebook Introduction</h2>

A Jupyter notebook is a web-based environment for interactive computing. You can run small pieces of code that process your data, and you can immediately view the results of your computation. 

Notebooks include all of the building blocks you need to work with data:
<ul>
<li>Loading of the data
<li>The code computations that process the data
<li>Visualizations of the results
<li>Text and rich media to enhance understanding
</ul>

Code computations can build upon each other to quickly unlock key insights from your data. Notebooks record how you worked with data, so you can understand exactly what was done, reproduce computations reliably, and share your findings with others.

<h3>The cells in a Jupyter notebook</h3>

A Jupyter notebook consists of a sequence of cells. The flow of a notebook is sequential. You enter code into an input cell, and when you run the cell, the notebook executes the code and prints the output of the computation to an output cell.

You can change the code in an input cell and re-run the cell as often as you like. In this way, the notebook follows a read-evaluate-print loop paradigm.

<h3>Useful Shortcuts</h3>

Use the following shortcuts to execute the code in a cell:
<div class="alert alert-block alert-warning">
    <ul>
        <li><b>Run cell: </b>CTRL + ENTER
        <li><b>Run cell, select below: </b>SHIFT + ENTER
    </ul>
    <p>For a full list of shortcuts: <i><b>Help</b></i> > <i><b>Keyboard Shortcuts</b></i>
</div>

<h3>Do You Want to Discover More?</h3>

Take a tour of the Jupyter notebook interface.  Launch it from <i><b>Help</b> > <b>User Interface Tour</b></i>.

<div class="alert alert-block alert-info"></div>

***
<h2> <font color=blue>Train and Deploy a Machine Learning Model to provide Loan Approval Recommendation</font>  </h2>

You will now execute the code in this notebook to train and deploy a machine learning model.  You will integrate this model, in __Section 3__ of the lab, with the **Car Loan Approval** process to provide a recommendation to approve or reject a loan request.

You will perform the following steps: 

1. [Load the time series data for the As-Is process](#step1)
1. [Explore the format of the data and interpret it](#step2)
1. [Create an Apache® Spark machine learning model](#step3)
1. [Store the model in Watson ML](#step4)
1. [Deploy the model](#step5)
1. [Test the deployed model](#step6)

<div class="alert alert-block alert-success">
The code in this notebook is ready to execute, but you are encouraged to experiment with it by changing it and re-executing cells to see the effect of your change.<br>
    
Should you need to undo your changes you can revert to a previous checkpoint using the _**File** > **Revert to checkpoint**_ menu.
</div>

The following cell contains all the dependencies required to run this notebook.  The code can be uncommented and executed for a new environment.

In [None]:
## Install PySpark
# !rm -rf $PIP_BUILD/pyspark
# !pip install --upgrade pyspark==2.1.3

## Install visualization packages
# !pip install --upgrade matplotlib
# !pip install --upgrade seaborn

## Install NumPy
# !pip install numpy

## Install the Watson Machine Learning API package
# !rm -rf $PIP_BUILD/watson-machine-learning-client
# !pip install --upgrade watson-machine-learning-client==1.0.260

<a id='step1'></a>

*** 

## Step 1: Load the time series data for the As-Is process

### The format of the IBM Business Automation Insights data

Events emitted while a process executes are stored in **IBM Business Automation Insights**. Several event types are supported, but in this scenario you only need the events that are recorded when a tracking point executes.  These are stored as **bpmn-timeseries** tracking data.  Every time a process executes a tracking point, a record is added to HDFS in the form of JSON data.

In this scenario, the timeseries data is partitioned by the following elements:
- The identifier and version number of the Workflow business process application
- The tracking group identifier 

Thus, HDFS file names start with the following path:

> _**[hdfs root]**/ibm-bai/bpmn-timeseries/**[processAppId]**/**[processAppVersionId]**/tracking/**[trackingGroupId]**_


Remember, the tracking group name is **Loan_Approval**. To find the data, you query the various IDs from the Workflow system.
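
The path layout above can be captured in a small helper. This is only a minimal sketch: the function name `build_timeseries_path` and the sample identifiers are hypothetical, and only the directory layout is taken from the description above.

```python
def build_timeseries_path(hdfs_root, process_app_id, process_app_version_id, tracking_group_id):
    """Join the partition elements into the bpmn-timeseries tracking path."""
    parts = [
        hdfs_root.rstrip("/"),      # avoid a doubled slash if the root ends with '/'
        "ibm-bai",
        "bpmn-timeseries",
        process_app_id,
        process_app_version_id,
        "tracking",
        tracking_group_id,
    ]
    return "/".join(parts)


# Hypothetical identifiers, for illustration only
example_path = build_timeseries_path("hdfs://example-host/data", "app-id", "version-id", "group-id")
print(example_path)
```

The real identifiers are retrieved from the Workflow REST API in the cells that follow.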

> *Refer to https://www.ibm.com/support/knowledgecenter/en/SSYHZ8_18.0.x/com.ibm.dba.bai/topics/ref_bai_data_paths.html for more details on HDFS data paths.* 


### Finding the application ID and version, and the tracking group ID

You will use the **IBM Business Automation Workflow** REST API to retrieve the application and tracking group information.  You will then use these values to build the HDFS path described in the previous section.

> *You can refer to https://www.ibm.com/support/knowledgecenter/en/SSYHZ8_18.0.x/com.ibm.dba.bai/topics/tsk_bai_retrieve_bpmn_id.html for more information on how to retrieve BPMN identifiers.*

The Python code below sets up the REST API URL. 

In [None]:
import urllib3, requests, json
urllib3.disable_warnings()

bpmusername='deadmin'
bpmpassword='Think4me'
bpmrestapiurl = 'https://ibmwin16.ibm.demo:9443/rest/bpm/wle/v1'

headers = urllib3.util.make_headers(basic_auth='{username}:{password}'.format(username=bpmusername, password=bpmpassword))
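
For reference, HTTP basic authentication sends the user name and password, joined by a colon and Base64-encoded, in an authorization header. The sketch below builds such a header with the standard library only. The function name and the credentials are made up for illustration, and it is not meant to replace the `make_headers` call above.

```python
import base64


def basic_auth_header(username, password):
    """Build a basic-authentication header from a user name and password."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"authorization": "Basic " + token}


# Made-up credentials, for illustration only
example_headers = basic_auth_header("someuser", "somepassword")

# Decoding the token gives back the original 'user:password' string
scheme, token = example_headers["authorization"].split(" ", 1)
decoded = base64.b64decode(token).decode("utf-8")
print(scheme)
print(decoded)
```

Because Base64 is an encoding and not encryption, the header is only as private as the connection that carries it.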


You now retrieve the _**process application ID and version number**_ by using the **processApps** REST API. The code below searches for the **'Lab 7461 - Car Loan Approval'** application and assumes that only one version or snapshot is installed.

In [None]:
url = bpmrestapiurl + '/processApps'
response = requests.get(url, headers=headers, verify=False)

[processApp] = [x for x in json.loads(response.text).get('data').get('processAppsList') if x.get('name') == 'Lab 7461 - Car Loan Approval']

processAppId = processApp.get('ID')

# Note that the first 5 characters of the process app ID below are removed.  All BPM artifact IDs are prefixed
# with the artifact type.  In this case, '2066.' indicates this is a process app ID.
# The REST API returns the full process application id, including its prefix. 

print("Process application ID: " + processAppId[5:])

# Get the first snapshot - assume only one - this is the app version ID
snapshot = processApp.get('installedSnapshots')[0]
processAppVersionId = snapshot.get('ID')
print("Process application version ID: " + processAppVersionId)
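
The `[processApp] = [...]` assignment above unpacks a one-element list, so it raises a `ValueError` when the filter matches no application or more than one. The sketch below makes that check explicit on a hand-written payload. The payload mimics only the fields that the cell reads (`data`, `processAppsList`, `name`, `ID`, `installedSnapshots`); the helper name and all values are invented.

```python
import json

# Hand-written payload with invented values, shaped like the fields read above
sample_response_text = json.dumps({
    "data": {
        "processAppsList": [
            {"name": "Some Other App", "ID": "2066.aaaa",
             "installedSnapshots": [{"ID": "snapshot-a"}]},
            {"name": "Lab 7461 - Car Loan Approval", "ID": "2066.bbbb",
             "installedSnapshots": [{"ID": "snapshot-b"}]},
        ]
    }
})


def find_single_app(response_text, app_name):
    """Return the only application with the given name, or raise ValueError."""
    apps = json.loads(response_text)["data"]["processAppsList"]
    matches = [app for app in apps if app.get("name") == app_name]
    if len(matches) != 1:
        raise ValueError("expected exactly one app named %r, found %d" % (app_name, len(matches)))
    return matches[0]


app = find_single_app(sample_response_text, "Lab 7461 - Car Loan Approval")
print(app["ID"])
print(app["installedSnapshots"][0]["ID"])
```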

Next you retrieve the _**tracking group ID**_ using the **'assets'** REST API.  You specify the process app ID, just computed, and asset type to filter the results to tracking groups defined within the **'Lab 7461 - Car Loan Approval'** application.  You then retrieve the **'Loan_Approval'** tracking group from the results.


In [None]:
url = bpmrestapiurl + '/assets'

response = requests.get(url, headers=headers, verify=False, params={'processAppId': processAppId, 'filter': 'type=TrackingGroup' })

[trackingGroupId] = [x.get('poId') for x in json.loads(response.text).get('data').get('TrackingGroup') if x.get('name') == 'Loan_Approval']


# Note that the first 3 characters of the tracking group ID below are removed. As in the process app case
# this is the prefix to indicate this is a tracking group ID.

print('Tracking group ID : ' + trackingGroupId[3:])
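
The fixed slices `[5:]` and `[3:]` assume that each type prefix always has the same length. One alternative, sketched here with a hypothetical helper and made-up IDs, is to split on the first dot so that the prefix length no longer matters. It assumes, as the cells above do, that the part after the prefix is what the HDFS path expects.

```python
def strip_type_prefix(artifact_id):
    """Drop everything up to and including the first '.' of an artifact ID."""
    prefix, separator, rest = artifact_id.partition(".")
    # If there is no '.', return the ID unchanged
    return rest if separator else artifact_id


# Made-up IDs, for illustration only
app_part = strip_type_prefix("2066.1234-abcd")
group_part = strip_type_prefix("14.5678-ef01")
print(app_part)
print(group_part)
```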


All required information to build the HDFS path has been obtained.  You now continue to query the data.


### Using Spark SQL to read IBM Business Automation Insights data

**IBM Business Automation Insights** stores data in HDFS. As described above, the events coming from the Workflow instance are stored in JSON files. 

The code below is already configured with the target HDFS URL for the lab environment.

In [None]:
from pyspark.sql import SparkSession

hdfs_root = 'hdfs://hdfs1.ibm.edu/think2019'

spark = SparkSession.builder.getOrCreate()
spark.conf.set("dfs.client.use.datanode.hostname", "true")

# Get the timeseries Dataset by reading the JSON data from HDFS
timeseries = spark.read.json(hdfs_root + "/ibm-bai/bpmn-timeseries/" + processAppId[5:] + '/' + processAppVersionId + '/tracking/' + trackingGroupId[3:] +  '/*/*')

# If BAI were not available but you have the data in a local file, you can load it this way instead.
# timeseries = spark.read.json("sample_loan_approval.json") 

Note that the various IDs are inserted into the HDFS path. The path can also contain HDFS wildcards: here, each `*` character stands for any directory or file name at that level of the path.
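
To make the effect of the trailing `/*/*` concrete, the sketch below imitates the matching with a regular expression. It assumes that each `*` matches within a single path component and does not cross a `/`. The directory and file names are invented, and the real layout below the tracking group directory may differ.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob that uses only '*' into an anchored regular expression."""
    # Escape everything, then let each '*' match any run of characters except '/'
    return re.compile("^" + re.escape(pattern).replace(r"\*", "[^/]*") + "$")


matcher = glob_to_regex("/tracking/group-id/*/*")

# Invented candidate paths with one, two and three levels below the group directory
candidates = [
    "/tracking/group-id/part-0.json",
    "/tracking/group-id/some-dir/part-0.json",
    "/tracking/group-id/some-dir/sub-dir/part-0.json",
]
matched = [path for path in candidates if matcher.match(path)]
print(matched)
```

Under that assumption only the path with exactly two levels below the group directory is selected.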

The data is now loaded, so let's take a quick look.  The code below shows a sample of the data, the schema, and the number of records available.

In [None]:
# Displays the top 20 rows of Dataset in a tabular form.
timeseries.show()

# Print the schema in a tree format
timeseries.printSchema()

# Finally, print the total number of events
print('The data contains ' + str(timeseries.count()) + ' events')

For this lab, we're interested in the data that was tracked by the process.  Looking at the schema, this data is contained within the **'trackedFields'** attribute of each event.

The code below creates a temporary view on the data so that we can select just the tracked fields from the events.

In [None]:
# Creates a local temporary view called 'timeseries'
timeseries.createOrReplaceTempView("timeseries")

# Select all tracked fields 
businessdata = spark.sql("SELECT trackedFields.* from timeseries")

# Displays the top 20 rows of Dataset in a tabular form.
businessdata.show()

# Print the schema in a tree format
businessdata.printSchema()

Let's clean up the column names by removing the type suffix from each column.

In [None]:
businessdata = businessdata.withColumnRenamed("approved.string", "approved")
businessdata = businessdata.withColumnRenamed("creditScore.integer", "creditScore")
businessdata = businessdata.withColumnRenamed("requestedAmount.integer", "requestedAmount")
businessdata = businessdata.withColumnRenamed("approvedAmount.integer", "approvedAmount")
businessdata = businessdata.withColumnRenamed("vehicleMake.string", "vehicleMake")
businessdata = businessdata.withColumnRenamed("vehicleModel.string", "vehicleModel")
businessdata = businessdata.withColumnRenamed("vehicleType.string", "vehicleType")
businessdata = businessdata.withColumnRenamed("vehicleYear.integer", "vehicleYear")
businessdata.printSchema()
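
The two steps above (selecting the tracked fields and dropping the type suffix) can be illustrated on a single event with plain Python. This is a sketch on a hand-written event: only the `trackedFields` attribute and the `<name>.<type>` key style come from the notebook, while the helper name, the other attribute, and all values are invented.

```python
import json

# Hand-written event with invented values
event_line = json.dumps({
    "type": "example",
    "trackedFields": {
        "approved.string": "true",
        "creditScore.integer": 700,
        "vehicleMake.string": "GM",
    },
})


def tracked_fields(line):
    """Return the tracked fields of one JSON event, keyed without the type suffix."""
    fields = json.loads(line)["trackedFields"]
    # 'creditScore.integer' -> 'creditScore': split once, from the right
    return {name.rsplit(".", 1)[0]: value for name, value in fields.items()}


row = tracked_fields(event_line)
print(row)
```

With the Spark DataFrame, the same suffix rule could replace the explicit rename calls with a loop over `businessdata.columns`, assuming every column follows the `<name>.<type>` pattern.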


<a id='step2'></a>

***
## Step 2: Explore the format of the data and interpret it

In this step you will learn a few techniques to understand the data that you are working with and determine what characteristics may be relevant to your prediction. In this exercise you want to be able to predict whether to approve or reject a loan request.  So you will look at the relationships between the **'approved'** field and the rest of them to determine which ones may be a good predictor.

### Data manipulation libraries

In the previous section you used Spark to load the data.  You will also use the result to train the Spark model.  In this section, however, we convert the Spark DataFrame to a Pandas DataFrame.  Pandas is another table manipulation library and, unlike Spark, it works well with visualization libraries.  This lab uses the Seaborn visualization library.

> _You can find more information on Pandas here http://pandas.pydata.org/pandas-docs/stable/._

The Python code below creates a Pandas DataFrame from our Spark data.

In [None]:
import pandas as pd

summary_pd = businessdata.toPandas()

Now we can take a quick look at the DataFrame.

In [None]:
# This prints just the columns in the data frame
print(summary_pd.columns)

# Use the head() function to preview the first n rows in the data frame.  If you don't pass
# a value to the function, the default is 5.
summary_pd.head()

### Use visualization to analyze individual feature patterns

First, we import the visualization packages "Matplotlib" and "Seaborn".  The last line, "%matplotlib inline", is required to plot in a Jupyter notebook.

> *Refer to https://seaborn.pydata.org/ for more information on the Seaborn library.*

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 

To visualize individual fields it is important to understand the type of data you are dealing with to help you find the right visualization method.  The **'dtypes'** attribute returns the types in the DataFrame.

In [None]:
summary_pd.dtypes

#### Using  Box Plots to see relationship between categorical and numerical variables

In this lab we want to predict the **'approved'** field, so we want to understand the relationship between this field and the others to determine which will influence the value of **'approved'**.  Our target field is a categorical variable because it can only contain a value of true or false (1 or 0), as opposed to say, **'creditScore'** that can contain any value within a range.  **'creditScore'** is a numerical variable.

A good way to visualize categorical variables is by using boxplots. Let's first examine the relationship between the **'approved'** and  **'vehicleYear'** fields.  The boxplot below shows this relationship.

In [None]:
sns.boxplot(x="approved", y="vehicleYear", data=summary_pd)

We see that the distributions of the vehicle's model year between the approved and rejected loan request categories have no significant overlap.  The distributions are distinct enough that **'vehicleYear'** is potentially a good predictor of **'approved'**.  

On the other hand, if the overlap between distributions is significant, then that particular field would not be a good predictor of **'approved'**.  Change the value of __y__ in the code above to plot the distributions for _**'creditScore'**_ and _**'requestedAmount'**_ to determine whether those two are good predictors or not.
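
If you prefer a number to a visual impression, you can compare the boxes directly: the box in a box plot spans the first to the third quartile. The sketch below checks whether the boxes of two groups overlap. The helper names and the sample years are invented, and it needs Python 3.8 or later for `statistics.quantiles`. Quartile conventions differ slightly between libraries, so the values may not match the plot exactly.

```python
from statistics import quantiles


def box_range(values):
    """Return the first and third quartile, i.e. the edges of the box."""
    q1, _median, q3 = quantiles(values, n=4)
    return q1, q3


def boxes_overlap(group_a, group_b):
    """True if the interquartile ranges of the two groups intersect."""
    a_low, a_high = box_range(group_a)
    b_low, b_high = box_range(group_b)
    return max(a_low, b_low) <= min(a_high, b_high)


# Invented vehicle years for approved and rejected requests
approved_years = [2013, 2014, 2015, 2016, 2017, 2018]
rejected_years = [2003, 2004, 2005, 2006, 2007, 2008]

overlap = boxes_overlap(approved_years, rejected_years)
print(overlap)
```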

#### Using Categorical  Plots to see the breakdown in the distributions

Categorical plots can be used to break down the distributions in a box plot into additional categorical variables.  The code below plots the approved/rejected distributions, for each vehicle make, against the vehicle model year.

In [None]:
sns.catplot(x="vehicleMake", y="vehicleYear", hue="approved", data=summary_pd, kind="box")

We can see from the plot that the distribution for each make is different.  This would imply that vehicle make itself also has influence on the approval recommendation.

Try changing the __x__ and **y** inputs to the plot to see the relationships between other fields.  Use categorical fields for x (vehicleMake, vehicleType) and use numerical fields for y (vehicleYear, requestedAmount, creditScore).  

### Data Characteristics

We can take a look at the statistics of the data by using the __DataFrame.describe()__ function, which computes basic summary statistics.  If invoked with no arguments, it analyzes only the numerical variables.

In [None]:
summary_pd.describe()

You can display statistics for categorical variables by invoking it as follows.

In [None]:
summary_pd.describe(include=['object'])

It may also be interesting to look at the value counts within each category.  You can compute this as follows.

In [None]:
summary_pd['approved'].value_counts()

This is useful in determining if we have enough entries for a given category.  For example, assume we had very few entries for SUVs, then vehicle type would not be a good predictor as it would skew the results.  As such, we may not be able to draw any conclusion about the vehicle type.

Modify the code above to see the counts for the other categorical variables: **_vehicleType, vehicleMake, vehicleModel_**.
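
One way to act on these counts is to flag categories that make up only a small share of the records. The sketch below does this with the standard library. The sample values, the helper name, and the 5% threshold are arbitrary choices for illustration, not a rule from this lab.

```python
from collections import Counter


def rare_categories(values, min_share=0.05):
    """Return the labels whose share of all values is below min_share."""
    counts = Counter(values)
    total = sum(counts.values())
    return sorted(label for label, n in counts.items() if n / total < min_share)


# Invented sample: 100 records, only 4 of them SUVs
vehicle_types = ["Car"] * 46 + ["Truck"] * 50 + ["SUV"] * 4

rare = rare_categories(vehicle_types)
print(rare)
```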

You can also visualize the data above against the **'approved'** variable by showing the counts of observations in each categorical bin using bars.  The code below plots the **'vehicleMake'** counts for approved and rejected loans.

In [None]:
sns.countplot(x="vehicleMake", hue="approved", data=summary_pd)

Try changing **x** to the other categorical variables: __*vehicleType and vehicleModel*__.

### What have we observed?

The techniques above gave us a picture of the data we are working with.  The box plots reveal that __vehicleYear__, __creditScore__ and __requestedAmount__, all numerical variables, can potentially be good predictors of **'approved'**.  The distributions of the approved and rejected records are different enough.

The categorical plots, which break these groups down further by category, also reveal that the distributions are different enough that the categorical variables __vehicleMake__, __vehicleModel__ and __vehicleType__ are potentially good predictors as well.

Using the data characteristic analysis we can also confirm that there are enough samples in the data.  The value ranges in the numerical variables are wide, and the value counts on the categorical variables show that all possible categories are well represented. 

<a id='step3'></a>

***
## Step 3: Create an Apache® Spark machine-learning model

IBM Watson Machine learning supports a growing number of IBM or open-source machine-learning and deep-learning packages. This example uses Spark ML and, in particular, the Random Forest Classifier algorithm. In this section you will prepare the data, create an Apache® Spark machine-learning pipeline, and train the model.

The first step is to import the libraries required.

In [None]:
from pyspark.ml.feature import OneHotEncoder, StringIndexer, IndexToString, VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml import Pipeline, Model

### Adaptation of data

In this section you will combine multiple complex algorithms into a single pipeline. These algorithms will be applied to the training data as well as to the data supplied to the trained model to obtain a prediction, which results in simpler code when invoking the model.

Our pipeline will include the following stages:
1. Indexers
1. Encoder
1. Assembler
1. Label converter
1. Random Forest ML model

> *More information on this and the Spark ML library can be found here: https://spark.apache.org/docs/2.1.0/ml-features.html*

#### Indexers
First set up the indexers whose job is to encode a string column of labels to a column of label indices. We'll use it to encode our categorical column into a numerical value.

We use the **StringIndexer** which is a feature transformer.  We use it to accomplish two things:
- Transform the **'approved'** column, which is a column of type 'string' containing only 'true' or 'false' values, into a numeric column, **'label'**, with '0' and '1' values so that the classifier can understand it.
- Transform the other categorical columns into a new set of index columns containing label indices.  The indices are in the range from 0 up to, but not including, the number of labels for that category.

In [None]:
# Instantiate the indexer for the approved column
approvalIndexer = StringIndexer(inputCol='approved', outputCol="label").fit(businessdata)

# Instantiate the rest of the indexers for the vehicle make, model and type columns.
indexers = [StringIndexer(inputCol=column, outputCol=column+"_index").fit(businessdata) for column in list(set(["vehicleMake", "vehicleModel", "vehicleType"])) ]

# Combine all indexers in a single list.
indexers.append(approvalIndexer)
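
By default, the StringIndexer orders labels by descending frequency, so the most frequent label receives index 0. Which of 'true' and 'false' becomes 0 therefore depends on the data. The plain-Python sketch below imitates that behavior on invented values. The helper name is hypothetical, and the alphabetical tie-break is a choice made here only to keep the sketch deterministic.

```python
from collections import Counter


def fit_string_indexer(values):
    """Return the labels ordered by descending frequency (ties: alphabetical)."""
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


# Invented column: 'false' occurs more often than 'true'
labels = fit_string_indexer(["false", "true", "false", "false", "true"])
to_index = {label: float(i) for i, label in enumerate(labels)}

indexed = [to_index[value] for value in ["true", "false"]]
# Mapping an index back to its label is a list lookup, which is what a
# label converter needs the fitted labels for
restored = [labels[int(i)] for i in indexed]
print(labels)
print(indexed)
print(restored)
```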

#### Encoder

We use the **OneHotEncoder** which is another feature transformer.   It maps a column of label indices to a column of binary vectors, with at most a single one-value. Here we use it to map the index columns we created in the previous stage for a vehicle's make, model and type.

This encoding allows algorithms which expect continuous features, such as Logistic Regression, to use categorical features such as _vehicleMake, vehicleModel and vehicleType._

In [None]:
encoders = [OneHotEncoder(inputCol=column+"_index", outputCol=column+"_encoded") for column in list(set(["vehicleMake", "vehicleModel", "vehicleType"])) ]
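
The phrase "at most a single one-value" refers to the encoder's default of dropping the last category: a column with three labels becomes a vector of length two, and the last label is encoded as all zeros. The sketch below shows the idea with plain lists. Spark produces sparse vectors, so this is an illustration of the mapping and not of the actual data structure, and the function name is hypothetical.

```python
def one_hot(index, num_categories, drop_last=True):
    """Encode a label index as a list with at most a single 1.0."""
    size = num_categories - 1 if drop_last else num_categories
    vector = [0.0] * size
    if index < size:
        vector[index] = 1.0
    return vector


first = one_hot(0, 3)
last = one_hot(2, 3)
print(first)
print(last)
```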

#### Assembler
Next we set up a **VectorAssembler** which is a transformer that combines a given list of columns into a single vector column.  It is useful for combining raw features and features generated by different feature transformers into a single feature vector, in order to train ML models like logistic regression and decision trees. 

We will train a random forest ML model so we will combine the features that we want to use in predicting our approval.

We first define the set of features that we will use to predict the **approved** field. We use all fields except for **approvedAmount** since the __Car Loan Approval__ process will not have that data when requesting a recommendation.  However, based on the analysis in the previous section, it makes sense to include all other fields.

In [None]:
# For categorical columns we use the column name produced by the onehotencoder: <colName>_encoded
features = ["creditScore", "requestedAmount","vehicleMake_encoded","vehicleModel_encoded","vehicleType_encoded", "vehicleYear"]

assembler = VectorAssembler(inputCols=features, outputCol="features")
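
Conceptually, the assembler concatenates the selected columns, in the order given, into one flat vector per row. The sketch below mirrors that with plain Python on an invented row that uses a subset of the feature names above. The function name is hypothetical, and the encoded column is shown as a dense list for readability.

```python
def assemble(row, input_cols):
    """Concatenate scalar and vector columns into one flat list of floats."""
    vector = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (list, tuple)):
            vector.extend(float(v) for v in value)   # vector column
        else:
            vector.append(float(value))              # scalar column
    return vector


# Invented row, for illustration only
sample_row = {
    "creditScore": 700,
    "requestedAmount": 15000,
    "vehicleMake_encoded": [0.0, 1.0],
    "vehicleYear": 2015,
}
assembled = assemble(sample_row, ["creditScore", "requestedAmount", "vehicleMake_encoded", "vehicleYear"])
print(assembled)
```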

#### Label converter
Finally, set up the **IndexToString** transformer which does the opposite of the StringIndexer.  It maps a column of label indices back to a column containing the original labels as strings.

We use it to get the original label for our predicted value.  In other words, map '1' or '0' to 'true' or 'false'.

In [None]:
labelConverter = IndexToString(inputCol="prediction", outputCol="predictedLabel", labels=approvalIndexer.labels)

###  Creating the model
The model is built from the RandomForestClassifier algorithm.  We chose Random Forest because it is flexible and easy to use, and it often produces good results even without hyperparameter tuning.  Hyperparameters are parameters that are set before the training begins and are used to optimize the model produced.

In [None]:
rf = RandomForestClassifier(labelCol="label", featuresCol="features")

In the cell below we split the data into training data and test data.  The prediction model is then trained and tested, and finally the accuracy of the model is displayed.

In [None]:
# Select the data
businessdata = businessdata[["creditScore", "requestedAmount","vehicleMake","vehicleModel","vehicleType", "vehicleYear", "approved"]]

# Split the data into a training and testing set (80/20)
splitted_data = businessdata.randomSplit([0.8, 0.20], 24)
train_data = splitted_data[0]
test_data = splitted_data[1]

# Instantiate the pipeline object, specifying all the stages we defined above
pipeline = Pipeline(stages=indexers + encoders + [assembler, rf, labelConverter])

# Train the model
model = pipeline.fit(train_data)

# Test the model
predictions = model.transform(test_data)

# Compute the accuracy of the predictions
evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)

# Print the result
print("Accuracy = %g" % accuracy)
print("Test Error = %g" % (1.0 - accuracy))
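
The accuracy metric is the fraction of test rows whose prediction equals the label, and the test error is its complement. The sketch below computes both by hand on invented labels. It also counts each (label, prediction) pair, which shows where the errors fall when one class is much more common than the other. The helper names are hypothetical.

```python
from collections import Counter


def accuracy_of(labels, predictions):
    """Fraction of rows where the prediction equals the label."""
    correct = sum(1 for label, pred in zip(labels, predictions) if label == pred)
    return correct / len(labels)


def confusion_counts(labels, predictions):
    """Count how often each (label, prediction) pair occurs."""
    return Counter(zip(labels, predictions))


# Invented labels and predictions, for illustration only
y_true = [1.0, 0.0, 1.0, 1.0, 0.0]
y_pred = [1.0, 0.0, 0.0, 1.0, 0.0]

acc = accuracy_of(y_true, y_pred)
pairs = confusion_counts(y_true, y_pred)
print("Accuracy = %g" % acc)
print("Test Error = %g" % (1.0 - acc))
print(dict(pairs))
```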


<a id='step4'></a>

*** 
## Step 4: Store the model in Watson ML
Watson Machine Learning is used here to store the resulting model. After the model is stored, Watson Machine Learning makes it possible to create an HTTP scoring endpoint, which is then used as the recommendation service.

The code below stores the created model and pipeline in IBM Watson Machine Learning. 

Let's start by importing the Watson Machine Learning API package.  

> *Documentation on this API can be found here: https://wml-api-pyclient.mybluemix.net/*


In [None]:
# !pip install watson_machine_learning_client 
from watson_machine_learning_client import WatsonMachineLearningAPIClient

Instantiate a Watson Machine Learning client.  **Note** that if you are using your own instance of the IBM Watson Machine Learning service, you need to specify the authentication information for it in the cell below (wml_credentials).

In [None]:
# Authenticate to Watson Machine Learning service on IBM Cloud.

wml_credentials={
  "apikey": "8aSfy7WbWV_7ioSe8kR4_tEc1ghS6Z07wkIflthGQmgT",
  "iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:pm-20:us-south:a/96f925a37d236c8abd126579c5a53a7b:8f794cbe-79cd-4d90-acb2-f797eca3dbec::",
  "iam_apikey_name": "auto-generated-apikey-109020c5-c94e-457e-9246-9ee6f63f3a62",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/96f925a37d236c8abd126579c5a53a7b::serviceid:ServiceId-547f2b67-3197-4cae-8348-b98c9784b4e9",
  "instance_id": "8f794cbe-79cd-4d90-acb2-f797eca3dbec",
  "password": "6800c9a6-aee6-4bea-a59d-09511ae025c5",
  "url": "https://us-south.ml.cloud.ibm.com",
  "username": "109020c5-c94e-457e-9246-9ee6f63f3a62"
}

client = WatsonMachineLearningAPIClient(wml_credentials)

We can now save the model and the training data. Call the **store_model** API to store the trained model in the Watson Machine Learning repository on IBM Cloud.

In [None]:
published_model_details = client.repository.store_model(model=model, meta_props={'name':'Recommendation Prediction Model'}, training_data=train_data, pipeline=pipeline)

<a id='step5'></a>

***
## Step 5: Deploy the model

Now that the model is stored, we need to deploy it in a runtime environment.  We start by retrieving the model uid:

In [None]:
model_uid = client.repository.get_model_uid(published_model_details)
print(model_uid)

We use the **deployments** client API to create a new deployment for our model:

In [None]:
deployment_details = client.deployments.create(artifact_uid=model_uid, name='Recommendation Prediction Model')

<div class="alert alert-block alert-success">
    Your model has been deployed and it is ready for use.  Take note of the <b><u>model_uid</u></b> and <b><u>deployment_uid</u></b> above.  You will need these values for <b>Section 3</b> of the lab.  These will be used by a Service Flow to make the REST call to the scoring API and obtain a recommendation for the <b>approved</b> field.
</div>

<a id='step6'></a>

***
## Step 6: Testing the deployed model

You can test the model using the **deployments** client API.

The deployment details specify the URL that will allow us to score against the published model.

In [None]:
recommendation_url = client.deployments.get_scoring_url(deployment_details)

print(recommendation_url)

Test using different input values. For categorical variables you must pass a value that is known to the model.  To quickly look at the different labels for each category you can use the following code to examine the Spark DataFrame.

In [None]:
businessdata.groupBy("vehicleMake").count().show()

We now call the **deployments.score** API to get a prediction. 

In [None]:
import json

# Declare dummy input for the prediction
recommendation_data = {"fields": ["creditScore", "requestedAmount", "vehicleMake", "vehicleModel", "vehicleYear", "vehicleType"],
                       "values": [[350, 14654, "GM", "Malibu", 2011, "Car"]]}

# Call the scoring API to predict the approval
scoring_response = client.deployments.score(recommendation_url, recommendation_data)
i = scoring_response['fields'].index('predictedLabel')
j = scoring_response['fields'].index('probability')
print("Recommend to approve = %s" %scoring_response['values'][0][i])
print("Confidence = %g" %scoring_response['values'][0][j][1])

# Uncomment to dump the full response
# print(json.dumps(scoring_response, indent=3))
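
The cell above looks up columns by position. A slightly more readable variant, sketched below on a hand-written response, pairs each field name with its value first. Only the `fields` and `values` keys and the `predictedLabel` and `probability` field names come from the cell above; the helper name, the other fields, and all values are invented. Taking the largest probability as the confidence is an assumption: it reads the vector as one probability per class, with the predicted class being the most probable one.

```python
# Hand-written response with invented values
sample_response = {
    "fields": ["creditScore", "prediction", "probability", "predictedLabel"],
    "values": [[350, 1.0, [0.2, 0.8], "false"]],
}


def read_prediction(response, row=0):
    """Return the predicted label and the largest class probability for one row."""
    record = dict(zip(response["fields"], response["values"][row]))
    return record["predictedLabel"], max(record["probability"])


label, confidence = read_prediction(sample_response)
print("Recommend to approve = %s" % label)
print("Confidence = %g" % confidence)
```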

***
## Conclusion

Using this notebook you have trained and deployed a machine learning model that can provide a recommendation to approve or reject a car loan request.  With the two IDs created in **Step 5**, you can now continue to __Section 3__ in the lab instructions to complete the process application changes.
