<div><img src="http://www.stevinsonauto.net/assets/Icon_Brake.png" width="270" height="270" align="right">

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/640px-IBM_logo.svg.png" width="90" height="90" align="right" style="margin:0px 25px"></div>

# Classifying Driver Type with Brake Events
##### By Rafi Kurlansik, Sidney Phoon and Ross Lewis 

________________________________

**Table of contents**
    
* [Problem Statement](#problemStatement)
    
* [Exploratory Data Analysis](#eda)

* [Modeling](#ml)
    
* [Data and Model Export](#export)

* [Conclusion](#conclusion)

______________________

<a id='problemStatement'></a>

### Problem Statement

The service bays at dealerships have seen an increase in warranty claims related to brakes.  Using historical telematics data of known driver types, can we classify the driving style of customers making warranty claims?

________
<a id='eda'></a>

### Exploratory Data Analysis

## **<span style="color:red"> Action Required </span>** 

1. In the code cell below, use **Insert to code** to load **historical_brake_events.csv** as an R DataFrame
2. Rename the generated **df.data.1** variable to **brakeEventDF**

**Note:** You are reading the CSV file from the project data assets.

In [None]:
# Add code to read historical_brake_events.csv as an R DataFrame



We see VINs, the type or classification of the brake event, and then a series of columns related to the brake event itself.  
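
Outside Watson Studio, where **Insert to code** is not available, the file could be loaded by hand. The sketch below is a minimal example under assumptions: the helper name `readBrakeEvents` and the two-row sample file are illustrative and not part of the project. Since R 4.0, `read.csv()` no longer converts text columns to factors by default, so the sketch asks for that conversion explicitly.

```r
# Hypothetical helper: read a brake-event CSV into a data frame.
# stringsAsFactors = TRUE turns text columns such as `type` into factors.
readBrakeEvents <- function(path) {
  read.csv(path, stringsAsFactors = TRUE)
}

# Illustrative two-row file (made-up values) so the sketch runs anywhere
samplePath <- tempfile(fileext = ".csv")
writeLines(c("VIN,type,road_type,brake_time_sec",
             "VIN0001,aggressive,highway,2.5",
             "VIN0002,quality,highway,6.0"), samplePath)

sampleDF <- readBrakeEvents(samplePath)
str(sampleDF)  # inspect column names and types
```

In the notebook, the result would be assigned to `brakeEventDF` and the path would point at the project data asset.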

#### Summary Statistics

Let's begin exploring the data by looking at some summary statistics of these events by both type and road type.

In [None]:
library(magrittr)
library(dplyr)

In [None]:
print("Summary Statistics by Event Type")
group_by(brakeEventDF, type) %>% summarise(avg_braketime = mean(brake_time_sec), avg_brakedistance = mean(brake_distance_ft), avg_brakescore = mean(braking_score), abs_events = sum(abs_event))

print("Summary Statistics by Event Type and Road Type")
aggDF <- group_by(brakeEventDF, type, road_type) %>% summarise(avg_braketime = mean(brake_time_sec), avg_brakedistance = mean(brake_distance_ft), avg_brakescore = mean(braking_score), abs_events = sum(abs_event))

aggDF

Looks like aggressive drivers have lower brake times, distances, and scores.  Distracted drivers have more ABS events.  Quality drivers are on the other side of the spectrum.  

#### Visualization

We can see these relationships visually using the open source R package, ggplot2.  Let's examine the following three relationships:

* Brake Time by Type
* Brake Distance by Braking Score
* ABS Events by Type and Road Type

In [None]:
library(ggplot2)

In [None]:
options(repr.plot.width = 12, repr.plot.height = 3)

ggplot(brakeEventDF, aes(x = brake_time_sec, color = type, fill = type)) + 
    geom_density(alpha = 0.5) +
    labs(x = "Braking Time (seconds)", y = "Observation Density", title = "Distribution of Brake Time by Type") +
    theme_minimal()

ggplot(sample_frac(brakeEventDF, .33), aes(x = brake_distance_ft, y = braking_score)) + 
    geom_point(aes(shape = road_type, color = type), size = 2) +
    scale_shape_manual(values=c(3, 5, 8)) +
    geom_point(color = 'black', size = 0.35, aes(shape = road_type)) +
    labs(x = "Braking Distance (feet)", y = "Braking Score", title = "Braking Score by Distance (ft)") +
    theme_minimal()

ggplot(aggDF, aes(x = road_type, y = abs_events)) + 
    geom_bar(aes(fill = type), stat = 'identity') + 
    coord_flip() +
    labs(x = "Road Type", y = "# of ABS Events", title = "ABS Events by Road Type and Event Type") +
    theme_minimal()

After visually inspecting the data, we see some clear grouping along the lines of event type, road type, and number of ABS events.  The scatter plot also shows an obvious linear relationship between braking score and braking distance.  This historical data is clean enough to build a model from.
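
The relationship seen in the scatter plot can also be checked numerically. The sketch below uses base R's `cor()` to compute the Pearson correlation overall and within each event type. The toy data frame is made up for illustration and only borrows the notebook's column names; with the real data, `brakeEventDF` would take its place.

```r
# Made-up values for illustration only; substitute brakeEventDF in the notebook
toyDF <- data.frame(
  type = c("aggressive", "aggressive", "aggressive", "quality", "quality", "quality"),
  brake_distance_ft = c(40, 55, 70, 120, 150, 180),
  braking_score = c(20, 28, 35, 60, 75, 90)
)

# Overall Pearson correlation between braking distance and braking score
overallCor <- cor(toyDF$brake_distance_ft, toyDF$braking_score)

# Correlation within each event type
corByType <- sapply(split(toyDF, toyDF$type), function(d) {
  cor(d$brake_distance_ft, d$braking_score)
})

print(round(overallCor, 3))
print(round(corByType, 3))
```

A value close to 1 or -1 supports a linear relationship; a value near 0 suggests the pattern in the plot is not linear.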

__________

<a id='ml'></a>



### Model Training with Caret Package

Building the model with the Caret package and saving it to the ML repository lets you evaluate and score the saved R model.  

See the documentation on the <a href="https://topepo.github.io/caret/index.html">Caret Package</a> and <a href="https://content-dsxlocal.mybluemix.net/docs/content/local-dev/ml-r-models.htm">Saving R models</a>.


The caret package (short for _C_lassification _A_nd _RE_gression _T_raining) is a set of functions that attempt to streamline the process for creating predictive models. The package contains tools for:

* data splitting
* pre-processing
* feature selection
* model tuning using resampling
* variable importance estimation

as well as other functionality. 

#### Example using `train()` function
We can train a random forest model (caret's `method = "rf"`) on the historical brake event data.  It will learn the relationship between the predictor variables and the type of brake event, allowing us to classify new records as they come in.

#### Split data into train and test sets
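
`createDataPartition()`, used in the next cell, samples within each level of the outcome so that the training and test sets keep roughly the same class proportions. The base R sketch below illustrates that idea only. It is a simplified stand-in, not caret's implementation, and the helper name `stratifiedIndex` and the toy labels are assumptions.

```r
# Hypothetical helper: pick a fraction p of row indices within each class
stratifiedIndex <- function(y, p = 0.70) {
  idxByClass <- split(seq_along(y), y)
  picked <- lapply(idxByClass, function(idx) {
    # sample.int avoids the surprise of sample() on a length-one vector
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(22)
# Toy labels with unequal class sizes (made up for illustration)
toyType <- factor(rep(c("aggressive", "distracted", "quality"), times = c(10, 20, 30)))
inTrain <- stratifiedIndex(toyType, p = 0.70)

table(toyType[inTrain])   # training-set class counts
table(toyType[-inTrain])  # test-set class counts
```

Each class contributes 70% of its rows to the training set, so a rare class cannot end up entirely in one split.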

In [None]:
suppressWarnings(suppressMessages(library(caTools)))
suppressWarnings(suppressMessages(library(randomForest)))
suppressWarnings(suppressMessages(library(caret)))

In [None]:
set.seed(22)

inTraining <- createDataPartition(brakeEventDF$type, p = .70, list = FALSE)
trainingDF <- brakeEventDF[ inTraining,]
testingDF  <- brakeEventDF[-inTraining,]

## Check dimensions, should add up to 2100
paste("Rows in training set: ", dim(trainingDF)[1])
paste("Rows in test set: ", dim(testingDF)[1])

#### Select features, train model, evaluate accuracy

In [None]:
## Preserve VINs to add on after modeling
vins <- trainingDF$VIN

## Select columns for modeling
trainingDF <- select(trainingDF, type, brake_time_sec, brake_distance_ft, road_type, braking_score, 
                 brake_pressure20pct, brake_pressure40pct, brake_pressure60pct,
                 brake_pressure80pct, brake_pressure100pct, abs_event, travel_speed)

# The function trainControl can be used to specify the type of resampling
# Here, three separate 10-fold cross-validations are used as the resampling scheme
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated 3 times
                           repeats = 3)


## Using `caret` package
brakeEventModel <- train(type ~ .,
                         data = trainingDF,
                         method = "rf",
                         ntree = 50,
                         trControl=fitControl,
                         proximity = TRUE)

print("Confusion Matrix for Testing Data:")
table(predict(brakeEventModel, select(testingDF, -type)), testingDF$type)

In [None]:
brakeEventModel
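
The confusion matrix printed above can be reduced to an overall accuracy figure and a per-class recall. The sketch below is a minimal example in base R. The two vectors are made up for illustration; in the notebook, `predicted` would hold the predictions on the test set and `actual` would be `testingDF$type`.

```r
lv <- c("aggressive", "distracted", "quality")

# Made-up labels for illustration only
actual    <- factor(c("aggressive", "aggressive", "distracted",
                      "distracted", "quality", "quality"), levels = lv)
predicted <- factor(c("aggressive", "distracted", "distracted",
                      "distracted", "quality", "quality"), levels = lv)

# Rows are predictions, columns are true classes, as in the cell above
confMat <- table(predicted, actual)

# Overall accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(confMat)) / sum(confMat)

# Per-class recall: correct predictions over the true count of each class
recall <- diag(confMat) / colSums(confMat)

print(confMat)
print(accuracy)
print(recall)
```

Per-class recall is worth checking alongside accuracy, because a model can score well overall while still missing most events of one driver type.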

<a id='export'></a>

### Model Export

Save the model in the **WSL RStudio directory** for use in our Shiny app. A model saved to the file system cannot take advantage of the built-in model deployment capabilities. We use this option for demo purposes only, because it integrates quickly with the Shiny application. To integrate with a model saved in the WSL repository instead, we would need to deploy it for online scoring in **Deployment Manager** and invoke it from Shiny via the REST API.

## **<span style="color:red"> Action Required </span>** 

1. Take note of the directory in RStudio where the *brakeEventModel.rds* file is saved

In [None]:
saveRDS(object = brakeEventModel, file = "../rstudio/demoBrakeEvents/brakeEventModel.rds")

Verify that the model has been saved. 

In [None]:
print(system("pwd", intern = TRUE))
print(system("ls -l ../rstudio/demoBrakeEvents", intern = TRUE))

The model has been saved to the file system.
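
The save can also be verified from within R rather than through the shell. The sketch below is a minimal example that writes to a temporary file and uses a small `lm` fit as a stand-in for `brakeEventModel`; both are assumptions made so the example runs anywhere.

```r
# Stand-in model; in the notebook this would be brakeEventModel
standInModel <- lm(dist ~ speed, data = cars)

# Temporary path used here instead of the RStudio project directory
modelFile <- file.path(tempdir(), "standInModel.rds")
saveRDS(object = standInModel, file = modelFile)

# Confirm the file exists and is not empty
stopifnot(file.exists(modelFile), file.info(modelFile)$size > 0)

# Reload and confirm the restored model predicts the same values
restoredModel <- readRDS(modelFile)
newData <- data.frame(speed = c(10, 20))
samePredictions <- all.equal(predict(standInModel, newData),
                             predict(restoredModel, newData))
print(samePredictions)
```

Comparing predictions after the round trip is a stronger check than listing the directory, since it confirms the file can actually be read back and used.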

Now **save the model to Watson Studio Local** so that it shows up in your project assets.  Models saved in WSL can be scored and evaluated.

In [None]:
### Save Model in the WSL repository

suppressWarnings(suppressMessages(library(modelAccess)))
suppressWarnings(suppressMessages(library(jsonlite)))

saveModel(model = brakeEventModel, name = "BrakeEventClassifier", test_data=testingDF)

## **<span style="color:red"> Action Required </span>** 
Load the saved model from the WSL repository for scoring.  Review <a href="https://content-dsxlocal.mybluemix.net/docs/content/SSAS34_current/local-dev/ml-load-model.htm">Load caret model in R</a>.

Take note of the model path in the code cell below and compare it to the "path" displayed after executing the previous code cell, which saved the model in WSL.


In [None]:

modelPath <- paste(Sys.getenv("DSX_PROJECT_DIR"),"/models/BrakeEventClassifier/1/model",sep="")
savedModel<-readRDS(modelPath)

In [None]:
# payload for scoring 

scoringDF <- data.frame(
      brake_time_sec = 40,
      brake_distance_ft = 120,
      road_type = as.factor("highway"),
      braking_score = 100,
      brake_pressure20pct = 1,
      brake_pressure40pct = 1,
      brake_pressure60pct = 0,
      brake_pressure80pct = 0,
      brake_pressure100pct = 0,
      abs_event = 1,
      travel_speed = 20)
    

In [None]:
predictions <- predict(savedModel, scoringDF)
probabilities <- predict(savedModel, scoringDF, type="prob")
classes <- colnames(probabilities)

output <- list(classes, unname(probabilities, force=FALSE), predictions)
names(output) <- list("classes", "probabilities", "predictions")
json_output <- toJSON(output)

In [None]:
json_output

## **<span style="color:red"> Action Required </span>** 

#### Verify the two different locations of the saved model

1. Go to the Models section of the project, verify that the BrakeEventClassifier model is there
2. Open RStudio:
     2.1 Verify that brakeEventModel.rds is saved in the **demoBrakeEvents** folder
     2.2 Open demoBrakeEvents/server.R and click "Run App" to run the Shiny app


### Optional Exercise

This exercise assumes you have:

* Deployed this project into WML
* Created an online deployment for the BrakeEventClassifier model

**Action Required**
* Copy the generated code in the online deployment
* Edit the code cell below to replace the URL and the Authorization Bearer token with those of your own deployment
* Execute the code cell below
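
Pasting a bearer token into a notebook makes it easy to leak when the notebook is shared. One alternative is to keep the URL and token in environment variables and assemble the curl command from them. The sketch below is a minimal example under assumptions: the variable names `SCORING_URL` and `SCORING_TOKEN` and the helper `buildScoreCommand` are hypothetical, and `shQuote(type = "sh")` assumes a POSIX shell. It only builds the command string; it does not send a request.

```r
# Hypothetical helper: assemble the curl command for an online scoring call.
# requestBody is the JSON text for the data frame, e.g. from jsonlite::toJSON().
buildScoreCommand <- function(url, token, requestBody) {
  payload <- paste0('{"args":{"input_json":', requestBody, '}}')
  paste(
    "curl -k -X POST", shQuote(url, type = "sh"),
    "-H", shQuote(paste("Authorization: Bearer", token), type = "sh"),
    "-H", shQuote("Cache-Control: no-cache", type = "sh"),
    "-H", shQuote("Content-Type: application/json", type = "sh"),
    "-d", shQuote(payload, type = "sh")
  )
}

# Hypothetical environment variable names; the fallbacks are placeholders
scoringUrl   <- Sys.getenv("SCORING_URL", unset = "https://example.invalid/score")
scoringToken <- Sys.getenv("SCORING_TOKEN", unset = "PLACEHOLDER_TOKEN")

# Hand-written JSON body with two of the scoring columns, for illustration
requestBody <- '[{"brake_time_sec":40,"travel_speed":20}]'

cmd <- buildScoreCommand(scoringUrl, scoringToken, requestBody)

# Show the command shape without printing the real token
cat(buildScoreCommand(scoringUrl, "<token hidden>", requestBody), "\n")
```

Passing `cmd` to `system(cmd, intern = TRUE)` would then issue the request in the same way as the generated code.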

In [None]:
# payload for scoring 

scoringDF <- data.frame(
      brake_time_sec = 40,
      brake_distance_ft = 120,
      road_type = as.factor("highway"),
      braking_score = 100,
      brake_pressure20pct = 1,
      brake_pressure40pct = 1,
      brake_pressure60pct = 0,
      brake_pressure80pct = 0,
      brake_pressure100pct = 0,
      abs_event = 1,
      travel_speed = 20)
    

In [None]:
suppressWarnings(suppressMessages(library(jsonlite)))

## Use generated code from model API
curl_left <- "curl -k -X POST https://169.50.74.155/dmodel/v1/r-lab-test-rel/rscript/brakeevent-online/score -H \'Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InNpZG5leXAiLCJwYWNrYWdlTmFtZSI6InItbGFiLXRlc3QtcmVsIiwicGFja2FnZVJvdXRlIjoici1sYWItdGVzdC1yZWwiLCJpYXQiOjE1NDU4Njk2NTR9.sG9t_NrFqZIsl2tB34wYsVXNqYX313WDUxvdcLet-C-fbrv8-VTqAFftbtIv4E_x8TV49yWQUguT6UNJd9EKI8CdKqmb7uNJM96MXp_EZjnYUIxV1lljRE7felcfm_u_sre7UrAeslrBInOujVrkMzbEhFz2J_Gybj8VdZLVHyppC_iKXyRlhTyPkgsh9Jk-AhMoov-s0KNFOwWc4mEANE1ZrHxnXb1aySzA9RcYvbxWVdZ5G1A5nGEDi-VGoP6qdM_MjtTNvTBdogEHVPraq6IEYz-ng5ovHVZyMB4H9lhzuxw9wPw8MY50GWp_ipB6TEaIakoYQiVxkp0XitfV3g\' -H \'Cache-Control: no-cache\' -H \'Content-Type: application/json\' -d \'{\"args\":{\"input_json\":"
curl_right <- "}}\'"
    
## Convert your dataframe for scoring to JSON that can be sent in the request body via REST
request_body <- toJSON(scoringDF)
    
## Make request by passing the curl command with the JSON-formatted request body to the system
response <- system(paste0(curl_left, request_body, curl_right), intern = TRUE)
    
## Parse the response from the API back into an R dataframe
prediction <- as.data.frame(fromJSON(unlist(fromJSON(response)[1])))

results <- list(prediction = prediction, response = response)

In [None]:
results

________

<a id='conclusion'></a>

### Conclusion

In this notebook we have quickly explored and visualized brake event data using R.  We've also built, tested, and exported a random forest model to the RStudio file system as well as the WSL repository.  The exported model can be embedded in applications or used to create reports.  To see the Shiny app where this model is used on customers coming into the service bay, click on 'Tools --> RStudio' in the menu bar above.

**Last Updated**: Dec 7th, 2018
_______



<div><br><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/51/IBM_logo.svg/640px-IBM_logo.svg.png" width = 200 height = 200>
</div><br>