# RECURSIVE PARTITIONING TREES FOR PREDICTING CUSTOMER CHURN RATE

Example adapted from Yu-Wei Chapter 5

In [None]:
library(data.table)
library(tidyverse)
library(plotly)
library(C50) # the churn data was formerly shipped here; it is now loaded from modeldata
library(rpart) # for recursive partitioning trees
library(rpart.plot) # for plotting recursive partitioning trees
library(visNetwork) # for interactive plots of recursive partitioning trees
library(caret) # for confusionMatrix() with accuracy statistics

In [None]:
options(repr.matrix.max.rows = 20, repr.matrix.max.cols = 15) # limit how many rows and columns of a table are printed

In [None]:
datapath <- "~/data_ad454"

In [None]:
library(modeldata)

In [None]:
data(mlc_churn, package = "modeldata")

In [None]:
mlc_churn

## Explore

In [None]:
churn <- mlc_churn %>% as.data.table()

In [None]:
str(churn)

Suppose we want to list the levels of every factor column in a concise and simple way.

The purrr package lets us iterate over the columns: `keep()` selects only the columns that satisfy a condition, and `map()` works like `lapply()`, applying a function to each selected column:

In [None]:
churn %>% purrr::keep(is.factor) %>% purrr::map(levels)
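For readers more used to base R, here is a minimal sketch of the same idea on a small hypothetical data frame (`toy` is made up for illustration). `Filter()` plays the role of `purrr::keep()` and `lapply()` the role of `purrr::map()`.

```r
# Hypothetical toy data: two factor columns and one numeric column
toy <- data.frame(
  plan  = factor(c("yes", "no", "no")),
  churn = factor(c("no", "no", "yes")),
  calls = c(1, 4, 2)
)

# Filter() keeps the columns for which the predicate returns TRUE,
# like purrr::keep(is.factor)
toy_factors <- Filter(is.factor, toy)

# lapply() applies levels() to each remaining column, like purrr::map(levels)
toy_levels <- lapply(toy_factors, levels)
print(toy_levels)
```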

Let's draw bar charts of the factor variables:

In [None]:
churn_factors <- churn %>% purrr::keep(is.factor) %>% # select factor columns
    tidyr::gather() %>% # convert into long format for faceting
    ggplot(aes(x = value)) + # plot value
    facet_wrap(~ key, scales = "free") + # divide into separate plots by key
    geom_bar()

plotly::ggplotly(churn_factors)

So:

- The most frequent area code is 415
- 707 of the 5000 customers churned
- 4527 do not have an international plan
- Observations are spread nearly evenly across states
- 3677 do not have a voice mail plan
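The churn counts translate into class proportions as follows. This is only arithmetic on the numbers quoted in the list (the factor below is rebuilt from those counts, not loaded from the data), but it shows how imbalanced the target is: always predicting "no" would already be right about 86% of the time.

```r
# Rebuild a churn factor from the quoted counts: 707 "yes" out of 5000
churn_counts <- factor(c(rep("yes", 707), rep("no", 5000 - 707)),
                       levels = c("yes", "no"))

# Class proportions: share of churners vs non-churners
churn_props <- prop.table(table(churn_counts))
print(churn_props)  # yes 0.1414, no 0.8586
```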

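You might wonder what `tidyr::gather()` does. It reshapes a wide table (one column per variable) into a long one, with a `key` column holding the former column names and a `value` column holding the cell values. Compare: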

In [None]:
churn %>% purrr::keep(is.factor) # that's wide format

In [None]:
churn %>% purrr::keep(is.factor) %>% tidyr::gather() # that's the long format
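A minimal base R sketch of the same reshaping on a two-row toy table (made up for illustration): every column is stacked under a `key` column holding the old column name and a `value` column holding the cell, column by column, which is the layout `gather()` produces with its default names. In current tidyr, `gather()` is superseded by `pivot_longer()`; `pivot_longer(everything(), names_to = "key", values_to = "value")` gives equivalent columns, though the row order may differ.

```r
# Hypothetical wide table: one column per variable
wide <- data.frame(
  international_plan = c("no", "yes"),
  voice_mail_plan    = c("yes", "no"),
  stringsAsFactors = FALSE
)

# Long format: one row per (column, cell) pair, stacked column by column
long <- data.frame(
  key   = rep(names(wide), each = nrow(wide)),
  value = unlist(wide, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```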

For numeric variables, a five-number summary (minimum, quartiles, maximum) of every column is easy to get:

In [None]:
churn %>% purrr::keep(is.numeric) %>% sapply(quantile) %>% t()
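To see what `sapply(quantile)` followed by `t()` returns, here is the same pipeline on a small made-up data frame. `quantile()` with its default `probs` gives the minimum, quartiles and maximum of each column; `sapply()` collects them as one column per variable, and `t()` turns the variables into rows. Base R's `fivenum()` uses Tukey's hinges instead, so its quartiles can differ slightly from those of `quantile()`.

```r
# Hypothetical numeric data
nums <- data.frame(a = 1:10, b = seq(10, 100, by = 10))

# One row per variable; columns 0%, 25%, 50%, 75%, 100%
summary_tbl <- t(sapply(nums, quantile))
print(summary_tbl)

# Tukey's five-number summary uses hinges rather than quantile type 7
print(fivenum(nums$a))  # 1.0 3.0 5.5 8.0 10.0, versus quartiles 3.25 and 7.75
```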

We can also draw density plots of the numeric variables:

In [None]:
churn %>% purrr::keep(is.numeric) %>% # select numeric columns
    tidyr::gather() %>% # reshape into long format in columns "key" and "value"
    ggplot(aes(value)) + # plot value
        facet_wrap(~ key, scales = "free") + # divide into separate plots by key
        geom_density(fill = "green") # get density plots

## Partition the dataset

In [None]:
set.seed(1863)
train_ind <- churn[,sample(.I, 0.7 * .N)]

In [None]:
churn_train <- churn[train_ind]
churn_test <- churn[-train_ind]
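The same splitting logic in base R, sketched on a hypothetical 10-row table: draw 70% of the row numbers without replacement, then use negative indexing so that the test set is exactly the complement of the training set. The `round()` call is an addition here; it guards against a fractional sample size, which `sample()` would silently truncate.

```r
# Hypothetical stand-in for the churn table
toy <- data.frame(id = 1:10,
                  churn = factor(c(rep("no", 8), rep("yes", 2))))

set.seed(1863)
# Draw 70% of the row numbers without replacement
train_ind <- sample(seq_len(nrow(toy)), round(0.7 * nrow(toy)))

toy_train <- toy[train_ind, ]
toy_test  <- toy[-train_ind, ]  # every row NOT drawn for training
```

With only about 14% churners, a stratified split (for example `caret::createDataPartition()`, which samples within each class of a factor outcome) keeps the churn rate similar in both sets; a plain random split like this one does not guarantee that.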

## Train the model

In [None]:
?rpart

```
Recursive Partitioning and Regression Trees
Description
Fit a rpart model

Usage
rpart(formula, data, weights, subset, na.action = na.rpart, method,
      model = FALSE, x = FALSE, y = TRUE, parms, control, cost, ...)
Arguments
formula	
a formula, with a response but no interaction terms. If this a a data frame, that is taken as the model frame (see model.frame).

data	
an optional data frame in which to interpret the variables named in the formula.

...

cost	
a vector of non-negative costs, one for each variable in the model. Defaults to one for all variables. These are scalings to be applied when considering splits, so the improvement on splitting on a variable is divided by its cost in deciding which split to choose.
```

In [None]:
churn.rp <- rpart::rpart(churn ~ ., data = churn_train)

In [None]:
churn.rp

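- split is the splitting condition at the node
- n is the number of cases at the node
- loss is the misclassification cost, i.e. the number of cases at the node not belonging to the predicted class (with the default unit costs)
- yval is the predicted class at the node (yes or no)
- (yprob) gives the class proportions at the node in factor-level order: the share of "yes" on the left and of "no" on the right
- an asterisk (*) marks a terminal node (leaf)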
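To connect these columns to the fitted object, here is a sketch on the `kyphosis` data that ships with rpart (chosen only because it is small; it is not the churn data). For a classification tree with the default unit losses, `loss` (stored as `dev` in the tree's `frame`) equals `n` minus the largest class count at the node. The column layout of `frame$yval2` assumed below (fitted class, then class counts, then class probabilities, then node probability) follows the `rpart.object` documentation.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
fr  <- fit$frame

# For a two-class tree, columns 2:3 of yval2 hold the class counts at each node
counts <- fr$yval2[, 2:3]

# loss = cases at the node that are not in the majority (predicted) class
misclassified <- fr$n - apply(counts, 1, max)
print(cbind(n = fr$n, loss = fr$dev, misclassified))
```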

If we stopped at the root without any split and predicted every case as "no", we would misclassify 491 cases: all the "yes" cases in the training sample.

After the first split, on whether total_day_minutes >= 265.75, the number of misclassified cases drops to 83 + 358 = 441.
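These counts also explain the error columns that `printcp()` reports: errors there are scaled by the root node error, so the root has relative error 1 and the tree with that single split has relative error 441/491. A worked version of the arithmetic, using only the numbers quoted above:

```r
root_loss  <- 491       # misclassified at the root: every "yes" in the training set
split_loss <- 83 + 358  # misclassified after the first split
n_train    <- 3500

root_node_error <- root_loss / n_train     # printed by printcp() as "Root node error"
rel_error_1     <- split_loss / root_loss  # relative error of the one-split tree

print(c(root_node_error = root_node_error, rel_error_1 = rel_error_1))
# root_node_error about 0.1403, rel_error_1 about 0.8982
```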

Now let's examine the complexity parameter (CP).

The complexity parameter acts as a penalty that controls the size of the tree: a split is kept only if it decreases the overall lack of fit by at least a factor of cp. The greater the CP value, the fewer splits the tree has.
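A quick way to see this effect is to refit the same model with increasing `cp` and count the split (non-leaf) nodes. The sketch uses the `kyphosis` data bundled with rpart rather than the churn data. `cp` is passed through `rpart.control()`, whose default is 0.01; `xval = 0` only switches off cross-validation, which does not change the fitted tree.

```r
library(rpart)

cps <- c(0.001, 0.01, 0.05, 0.2)

# Number of splits = number of non-leaf rows in the fitted tree's frame
n_splits <- sapply(cps, function(cp) {
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               control = rpart.control(cp = cp, xval = 0))
  sum(fit$frame$var != "<leaf>")
})

print(data.frame(cp = cps, splits = n_splits))
```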

In [None]:
printcp(churn.rp)

The output shows that only 9 of the 19 predictors are actually used in tree construction.

We can also plot the cross-validated error against the complexity parameter:

In [None]:
plotcp(churn.rp)

## Visualize the tree

A simple way to visualize an rpart tree is the base `plot()` function followed by `text()`:

In [None]:
plot(churn.rp, uniform = FALSE, branch = 0.6, margin = 0) # draw the tree skeleton
text(churn.rp, all = TRUE, use.n = TRUE) # label every node, including class counts

This becomes hard to read for larger trees.

A better option is the rpart.plot function from the rpart.plot package:

In [None]:
rpart.plot::rpart.plot(churn.rp)

An interactive alternative is the visTree() function from the JavaScript-powered visNetwork package:

In [None]:
visNetwork::visTree(churn.rp)

## Evaluate the classification accuracy

In [None]:
predictions_train <- predict(churn.rp, churn_train, type = "class")

In [None]:
table(churn_train$churn, predictions_train)

In [None]:
caret::confusionMatrix(table(predictions_train, churn_train$churn))

The training accuracy is about 96%: 142 of the 3500 training cases are misclassified.
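Accuracy is simply the share of cases on the diagonal of the confusion table. A minimal sketch with a hypothetical helper `accuracy_from_table()` and made-up counts (not the notebook's results), followed by a check of the training figure quoted above:

```r
# Hypothetical helper: share of cases on the diagonal (correct predictions)
accuracy_from_table <- function(tab) sum(diag(tab)) / sum(tab)

# Made-up 2x2 table (rows = predicted, columns = actual); NOT the notebook's results
toy_tab <- matrix(c(40, 10, 5, 45), nrow = 2,
                  dimnames = list(predicted = c("yes", "no"),
                                  actual    = c("yes", "no")))
toy_acc <- accuracy_from_table(toy_tab)
print(toy_acc)    # (40 + 45) / 100 = 0.85

# Check of the training figure: 142 errors out of 3500 cases
train_acc <- 1 - 142 / 3500
print(train_acc)  # about 0.959
```

The helper works the same on a `table()` result, such as the one passed to `caret::confusionMatrix()` above.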

## Predictive power of the model

In [None]:
predictions_test <- predict(churn.rp, churn_test, type = "class")

In [None]:
table(churn_test$churn, predictions_test)

In [None]:
caret::confusionMatrix(table(predictions_test, churn_test$churn))

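Accuracy on the test set is 94.3%, only slightly below the training accuracy and well above the No Information Rate reported by caret (the accuracy of always predicting the majority class, "no"), so the tree generalizes reasonably well.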

## Pruning

Pruning removes branches that contribute little to classification, in order to avoid over-fitting and to improve accuracy on unseen data.

Let's recall the complexity parameter table:

In [None]:
printcp(churn.rp)

First, let's find the minimum cross-validation error (xerror):

In [None]:
min(churn.rp$cptable[,"xerror"])

And locate the row of that minimum value:

In [None]:
minrow <- which.min(churn.rp$cptable[,"xerror"])
minrow

Get the cost complexity parameter at that row:

In [None]:
churn.cp <- churn.rp$cptable[minrow, "CP"]
churn.cp

Let's prune the tree by setting the cp parameter to the CP value of the record with minimum cross-validation error:

In [None]:
prune.tree <- prune(churn.rp, cp = churn.cp)
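The pruning steps above can be collected into a short, self-contained sketch. It runs on the `kyphosis` data bundled with rpart (not the churn data) and starts from a deliberately small `cp` so there is something to prune. It also shows the one-standard-error rule, a commonly used alternative that is not what this notebook does: pick the smallest tree whose `xerror` is within one `xstd` of the minimum, trading a little accuracy for a simpler tree.

```r
library(rpart)

set.seed(1863)  # xerror comes from random 10-fold cross-validation
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0.001))
cpt <- fit$cptable

# As in the notebook: CP of the row with the minimum cross-validated error
min_row <- which.min(cpt[, "xerror"])
cp_min  <- cpt[min_row, "CP"]

# 1-SE rule (alternative): first (smallest) tree with xerror <= min + its xstd
threshold <- cpt[min_row, "xerror"] + cpt[min_row, "xstd"]
se_row    <- which(cpt[, "xerror"] <= threshold)[1]
cp_1se    <- cpt[se_row, "CP"]

pruned_min <- prune(fit, cp = cp_min)
pruned_1se <- prune(fit, cp = cp_1se)

# Compare tree sizes (number of splits)
print(c(min_xerror = sum(pruned_min$frame$var != "<leaf>"),
        one_se     = sum(pruned_1se$frame$var != "<leaf>")))
```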

And visualize:

In [None]:
visNetwork::visTree(prune.tree)

### Classification performance of the pruned tree

In [None]:
predictions_train_pruned <- predict(prune.tree, churn_train, type = "class")

In [None]:
caret::confusionMatrix(table(predictions_train_pruned, churn_train$churn))

Accuracy on the training set is lower than for the unpruned tree, as expected when fewer splits are used to fit the training data.

How about predictive power?

### Predictive power of the pruned tree

In [None]:
predictions_test_pruned <- predict(prune.tree, churn_test, type = "class")

In [None]:
caret::confusionMatrix(table(predictions_test_pruned, churn_test$churn))

Predictive power is also slightly lower; in exchange, we have a simpler tree, and some split conditions that may have been over-fitting the training data are eliminated.