#### I. The "Framingham Heart Study" Data Set

In this exercise we use the Framingham data set to predict whether a respondent is above a certain age.

Read in the data set, call it "`full`", and drop observations with at least one missing value


In [None]:
full = read.csv("HSML 6295 ds Framingham.csv")
full = na.omit(full)



Show the structure of the data set


In [None]:
str(full)




Show summary statistics 


In [None]:
library(stargazer)
stargazer(full, 
          type = "text", 
          summary.stat = c("n", "mean", "sd", "min", "p25", "median", "p75", "max"),
          title="Full Data Set", digits=1)



Define the `response` variable as: "The respondent was at least 48 years old at the time of the survey."


In [None]:
full$response = ifelse(full$Age >= 48, 1, 0)
table(full$response)



To build most of the predictive models in this software practice and to create the confusion matrices, it is useful to declare the response variable to be a factor variable with two levels, labeled "No" and "Yes".


In [None]:
full$response = factor(full$response, levels = c(0,1), labels = c("No", "Yes"))
table(full$response)



Drop `Age` from the data set and move the new `response` variable from the last (16th) position to the first position in the data set.


In [None]:
full = subset(full, select = -c(Age))
full = subset(full, select = c(16, 1:15))
# show list of variables in current data set
names(full)



Create a vector called `train_id` of 3,658/2 = 1,829 distinct row numbers, drawn at random without replacement from 1 to 3,658, the number of observations in the "`full`" data set.


In [None]:
set.seed (12345)
train_id = sample(1:nrow(full), nrow(full)/2)


Split the full data set into two subsets of equal sample size, called "`train`" and "`test`".
To do so, use the random numbers in the `train_id` list created above to tag the observations that will be assigned to the training set. 


In [None]:
train = full[train_id,]




Assign the observations whose ID number is not included in the `train_id` list to the test set


In [None]:
test = full[-train_id,]
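
As a small illustration of the split logic (a sketch on a made-up ten-row data frame; `toy`, `toy_id`, `toy_train`, and `toy_test` are hypothetical names that do not appear elsewhere in this practice), indexing with the sampled row numbers and then with their negatives yields two non-overlapping halves:

```r
# Toy data frame with 10 rows (made-up, only for illustration)
toy = data.frame(id = 1:10)
set.seed(1)
# Draw half of the row numbers at random, without replacement
toy_id = sample(1:nrow(toy), nrow(toy)/2)
# Positive indices select the sampled rows; negative indices select all other rows
toy_train = toy[toy_id, , drop = FALSE]
toy_test  = toy[-toy_id, , drop = FALSE]
# The two halves share no rows and together cover the whole data frame
length(intersect(toy_train$id, toy_test$id))
sort(c(toy_train$id, toy_test$id))
```

Because `sample()` draws without replacement by default, no row can land in both subsets.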




Compute summary statistics for the training set


In [None]:
stargazer(train, 
          type = "text", 
          summary.stat = c("n", "mean", "sd", "min", "p25", "median", "p75", "max"),
          title="Training Set", digits=1)



Compute summary statistics for the test set


In [None]:
stargazer(test, 
          type = "text", 
          summary.stat = c("n", "mean", "sd", "min", "p25", "median", "p75", "max"),
          title="Test Set", digits=1)


Note that while the maximum values of most continuous variables vary widely between the training and test sets, the differences in mean and median values are never larger than one unit and sometimes zero.

Define the accuracy, true positive rate (TPR), and false positive rate (FPR) achieved by a given predictive model as functions of the observed (`Actual`) and predicted (`Predicted`) responses.


In [None]:
accuracy = function(Actual, Predicted) {
    round(100*mean(Actual == Predicted),2)
}
# true positive rate
TPR = function(Actual, Predicted) {
    round(100*sum((Actual=="Yes")*(Predicted=="Yes"))/sum(Actual=="Yes"),2)
}
# false positive rate
FPR = function(Actual, Predicted) {
    round(100*sum((Actual=="No")*(Predicted=="Yes"))/sum(Actual=="No"),2)
}
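
To check that the three functions behave as intended, they can be applied to a tiny hand-made example. This is only a sketch: the vectors `toy_actual` and `toy_predicted` are made up, and the function definitions are repeated so that the block runs on its own.

```r
# Same definitions as above, repeated so this block is self-contained
accuracy = function(Actual, Predicted) {
    round(100*mean(Actual == Predicted),2)
}
TPR = function(Actual, Predicted) {
    round(100*sum((Actual=="Yes")*(Predicted=="Yes"))/sum(Actual=="Yes"),2)
}
FPR = function(Actual, Predicted) {
    round(100*sum((Actual=="No")*(Predicted=="Yes"))/sum(Actual=="No"),2)
}
# Made-up responses: 3 actual "Yes" and 2 actual "No"
toy_actual    = factor(c("Yes", "Yes", "Yes", "No", "No"), levels = c("No", "Yes"))
toy_predicted = c("Yes", "Yes", "No", "Yes", "No")
accuracy(toy_actual, toy_predicted)  # 3 of 5 correct
TPR(toy_actual, toy_predicted)       # 2 of 3 actual "Yes" predicted "Yes"
FPR(toy_actual, toy_predicted)       # 1 of 2 actual "No" predicted "Yes"
```

All three functions return percentages, which is why the rates in the rest of this practice are reported on a 0 to 100 scale.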



Calculate the number of predictor variables


In [None]:
predictors = ncol(full)-1
predictors


#### II. Bayes Classifier

The simplest prediction assigns to each observation the modal class, i.e. the class that is found most frequently in the training set, *even if this class is found in fewer than half the observations* (which can happen only when there are more than two classes). This software practice calls it the Bayes classifier; because it ignores all predictors, it is more precisely a majority-class classifier, which plays the role of the Bayes classifier for a model without predictors. It is an example of a "null model" in that it does not make use of any predictor variables in the data set. It thus serves as a baseline model.

Compute the predicted values of the response in the *test* set.

1. Find the modal class in the training set


In [None]:
mode = function(x) {
  unique_x = unique(x)
  unique_x[which.max(tabulate(match(x, unique_x)))]
}
mode(train$response)



2. Assign this class to *all* observations in the test set


In [None]:
predicted = rep(mode(train$response), length(test$response))




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_Bayes = accuracy(Actual = test$response, Predicted = predicted))
(TPR_Bayes = TPR(Actual = test$response, Predicted = predicted))
(FPR_Bayes = FPR(Actual = test$response, Predicted = predicted))


#### III. Logistic Regression

Compute the logistic regression fit of `response` on all 15 predictor variables on the training set  and save the result as `logistic`.


In [None]:
logistic = glm(response ~ ., data=train, family=binomial)
round(coef(summary(logistic)),2)


Compute the predicted values of the response in the *test* set.

1. Compute the predicted probability for each observation in the test set


In [None]:
predicted = predict(logistic, newdata=test, type="response")




2. Convert the predicted probability into a predicted class using the probability threshold of 0.5


In [None]:
predicted = ifelse(predicted > 0.5, "Yes", "No")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_logistic = accuracy(Actual = test$response, Predicted = predicted))
(TPR_logistic = TPR(Actual = test$response, Predicted = predicted))
(FPR_logistic = FPR(Actual = test$response, Predicted = predicted))


#### IV. Ridge Regression

Declare matrix of predictors `x` and response variable `y` and define the list ("`grid`") of $\lambda$ (lambda) values for which the ridge regression model is fit:


In [None]:
x = model.matrix(response ~ ., data=train)[,-1]
y = train$response
grid=10^seq(-0.8,-2.8,length=40)



Compute the value of $\lambda$, stored as `cv$lambda.min`, that minimizes the cross-validated prediction error on the *training* set (referred to below as the training error). To fit a ridge regression model, we set `alpha` to 0.


In [None]:
library(glmnet)
set.seed (1)
cv = cv.glmnet(x, y, alpha=0, family = "binomial", lambda = grid)
cv$lambda.min
plot(cv)


The horizontal axis in this graph is drawn on a logarithmic scale to show more detail. "Log" refers to the natural logarithm, also abbreviated as "ln".
The value of $\log(\lambda)$ shown at the left dotted line, 


In [None]:
round(log(cv$lambda.min),2)



, corresponds to the value of $\lambda$ that minimizes the training error (`cv$lambda.min`). The red dots are the point estimates of the prediction error and the gray bars extend one standard error above and below the red dots. The right dotted line in the graph marks `cv$lambda.1se`, the largest value of $\lambda$ whose estimated prediction error is within one standard error of the minimum error achieved at `cv$lambda.min`:


In [None]:
round(cv$lambda.1se,4)
round(log(cv$lambda.1se),2)
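
The one-standard-error rule can be sketched by hand. The numbers below are made up and do not come from the Framingham fit; `toy_lambda`, `toy_cvm`, and `toy_cvsd` are hypothetical stand-ins for the `lambda`, `cvm` (mean cross-validated error), and `cvsd` (its standard error) components that `cv.glmnet` returns.

```r
# Made-up cross-validation results for four candidate values of lambda
toy_lambda = c(0.001, 0.01, 0.1, 1)
toy_cvm    = c(1.20, 1.18, 1.22, 1.40)  # mean cross-validated error
toy_cvsd   = c(0.05, 0.05, 0.05, 0.06)  # standard error of that mean
# lambda.min: the value with the smallest mean cross-validated error
i_min = which.min(toy_cvm)
toy_lambda_min = toy_lambda[i_min]
# lambda.1se: the largest lambda whose error is within one standard error of the minimum
threshold = toy_cvm[i_min] + toy_cvsd[i_min]
toy_lambda_1se = max(toy_lambda[toy_cvm <= threshold])
toy_lambda_min
toy_lambda_1se
```

In this toy example the minimum error is 1.18, so every value of $\lambda$ with an error of at most 1.23 qualifies, and the largest such value is chosen.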


You can think of this "second-best" value of $\lambda$ as the largest value of $\lambda$ that is statistically indistinguishable from `cv$lambda.min`. If our goal is to shrink the coefficient estimates as much as possible, we could choose a value of $\lambda$ as high as 


In [None]:
round(cv$lambda.1se, 4)




Compute the coefficient estimates for the ridge regression model that corresponds to `cv$lambda.min` and save the result as `ridge`.


In [None]:
ridge = glmnet(x, y, alpha=0, lambda=cv$lambda.min, family = "binomial")
round(coef(ridge),2)


The numbers of predictors included in the various ridge regression fits are shown above the top horizontal axis in the graph above. When we fit ridge regression models, the coefficient estimates are "shrunk", i.e. their absolute magnitude is reduced. For instance, the coefficient estimate for the predictor `Stroke` is 1.16 in the logistic regression fit but 1.12 in the ridge regression fit. Similarly, the coefficient estimate for the predictor `Smoker` has shrunk from -0.59 in the logistic regression fit to -0.57 in the ridge regression fit.

If we wanted to shrink the coefficient estimates even further, we could use the "second-best" value of $\lambda$:


In [None]:
ridge.2 = glmnet(x, y, alpha=0, lambda=cv$lambda.1se, family = "binomial")
round(coef(ridge.2),2)


The larger value of $\lambda$ has shrunk the coefficients for `Stroke` and `Smoker` even further, to 0.75 and -0.42, respectively. 
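
For reference, the roles of `alpha` and $\lambda$ can be summarized by the penalized objective that `glmnet` minimizes for a binomial response. The notation is an assumption borrowed from the glmnet documentation and is not used elsewhere in this practice: $N$ is the number of training observations, $\beta_0$ the intercept, $\beta$ the vector of coefficients, and $\ell$ the binomial log-likelihood.

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{N}\,\ell(\beta_0,\beta) \;+\; \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \;+\; \alpha\,\lVert\beta\rVert_1\right]
$$

With $\alpha = 0$ only the squared penalty remains (ridge regression), with $\alpha = 1$ only the absolute-value penalty remains (the lasso), and a larger $\lambda$ puts more weight on the penalty, which is why the coefficients shrink further.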

Compute the predicted values of the response in the *test* set.

1. Compute the predicted probability for each observation in the test set


In [None]:
x = model.matrix(response ~ ., test)[,-1]
predicted = predict(ridge, s=cv$lambda.min, newx=x, type = "response")



2. Convert the predicted probability into a predicted class using the probability threshold of 0.5


In [None]:
predicted = ifelse(predicted > 0.5, "Yes", "No")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_ridge = accuracy(Actual = test$response, Predicted = predicted))
(TPR_ridge = TPR(Actual = test$response, Predicted = predicted))
(FPR_ridge = FPR(Actual = test$response, Predicted = predicted))


#### V. The Lasso

Compute the value of $\lambda$, stored as `cv$lambda.min`, that minimizes the cross-validated prediction error on the *training* set (referred to here as the training error).
To fit a lasso model, we set `alpha` to 1.


In [None]:
x = model.matrix(response ~ ., data=train)[,-1]
y = train$response
grid=10^seq(-1.8,-2.6,length=40)

library(glmnet)
set.seed (1)
cv = cv.glmnet(x, y, alpha=1, family = "binomial", lambda = grid)
cv$lambda.min
plot(cv)



In the graph, the value of $\log(\lambda)$ that corresponds to the minimum cross-validated training error, `cv$lambda.min`, is shown at 


In [None]:
round(log(cv$lambda.min),2)



. The top horizontal axis in the graph shows that the lasso model that minimizes the training error includes only 12 predictors.

To see which predictors the optimal lasso model has dropped, we compute the coefficient estimates for the lasso model that corresponds to `cv$lambda.min` and save the result as `lasso`.


In [None]:
lasso = glmnet(x, y, alpha=1, lambda=cv$lambda.min, family = "binomial")
round(coef(lasso),2)


The optimal lasso model no longer includes the predictors `BP.Medication`, `BMI`, and `Glucose`.
Also, note that the absolute magnitudes of the coefficients of the other predictors have shrunk. For instance, the coefficients for `Stroke` and `Smoker` are now 0.80 and -0.57, respectively.

If we wanted to drop even more predictors (and shrink the remaining non-zero coefficient estimates even further), we could use the "second-best" value of $\lambda$, which is shown at the right dotted line in the graph:


In [None]:
round(cv$lambda.1se, 4)
round(log(cv$lambda.1se),2)


The top horizontal axis of the graph shows that this more restrictive model only includes 10 predictors. 
The corresponding lasso model is:


In [None]:
lasso.2 = glmnet(x, y, alpha=1, lambda=cv$lambda.1se, family = "binomial")
round(coef(lasso.2),2)


In addition to `BP.Medication`, `BMI`, and `Glucose`, the more restrictive lasso model drops (shrinks to zero) the coefficients for the predictors `Stroke` and `Male`. Also, the coefficient for `Smoker` has shrunk from -0.57 to -0.51.
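
Why the lasso can drop predictors while ridge regression only shrinks them can be sketched in a deliberately simplified setting. The example below is an illustration under strong assumptions, not the computation `glmnet` performs for the logistic models above: it assumes a linear model whose design matrix is the identity, penalties $\lambda\beta_j^2$ and $\lambda|\beta_j|$ added to the residual sum of squares, and made-up unpenalized estimates `b`. In that special case the ridge solution rescales each estimate, whereas the lasso solution subtracts $\lambda/2$ and truncates at zero.

```r
# Made-up unpenalized coefficient estimates (hypothetical values)
b = c(2.0, -1.0, 0.05)
lambda_toy = 0.2
# Ridge in this special case: every coefficient is scaled by 1/(1 + lambda)
ridge_toy = b / (1 + lambda_toy)
# Lasso in this special case: soft-thresholding at lambda/2
lasso_toy = sign(b) * pmax(abs(b) - lambda_toy / 2, 0)
round(ridge_toy, 3)
round(lasso_toy, 3)
```

The smallest estimate is set exactly to zero by the soft-thresholding step but stays non-zero under the ridge rescaling, which mirrors the qualitative pattern in the coefficient tables above.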

Compute the predicted values of the response in the *test* set.

1. Compute the predicted probability for each observation in the test set


In [None]:
x = model.matrix(response ~ ., test)[,-1]
predicted = predict(lasso, s=cv$lambda.min, newx=x, type = "response")



2. Convert the predicted probability into a predicted class using the probability threshold of 0.5


In [None]:
predicted = ifelse(predicted > 0.5, "Yes", "No")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_lasso = accuracy(Actual = test$response, Predicted = predicted))
(TPR_lasso = TPR(Actual = test$response, Predicted = predicted))
(FPR_lasso = FPR(Actual = test$response, Predicted = predicted))



Note that we could have obtained the logistic regression model by estimating a ridge (`alpha` = 0) or lasso (`alpha` = 1) regression model and setting `lambda` = 0. (Minimal differences between the coefficient values arise because `glm` and `glmnet` use different fitting algorithms and convergence tolerances, and from rounding.)


In [None]:
library(glmnet)
x = model.matrix(response ~ ., data=train)[,-1]
y = train$response
logistic.ridge = glmnet(x, y, alpha=0, lambda=0, family = "binomial")
logistic.lasso = glmnet(x, y, alpha=1, lambda=0, family = "binomial")
round(coef(summary(logistic)),2)
round(coef(logistic.ridge),2)
round(coef(logistic.lasso),2)


#### VI. Single Pruned Tree

Using the training set, grow the unpruned ("fully grown") tree and save the result as `tree`.


In [None]:
library(tree)
tree = tree(response ~ ., data=train)
summary(tree)



Generate the confusion matrix for the *training* set.


In [None]:
predicted = predict(tree, newdata=train, type = "class")
cm = table(Actual = train$response, Predicted = predicted)
cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE)
cm


Note that the number of misclassified patients is 601 and matches the number reported by the `summary(tree)` command above. 160 patients were false negatives: the predicted condition was "No" when their actual condition was "Yes". 441 patients were false positives: the predicted condition was "Yes" when their actual condition was "No".

Plot the unpruned tree


In [None]:
plot(tree)
text(tree, pretty = 0)
title(main = "Unpruned Classification Tree \n")


Note that the number of terminal nodes is 7, matching the number reported by the `summary(tree)` command above. Four terminal nodes predict that the patient is at least 48 years old.

Compute the 20-fold cross-validated prediction error for subtrees of various sizes. The cross-validated prediction error, reported in `cv$dev`, is the number of misclassified patients, counted over the held-out portions of the 20 cross-validation folds. The tree sizes are measured by the number of terminal nodes.


In [None]:
set.seed(6295)
cv = cv.tree(tree, FUN=prune.misclass, K=20)



Plot the cross-validated prediction error as a function of the tree size.


In [None]:
plot(cv$dev ~ cv$size, type='b', col="lightseagreen", lwd=2,
     xlab = "Subtree Size (Terminal Nodes)", ylab = "Cross-Validated Prediction Error")
axis(1, at=cv$size)



Save the size of the subtree that minimizes the cross-validated prediction error.


In [None]:
arg_min_cv = cv$size[which.min(cv$dev)]
arg_min_cv



Prune the original tree to the size that minimizes the CV error and save the result as `pruned_tree`.


In [None]:
pruned_tree = prune.misclass(tree, best = arg_min_cv)
summary(pruned_tree)



Plot the pruned tree


In [None]:
plot(pruned_tree)
text(pruned_tree, pretty=0)
title(main = "Pruned Classification Tree \n")


To obtain the pruned tree in this example, we've simply cut back the unpruned tree to the internal nodes that matter, i.e. where going left versus right changes the prediction. For this reason, the pruned tree in this example yields the same predictions as the unpruned tree, and the resulting misclassification error rate (training error) is the same: 0.3286 = 601 / 1829.

Compute the predicted values of the response in the *test* set.


In [None]:
predicted = predict(pruned_tree, newdata=test, type = "class")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_tree = accuracy(Actual = test$response, Predicted = predicted))
(TPR_tree = TPR(Actual = test$response, Predicted = predicted))
(FPR_tree = FPR(Actual = test$response, Predicted = predicted))


The number of misclassified patients when we apply the pruned classification tree to the test set is 601, the same number that we found when we applied the unpruned classification tree to the training set. This is a coincidence.

#### VII. Bootstrap Aggregation (Bagging)

Using the training set, grow one unpruned tree for each of 500 bootstrap samples and save the result as `bag`.


In [None]:
library(randomForest)
set.seed(1)
bag = randomForest(response ~ ., data = train, mtry = predictors, importance = TRUE)
bag



Compute the predicted values of the response in the *test* set.


In [None]:
predicted = predict(bag, newdata=test, type = "class")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_bag = accuracy(Actual = test$response, Predicted = predicted))
(TPR_bag = TPR(Actual = test$response, Predicted = predicted))
(FPR_bag = FPR(Actual = test$response, Predicted = predicted))


#### VIII. Random Forest

Random forests are grown just like the bootstrap-aggregated forests. The difference is that, at each split of each tree in a random forest, only a random subset of $m$ of the available predictors is considered. In a bootstrap-aggregated forest, *all* available predictors are considered at each split.

Define $m = \sqrt{p}$, the number of predictors considered at each split.


In [None]:
predictors
m = round(sqrt(predictors))
m
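
The rounding rule matters here. As a point of comparison (a sketch; `p_toy`, `m_round`, and `m_floor` are illustrative names), `round()` gives 4 for 15 predictors, whereas the default that `randomForest()` uses for classification when `mtry` is not supplied, `floor(sqrt(p))`, gives 3:

```r
p_toy = 15                       # number of predictors in this practice
m_round = round(sqrt(p_toy))     # rule used in this practice
m_floor = floor(sqrt(p_toy))     # randomForest() default for classification
c(round = m_round, floor = m_floor)
```

Passing `mtry = m` explicitly, as done below, therefore considers one more predictor at each split than the package default would.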



Grow a random forest of 500 trees and save the result as `rf`.


In [None]:
library(randomForest)
set.seed(1)
rf = randomForest(response ~ ., data = train, mtry = m, importance = TRUE)
rf



Compute the predicted values of the response in the *test* set


In [None]:
predicted = predict(rf, newdata=test, type = "class")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = test$response, Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_rf = accuracy(Actual = test$response, Predicted = predicted))
(TPR_rf = TPR(Actual = test$response, Predicted = predicted))
(FPR_rf = FPR(Actual = test$response, Predicted = predicted))



Plot the importance of each predictor


In [None]:
varImpPlot(rf)



#### IX. Boosting

Convert the response variable back to numeric format


In [None]:
train$response = factor(as.numeric(train$response))
table(train$response)
train$response = as.numeric(train$response)-1
table(train$response)
mean(train$response)
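
The conversion is needed because `gbm` with `distribution = "bernoulli"` expects a response coded as 0 and 1. What the two-step conversion above does can be sketched on a made-up factor (`toy_f` is a hypothetical name): `as.numeric()` returns the internal level codes 1 and 2, so subtracting 1 maps "No" to 0 and "Yes" to 1.

```r
# Made-up factor with the same levels as the response variable
toy_f = factor(c("No", "Yes", "Yes", "No"), levels = c("No", "Yes"))
as.numeric(toy_f)            # internal level codes: 1 for "No", 2 for "Yes"
as.numeric(toy_f) - 1        # 0/1 coding
as.numeric(toy_f == "Yes")   # equivalent one-step alternative
```

The one-step alternative in the last line avoids relying on the order of the factor levels.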



Grow a sequence of 5,000 trees using the *training* set and save the result as `boost`.


In [None]:
library(gbm)
set.seed(1)
boost = gbm(response ~ ., data = train, 
                   distribution = "bernoulli", 
                   n.trees=5000, interaction.depth=1)
summary(boost, plotit = FALSE)


Compute the predicted values of the response in the *test* set.

1. Compute the predicted probability for each observation in the test set (after converting the response variable back to numeric format)


In [None]:
test$response = factor(as.numeric(test$response))
test$response = as.numeric(test$response)-1
predicted = predict(boost, newdata=test, n.trees=5000, type = "response")



2. Convert the predicted probability into a predicted class using the probability threshold of 0.5


In [None]:
predicted = ifelse(predicted > 0.5, "Yes", "No")




Generate the confusion matrix for the test set and compute the accuracy, true positive rate (TPR), and false positive rate (FPR).


In [None]:
cm = table(Actual = as.factor(ifelse(test$response == 1, "Yes", "No")), Predicted = predicted)
(cm = addmargins(cm, FUN = list(Total = sum), quiet = TRUE))
(accuracy_boost = accuracy(Actual = as.factor(ifelse(test$response == 1, "Yes", "No")), Predicted = predicted))
(TPR_boost = TPR(Actual = as.factor(ifelse(test$response == 1, "Yes", "No")), Predicted = predicted))
(FPR_boost = FPR(Actual = as.factor(ifelse(test$response == 1, "Yes", "No")), Predicted = predicted))


#### X. Summary

The following table shows the values of 3 performance statistics for the 8 different predictive models that were optimized on the training set and evaluated on the test set. The 8 models are listed in ascending order of the false positive rate.

**Prediction Method** | **False Positive Rate (%)** | **True Positive Rate (%)** | **Accuracy (%)**
---                   | ---:                        | ---:                       | ---:
Lasso                 | 30.14                       | 71.85                      | 70.97
Ridge Regression      | 30.26                       | 71.95                      | 70.97
Logistic Regression   | 30.63                       | 72.24                      | 70.97
Random Forest         | 33.46                       | 75.39                      | 71.46
Bagging               | 33.95                       | 74.90                      | 70.97
Boosting              | 37.02                       | 72.24                      | 68.12
Single Pruned Tree    | 54.61                       | 84.55                      | 67.14
Bayes Classifier      | 100.00                      | 100.00                     | 55.55

The table shows that, with the exception of the bagging and boosting models, raising the false positive rate (FPR) raises the true positive rate (TPR). The intuition is the same as in the construction of ROC curves for a given class of predictive model: the more readily a model predicts a positive response, the more readily that model will both capture the true positives and misclassify as positives responses that actually are negative.
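
The threshold intuition can be made concrete with a small sketch on made-up data (the vectors `toy_actual` and `toy_prob` and the helper `rates_at` are hypothetical and unrelated to the Framingham fits): lowering the probability threshold makes the classifier predict "Yes" more readily, and neither rate can fall as a result.

```r
# Made-up actual classes and predicted probabilities of "Yes"
toy_actual = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No")
toy_prob   = c(0.9, 0.7, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1)
# True and false positive rates (in percent) at a given probability threshold
rates_at = function(threshold) {
  pred = ifelse(toy_prob > threshold, "Yes", "No")
  c(threshold = threshold,
    TPR = 100 * mean(pred[toy_actual == "Yes"] == "Yes"),
    FPR = 100 * mean(pred[toy_actual == "No"] == "Yes"))
}
# Lower the threshold step by step and collect the rates in a matrix
toy_rates = t(sapply(c(0.75, 0.5, 0.25), rates_at))
toy_rates
```

Each row of `toy_rates` is one point on the ROC curve of this toy classifier.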

As in the plot of a standard ROC curve, we can plot the true positive rate on the vertical axis against the false positive rate on the horizontal axis:


In [None]:
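# Note: the vectors FPR and TPR created here overwrite the FPR() and TPR() functions
# defined in Section I; re-run that cell before recomputing any rates
FPR = c(FPR_lasso, FPR_ridge, FPR_logistic, FPR_rf, FPR_bag, FPR_boost, FPR_tree, FPR_Bayes)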
FPR_frontier = c(0, FPR_lasso, FPR_ridge, FPR_logistic, FPR_rf, FPR_tree, FPR_Bayes)
TPR = c(TPR_lasso, TPR_ridge, TPR_logistic, TPR_rf, TPR_bag, TPR_boost, TPR_tree, TPR_Bayes)
TPR_frontier = c(0, TPR_lasso, TPR_ridge, TPR_logistic, TPR_rf, TPR_tree, TPR_Bayes)
roc = data.frame(FPR, TPR)
attr(roc, "row.names") = c("lasso", "ridge", "logistic", "rf", "bag", "boost", "tree", "Bayes")
par(pty = "s")
plot(FPR, TPR, xlim=c(-10,110), ylim = c(-10,110), asp=1,
     xlab = "False Positive Rate (FPR)", ylab = "True Positive Rate (TPR)")
with(roc, text(TPR ~ FPR, labels = row.names(roc), pos = 1, col='red3', cex = 0.8))
lines(FPR_frontier, TPR_frontier, type='b', lwd=2, col='red3')
title(main = "Predictive Performance of 8 Models \n")


There is a cluster of models whose false positive rate is between 30% and 40%, as shown in the figure below:



In [None]:
roc$label_color = "red3"
roc$label_color[roc$FPR > 33.5] = "black"
attr(roc, "row.names") = c(paste("lasso", accuracy_lasso), paste("ridge", accuracy_ridge), 
                           paste("logistic", accuracy_logistic), paste("random forest", accuracy_rf), 
                           paste("bag", accuracy_bag), paste("boost", accuracy_boost), 
                           paste("tree", accuracy_tree), paste("Bayes", accuracy_Bayes))
par(pty = "m")
plot(FPR, TPR, xlim=c(30,38), ylim = c(71.8,75.4),
     xlab = "False Positive Rate (FPR)", ylab = "True Positive Rate (TPR)")
with(roc, text(TPR ~ FPR, labels = row.names(roc), pos = 4, col=roc$label_color, cex = 0.8))
lines(FPR_frontier, TPR_frontier, type='b', lwd=2, col='red3')
title(main = "Predictive Performance of 6 Models \n")


The numbers behind the model names are the models' accuracy values. In this example, 4 models -- lasso, ridge, logistic, and bag -- all yield the same accuracy but their true and false positive rates differ. Thus, if you were to choose among these 4 models, knowing only their accuracy would render them indistinguishable; knowing their TPR and FPR will reveal the trade-offs involved in choosing one model over the other 3.

The 4 models shown in red lie on the frontier: for each of these models, there is no other model to its northwest, i.e. no other model simultaneously achieves a higher TPR *and* a lower FPR. We see that the bagging and boosting models both fail this test: The random-forest model achieves a higher TPR and lower FPR than the bagging and boosting models. The logistic regression model achieves the same TPR as and a lower FPR than the boosting model. The bagging and boosting models are said to be "strictly dominated" because we can raise or maintain the TPR while lowering the FPR by switching to a model on the frontier.
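
The domination test can also be carried out mechanically. The sketch below hard-codes the rates from the summary table in this section; the data frame `toy_models` and the vector `dominated` are illustrative names. A model is flagged if some other model has a TPR at least as high and an FPR at least as low, with at least one of the two strictly better.

```r
# Rates copied from the summary table (in percent)
toy_models = data.frame(
  model = c("lasso", "ridge", "logistic", "rf", "bag", "boost", "tree", "Bayes"),
  fpr   = c(30.14, 30.26, 30.63, 33.46, 33.95, 37.02, 54.61, 100.00),
  tpr   = c(71.85, 71.95, 72.24, 75.39, 74.90, 72.24, 84.55, 100.00),
  stringsAsFactors = FALSE)
# For each model, check whether any model is at least as good on both rates
# and strictly better on at least one of them
dominated = sapply(seq_len(nrow(toy_models)), function(i) {
  any(toy_models$tpr >= toy_models$tpr[i] & toy_models$fpr <= toy_models$fpr[i] &
      (toy_models$tpr > toy_models$tpr[i] | toy_models$fpr < toy_models$fpr[i]))
})
toy_models$model[dominated]
```

With the table's values, only the bagging and boosting models are flagged.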

The accuracy values of the two strictly dominated models are strictly smaller than those of the models that are superior in each case: the bagging model (70.97) is dominated by the random-forest model (71.46), while the boosting model (68.12) is dominated by the random-forest model (71.46) and the logistic-regression model (70.97). Also note that the accuracy (70.97) is the same for the bagging model and the logistic-regression model but neither dominates the other: the bagging model achieves a higher TPR than the logistic-regression model but it also comes with a higher FPR. In the figure, the bagging model lies to the northeast of the logistic-regression model, not the northwest.

Once you've dropped the two strictly dominated models, which of the remaining 6 models, all of which lie on the frontier, should you use? The answer depends on how much you value true positives and how costly false positives are to you. Suppose you are classifying 200 patients, and you know that 100 of these are positive and 100 are negative. You just don't know each individual patient's actual status. If you use the logistic-regression model, you will catch 72.24 of the 100 positive patients. But you'll also falsely label 30.63 of the 100 negative patients as positives. Switching to the random-forest model will enable you to raise the number of true positives by 


In [None]:
round((TPR_rf - TPR_logistic),2)



to 75.39 patients. But this switch comes at the cost of raising the number of false positives by 


In [None]:
round((FPR_rf - FPR_logistic),2)



to 33.46. Thus, when you switch from the logistic-regression model to the random-forest model, allowing one additional false positive patient allows you to identify 


In [None]:
round((TPR_rf - TPR_logistic)/(FPR_rf - FPR_logistic),2)



additional true positive patients on average. You can raise the number of true positives even further by switching to the single pruned tree. But this switch yields only 


In [None]:
round((TPR_tree - TPR_rf)/(FPR_tree - FPR_rf),2)



additional true positive patients for every additional false positive patient. 
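
The same trade-off ratios can be computed in one step for consecutive models on the frontier. This is a sketch that hard-codes the rates from the summary table rather than reusing the objects created above; `toy_frontier` and `slope` are illustrative names.

```r
# Rates (in percent) of three neighboring frontier models, copied from the summary table
toy_frontier = data.frame(
  model = c("logistic", "rf", "tree"),
  fpr   = c(30.63, 33.46, 54.61),
  tpr   = c(72.24, 75.39, 84.55),
  stringsAsFactors = FALSE)
# Additional true positives per additional false positive when moving to the next model
slope = diff(toy_frontier$tpr) / diff(toy_frontier$fpr)
names(slope) = paste(head(toy_frontier$model, -1), "->", tail(toy_frontier$model, -1))
round(slope, 2)
```

A ratio above 1 means the switch buys more than one additional true positive per additional false positive; a ratio below 1 means it buys less.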
