# Random Forest Models: Classification and Regression

This notebook describes how to implement Zero-Inflated Random Forest models to predict board counts at each bus stop.

 * Required libraries:

In [1]:
library(randomForest)
library(mlbench)
library(caret)
library(e1071)
library(dplyr)
library(tidyr)
library(readr)
library(ranger)
library(janitor)
library(rFerns)


## Zero-Inflated Random Forest Model

The proposed Zero-Inflated Random Forest model can be described as follows:

* Classification: A binary classifier, trained with [Random Ferns](https://www.jstatsoft.org/article/view/v061i10/v61i10.pdf), separates zero counts from positive counts (> 0).
* Regression: A Random Forest regressor ([Ranger](https://www.jstatsoft.org/article/view/v077i01)) is trained on the training observations with positive counts. It is then applied to the test observations that the classifier predicts as positive; all other test observations are predicted as zero.
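
As a minimal sketch of how the two stages combine (not the function used in this notebook), the final prediction is zero wherever the classifier predicts a zero count and the regression estimate elsewhere. The helper name `combine_two_stage()` and the toy vectors are assumptions made for illustration.

```r
# Hypothetical helper: merge classifier labels with regression predictions.
# is_positive: logical vector, TRUE where the classifier predicts count > 0.
# reg_pred:    numeric vector of regression predictions for the TRUE rows only.
combine_two_stage <- function(is_positive, reg_pred) {
    stopifnot(sum(is_positive) == length(reg_pred))
    final_pred <- numeric(length(is_positive))  # start with all zeros
    final_pred[is_positive] <- reg_pred         # fill in the positive rows
    final_pred
}

# Toy example: five stop-hours, two of them classified as positive.
is_positive <- c(FALSE, TRUE, FALSE, FALSE, TRUE)
reg_pred <- c(3.2, 1.7)
combine_two_stage(is_positive, reg_pred)
```

Rows classified as zero never reach the regressor, so the regression model only has to describe the distribution of positive counts.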
    
The function (`RF_Ferns_and_Ranger()`) runs the Zero-Inflated Random Forest model. It requires the following inputs:

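* `rt`: the `route_id`.
* `di`: the `direction_id`.
* `st`: the `stop_id`.
* `part`: the dataset to use, `'pre'` or `'post'` (pre- or post-lockdown). Any other value selects the files without a prefix (`train_data.csv` and `test_data.csv`).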
    
Note that the input parameters define the file paths used to run the models. If the data are organized differently, the input parameters or the paths must be changed.
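
The directory layout implied by these inputs can be sketched as follows. `build_stop_path()` and its `root` argument are hypothetical names introduced here; the folder names mirror the ones used inside `RF_Ferns_and_Ranger()`.

```r
# Hypothetical helper: build the per-stop data directory from the inputs.
build_stop_path <- function(rt, di, st,
                            root = file.path('data', 'jmartinez',
                                             'Data_for_RF_Models', 'Board_Counts')) {
    file.path(root,
              paste('route', rt, sep = '_'),      # e.g. route_1
              paste0('direction', di),            # e.g. direction0
              paste('bus_stop', st, sep = '_'))   # e.g. bus_stop_12
}

build_stop_path('1', '0', '12')
```

Under this layout, only `root` would need to change if the data live in a different location.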
    
The function saves the fitted classifier and regressor as `*.rds` files and writes a table of the test data with the model predictions as `*.csv`. It returns a short status string. The test RMSE can be calculated from this table.
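
A minimal sketch of that calculation, assuming the saved table keeps the observed counts in `board_count` and the predictions in `RF_Pred` (the column names written by `RF_Ferns_and_Ranger()`). The helper `rmse()` and the toy data frame are illustrative; in practice the table would be read from the saved `*.csv` file.

```r
# Root mean squared error between observed and predicted counts.
rmse <- function(actual, predicted) {
    sqrt(mean((actual - predicted)^2))
}

# Toy stand-in for the saved table of test data and predictions.
chart <- data.frame(board_count = c(0, 2, 4),
                    RF_Pred     = c(0, 1, 6))

rmse(chart$board_count, chart$RF_Pred)
```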

We do not report a train RMSE for this model because its two components are trained separately with different metrics: the misclassification rate for the classifier and the RMSE for the regressor.

In [2]:
RF_Ferns_and_Ranger <- function(rt, di, st, part){
    
    library(randomForest)
    library(mlbench)
    library(caret)
    library(e1071)
    library(dplyr)
    library(tidyr)
    library(readr)
    library(ranger)
    library(janitor)
    library(rFerns)
    
    
    path = paste0('data', '/', 'jmartinez', '/', 'Data_for_RF_Models', '/', 'Board_Counts', '/',
                  paste('route', rt, sep = '_'), '/', paste('direction', di, sep = ''), '/',
                  paste('bus_stop', st, sep = '_'), '/')
    
    if(part == 'pre'){
        file_path_train = paste(path, 'pre_lock_train_data.csv', sep = '/')
        file_path_test = paste(path, 'pre_lock_test_data.csv', sep = '/')
        
        board_train = read_csv(file_path_train)
        board_test = read_csv(file_path_test)
        
        board_train$month = factor(board_train$month)
        board_train$service_kind = factor(board_train$service_kind)
        board_train$hour = factor(board_train$hour)

        board_test$month = factor(board_test$month)
        board_test$service_kind = factor(board_test$service_kind)
        board_test$hour = factor(board_test$hour)
    }
    else if(part == 'post'){
        
        file_path_train = paste(path, 'post_lock_train_data.csv', sep = '/')
        file_path_test = paste(path, 'post_lock_test_data.csv', sep = '/')
        
        board_train = read_csv(file_path_train)
        board_test = read_csv(file_path_test)
        
        board_train$month = factor(board_train$month)
        board_train$service_kind = factor(board_train$service_kind)
        board_train$hour = factor(board_train$hour)

        board_test$month = factor(board_test$month)
        board_test$service_kind = factor(board_test$service_kind)
        board_test$hour = factor(board_test$hour)
    }
    else{
        file_path_train = paste(path, 'train_data.csv', sep = '/')
        file_path_test = paste(path, 'test_data.csv', sep = '/')
        
        board_train = read_csv(file_path_train)
        board_test = read_csv(file_path_test)
        
        board_train$month = factor(board_train$month)
        board_train$service_kind = factor(board_train$service_kind)
        board_train$hour = factor(board_train$hour)

        board_test$month = factor(board_test$month)
        board_test$service_kind = factor(board_test$service_kind)
        board_test$hour = factor(board_test$hour)
    }
    
    train_month_levels = length(levels(board_train$month))
    train_service_kind_levels = length(levels(board_train$service_kind))
    train_hour_levels = length(levels(board_train$hour))    
    
    board_test = board_test %>%
        filter(hour %in% intersect(unique(board_test$hour), unique(board_train$hour)))
    
    if(train_month_levels > 1){
        if(train_service_kind_levels > 1){
            if(train_hour_levels > 1){
                board_train = board_train
                board_test = board_test 
            }
            else{
                board_train = board_train[, -which(names(board_train) == 'hour')]
                board_test = board_test[, -which(names(board_test) == 'hour')]
            }
        }
        else{
            if(train_hour_levels > 1){
                
                board_train = board_train[, -which(names(board_train) == 'service_kind')]
                board_test = board_test[, -which(names(board_test) == 'service_kind')]
            }
            else{
                # Use %in% (not ==) to match several column names at once.
                board_train = board_train[, -which(names(board_train) %in% c('service_kind', 'hour'))]
                board_test = board_test[, -which(names(board_test) %in% c('service_kind', 'hour'))]
            }
        }
    }
    else{
        if(train_service_kind_levels > 1){
            if(train_hour_levels > 1){
                board_train = board_train[, -which(names(board_train) == 'month')]
                board_test = board_test[, -which(names(board_test) == 'month')]
            }
            else{
                board_train = board_train[, -which(names(board_train) %in% c('month', 'hour'))]
                board_test = board_test[, -which(names(board_test) %in% c('month', 'hour'))]
            }
        }
        else{
            if(train_hour_levels > 1){
                board_train = board_train[, -which(names(board_train) %in% c('month', 'service_kind'))]
                board_test = board_test[, -which(names(board_test) %in% c('month', 'service_kind'))]
            }
            else{
                board_train = board_train[, -which(names(board_train) %in% c('month', 'service_kind', 'hour'))]
                board_test = board_test[, -which(names(board_test) %in% c('month', 'service_kind', 'hour'))]
            }
        }
    }
    
    board_train = remove_empty(board_train, which = c('cols'), quiet = TRUE)
    board_test = remove_empty(board_test, which = c('cols'), quiet = TRUE)
    
    train_board_counts = unique(board_train$board_count)
    test_board_counts = unique(board_test$board_count)
    
    n_row_train = nrow(board_train)
    
    if(n_row_train < 60){
        
        return('Insufficient data for analysis!')
    }
    else if(n_row_train >= 60){
        
        if(length(train_board_counts) > 1){
            y_clf_train = board_train$board_count
            y_clf_train = factor(if_else(y_clf_train == 0, 0, 1))
            
            y_clf_test = board_test$board_count
            y_clf_test = factor(if_else(y_clf_test == 0, 0, 1))
            
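            # Drop the response by name so it cannot leak into the predictors,
            # whatever the column order of the input files.
            Board_train_clf <- data.frame(cbind(y_clf_train, board_train[, names(board_train) != 'board_count']))
            Board_test_clf <- data.frame(cbind(y_clf_test, board_test[, names(board_test) != 'board_count']))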
            
            #---------------------------------------------------------------------------------
            # Training characteristics for model tuning:
            #---------------------------------------------------------------------------------
            
            control <- trainControl(method = 'repeatedcv', 
                                    number = 5, 
                                    repeats = 2,
                                    search = 'random')
        
            #---------------------------------------------------------------------------------
            # Classification using Random Ferns:
            #---------------------------------------------------------------------------------
        
            set.seed(1)
            rf_random <- train(y_clf_train ~ .,
                               data = Board_train_clf,
                               method = 'rFerns',
                               metric = 'Accuracy',
                               tuneLength  = 20, 
                               trControl = control)
            
            RF_Ferns <- print(rf_random)
            
            rf_random_pred <- predict(rf_random, newdata = Board_test_clf)
            
            # caret expects the predictions as `data` and the true labels as `reference`.
            rf_random_conf_mat <- confusionMatrix(data = rf_random_pred, reference = y_clf_test)
            rf_random_conf_mat <- data.frame(rf_random_conf_mat[4])
            colnames(rf_random_conf_mat) <- c('Value')
            
            # Index for regression data:
            
            index_for_reg <- which(rf_random_pred == '1', arr.ind = T)
            #---------------------------------------------------------------------------------
            # Regression Model using Ranger:
            
                        
            set.seed(1)
            rf_reg_ranger <- train(board_count ~ .,
                                   data = (board_train %>% filter(board_count > 0)),
                                   method = 'ranger',
                                   metric = 'RMSE',
                                   tuneLength  = 20, 
                                   trControl = control)
            
            mtry <- rf_reg_ranger$bestTune$mtry
            SplitRule <- rf_reg_ranger$bestTune$splitrule
            min_node_size <- rf_reg_ranger$bestTune$min.node.size
            rf_data_reg <- board_train %>% filter(board_count > 0)
            
            rf_quant_reg_ranger <- ranger(board_count ~ .,
                                          quantreg = TRUE,
                                          data = rf_data_reg,
                                          #mtry = mtry) #,
                                          splitrule = SplitRule,
                                          min.node.size = min_node_size)
                        
            RF_Ranger <- print(rf_reg_ranger)
            RF_Quant_Ranger <- print(rf_quant_reg_ranger)
            
                
            #----------------------------------------------------------------------------------
            # Validation:
            
            Board_Test_Val = board_test
            nrow_test = (1:nrow(board_test))
            
            Board_Test_Val$index = nrow_test
            Board_test_reg = Board_Test_Val[index_for_reg, ]
            
            rf_reg_ranger_pred <- predict(rf_reg_ranger, newdata = Board_test_reg)
            rf_reg_ranger_CI_pred <- predict(rf_quant_reg_ranger, Board_test_reg, 
                                             type = 'quantiles', quantiles = c(0.025, 0.975))
            rf_reg_ranger_CI_pred <- data.frame(rf_reg_ranger_CI_pred$predictions)
            colnames(rf_reg_ranger_CI_pred) <- c('lower', 'upper')
            
            Board_test_reg$Ranger_Pred = rf_reg_ranger_pred
            Board_test_reg$Ranger_Pred_Lower_Bound = rf_reg_ranger_CI_pred$lower
            Board_test_reg$Ranger_Pred_Upper_Bound = rf_reg_ranger_CI_pred$upper
            
            Board_Test_Val = left_join(Board_Test_Val, Board_test_reg, by = 'index')
            
            Board_Test_Val = Board_Test_Val %>%
                mutate(RF_Pred = if_else(is.na(Ranger_Pred) == T, 0, Ranger_Pred))
            
            board_test$RF_Pred = Board_Test_Val$RF_Pred
            board_test$RF_Lower_Bound = if_else(is.na(Board_Test_Val$Ranger_Pred_Lower_Bound) == T, 0, Board_Test_Val$Ranger_Pred_Lower_Bound)
            board_test$RF_Upper_Bound = if_else(is.na(Board_Test_Val$Ranger_Pred_Upper_Bound) == T, 0, Board_Test_Val$Ranger_Pred_Upper_Bound)
            
            #return(board_test)
                
            RF_test_RMSE = sqrt(mean((board_test$board_count - board_test$RF_Pred)^{2}))
            
            if(part == 'pre'){
                file_path_clf = paste(path, 'pre_lock_RF_Fern.txt', sep = '/')
                file_path_clf_conf_mat = paste(path, 'pre_Conf_Mat_RF_Fern.csv', sep = '/')
                
                file_path_reg = paste(path, 'pre_lock_RF_Reg.txt', sep = '/')
                file_path_RF_Chart = paste(path, 'pre_RF_Chart.csv', sep = '/')
                
                final_clf_model = paste(path, 'Pre_Random_Ferns_model.rds', sep = '/')
                final_reg_model = paste(path, 'Pre_Random_Forest_RANGER_model.rds', sep = '/')
                
                write.table(RF_Ferns, file_path_clf)
                write.csv(rf_random_conf_mat, file_path_clf_conf_mat)
                
                write.table(RF_Ranger, file_path_reg)
                write.csv(board_test, file_path_RF_Chart)
                
                saveRDS(rf_random, final_clf_model)
                saveRDS(rf_reg_ranger, final_reg_model)
                
            }
            else if(part == 'post'){
                
                file_path_clf = paste(path, 'post_lock_RF_Fern.txt', sep = '/')
                file_path_clf_conf_mat = paste(path, 'post_Conf_Mat_RF_Fern.csv', sep = '/')
                
                file_path_reg = paste(path, 'post_lock_RF_Reg.txt', sep = '/')
                file_path_RF_Chart = paste(path, 'post_RF_Chart.csv', sep = '/')
                
                final_clf_model = paste(path, 'Post_Random_Ferns_model.rds', sep = '/')
                final_reg_model = paste(path, 'Post_Random_Forest_RANGER_model.rds', sep = '/')
                
                write.table(RF_Ferns, file_path_clf)
                write.csv(rf_random_conf_mat, file_path_clf_conf_mat)
                
                write.table(RF_Ranger, file_path_reg)
                write.csv(board_test, file_path_RF_Chart)
                
                saveRDS(rf_random, final_clf_model)
                saveRDS(rf_reg_ranger, final_reg_model)
            }
            else{
                file_path_clf = paste(path, 'RF_Fern.txt', sep = '/')
                file_path_clf_conf_mat = paste(path, 'Conf_Mat_RF_Fern.csv', sep = '/')
                
                file_path_reg = paste(path, 'RF_Reg.txt', sep = '/')
                file_path_RF_Chart = paste(path, 'RF_Chart.csv', sep = '/')
                
                final_clf_model = paste(path, 'Random_Ferns_model.rds', sep = '/')
                final_reg_model = paste(path, 'Random_Forest_RANGER_model.rds', sep = '/')
                
                write.table(RF_Ferns, file_path_clf)
                write.csv(rf_random_conf_mat, file_path_clf_conf_mat)
                
                write.table(RF_Ranger, file_path_reg)
                write.csv(board_test, file_path_RF_Chart)
                
                saveRDS(rf_random, final_clf_model)
                saveRDS(rf_reg_ranger, final_reg_model)
            }
            
            return('Done!')
            
        }
        else{
        
        return('This bus stop does not have variability in the response variable.')
            
        }    
    }    
}
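
The nested `if`/`else` block in the function above removes the factor columns (`month`, `service_kind`, `hour`) that have a single level in the training data. A more compact way to express the same idea is sketched below; `drop_single_level()` is a hypothetical helper, not part of the notebook's pipeline.

```r
# Hypothetical helper: drop the candidate columns that have at most one
# distinct value in the training data, from both the train and test sets.
drop_single_level <- function(train, test,
                              candidates = c('month', 'service_kind', 'hour')) {
    candidates <- intersect(candidates, names(train))
    n_levels <- vapply(train[candidates],
                       function(x) length(unique(x)), integer(1))
    to_drop <- candidates[n_levels <= 1]
    list(train = train[, !(names(train) %in% to_drop), drop = FALSE],
         test  = test[, !(names(test) %in% to_drop), drop = FALSE])
}

# Toy example: 'month' is constant in the training data, so it is dropped.
train <- data.frame(board_count = c(0, 3, 1),
                    month = factor(c(4, 4, 4)),
                    hour = factor(c(7, 8, 9)))
test <- train[1:2, ]
out <- drop_single_level(train, test)
names(out$train)
```

Selecting columns with a logical mask also avoids the pitfall of negative indexing with an empty `which()` result, which would drop every column.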

## Vanilla Random Forest

The function `vanilla_rf()` fits a single Random Forest regression model (Ranger) to all board counts, zeros included, with no separate classification step. It takes the following inputs:

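* `rt`: the `route_id`.
* `di`: the `direction_id`.
* `st`: the `stop_id`.
* `part`: the dataset to use, `'pre'` or `'post'` (pre- or post-lockdown). Any other value selects the files without a prefix (`train_data.csv` and `test_data.csv`).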
    
 *Note: The input parameters define the file paths used to run the models. If the data are organized differently, the input parameters or the paths must be changed.*
    
The function saves the fitted Random Forest (Ranger) model as `*.rds`, a table of the test data with the model predictions as `*.csv`, and a performance file (`*.csv`) with both the train and test RMSEs. The train RMSE is the cross-validated RMSE reported by `caret`.
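
The saved table also carries quantile-based lower and upper bounds for each prediction, so one simple check is the share of observed counts that fall inside the interval. The sketch below assumes the column names written by `vanilla_rf()` (`RF_Vanilla_Pred_Lower_Bound`, `RF_Vanilla_Pred_Upper_Bound`); the helper `interval_coverage()` and the toy values are illustrative.

```r
# Hypothetical helper: fraction of observations inside [lower, upper].
interval_coverage <- function(actual, lower, upper) {
    mean(actual >= lower & actual <= upper)
}

# Toy stand-in for the saved table of test data and predictions.
chart <- data.frame(board_count = c(0, 2, 5, 1),
                    RF_Vanilla_Pred_Lower_Bound = c(0, 1, 1, 2),
                    RF_Vanilla_Pred_Upper_Bound = c(1, 4, 4, 6))

interval_coverage(chart$board_count,
                  chart$RF_Vanilla_Pred_Lower_Bound,
                  chart$RF_Vanilla_Pred_Upper_Bound)
```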

In [3]:
vanilla_rf <- function(rt, di, st, part){
    
    library(randomForest)
    library(mlbench)
    library(caret)
    library(e1071)
    library(dplyr)
    library(tidyr)
    library(readr)
    library(ranger)
    library(janitor)
    library(rFerns)
    
    
    path = paste0('data', '/', 'jmartinez', '/', 'Data_for_RF_Models', '/', 'Board_Counts', '/',
                  paste('route', rt, sep = '_'), '/', paste('direction', di, sep = ''), '/',
                  paste('bus_stop', st, sep = '_'), '/')
    
    if(part == 'pre'){
        file_path_train = paste(path, 'pre_lock_train_data.csv', sep = '/')
        file_path_test = paste(path, 'pre_lock_test_data.csv', sep = '/')
        
        board_train = read_csv(file_path_train)
        board_test = read_csv(file_path_test)
        
        board_train$month = factor(board_train$month)
        board_train$service_kind = factor(board_train$service_kind)
        board_train$hour = factor(board_train$hour)

        board_test$month = factor(board_test$month)
        board_test$service_kind = factor(board_test$service_kind)
        board_test$hour = factor(board_test$hour)
    }
    else if(part == 'post'){
        
        file_path_train = paste(path, 'post_lock_train_data.csv', sep = '/')
        file_path_test = paste(path, 'post_lock_test_data.csv', sep = '/')
        
        board_train = read_csv(file_path_train)
        board_test = read_csv(file_path_test)
        
        board_train$month = factor(board_train$month)
        board_train$service_kind = factor(board_train$service_kind)
        board_train$hour = factor(board_train$hour)

        board_test$month = factor(board_test$month)
        board_test$service_kind = factor(board_test$service_kind)
        board_test$hour = factor(board_test$hour)
    }
    else{
        file_path_train = paste(path, 'train_data.csv', sep = '/')
        file_path_test = paste(path, 'test_data.csv', sep = '/')
        
        board_train = read_csv(file_path_train)
        board_test = read_csv(file_path_test)
        
        board_train$month = factor(board_train$month)
        board_train$service_kind = factor(board_train$service_kind)
        board_train$hour = factor(board_train$hour)

        board_test$month = factor(board_test$month)
        board_test$service_kind = factor(board_test$service_kind)
        board_test$hour = factor(board_test$hour)
    }
    
    train_month_levels = length(levels(board_train$month))
    train_service_kind_levels = length(levels(board_train$service_kind))
    train_hour_levels = length(levels(board_train$hour))    
    
    board_test = board_test %>%
        filter(hour %in% intersect(unique(board_test$hour), unique(board_train$hour)))
    
    if(train_month_levels > 1){
        if(train_service_kind_levels > 1){
            if(train_hour_levels > 1){
                board_train = board_train
                board_test = board_test 
            }
            else{
                board_train = board_train[, -which(names(board_train) == 'hour')]
                board_test = board_test[, -which(names(board_test) == 'hour')]
            }
        }
        else{
            if(train_hour_levels > 1){
                
                board_train = board_train[, -which(names(board_train) == 'service_kind')]
                board_test = board_test[, -which(names(board_test) == 'service_kind')]
            }
            else{
                # Use %in% (not ==) to match several column names at once.
                board_train = board_train[, -which(names(board_train) %in% c('service_kind', 'hour'))]
                board_test = board_test[, -which(names(board_test) %in% c('service_kind', 'hour'))]
            }
        }
    }
    else{
        if(train_service_kind_levels > 1){
            if(train_hour_levels > 1){
                board_train = board_train[, -which(names(board_train) == 'month')]
                board_test = board_test[, -which(names(board_test) == 'month')]
            }
            else{
                board_train = board_train[, -which(names(board_train) %in% c('month', 'hour'))]
                board_test = board_test[, -which(names(board_test) %in% c('month', 'hour'))]
            }
        }
        else{
            if(train_hour_levels > 1){
                board_train = board_train[, -which(names(board_train) %in% c('month', 'service_kind'))]
                board_test = board_test[, -which(names(board_test) %in% c('month', 'service_kind'))]
            }
            else{
                board_train = board_train[, -which(names(board_train) %in% c('month', 'service_kind', 'hour'))]
                board_test = board_test[, -which(names(board_test) %in% c('month', 'service_kind', 'hour'))]
            }
        }
    }
    
    board_train = remove_empty(board_train, which = c('cols'), quiet = TRUE)
    board_test = remove_empty(board_test, which = c('cols'), quiet = TRUE)
    
    train_board_counts = unique(board_train$board_count)
    test_board_counts = unique(board_test$board_count)
    
    n_row_train = nrow(board_train)
    
    if(n_row_train < 60){
        
        return('Insufficient data for analysis!')
    }
    else if(n_row_train >= 60){
        
        #---------------------------------------------------------------------------------
        # Training characteristics for model tuning:
        #---------------------------------------------------------------------------------
            
        control <- trainControl(method = 'repeatedcv', 
                                number = 5, 
                                repeats = 2,
                                search = 'random')
        
        #---------------------------------------------------------------------------------
        # Regression using Ranger:
        #---------------------------------------------------------------------------------         
                        
        set.seed(1)
        rf <- train(board_count ~ .,
                    data = board_train,
                    method = 'ranger',
                    metric = 'RMSE',
                    tuneLength  = 20, 
                    trControl = control)
        
        mtry <- rf$bestTune$mtry
        SplitRule <- rf$bestTune$splitrule
        min_node_size <- rf$bestTune$min.node.size
        
        rf_quant_reg_ranger <- ranger(board_count ~ .,
                                      quantreg = TRUE,
                                      data = board_train,
        #                              mtry = rf$bestTune$mtry,
                                      splitrule = SplitRule,
                                      min.node.size = min_node_size)
        
        RF_Vanilla <- print(mtry)
        
        #----------------------------------------------------------------------------------
        # Validation:
                     
        rf_vanilla_pred <- predict(rf, newdata = board_test)
        board_test$RF_Vanilla_Pred = rf_vanilla_pred
        
        rf_vanilla_CI_pred <- predict(rf_quant_reg_ranger, board_test, 
                                      type = 'quantiles', quantiles = c(0.025, 0.975))
        
        #return(rf_vanilla_CI_pred)
        rf_vanilla_CI_pred <- data.frame(rf_vanilla_CI_pred$predictions)
        colnames(rf_vanilla_CI_pred) <- c('lower', 'upper')
            
        board_test$RF_Vanilla_Pred_Lower_Bound = rf_vanilla_CI_pred$lower
        board_test$RF_Vanilla_Pred_Upper_Bound = rf_vanilla_CI_pred$upper
        
        #return(board_test)
        
        # Cross-validated RMSE of the selected (lowest-RMSE) tuning combination.
        RF_train_RMSE = min(rf$results$RMSE, na.rm = TRUE)
        RF_test_RMSE = sqrt(mean((board_test$board_count - board_test$RF_Vanilla_Pred)^{2}))
        
        Performance = data.frame('Train' = RF_train_RMSE, 'Test'= RF_test_RMSE)
            
        if(part == 'pre'){
                            
            file_path_reg = paste(path, 'pre_lock_RF_Vanilla.txt', sep = '/')
            file_path_RF_Chart = paste(path, 'pre_RF_Vanilla_Chart.csv', sep = '/')
            file_Performance_name = paste(path, 'pre_RF_Vanilla_Performance.csv', sep = '/')
                
            final_vanilla_model = paste(path, 'Pre_RF_Vanilla_model.rds', sep = '/')
            
            write.csv(Performance, file_Performance_name)
            write.table(RF_Vanilla, file_path_reg)
            write.csv(board_test, file_path_RF_Chart)
                
            saveRDS(rf, final_vanilla_model)
            
                
        }
        else if(part == 'post'){
                          
            file_path_reg = paste(path, 'post_lock_RF_Vanilla.txt', sep = '/')
            file_path_RF_Chart = paste(path, 'post_RF_Vanilla_Chart.csv', sep = '/')
            file_Performance_name = paste(path, 'post_RF_Vanilla_Performance.csv', sep = '/')
                
            final_vanilla_model = paste(path, 'Post_RF_Vanilla_model.rds', sep = '/')
                
            write.csv(Performance, file_Performance_name)
            write.table(RF_Vanilla, file_path_reg)
            write.csv(board_test, file_path_RF_Chart)
                
            saveRDS(rf, final_vanilla_model)
        }
        else{
            
            file_path_reg = paste(path, 'RF_Vanilla.txt', sep = '/')
            file_path_RF_Chart = paste(path, 'RF_Vanilla_Chart.csv', sep = '/')
            file_Performance_name = paste(path, 'RF_Vanilla_Performance.csv', sep = '/')
                
            final_vanilla_model = paste(path, 'RF_Vanilla_model.rds', sep = '/')
            
            write.csv(Performance, file_Performance_name)
            write.table(RF_Vanilla, file_path_reg)
            write.csv(board_test, file_path_RF_Chart)
                
            saveRDS(rf, final_vanilla_model)
        }
            
        return('Done!')
            
    }
    else{
        
    return('This bus stop does not have variability in the response variable.')
            
    }    
    
}

### Example 1: 

We can test the function on the data for `route_id == '1'`, `direction_id == '0'`, and `stop_id == '12'`, using the post-lockdown dataset.

In [113]:
RF_Ferns_and_Ranger('1', '0', '12', 'post')

Rows: 2301 Columns: 8

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): service_kind
dbl (7): month, hour, board_count, mean_temp, mean_precip, month_average_boa...


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Rows: 574 Columns: 8

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): service_kind
dbl (7): month, hour, board_count, mean_temp, mean_precip, month_average_boa...


ℹ Use ...

Random Ferns 

2301 samples
   7 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 2 times) 
Summary of sample sizes: 1841, 1841, 1842, 1840, 1840, 1841, ... 
Resampling results across tuning parameters:

  depth  Accuracy   Kappa    
   1     0.9917533  0.9822207
   2     0.9989107  0.9975982
   3     0.9995647  0.9990340
   5     0.9980392  0.9957028
   6     1.0000000  1.0000000
   7     1.0000000  1.0000000
   9     0.9993492  0.9985621
  10     1.0000000  1.0000000
  11     0.9995652  0.9990360
  13     1.0000000  1.0000000
  14     1.0000000  1.0000000
  15     1.0000000  1.0000000

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was depth = 6.
Random Forest 

1514 samples
   7 predictor

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 2 times) 
Summary of sample sizes: 1211, 1212, 1211, 1211, 1211, 1211, ... 
Resampling results across tuning parameters:

  m

### Example 2:

Similarly, we can test `vanilla_rf()` on the data for `route_id == '1'`, `direction_id == '0'`, and `stop_id == '12'`, using the post-lockdown dataset.

In [109]:
vanilla_rf('1', '0', '12', 'post')

Rows: 2301 Columns: 8

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): service_kind
dbl (7): month, hour, board_count, mean_temp, mean_precip, month_average_boa...


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Rows: 574 Columns: 8

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): service_kind
dbl (7): month, hour, board_count, mean_temp, mean_precip, month_average_boa...


ℹ Use ...

[1] 25
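
To fit several stops in one pass, either function can be called in a loop that records the returned status string and keeps going if one stop fails. This is only a sketch: `run_for_stops()` is a hypothetical wrapper, and `fake_model()` stands in for `RF_Ferns_and_Ranger()` or `vanilla_rf()` so that the example runs without the data files.

```r
# Hypothetical wrapper: call a model function for each stop and collect
# the returned status strings; errors are caught and reported per stop.
run_for_stops <- function(model_fun, rt, di, stops, part) {
    status <- vapply(stops, function(st) {
        tryCatch(model_fun(rt, di, st, part),
                 error = function(e) paste('Error:', conditionMessage(e)))
    }, character(1))
    data.frame(stop_id = stops, status = unname(status),
               stringsAsFactors = FALSE)
}

# Stand-in for the real functions, used only to make the sketch runnable.
fake_model <- function(rt, di, st, part) {
    if (st == '99') stop('no data for this stop')
    'Done!'
}

run_for_stops(fake_model, '1', '0', c('12', '99'), 'post')
```

The resulting data frame has one row per stop, which makes it easy to see which stops returned `'Done!'` and which were skipped or failed.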
