## Introduction
This notebook provides an interactive environment for training a predictive model to estimate patient length of stay (LOS). This workflow is particularly suited for healthcare professionals and researchers interested in ICU resource planning and patient flow management.

You only need to edit the code blocks that begin with the following comment:

In [None]:
### ‼️User Action Required

All other blocks should run without modification.

Warnings and notices are marked with ⚠️.

### Package Installation
The cell below checks whether each required R package is available and installs any that are missing.
These packages are essential for data preprocessing, imputation, model training, ensemble learning, and performance evaluation:
- `caret`: Core package for building and tuning predictive models.
- `caretEnsemble`: Combines multiple caret models into an ensemble for improved accuracy.
- `tidyverse`: Collection of packages for data manipulation, visualization, and general workflow.
- `MLmetrics`: Provides machine learning evaluation metrics (e.g., MAE, RMSE).
- `ranger`: Fast implementation of Random Forests, useful for training tree-based models efficiently.
- `DescTools`: Supplies `PairApply()` and `CramerV()`, used here to measure association between categorical predictors.
- `mice`: Multiple imputation by chained equations, used here to impute missing values.

⚠️ You only need to run this block once per session or when setting up a new environment.
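The same install-if-missing check can be written once and applied to a vector of package names. The sketch below is one possible form, not the notebook's own code: `ensure_packages()` is a hypothetical helper name, and the demo call uses two packages that ship with R so that nothing is downloaded.

```r
# Hypothetical helper (not part of the notebook): report, and optionally
# install, any packages from `pkgs` that are not yet available.
ensure_packages <- function(pkgs, install = TRUE) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (install && length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)
}

# Demo with packages bundled with R, so no installation is triggered
demo_pkgs <- c("stats", "utils")
still_missing <- ensure_packages(demo_pkgs, install = FALSE)

# Attach the packages once they are known to be present
invisible(lapply(demo_pkgs, library, character.only = TRUE))
```

For this notebook, the vector would hold the seven package names listed above.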


In [1]:
if (!require("caret")) {install.packages("caret", dependencies = TRUE) ; library(caret)}
if (!require("caretEnsemble")) {install.packages("caretEnsemble", dependencies = TRUE) ; library(caretEnsemble)}
if (!require("tidyverse")) {install.packages("tidyverse") ; library(tidyverse)}
if (!require("MLmetrics")) {install.packages("MLmetrics") ; library(MLmetrics)}
if (!require("ranger")) {install.packages("ranger") ; library(ranger)}
if (!require("DescTools")) {install.packages("DescTools") ; library(DescTools)}
if (!require("mice")) {install.packages("mice") ; library(mice)}

Loading required package: caret

Loading required package: ggplot2

"package 'ggplot2' was built under R version 4.4.2"
Loading required package: lattice

Loading required package: caretEnsemble

Loading required package: tidyverse

── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

package 'rootSolve' successfully unpacked and MD5 sums checked
package 'lmom' successfully unpacked and MD5 sums checked
package 'expm' successfully unpacked and MD5 sums checked
package 'Exact' successfully unpacked and MD5 sums checked
package 'gld' successfully unpacked and MD5 sums checked
package 'DescTools' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\joana\AppData\Local\Temp\RtmpigiNqe\downloaded_packages

"package 'DescTools' was built under R version 4.4.3"

Attaching package: 'DescTools'

The following objects are masked from 'package:MLmetrics':

    AUC, Gini, MAE, MAPE, MSE, RMSE

The following objects are masked from 'package:caret':

    MAE, RMSE

Loading required package: mice

"there is no package called 'mice'"
Installing package into 'C:/Users/joana/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)

also installing the dependencies 'ucminf', 'ordinal', 'pan', 'jomo', 'glmnet', 'mitml'

  There is a binary version available but the source version is later:
       binary source needs_compilation
glmnet  4.1-8  4.1-9              TRUE

package 'ucminf' successfully unpacked and MD5 sums checked
package 'ordinal' successfully unpacked and MD5 sums checked
package 'pan' successfully unpacked and MD5 sums checked
package 'jomo' successfully unpacked and MD5 sums checked
package 'mitml' successfully unpacked and MD5 sums checked
package 'mice' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\joana\AppData\Local\Temp\RtmpigiNqe\downloaded_packages

installing the source package 'glmnet'

"package 'mice' was built under R version 4.4.3"

Attaching package: 'mice'

The following object is masked from 'package:stats':

    filter

The following objects are masked from 'package:base':

    cbind, rbind




### Load and Validate User Dataset

1. Set `data_path` to the **path to the data** you want to train the model on. The file can be in `.csv` or `.RData` format. If it is an `.RData` file, also set `object_name` to the name of the object stored in it.

2. If you want to use your **own predictors**, set `predictors` to the path of a `.csv` file that lists their names. The loading cell reads the names from the second column of that file.

Otherwise, leave `predictors` pointing to `predictors.csv`, which lists the predictor variables used during the original model training.

⚠️ If you are not using your own predictors, make sure your dataset includes all required predictors listed in `predictors.csv`, as well as the target variable `UnitLengthStay_trunc`.
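The loading cell takes the predictor names from the second column of the `.csv` file. A minimal sketch of a file with that layout follows, assuming it is produced with `write.csv()` and its default row names (which occupy the first column). The three names are taken from the summary output later in this notebook and are used purely for illustration.

```r
# Illustrative predictor names; replace them with the columns of your own dataset
my_predictors <- data.frame(
  predictor = c("Gender", "IsDementia", "ChfNyha"),
  stringsAsFactors = FALSE
)

# write.csv() writes row names by default, so the names end up in column 2
predictors_file <- tempfile(fileext = ".csv")
write.csv(my_predictors, predictors_file)

# Read the file back the same way the loading cell does
read_back <- read.csv(predictors_file, stringsAsFactors = FALSE)
predictor_names <- read_back[, 2]
print(predictor_names)
```

A file written with `row.names = FALSE` would hold the names in column 1 instead, so selecting column 2 would fail with an "undefined columns selected" error.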

In [12]:
### ‼️User Action Required

#data_path = "YOUR_PATH"
data_path = "C:\\Users\\joana\\Documentos\\SLOS\\SLOS retraining\\SampledData.RData"
object_name = "sampled_data"
predictors = "C:\\Users\\joana\\Documentos\\SLOS\\SLOS retraining\\predictors.csv"

In [13]:
if (grepl("\\.csv$", data_path, ignore.case = TRUE)) {
  user_data <- read.csv(data_path)
} else if (grepl("\\.RData$", data_path, ignore.case = TRUE)) {
  load(data_path)
  if (!exists(object_name)) {
    stop(paste("The .RData file does not contain an object named", object_name))
  }
  user_data <- get(object_name)
} else {
  stop("Unsupported file type. Please upload a .csv or .RData file.")
}

predictors <- read.csv(predictors)
predictors <- predictors[,2]  

if (!all(predictors %in% names(user_data))) {
  stop("Some required predictors are missing in your dataset.")
}

user_data <- user_data %>%
  select(all_of(predictors), UnitLengthStay_trunc)
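
Because `load()` restores objects into the current workspace, an object left over from an earlier run can satisfy the `exists(object_name)` check, and a stored object named `data_path` or `predictors` would overwrite the settings above. One way to avoid both, sketched here with a hypothetical helper (`load_named_object()` is not part of the notebook), is to load the file into its own environment:

```r
# Hypothetical helper: load an .RData file into a private environment and
# return only the requested object.
load_named_object <- function(path, object_name) {
  env <- new.env()
  load(path, envir = env)
  if (!exists(object_name, envir = env, inherits = FALSE)) {
    stop("The .RData file does not contain an object named ", object_name)
  }
  get(object_name, envir = env, inherits = FALSE)
}

# Demo with a temporary file standing in for the user's dataset
demo_file <- tempfile(fileext = ".RData")
sampled_demo <- data.frame(id = 1:3)
save(sampled_demo, file = demo_file)
rm(sampled_demo)

user_data_demo <- load_named_object(demo_file, "sampled_demo")
print(user_data_demo)
```

With `inherits = FALSE`, only objects that actually came from the file are considered, so a stale object in the session cannot mask a missing one.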

In [65]:
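# Debugging check: `testing` is created in the pre-processing cell below, so run this only afterwards
if (exists("testing")) unique(testing$AdmissionTypeName)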

### Data Pre-processing
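The cell below first creates the 80/20 train/test split. It then removes zero- and near-zero-variance features, drops highly correlated predictors (absolute pairwise correlation above 0.75 for numeric features, Cramér's V above 0.5 for categorical features), and imputes missing data with the MICE algorithm.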
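
To make the two less familiar filters concrete, here is a base-R sketch of the quantities behind them. It is a simplified illustration, not the `caret` or `DescTools` implementation: `nzv_metrics()` and `cramers_v()` are hypothetical names, and edge cases such as constant columns are handled by a convention chosen for this sketch. A feature is typically flagged as near-zero variance when its frequency ratio is high and its percentage of unique values is low. Cramér's V is sqrt(chi-squared / (n * (k - 1))), where k is the smaller of the number of rows and columns of the contingency table.

```r
# Frequency ratio and percentage of unique values for one feature
nzv_metrics <- function(x) {
  vals <- x[!is.na(x)]
  counts <- sort(tabulate(match(vals, unique(vals))), decreasing = TRUE)
  # Ratio of the most common value's count to the second most common;
  # Inf for a constant feature is a convention chosen for this sketch
  freq_ratio <- if (length(counts) >= 2) counts[1] / counts[2] else Inf
  percent_unique <- 100 * length(counts) / length(x)
  c(freqRatio = freq_ratio, percentUnique = percent_unique)
}

# Cramér's V for two categorical vectors
cramers_v <- function(a, b) {
  tab <- table(a, b)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  k <- min(nrow(tab), ncol(tab))
  sqrt(as.numeric(chi2) / (sum(tab) * (k - 1)))
}

# A flag that is 1 in only 1 of 100 rows: frequency ratio 99, 2% unique values
rare_flag <- c(rep(0, 99), 1)
m <- nzv_metrics(rare_flag)
# 100/2 is the notebook's freqCut; 10 is an assumed percent-unique threshold
flagged <- m[["freqRatio"]] > 100 / 2 && m[["percentUnique"]] <= 10

# Identical factors are perfectly associated; crossed ones are independent
g <- rep(c("F", "M"), each = 20)
h <- rep(c("yes", "no"), times = 20)
v_same <- cramers_v(g, g)
v_indep <- cramers_v(g, h)
```

With `freqCut = 100/2`, a feature is a removal candidate only when its most common value is more than 50 times as frequent as the runner-up.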

In [67]:
set.seed(998)
inTraining <- createDataPartition(user_data$UnitLengthStay,
                                  p = .8, list = FALSE)
training <- user_data[ inTraining,]
training_dummy <- training
testing  <- user_data[-inTraining,]
testing_dummy <- testing

# Identifying and removing zero- and near-zero-variance features
nzv = nearZeroVar(training, saveMetrics = TRUE, freqCut = 100/2)
nzv["Variaveis"] = row.names(nzv)
descritiva_nzv = nzv %>%
  filter(nzv == TRUE) %>%
  select(Variaveis, freqRatio, percentUnique)
retirados_nzv = descritiva_nzv$Variaveis

# all_of() makes the use of an external character vector explicit
training = training %>%
  select(-all_of(retirados_nzv))
testing = testing %>%
  select(-all_of(retirados_nzv))

# Identifying and Removing Correlated Predictors (for numeric features)
training_pre_numeric = training %>%
  select_if(., is.numeric)
training_pre_numeric$UnitLengthStay = NULL
descrCor <-  cor(training_pre_numeric, 
                 use="pairwise.complete.obs")

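highlyCorDescr <- findCorrelation(descrCor, cutoff = .75)
# Index the column names directly so zero or one flagged column is handled
retirados_cor = colnames(training_pre_numeric)[highlyCorDescr]
# Select by name: `[, -integer(0)]` would drop every column when nothing is flagged
kept_numeric = setdiff(colnames(training_pre_numeric), retirados_cor)
training_pre_numeric = training_pre_numeric[, kept_numeric, drop = FALSE]

testing_pre_numeric = testing %>%
  select_if(., is.numeric)
testing_pre_numeric$UnitLengthStay = NULL
# Keep the same columns as in the training set
testing_pre_numeric = testing_pre_numeric[, kept_numeric, drop = FALSE]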


# Identifying and Removing Correlated Predictors (for categorical features)
training_pre_factor = training %>%
  select_if(., is.factor)
cramer_tab = PairApply(training_pre_factor,
                       CramerV, symmetric = TRUE)
# Pairs for which Cramér's V is undefined are treated as uncorrelated
cramer_tab[is.na(cramer_tab)] = 0

highlyCorCateg <- findCorrelation(cramer_tab, cutoff = 0.5)
# Index the names directly so a single flagged column still yields its name
retirados_categ = colnames(training_pre_factor)[highlyCorCateg]
training_pre_factor = training_pre_factor %>%
  select(-all_of(retirados_categ))

testing_pre_factor = testing %>%
  select_if(., is.factor)
testing_pre_factor = testing_pre_factor %>%
  select(-all_of(retirados_categ))

# Recombine the filtered numeric and categorical predictors with the outcome
training = cbind(training_pre_numeric, training_pre_factor,
                 UnitLengthStay = training$UnitLengthStay)

testing = cbind(testing_pre_numeric, testing_pre_factor,
                UnitLengthStay = testing$UnitLengthStay)


#MICE Imputation
training_imp = training
testing_imp = testing

  #training
set.seed(100)
predictormatrix = quickpred(training_imp,
                          include = c("UnitLengthStay"),
                          exclude = NULL,
                          mincor = 0.3)
imp_gen = mice(data = training_imp,
               predictorMatrix = predictormatrix,
               m=1,
               maxit = 5,
               diagnostics=TRUE)

imp_data = mice::complete(imp_gen,1)
training_imp = imp_data
summary(training_imp)
training_imp$UnitLengthStay_trunc <- training_dummy$UnitLengthStay_trunc
training <- training_imp


  #testing
set.seed(100)
predictormatrix = quickpred(testing_imp,
                            include = c("UnitLengthStay"),
                            exclude = NULL,
                            mincor = 0.3)
imp_gen_test = mice(data = testing_imp,
               predictorMatrix = predictormatrix,
               m=1,
               maxit = 5,
               diagnostics=TRUE)
imp_data_test = mice::complete(imp_gen_test,1)
testing_imp = imp_data_test
summary(testing_imp)
testing_imp$UnitLengthStay_trunc <- testing_dummy$UnitLengthStay_trunc
testing <- testing_imp


 iter imp variable
  1   1
  2   1
  3   1
  4   1
  5   1


 IsMechanicalVentilation IsVasopressors
 0:776                   0:764         
 1: 24                   1: 36         
                                       
                                       
                                       
                                       
                       AdmissionSourceName         AdmissionTypeName
 Cardiovascular intervention room: 18      Emergency surgery: 44    
 Emergency room                  :553      Medical          :676    
 Operating room                  : 81      Scheduled surgery: 79    
 Other                           : 37      Surgical         :  1    
 Other unit at your hospital     : 76                               
 Ward/Floor                      : 35                               
 IsNonInvasiveVentilation IsRespiratoryFailure IsDementia    ChfNyha    Gender 
 0:748                    0:764                0:734      Class23: 40   F:453  
 1: 52                    1: 36                1: 66      Class4 : 12   M:347


 iter imp variable
  1   1
  2   1
  3   1
  4   1
  5   1


 IsMechanicalVentilation IsVasopressors
 0:192                   0:192         
 1:  8                   1:  8         
                                       
                                       
                                       
                                       
                       AdmissionSourceName         AdmissionTypeName
 Cardiovascular intervention room:  5      Emergency surgery:  6    
 Emergency room                  :143      Medical          :169    
 Operating room                  : 17      Scheduled surgery: 25    
 Other                           :  6      Surgical         :  0    
 Other unit at your hospital     : 23                               
 Ward/Floor                      :  6                               
 IsNonInvasiveVentilation IsRespiratoryFailure IsDementia    ChfNyha    Gender 
 0:186                    0:192                0:183      Class23: 13   F: 94  
 1: 14                    1:  8                1: 17      Class4 :  3   M:106

### Model Training
This section covers the training pipeline, from the cross-validation setup to building an ensemble model with `caretStack()`.

**Steps**:
1. **Train-Test Split**

- A reproducible 80/20 split is created with `createDataPartition()` at the start of the pre-processing cell above.

2. **Cross-Validation Setup**

- `trainControl()` defines a 5-fold cross-validation strategy with progress output (`verboseIter = TRUE`) and retains the held-out predictions for the best tuning parameters (`savePredictions = "final"`).

3. **Model Training with caretList**

- Two base models are trained with `caretList()`: linear regression (`lm`) and a random forest (`ranger`) tuned over `mtry` values from 5 to 10, with `splitrule` and `min.node.size` held fixed. The models are stored in `model_list` and saved as `"user_model_list.RData"`.

4. **Model Stacking with caretStack**

- A stacked ensemble is built from the base learners with `caretStack()`. A second-level random forest (with its own tuning grid) combines their predictions. The final stacked model is saved as `"user_trained_SLOS_model.RData"`.

Saving both files means the individual models and the stacked model can be reused or deployed later.
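
One thing to check before training: a factor level that occurs in only a handful of training rows can be absent from the training portion of a cross-validation fold, and `lm` cannot predict for a held-out row whose level it has never seen. The sketch below, with the hypothetical helper `find_rare_levels()` and a made-up demo data frame, lists such levels and shows one possible remedy (dropping the affected rows). Merging the level into a clinically appropriate neighbour is an alternative. The `min_count` threshold is an assumption, not a value taken from the notebook.

```r
# Hypothetical helper: factor levels that occur in fewer than `min_count` rows
find_rare_levels <- function(df, min_count = 5) {
  factor_cols <- names(df)[vapply(df, is.factor, logical(1))]
  rare <- lapply(factor_cols, function(col) {
    counts <- table(df[[col]])
    names(counts)[counts > 0 & counts < min_count]
  })
  names(rare) <- factor_cols
  # Keep only the columns that actually have rare levels
  rare[lengths(rare) > 0]
}

# Made-up data with one level that appears in a single row
demo <- data.frame(
  AdmissionTypeName = factor(c(rep("Medical", 8),
                               rep("Scheduled surgery", 6),
                               "Surgical")),
  Gender = factor(rep(c("F", "M", "F"), times = 5))
)
rare <- find_rare_levels(demo, min_count = 5)
print(rare)

# One possible remedy: drop the affected rows, then the now-empty level
keep <- !(demo$AdmissionTypeName %in% rare$AdmissionTypeName)
demo_clean <- droplevels(demo[keep, , drop = FALSE])
```

Running the check on `training` before calling `caretList()` shows whether any fold is at risk of producing missing predictions for the linear model.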



In [46]:
fitControl <- trainControl(
  method = "cv", 
  number = 5, 
  verboseIter = TRUE, 
  returnData = FALSE,
  trim = TRUE,
  savePredictions = "final"
)

In [47]:
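model_list <- caretList(
  # Exclude both outcome columns by name; dropping only the last column can leave one of them in x
  x = training[, setdiff(names(training), c("UnitLengthStay", "UnitLengthStay_trunc"))],
  y = training$UnitLengthStay_trunc,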
  trControl = fitControl,
  metric = "RMSE",
  tuneList = list(
    lm = caretModelSpec(method = "lm"),
    rf = caretModelSpec(method = "ranger", tuneGrid = data.frame(
      .mtry = c(5:10),
      .splitrule = "variance",
      .min.node.size = 5
    ))
  )
)

save(model_list, file = "user_model_list.RData")

+ Fold1: intercept=TRUE 
- Fold1: intercept=TRUE 
+ Fold2: intercept=TRUE 
predictions failed for Fold2: intercept=TRUE Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor 'AdmissionTypeName' has new levels Surgical
- Fold2: intercept=TRUE 
+ Fold3: intercept=TRUE 
- Fold3: intercept=TRUE 
+ Fold4: intercept=TRUE 
- Fold4: intercept=TRUE 
+ Fold5: intercept=TRUE 
- Fold5: intercept=TRUE 


"There were missing values in resampled performance measures."


Aggregating results
Fitting final model on full training set
+ Fold1: mtry= 5, splitrule=variance, min.node.size=5 
- Fold1: mtry= 5, splitrule=variance, min.node.size=5 
+ Fold1: mtry= 6, splitrule=variance, min.node.size=5 
- Fold1: mtry= 6, splitrule=variance, min.node.size=5 
+ Fold1: mtry= 7, splitrule=variance, min.node.size=5 
- Fold1: mtry= 7, splitrule=variance, min.node.size=5 
+ Fold1: mtry= 8, splitrule=variance, min.node.size=5 
- Fold1: mtry= 8, splitrule=variance, min.node.size=5 
+ Fold1: mtry= 9, splitrule=variance, min.node.size=5 
- Fold1: mtry= 9, splitrule=variance, min.node.size=5 
+ Fold1: mtry=10, splitrule=variance, min.node.size=5 
- Fold1: mtry=10, splitrule=variance, min.node.size=5 
+ Fold2: mtry= 5, splitrule=variance, min.node.size=5 
- Fold2: mtry= 5, splitrule=variance, min.node.size=5 
+ Fold2: mtry= 6, splitrule=variance, min.node.size=5 
- Fold2: mtry= 6, splitrule=variance, min.node.size=5 
+ Fold2: mtry= 7, splitrule=variance, min.node.size=5 
- Fo

In [48]:
rfGrid <- expand.grid(
  mtry = 2,
  min.node.size = c(5,10,15,20),
  splitrule = c("variance", "extratrees", "maxstat")
)

stacked_model <- caretStack(
  model_list,
  trControl = fitControl,
  metric = "RMSE",
  method = "ranger",
  tuneGrid = rfGrid
)

save(stacked_model, file = "user_trained_SLOS_model.RData")

+ Fold1: mtry=2, min.node.size= 5, splitrule=variance 
model fit failed for Fold1: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size= 5, splitrule=variance 
+ Fold1: mtry=2, min.node.size=10, splitrule=variance 
model fit failed for Fold1: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=10, splitrule=variance 
+ Fold1: mtry=2, min.node.size=15, splitrule=variance 
model fit failed for Fold1: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=15, splitrule=variance 
+ Fold1: mtry=2, min.node.size=20, splitrule=variance 
model fit failed for Fold1: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=20, splitrule=variance 
+ Fold1: mtry=2, min.node.size= 5, splitrule=extratrees 
model fit failed for Fold1: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size= 5, splitrule=extratrees 
+ Fold1: mtry=2, min.node.size=10, splitrule=extratrees 
model fit failed for Fold1: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=10, splitrule=extratrees 
+ Fold1: mtry=2, min.node.size=15, splitrule=extratrees 
model fit failed for Fold1: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=15, splitrule=extratrees 
+ Fold1: mtry=2, min.node.size=20, splitrule=extratrees 
model fit failed for Fold1: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=20, splitrule=extratrees 
+ Fold1: mtry=2, min.node.size= 5, splitrule=maxstat 
model fit failed for Fold1: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size= 5, splitrule=maxstat 
+ Fold1: mtry=2, min.node.size=10, splitrule=maxstat 
model fit failed for Fold1: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=10, splitrule=maxstat 
+ Fold1: mtry=2, min.node.size=15, splitrule=maxstat 
model fit failed for Fold1: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=15, splitrule=maxstat 
+ Fold1: mtry=2, min.node.size=20, splitrule=maxstat 
model fit failed for Fold1: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold1: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold1: mtry=2, min.node.size=20, splitrule=maxstat 
+ Fold2: mtry=2, min.node.size= 5, splitrule=variance 
model fit failed for Fold2: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size= 5, splitrule=variance 
+ Fold2: mtry=2, min.node.size=10, splitrule=variance 
model fit failed for Fold2: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=10, splitrule=variance 
+ Fold2: mtry=2, min.node.size=15, splitrule=variance 
model fit failed for Fold2: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=15, splitrule=variance 
+ Fold2: mtry=2, min.node.size=20, splitrule=variance 
model fit failed for Fold2: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=20, splitrule=variance 
+ Fold2: mtry=2, min.node.size= 5, splitrule=extratrees 
model fit failed for Fold2: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size= 5, splitrule=extratrees 
+ Fold2: mtry=2, min.node.size=10, splitrule=extratrees 
model fit failed for Fold2: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=10, splitrule=extratrees 
+ Fold2: mtry=2, min.node.size=15, splitrule=extratrees 
model fit failed for Fold2: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=15, splitrule=extratrees 
+ Fold2: mtry=2, min.node.size=20, splitrule=extratrees 
model fit failed for Fold2: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=20, splitrule=extratrees 
+ Fold2: mtry=2, min.node.size= 5, splitrule=maxstat 
model fit failed for Fold2: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size= 5, splitrule=maxstat 
+ Fold2: mtry=2, min.node.size=10, splitrule=maxstat 
model fit failed for Fold2: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=10, splitrule=maxstat 
+ Fold2: mtry=2, min.node.size=15, splitrule=maxstat 
model fit failed for Fold2: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=15, splitrule=maxstat 
+ Fold2: mtry=2, min.node.size=20, splitrule=maxstat 
model fit failed for Fold2: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold2: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold2: mtry=2, min.node.size=20, splitrule=maxstat 
+ Fold3: mtry=2, min.node.size= 5, splitrule=variance 
model fit failed for Fold3: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size= 5, splitrule=variance 
+ Fold3: mtry=2, min.node.size=10, splitrule=variance 
model fit failed for Fold3: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=10, splitrule=variance 
+ Fold3: mtry=2, min.node.size=15, splitrule=variance 
model fit failed for Fold3: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=15, splitrule=variance 
+ Fold3: mtry=2, min.node.size=20, splitrule=variance 
model fit failed for Fold3: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=20, splitrule=variance 
+ Fold3: mtry=2, min.node.size= 5, splitrule=extratrees 
model fit failed for Fold3: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size= 5, splitrule=extratrees 
+ Fold3: mtry=2, min.node.size=10, splitrule=extratrees 
model fit failed for Fold3: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=10, splitrule=extratrees 
+ Fold3: mtry=2, min.node.size=15, splitrule=extratrees 
model fit failed for Fold3: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=15, splitrule=extratrees 
+ Fold3: mtry=2, min.node.size=20, splitrule=extratrees 
model fit failed for Fold3: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=20, splitrule=extratrees 
+ Fold3: mtry=2, min.node.size= 5, splitrule=maxstat 
model fit failed for Fold3: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size= 5, splitrule=maxstat 
+ Fold3: mtry=2, min.node.size=10, splitrule=maxstat 
model fit failed for Fold3: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=10, splitrule=maxstat 
+ Fold3: mtry=2, min.node.size=15, splitrule=maxstat 
model fit failed for Fold3: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=15, splitrule=maxstat 
+ Fold3: mtry=2, min.node.size=20, splitrule=maxstat 
model fit failed for Fold3: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold3: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold3: mtry=2, min.node.size=20, splitrule=maxstat 
+ Fold4: mtry=2, min.node.size= 5, splitrule=variance 
model fit failed for Fold4: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size= 5, splitrule=variance 
+ Fold4: mtry=2, min.node.size=10, splitrule=variance 
model fit failed for Fold4: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=10, splitrule=variance 
+ Fold4: mtry=2, min.node.size=15, splitrule=variance 
model fit failed for Fold4: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=15, splitrule=variance 
+ Fold4: mtry=2, min.node.size=20, splitrule=variance 
model fit failed for Fold4: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=20, splitrule=variance 
+ Fold4: mtry=2, min.node.size= 5, splitrule=extratrees 
model fit failed for Fold4: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size= 5, splitrule=extratrees 
+ Fold4: mtry=2, min.node.size=10, splitrule=extratrees 
model fit failed for Fold4: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=10, splitrule=extratrees 
+ Fold4: mtry=2, min.node.size=15, splitrule=extratrees 
model fit failed for Fold4: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=15, splitrule=extratrees 
+ Fold4: mtry=2, min.node.size=20, splitrule=extratrees 
model fit failed for Fold4: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=20, splitrule=extratrees 
+ Fold4: mtry=2, min.node.size= 5, splitrule=maxstat 
model fit failed for Fold4: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size= 5, splitrule=maxstat 
+ Fold4: mtry=2, min.node.size=10, splitrule=maxstat 
model fit failed for Fold4: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=10, splitrule=maxstat 
+ Fold4: mtry=2, min.node.size=15, splitrule=maxstat 
model fit failed for Fold4: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
 


"model fit failed for Fold4: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
"


- Fold4: mtry=2, min.node.size=15, splitrule=maxstat 
+ Fold4: mtry=2, min.node.size=20, splitrule=maxstat 
model fit failed for Fold4: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
- Fold4: mtry=2, min.node.size=20, splitrule=maxstat 
+ Fold5: mtry=2, min.node.size= 5, splitrule=variance 
model fit failed for Fold5: mtry=2, min.node.size= 5, splitrule=variance Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size= 5, splitrule=variance 
+ Fold5: mtry=2, min.node.size=10, splitrule=variance 
model fit failed for Fold5: mtry=2, min.node.size=10, splitrule=variance Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=10, splitrule=variance 
+ Fold5: mtry=2, min.node.size=15, splitrule=variance 
model fit failed for Fold5: mtry=2, min.node.size=15, splitrule=variance Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=15, splitrule=variance 
+ Fold5: mtry=2, min.node.size=20, splitrule=variance 
model fit failed for Fold5: mtry=2, min.node.size=20, splitrule=variance Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=20, splitrule=variance 
+ Fold5: mtry=2, min.node.size= 5, splitrule=extratrees 
model fit failed for Fold5: mtry=2, min.node.size= 5, splitrule=extratrees Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size= 5, splitrule=extratrees 
+ Fold5: mtry=2, min.node.size=10, splitrule=extratrees 
model fit failed for Fold5: mtry=2, min.node.size=10, splitrule=extratrees Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=10, splitrule=extratrees 
+ Fold5: mtry=2, min.node.size=15, splitrule=extratrees 
model fit failed for Fold5: mtry=2, min.node.size=15, splitrule=extratrees Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=15, splitrule=extratrees 
+ Fold5: mtry=2, min.node.size=20, splitrule=extratrees 
model fit failed for Fold5: mtry=2, min.node.size=20, splitrule=extratrees Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=20, splitrule=extratrees 
+ Fold5: mtry=2, min.node.size= 5, splitrule=maxstat 
model fit failed for Fold5: mtry=2, min.node.size= 5, splitrule=maxstat Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size= 5, splitrule=maxstat 
+ Fold5: mtry=2, min.node.size=10, splitrule=maxstat 
model fit failed for Fold5: mtry=2, min.node.size=10, splitrule=maxstat Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=10, splitrule=maxstat 
+ Fold5: mtry=2, min.node.size=15, splitrule=maxstat 
model fit failed for Fold5: mtry=2, min.node.size=15, splitrule=maxstat Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=15, splitrule=maxstat 
+ Fold5: mtry=2, min.node.size=20, splitrule=maxstat 
model fit failed for Fold5: mtry=2, min.node.size=20, splitrule=maxstat Error : Missing data in columns: lm.
- Fold5: mtry=2, min.node.size=20, splitrule=maxstat 


"There were missing values in resampled performance measures."


Aggregating results
Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA   Max.   : NA  
 NA's   :12    NA's   :12    NA's   :12   


ERROR: Error: Stopping


In [54]:
train(
  x = training[, !(names(training) %in% c("UnitLengthStay_trunc"))],
  y = training$UnitLengthStay_trunc,
  method = "lm"
)


"predictions failed for Resample01: intercept=TRUE Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor 'AdmissionTypeName' has new levels Surgical
"
"predictions failed for Resample02: intercept=TRUE Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor 'AdmissionTypeName' has new levels Surgical
"
"predictions failed for Resample05: intercept=TRUE Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor 'AdmissionTypeName' has new levels Surgical
"
"predictions failed for Resample10: intercept=TRUE Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor 'AdmissionTypeName' has new levels Surgical
"
"predictions failed for Resample11: intercept=TRUE Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  factor 'AdmissionTypeName' has new levels Surgic

Linear Regression 

800 samples
 18 predictor

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 800, 800, 800, 800, 800, 800, ... 
Resampling results:

  RMSE          Rsquared  MAE         
  5.126018e-15  1         3.453253e-15

Tuning parameter 'intercept' was held constant at a value of TRUE

### Model Prediction and Evaluation
This section generates predictions with the trained stacked model and evaluates them against the observed values in the test set.

1. The `predict(stacked_model, newdata = testing)` call generates **predictions** for the test set using the stacked model.

2. We **evaluate** the trained model's performance on three metrics:

- Root Mean Squared Error (RMSE): The square root of the mean squared prediction error. It penalizes large errors more heavily than small ones.

- Mean Absolute Error (MAE): The mean of the absolute prediction errors, expressed in the same units as the target.

- R-squared (R2): The proportion of the variance in the dependent variable that is predictable from the independent variables.

`RMSE()` and `MAE()` are available in the `MLmetrics` package; `R2()` is provided by `caret`.
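To make the three definitions above concrete, the following is a minimal base-R sketch that computes each metric by hand. The vectors `observed` and `predicted` are made-up toy values, not data from this notebook, and the variable names are illustrative only. R-squared is shown in two common forms, because the squared-correlation form and the `1 - SS_res / SS_tot` form can give different values for the same predictions.

```r
# Toy vectors (hypothetical values chosen for illustration only)
observed  <- c(2, 4, 6, 8)
predicted <- c(3, 4, 5, 9)

# Prediction errors
errors <- predicted - observed

# Root Mean Squared Error: square root of the mean squared error
rmse_manual <- sqrt(mean(errors^2))

# Mean Absolute Error: mean of the absolute errors
mae_manual <- mean(abs(errors))

# R-squared, form 1: squared Pearson correlation between predictions and observations
r2_corr <- cor(predicted, observed)^2

# R-squared, form 2: 1 - residual sum of squares / total sum of squares
r2_trad <- 1 - sum(errors^2) / sum((observed - mean(observed))^2)

cat("RMSE:", rmse_manual, "\n")
cat("MAE:", mae_manual, "\n")
cat("R2 (squared correlation):", r2_corr, "\n")
cat("R2 (1 - SS_res / SS_tot):", r2_trad, "\n")
```

Comparing hand-computed values like these with the package functions on a small example is a quick way to check which R-squared definition a given function uses.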

In [49]:
predictions <- predict(stacked_model, newdata = testing)

In [50]:
testing$UnitLengthStay_trunc

In [51]:
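# Compare the stacked model's predictions with the observed length of stay.
# Results use lowercase names so they do not shadow the metric functions.
observed <- testing$UnitLengthStay_trunc
rmse <- RMSE(predictions$pred, observed)
mae <- MAE(predictions$pred, observed)
r2 <- R2(predictions$pred, observed)
cat("RMSE:", rmse, "\n")
cat("MAE:", mae, "\n")
cat("R2:", r2, "\n")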

RMSE: 0.06773893 
MAE: 0.02725363 
R2: 0.9997469 
