What does the perform_regression function do?
- Splits the data into a training set and a testing set (e.g. 80%/20%, set by split)
- Runs leave-one-out cross-validation (LOOCV) on the training set to choose the best hyperparameter: for forward/backward stepwise regression it is the maximum number of predictors allowed in the final model (nvmax, searched over data.frame(nvmax = 1:46)); for ridge and lasso regression it is the penalty lambda (searched over lambda = 10^seq(-3, 3, length = 100))
- Predicts the testing set with the best model chosen on the training set (the one with the lowest RMSE)
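The three steps above can be sketched in base R for the ridge case. This is a hypothetical illustration, not the author's perform_regression: ridge is solved in closed form on centred data (a library such as glmnet scales the penalty differently, so the selected lambda values are not comparable), the response is assumed to be a numeric column named y, and ridge_fit, ridge_predict, loocv_rmse and ridge_pipeline are illustrative names. Only the lambda grid comes from the description above.

```r
# Closed-form ridge on centred data, so the intercept is not penalised.
# ASSUMPTION: this replaces whatever fitting routine the original function uses.
ridge_fit <- function(X, y, lambda) {
  x_means <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_means)                        # centre each predictor
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(X)),
                crossprod(Xc, y - y_mean))
  beta <- drop(beta)
  list(beta = beta, intercept = y_mean - sum(x_means * beta))
}

ridge_predict <- function(fit, X) drop(X %*% fit$beta) + fit$intercept

# LOOCV: leave each training row out once, refit, predict that row.
loocv_rmse <- function(X, y, lambda) {
  errs <- vapply(seq_len(nrow(X)), function(i) {
    fit <- ridge_fit(X[-i, , drop = FALSE], y[-i], lambda)
    y[i] - ridge_predict(fit, X[i, , drop = FALSE])
  }, numeric(1))
  sqrt(mean(errs^2))
}

# Hypothetical pipeline: split by seed, tune lambda by LOOCV, predict the test set.
ridge_pipeline <- function(my_df, split = 0.8, seed = 1,
                           lambdas = 10^seq(-3, 3, length = 100)) {
  set.seed(seed)
  idx <- sample(nrow(my_df), floor(split * nrow(my_df)))   # training rows
  X <- as.matrix(my_df[, setdiff(names(my_df), "y"), drop = FALSE])
  y <- my_df$y
  # One LOOCV RMSE per candidate lambda, computed on the training set only
  cv <- vapply(lambdas, function(l)
    loocv_rmse(X[idx, , drop = FALSE], y[idx], l), numeric(1))
  best_lambda <- lambdas[which.min(cv)]                    # lowest RMSE wins
  # Refit on the whole training set with the best lambda, then predict the test set
  fit <- ridge_fit(X[idx, , drop = FALSE], y[idx], best_lambda)
  pred <- ridge_predict(fit, X[-idx, , drop = FALSE])
  list(best_lambda = best_lambda, coef = fit$beta,
       test_rmse = sqrt(mean((y[-idx] - pred)^2)))
}

set.seed(42)
demo <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
demo$y <- 2 * demo$x1 - demo$x2 + rnorm(50, sd = 0.5)
res <- ridge_pipeline(demo, split = 0.8, seed = 1)
print(res$best_lambda)
print(res$test_rmse)
```

Because LOOCV refits the model once per training row and per lambda, the cost grows with both the training size and the grid length: with 40 training rows and 100 lambdas this sketch runs 4,000 small fits before the final refit.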
INPUT
- my_df - the data
- split - the proportion of the data used as the training set (e.g. .80); the rest becomes the testing set
- type_regres - the regression method:
  - "leapForward" - forward stepwise regression
  - "leapBackward" - backward stepwise regression
  - "ridge" - ridge regression (shrinks the coefficients toward zero but never sets them exactly to 0)
  - "lasso" - lasso regression (can shrink the coefficients of some predictors exactly to zero, dropping them from the model)
- seed - the random seed for the split: the same seed reproduces the same split, different seeds give different splits
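The nvmax hyperparameter is easiest to picture through forward selection: start from an empty model, repeatedly add the predictor that most lowers the residual sum of squares, and stop after nvmax steps. The sketch below is a hypothetical base-R version (forward_path is an illustrative name); it does not call the leaps-based methods that, judging by their names, the leapForward and leapBackward options use. The models it builds are nested, so the model of size k is simply the first k predictors, and LOOCV then only has to compare nvmax = 1, 2, ... as described above. Backward stepwise is the mirror image: start from the full model and drop one predictor at a time.

```r
# Greedy forward selection capped at nvmax predictors (hypothetical helper).
forward_path <- function(df, response = "y", nvmax = ncol(df) - 1) {
  candidates <- setdiff(names(df), response)
  chosen <- character(0)
  for (k in seq_len(min(nvmax, length(candidates)))) {
    # RSS of the current model plus each remaining candidate
    rss <- vapply(candidates, function(v) {
      f <- reformulate(c(chosen, v), response = response)
      sum(residuals(lm(f, data = df))^2)
    }, numeric(1))
    best <- names(which.min(rss))          # predictor with the largest RSS drop
    chosen <- c(chosen, best)
    candidates <- setdiff(candidates, best)
  }
  chosen   # the model of size k uses the first k entries
}

set.seed(7)
demo <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                   x3 = rnorm(100), x4 = rnorm(100))
demo$y <- 3 * demo$x2 + 1.5 * demo$x4 + rnorm(100, sd = 0.3)
path <- forward_path(demo, nvmax = 2)
print(path)
```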
OUTPUT
- a data table with the RMSE, R and MSE values
- the coefficient values of the best model, i.e. the model used to predict the testing set
- the formula of the best model
- a plot of the important predictors
- a regression plot of the predictions on the testing set
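The test-set metrics in the output can be computed from the observed and predicted values as below. The column names follow the output description; reading R as the Pearson correlation between observed and predicted values is an assumption (it may instead denote R-squared), and test_metrics is a hypothetical helper.

```r
# Test-set metrics from observed and predicted values (hypothetical helper).
# ASSUMPTION: R is the Pearson correlation between observed and predicted.
test_metrics <- function(observed, predicted) {
  mse <- mean((observed - predicted)^2)     # mean squared error
  data.frame(RMSE = sqrt(mse),              # RMSE is the square root of MSE
             R = cor(observed, predicted),
             MSE = mse)
}

obs <- c(3, 5, 7, 9)
pred <- c(2.5, 5.5, 7, 8)
m <- test_metrics(obs, pred)
print(m)
```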
What does the function do?
- Runs the perform_regression function
- Repeats it Niteration times, each time with a different data split obtained by changing the seed
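The repetition can be sketched as a loop over seeds, where each seed produces a different training/testing split and the seeds are returned alongside the results so that every split can be reproduced. one_run is a hypothetical stand-in for perform_regression that fits a plain lm() instead of the stepwise, ridge or lasso models; repeat_runs and drawing the seeds with sample.int are likewise assumptions.

```r
# Hypothetical stand-in for perform_regression: plain lm(), test-set RMSE.
one_run <- function(mydtt, data_split, seed) {
  set.seed(seed)                                  # seed fixes this split
  idx <- sample(nrow(mydtt), floor(data_split * nrow(mydtt)))
  fit <- lm(y ~ ., data = mydtt[idx, ])
  pred <- predict(fit, newdata = mydtt[-idx, ])
  sqrt(mean((mydtt$y[-idx] - pred)^2))
}

# Repeat the run Niteration times, one distinct seed per repetition.
repeat_runs <- function(mydtt, Niteration, data_split = 0.8) {
  seeds <- sample.int(1e6, Niteration)            # drawn without replacement
  rmse <- vapply(seeds, function(s) one_run(mydtt, data_split, s), numeric(1))
  list(results = data.frame(seed = seeds, RMSE = rmse), seeds = seeds)
}

set.seed(123)
demo <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
demo$y <- demo$x1 + 0.5 * demo$x2 + rnorm(60, sd = 0.2)
out <- repeat_runs(demo, Niteration = 5)
print(out$results)
```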
INPUT
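- mydtt - the data
- Niteration - the number of repetitions, i.e. how many different random splits to run
- data_split - the proportion of the data used as the training set (e.g. .80); the rest becomes the testing set
- type_regres - the regression method:
  - "leapForward" - forward stepwise regression
  - "leapBackward" - backward stepwise regression
  - "ridge" - ridge regression (shrinks the coefficients toward zero but never sets them exactly to 0)
  - "lasso" - lasso regression (can shrink the coefficients of some predictors exactly to zero, dropping them from the model)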
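The difference between ridge and lasso noted in the options above shows up clearly in the textbook case of orthonormal predictors (X'X = I), where both estimates have closed forms in terms of the ordinary least squares coefficient: ridge divides it by (1 + lambda), which never reaches zero, while lasso soft-thresholds it, which sets small coefficients exactly to zero. This illustrates the two penalties under that simplifying assumption only; it is not what the fitting code does on real, correlated predictors, and ridge_shrink and lasso_shrink are hypothetical helpers.

```r
# Closed forms under the ASSUMPTION of orthonormal predictors (X'X = I).
# Ridge: minimise ||y - Xb||^2 + lambda * sum(b^2)        -> b_ols / (1 + lambda)
# Lasso: minimise ||y - Xb||^2 / 2 + lambda * sum(abs(b)) -> soft-thresholding
ridge_shrink <- function(b_ols, lambda) b_ols / (1 + lambda)
lasso_shrink <- function(b_ols, lambda) sign(b_ols) * pmax(abs(b_ols) - lambda, 0)

b_ols <- c(3, 0.8, -0.4, 0.1)
lambda <- 0.5
ridge_b <- ridge_shrink(b_ols, lambda)   # every coefficient shrinks, none hits 0
lasso_b <- lasso_shrink(b_ols, lambda)   # coefficients with |b| <= lambda become 0
print(rbind(ols = b_ols, ridge = ridge_b, lasso = lasso_b))
```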
OUTPUT
- a data table with the RMSE, R and MSE values
- the coefficient values of the best model that made the prediction on the testing set (one best model per repetition)
- the array of seeds used to split the data