R_prediction_models

This repository contains R scripts that build prediction models with four algorithms: logistic regression, regularized regression, random forest classification (binary), and random forest regression.

The scripts assume the class and feature data have been pre-processed and are saved as RDS files.
Each model is built with a validation set held out. The validation set is used only to report model performance.
The scripts use a bagging approach: multiple models are built, each on a random subset of the training instances. The final prediction score for a validation instance is reported as the mean across model repetitions.
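The hold-out and bagging steps described above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the code from the repository scripts: the object names (`features`, `class_labels`, `training_prop`, `n_reps`) and the use of plain `glm` are assumptions made for the example.

```r
# Minimal bagging sketch on simulated data (base R only).
# All object names are illustrative, not taken from the repository scripts.
set.seed(1)

n <- 200
features <- matrix(rnorm(n * 3), nrow = n,
                   dimnames = list(paste0("inst", seq_len(n)), paste0("feat", 1:3)))
# Binary class driven mostly by feat1
class_labels <- rbinom(n, 1, plogis(1.5 * features[, "feat1"]))

# Hold out a validation set that is never used for fitting
validation_ids <- sample(rownames(features), 40)
training_ids <- setdiff(rownames(features), validation_ids)

training_prop <- 0.8  # proportion of training instances used per repetition
n_reps <- 25          # number of model repetitions

dat <- data.frame(y = class_labels, features, row.names = rownames(features))

# One column of validation scores per repetition
scores <- sapply(seq_len(n_reps), function(i) {
  subset_ids <- sample(training_ids, round(training_prop * length(training_ids)))
  fit <- glm(y ~ ., data = dat[subset_ids, ], family = binomial)
  predict(fit, newdata = dat[validation_ids, ], type = "response")
})

# Final score for each validation instance: mean across repetitions
final_score <- rowMeans(scores)
head(final_score)
```

Because the validation instances are excluded before any subsampling, none of the repetitions sees them during fitting, and averaging the per-repetition scores gives one prediction per validation instance.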
Each script performs a parameter sweep to identify optimal parameters using cross-validation within the training set. Regularized regression tunes the lambda (penalty on the beta coefficients) and alpha (LASSO vs. ridge regression vs. elastic net) parameters. The logistic regression and both random forest scripts use LASSO for feature selection prior to model-building, and the lambda penalty for that feature selection is tuned.
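A parameter sweep of this kind might look like the sketch below. It assumes the `glmnet` package is installed; the README does not say which package the scripts use, so treat `glmnet`, the alpha grid, and the fold count as assumptions. The data are simulated and the variable names are illustrative.

```r
# Sketch of tuning alpha and lambda by cross-validation.
# Assumption: the glmnet package is available; the repository scripts may differ.
library(glmnet)
set.seed(1)

n <- 150
p <- 20
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("feat", seq_len(p))))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

alphas <- c(0, 0.5, 1)  # 0 = ridge, 1 = LASSO, values in between = elastic net
# Fix the fold assignment so every alpha is compared on the same folds
foldid <- sample(rep(1:5, length.out = n))

cv_fits <- lapply(alphas, function(a) cv.glmnet(x, y, alpha = a, foldid = foldid))
cv_error <- sapply(cv_fits, function(f) min(f$cvm))

best <- which.min(cv_error)
best_alpha <- alphas[best]
best_lambda <- cv_fits[[best]]$lambda.min

# LASSO as a feature selection step: keep features with non-zero coefficients
lasso_coef <- as.matrix(coef(cv_fits[[which(alphas == 1)]], s = "lambda.min"))[, 1]
selected <- setdiff(names(lasso_coef)[lasso_coef != 0], "(Intercept)")
selected
```

Sharing one `foldid` vector across the alpha values keeps the cross-validated errors comparable, since each candidate is scored on identical train/test splits.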

Each script takes the same 8 inputs:

1. Location of RDS file with class data matrix (scripts were developed for data with multiple response values per instance)
2. Column index of class data matrix to use: integer
3. Location of RDS file with feature data matrix
4. Location of RDS file with vector of feature IDs to include in prediction model
5. Location of RDS file with vector of instance IDs to hold out as validation set
6. Proportion of training instances (i.e. non-validation) to include in each prediction model: number in [0, 1]
7. Number of repetitions of prediction models to build, each with a random subset of the training data: integer
8. Prefix character string for output files
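As an illustration of how the eight positional inputs could be read and checked inside a script, here is a small sketch. The function `parse_args` and its field names are hypothetical and are not taken from the repository; in a script launched with `--args`, the input vector would come from `commandArgs(trailingOnly = TRUE)`.

```r
# Hypothetical helper: validate the 8 positional inputs described above.
parse_args <- function(args) {
  if (length(args) != 8) {
    stop("Expected 8 arguments, got ", length(args))
  }
  parsed <- list(
    class_file      = args[1],
    col_ind         = suppressWarnings(as.integer(args[2])),
    feature_file    = args[3],
    feature_id_file = args[4],
    validation_file = args[5],
    training_prop   = suppressWarnings(as.numeric(args[6])),
    n_reps          = suppressWarnings(as.integer(args[7])),
    out_prefix      = args[8]
  )
  if (is.na(parsed$col_ind) || parsed$col_ind < 1) {
    stop("Column index must be a positive integer")
  }
  if (is.na(parsed$training_prop) || parsed$training_prop < 0 || parsed$training_prop > 1) {
    stop("Training proportion must be in [0, 1]")
  }
  if (is.na(parsed$n_reps) || parsed$n_reps < 1) {
    stop("Number of repetitions must be a positive integer")
  }
  parsed
}

# Inside a script this would be: parse_args(commandArgs(trailingOnly = TRUE))
# Here a literal vector with placeholder file names stands in for the command line.
p <- parse_args(c("class.RDS", "3", "feat.RDS", "feat_ids.RDS",
                  "val_ids.RDS", "0.8", "100", "run1"))
str(p)
```

Checking the numeric inputs up front means a mistyped proportion or repetition count fails immediately rather than partway through model-building.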

Example usage:

R --vanilla --slave --args CLASS_MATRIX COL_IND FEAT_MATRIX FEATURE_IDS VALIDATION_IDS TRAINING_PROP MODEL_REPETITIONS OUTPUT_PREFIX < 2_regularized_predictions.R

The scripts produce four output files:

[prefix].prediction.RDS: Object with prediction on the validation set
[prefix].featSelWeights.RDS: Object with feature selection weights (penalized betas)
[prefix].parameters.RDS: Object with the parameters utilized to train the model, for reporting purposes.
[prefix].trainTest.RDS: Object with training and testing instances for each repetition of model-building, for reporting purposes.
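The output files can be read back into R with `readRDS`. The sketch below collects every RDS file that shares a given prefix. The helper `load_outputs` is hypothetical, and the demonstration objects are placeholders, since the README does not describe the internal structure of the saved objects.

```r
# Hypothetical helper: read all RDS outputs that share an output prefix.
load_outputs <- function(prefix) {
  out_dir <- dirname(prefix)
  files <- list.files(out_dir)
  keep <- startsWith(files, paste0(basename(prefix), ".")) & endsWith(files, "RDS")
  objs <- lapply(file.path(out_dir, files[keep]), readRDS)
  names(objs) <- files[keep]
  objs
}

# Demonstration with placeholder objects written to a temporary directory.
# Real outputs from the scripts will have their own structure.
demo_prefix <- file.path(tempdir(), "demo_run")
saveRDS(c(inst1 = 0.9, inst2 = 0.2), paste0(demo_prefix, ".prediction.RDS"))
saveRDS(list(note = "placeholder"), paste0(demo_prefix, ".parameters.RDS"))

outputs <- load_outputs(demo_prefix)
str(outputs)
```

Matching on the prefix rather than on hard-coded suffixes keeps the helper usable even if the exact output file names differ from those listed above.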

Repository files:

2_LogRegClass_predictions.R
2_LogitRegulClassification_predictions.R
2_ranForClass_predictions.R
2_ranForReg_predictions.R
2_regularized_predictions.R
README.md

johnplloyd/R_prediction_model_building
