<h1 style="text-align: center;">High Dimensional Modelling Activities</h1>

<p style="text-align: center;">July, 2024</p>

## High Dimensional Modelling Activity 1 - Principal Component Analysis

Perform PCA on the `Sonar` dataset:

1. Load the `mlbench` and `factoextra` packages, then load the `Sonar` dataset.

2. Select only the numerical sonar measurement variables (i.e. remove the classification variable, `Class`).

3. Use the `princomp` R function to perform PCA.

4. Plot the scree plot and a biplot of the first two PCs.
    * Hint: See the `fviz_eig` and `fviz_pca_biplot` functions

5. **BONUS:** Fit a logistic regression (`glm` R function with `family = binomial`) using the original classification variable as the outcome and the first two PCs as predictors.
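The numbers behind the scree plot in Step 4 can be computed directly from a `princomp` fit. The minimal sketch below uses R's built-in `USArrests` data purely as a stand-in for `Sonar` (an assumption made so it runs with base R only) and computes each component's proportion of variance explained, which is what `fviz_eig` draws as bars. The 80% cut-off is an arbitrary illustration, not part of the activity.

```r
# Stand-in data (assumption): built-in USArrests, 50 rows x 4 numeric
# columns, used instead of Sonar so this runs with base R only
X <- USArrests

# PCA on the correlation matrix, i.e. on standardized variables
pca_us <- princomp(X, cor = TRUE)

# Variance of each component is its squared standard deviation
component_var <- pca_us$sdev^2

# Proportion and cumulative proportion of variance explained;
# a scree plot shows the proportions as bar heights
pve <- component_var / sum(component_var)
cum_pve <- cumsum(pve)
print(round(rbind(PVE = pve, Cumulative = cum_pve), 3))

# Smallest number of components reaching 80% cumulative variance
# (80% is an illustrative threshold, not a rule from the activity)
n_keep <- which(cum_pve >= 0.8)[1]
print(n_keep)
```

`summary(pca_us)` reports the same proportions, and base R's `screeplot()` and `biplot()` give rough equivalents of the `factoextra` plots.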

**NOTE:** The variables need to be standardized before (or as part of) Step 3. There are two ways to do this (hints):

1. Check the `scale` function in R.

2. Look into the arguments of `princomp`.
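The two hints lead to the same principal directions, with one subtlety worth knowing. The sketch below, again using the built-in `USArrests` data as an assumed stand-in for `Sonar`, compares `princomp(X, cor = TRUE)` with `princomp(scale(X))`: the loadings and the proportions of variance explained agree, but the component standard deviations differ by the constant factor `sqrt((n - 1) / n)`, because `princomp` computes variances with divisor `n` while `scale` standardizes with the usual `n - 1` divisor.

```r
# Stand-in data (assumption): built-in USArrests instead of Sonar[, 1:60]
X <- USArrests
n <- nrow(X)

# Hint 2: princomp standardizes internally when cor = TRUE
pca_cor <- princomp(X, cor = TRUE)

# Hint 1: standardize first with scale(), then PCA on the covariance matrix
pca_scaled <- princomp(scale(X))

# Loadings agree (abs() guards against any sign flip of a component)
same_loadings <- isTRUE(all.equal(abs(unclass(loadings(pca_cor))),
                                  abs(unclass(loadings(pca_scaled)))))

# Proportions of variance explained agree
prop_var <- function(p) p$sdev^2 / sum(p$sdev^2)
same_prop <- isTRUE(all.equal(prop_var(pca_cor), prop_var(pca_scaled)))

# Standard deviations differ by the constant sqrt((n - 1) / n),
# because princomp uses divisor n and scale() uses divisor n - 1
sd_ratio <- pca_scaled$sdev / pca_cor$sdev

print(c(same_loadings = same_loadings, same_prop = same_prop))
print(sd_ratio)
```

Either route therefore answers Step 3; only the absolute `sdev` values (and the scale of the scores) shift slightly.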

## High Dimensional Modelling Activity 2 - Feature Selection

Perform linear regression modelling on the `Ozone` dataset:

1. Load the `Ozone` dataset from the `mlbench` R package.

2. Remove the first three variables. The outcome you will regress on is the fourth variable (`V4`, the daily maximum one-hour-average ozone reading).

3. Remove all observations with missing values using `na.omit`, and name the cleaned data frame `Ozone_Clean`.

4. Fit a linear regression model with `lm` that includes no predictors, i.e. an intercept-only model (formula `V4 ~ 1`).

5. Use the `step` function to perform stepwise selection, starting from the intercept-only model.
    * NOTE: You must specify the `scope` argument; you may use `formula(lm(V4 ~ ., data = Ozone_Clean))`.

6. Which predictors were selected? What was the final AIC value?
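Question 6 asks for the final AIC, so it helps to know which AIC `step` reports. The sketch below applies the same recipe (intercept-only start, full model as `scope`) to R's built-in `mtcars` data, an assumed stand-in for `Ozone_Clean` with `mpg` in the role of `V4`. It checks that the AIC printed by `step` is `extractAIC()`, i.e. `n * log(RSS / n) + 2 * edf` for a linear model, which differs from `AIC()` by a constant.

```r
# Stand-in data (assumption): built-in mtcars, with mpg playing the role of V4

# Intercept-only (null) model: its single coefficient is the mean of mpg
null_model <- lm(mpg ~ 1, data = mtcars)
print(coef(null_model))
print(mean(mtcars$mpg))

# Upper scope: every other column as a candidate predictor
upper_scope <- formula(lm(mpg ~ ., data = mtcars))

# Stepwise search (direction defaults to "both" when a scope is given);
# trace = 0 only silences the step-by-step printout
selected <- step(null_model, scope = upper_scope, trace = 0)

# Selected predictors
print(attr(terms(selected), "term.labels"))

# The AIC reported by step() is extractAIC(): n * log(RSS / n) + 2 * edf,
# where edf is the number of estimated coefficients
n <- nrow(mtcars)
rss <- sum(residuals(selected)^2)
edf <- length(coef(selected))
aic_step <- extractAIC(selected)[2]
aic_manual <- n * log(rss / n) + 2 * edf

# The last row of the recorded search path holds the final AIC
final_aic <- tail(selected$anova$AIC, 1)
print(final_aic)

# AIC() keeps the constant terms of the Gaussian log-likelihood, so it is
# larger by n * log(2 * pi) + n + 2 (the + 2 counts the error variance)
aic_gap <- AIC(selected) - aic_step
```

So the "final AIC" in the `step` output is only comparable with other `extractAIC` values, not with `AIC()` output.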

## High Dimensional Modelling Activity 3 - Regularized Regression (BONUS/CHALLENGE)

Perform regularized regression modelling on the `Ozone` dataset:

1. Repeat the data cleaning steps from Activity 2 (load the `Ozone` dataset, drop the first three variables, remove observations with missing values; the outcome is `V4`). This time, also standardize both the predictors and the outcome using the `scale` function.

2. Load the `glmnet` R package.

3. Fit a ridge regression and a LASSO with the `glmnet` function (hint: `alpha = 0` gives ridge regression and `alpha = 1` gives the LASSO). Note that `glmnet` expects the predictors as a numeric matrix (see `as.matrix`).

4. Plot the regularization paths of both models using `plot` on the model objects (specify `xvar="lambda"`).

5. Fit the ridge regression and the LASSO again with `cv.glmnet`, so that cross-validation selects the optimal lambda values.

6. Extract the coefficients at the lambda with the minimum cross-validation error using the `coef` function (hint: use the `s` argument to specify which lambda to use).

7. Plot the results of the cross-validation using `plot` on the `cv.glmnet` objects.
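The two penalties behave differently, which is what the regularization-path plots in Step 4 are meant to show. As a base-R sketch (not `glmnet` itself), the block below fits ridge by its closed form and the LASSO by cyclic coordinate descent with soft-thresholding, on standardized predictors from the built-in `mtcars` data as an assumed stand-in for `Ozone`. The objective follows the `(1/(2n)) * RSS + lambda * penalty` form described in the `glmnet` documentation, but the lambda scale is not claimed to reproduce `glmnet`'s numbers exactly; the helpers `ridge_fit`, `lasso_fit` and `soft_threshold` are hypothetical names.

```r
# Stand-in data (assumption): built-in mtcars, predicting mpg from the
# other 10 columns. Predictors are standardized and the outcome centred,
# so no intercept is fitted.
X <- scale(as.matrix(mtcars[, -1]))
y <- mtcars$mpg - mean(mtcars$mpg)
n <- nrow(X)

# Ridge (the alpha = 0 case): minimise (1/(2n)) * RSS + (lambda/2) * sum(beta^2)
# Closed form: beta = (X'X/n + lambda * I)^(-1) X'y/n
ridge_fit <- function(X, y, lambda) {
  n <- nrow(X)
  drop(solve(crossprod(X) / n + lambda * diag(ncol(X)), crossprod(X, y) / n))
}

# Soft-thresholding: shrinks z toward 0 by gamma, giving exactly 0 if |z| <= gamma
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# LASSO (the alpha = 1 case): minimise (1/(2n)) * RSS + lambda * sum(abs(beta))
# by cyclic coordinate descent, one coefficient at a time
lasso_fit <- function(X, y, lambda, n_iter = 10000, tol = 1e-10) {
  n <- nrow(X)
  beta <- numeric(ncol(X))
  col_ss <- colSums(X^2) / n
  r <- y                                   # residual for beta = 0
  for (it in seq_len(n_iter)) {
    max_change <- 0
    for (j in seq_along(beta)) {
      old <- beta[j]
      # Inner product of x_j with the partial residual (x_j's own fit added back)
      z <- sum(X[, j] * r) / n + col_ss[j] * old
      beta[j] <- soft_threshold(z, lambda) / col_ss[j]
      if (beta[j] != old) {
        r <- r - X[, j] * (beta[j] - old)  # keep the residual up to date
        max_change <- max(max_change, abs(beta[j] - old))
      }
    }
    if (max_change < tol) break            # a full sweep barely moved anything
  }
  setNames(beta, colnames(X))
}

lambdas <- c(0.01, 0.1, 0.5, 1, 2, 5)
ridge_path <- sapply(lambdas, function(l) ridge_fit(X, y, l))
lasso_path <- sapply(lambdas, function(l) lasso_fit(X, y, l))
colnames(ridge_path) <- paste0("lambda=", lambdas)
colnames(lasso_path) <- paste0("lambda=", lambdas)

# Ridge shrinks every coefficient but keeps all of them non-zero;
# the LASSO sets some exactly to zero, typically more as lambda grows
print(round(ridge_path, 3))
print(colSums(lasso_path != 0))
```

Exact zeros are what make the LASSO a feature-selection method, in contrast with the stepwise search of Activity 2.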


# High Dimensional Modelling Activity Solutions

## High Dimensional Modelling Activity 1 Solution

<details>
  <summary>Click here for Solution Part 1</summary>
  
  `library(mlbench)`

  `library(factoextra)`

  `data(Sonar)`

  `PCA_Sonar = princomp(Sonar[, 1:60], cor = TRUE)`

  `fviz_eig(PCA_Sonar)`

  `fviz_pca_biplot(PCA_Sonar, label = "var", repel = TRUE, col.var = "red")`

  `Sonar_Scores = PCA_Sonar$scores[, 1:2]`
  
  `Sonar_PCA_Data = data.frame(Sonar_Scores, Class = Sonar$Class)`

  `GLM_Sonar = glm(Class ~ Comp.1 + Comp.2, family = binomial, data = Sonar_PCA_Data)`
  
  `summary(GLM_Sonar)`
</details>



## High Dimensional Modelling Activity 2 Solution



<details>
  <summary>Click here for Solution Part 2</summary>
  
  `library(mlbench)`

  `data(Ozone)`

  `Ozone_Clean = na.omit(Ozone[, 4:13])`

  `LM1 = lm(V4 ~ 1, data = Ozone_Clean)`

  `Ozone_Step = step(LM1, scope = formula(lm(V4 ~ ., data = Ozone_Clean)))`

  `summary(Ozone_Step)`

  `Ozone_Step$anova`
</details>



## High Dimensional Modelling Activity 3 Solution



<details>
  <summary>Click here for Solution Part 3</summary>
  
  `library(mlbench)`

  `library(glmnet)`

  `data(Ozone)`

  `Ozone_Clean = as.data.frame(scale(na.omit(Ozone[, 4:13])))`

  `Ozone_Ridge = glmnet(as.matrix(Ozone_Clean[, 2:10]), Ozone_Clean$V4, alpha = 0)`

  `Ozone_LASSO = glmnet(as.matrix(Ozone_Clean[, 2:10]), Ozone_Clean$V4, alpha = 1)`

  `plot(Ozone_Ridge, xvar = "lambda")`
  
  `plot(Ozone_LASSO, xvar = "lambda")`

  `set.seed(123)`

  `Ozone_Ridge_CV = cv.glmnet(as.matrix(Ozone_Clean[, 2:10]), Ozone_Clean$V4, alpha = 0)`
  
  `Ozone_LASSO_CV = cv.glmnet(as.matrix(Ozone_Clean[, 2:10]), Ozone_Clean$V4, alpha = 1)`

  `coef(Ozone_Ridge_CV, s = Ozone_Ridge_CV$lambda.min)`
  
  `coef(Ozone_LASSO_CV, s = Ozone_LASSO_CV$lambda.min)`

  `plot(Ozone_Ridge_CV)`
  
  `plot(Ozone_LASSO_CV)`
</details>
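`cv.glmnet` chooses `lambda.min` by K-fold cross-validation: for each fold, the model is fitted on the remaining folds and scored on the held-out one, and the lambda with the lowest average held-out error is kept. The base-R sketch below illustrates that idea for the ridge penalty only. It is not `cv.glmnet`'s implementation: the built-in `mtcars` data stands in for `Ozone`, the helper `ridge_train_predict` is a hypothetical name, and the 5 folds, the lambda grid and the seed are arbitrary choices (`cv.glmnet` defaults to 10 folds and its own lambda sequence).

```r
# Stand-in data (assumption): built-in mtcars, predicting mpg from the
# other 10 columns, instead of the scaled Ozone data
X <- as.matrix(mtcars[, -1])
y <- mtcars$mpg
n <- nrow(X)
K <- 5                                        # arbitrary; cv.glmnet defaults to 10
lambdas <- 10^seq(-2, 1, length.out = 20)     # arbitrary illustrative grid

# Hypothetical helper: fit closed-form ridge on a training fold (predictors
# standardized with training-fold statistics) and predict the held-out rows
ridge_train_predict <- function(X_tr, y_tr, X_te, lambda) {
  mu <- colMeans(X_tr)
  s <- apply(X_tr, 2, sd)
  Z_tr <- scale(X_tr, center = mu, scale = s)
  Z_te <- scale(X_te, center = mu, scale = s)
  y_bar <- mean(y_tr)
  m <- nrow(Z_tr)
  beta <- solve(crossprod(Z_tr) / m + lambda * diag(ncol(Z_tr)),
                crossprod(Z_tr, y_tr - y_bar) / m)
  drop(y_bar + Z_te %*% beta)
}

# Randomly assign each row to one of K folds
set.seed(1)
folds <- sample(rep_len(seq_len(K), n))

# Average held-out mean squared error for every lambda on the grid
cv_mse <- sapply(lambdas, function(l) {
  fold_mse <- sapply(seq_len(K), function(k) {
    test <- folds == k
    pred <- ridge_train_predict(X[!test, ], y[!test], X[test, , drop = FALSE], l)
    mean((y[test] - pred)^2)
  })
  mean(fold_mse)
})

# Analogue of cv.glmnet's lambda.min
lambda_min <- lambdas[which.min(cv_mse)]
print(lambda_min)
```

`cv.glmnet` also reports `lambda.1se`, the largest lambda whose cross-validation error is within one standard error of the minimum, which gives a more heavily regularized model.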
