# Appendix

## Cross-Validation Forward Selection of Features

The idea is to *EXHAUSTIVELY* evaluate every subset of $r$ features drawn from the $n$ *continuous* features that we start with, i.e. all ${n \choose r}$ combinations for each $r$ from $1$ to $n$.  For each subset, *cross-validation* over 5 folds (*k-fold* with $k=5$) computes the RMSE of the model on the training partition and the difference between that RMSE and the RMSE computed on the testing partition; we then select the subset that produces the least RMSE, **with Condition Number $\le 1000$**.  The threshold of 1000 is taken *directly* from the statsmodels [source code](https://www.statsmodels.org/dev/_modules/statsmodels/regression/linear_model.html#RegressionResults.summary) for the OLS fit results `summary` method, which warns of possible strong multicollinearity when the condition number exceeds 1000. ("statsmodels.regression.linear_model — statsmodels: model fit results summary", 2019)  

In this way, we minimize residuals and thereby select the most predictive model, based on the "best" (minimized $RMSE$) **non-collinear** feature-combination subset from the starting set of all features.
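
To make the condition-number filter concrete, here is a minimal sketch that compares an independent feature pair with a nearly duplicated one. It assumes the condition number can be approximated with `numpy.linalg.cond` on the design matrix (intercept column plus features); the helper name `condition_number` and the synthetic data are illustrative and not part of the original analysis.

```python
import numpy as np

def condition_number(features):
    """2-norm condition number of the design matrix (intercept column + features)."""
    X = np.column_stack([np.ones(len(features)), features])
    return np.linalg.cond(X)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)               # drawn independently of x1
x3 = x1 + 1e-6 * rng.normal(size=200)   # almost an exact copy of x1

cond_independent = condition_number(np.column_stack([x1, x2]))
cond_collinear = condition_number(np.column_stack([x1, x3]))

# A subset is kept only when its condition number is at most 1000.
print(cond_independent <= 1000, cond_collinear <= 1000)
```

The nearly duplicated pair should fail the $\le 1000$ check by a wide margin, which is the behavior the selection procedure relies on to discard collinear subsets.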

The procedure for this is summarized below in pseudo-code:<br><br>
<b>
&nbsp;&nbsp;&nbsp;set $RMSE_{best} := null$<br>
&nbsp;&nbsp;&nbsp;set $\Delta RMSE_{best} := null$<br>
&nbsp;&nbsp;&nbsp;set $feature\_subset_{best} := null$<br><br>
&nbsp;&nbsp;&nbsp;for $r := 1$ to $n$ (where $n := |\{\text{starting features}\}|$) {<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $feature\_subsets :=$ build all ${n \choose r}=\frac{n!}{r! \cdot (n-r)!}$ subsets of size $r$ from the $n$ starting features<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for each $feature\_subset$ in $feature\_subsets$ {<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $kf :=$ build 5 folds (k-fold, $k=5$) over the data restricted to $feature\_subset$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $scores\_list_{fold} :=$ empty list<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for each $fold$ in $kf$ {<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;split data set into $partition_{test}$ and $partition_{train}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $lin\_reg\_model :=$ build linear regression from $partition_{train}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $target_{train\_predicted} :=$ compute predictions with $lin\_reg\_model$ from $partition_{train}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $target_{test\_predicted} :=$ compute predictions with $lin\_reg\_model$ from $partition_{test}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $RMSE_{train} :=$ compute Root Mean Squared Error between $target_{train\_actual}$ and $target_{train\_predicted}$&nbsp;&nbsp;&nbsp;(i.e. - RMSE of <i>residuals</i> of $partition_{train}$)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $RMSE_{test} :=$ compute Root Mean Squared Error between $target_{test\_actual}$ and $target_{test\_predicted}$&nbsp;&nbsp;&nbsp;(i.e. - RMSE of <i>residuals</i> of $partition_{test}$)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;append $(RMSE_{train}, RMSE_{test})$ to $scores\_list_{fold}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $scores\_list_{fold, RMSE_{train}} :=$ extract all $RMSE_{train}$ from $scores\_list_{fold}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $RMSE := \frac{\sum RMSE_{train}}{size(scores\_list_{fold, RMSE_{train}})}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $scores\_list_{fold, RMSE_{test}} :=$ extract all $RMSE_{test}$ from $scores\_list_{fold}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $\Delta RMSE := \frac{\sum |RMSE_{train} - RMSE_{test}|}{size(scores\_list_{fold, RMSE_{test}})}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if $lin\_reg\_model.condition\_number \le 1000$ AND ($RMSE_{best}$ is null OR $RMSE < RMSE_{best}$) then {<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $RMSE_{best} := RMSE$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $\Delta RMSE_{best} := \Delta RMSE$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;set $feature\_subset_{best} := feature\_subset$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}<br>
&nbsp;&nbsp;&nbsp;}<br>
</b>
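
The pseudo-code above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code used for the analysis: it fits ordinary least squares with `numpy.linalg.lstsq` rather than statsmodels, checks the condition number once on the full design matrix of each subset (instead of reading it from a fitted model), and takes $\Delta RMSE$ to be the mean absolute gap between training and testing RMSE. The names `kfold_indices`, `rmse`, and `cv_select_features`, as well as the synthetic data, are hypothetical.

```python
from itertools import combinations
import numpy as np

def kfold_indices(n_rows, k=5, seed=0):
    """Shuffle the row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_rows), k)

def rmse(actual, predicted):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def cv_select_features(X, y, names, k=5, max_cond=1000.0):
    """Score every non-empty feature subset; return (RMSE, delta RMSE, names) of the best."""
    folds = kfold_indices(len(y), k)
    best = None
    for r in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), r):
            # Design matrix: intercept column plus the candidate features.
            A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
            if np.linalg.cond(A) > max_cond:
                continue  # skip collinear subsets before fitting anything
            train_scores, gaps = [], []
            for i in range(k):
                test = folds[i]
                train = np.concatenate([folds[j] for j in range(k) if j != i])
                beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
                rmse_train = rmse(y[train], A[train] @ beta)
                rmse_test = rmse(y[test], A[test] @ beta)
                train_scores.append(rmse_train)
                gaps.append(abs(rmse_train - rmse_test))
            score, gap = float(np.mean(train_scores)), float(np.mean(gaps))
            if best is None or score < best[0]:
                best = (score, gap, [names[c] for c in cols])
    return best

# Synthetic demonstration: x3 is a near-copy of x1, so any subset that
# contains both x1 and x3 should be rejected by the condition-number check.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + 1e-6 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 - 3.0 * x2 + 0.1 * rng.normal(size=200)

best_rmse, best_delta, best_names = cv_select_features(X, y, ["x1", "x2", "x3"])
print(best_names, round(best_rmse, 3), round(best_delta, 3))
```

One consequence of ranking subsets by mean *training* RMSE is visible here: adding a feature never increases the in-sample error of a least-squares fit, so the condition-number check is what keeps the search from always preferring the largest subsets.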

<br><br>
**This results in cross-validation *EXHAUSTIVELY* selecting the best *non-collinear* feature-combination subset, from $n$ starting features, that predicts the outcome, *price*, with the greatest accuracy (lowest mean $RMSE$, with its $\Delta RMSE$ recorded alongside)**.

Do note that this is a brute-force (exhaustive) search rather than a greedy one, so it can take quite a long time depending on the number of starting features.

The total number of all possible combinations the algorithm will select from is $\sum_{r=1}^n {n \choose r} = {n \choose 1} + {n \choose 2} + \cdots + {n \choose n}= 2^n-1$.

That number grows exponentially: each additional starting feature roughly doubles it.  

For instance, starting with $n=18$ features, we have $\sum_{r=1}^{18} {18 \choose r} = 2^{18}-1 = 262143$ possible combinations!  
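
As a quick sanity check on that count, the short sketch below evaluates the sum directly and also enumerates the subsets for a small case. The helper name `subset_count` is illustrative only.

```python
from itertools import combinations
from math import comb

def subset_count(n):
    """Number of non-empty feature subsets: sum of C(n, r) for r = 1..n."""
    return sum(comb(n, r) for r in range(1, n + 1))

n = 18
total_subsets = subset_count(n)
total_fits = total_subsets * 5  # one regression per fold for every subset

# Independent check by explicit enumeration for a small case (n = 4).
enumerated = sum(1 for r in range(1, 5) for _ in combinations(range(4), r))

print(total_subsets, total_fits, enumerated)  # 262143 1310715 15
```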

**Cross-validating all $262143$ combinations over 5 folds means fitting $262143 \times 5 = 1310715$ regressions, which can take almost all day, if not longer**.