# Problems: LASSO and Model Selection
### 1. *Exhaustive search.* 
    
In this problem, we will look at how to search exhaustively over all possible subsets of features. You are given three Python functions:

```python
model = LinearRegression() # Create a linear regression model object
model.fit(X,y) # Fits the model
yhat = model.predict(X) # Predicts targets given features
```
Given training data `Xtr,ytr` and test data `Xts,yts`, write a few lines of Python code to:

**Part A**
- Find the best model using only one feature of the data (i.e. one column of `Xtr` and `Xts`).

```python
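import numpy as np
from sklearn.linear_model import LinearRegression  # assumed source of LinearRegression

p = Xtr.shape[1]
mse = np.zeros(p)  # test MSE of each single-feature model
for i in range(p):
    # Fit on column i only; indexing with [i] keeps the slice 2-D
    model = LinearRegression()
    model.fit(Xtr[:, [i]], ytr)

    # Evaluate on the same column of the test data
    yhat = model.predict(Xts[:, [i]])
    mse[i] = np.mean((yhat - yts)**2)

# Pick the best feature once every feature has been scored
iopt = np.argmin(mse)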
```

**Part B**
- Find the best model using only two features of the data (i.e. two columns of `Xtr` and `Xts`).

```python
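p = Xtr.shape[1]
feat_set_list = []  # every pair [i, j] with i < j
mse = []            # test MSE of each pair, in the same order

for i in range(p - 1):
    for j in range(i + 1, p):  # j > i, so no feature is paired with itself
        feats = [i, j]
        feat_set_list.append(feats)

        model = LinearRegression()
        model.fit(Xtr[:, feats], ytr)

        yhat = model.predict(Xts[:, feats])
        mse.append(np.mean((yhat - yts)**2))

# Pick the best pair once every pair has been scored
opt = np.argmin(mse)
feats_opt = feat_set_list[opt]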
```
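As a quick check of the two-feature search, the sketch below runs the same loop on synthetic data where the target depends only on columns 1 and 3. The helper `subset_mse` is hypothetical and uses `np.linalg.lstsq` with an intercept column as a stand-in for `LinearRegression`; the data, seed, and coefficients are illustrative assumptions, not part of the problem.

```python
import itertools
import numpy as np

def subset_mse(Xtr, ytr, Xts, yts, feats):
    """Least-squares fit (with intercept) on the chosen columns; returns test MSE."""
    Atr = np.column_stack([np.ones(len(Xtr)), Xtr[:, feats]])
    Ats = np.column_stack([np.ones(len(Xts)), Xts[:, feats]])
    beta, *_ = np.linalg.lstsq(Atr, ytr, rcond=None)
    return np.mean((Ats @ beta - yts) ** 2)

# Synthetic data (assumption): y depends only on features 1 and 3
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 1] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=200)
Xtr, ytr, Xts, yts = X[:100], y[:100], X[100:], y[100:]

# itertools.combinations yields each pair (i, j) with i < j exactly once
pairs = list(itertools.combinations(range(X.shape[1]), 2))
mse = [subset_mse(Xtr, ytr, Xts, yts, list(f)) for f in pairs]
best_pair = pairs[int(np.argmin(mse))]
print(best_pair)
```

Using `itertools.combinations` instead of nested loops also generalizes directly to subsets of any size `k`.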

**Part C**
- Suppose we wish to find the best `k` of `p` features via exhaustive searching over all possible subsets of features. How many times would you need to call the fit function? What if `k = 10` and `p = 1000`?

Exhaustive search calls `fit` once for every subset of `k` features, so the number of calls is `p` choose `k`, that is, p! / (k! (p - k)!). For `k = 10` and `p = 1000`, this is approximately 2.63 × 10^23 fits, which makes exhaustive search computationally infeasible.
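The count above can be reproduced with the standard library. The following is a minimal sketch; the helper name `n_fits` and the one-microsecond-per-fit timing are illustrative assumptions, not measurements.

```python
import itertools
import math

def n_fits(p, k):
    """Number of size-k feature subsets, i.e. calls to fit in an exhaustive search."""
    return math.comb(p, k)

# Sanity check on a small case by enumerating the subsets explicitly
assert n_fits(5, 2) == len(list(itertools.combinations(range(5), 2)))

n = n_fits(1000, 10)
print(f"number of fits: {n:.3e}")

# Rough runtime under a hypothetical (very optimistic) 1 microsecond per fit
seconds_per_fit = 1e-6
years = n * seconds_per_fit / (3600 * 24 * 365)
print(f"approximate runtime: {years:.1e} years")
```

Even under that optimistic timing assumption the search would take billions of years, which motivates methods such as LASSO.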

### 4. Normalization in Python
You are given the Python functions:
```python
model = SomeModel()  # Creates a model
model.fit(Z, u)  # Fits the model, expecting normalized features
yhat = model.predict(Z)  # Predicts targets given features
```

Given training data `Xtr,ytr` and test data `Xts,yts`, write Python code to:

- Normalize the training data by subtracting the mean and dividing by the standard deviation, for both `Xtr` and `ytr`.
- Fit the model on the normalized data.
- Predict the values yhat on the test data.
- Measure the RSS on the test data.

```python
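import numpy as np

# Per-feature statistics, computed from the training data only
xmean = np.mean(Xtr, axis=0)
xstd = np.std(Xtr, axis=0)
Ztr = (Xtr - xmean[None, :]) / xstd[None, :]

# Normalize the training targets
ymean = np.mean(ytr)
ystd = np.std(ytr)
utr = (ytr - ymean) / ystd

model = SomeModel()
model.fit(Ztr, utr)

# Scale the test features with the training statistics,
# then undo the target normalization on the predictions
Zts = (Xts - xmean[None, :]) / xstd[None, :]
uts = model.predict(Zts)
yhat = ymean + ystd * uts

rss = np.sum((yts - yhat)**2)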
```
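One way to check the normalize, fit, and un-normalize steps is to run them with a simple linear model and compare against a direct fit on the raw data. In the sketch below, `LeastSquares` is a hypothetical stand-in for `SomeModel` (least squares without an intercept, built on `np.linalg.lstsq`), and the synthetic data and seed are assumptions for illustration.

```python
import numpy as np

class LeastSquares:
    """Hypothetical stand-in for SomeModel: least squares with no intercept."""
    def fit(self, Z, u):
        self.w, *_ = np.linalg.lstsq(Z, u, rcond=None)

    def predict(self, Z):
        return Z @ self.w

# Synthetic data (assumption): features with nonzero mean and very different scales
rng = np.random.default_rng(0)
scales = np.array([1.0, 10.0, 0.1])
Xtr = 5.0 + scales * rng.normal(size=(50, 3))
Xts = 5.0 + scales * rng.normal(size=(20, 3))
ytr = 3.0 + Xtr @ np.array([1.0, -0.2, 4.0]) + 0.05 * rng.normal(size=50)

# Normalize with training statistics; shape-(p,) arrays broadcast across rows
xmean, xstd = Xtr.mean(axis=0), Xtr.std(axis=0)
ymean, ystd = ytr.mean(), ytr.std()
Ztr = (Xtr - xmean) / xstd
utr = (ytr - ymean) / ystd

model = LeastSquares()
model.fit(Ztr, utr)
yhat = ymean + ystd * model.predict((Xts - xmean) / xstd)

# Reference: ordinary least squares with an intercept on the raw features
A = np.column_stack([np.ones(len(Xtr)), Xtr])
beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
yhat_direct = np.column_stack([np.ones(len(Xts)), Xts]) @ beta

print(np.allclose(yhat, yhat_direct))
```

For plain least squares the two predictions agree, because centering absorbs the intercept and the scaling is undone on the way out. For regularized models such as LASSO, normalization changes the fit, which is why the model expects normalized inputs.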