# Sparse Regression

Many of today's regression problems involve high-dimensional data, and in some cases $n \ll d$: there are many more features than data points.
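You can check one concrete way the $n \ll d$ regime breaks ordinary least squares numerically. The matrix $X^\top X$ in the normal equations is $d \times d$ but has rank at most $n$, so it is singular and the least-squares solution is not unique. Here is a minimal sketch. The sizes and the random Gaussian features are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100  # n << d: far more features than samples
X = rng.standard_normal((n, d))

gram = X.T @ X  # d x d matrix appearing in the normal equations
rank = np.linalg.matrix_rank(gram)

# rank(X^T X) = rank(X) <= n < d, so gram cannot be inverted
print(gram.shape, rank)
```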

With so many features, it is likely that only a subset of them actually has any predictive power.

We need a "feature selection" method that lets us "switch off" unimportant features and keep the predictive ones. In doing so, we can build a model that performs well on unseen data.
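To make "switching off" concrete, the sketch below builds a hypothetical synthetic problem in which only `k` of the `d` features drive the target. It then fits least squares on that subset alone. We know the support here only because we generated the data. The sizes, seed, and noiseless targets are illustrative assumptions. Recovering the support from the data alone is exactly the job a sparse regression method has to do.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 20, 100, 5  # hypothetical sizes: only k of the d features matter
X = rng.standard_normal((n, d))

w_true = np.zeros(d)
support = rng.choice(d, size=k, replace=False)  # indices of the predictive features
w_true[support] = rng.uniform(1.0, 3.0, size=k)
y = X @ w_true  # noiseless targets, for clarity

# "Switch off" every feature outside the support and fit least squares on the rest.
# With only k < n columns left, the least-squares problem is well posed again.
w_sub, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
w_hat = np.zeros(d)
w_hat[support] = w_sub
```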

Ordinary linear regression and ridge regression perform poorly when only a few features are relevant, for several reasons:

* They treat all dimensions equally without favoring subsets of dimensions

* The relevant dimensions are averaged with irrelevant ones
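The sketch below illustrates both points. It uses the standard ridge closed form $\hat{w} = (X^\top X + \lambda I)^{-1} X^\top y$, which minimizes $\Vert y - Xw \Vert_2^2 + \lambda \Vert w \Vert_2^2$. The data are hypothetical and synthetic: only the first `k` features matter. The ridge solution spreads weight over all `d` features and sets none of them exactly to zero. The sizes, noise level, and `lam` value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 20, 100, 5
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:k] = 2.0  # only the first k features are relevant
y = X @ w_true + 0.1 * rng.standard_normal(n)

lam = 1.0
# Ridge closed form: solve (X^T X + lam * I) w = X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

n_zero = int(np.sum(w_ridge == 0.0))     # ridge almost never yields exact zeros
relevant = np.abs(w_ridge[:k]).sum()     # weight placed on the true features
irrelevant = np.abs(w_ridge[k:]).sum()   # weight leaked onto the 95 irrelevant ones
print(n_zero, relevant, irrelevant)
```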

Consequently, both methods generalize poorly to new data and produce models that are hard to interpret.

## Recall

Penalized regression is of the form:

$$\hat{w} = \arg\min_{w} \; \Vert y - Xw \Vert_2^2 + \lambda \, g(w)$$

where $\lambda > 0$ controls the strength of the penalty $g(w)$.
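As a small illustration, you can write the penalized objective as a function that takes the penalty `g` as an argument. This sketch assumes the data-fit term is the squared error, as in the lasso code cell in this notebook. The name `penalized_objective` is hypothetical.

```python
from typing import Callable

import numpy as np


def penalized_objective(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                        lam: float, g: Callable[[np.ndarray], float]) -> float:
    """Squared-error loss ||y - Xw||_2^2 plus lam * g(w)."""
    residual = y - X @ w
    return float(residual @ residual + lam * g(w))


X = np.eye(2)
y = np.array([1.0, 2.0])
# With g = 0 the objective is just the residual sum of squares: 1 + 4 = 5.
print(penalized_objective(np.zeros(2), X, y, lam=1.0, g=lambda w: 0.0))
```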

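For ridge regression, $g(w) = \Vert w \Vert_2^2 = \sum_j w_j^2$. The lasso instead uses $g(w) = \Vert w \Vert_1 = \sum_j \vert w_j \vert$.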
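The two penalties compared on a small vector, in a minimal sketch with illustrative function names:

```python
import numpy as np


def ridge_penalty(w):
    return float(np.sum(w ** 2))    # squared L2 norm


def lasso_penalty(w):
    return float(np.sum(np.abs(w)))  # L1 norm


w = np.array([3.0, -4.0])
print(ridge_penalty(w), lasso_penalty(w))  # 25.0 7.0
```

The squared penalty barely rewards removing a small coefficient. Shrinking a weight of 0.1 to zero saves only 0.01 under ridge but 0.1 under the lasso. This is why the L1 penalty is the one that pushes weights all the way to zero.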

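```python
import numpy as np
from scipy.optimize import minimize


class LassoRegression:
    """Lasso regression fitted by numerically minimizing the penalized loss."""

    def fit(self, X, y, lam=1.0):
        p = X.shape[1]

        # Random starting point for the optimizer.
        w0 = np.random.rand(p)

        # Squared-error loss plus the L1 penalty lam * sum_j |w_j|.
        def objective(w, y, X, lam):
            return ((y - X.dot(w)) ** 2).sum() + lam * np.abs(w).sum()

        # The L1 term is not differentiable at 0, so the default quasi-Newton
        # method (BFGS) moves coefficients toward zero but rarely makes them exactly 0.
        self.w = minimize(objective, w0, args=(y, X, lam))["x"]
        return self

    def predict(self, X):
        return X.dot(self.w)
```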
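The `minimize`-based class relies on a general-purpose optimizer that does not exploit the structure of the L1 term. As a result, its coefficients are only approximately zero. A common alternative, swapped in here and not part of the original notebook, is cyclic coordinate descent with soft-thresholding. It solves each one-dimensional subproblem exactly, so it returns exact zeros. The function names, the `n_sweeps` default, and the synthetic data are illustrative assumptions. The objective is the same $\Vert y - Xw \Vert_2^2 + \lambda \Vert w \Vert_1$, with no $1/2$ or $1/n$ scaling.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; returns exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_sweeps=500):
    """Minimize ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)       # ||X_j||^2 for each column
    residual = y.astype(float).copy()   # y - X @ w, with w = 0 initially
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            residual += X[:, j] * w[j]  # partial residual without feature j
            rho = X[:, j] @ residual
            # Exact 1-D minimizer: soft-threshold at lam / 2 (the loss has no 1/2 factor).
            w[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            residual -= X[:, j] * w[j]
    return w


rng = np.random.default_rng(3)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [3.0, -2.0, 1.5]  # only three features are relevant
y = X @ w_true + 0.1 * rng.standard_normal(50)

lam = 20.0
w = lasso_coordinate_descent(X, y, lam=lam)
print(np.count_nonzero(w), "nonzero coefficients out of", w.size)
```

The lasso optimality conditions give a quick way to check the result. At the solution, $\vert 2X_j^\top(y - Xw) \vert \le \lambda$ for every feature $j$. Equality holds, with the sign of $w_j$, wherever $w_j \neq 0$.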