# Linear Regression (aka ordinary least squares)

Linear regression, or ordinary least squares (OLS), is the simplest and most classic linear method for regression. It finds the parameters w and b that minimize the mean squared error between the predictions and the true regression targets, y, on the training set. The mean squared error is the sum of the squared differences between the predictions and the true values, divided by the number of samples. Linear regression has no hyperparameters to tune, which is a benefit, but it also has no way to control model complexity.
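
To make the objective concrete, here is a minimal NumPy sketch of the mean squared error and of a least-squares fit for w and b. The synthetic data, the helper name `mean_squared_error`, and the "true" slope and intercept are assumptions made up for this illustration; this is not how scikit-learn implements `LinearRegression` internally.

```python
import numpy as np

# Synthetic one-feature data; the slope (0.5) and intercept (-0.1)
# are made-up values used only for this illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] - 0.1 + rng.normal(scale=0.3, size=60)


def mean_squared_error(y_true, y_pred):
    """Sum of squared differences divided by the number of samples."""
    return np.sum((y_true - y_pred) ** 2) / len(y_true)


# Append a column of ones so the intercept b is learned as one more weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution: the (w, b) that minimizes the squared error on this data.
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = params[:-1], params[-1]

mse_fit = mean_squared_error(y, X @ w + b)
print("w:", w, "b:", b, "training MSE:", mse_fit)
```

Because the fitted (w, b) minimizes the training error, moving either parameter away from the solution can only increase the mean squared error on this data.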

In [2]:
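import mglearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic one-feature "wave" regression dataset from mglearn
X, y = mglearn.datasets.make_wave(n_samples=60)
# Hold out a test set (train_test_split keeps 25% for testing by default)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)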

In [3]:
lr = LinearRegression().fit(X_train,y_train)

In [4]:
print("lr.coef_: {}".format(lr.coef_))
print("lr.intercept_:{}".format(lr.intercept_))

lr.coef_: [0.39390555]
lr.intercept_:-0.031804343026759746


In [5]:
print("Training set score : {:.2f}".format(lr.score(X_train,y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test,y_test)))

Training set score : 0.67
Test set score: 0.66
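
For a regressor, `score` returns the coefficient of determination, R^2. As a rough sketch of what that number measures, it can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r2_score_manual` and the toy arrays are assumptions for illustration only.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0, 4.0])

print(r2_score_manual(y_true, y_true))                     # perfect predictions -> 1.0
print(r2_score_manual(y_true, np.full(4, y_true.mean())))  # always predicting the mean -> 0.0
```

A score of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the targets, and negative values are possible when the predictions are worse than that baseline.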


An R^2 of around 0.66 is not very good, but the scores on the training and test sets are very close together. This means we are likely underfitting, not overfitting. For this one-dimensional dataset, there is little danger of overfitting, as the model is very simple (or restricted). However, with higher-dimensional datasets (datasets with a large number of features), linear models become more powerful, and there is a higher chance of overfitting.
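
The effect of dimensionality can be sketched with a deliberately extreme synthetic example: a least-squares fit with as many features as training samples can reproduce the training targets exactly, even when all but one feature is pure noise. The sample sizes, the noise level, and the helper names `make_data`, `fit_ols`, and `r2` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 40, 200, 40  # hypothetical sizes


def make_data(n):
    # Only the first feature carries signal; the others are pure noise.
    X = rng.normal(size=(n, n_features))
    y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)
    return X, y


def add_intercept(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ols(X, y):
    params, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return params


def r2(params, X, y):
    pred = add_intercept(X) @ params
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# Restricted model: uses only the first feature.
p_one = fit_ols(X_train[:, :1], y_train)
# Flexible model: uses all 40 features on only 40 training samples.
p_all = fit_ols(X_train, y_train)

print("1 feature   train R^2: {:.2f}  test R^2: {:.2f}".format(
    r2(p_one, X_train[:, :1], y_train), r2(p_one, X_test[:, :1], y_test)))
print("40 features train R^2: {:.2f}  test R^2: {:.2f}".format(
    r2(p_all, X_train, y_train), r2(p_all, X_test, y_test)))
```

The flexible model fits the training set essentially perfectly, but that fit says little about how well it predicts unseen data, which is why the training and test scores have to be compared.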

# Boston Housing Dataset

In [6]:
X,y = mglearn.datasets.load_extended_boston()


In [8]:
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=0)


In [9]:
lr2 = LinearRegression().fit(X_train,y_train)

In [10]:
print("Training set score : {:.2f}".format(lr2.score(X_train,y_train)))
print("Test set score : {:.2f}".format(lr2.score(X_test,y_test)))

Training set score : 0.95
Test set score : 0.61


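The R^2 on the test set is much worse than on the training set (0.61 versus 0.95). This discrepancy between training and test performance is a clear sign of overfitting, so we should look for a model that allows us to control complexity, for example ridge regression.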
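
As a rough preview of what "controlling complexity" can look like, the following is a minimal closed-form sketch of ridge regression in NumPy. It is not scikit-learn's implementation; the helper name `fit_ridge`, the synthetic data, and the particular `alpha` values are assumptions for illustration. The regularization strength is called `alpha` here to match the parameter name used by `sklearn.linear_model.Ridge`.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge fit; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the coefficients w.
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Synthetic data: only the first of ten features carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=50)

for alpha in [0.0, 1.0, 10.0, 100.0]:
    w, _ = fit_ridge(X, y, alpha)
    print("alpha={:>5}: ||w|| = {:.3f}".format(alpha, np.linalg.norm(w)))
```

With `alpha=0` this reduces to ordinary least squares; larger values shrink the coefficients toward zero, which restricts the model and trades some training-set fit for less variance.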