**Linear Regression**


*   Supervised learning
  * Given: features and ground-truth values (labels) of each sample in the training set
  * Goal: predict the value or label of new samples (regression or classification)
*   Unsupervised learning
  * Given: features only, no labels
  * Goal: find meaningful groups (clustering)

**Supervised Learning**
* given n samples {(x1,y1),...,(xn,yn)}
* learn a mapping function f(x) -> y
  * y is continuous: regression;
  * y is discrete: classification

**Regression Applications**
* Stock predictions
* Weather prediction

**Build linear regression model**
* learn a mapping function f(x) -> y
* f(x) is a linear combination of input features
```
  f(x_i) = w0 + w1(x_i,1) + w2(x_i,2) + ... + wd(x_i,d)
  x_i = (x_i,1, x_i,2, ... , x_i,d) is the feature of the i-th sample
  w = (w0, w1, w2, ..., wd) is the model weight
  w0 is the *bias*
```
* compact form: f(x_i) = w^T x_i, where x_i is extended with a leading 1 so that w0 is included
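
The linear model above can be sketched in NumPy. The weights and samples below are made-up values chosen only for illustration; they do not come from any dataset in these notes.

```python
import numpy as np

# Hypothetical weights for d = 2 features: w = (w0, w1, w2), where w0 is the bias
w = np.array([1.0, 2.0, -3.0])

# Two hypothetical samples with two features each
X_raw = np.array([[0.5, 1.0],
                  [2.0, 0.0]])

# Prepend a 1 to every sample so that w0 is multiplied by 1
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# f(x_i) = w^T x_i, computed for all samples at once
preds = X @ w
print(preds)  # → [-1.  5.]
```

The first prediction is 1 + 2(0.5) - 3(1.0) = -1 and the second is 1 + 2(2.0) - 3(0.0) = 5.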

**Model optimization**
* minimize prediction error
* Loss function (mean squared error): min_w (1/n) sum_i (y_i - f(x_i))^2 = min_w (1/n) ||y - Xw||^2


```
[1, x1, x2, ..., xd][w0]
[1, ..., ..., ... ] [...]
[1, ..., ..., ... ] [wd]
```
* the column of 1's in X lets the bias w0 be included in w
* find the weights where the derivative of the loss function is 0
* closed-form solution (normal equation): w = (X^T X)^-1 X^T y
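
As a minimal sketch of the closed-form solution, the snippet below builds a small synthetic, noise-free dataset (the sizes and `true_w` are assumptions made up for this example) and recovers the weights. It solves the linear system (X^T X) w = X^T y with `np.linalg.solve` rather than forming the inverse explicitly, which is usually the more stable choice numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples, 2 features, plus a column of 1's for the bias
X_raw = rng.normal(size=(50, 2))
X = np.hstack([np.ones((50, 1)), X_raw])

# Hypothetical "true" weights (w0, w1, w2); targets are generated without noise
true_w = np.array([4.0, 1.5, -2.0])
y = X @ true_w

# Normal equation: solve (X^T X) w = X^T y for w
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)
```

Because the targets here contain no noise, the recovered `w` matches `true_w` up to floating-point error.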

**Loss functions**
 * Mean absolute error (MAE)
 * Mean squared error (MSE)
 * Root mean squared error (RMSE)
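
The three losses can be computed by hand as a quick check of their definitions. The target and prediction values below are invented for illustration.

```python
import numpy as np

# Hypothetical ground-truth values and predictions
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

errors = y_true - y_pred            # [1, 0, -2]

mae = np.mean(np.abs(errors))       # mean absolute error
mse = np.mean(errors ** 2)          # mean squared error
rmse = np.sqrt(mse)                 # root mean squared error, same units as y

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```

Here MAE is (1 + 0 + 2) / 3 = 1 and MSE is (1 + 0 + 4) / 3 = 5/3. Squaring weights the largest error more heavily, so MSE and RMSE are more sensitive to outliers than MAE.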

**Gradient descent**

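Training a machine learning model is an optimization problem.

The goal is to find the model parameters that minimize the loss function.

Gradient descent is an iterative process: start from an initial guess and repeatedly move the parameters a small step in the direction of the negative gradient of the loss.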
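
A minimal sketch of gradient descent for the mean squared error loss (1/n)||y - Xw||^2, whose gradient with respect to w is (2/n) X^T (Xw - y). The function name `gradient_descent`, the step size, the number of steps, and the synthetic data are all assumptions chosen for this example.

```python
import numpy as np

def gradient_descent(X, y, step_size=0.1, n_steps=1000):
    """Minimize (1/n) * ||y - Xw||^2 by plain (full-batch) gradient descent."""
    n, d = X.shape
    w = np.zeros(d)                               # initial guess
    for _ in range(n_steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)      # gradient of the MSE loss
        w = w - step_size * grad                  # step against the gradient
    return w

rng = np.random.default_rng(0)

# Synthetic, noise-free data with a column of 1's for the bias
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
true_w = np.array([1.0, 2.0, -1.0])               # hypothetical weights
y = X @ true_w

w = gradient_descent(X, y)
print(w)
```

On this small, well-conditioned problem the iterates approach `true_w`. A step size that is too large makes the updates diverge, and one that is too small makes convergence slow, so in practice it has to be tuned.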

**Training and Testing data**

In [None]:
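# split samples (df is the housing DataFrame loaded earlier in the notebook)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# features: every column except the target; target: median house value
house_fea = df.drop('median_house_value', axis=1).values
house_price = df['median_house_value'].values

# hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(house_fea, house_price, test_size=0.2, random_state=42)

# fit the scaler on the training set only, then reuse its statistics on the test set
normalizer = StandardScaler()
X_train = normalizer.fit_transform(X_train)
X_test = normalizer.transform(X_test)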

**Train the model**

In [None]:
from sklearn.linear_model import LinearRegression

lr = LinearRegression()

# fit the weights on the training set
lr.fit(X_train, y_train)

print("bias: " + str(lr.intercept_))
print("coeffs: " + str(lr.coef_))  # one weight per feature

In [None]:
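# evaluate the model on the training set
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_train_pred = lr.predict(X_train)

# scikit-learn metrics take (y_true, y_pred)
mae = mean_absolute_error(y_train, y_train_pred)
mse = mean_squared_error(y_train, y_train_pred)
rmse = np.sqrt(mse)

print("Prediction for training set: ")
print("MAE: {}".format(mae))
print("MSE: {}".format(mse))
print("RMSE: {}".format(rmse))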

In [None]:
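import matplotlib.pyplot as plt
import numpy as np

# predict on the held-out test set
y_test_pred = lr.predict(X_test)

# compare ground truth and prediction for the first five test houses
labels = ['House1', 'House2', 'House3', 'House4', 'House5']
x = np.arange(len(labels))
width = 0.35  # width of each bar

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, y_test[0:5], width, label="ground truth")
rects2 = ax.bar(x + width/2, y_test_pred[0:5], width, label="predicted")

ax.set_ylabel("Price")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.show()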

**How to fit nonlinear data?**

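-- apply a nonlinear transformation to the features (for example, add polynomial terms), then fit a linear model on the transformed features

-- this increases the order of the model while keeping it linear in the weights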
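
One way to sketch this idea with scikit-learn is to expand the features with `PolynomialFeatures` and then fit an ordinary `LinearRegression` on the expanded features. The quadratic toy data below is an assumption made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy data: a noise-free quadratic relationship y = 1 + 2 * x^2
x = np.linspace(-2, 2, 30).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] ** 2

# Nonlinear transformation: x -> (x, x^2); the bias is left to LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x)

# The model is still linear in its weights, only the features changed
model = LinearRegression().fit(X_poly, y)
print("bias:", model.intercept_)
print("coeffs:", model.coef_)
```

The fitted bias is close to 1 and the coefficients are close to (0, 2), matching the function that generated the data. A straight-line fit on `x` alone could not capture this shape.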

**Overfitting**: errors on the training data are very small, but errors on new points are likely to be large

**How to avoid overfitting**
* add a regularization term to the loss function
* this pushes some w_i to become very small or close to zero
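
As one possible illustration, the sketch below uses an L2 penalty (ridge regression), which is only one of several regularizers; the notes do not say which one is intended. It minimizes (1/n)||y - Xw||^2 + lam * ||w||^2, whose minimizer solves (X^T X + n * lam * I) w = X^T y. The helper name `ridge_fit`, the value of `lam`, and the synthetic data are assumptions, and for simplicity every weight is penalized (in practice the bias is often left out of the penalty).

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize (1/n) * ||y - Xw||^2 + lam * ||w||^2 in closed form."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)

# Small, noisy synthetic dataset: 20 samples, 5 features
X = rng.normal(size=(20, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 1.0])   # hypothetical weights
y = X @ true_w + rng.normal(scale=0.5, size=20)

w_plain = ridge_fit(X, y, lam=0.0)   # no regularization (ordinary least squares)
w_ridge = ridge_fit(X, y, lam=1.0)   # with the L2 penalty

print("||w|| without regularization:", np.linalg.norm(w_plain))
print("||w|| with regularization:   ", np.linalg.norm(w_ridge))
```

The regularized weights have a smaller norm than the unregularized ones. An L2 penalty shrinks weights toward zero without usually making them exactly zero; an L1 penalty (lasso) is the usual choice when exact zeros are wanted.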