# Housing Prices!

### Housing prices vary and depend on features such as area or number of bedrooms. Let's build a linear regression model that predicts housing prices from these features.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

In [3]:
housing = pd.read_csv("Housing.csv")
housing_price_area_story = housing[["price", "area", "bedrooms", "bathrooms", "stories"]]
housing_price_area_story.head()

| | price | area | bedrooms | bathrooms | stories |
|---|---|---|---|---|---|
| 0 | 13300000 | 7420 | 4 | 2 | 3 |
| 1 | 12250000 | 8960 | 4 | 4 | 4 |
| 2 | 12250000 | 9960 | 3 | 2 | 2 |
| 3 | 12215000 | 7500 | 4 | 2 | 2 |
| 4 | 11410000 | 7420 | 4 | 1 | 2 |


### Before building a simple or multiple linear regression model, let's practice with a constant model: one that always predicts the same number and ignores any relationship between the features and the price.

### Model: Constant Model

With a constant model, our prediction $\hat{y}$ is just a constant number $\theta_0$:

$$ \hat{y} = \theta_0 $$

The model has a single parameter and takes no input, so it always predicts $\theta_0$, whatever the features of a house are.
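To make the definition concrete, here is a minimal sketch of a constant model as a function. The name `predict_constant` and the value chosen for `theta_0` are assumptions made only for illustration; they do not come from the notebook.

```python
import numpy as np

def predict_constant(theta_0, n_rows):
    """Constant model: return theta_0 for each of the n_rows houses, ignoring all features."""
    return np.full(n_rows, theta_0, dtype=float)

# Hypothetical guess for theta_0, chosen only for illustration.
theta_0 = 12_000_000
predictions = predict_constant(theta_0, n_rows=5)
print(predictions)
```

Every house gets the same predicted price, so the only thing left to decide is which value of $\theta_0$ to use.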

### Loss Function: Mean Squared Error

We will use the squared (L2) loss for each prediction, so our cost function is the average squared loss, the Mean Squared Error (MSE):

$$ R(\theta) = \frac{1}{n}\sum_{i=1}^n(y_i - \hat{y}_i)^2 $$

Since $\hat{y} = \theta_0$ for every house:

$$ R(\theta) = \frac{1}{n}\sum_{i=1}^n(y_i - \theta_0)^2 $$
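As a small sketch of this cost function, the code below evaluates the MSE for a few candidate values of $\theta_0$. It uses only the five prices visible in the `head()` output, not the full dataset, and the function name `mse_constant` and the candidate values are assumptions for illustration.

```python
import numpy as np

# First five prices from the head() output shown earlier (not the full dataset).
sample_prices = np.array([13300000, 12250000, 12250000, 12215000, 11410000], dtype=float)

def mse_constant(theta_0, y):
    """R(theta) = (1/n) * sum((y_i - theta_0)^2) for a constant prediction theta_0."""
    return np.mean((y - theta_0) ** 2)

# Candidate values picked arbitrarily for illustration.
for theta_0 in (11_000_000, 12_000_000, 13_000_000):
    print(theta_0, mse_constant(theta_0, sample_prices))
```

The candidate closest to the centre of the sample prices gives the smallest MSE, which hints at where the optimal $\theta_0$ lies.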

### Fit the Model

We can now fit the model by differentiating the cost function with respect to $\theta_0$ and setting the derivative equal to 0:

$$ \frac{d}{d\theta_0}R(\theta) = \frac{1}{n}\sum_{i=1}^n\frac{d}{d\theta_0}(y_i - \theta_0)^2 $$

$$ = \frac{-2}{n}\sum_{i=1}^n(y_i - \theta_0) $$
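One way to sanity-check this derivative is to compare it with a numerical central difference. The sketch below does that on the five prices from the `head()` output; the evaluation point, the step size `h`, and the function names are assumptions chosen for illustration.

```python
import numpy as np

# First five prices from the head() output (not the full dataset).
y = np.array([13300000, 12250000, 12250000, 12215000, 11410000], dtype=float)

def mse(theta_0):
    """Mean squared error of the constant prediction theta_0."""
    return np.mean((y - theta_0) ** 2)

def mse_gradient(theta_0):
    """Analytic derivative from the text: (-2/n) * sum(y_i - theta_0)."""
    return -2.0 * np.mean(y - theta_0)

theta_0 = 12_000_000.0  # arbitrary evaluation point, for illustration only
h = 1.0                 # step size for the central difference
numeric = (mse(theta_0 + h) - mse(theta_0 - h)) / (2 * h)

print(mse_gradient(theta_0), numeric)
```

Because the MSE is quadratic in $\theta_0$, the central difference should agree with the analytic derivative up to floating-point rounding.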

Setting this derivative to 0:

$$ 0 = \frac{-2}{n}\sum_{i=1}^n(y_i - \theta_0) $$

Divide both sides by $\frac{-2}{n}$:

$$ 0 = \sum_{i=1}^n(y_i - \theta_0) $$

$$ = \left(\sum_{i=1}^n y_i\right) - \left(\sum_{i=1}^n \theta_0\right) $$

$$ = \left(\sum_{i=1}^n y_i\right) - n \cdot \theta_0 $$

Move $n \cdot \theta_0$ to the left side:

$$ n \cdot \theta_0 = \sum_{i=1}^ny_i $$

$$ \theta_0 = \frac{1}{n}\sum_{i=1}^ny_i $$

$$ \theta_0 = \bar{y} $$

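So the optimal parameter for the constant model under MSE is the mean of the observed values, $\bar{y}$. The second derivative of $R(\theta)$ with respect to $\theta_0$ is $2 > 0$, so this point is a minimum.

$$ \hat{\theta}_0 = \bar{y} $$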
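The result can also be checked numerically with a brute-force search. The sketch below evaluates the MSE on a grid of candidate values and compares the best grid point with the sample mean. It uses only the five prices from the `head()` output, and the grid bounds and resolution are arbitrary assumptions.

```python
import numpy as np

# First five prices from the head() output (not the full dataset).
y = np.array([13300000, 12250000, 12250000, 12215000, 11410000], dtype=float)

# Closed-form solution from the derivation: the sample mean.
theta_hat = y.mean()

# Brute-force search over an evenly spaced grid of candidate thetas.
thetas = np.linspace(y.min(), y.max(), 2001)
# Broadcast to shape (n_prices, n_thetas), then average over the prices.
mse_values = np.mean((y[:, None] - thetas[None, :]) ** 2, axis=0)
best_on_grid = thetas[np.argmin(mse_values)]

print(theta_hat, best_on_grid)
```

The two values should differ by no more than the grid spacing, since the grid search can only land on the candidate nearest to the true minimizer.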

### Evaluate the Constant Model's Performance

In [5]:
# Extract the price column (the values the model tries to predict)
prices = housing_price_area_story["price"]
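As a rough sketch of what evaluating the fitted constant model looks like, the code below computes the MSE and its square root (RMSE) at $\hat{\theta}_0 = \bar{y}$. It uses only the five prices from the `head()` output as a stand-in for the full `prices` column, so the numbers it prints are not the notebook's results.

```python
import numpy as np

# Stand-in for the full price column: the five prices from the head() output.
sample_prices = np.array([13300000, 12250000, 12250000, 12215000, 11410000], dtype=float)

# Fitted constant model: the mean of the observed prices.
theta_hat = sample_prices.mean()

# Error of always predicting theta_hat.
mse = np.mean((sample_prices - theta_hat) ** 2)
rmse = np.sqrt(mse)  # same units as price, easier to interpret than MSE

print(f"theta_hat = {theta_hat:,.0f}")
print(f"MSE  = {mse:,.0f}")
print(f"RMSE = {rmse:,.0f}")
```

At the optimum, the MSE of the constant model equals the (population) variance of the prices and the RMSE equals their standard deviation, which makes the constant model a natural baseline for the regression models that follow.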

In [None]:
def mse_for_constant_model(theta, prices):
    """Return the MSE of the constant model for a scalar theta or an array of thetas."""
    theta = np.atleast_1d(theta)
    prices = np.asarray(prices, dtype=float)
    # Broadcast to shape (n_prices, n_thetas), then average over the prices.
    return np.mean((prices[:, None] - theta[None, :]) ** 2, axis=0)

# Candidate values of theta_0 and the MSE of each
thetas = np.linspace(1000000, 14000000, 1000)
mse_values = mse_for_constant_model(thetas, prices)

# Plot the loss curve
plt.plot(thetas, mse_values)
plt.xlabel(r'$\theta_0$')
plt.ylabel('MSE')

# Mark the optimal point, theta_hat = mean(prices)
theta_hat = np.mean(prices)
plt.scatter([theta_hat], mse_for_constant_model(theta_hat, prices),
            s=50, color="tab:red", zorder=3, label=r"$\hat{\theta}_0$")
plt.legend()
plt.show()