# Linear Regression with Multiple Variables

Sample problem: predicting home prices in Monroe Township, New Jersey

Below is a table of home prices in Monroe Twp, NJ. Here the price depends on the area (square feet), the number of bedrooms, and the age of the home (in years). Given these prices, we have to predict the prices of new homes based on area, bedrooms, and age.

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

In [2]:
df = pd.read_csv("datasets/monroe_twp_nj.csv")
df

Unnamed: 0.1,Unnamed: 0,Area,Price,Bedrooms,Years
0,0,2600,550000,3.0,20
1,1,3000,565000,4.0,15
2,2,3200,610000,,18
3,3,3600,595000,3.0,30
4,4,4000,760000,5.0,8
5,5,4100,810000,6.0,8


```
Given these home prices, find the price of a home that has:

3000 sq ft area, 3 bedrooms, 40 years old

2500 sq ft area, 4 bedrooms, 5 years old
```

In [13]:
# First check dataset for NaN

In [3]:
df.isna().any()

Unnamed: 0    False
Area          False
Price         False
Bedrooms       True
Years         False
dtype: bool
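
The boolean summary above says which columns contain NaN but not how many values are missing or in which rows. A minimal sketch of a slightly more detailed check follows; it rebuilds the table inline (the name `homes` is an assumption made for this illustration) so that it does not depend on the CSV file.

```python
import numpy as np
import pandas as pd

# Inline copy of the table shown earlier; `homes` is a name chosen for this sketch.
homes = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000, 4100],
    "Price": [550000, 565000, 610000, 595000, 760000, 810000],
    "Bedrooms": [3.0, 4.0, np.nan, 3.0, 5.0, 6.0],
    "Years": [20, 15, 18, 30, 8, 8],
})

# Number of missing values in each column
missing_per_column = homes.isna().sum()

# Rows that contain at least one missing value
rows_with_missing = homes[homes.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

With this data, only the `Bedrooms` column has a missing value, in the row for the 3200 sq ft home.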

We will use regression with multiple variables here. The price can be calculated using the following equation:

![title](images/9.1.png)

Here b is the intercept.
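
In case the image does not render, the equation can also be written out. This assumes the figure shows the usual multiple linear regression form, using the symbols m1, m2, m3 and b that this notebook uses for the coefficients and the intercept:

$$
\text{price} = m_1 \cdot \text{area} + m_2 \cdot \text{bedrooms} + m_3 \cdot \text{age} + b
$$

Each coefficient measures how much the predicted price changes when its feature increases by one unit while the other two features stay fixed.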

Things we need to handle:
- Data Preprocessing: Handling NA values
- Linear Regression Using Multiple Variables

We can replace the NaN with the median number of bedrooms, rounded to a whole number

In [6]:
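# Median number of bedrooms, rounded to a whole number (median() skips NaN by default)
number_to_fill = round(df.Bedrooms.median())
print(number_to_fill)
# Reuse the value computed above instead of recomputing the median
df["Bedrooms"] = df.Bedrooms.fillna(number_to_fill)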

4
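
To make the fill step easier to follow in isolation, here is a small sketch on just the bedroom values from the table. The names `bedrooms`, `median_bedrooms` and `filled` are assumptions introduced for this example and are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Bedroom counts from the table, with the one missing entry
bedrooms = pd.Series([3.0, 4.0, np.nan, 3.0, 5.0, 6.0], name="Bedrooms")

# median() ignores NaN, so it is computed over 3, 3, 4, 5, 6
median_bedrooms = bedrooms.median()

# Replace the missing entry with the rounded median
filled = bedrooms.fillna(round(median_bedrooms))

print(median_bedrooms)
print(filled.tolist())
```

The median is used here rather than the mean because it stays a plausible bedroom count and is less sensitive to a single unusually large home.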


In [7]:
df

Unnamed: 0.1,Unnamed: 0,Area,Price,Bedrooms,Years
0,0,2600,550000,3.0,20
1,1,3000,565000,4.0,15
2,2,3200,610000,4.0,18
3,3,3600,595000,3.0,30
4,4,4000,760000,5.0,8
5,5,4100,810000,6.0,8


Create the linear regression model

In [8]:
reg = linear_model.LinearRegression()

Now train your model with the fit method
- the first argument is the independent variables (features)
- the second argument is the dependent variable (target)

In [9]:
reg.fit(df[["Area","Bedrooms", "Years"]], df.Price)

LinearRegression()

The coefficients are m1, m2 and m3 (for area, bedrooms and years, in the order the columns were passed to fit)

In [10]:
reg.coef_

array([  112.06244194, 23388.88007794, -3231.71790863])

The intercept is b

In [11]:
reg.intercept_

221323.00186540408

```
Now predict the price of a home with:

3000 sq ft area, 3 bedrooms, 40 years old
```

In [12]:
reg.predict([[3000, 3, 40]])

array([498408.25158031])

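To see how the price was calculated, multiply each coefficient by its input, add up the products, and then add the intercept. The result differs from `reg.predict` only in the last few digits, because the coefficients typed below are rounded to eight decimal places.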

In [35]:
(112.06244194*3000)+(23388.88007794*3)+(-3231.71790863*40) + 221323.00186540408

498408.2515740241
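
The same check can be done without retyping the numbers, by taking the dot product of the fitted coefficients with the inputs. The sketch below is self-contained: it rebuilds the cleaned table inline and fits a fresh model, so `homes`, `model`, `query`, `manual_price` and `predicted_price` are names assumed for this illustration rather than variables from the cells above.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Cleaned table from above (missing bedroom count already filled with 4)
homes = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000, 4100],
    "Bedrooms": [3.0, 4.0, 4.0, 3.0, 5.0, 6.0],
    "Years": [20, 15, 18, 30, 8, 8],
    "Price": [550000, 565000, 610000, 595000, 760000, 810000],
})
features = ["Area", "Bedrooms", "Years"]

# Fit on plain NumPy arrays so that predicting from an array is consistent
model = linear_model.LinearRegression()
model.fit(homes[features].to_numpy(), homes["Price"].to_numpy())

# 3000 sq ft, 3 bedrooms, 40 years old
query = np.array([3000, 3, 40])

# m1*area + m2*bedrooms + m3*age + b, using the full-precision coefficients
manual_price = np.dot(model.coef_, query) + model.intercept_
predicted_price = model.predict(query.reshape(1, -1))[0]

print(manual_price, predicted_price)
```

Because the full-precision coefficients are used, the two values should agree up to floating-point error.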

```
Now predict the price of a home with:

2500 sq ft area, 4 bedrooms, 5 years old
```

In [36]:
reg.predict([[2500,4,5]])

array([578876.03748933])
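
Both homes can also be scored in a single call. The sketch below passes the new homes as a DataFrame with the same column names used for training, which keeps the inputs labeled; recent scikit-learn versions may otherwise warn about missing feature names when a model fitted on a DataFrame is given a plain list. The names `homes`, `model`, `new_homes` and the `PredictedPrice` column are assumptions made for this example.

```python
import pandas as pd
from sklearn import linear_model

# Cleaned table from above (missing bedroom count already filled with 4)
homes = pd.DataFrame({
    "Area": [2600, 3000, 3200, 3600, 4000, 4100],
    "Bedrooms": [3.0, 4.0, 4.0, 3.0, 5.0, 6.0],
    "Years": [20, 15, 18, 30, 8, 8],
    "Price": [550000, 565000, 610000, 595000, 760000, 810000],
})
features = ["Area", "Bedrooms", "Years"]

model = linear_model.LinearRegression()
model.fit(homes[features], homes["Price"])

# The two homes from the problem statement, one row each
new_homes = pd.DataFrame([[3000, 3, 40], [2500, 4, 5]], columns=features)

# predict returns one price per row
new_homes["PredictedPrice"] = model.predict(new_homes[features])

print(new_homes)
```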