### Instructions

---

#### Goal of the Project

This project is designed for you to practice and solve the activities that are based on the concepts covered in the following lessons:

 1. Multiple linear regression - Introduction



---

#### Getting Started:

1. Click on this link to open the Colab file for this project.

 https://colab.research.google.com/drive/1Es_u-8opC2RM5MyEzJZHjhrfLXkFWBAr

2. Create a duplicate copy of the Colab file as described below.

  - Click on the **File menu**. A new drop-down list will appear.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/0_file_menu.png' width=500>

  - Click on the **Save a copy in Drive** option. A duplicate copy will be created and will open in a new tab in your web browser.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/lesson-0/1_create_colab_duplicate_copy.png' width=500>

3. After creating the duplicate copy of the notebook, please rename it in the **YYYY-MM-DD_StudentName_Project62** format.

4. Now, write your code in the prescribed code cells.


---

### Problem Statement

A real estate company wishes to analyse the prices of properties based on various factors such as area, number of bedrooms, number of bathrooms, etc. Create a multiple linear regression model that can predict the sale price of a house from multiple factors, and evaluate the accuracy of this model.








---

### List of Activities

**Activity 1:** Analysing the Dataset

**Activity 2:** Data Preparation
  
**Activity 3:** Train-Test Split

**Activity 4:** Model Training

**Activity 5:** Model Prediction and Evaluation







---


#### Activity 1: Analysing the Dataset

- Create a Pandas DataFrame for the **Housing** dataset using the link below. The dataset consists of the following columns:


|Field|Description|
|---:|:---|
|price|Sale price of a house in INR|
|area|Total size of a property in square feet|
|bedrooms|Number of bedrooms|
|bathrooms|Number of bathrooms|
|stories|Number of storeys excluding basement|
|mainroad|yes, if the house faces a main road|
|guestroom|yes, if the house has a separate living room or a drawing room for guests|
|basement|yes, if the house has a basement|
|hotwaterheating|yes, if the house uses gas for hot water heating|
|airconditioning|yes, if there is central air conditioning|
|parking|Number of cars that can be parked|
|prefarea|yes, if the house is located in the preferred neighbourhood of the city|
|furnishingstatus|Furnishing status of the house: furnished, semi-furnished or unfurnished|


  **Dataset Link:** https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/house-prices.csv

- Print the first five rows of the dataset. Check for null values and treat them accordingly.






In [None]:
# Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the dataset
# Dataset Link: 'https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/house-prices.csv'
df = pd.read_csv("https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/house-prices.csv")
# Print first five rows using head() function
df.head()

| |price|area|bedrooms|bathrooms|stories|mainroad|guestroom|basement|hotwaterheating|airconditioning|parking|prefarea|furnishingstatus|
|---:|---:|---:|---:|---:|---:|:---|:---|:---|:---|:---|---:|:---|:---|
|0|13300000|7420|4|2|3|yes|no|no|no|yes|2|yes|furnished|
|1|12250000|8960|4|4|4|yes|no|no|no|yes|3|no|furnished|
|2|12250000|9960|3|2|2|yes|no|yes|no|no|2|yes|semi-furnished|
|3|12215000|7500|4|2|2|yes|no|yes|no|yes|3|yes|furnished|
|4|11410000|7420|4|1|2|yes|yes|yes|no|yes|2|no|furnished|


In [None]:
# Check if there are any null values. If any column has null values, treat them accordingly
df.isnull().sum()

price               0
area                0
bedrooms            0
bathrooms           0
stories             0
mainroad            0
guestroom           0
basement            0
hotwaterheating     0
airconditioning     0
parking             0
prefarea            0
furnishingstatus    0
dtype: int64
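Every column reports 0 missing values, so this dataset needs no null treatment. If a column had contained nulls, one common treatment would be to fill numeric columns with their median and text columns with their most frequent value. The sketch below applies that treatment to a small hypothetical DataFrame. This choice of treatment is an assumption; the activity does not prescribe one.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the Housing dataset) with one missing value per column
toy = pd.DataFrame({
    "area": [7420.0, np.nan, 9960.0, 7500.0],
    "mainroad": ["yes", "yes", None, "no"],
})

for col in toy.columns:
    if pd.api.types.is_numeric_dtype(toy[col]):
        # Numeric column: replace nulls with the column median
        toy[col] = toy[col].fillna(toy[col].median())
    else:
        # Categorical column: replace nulls with the most frequent value
        toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy.isnull().sum().sum())  # 0
print(toy)
```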

---

#### Activity 2: Data Preparation

Many columns in this dataset contain categorical data, i.e. the values 'yes' or 'no'. However, linear regression requires numerical data, so you need to convert all 'yes' and 'no' values to 1s and 0s, where
- 1 means 'yes'
- 0 means 'no'

Similarly, replace

- `unfurnished` with 0
- `semi-furnished` with 1
- `furnished` with 2

**Hint:** To replace all 'yes' values with 1 and all 'no' values with 0, use the `replace()` method of the DataFrame object.

For example, `df.replace(to_replace="yes", value=1, inplace=True)` $\Rightarrow$ replaces the "yes" values in all columns with 1. Setting the boolean `inplace` argument to `True` modifies the DataFrame directly instead of returning a new one.
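Here is the same encoding applied to a small hypothetical DataFrame. Instead of five separate calls, a single dictionary is passed to `replace()`. By default, `replace()` matches whole cell values rather than substrings, so `"furnished"` does not clash with `"unfurnished"` or `"semi-furnished"`. The sample values are illustrative only.

```python
import pandas as pd

# Hypothetical sample with the same kinds of categorical values as the Housing data
sample = pd.DataFrame({
    "mainroad": ["yes", "no", "yes"],
    "furnishingstatus": ["furnished", "semi-furnished", "unfurnished"],
})

mapping = {"yes": 1, "no": 0,
           "unfurnished": 0, "semi-furnished": 1, "furnished": 2}

# replace() matches whole values, so "furnished" never matches inside "unfurnished"
encoded = sample.replace(mapping)

# Cast explicitly so every column ends up with an integer dtype
encoded = encoded.astype(int)
print(encoded)
```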



In [None]:
# Replace all the non-numeric values with numeric values.
df.replace(to_replace="yes", value=1, inplace=True)
df.replace(to_replace="no", value=0, inplace=True)
df.replace(to_replace="unfurnished", value=0, inplace=True)
df.replace(to_replace="semi-furnished", value=1, inplace=True)
df.replace(to_replace="furnished", value=2, inplace=True)
df.head(10)

| |price|area|bedrooms|bathrooms|stories|mainroad|guestroom|basement|hotwaterheating|airconditioning|parking|prefarea|furnishingstatus|
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
|0|13300000|7420|4|2|3|1|0|0|0|1|2|1|2|
|1|12250000|8960|4|4|4|1|0|0|0|1|3|0|2|
|2|12250000|9960|3|2|2|1|0|1|0|0|2|1|1|
|3|12215000|7500|4|2|2|1|0|1|0|1|3|1|2|
|4|11410000|7420|4|1|2|1|1|1|0|1|2|0|2|
|5|10850000|7500|3|3|1|1|0|1|0|1|2|1|1|
|6|10150000|8580|4|3|4|1|0|0|0|1|2|1|1|
|7|10150000|16200|5|3|2|1|0|0|0|0|0|0|0|
|8|9870000|8100|4|1|2|1|1|1|0|1|2|1|2|
|9|9800000|5750|3|2|4|1|1|0|0|1|1|1|0|


---

#### Activity 3: Train-Test Split

You need to predict house prices based on several factors. Thus, `price` is the target variable and all the other columns are feature variables.

Split the dataset into a training set and a test set such that the training set contains 67% of the instances and the remaining 33% form the test set.
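The sketch below shows what `test_size=0.33` means in practice, using 100 rows of hypothetical synthetic data. scikit-learn rounds the test-set size up (`ceil(0.33 * 100) = 33`), so 33 rows go to the test set and the remaining 67 to the training set. Fixing `random_state` makes the shuffle, and therefore the split, reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 100 rows, two features and a target column
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "area": rng.integers(1500, 16000, size=100),
    "bedrooms": rng.integers(1, 6, size=100),
    "price": rng.integers(1_700_000, 13_300_000, size=100),
})

X = data.drop(columns="price")
y = data["price"]

# 33% of the rows go to the test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(len(X_train), len(X_test))  # 67 33
```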

In [None]:
# Split the DataFrame into the training and test sets.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Every column except the target 'price' is a feature
features = [col for col in df.columns if col != 'price']

X = df[features]
y = df['price']
# 33% of the rows form the test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

---

#### Activity 4: Model Training

Implement multiple linear regression using the `sklearn` module as follows:

1. Reshape the target variable arrays into two-dimensional arrays using the NumPy `reshape(-1, 1)` method.
2. Import the `LinearRegression` class and create an object of this class.
3. Call the `fit()` method on the `LinearRegression` object, passing the training features and the reshaped training target.
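Reshaping the target changes the shapes of the fitted attributes, which is why the training code indexes `lin_reg.coef_[0]` and `lin_reg.intercept_[0]`. With a one-dimensional target, `coef_` has shape `(n_features,)` and `intercept_` is a scalar. With an `(n, 1)` target, they become `(1, n_features)` and `(1,)`. The sketch below demonstrates this on hypothetical, noise-free synthetic data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: 50 samples, 3 features, known linear relationship
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 4.0 + X @ np.array([2.0, -1.0, 0.5])

y_1d = y                  # shape (50,)
y_2d = y.reshape(-1, 1)   # shape (50, 1), as produced by reshape(-1, 1)

model_1d = LinearRegression().fit(X, y_1d)
model_2d = LinearRegression().fit(X, y_2d)

print(model_1d.coef_.shape, np.ndim(model_1d.intercept_))  # (3,) 0
print(model_2d.coef_.shape, model_2d.intercept_.shape)     # (1, 3) (1,)

# Both fits describe the same model; only the array shapes differ
print(np.allclose(model_1d.coef_, model_2d.coef_[0]))      # True
```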

In [None]:
# Create two-dimensional NumPy arrays for the target variable

y_train_reshaped = y_train.values.reshape(-1,1)
y_test_reshaped = y_test.values.reshape(-1,1)

# Build linear regression model
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train_reshaped)

# Print the value of the intercept
print('Intercept is:',lin_reg.intercept_[0])

# Print the names of the features along with the values of their corresponding coefficients.
for name, coef in zip(X.columns, lin_reg.coef_[0]):
    print(name, coef)

Intercept is: -276654.39716309495
area 251.3401999267822
bedrooms 92716.60526930448
bathrooms 1126479.3774358043
stories 396248.42774732393
mainroad 410635.15569710976
guestroom 320496.71121046523
basement 484622.2788531308
hotwaterheating 623047.39290368
airconditioning 678375.3422621787
parking 292410.46314066974
prefarea 524417.2428236585
furnishingstatus 200615.3570355712
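The intercept and coefficients define the fitted equation $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$, where $b_0$ is the intercept and each $b_i$ is the coefficient of feature $x_i$. For example, holding the other features fixed, the model above adds about 251 INR to the predicted price for each additional square foot of `area`. The sketch below, on hypothetical synthetic data with a two-dimensional target like `y_train_reshaped`, checks that `predict()` computes exactly this weighted sum.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: 40 samples, 4 features, 2-D target
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
noise = rng.normal(scale=0.1, size=40)
y = (10.0 + X @ np.array([3.0, 0.0, -2.0, 1.5]) + noise).reshape(-1, 1)

lin_reg = LinearRegression().fit(X, y)

# Manual prediction: intercept + weighted sum of the features
manual = lin_reg.intercept_[0] + X @ lin_reg.coef_[0]

print(np.allclose(manual, lin_reg.predict(X).ravel()))  # True
```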


---

#### Activity 5: Model Prediction and Evaluation

Predict the target values for both the training and test sets by calling the `predict()` method on the `LinearRegression` object. Then calculate the $R^2$, MSE, RMSE and MAE values to evaluate the accuracy of your model.
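For reference, let $n$ be the number of samples, $y_i$ the actual values, $\hat{y}_i$ the predictions and $\bar{y}$ the mean of the actual values. The metrics are defined as follows:

- MSE $= \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$
- RMSE $= \sqrt{\text{MSE}}$
- MAE $= \frac{1}{n}\sum_i |y_i - \hat{y}_i|$
- $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$

The NumPy sketch below computes them by hand on a few made-up prices and checks the results against `sklearn.metrics`.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Hypothetical actual and predicted prices (INR)
y_true = np.array([4_500_000, 6_200_000, 3_900_000, 7_800_000])
y_pred = np.array([4_800_000, 5_900_000, 4_100_000, 7_300_000])

errors = y_true - y_pred
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(errors))
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# The hand-written formulas agree with scikit-learn's implementations
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"MSE={mse:.0f}  RMSE={rmse:.0f}  MAE={mae:.0f}  R2={r2:.3f}")
```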

In [None]:
# Predict the target variable values for training and test set
y_train_pred = lin_reg.predict(X_train)
y_test_pred = lin_reg.predict(X_test)

In [None]:
# Evaluate the linear regression model using the 'r2_score', 'mean_squared_error' & 'mean_absolute_error' functions of the 'sklearn' module.
# y_train_pred and y_test_pred were computed in the previous cell.
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
print("R2_score :",r2_score(y_train_reshaped, y_train_pred))
print("Mean squared Error :",mean_squared_error(y_train_reshaped,y_train_pred))
print("Mean Absolute Error:",mean_absolute_error(y_train_reshaped,y_train_pred))
print("----"*15)
print("R2_score :",r2_score(y_test_reshaped,y_test_pred))
print("Mean squared Error :",mean_squared_error(y_test_reshaped,y_test_pred))
print("Mean absolute error :",mean_absolute_error(y_test_reshaped,y_test_pred))


R2_score : 0.68603602364727
Mean squared Error : 971946527815.6637
Mean Absolute Error: 720751.2129481052
------------------------------------------------------------
R2_score : 0.6557070707485257
Mean squared Error : 1475542475754.5508
Mean absolute error : 906953.7908301718
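The cell above prints $R^2$, MSE and MAE but not RMSE, which the activity also asks for. Because RMSE is the square root of MSE, it can be derived with `np.sqrt`. This sketch takes the square roots of the two MSE values reported above. Inside the notebook, you could instead call `np.sqrt(mean_squared_error(y_test_reshaped, y_test_pred))` directly. RMSE is expressed in the same unit as `price` (INR), which makes it easier to interpret than MSE.

```python
import numpy as np

# MSE values copied from the evaluation output above
mse_train = 971946527815.6637
mse_test = 1475542475754.5508

# RMSE is the square root of MSE and is expressed in INR, like price
rmse_train = np.sqrt(mse_train)
rmse_test = np.sqrt(mse_test)

print("Train RMSE:", rmse_train)
print("Test RMSE :", rmse_test)
```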


---

### Submitting the Project:

1. After finishing the project, click the **Share** button in the top-right corner of the notebook. A new dialog box will appear.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/2_share_button.png' width=500>

2. In the dialog box, make sure that the '**Anyone on the Internet with this link can view**' option is selected, then click the **Copy link** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/3_copy_link.png' width=500>

3. The link to the duplicate copy of the notebook (named **YYYY-MM-DD_StudentName_Project62**) will be copied.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/4_copy_link_confirmation.png' width=500>

4. Go to your dashboard and click on the **My Projects** option.
   
   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/5_student_dashboard.png' width=800>

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/6_my_projects.png' width=800>

5. Click on the **View Project** button for the project you want to submit.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/7_view_project.png' width=800>

6. Click on the **Submit Project Here** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/8_submit_project.png' width=800>

7. Paste the link to the project file named **YYYY-MM-DD_StudentName_Project62** in the URL box, then click the **Submit** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/9_enter_project_url.png' width=800>

---