**Predicting Boston Housing Prices**

The Boston Housing dataset describes housing in neighbourhoods around Boston. Each row records attributes such as the per-capita crime rate (CRIM), the nitric oxides concentration (NOX), and the proportion of owner-occupied units built before 1940 (AGE), together with the median value of owner-occupied homes (MEDV), which is the price we want to predict. You can download the dataset and find detailed explanations of the other attributes [here](https://www.kaggle.com/c/boston-housing/data).

Our goal is to predict house prices in Boston using this dataset. Because we are given both the input features and a continuous target value, this is a supervised regression problem, and Linear Regression is a natural first model to try.
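
To make the idea concrete, here is a minimal sketch of what a linear regression fit does: it finds weights and an intercept so that a weighted sum of the features matches the target as closely as possible in the least-squares sense. The data below is made up for illustration (it is not from the Boston dataset), and the names `true_w`, `true_b` and `A` are hypothetical.

```python
import numpy as np

# Made-up toy data: 5 samples, 2 features (NOT from the Boston dataset)
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0],
              [5.0, 5.0]])

# Hypothetical "true" relationship used to generate a noise-free target
true_w = np.array([2.0, -1.0])
true_b = 0.5
y = X @ true_w + true_b

# Append a column of ones so the intercept is learned as an extra weight
A = np.column_stack([X, np.ones(len(X))])

# Solve the least-squares problem: minimise ||A @ coef - y||^2
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
weights, intercept = coef[:2], coef[2]

print('weights:  ', weights)
print('intercept:', intercept)
```

Since the toy target has no noise, the recovered weights and intercept should match the values used to generate it, up to floating-point error.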

In [None]:
#Import the necessary packages
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
#Load the dataset
data = pd.read_csv('../input/boston_train.csv')

In [None]:
#Show the first 5 rows in the csv
data.head()

In [None]:
#Get an overview of the data to see if there are any empty values in the dataset
data.info()
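
If you prefer an explicit count of missing values per column, `isnull().sum()` is one option. The sketch below uses a small made-up frame called `toy` (hypothetical, not the real data) so that it runs on its own; on the real data you would call the same method on `data`.

```python
import numpy as np
import pandas as pd

# Made-up frame with two deliberately missing entries (NOT the real data)
toy = pd.DataFrame({
    'crim': [0.1, np.nan, 0.3],
    'rm':   [6.5, 6.1, 5.9],
    'medv': [24.0, 21.6, np.nan],
})

# Number of missing (NaN) values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```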

In [None]:
#Let's get the dimensions of the dataset (rows, columns).
data.shape

In [None]:
#Now let's get a summary of our data to see the distribution.
data.describe()

In [None]:
# Get the titles of the columns in our dataset
data.columns

In [None]:
# The columns we'll use to predict the target
X = data[['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio',
 'black', 'lstat']]
# Alternative: X = data.drop('medv', axis=1)
# Note that this keeps every other column, including any ID column if one is present.

# The target column
Y = data['medv']

In [None]:
# Split our prepared data into train and test set
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=5)
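
As a quick illustration of what the split does, the sketch below runs `train_test_split` on made-up arrays (`X_toy` and `y_toy` are hypothetical names, not part of the notebook). It shows that every row ends up in exactly one of the two sets and that fixing `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples with 2 features each (NOT the Boston data)
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100)

# Same arguments as in the cell above: about a third of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.33, random_state=5)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, _, _ = train_test_split(X_toy, y_toy, test_size=0.33, random_state=5)

print('train shape:', X_tr.shape, 'test shape:', X_te.shape)
print('reproducible:', np.array_equal(X_te, X_te2))
```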

In [None]:
#Let's fit our model using Linear Regression
from sklearn.linear_model import LinearRegression

#Create an instance of LinearRegression
lm = LinearRegression()
#fit the model on our training data
lm.fit(X_train, Y_train)

#let's grab the predictions from our test set
Y_predictions = lm.predict(X_test)
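
Once a `LinearRegression` model is fitted, its learned weights are available as `coef_` and its intercept as `intercept_`. The sketch below fits a model on made-up, noise-free data (the frame `X_toy` and the numbers 5.0, -0.5 and 3.0 are assumptions chosen for illustration) and pairs each coefficient with its column name. On the real model you could do the same with `lm.coef_` and `X.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up features that reuse two column names from the dataset (values are synthetic)
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    'rm':    rng.uniform(4, 8, 50),
    'lstat': rng.uniform(2, 30, 50),
})

# Hypothetical noise-free target: 5.0 * rm - 0.5 * lstat + 3.0
y_toy = 5.0 * X_toy['rm'] - 0.5 * X_toy['lstat'] + 3.0

model = LinearRegression().fit(X_toy, y_toy)

# Label each learned coefficient with the feature it belongs to
coefficients = pd.Series(model.coef_, index=X_toy.columns)
print(coefficients)
print('intercept:', model.intercept_)
```

A positive coefficient means the prediction rises as that feature increases, with the other features held fixed. A negative coefficient means it falls.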

In [None]:
# Plot the Actual Prices against the Predicted Prices
plt.scatter(Y_test, Y_predictions)

plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.title("Plot of Actual prices VS Predicted prices")

In [None]:
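#Check the distribution of the residuals (actual minus predicted prices)
#sns.distplot is deprecated in recent seaborn versions; histplot with kde=True replaces it
sns.histplot(Y_test - Y_predictions, kde=True)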

In [None]:
# MODEL EVALUATION
from sklearn import metrics

MAE = metrics.mean_absolute_error(Y_test, Y_predictions)
MSE = metrics.mean_squared_error(Y_test, Y_predictions)
# RMSE is the square root of the MSE, so reuse the value computed above
RMSE = np.sqrt(MSE)

print('MAE: ', MAE)
print('MSE: ', MSE)
print('RMSE: ', RMSE)
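
To see what these three metrics measure, the sketch below computes them by hand with NumPy on a few made-up values (`y_true` and `y_pred` are hypothetical and are not results from the model above). MAE averages the absolute errors, MSE averages the squared errors, and RMSE takes the square root of the MSE so that it is back in the same units as the target.

```python
import numpy as np

# Made-up actual and predicted values (NOT results from the model above)
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_pred = np.array([25.0, 20.6, 32.7, 35.4])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error

print('MAE: ', mae)
print('MSE: ', mse)
print('RMSE: ', rmse)
```

Because the errors are squared before averaging, MSE and RMSE penalise large errors more heavily than MAE does.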