**Importing the Dependencies**

In [1]:
import numpy as np  # arrays and linear algebra operations
import pandas as pd  # tabular data handling via the DataFrame structure
import matplotlib.pyplot as plt  # charts and visualizations
import seaborn as sns  # statistical plots built on top of matplotlib
import sklearn.datasets  # sample datasets and data processing tools
from sklearn.model_selection import train_test_split  # split a dataset into training and test sets
from sklearn import metrics  # functions for evaluating model performance, such as MAE, MSE, RMSE, and R²
from xgboost import XGBRegressor  # gradient-boosting regressor for predicting continuous values

**Import the Boston House Price Data**

In [40]:
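import requests  # fetch the raw data file over HTTP

url = "https://lib.stat.cmu.edu/datasets/boston"

# verify=False skips TLS certificate verification; use it only when the host's certificate cannot be validated
response = requests.get(url, verify=False)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
house_price_dataset = response.text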




In [34]:
print(house_price_dataset)

 The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
 prices and the demand for clean air', J. Environ. Economics & Management,
 vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
 ...', Wiley, 1980.   N.B. Various transformations are used in the table on
 pages 244-261 of the latter.

 Variables in order:
 CRIM     per capita crime rate by town
 ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
 INDUS    proportion of non-retail business acres per town
 CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
 NOX      nitric oxides concentration (parts per 10 million)
 RM       average number of rooms per dwelling
 AGE      proportion of owner-occupied units built prior to 1940
 DIS      weighted distances to five Boston employment centres
 RAD      index of accessibility to radial highways
 TAX      full-value property-tax rate per $10,000
 PTRATIO  pupil-teacher ratio by town
 B        100
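
The printed file opens with a descriptive header before the numeric rows begin. As a minimal sketch, assuming no header line consists solely of numbers, the index of the first data line could be located programmatically rather than counted by hand. The helper `first_numeric_line` and the shortened `sample_lines` below are hypothetical and only illustrate the idea.

```python
def first_numeric_line(lines):
    """Return the index of the first non-empty line whose tokens are all numbers."""
    for index, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        try:
            for token in tokens:
                float(token)
        except ValueError:
            continue  # at least one token is text, so this is still header
        return index
    return None  # no numeric line found


# Hypothetical, shortened stand-in for the downloaded text split into lines
sample_lines = [
    " The Boston house-price data of Harrison, D. and Rubinfeld, D.L.",
    "",
    " Variables in order:",
    " CRIM     per capita crime rate by town",
    " 0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30",
    "  396.90   4.98  24.00",
]

start = first_numeric_line(sample_lines)
print(start)
```

On the real file, the returned index could be compared against the offset used when slicing off the header.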

**Loading the dataset to a Pandas DataFrame**

In [48]:
# split the raw text into a list of lines
dataframe = house_price_dataset.split('\n')

# skip the first 22 lines (the descriptive header)
data = dataframe[22:]

# each record spans two consecutive lines, so join the lines in pairs
parsed_data = []
for i in range(0, len(data), 2):
    row1 = data[i].strip().split()
    if i+1 < len(data):
        row2 = data[i+1].strip().split()
        parsed_data.append(row1 + row2)

# build the DataFrame from the parsed rows
column_names = [
    'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX',
    'RM', 'AGE', 'DIS', 'RAD', 'TAX',
    'PTRATIO', 'B', 'LSTAT', 'MEDV'
]
df = pd.DataFrame(parsed_data, columns=column_names)

# convert the string values to numbers
hourse_price_dataframe = df.apply(pd.to_numeric)
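
As an alternative sketch (not the method used in this notebook), pandas can parse the whitespace-separated text directly and the two-line records can be stitched together with NumPy slicing. This assumes each record is wrapped as 11 values on one line followed by 3 values on the next. `sample_text` is a hypothetical two-record stand-in built from the first two rows of the dataset, and its spacing is an assumption.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-record sample; each record wraps across two lines (11 + 3 values)
sample_text = """\
 0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30
  396.90   4.98  24.00
 0.02731   0.00   7.070  0  0.4690  6.4210  78.90  4.9671   2  242.0  17.80
  396.90   9.14  21.60
"""

column_names = [
    'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX',
    'RM', 'AGE', 'DIS', 'RAD', 'TAX',
    'PTRATIO', 'B', 'LSTAT', 'MEDV'
]

# Short lines are padded with NaN so every parsed row has 11 columns
raw = pd.read_csv(io.StringIO(sample_text), sep=r"\s+", header=None)

# Even rows hold the first 11 values, odd rows hold the remaining 3
values = np.hstack([raw.values[::2, :], raw.values[1::2, :3]])
sample_df = pd.DataFrame(values, columns=column_names)

print(sample_df.shape)
```

Because the padded frame contains NaN, every column comes out as float, so no separate string-to-number conversion is needed. For the full file, the header would still have to be skipped, for example with the `skiprows` parameter of `pd.read_csv`.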

**Print first 5 rows of our DataFrame**

In [49]:
hourse_price_dataframe.head()

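| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |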


**Checking the number of rows and columns in the data frame**

In [50]:
hourse_price_dataframe.shape

(506, 14)

**Check for missing values**

In [51]:
hourse_price_dataframe.isnull().sum()

CRIM       0
ZN         0
INDUS      0
CHAS       0
NOX        0
RM         0
AGE        0
DIS        0
RAD        0
TAX        0
PTRATIO    0
B          0
LSTAT      0
MEDV       0
dtype: int64
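
A zero count in every column means no value is missing after the numeric conversion. As a sketch of how an unparseable token would surface in this check, the example below passes `errors="coerce"` to `pd.to_numeric`, which turns bad tokens into NaN. That option is an assumption for illustration: the notebook uses the default, which raises an error instead. The `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical string data with one token that is not a number
toy = pd.DataFrame({
    "RM": ["6.575", "6.421", "n/a"],
    "MEDV": ["24.0", "21.6", "34.7"],
})

# errors="coerce" replaces unparseable tokens with NaN instead of raising
numeric = toy.apply(pd.to_numeric, errors="coerce")

# Count the missing values per column, as in the check above
missing_per_column = numeric.isnull().sum()
print(missing_per_column)
```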

**Statistical measures of the dataset**

In [52]:
hourse_price_dataframe.describe()

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 | 506.0 |
| mean | 3.613524 | 11.363636 | 11.136779 | 0.06917 | 0.554695 | 6.284634 | 68.574901 | 3.795043 | 9.549407 | 408.237154 | 18.455534 | 356.674032 | 12.653063 | 22.532806 |
| std | 8.601545 | 23.322453 | 6.860353 | 0.253994 | 0.115878 | 0.702617 | 28.148861 | 2.10571 | 8.707259 | 168.537116 | 2.164946 | 91.294864 | 7.141062 | 9.197104 |
| min | 0.00632 | 0.0 | 0.46 | 0.0 | 0.385 | 3.561 | 2.9 | 1.1296 | 1.0 | 187.0 | 12.6 | 0.32 | 1.73 | 5.0 |
| 25% | 0.082045 | 0.0 | 5.19 | 0.0 | 0.449 | 5.8855 | 45.025 | 2.100175 | 4.0 | 279.0 | 17.4 | 375.3775 | 6.95 | 17.025 |
| 50% | 0.25651 | 0.0 | 9.69 | 0.0 | 0.538 | 6.2085 | 77.5 | 3.20745 | 5.0 | 330.0 | 19.05 | 391.44 | 11.36 | 21.2 |
| 75% | 3.677083 | 12.5 | 18.1 | 0.0 | 0.624 | 6.6235 | 94.075 | 5.188425 | 24.0 | 666.0 | 20.2 | 396.225 | 16.955 | 25.0 |
| max | 88.9762 | 100.0 | 27.74 | 1.0 | 0.871 | 8.78 | 100.0 | 12.1265 | 24.0 | 711.0 | 22.0 | 396.9 | 37.97 | 50.0 |
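
Each row of the `describe()` output corresponds to a standard pandas reduction. The sketch below illustrates the mapping on a small series holding the five `MEDV` values from the `head()` output. Note that `std` is the sample standard deviation (`ddof=1`) and that the percentile rows match `quantile`.

```python
import pandas as pd

# The five MEDV values shown in the head() output
medv = pd.Series([24.0, 21.6, 34.7, 33.4, 36.2], name="MEDV")

summary = medv.describe()

# The same statistics computed one at a time
manual = pd.Series({
    "count": medv.count(),
    "mean": medv.mean(),
    "std": medv.std(ddof=1),  # sample standard deviation, as in describe()
    "min": medv.min(),
    "25%": medv.quantile(0.25),
    "50%": medv.median(),
    "75%": medv.quantile(0.75),
    "max": medv.max(),
})

print(pd.DataFrame({"describe": summary, "manual": manual}))
```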
