## About the Boston Dataset 

#### Reference: https://towardsdatascience.com/machine-learning-project-predicting-boston-house-prices-with-regression-b4e47493633d

The dataset used in this project comes from the UCI Machine Learning Repository. The data was collected in 1978, and each of its 506 entries holds aggregate information on 14 attributes of homes in various Boston suburbs: 13 predictive features plus the median home value (MEDV), which is usually the prediction target.


![Image of Boston](BostonImage.png)

#### What do the column headers mean?


1. CRIM: per capita crime rate by town
2. ZN: proportion of residential land zoned for lots larger than 25,000 sq. ft.
3. INDUS: proportion of non-retail business acres per town
4. CHAS: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise)
5. NOX: nitric oxides concentration (parts per 10 million)
6. RM: average number of rooms per dwelling
7. AGE: proportion of owner-occupied units built prior to 1940
8. DIS: weighted distances to five Boston employment centers
9. RAD: index of accessibility to radial highways
10. TAX: full-value property-tax rate per 10,000 dollars
11. PTRATIO: pupil-teacher ratio by town
12. B: 1000(Bk − 0.63)², where Bk is the proportion of people of African American descent by town
13. LSTAT: percentage of the population classified as lower status
14. MEDV: median value of owner-occupied homes, in thousands of dollars
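The B column stores a transformed value rather than the raw proportion Bk. The sketch below implements the formula from item 12 and, as an illustration, an inverse that lists every Bk in [0, 1] that produces a given B. It shows the transform is not one-to-one: proportions on either side of 0.63 can map to the same B. The helper names `b_from_bk` and `bk_candidates` are hypothetical, not part of any library.

```python
from __future__ import annotations

import math


def b_from_bk(bk: float) -> float:
    """Apply the B transform from the column list: 1000 * (Bk - 0.63)^2."""
    return 1000.0 * (bk - 0.63) ** 2


def bk_candidates(b: float) -> list[float]:
    """Return every proportion Bk in [0, 1] that maps to the given B value.

    Solving 1000 * (Bk - 0.63)^2 = B gives Bk = 0.63 +/- sqrt(B / 1000);
    only roots that are valid proportions are kept.
    """
    if b < 0:
        raise ValueError("B cannot be negative")
    offset = math.sqrt(b / 1000.0)
    roots = {0.63 - offset, 0.63 + offset}  # the set collapses the double root at B = 0
    eps = 1e-9  # tolerance for floating-point round-off at the interval ends
    return sorted(min(max(r, 0.0), 1.0) for r in roots if -eps <= r <= 1.0 + eps)


if __name__ == "__main__":
    print(b_from_bk(0.0))        # largest B reachable when Bk is in [0, 1]
    print(bk_candidates(100.0))  # two different proportions give B = 100
    print(bk_candidates(200.0))  # only one valid proportion gives B = 200
```

Because B is at most about 396.9 (reached at Bk = 0), the frequent 396.9 values in the data correspond to tracts with Bk = 0.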

```python
import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
import seaborn as sns

%matplotlib inline
```

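```python
# load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2 to run.
from sklearn.datasets import load_boston

boston_dataset = load_boston()  # a Bunch holding data, target, feature_names, DESCR, ...
```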
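Since `load_boston` is not available in scikit-learn 1.2 or later, its deprecation warning pointed users to the original StatLib copy of the data (`http://lib.stat.cmu.edu/datasets/boston`), read with `pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)`. In that file each record is split across two lines: the first 11 features on one line, then B, LSTAT and MEDV on the next. The sketch below applies the same NumPy reshaping to the first two records held in a string so it runs offline; `RAW_RECORDS` and `FEATURE_NAMES` are assumptions written out for illustration.

```python
import io

import numpy as np
import pandas as pd

# First two records in the StatLib two-line layout (assumption: typed in for
# illustration; the real file also starts with a 22-line text header).
RAW_RECORDS = """0.00632  18.00   2.310  0  0.5380  6.5750  65.20  4.0900   1  296.0  15.30
396.90   4.98  24.00
0.02731   0.00   7.070  0  0.4690  6.4210  78.90  4.9671   2  242.0  17.80
396.90   9.14  21.60
"""

FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Whitespace-separated; the short second lines are padded with NaN to 11 columns.
raw_df = pd.read_csv(io.StringIO(RAW_RECORDS), sep=r"\s+", header=None)

# Even rows hold the first 11 features; odd rows hold B, LSTAT and MEDV.
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

boston = pd.DataFrame(data, columns=FEATURE_NAMES)
print(boston)
print(target)
```

With the real URL and `skiprows=22`, `data` and `target` play the roles of `boston_dataset.data` and `boston_dataset.target`.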

```python
print(boston_dataset.keys())
```

```
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
```
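`load_boston` returns a scikit-learn `Bunch`: a dictionary whose keys can also be read as attributes, which is why the keys listed above can be accessed as `boston_dataset.DESCR` or `boston_dataset["DESCR"]`. The class below is a simplified, hypothetical stand-in (`MiniBunch`) written to show the idea; it is not scikit-learn's implementation.

```python
class MiniBunch(dict):
    """Hypothetical, simplified stand-in for sklearn.utils.Bunch.

    Behaves like a dict, but keys can also be read and written as attributes.
    """

    def __init__(self, **kwargs):
        super().__init__(kwargs)

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails, so dict methods
        # such as keys() still resolve to the real methods.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Attribute assignment stores a dict entry instead.
        self[key] = value


bunch = MiniBunch(
    data=[[0.00632, 18.0]],
    feature_names=["CRIM", "ZN"],
    DESCR="Boston house prices dataset",
)
print(bunch.keys())
print(bunch.DESCR == bunch["DESCR"])
```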


```python
boston_dataset.DESCR
```

```
".. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\n**Data Set Characteristics:**  \n\n    :Number of Instances: 506 \n\n    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.\n\n    :Attribute Information (in order):\n        - CRIM     per capita crime rate by town\n        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.\n        - INDUS    proportion of non-retail business acres per town\n        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n        - NOX      nitric oxides concentration (parts per 10 million)\n        - RM       average number of rooms per dwelling\n        - AGE      proportion of owner-occupied units built prior to 1940\n        - DIS      weighted distances to five Boston employment centres\n        - RAD      index of accessibility to radial highways\n        - TAX      full-value property-tax rate per $10,000
```
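Because the cell evaluates `boston_dataset.DESCR` instead of printing it, Jupyter shows the string's repr, with literal `\n` escapes; `print(boston_dataset.DESCR)` displays it as formatted text. The attribute list inside DESCR follows a regular `- NAME   description` layout, so it can be turned into a lookup table. A minimal sketch with Python's `re` module, run on a short excerpt of that text (the excerpt and `parse_attributes` are illustrative, not part of scikit-learn):

```python
from __future__ import annotations

import re

# A short excerpt of the DESCR text above (copied, not the full string).
DESCR_EXCERPT = """    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
"""

# Each attribute line looks like "- NAME   description": capture both parts.
ATTRIBUTE_LINE = re.compile(r"^[ \t]*-[ \t]+([A-Z]+)[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_attributes(descr: str) -> dict[str, str]:
    """Map each attribute name in a DESCR-style text to its description."""
    return {name: text for name, text in ATTRIBUTE_LINE.findall(descr)}


attributes = parse_attributes(DESCR_EXCERPT)
for name, text in attributes.items():
    print(f"{name}: {text}")
```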

```python
boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)
boston.head()
```

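|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 |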
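The DataFrame built from `boston_dataset.data` holds only the 13 features; the target lives separately in `boston_dataset.target`. A common next step is to attach it as a MEDV column and then split features from target for modelling. A minimal sketch on the five rows shown by `boston.head()`, assuming their MEDV values from the original dataset (24.0, 21.6, 34.7, 33.4, 36.2, in thousands of dollars):

```python
import numpy as np
import pandas as pd

FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# The five feature rows printed by boston.head().
rows = np.array([
    [0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.09, 1.0, 296.0, 15.3, 396.9, 4.98],
    [0.02731, 0.0, 7.07, 0.0, 0.469, 6.421, 78.9, 4.9671, 2.0, 242.0, 17.8, 396.9, 9.14],
    [0.02729, 0.0, 7.07, 0.0, 0.469, 7.185, 61.1, 4.9671, 2.0, 242.0, 17.8, 392.83, 4.03],
    [0.03237, 0.0, 2.18, 0.0, 0.458, 6.998, 45.8, 6.0622, 3.0, 222.0, 18.7, 394.63, 2.94],
    [0.06905, 0.0, 2.18, 0.0, 0.458, 7.147, 54.2, 6.0622, 3.0, 222.0, 18.7, 396.9, 5.33],
])
# Assumed MEDV values for the same rows, in thousands of dollars.
medv = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

boston = pd.DataFrame(rows, columns=FEATURE_NAMES)
boston["MEDV"] = medv  # attach the target as a 14th column

# Features (X) and target (y) kept separate for a regression model.
X = boston.drop(columns="MEDV")
y = boston["MEDV"]
print(X.shape, y.shape)
```

With the full dataset the same two lines would read `boston["MEDV"] = boston_dataset.target` and `X = boston.drop(columns="MEDV")`.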
