# Introduction
The data set is a housing prices data set taken from kaggle.com.
It contains the following attributes for each house:
    • lotsize: The covered area of the house (in sq. m)
    • bedrooms: No. of bedrooms in the house.
    • bathrms: No. of bathrooms in the house.
    • stories: No. of stories.
    • garagepl: Availability of garage.
    • price: Price (in Dollars)
   
## Data Set : HOUSING PRICES

In [1]:
# Importing Packages
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn.metrics import recall_score

In [2]:
# Reading the Data Set
df = pd.read_csv("Housing.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,lotsize,bedrooms,bathrms,stories,garagepl,price
0,1,5850,3,1,2,1,42000.0
1,2,4000,2,1,1,0,38500.0
2,3,3060,3,1,1,0,49500.0
3,4,6650,3,1,2,0,60500.0
4,5,6360,2,1,1,0,61000.0


In [3]:
# Selecting the feature columns (X) and the target column (y)
X = df[['lotsize','bedrooms','bathrms','stories','garagepl']]
y = df['price']

# Description

## Naive Bayes Classifier

Naïve Bayes is used to predict how likely an event is to occur, given the
evidence present in the data. There are three types of Naïve Bayes models:
    • Multinomial: good when features describe frequency counts.
    • Bernoulli: good for making predictions from binary features.
    • Gaussian: good for predictions from normally distributed features.
Naïve Bayes assumes that the predictors are independent of one another given
the class. Training is very fast because only the class probabilities and the
per-class probability of each input value need to be estimated.
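To make the Gaussian variant concrete, here is a minimal pure-Python sketch of the idea, not scikit-learn's implementation. The helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the toy rows and the "low"/"high" labels are all made up for illustration. It estimates a prior plus a mean and variance per feature for each class, then picks the class with the highest log posterior under the independence assumption.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # The small constant keeps every variance strictly positive
        # (an assumption of this sketch).
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical toy data: [lotsize, bedrooms] with made-up class labels.
rows = [[3000, 2], [3200, 2], [3100, 3], [6500, 3], [6800, 4], [7000, 4]]
labels = ["low", "low", "low", "high", "high", "high"]
model = fit_gaussian_nb(rows, labels)
prediction = predict_gaussian_nb(model, [6600, 3])
print(prediction)
```

Training here is a single pass over the rows to collect means and variances, which is why the method is fast.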

In [4]:
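gnb = GaussianNB()
# fit() returns the fitted estimator itself, so it is not stored separately
gnb.fit(X, y)
# Note: predictions are made on the same rows that were used for training
prediction = gnb.predict(X)
print('Total Data = '+str(df['price'].count()))
print('Wrongly Predicted data = %d'%((y != prediction).sum()))
print('Accuracy = ' + str(accuracy_score(y,prediction)*100))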


Total Data = 546
Wrongly Predicted data = 376
Accuracy = 31.1355311355


## KNN (K Nearest Neighbour) Classifier

K-NN is a supervised classifier that memorises the observations from a
labeled training set and predicts the labels for an unlabeled test set. It
makes these predictions on the basis of how similar the test observations are
to the training observations. The probability of an instance being classified
with a certain label depends on how similar it is to the training observations
carrying that label. It is used, for example, in recommendation systems and
stock price prediction. It assumes that the dataset is labeled and has little noise.
For prediction, the entire training set is searched to find the K most similar
instances, and for classification, the most common class value among them is used.
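The search-and-vote procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the scikit-learn implementation; the function name `knn_predict`, the toy points and the labels "a" and "b" are hypothetical, and Euclidean distance is assumed as the similarity measure.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Pair every training row's Euclidean distance to the query with its label,
    # then sort so the closest rows come first.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbours is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups of 2-D points.
train_rows = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_rows, train_labels, [2, 2]))
print(knn_predict(train_rows, train_labels, [7, 8]))
```

Because the distance is computed on raw feature values, a feature with a large numeric range (such as lotsize) can dominate the vote unless the features are scaled first.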

In [5]:
# Splitting Test & Train data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

In [6]:
# Using KNN Classifier
knn = KNeighborsClassifier(n_neighbors=5)
# Training the model
knn.fit(X_train,y_train)
# score() returns the mean accuracy on the test set; *100 gives a percentage
print('KNN Score : '+str(knn.score(X_test,y_test)*100))

KNN Score : 0.729927007299


## SVM Classifier

In machine learning, support vector machines are supervised learning models with
associated learning algorithms that analyse data for classification and
regression analysis. When data are not labeled, supervised learning is not possible,
and an unsupervised learning approach is required, which attempts to find a natural
clustering of the data into groups and then maps new data to these groups.
A support vector machine constructs a hyperplane or set of hyperplanes in a high-
or infinite-dimensional space, which can be used for classification, regression, or
other tasks such as outlier detection. The hyperplanes in the higher-dimensional
space are defined as the set of points whose dot product with a vector in that
space is constant.
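The hyperplane definition can be illustrated with a small sketch. It covers only the linear case in two dimensions, with hypothetical parameters `w` and `b` that are not learned from the housing data; the kernel mapping into a higher-dimensional space is left out. Points whose dot product with `w` is the same constant lie on one hyperplane, and the sign of `w·x + b` decides the class.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decision_value(w, b, x):
    """Signed score w·x + b; zero means x lies on the separating hyperplane."""
    return dot(w, x) + b

def classify(w, b, x):
    """Assign +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical parameters chosen by hand, not fitted by an SVM.
w = [1.0, -1.0]
b = 0.0

# Every point here has the same dot product with w (zero),
# so all of them lie on the hyperplane w·x + b = 0.
on_plane = [[0.0, 0.0], [2.0, 2.0], [-3.0, -3.0]]
print([decision_value(w, b, x) for x in on_plane])

print(classify(w, b, [3.0, 1.0]))
print(classify(w, b, [1.0, 3.0]))
```

Training an SVM amounts to choosing `w` and `b` so that the margin between the classes is as large as possible; that optimisation is not shown in this sketch.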

In [7]:
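model = svm.SVC(gamma=1)
# Training the model
model.fit(X_train,y_train)
# Predicting on the held-out test set
y_predicted = model.predict(X_test)
# Note: this score is computed on the full data set (X, y), which includes the training rows
print('SVM Score : '+str(model.score(X,y)*100))
# The recall is computed on the held-out test set only
print('Recall Score ='+str(recall_score(y_true=y_test,y_pred=y_predicted,average='micro')))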

SVM Score : 67.032967033
Recall Score =0.021897810219


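Going by the scores reported above, SVM gives the highest figure (about 67.0%),
ahead of Naive Bayes (about 31.1%) and KNN (about 0.73%). These figures are not
measured on the same data, though: the Naive Bayes accuracy is computed on the rows
it was trained on, the KNN score on the held-out test set, and the SVM score on the
full data set, which includes its training rows. On the held-out test set alone, the
SVM's micro-averaged recall is only about 0.022, so the comparison should be read
with caution.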
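The recall reported for the SVM uses `average='micro'`. For single-label multiclass predictions, micro-averaged recall is the same quantity as accuracy, so it can be read as the accuracy on the held-out test set. The sketch below illustrates this with plain Python; the helper names are hypothetical, and the predicted values are made up for the example rather than taken from any model.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def micro_recall(y_true, y_pred):
    """Pool true positives and false negatives over all classes, then divide."""
    true_pos = Counter()
    false_neg = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1   # correct prediction for class t
        else:
            false_neg[t] += 1  # class t was missed
    total_tp = sum(true_pos.values())
    total_fn = sum(false_neg.values())
    return total_tp / (total_tp + total_fn)

# True prices from the first five rows; the predictions are made up.
y_true = [42000, 38500, 49500, 60500, 61000]
y_pred = [42000, 49500, 49500, 61000, 61000]

print(accuracy(y_true, y_pred))
print(micro_recall(y_true, y_pred))
```

Every sample is either a true positive or a false negative for its own class, so the pooled denominator is simply the number of samples, which is why the two values agree.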