5. Use Naive bayes, K-nearest, and Decision tree classification algorithms and build classifiers.
Divide the data set into training and test set. Compare the accuracy of the different classifiers
under the following situations:
    1. a) Training set = 75% Test set = 25% b) Training set = 66.6% (2/3rd of total), Test set = 33.3%
    2. Training set is chosen by i) hold out method ii) Random subsampling iii) Cross-Validation. Compare the accuracy of the classifiers obtained.
    3. Data is scaled to standard format.

In [1]:
!pip install imblearn



In [2]:
from sklearn.datasets import load_iris
from sklearn import naive_bayes
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, cross_val_score
from imblearn.under_sampling import RandomUnderSampler
import numpy as np 

In [3]:
iris = load_iris()

x = iris.data 
y = iris.target

In [4]:
def naiveBayes(xTrain, yTrain, xTest, yTest):
    clf = naive_bayes.GaussianNB()
    clf.fit(xTrain, yTrain)
    yPred = clf.predict(xTest)
    print("Naive Bayes Classifier:")
    print("Accuracy:", accuracy_score(yTest, yPred))
    print()

def kNN(xTrain, yTrain, xTest, yTest, k=5):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(xTrain, yTrain)
    yPred = clf.predict(xTest)
    print("K-Nearest Neighbors Classifier (k={}):".format(k))
    print("Accuracy:", accuracy_score(yTest, yPred))
    print()

def decisionTree(xTrain, yTrain, xTest, yTest):
    clf = DecisionTreeClassifier()
    clf.fit(xTrain, yTrain)
    yPred = clf.predict(xTest)
    print("Decision Tree Classifier:")
    print("Accuracy:", accuracy_score(yTest, yPred))
    print()

In [5]:
# 5.1 
print ("Training set = 75% Test set = 25%")

x_train, x_test, y_train, y_test = train_test_split(x, y,test_size=0.25)
naiveBayes(x_train, y_train, x_test, y_test)
kNN(x_train, y_train, x_test, y_test)
decisionTree(x_train, y_train, x_test, y_test)

print ("Training set = 66.6% (2/3rd of total), Test set =33.3%")
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=1/3)
naiveBayes(x_train, y_train, x_test, y_test)
kNN(x_train, y_train, x_test, y_test)

Training set = 75% Test set = 25%
Naive Bayes Classifier:
Accuracy: 0.8947368421052632

K-Nearest Neighbors Classifier (k=5):
Accuracy: 0.9736842105263158

Decision Tree Classifier:
Accuracy: 0.9473684210526315

Training set = 66.6% (2/3rd of total), Test set =33.3%
Naive Bayes Classifier:
Accuracy: 0.98

K-Nearest Neighbors Classifier (k=5):
Accuracy: 1.0



In [6]:
# 5.2
# Holdout Method
print ("Holdout Method")
x_train_holdout, x_test_holdout, y_train_holdout, y_test_holdout = train_test_split(x, y, test_size=0.25)
naiveBayes(x_train_holdout, y_train_holdout, x_test_holdout, y_test_holdout)
kNN(x_train_holdout, y_train_holdout, x_test_holdout, y_test_holdout)
decisionTree(x_train_holdout, y_train_holdout, x_test_holdout, y_test_holdout)

# Cross-Validation
print ("-" * 100)
print ("Cross validation")
naiveBayes_cv = cross_val_score(naive_bayes.GaussianNB(), x, y, cv=5)
kNN_cv = cross_val_score(KNeighborsClassifier(n_neighbors=5), x, y, cv=5)
decisionTree_cv = cross_val_score(DecisionTreeClassifier(), x, y, cv=5)
print("Naive Bayes Cross-Validation Accuracy:", np.mean(naiveBayes_cv))
print("K-Nearest Neighbors Cross-Validation Accuracy (k=5):", np.mean(kNN_cv))
print("Decision Tree Cross-Validation Accuracy:", np.mean(decisionTree_cv))

# Random subsampling
print ("-" * 100)
print ("Random Subsampling")
rus = RandomUnderSampler(random_state=42)
x_resampled, y_resampled = rus.fit_resample(x, y)

x_train, x_test, y_train, y_test = train_test_split(x_resampled, y_resampled, test_size=0.25)
naiveBayes(x_train, y_train, x_test, y_test)
kNN(x_train, y_train, x_test, y_test)
decisionTree(x_train, y_train, x_test, y_test)

Holdout Method
Naive Bayes Classifier:
Accuracy: 0.9736842105263158

K-Nearest Neighbors Classifier (k=5):
Accuracy: 0.9736842105263158

Decision Tree Classifier:
Accuracy: 0.9473684210526315

----------------------------------------------------------------------------------------------------
Cross validation
Naive Bayes Cross-Validation Accuracy: 0.9533333333333334
K-Nearest Neighbors Cross-Validation Accuracy (k=5): 0.9733333333333334
Decision Tree Cross-Validation Accuracy: 0.9666666666666668
----------------------------------------------------------------------------------------------------
Random Subsampling
Naive Bayes Classifier:
Accuracy: 0.9210526315789473

K-Nearest Neighbors Classifier (k=5):
Accuracy: 0.9473684210526315

Decision Tree Classifier:
Accuracy: 0.9473684210526315



In [7]:
# 5.3
from sklearn.preprocessing import StandardScaler
print ("Data is scaled")
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)
x_train, x_test, y_train, y_test = train_test_split(x, y,test_size=0.25)
naiveBayes(x_train, y_train, x_test, y_test)
kNN(x_train, y_train, x_test, y_test)
decisionTree(x_train, y_train, x_test, y_test)

Data is scaled
Naive Bayes Classifier:
Accuracy: 0.9736842105263158

K-Nearest Neighbors Classifier (k=5):
Accuracy: 0.9736842105263158

Decision Tree Classifier:
Accuracy: 0.9473684210526315

