# Project for different classifiers

In this project I use the Iris dataset from scikit-learn's `sklearn.datasets` module and compare three different classifiers on it.

## Dataset

The dataset contains 3 classes with 50 samples each, 150 samples in total. Each sample has 4 features, all real-valued and positive: sepal length, sepal width, petal length and petal width, measured in centimeters.

The three different classes are **setosa, versicolor, virginica.**
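
The numbers above can be checked directly against the data. The following is a minimal sketch, assuming scikit-learn is installed; the variable names `n_samples`, `n_features` and `class_counts` are only illustrative.

```python
from collections import Counter

from sklearn import datasets

iris = datasets.load_iris()

# Shape of the feature matrix: (number of samples, number of features)
n_samples, n_features = iris.data.shape

# Number of samples per class, keyed by the class name
class_counts = Counter(str(iris.target_names[t]) for t in iris.target)

print(n_samples, n_features)
print(iris.feature_names)
print(dict(class_counts))
```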

Our goal is to distinguish the classes from each other using three (3) different classifiers and to compare their results.

In [None]:
# IMPORTS

from sklearn.neighbors import KNeighborsClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process.kernels import RBF

from sklearn.metrics import classification_report, accuracy_score, f1_score, confusion_matrix, precision_recall_fscore_support, precision_score, recall_score
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Classifiers
kernel = 1.0 * RBF([1.0])

knn = KNeighborsClassifier(n_neighbors=3)
gauss = GaussianProcessClassifier(kernel=kernel)
forest = RandomForestClassifier(max_depth=4, random_state=111)

In [None]:
# Load data

iris = datasets.load_iris()

X = iris.data[:, :2]  # we only take the first two features.
y = np.array(iris.target, dtype=int)

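# train/test split
# note: test_size=0.8 holds out 120 of the 150 samples for testing,
# so only 30 samples are used for training

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.8, random_state=42)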

In [None]:
# training the models

knn.fit(X_train, y_train)
gauss.fit(X_train, y_train)
forest.fit(X_train, y_train)

In [None]:
# predictions for each model

y_pred_knn = knn.predict(X_test)
y_pred_knn

y_pred_gauss = gauss.predict(X_test)
y_pred_gauss

y_pred_forest = forest.predict(X_test)
y_pred_forest

In [None]:
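# evaluate predictions against the test labels

def evaluate_metrics(test, prediction):
    """Return accuracy plus per-class recall, precision and F1 score."""
    results_pos = {}
    results_pos['accuracy'] = accuracy_score(test, prediction)
    # with the default beta=1.0 the F-beta score is the F1 score;
    # each returned array holds one value per class
    precision, recall, f_beta, _ = precision_recall_fscore_support(test, prediction)
    results_pos['recall'] = recall
    results_pos['precision'] = precision
    results_pos['f1score'] = f_beta
    return results_pos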

In [None]:
print("Metrics for the KNN algorithm")
evaluate_metrics(y_test, y_pred_knn)

In [None]:
print("Metrics for the Gaussian algorithm")
evaluate_metrics(y_test, y_pred_gauss)

In [None]:
print("Metrics for the random forest algorithm")
evaluate_metrics(y_test, y_pred_forest)

In [None]:
# Plotting the results


# create a mesh to plot in
h = 0.02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# list of classifiers
classifiers = [knn, gauss, forest]

# List of titles for each plot accordingly
titles = ["KNN", "Gaussian", "Random Forest"]


plt.figure(figsize=(15,5))

for i,  clf in enumerate(classifiers):
    plt.subplot(1, 3, i + 1)
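    # Class probabilities for every mesh point, shape (n_points, 3)
    Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])

    # Reshape to (height, width, 3) so the three class probabilities
    # are drawn as the red, green and blue channels of an image
    Z = Z.reshape((xx.shape[0], xx.shape[1], 3))
    plt.imshow(Z, extent=(x_min, x_max, y_min, y_max), origin="lower", alpha=.9)
    plt.colorbar()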

    plt.scatter(X[:, 0], X[:, 1], c=np.array(["r", "g", "b"])[y], edgecolors=(0, 0, 0))
    plt.title(titles[i])
    plt.xlabel(xlabel=iris.feature_names[0])
    plt.ylabel(ylabel=iris.feature_names[1])
    


## Conclusions

We used KNN, Gaussian process and random forest classifiers for the classification. Only two features were used, namely sepal length and sepal width. In the plots above, the intensity of each color (red for setosa, green for versicolor, blue for virginica) indicates the probability that a data point is classified into the corresponding class.
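
To make the link between color and probability concrete, here is a small sketch of how class probabilities become an RGB image. It assumes scikit-learn and NumPy are available, fits KNN on all 150 samples for simplicity (unlike the train/test split used in the notebook), and evaluates an arbitrarily chosen 2 x 2 grid of points.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X = iris.data[:, :2]  # sepal length and sepal width only
y = iris.target

# Simplification for this sketch: fit on the full dataset
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# A tiny grid of (sepal length, sepal width) points, chosen arbitrarily
xx, yy = np.meshgrid([5.0, 7.0], [2.5, 3.5])
proba = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])

# One probability per class for every grid point; reshaped to
# (height, width, 3) the three values act as red, green and blue
rgb = proba.reshape(xx.shape[0], xx.shape[1], 3)

print(rgb.shape)
print(rgb.sum(axis=2))  # the three probabilities at each point sum to 1
```

A pixel that is pure red therefore corresponds to a point where the classifier puts all of its probability on the first class, and mixed colors mark regions where the classifier is less certain.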

Comparing the plots, we can see that each algorithm produces a different kind of probability surface. The KNN probabilities overlap somewhat with each other, while the random forest shows a clearly rectangular division and gives only a rough separation of the classes. In the Gaussian process classifier the probability changes smoothly around the clusters.
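
One way to see why the KNN regions look stepped rather than smooth: with uniform weights, the predicted probability is just the fraction of the k nearest neighbors in each class, so with k = 3 it can only be 0, 1/3, 2/3 or 1. The sketch below illustrates this in plain Python; `knn_proba` is a hypothetical helper written for this example, and the points are made up rather than taken from the Iris data.

```python
from collections import Counter
from math import dist


def knn_proba(train_points, train_labels, query, k=3, n_classes=3):
    """Return the fraction of the k nearest neighbors that belong to each class."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return [votes[c] / k for c in range(n_classes)]


# Hypothetical 2-D points with two samples per class
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (2.0, 2.0), (2.1, 2.0)]
labels = [0, 0, 1, 1, 2, 2]

proba_a = knn_proba(points, labels, (0.05, 0.0))
proba_b = knn_proba(points, labels, (1.05, 1.2))
print(proba_a)
print(proba_b)
```

Every value returned is a multiple of 1/3, which is why the KNN plot shows a few discrete color levels instead of a gradient.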

## Evaluation

Based on the metrics computed above, the Gaussian process classifier got the highest scores in all of the categories (accuracy, precision, recall and F1 score), so it would be a good option to use in the future.
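
Because precision, recall and F1 score are reported per class, comparing classifiers "in all categories" means comparing several arrays. A possible simplification, sketched below in plain Python, is to average the per-class F1 scores into a single macro F1 value. The helpers `per_class_scores` and `macro_f1` and the label lists are hypothetical and only illustrate the calculation; they are not the notebook's actual results.

```python
def per_class_scores(y_true, y_pred, n_classes=3):
    """Return a (precision, recall, f1) tuple for each class."""
    scores = []
    for c in range(n_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(y_true, y_pred, n_classes=3):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(y_true, y_pred, n_classes)
    return sum(f1 for _, _, f1 in scores) / n_classes


# Hypothetical labels: one sample of class 1 is misclassified as class 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

scores = per_class_scores(y_true, y_pred)
summary = macro_f1(y_true, y_pred)
print(scores)
print(round(summary, 4))
```

With one number per classifier, the three models can be ranked directly; scikit-learn's `f1_score` with `average="macro"` computes the same kind of average.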