Connecting to Google Drive

In [82]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Importing Python Modules

In [83]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style="white", color_codes=True)

Connecting to Iris Dataset in Google Drive

In [84]:
path="/content/drive/MyDrive/Iris Dataset/Iris.csv"
iris=pd.read_csv(path)
iris.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


## Visualizations



Checking how many flowers of each species are in the dataset

In [None]:
iris["Species"].value_counts()

Visualizing the data using the seaborn module

In [None]:
sns.FacetGrid(iris, hue="Species", height=6).map(plt.scatter, "PetalLengthCm", "SepalWidthCm").add_legend()

# Assignments and Splits

Mapping the three species to class numbers 0, 1, and 2, and reassigning iris["Species"]

In [87]:
flower_mapping = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris["Species"] = iris["Species"].map(flower_mapping)
iris.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,0
1,2,4.9,3.0,1.4,0.2,0
2,3,4.7,3.2,1.3,0.2,0
3,4,4.6,3.1,1.5,0.2,0
4,5,5.0,3.6,1.4,0.2,0
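
One thing worth checking after a dictionary-based `map` is that every label was actually matched, because pandas fills unmatched values with NaN rather than raising an error. The sketch below illustrates this on a small hypothetical Series (`species_demo` is made up for the example and deliberately contains one misspelled label); it is not part of the notebook's data.

```python
import pandas as pd

# Hypothetical labels; the last one has a lowercase "i" and will not match.
species_demo = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "iris-setosa"]
)
flower_mapping = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# map() returns NaN wherever the value is not a key of the dictionary.
encoded = species_demo.map(flower_mapping)

# Collect the original labels that failed to map.
unmapped = species_demo[encoded.isna()]
print(unmapped.tolist())
```

If the printed list is empty, every label was mapped and the column is safe to use as a class target.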


Preparing the inputs (x) and outputs (y), then splitting them into training and testing sets

In [None]:
x = iris[["SepalLengthCm","SepalWidthCm","PetalLengthCm","PetalWidthCm"]].values
y = iris["Species"].values  # 1-D label array, the shape scikit-learn classifiers expect
y

In [89]:
from sklearn.model_selection import train_test_split
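# Default split: 25% of the rows are held out for testing. No random_state is
# set, so the split (and every score below) changes from run to run.
x_train, x_test, y_train, y_test = train_test_split(x, y)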
expected = y_test
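
Because the split is random, the class counts in the test set and the scores that follow depend on the run. A minimal sketch of a reproducible, stratified split is shown here, assuming scikit-learn's bundled copy of Iris as a stand-in for the Drive CSV; the `_demo` names and `random_state=42` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in data: the same 150 flowers, loaded from scikit-learn instead of Drive.
X_demo, y_demo = load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify keeps the class proportions
# roughly equal in the training and testing sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

test_counts = np.bincount(y_te)  # number of test flowers per class
print(X_tr.shape, X_te.shape)
print(test_counts)
```

With stratification the three species appear in the test set in near-equal numbers, which makes per-class precision and recall easier to compare.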

# Logistic Regression model

In [90]:
from sklearn.linear_model import LogisticRegression

In [91]:
model = LogisticRegression()
model.fit(x_train, y_train)  # fit() trains the model on the training split

Checking accuracy with the .score method

In [119]:
model.score(x_test, y_test)  # mean accuracy on the test split; one number that hides per-class errors

1.0
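
For a scikit-learn classifier, `.score` returns the mean accuracy: the fraction of test rows whose prediction equals the true label. The sketch below computes that fraction by hand and compares it with `.score`. It assumes scikit-learn's bundled Iris data as a stand-in for the Drive CSV; the `_demo` names, `random_state=0`, and `max_iter=1000` are illustrative choices, not the notebook's settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# max_iter is raised only to avoid a convergence warning on unscaled features.
clf_demo = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Accuracy by hand: fraction of predictions that match the true labels.
by_hand = np.mean(clf_demo.predict(X_te) == y_te)

# The built-in score, which should be the same number.
from_score = clf_demo.score(X_te, y_te)
print(by_hand, from_score)
```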

Making Predictions

In [93]:
predicted = model.predict(x_test)
predicted

array([1, 2, 0, 0, 1, 2, 0, 0, 1, 2, 2, 0, 0, 2, 2, 1, 1, 1, 0, 2, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 2, 0, 1, 1])

Summarizing the fit of the model into something we can understand

In [94]:
from sklearn import metrics

A more detailed view: precision, recall, and F1 for each class/species

In [95]:
print(metrics.classification_report(expected, predicted))
# f1-score is the harmonic mean of precision and recall, so it is high only
# when both are high; it is a more informative per-class summary than accuracy alone

print(metrics.confusion_matrix(expected, predicted))
# Rows are the true species and columns the predicted species, counted over
# the test split only (38 flowers here, not all 150).
# Diagonal entries are correct predictions; off-diagonal entries are mistakes.
# In the run shown below, every test flower lands on the diagonal.

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        12
           2       1.00      1.00      1.00         9

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

[[17  0  0]
 [ 0 12  0]
 [ 0  0  9]]


# KNeighbors Classifier Model

In [105]:
from sklearn.neighbors import KNeighborsClassifier
model_k = KNeighborsClassifier()

In [106]:
model_k.fit(x_train, y_train)

In [107]:
predicted_k = model_k.predict(x_test)
predicted_k

array([1, 2, 0, 0, 1, 2, 0, 0, 1, 2, 2, 0, 0, 2, 2, 1, 2, 1, 0, 2, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 2, 0, 1, 1])

In [130]:
print(metrics.classification_report(expected, predicted_k))
print(metrics.confusion_matrix(expected, predicted_k))
print("\n")
model_k.score(x_test,y_test)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      0.92      0.96        12
           2       0.90      1.00      0.95         9

    accuracy                           0.97        38
   macro avg       0.97      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

[[17  0  0]
 [ 0 11  1]
 [ 0  0  9]]




0.9736842105263158
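
The per-class figures in the report can be recovered directly from the confusion matrix, which helps when reading the two side by side. The sketch below uses the KNeighbors matrix printed above (rows are true classes, columns are predicted classes); the variable names are illustrative.

```python
import numpy as np

# KNeighbors confusion matrix copied from the output above.
cm = np.array([[17, 0, 0],
               [0, 11, 1],
               [0, 0, 9]])

true_pos = np.diag(cm)                  # correct predictions per class
precision = true_pos / cm.sum(axis=0)   # column sums: everything predicted as the class
recall = true_pos / cm.sum(axis=1)      # row sums: everything truly in the class
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
accuracy = true_pos.sum() / cm.sum()

print(np.round(precision, 2))  # [1.  1.  0.9]
print(np.round(recall, 2))     # [1.   0.92 1.  ]
print(np.round(f1, 2))         # [1.   0.96 0.95]
print(accuracy)                # 37 / 38, about 0.9737
```

The single versicolor flower predicted as virginica lowers recall for class 1 (11 of 12 found) and precision for class 2 (9 of 10 predictions correct), which is exactly what the report shows.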

# Support Vector Machine (SVM) Model

In [110]:
from sklearn import svm
model_svm = svm.SVC()

In [111]:
model_svm.fit(x_train, y_train)

In [132]:
predicted_svm = model_svm.predict(x_test)
predicted_svm

array([1, 2, 0, 0, 1, 2, 0, 0, 1, 2, 2, 0, 0, 2, 2, 1, 1, 1, 0, 2, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 2, 0, 1, 1])

In [131]:
print(metrics.classification_report(expected, predicted_svm))
print(metrics.confusion_matrix(expected, predicted_svm))
model_svm.score(x_test,y_test)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        12
           2       1.00      1.00      1.00         9

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38

[[17  0  0]
 [ 0 12  0]
 [ 0  0  9]]


1.0

# Random Forest Model


In [120]:
from sklearn.ensemble import RandomForestClassifier
model_forest = RandomForestClassifier()

In [121]:
model_forest.fit(x_train, y_train)

In [133]:
predicted_tree = model_forest.predict(x_test)

In [134]:
print(metrics.classification_report(expected, predicted_tree))
print(metrics.confusion_matrix(expected, predicted_tree))
model_forest.score(x_test, y_test)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       0.92      0.92      0.92        12
           2       0.89      0.89      0.89         9

    accuracy                           0.95        38
   macro avg       0.94      0.94      0.94        38
weighted avg       0.95      0.95      0.95        38

[[17  0  0]
 [ 0 11  1]
 [ 0  1  8]]


0.9473684210526315

# **On this test split, Logistic Regression and the SVM scored highest on the Iris dataset (38 of 38 test flowers correct), with the KNeighbors Classifier second (37 of 38) and the Random Forest last (36 of 38).**
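
On a 38-flower test set the four models differ by at most two flowers, so the ranking could change with a different random split. One steadier comparison is k-fold cross-validation, sketched below with 5 folds. This assumes scikit-learn's bundled Iris data as a stand-in for the Drive CSV; `max_iter=1000` and `random_state=0` are illustrative settings rather than the ones used above, and no scores are quoted because they were not produced in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X_demo, y_demo = load_iris(return_X_y=True)

# The same four model families used in this notebook.
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
    "SVC": SVC(),
    "RandomForest": RandomForestClassifier(random_state=0),
}

cv_means = {}
for name, estimator in candidates.items():
    # Each model is trained and scored on 5 different train/test partitions.
    scores = cross_val_score(estimator, X_demo, y_demo, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Comparing the mean and spread across folds gives a better sense of whether one model is really ahead or whether the gap is within split-to-split noise.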