# Flower Classification using Naive Bayes

In this notebook I will build a model to classify the flowers in the classic Iris dataset. The dataset contains measurements of the petals and sepals of three iris species: setosa, versicolor and virginica.

### Step 01: Importing Libs

I will use the pandas library for data structures and sklearn for everything related to learning.

In [1]:
import pandas as pd

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

### Step 02: Loading Iris Dataset

The Iris dataset is available in sklearn. After loading it, I separate the data into training and test sets.

In [6]:
iris = load_iris()

X, y = iris.data, iris.target
class_names = iris.target_names


X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
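
As a quick sanity check on the split, the sketch below prints the shapes of the resulting arrays and the number of test samples per class. It relies on `train_test_split` holding out 25% of the samples when no `test_size` is given; the `test_counts` name and the use of `np.bincount` are illustrative additions, not part of the original cells.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same call as above: no test_size is given, so 25% of the 150 samples are held out
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# Number of test samples per species (illustrative helper, not in the original notebook)
test_counts = {
    str(name): int(count)
    for name, count in zip(iris.target_names, np.bincount(y_test, minlength=3))
}
print(test_counts)
```

Because the split is random, the classes are not guaranteed to be evenly represented in the test set. If balanced proportions matter, passing `stratify=y` to `train_test_split` is one option.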

### Step 03: Training the Model and Getting the Accuracy

The code trains a Gaussian Naive Bayes model on the training data, makes predictions on the test data and prints the accuracy of those predictions.

In [7]:
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

accuracy = gnb.score(X_test, y_test)
print("Accuracy: ", accuracy)

Accuracy:  1.0
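
To make the idea behind Gaussian Naive Bayes more concrete, here is a minimal from-scratch sketch, assuming the usual formulation: each feature is modelled as an independent Gaussian per class, and the predicted class is the one with the highest log prior plus summed log-likelihood. The names `priors`, `means`, `variances` and `predict_manual` are hypothetical helpers introduced for illustration. sklearn's `GaussianNB` additionally applies a small variance smoothing term, so this is an approximation of it rather than its exact implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

classes = np.unique(y_train)

# Per-class prior, feature means and feature variances estimated from the training set
priors = np.array([np.mean(y_train == c) for c in classes])
means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
variances = np.array([X_train[y_train == c].var(axis=0) for c in classes])


def predict_manual(X):
    """Pick the class with the highest log prior + summed Gaussian log-likelihood."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, np.newaxis, :] - means[np.newaxis, :, :]
    log_likelihood = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + diff**2 / variances, axis=2
    )
    return classes[np.argmax(np.log(priors) + log_likelihood, axis=1)]


manual_pred = predict_manual(X_test)
sklearn_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

print("Manual accuracy: ", accuracy_score(y_test, manual_pred))
print("Agreement with GaussianNB: ", np.mean(manual_pred == sklearn_pred))
```

Working in log space avoids multiplying many small probabilities together, which is why the sketch sums log-likelihoods instead of taking a product.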


### Step 04: How Well Is the Model Performing?

Generate a detailed classification report to evaluate the model's performance for each class, allowing you to identify which classes are being correctly classified and which may need improvement.

In [8]:
print(classification_report(y_test, y_pred, target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        13
  versicolor       1.00      1.00      1.00        16
   virginica       1.00      1.00      1.00         9

    accuracy                           1.00        38
   macro avg       1.00      1.00      1.00        38
weighted avg       1.00      1.00      1.00        38
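
If the report needs to be filtered or sorted rather than just printed, one option is to request it as a dictionary and load it into pandas. The sketch below assumes the same split and model as above and uses `classification_report`'s `output_dict=True` parameter; `report_df` is an illustrative name.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

# output_dict=True returns nested dictionaries instead of a formatted string
report = classification_report(
    y_test, y_pred, target_names=iris.target_names, output_dict=True
)

# One row per class (plus the summary rows), one column per metric
report_df = pd.DataFrame(report).transpose()
print(report_df.round(2))
```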



### Step 05: Confusion Matrix

Display a confusion matrix to evaluate the performance of the classification model. The confusion matrix is a table in which each row is an actual class and each column is a predicted class: the diagonal holds the correctly classified samples, and the off-diagonal cells give the counts from which the false positives and false negatives for each class can be read.

In [9]:
c_matrix = confusion_matrix(y_test, y_pred)
c_table = pd.DataFrame(data=c_matrix, index=iris.target_names, columns=[x + " (pred)" for x in iris.target_names])
print(c_table)

            setosa (pred)  versicolor (pred)  virginica (pred)
setosa                 13                  0                 0
versicolor              0                 16                 0
virginica               0                  0                 9
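
The per-class precision and recall in the classification report can be recovered directly from this matrix. The sketch below is a minimal illustration, assuming sklearn's convention that rows are actual classes and columns are predicted classes, and that every class is predicted at least once (otherwise the precision division would hit a zero column sum).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)

c_matrix = confusion_matrix(y_test, y_pred)

# Correct predictions for each class sit on the diagonal
true_positives = np.diag(c_matrix)

# Row sums = actual samples per class, column sums = predicted samples per class
recall = true_positives / c_matrix.sum(axis=1)
precision = true_positives / c_matrix.sum(axis=0)

for name, p, r in zip(iris.target_names, precision, recall):
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```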


### Considerations

It is a very simple dataset, which is why the results were so excellent: the model correctly classified every test sample of all three species.
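
Since the result above comes from a single split with a small test set, one way to check how much it depends on that split is k-fold cross-validation. This is a sketch of a possible follow-up rather than part of the original analysis; the choice of `cv=5` is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# Train and evaluate on 5 different train/test partitions of the full dataset
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=5)

print("Accuracy per fold: ", scores)
print("Mean accuracy: ", scores.mean())
```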