# Machine Learning - Iris Dataset

This notebook trains a k-nearest neighbors (KNN) classifier on the Iris dataset.

## Problem Definition

>Can we predict the species of an iris from its sepal and petal measurements?

## Data & Features

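The data is available on Kaggle: https://www.kaggle.com/datasets/uciml/iris

The code in this notebook loads the copy of the dataset bundled with scikit-learn via `load_iris()` rather than reading the Kaggle CSV.

The columns in the Kaggle CSV are: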
* Id
* SepalLengthCm
* SepalWidthCm
* PetalLengthCm
* PetalWidthCm
* Species

## Evaluation

The evaluation metric for this project is accuracy: the fraction of test samples whose species is predicted correctly.
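As a rough illustration of the metric, accuracy can be computed by hand as correct predictions divided by total predictions. The `accuracy` helper and the toy labels below are made up for this sketch; the notebook itself uses `metrics.accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels, invented for illustration only (not from the Iris data)
y_true_demo = [0, 1, 2, 2, 1]
y_pred_demo = [0, 1, 2, 1, 1]

demo_accuracy = accuracy(y_true_demo, y_pred_demo)
print(demo_accuracy)  # 4 of 5 correct -> 0.8
```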

## Libraries

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from joblib import dump, load

## Data Preparation

In [2]:
# Load in iris dataset
iris = load_iris()

In [3]:
# Assign iris data and target sets to variables
X = iris.data
y = iris.target

# Store the feature (column) names and the species (class) names
feature_names = iris.feature_names
target_names = iris.target_names
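A quick look at what `load_iris()` returns can help connect it to the column list above. This is a small sketch: scikit-learn's copy has no `Id` column, the four measurements live in `iris.data`, and the species is encoded as an integer in `iris.target`. The `class_counts` name is introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 samples, 4 measurements each
print(iris.data.shape)

# Measurement names, in the column order of iris.data
print(iris.feature_names)

# Species names; iris.target holds the index into this array
print(iris.target_names)

# Number of samples per species
class_counts = {
    str(name): int(count)
    for name, count in zip(iris.target_names, np.bincount(iris.target))
}
print(class_counts)
```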

In [4]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1701)

len(X_train), len(y_train), len(X_test), len(y_test)

(120, 120, 30, 30)
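The split above is random, so the three species are not guaranteed to be equally represented in the 30 test samples. One option, shown here as a sketch and not what this notebook does, is to pass `stratify=y` so each split keeps the class proportions. Note that this produces a different split, so the accuracy reported later would not carry over.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same test_size and random_state as the notebook, plus stratification (an addition)
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=1701, stratify=y
)

# Samples per species in each split
train_counts = np.bincount(y_train_s)
test_counts = np.bincount(y_test_s)
print("train:", train_counts)
print("test:", test_counts)
```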

## Modeling & Evaluation

In [5]:
# Create model
knn = KNeighborsClassifier(n_neighbors=3)

# Train model
knn.fit(X_train, y_train)

# Generate predictions
y_pred = knn.predict(X_test)

In [6]:
# Evaluate predictions
print(metrics.accuracy_score(y_test, y_pred))

0.9333333333333333


The model classifies 28 of the 30 test samples correctly (93.3% accuracy), so no parameter tuning is needed for now.
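If tuning becomes worthwhile later, one common approach is to compare a few values of `n_neighbors` with cross-validation, which is less sensitive to a single 30-sample test split. The sketch below assumes 5-fold cross-validation and an arbitrary set of candidate values; both are choices made for illustration, not part of the original workflow.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate neighbor counts (arbitrary choice for this sketch)
candidate_ks = (1, 3, 5, 7, 9)

cv_means = {}
for k in candidate_ks:
    # Accuracy on each of 5 folds for a KNN model with k neighbors
    scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X, y, cv=5, scoring="accuracy"
    )
    cv_means[k] = scores.mean()
    print(f"k={k}: mean accuracy {cv_means[k]:.3f}")

# Value of k with the highest mean cross-validated accuracy
best_k = max(cv_means, key=cv_means.get)
print("best k:", best_k)
```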

## Make predictions on new data

In [7]:
# New sample data to get predictions for
# Feature order: sepal length, sepal width, petal length, petal width (cm)
sample = [[3,5,4,2], [2,3,5,4]]
predictions = knn.predict(sample)
pred_species = [iris.target_names[p] for p in predictions]
print("predictions:", pred_species)

predictions: ['versicolor', 'virginica']
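To see how confident the model is in these labels, `predict_proba` reports the share of the nearest neighbors that voted for each species. The sketch below rebuilds the same model so it runs on its own. With `n_neighbors=3` and the default uniform weights, each value is a multiple of 1/3.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1701
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

sample = [[3, 5, 4, 2], [2, 3, 5, 4]]

# One row per sample, one column per species
proba = knn.predict_proba(sample)
for row, probs in zip(sample, proba):
    votes = {str(name): round(float(p), 2) for name, p in zip(iris.target_names, probs)}
    print(row, votes)
```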


## Saving and loading the model

In [8]:
# Save the model
dump(knn, 'iris-model-knn.joblib')

['iris-model-knn.joblib']

In [9]:
# Load the model
model = load('iris-model-knn.joblib')

# Check that the loaded model reproduces the earlier predictions
sample = [[3,5,4,2], [2,3,5,4]]
predictions = model.predict(sample)
pred_species = [iris.target_names[p] for p in predictions]
print("predictions:", pred_species)

predictions: ['versicolor', 'virginica']
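A more direct check of the save and load round trip is to compare the restored model's predictions with the original's on the whole test set. This sketch writes to a temporary directory (an assumption made so it leaves no file behind) and rebuilds the model so it runs on its own.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1701
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Save and reload inside a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris-model-knn.joblib")
    dump(knn, path)
    restored = load(path)

# The restored model should give identical predictions
same_predictions = np.array_equal(knn.predict(X_test), restored.predict(X_test))
print(same_predictions)  # True
```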
