# **Iris Classification with TinyML**
**`TinyML`**

## **Scenario**
A portable device that classifies iris flowers as **`Setosa`** or **`Versicolor`** is a practical example of **`TinyML`**: machine learning running on small, low-power hardware. Such a device could be deployed in agricultural settings to help farmers identify plant species quickly, supporting crop management and biodiversity monitoring. Because inference runs on the device itself, it works **offline** and returns results **instantly**, with no cloud connectivity required, which keeps data private and reduces latency. The same approach can bring intelligent solutions to other remote and resource-constrained environments.

In [1]:
# block warnings
import warnings
warnings.filterwarnings('ignore')

## **Importing Data**

In [8]:
import pandas as pd
import numpy as np

df = pd.read_csv('iris_dataset.csv')
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


## **Viewing Data**

In [9]:
df.describe()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


## **Data Cleaning**

In [11]:
# checking for null values
df.isna().sum()

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64

In [12]:
# checking for duplicate rows
df.duplicated().sum()

np.int64(3)

In [13]:
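# removing duplicates and rebuilding the index
# (reset_index returns a new DataFrame unless inplace=True is passed)
df.drop_duplicates(inplace=True)
df.reset_index(drop=True, inplace=True)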
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


## **Data Preprocessing**

In [14]:
# defining numerical and categorical columns
num_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
cat_cols = ['species']

- ### **Standardizing Data**

In [15]:
from sklearn.preprocessing import StandardScaler

# dictionary to store scalers
scalers = {}

# scaling numerical columns
for column in num_cols:
    scaler = StandardScaler()
    df[column] = scaler.fit_transform(df[[column]])
    scalers[column] = scaler

df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | -0.915509 | 1.019971 | -1.357737 | -1.3357 | Iris-setosa |
| 1 | -1.15756 | -0.128082 | -1.357737 | -1.3357 | Iris-setosa |
| 2 | -1.39961 | 0.331139 | -1.414778 | -1.3357 | Iris-setosa |
| 3 | -1.520635 | 0.101529 | -1.300696 | -1.3357 | Iris-setosa |
| 4 | -1.036535 | 1.249582 | -1.357737 | -1.3357 | Iris-setosa |


- ### **Extracting Scaler `Mean` and `Std. Deviation`**
These scaler parameters will be used on the **`Arduino`** to bring incoming measurements onto the same scale as the training data.

In [16]:
# extracting parameters
scaler_params = {column: {'mean': scaler.mean_[0], 'std': scaler.scale_[0]} for column, scaler in scalers.items()}

# print these parameters for use in Arduino
for column, params in scaler_params.items():
    print(f'Column: {column}')
    print(f'  Mean: {params["mean"]}')
    print(f'  Std: {params["std"]}')
    print()

Column: sepal_length
  Mean: 5.856462585034014
  Std: 0.8262749807650088

Column: sepal_width
  Mean: 3.05578231292517
  Std: 0.435519746063226

Column: petal_length
  Mean: 3.780272108843538
  Std: 1.7531173189835423

Column: petal_width
  Mean: 1.2088435374149662
  Std: 0.7552920028261023
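
As a rough illustration of how these parameters would be applied to a new measurement, standardization is simply `(x - mean) / std` for each feature. The plain-Python sketch below uses the values printed above; the names `SCALER_PARAMS` and `standardize` are hypothetical and do not appear elsewhere in this notebook.

```python
# Scaler parameters copied from the printed output: (mean, std) per feature.
SCALER_PARAMS = {
    "sepal_length": (5.856462585034014, 0.8262749807650088),
    "sepal_width": (3.05578231292517, 0.435519746063226),
    "petal_length": (3.780272108843538, 1.7531173189835423),
    "petal_width": (1.2088435374149662, 0.7552920028261023),
}


def standardize(raw):
    """Scale a dict of raw measurements with z = (x - mean) / std."""
    return {
        name: (raw[name] - mean) / std
        for name, (mean, std) in SCALER_PARAMS.items()
    }


# First row of the dataset: 5.1, 3.5, 1.4, 0.2
scaled = standardize(
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
)
print(scaled)
```

The result should agree, up to rounding, with the first row of the standardized table shown earlier.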



- ### **Encoding target labels**

In [17]:
df['species'].unique() 

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [18]:
# Define a mapping dictionary
species_id = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

# Apply .map() to transform the 'species' column
df['species'] = df['species'].map(species_id)

df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | -0.915509 | 1.019971 | -1.357737 | -1.3357 | 0 |
| 1 | -1.15756 | -0.128082 | -1.357737 | -1.3357 | 0 |
| 2 | -1.39961 | 0.331139 | -1.414778 | -1.3357 | 0 |
| 3 | -1.520635 | 0.101529 | -1.300696 | -1.3357 | 0 |
| 4 | -1.036535 | 1.249582 | -1.357737 | -1.3357 | 0 |


- ### **Splitting Data**

In [19]:
from sklearn.model_selection import train_test_split

X = df.drop('species', axis = 1)
y = df['species']

# keeping only two classes: Setosa (0) and Versicolor (1); Virginica (2) is dropped
X_binary = X[y != 2]
y_binary = y[y != 2]

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_binary, y_binary, test_size=0.2, random_state=42)

## **Machine Learning**

In [20]:
# performing logistic regression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression

# training model
model = LogisticRegression()
model.fit(X_train, y_train)

# making classifications
pred = model.predict(X_test)

# checking score
LR = accuracy_score(y_test, pred)

# printing Classification Report
print(classification_report(y_test, pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         8

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20



- ### **Extracting Intercept and Coefficient of Logistic Regression Model**

In [21]:
# Print the intercept and coefficients
intercept = model.intercept_[0]
print("Intercept:", intercept)

coefficients = model.coef_[0]
print("Coefficients:", coefficients)

Intercept: 2.3823336105565747
Coefficients: [ 1.01339574 -1.17619725  1.64983467  1.48109115]
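
For reference, these numbers plug into the standard logistic regression decision rule. In the sketch below, $x_1, \dots, x_4$ are the standardized features in the column order of `X` (sepal length, sepal width, petal length, petal width), $b$ is the intercept, and $w_i$ are the coefficients. This is the textbook form, with the printed values rounded to four decimals for readability.

```latex
z = b + \sum_{i=1}^{4} w_i x_i
  \approx 2.3823 + 1.0134\,x_1 - 1.1762\,x_2 + 1.6498\,x_3 + 1.4811\,x_4,
\qquad
P(y = 1 \mid x) = \frac{1}{1 + e^{-z}},
\qquad
\hat{y} =
\begin{cases}
1 \ (\text{Versicolor}) & \text{if } z > 0 \\
0 \ (\text{Setosa}) & \text{otherwise}
\end{cases}
```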


- ### **Testing with Data**

In [22]:
# selecting the row
row = 1

# printing results
print(f'Input: {list(X_test.iloc[row])}')
print(f'Output: {y_test.iloc[row]}')
print(f'Predicted: {model.predict(X_test.iloc[[row]])}')

Input: [-1.7626850854004361, 0.3311392614880281, -1.4147781680016716, -1.3357000122338663]
Output: 0
Predicted: [0]


In [23]:
# selecting the row
row = 4

# printing results
print(f'Input: {list(X_test.iloc[row])}')
print(f'Output: {y_test.iloc[row]}')
print(f'Predicted: {model.predict(X_test.iloc[[row]])}')

Input: [0.17371627885076682, -0.8169143101803691, 0.7527892610870106, 0.5178877323226384]
Output: 1
Predicted: [1]
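
The two predictions above can also be reproduced without scikit-learn, which is roughly the computation an embedded port would have to perform. The sketch below is a minimal plain-Python version that assumes the printed intercept and coefficients; the helper names `predict_proba` and `predict` are hypothetical and are not the scikit-learn methods.

```python
import math

# Intercept and coefficients printed by the notebook
# (order: sepal_length, sepal_width, petal_length, petal_width).
INTERCEPT = 2.3823336105565747
COEFFICIENTS = [1.01339574, -1.17619725, 1.64983467, 1.48109115]


def predict_proba(x_scaled):
    """Return P(class 1) for one standardized sample."""
    z = INTERCEPT + sum(w * x for w, x in zip(COEFFICIENTS, x_scaled))
    return 1.0 / (1.0 + math.exp(-z))


def predict(x_scaled):
    """Return 1 (Versicolor) if the probability exceeds 0.5, else 0 (Setosa)."""
    return 1 if predict_proba(x_scaled) > 0.5 else 0


# The two standardized test rows printed above.
row_1 = [-1.7626850854004361, 0.3311392614880281, -1.4147781680016716, -1.3357000122338663]
row_4 = [0.17371627885076682, -0.8169143101803691, 0.7527892610870106, 0.5178877323226384]

print(predict(row_1), predict(row_4))  # prints: 0 1
```

Note that the inputs here are already standardized; raw measurements would first need to be scaled with the extracted mean and standard deviation.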


Thus, `Logistic Regression` reaches `100%` accuracy on the 20-sample test set. We have also extracted the `mean` and `standard deviation` of each standard scaler, along with the `intercept` and `coefficients` of the fitted model. Next, these parameters will be implemented on `Embedded Systems` to run predictions on the device.
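
One possible way to carry the parameters over to an embedded target is to generate a small C header from Python. The sketch below is only an illustration under that assumption: the functions `c_array` and `make_header` and the C identifiers `SCALER_MEAN`, `SCALER_STD`, `LR_COEF`, and `LR_INTERCEPT` are hypothetical choices, not something this notebook defines. The numeric values are copied from the outputs above.

```python
# Parameters copied from the notebook outputs (hypothetical variable names).
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
MEANS = [5.856462585034014, 3.05578231292517, 3.780272108843538, 1.2088435374149662]
STDS = [0.8262749807650088, 0.435519746063226, 1.7531173189835423, 0.7552920028261023]
COEFS = [1.01339574, -1.17619725, 1.64983467, 1.48109115]
INTERCEPT = 2.3823336105565747


def c_array(name, values):
    """Format a list of floats as a C `const float` array definition."""
    body = ", ".join(f"{v:.8f}f" for v in values)
    return f"const float {name}[{len(values)}] = {{ {body} }};"


def make_header():
    """Build the text of a C header holding the scaler and model parameters."""
    lines = [
        "// Feature order: " + ", ".join(FEATURES),
        c_array("SCALER_MEAN", MEANS),
        c_array("SCALER_STD", STDS),
        c_array("LR_COEF", COEFS),
        f"const float LR_INTERCEPT = {INTERCEPT:.8f}f;",
    ]
    return "\n".join(lines) + "\n"


header_text = make_header()
print(header_text)
```

The generated text could then be written to a file and included in the device firmware; eight decimal places are more than a 32-bit `float` can hold, so no precision is lost at this step.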
