# Iris Flower Classification Neural Network with PyTorch
A lightweight 3-layer neural network implemented in PyTorch for classifying Iris flower species (setosa, versicolor, virginica) based on four morphological features (sepal length/width, petal length/width). The model demonstrates:
* **Architecture:** Input layer (4 nodes) → Hidden Layer 1 (8 nodes, ReLU) → Hidden Layer 2 (9 nodes, ReLU) → Output Layer (3 nodes)
* **Training:** Uses Adam optimizer (lr=0.01) with CrossEntropyLoss over 100 epochs
* **Performance:** Reaches roughly 95%+ test accuracy; the exact figure depends on the train/test split seed (`random_state=32` here)

In [41]:
import torch 
import torch.nn as nn
import torch.nn.functional as F

* in features: sepal length, sepal width, petal length, petal width
* out features: Iris Setosa, Iris Versicolour, or Iris Virginica

![](https://raw.githubusercontent.com/damiannolan/iris-neural-network/14a9df14a57ab9d350b7bc92b2903fa1f25c4f1c/img/iris_model.png)

In [42]:
class Model(nn.Module):
    # Input layer (4 flower features) --> HL1 (h1 neurons) --> HL2 (h2 neurons) --> Output (3 Iris classes)
    # fc = fully connected: fc1 is the first fully connected layer, fc2 the second
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__()
        self.fc1 = nn.Linear(in_features,h1)    # start from in_features and move to h1 , fc(fully connected)
        self.fc2 = nn.Linear(h1,h2)             # start from h1 and move to h2 
        self.out = nn.Linear(h2,out_features)   # start from h2 and move to out_features 
        
                                                # Relu stands for rectified linear unit
    def forward(self,x):
        x = F.relu(self.fc1(x))                 # if output is less than 0 , then use 0 , else leave what it is. 
        x = F.relu(self.fc2(x))                 # if output is less than 0 , then use 0 , else leave what it is. 
        x = self.out(x)
        return x 
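As a quick sanity check on the architecture, the number of trainable parameters can be worked out from the layer sizes alone. This is a minimal sketch in plain Python; the helper name `linear_param_count` is made up for illustration, and it relies only on the fact that a fully connected layer stores one weight per input/output pair plus one bias per output.

```python
def linear_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Layer sizes of the model above: input, h1, h2, output
layer_sizes = [4, 8, 9, 3]

# Pair up consecutive sizes: (4, 8), (8, 9), (9, 3)
per_layer = [linear_param_count(a, b)
             for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [40, 81, 30]
print(total)      # 151
```

With only 151 parameters, the network is small enough to train on 150 flowers in well under a second.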

The `forward` method implements **forward propagation** through the network. Here's a breakdown of how it works:
1. Takes Input x:
    * x represents the input data (e.g., 4 features of Iris flowers: sepal length, width, petal length, width).
2. Passes Through Layers:
    * Step 1: 
        * x = F.relu(self.fc1(x))
        * Input x is passed through the first fully connected layer (fc1), then the ReLU activation function is applied.
        * ReLU replaces negative values with 0 and keeps positive values unchanged.
    * Step 2: 
        * x = F.relu(self.fc2(x))
        * The output from fc1 is passed through the second fully connected layer (fc2), followed by another ReLU.
    * Step 3: x = self.out(x)
        * The final layer (out) produces raw scores (logits) for the 3 Iris flower classes without activation (no softmax here!).
3. Returns Output:
    * The raw scores (logits) for each class are returned. These will later be fed into a loss function (e.g., CrossEntropyLoss, which internally applies softmax). 
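The steps above can be tried out on a small batch. The sketch below restates the `Model` class so that it runs on its own; the two input rows are taken from the dataset shown later in the notebook, and the softmax call is added here only to show that the returned logits are not yet probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features=4, h1=8, h2=9, out_features=3):
        super().__init__()
        self.fc1 = nn.Linear(in_features, h1)
        self.fc2 = nn.Linear(h1, h2)
        self.out = nn.Linear(h2, out_features)

    def forward(self, x):
        x = F.relu(self.fc1(x))   # step 1: fc1 + ReLU
        x = F.relu(self.fc2(x))   # step 2: fc2 + ReLU
        return self.out(x)        # step 3: raw logits, no activation

torch.manual_seed(41)
model = Model()

# Two flowers, four features each -> input shape (2, 4)
batch = torch.tensor([[5.1, 3.5, 1.4, 0.2],
                      [6.7, 3.0, 5.2, 2.3]])

with torch.no_grad():
    logits = model(batch)              # one row of 3 logits per flower
    probs = F.softmax(logits, dim=1)   # turn each row into probabilities

print(logits.shape)       # torch.Size([2, 3])
print(probs.sum(dim=1))   # each row sums to 1 (up to float rounding)
```

The model here is untrained, so the logits themselves are meaningless; only the shapes and the logits-versus-probabilities distinction matter.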

### Here's a complete example with actual numbers to show how the calculations work in your Iris classifier:
#### Example Input (1 Iris flower with 4 features):

```py
 x = [5.1, 3.5, 1.4, 0.2]  # sepal_len, sepal_wid, petal_len, petal_wid
```

#### Layer 1 (fc1) Parameters:
Let's assume these random weights and biases were initialized:

**Weights (8×4 matrix):**

```py
W1 = [
    [0.1, -0.2, 0.3, -0.4],  # Neuron 1 weights
    [0.5, -0.1, 0.2, -0.3],  # Neuron 2 weights
    [-0.2, 0.3, -0.1, 0.4],  # Neuron 3 weights
    [0.4, -0.3, 0.2, -0.1],  # Neuron 4 weights
    [0.2, 0.1, -0.3, 0.4],   # Neuron 5 weights
    [-0.1, 0.4, -0.2, 0.3],  # Neuron 6 weights
    [0.3, -0.4, 0.1, -0.2],  # Neuron 7 weights
    [-0.3, 0.2, -0.4, 0.1]   # Neuron 8 weights
]
```
**Bias (8×1 vector):**
```py
b1 = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3, 0.4, -0.4]
```

### Calculation for fc1:
#### 1. Matrix Multiplication (x × W1^T):
```py
# For first neuron:
(5.1×0.1) + (3.5×-0.2) + (1.4×0.3) + (0.2×-0.4) = 0.51 - 0.7 + 0.42 - 0.08 = 0.15

# Similarly for all 8 neurons:
z1 = [0.15, 2.42, -0.03, 1.25, 1.03, 0.67, 0.23, -1.37]
```

#### 2. Add Bias
```py
z1 + b1 = [0.15+0.1, 2.42-0.1, -0.03+0.2, 1.25-0.2, 1.03+0.3, 0.67-0.3, 0.23+0.4, -1.37-0.4]
        = [0.25, 2.32, 0.17, 1.05, 1.33, 0.37, 0.63, -1.77]
```

#### 3. Apply ReLU:
```py
ReLU(z1 + b1) = [max(0,0.25), max(0,2.32), max(0,0.17), 
                max(0,1.05), max(0,1.33), max(0,0.37),
                max(0,0.63), max(0,-1.77)]
             = [0.25, 2.32, 0.17, 1.05, 1.33, 0.37, 0.63, 0]
```

#### Visualization:
| Operation | Neuron 1 | Neuron 2 | Neuron 3 | Neuron 4 | Neuron 5 | Neuron 6 | Neuron 7 | Neuron 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| W×x | 0.15 | 2.42 | -0.03 | 1.25 | 1.03 | 0.67 | 0.23 | -1.37 |
| + bias | 0.25 | 2.32 | 0.17 | 1.05 | 1.33 | 0.37 | 0.63 | -1.77 |
| ReLU | 0.25 | 2.32 | 0.17 | 1.05 | 1.33 | 0.37 | 0.63 | 0 |
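The fc1 arithmetic can be checked with a few lines of plain Python. This is a minimal sketch that reuses the example `x`, `W1`, and `b1` from this section; the helper `dot` is introduced here for illustration only, and no PyTorch is involved.

```python
x = [5.1, 3.5, 1.4, 0.2]

W1 = [
    [0.1, -0.2, 0.3, -0.4],
    [0.5, -0.1, 0.2, -0.3],
    [-0.2, 0.3, -0.1, 0.4],
    [0.4, -0.3, 0.2, -0.1],
    [0.2, 0.1, -0.3, 0.4],
    [-0.1, 0.4, -0.2, 0.3],
    [0.3, -0.4, 0.1, -0.2],
    [-0.3, 0.2, -0.4, 0.1],
]
b1 = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3, 0.4, -0.4]

def dot(u, v):
    """Sum of element-wise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

z1 = [dot(row, x) for row in W1]                    # W×x, one value per neuron
pre_activation = [z + b for z, b in zip(z1, b1)]    # add the bias
activated = [max(0.0, v) for v in pre_activation]   # ReLU

# Round for display only; the unrounded values carry float noise
print([round(v, 2) for v in z1])
print([round(v, 2) for v in pre_activation])
print([round(v, 2) for v in activated])
```

Each printed list corresponds to one row of the table: `W×x`, `+ bias`, and `ReLU`.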


In [43]:
# Pick a manual seed for randomization 
torch.manual_seed(41)
# Create instance of a model
model = Model()

In [44]:
import pandas as pd 
import matplotlib.pyplot as plt 
%matplotlib inline

In [45]:
url = 'https://gist.githubusercontent.com/curran/a08a1080b88344b0c8a7/raw/0e7a9b0a5d22642a06d3d5b9bcbad9890c8ee534/iris.csv'
my_df = pd.read_csv(url)

In [46]:
my_df.head()

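| | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |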


In [47]:
my_df['species'] = my_df['species'].replace('setosa',0.0)
my_df['species'] = my_df['species'].replace('virginica',1.0)
my_df['species'] = my_df['species'].replace('versicolor',2.0)
my_df



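| | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 1.0 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 1.0 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 1.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 1.0 |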


In [48]:
# Train Test Split! Set X,y
X = my_df.drop('species',axis=1)
y = my_df['species']

* **X (Features):**
    * Contains all columns except 'species' (sepal_length, sepal_width, petal_length, petal_width)
    * These are the input measurements the model will use to make predictions.
    * Shape: (150, 4) for 150 flowers with 4 features each.
* **y (Target):**
    * Contains only the 'species' column (converted to numbers: 0.0=setosa, 1.0=virginica, 2.0=versicolor)
    * These are the correct answers the model will learn to predict.
    * Shape: (150,) (a 1D array of labels).

In [49]:
# Convert these to numpy arrays 
X = X.values
y = y.values

This is typically followed by splitting X and y into training and test sets:
* Features (X) → The model learns patterns from these measurements.
* Target (y) → The model tries to predict these labels correctly.
* Train/Test Split (coming next) → Ensures you can evaluate the model on unseen data.
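To make the idea of a seeded split concrete, here is a minimal sketch in plain Python. It is a hand-rolled stand-in, not scikit-learn's algorithm: the function `simple_train_test_split` is hypothetical, and the indices it picks will differ from what `train_test_split` selects for the same seed. It only illustrates the 80/20 sizes and why a fixed seed makes the split repeatable.

```python
import random

def simple_train_test_split(rows, test_size=0.2, seed=32):
    """Hypothetical stand-in: shuffle the indices with a seed, then slice."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # same seed -> same shuffle
    n_test = int(round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

rows = list(range(150))                         # stand-in for the 150 flowers
train, test = simple_train_test_split(rows)
train_again, test_again = simple_train_test_split(rows)

print(len(train), len(test))          # 120 30
print(test == test_again)             # True: the seed makes the split repeatable
print(set(train).isdisjoint(test))    # True: no flower lands in both sets
```

Changing the seed changes which flowers end up in the test set, which is why the reported test results can vary from one seed to another.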

In [50]:
from sklearn.model_selection import train_test_split

In [51]:
# Train test split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=32)

In [52]:
# Convert X features to float tensors
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)

In [53]:
# Convert y labels to long tensors
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)

When a neural network has multiple output values like [1.43, -0.4, 0.23], we often run them through argmax to make the output easy to interpret. But argmax has a derivative that is zero almost everywhere, so we can't use it with backpropagation. So, in order to train a neural network we use a softmax function instead; the softmax output values are predicted probabilities between 0 and 1 that sum to 1.

When the output is restricted between 0 and 1, we use CrossEntropy to determine how well the neural network fits the data. For one sample with logits $z$ and true class $c$:
$$ \text{CrossEntropy} = -\log\left(\text{softmax}(z)_c\right) = -\log\left(\frac{e^{z_c}}{\sum_j e^{z_j}}\right) $$

Now, to get the total error for the neural network, all we do is add up the **CrossEntropy** values for every sample (`nn.CrossEntropyLoss` averages them by default). And we can use backpropagation to adjust the weights and biases and hopefully minimize the total error.
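Softmax and cross-entropy are short enough to write out by hand. The sketch below uses plain Python and the example outputs `[1.43, -0.4, 0.23]` from the text; the function names `softmax` and `cross_entropy` are defined here for illustration and are not the PyTorch implementations.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Negative log of the probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])

logits = [1.43, -0.4, 0.23]
probs = softmax(logits)
predicted = probs.index(max(probs))    # argmax, used only for interpretation

print([round(p, 3) for p in probs])
print(predicted)                       # 0
print(round(cross_entropy(logits, 0), 4))
print(cross_entropy(logits, 1) > cross_entropy(logits, 0))   # True
```

The last line shows the behaviour training relies on: if the true class were 1, which the network scored lowest, the loss would be larger than if the true class were 0.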

In [54]:
# Set the criterion of the model to measure the error
criterion = nn.CrossEntropyLoss()
# Choose the Adam optimizer; lr = learning rate (if the error doesn't go down after a bunch of iterations (epochs), lower the learning rate)
optimizer = torch.optim.Adam(model.parameters() ,lr=0.01)

In [55]:
# train our model 
# epochs? (one run thru all the training data in our network)
epoch = 100 
losses = []
for i in range(epoch):
    # Go forward and get a prediction 
    y_pred = model.forward(X_train)     # Get predicted result
    
    # Measure a loss 
    loss = criterion(y_pred, y_train)   # predicted value vs the y_train
    
    # Keep track of losses
    losses.append(loss.detach().numpy())
    
    # Print every 10 epochs 
    if i % 10 == 0:
        print(f'Epoch : {i} and loss : {loss}')
    
    # Do some backpropagation: take the error rate of forward propagation and feed it back thru the network to finetune the weights 
    optimizer.zero_grad()       # "Reset error tracking before the next attempt"
    loss.backward()             # "Trace error backward to see which weights caused it"
    optimizer.step()            # "Update weights to reduce future errors"

Epoch : 0 and loss : 1.1369255781173706
Epoch : 10 and loss : 1.054518461227417
Epoch : 20 and loss : 0.9172936081886292
Epoch : 30 and loss : 0.6350035071372986
Epoch : 40 and loss : 0.4044587016105652
Epoch : 50 and loss : 0.2485925257205963
Epoch : 60 and loss : 0.1463107168674469
Epoch : 70 and loss : 0.09416623413562775
Epoch : 80 and loss : 0.07249684631824493
Epoch : 90 and loss : 0.06299241632223129
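The three calls at the end of the loop (`zero_grad`, `backward`, `step`) can be seen in isolation on a single parameter. This is a minimal sketch under stated assumptions: it swaps in plain `torch.optim.SGD` instead of Adam so the update is easy to verify by hand, and the toy loss `(w - 3)^2` is made up for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: d/dw (w - 3)^2 = 2 * (w - 3) = -4 at w = 1
loss = (w - 3.0) ** 2
loss.backward()
first = w.grad.item()

# Second backward pass WITHOUT zero_grad: the new gradient is added to the old one
loss = (w - 3.0) ** 2
loss.backward()
accumulated = w.grad.item()

# Resetting first gives a clean gradient again
optimizer.zero_grad()
loss = (w - 3.0) ** 2
loss.backward()
fresh = w.grad.item()

optimizer.step()                 # w <- w - lr * grad = 1 - 0.1 * (-4) = 1.4

print(first, accumulated, fresh)   # -4.0 -8.0 -4.0
print(round(w.item(), 4))          # 1.4
```

This is why the training loop calls `optimizer.zero_grad()` in every iteration: without it, each update would use the sum of all gradients seen so far.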


In [56]:
# Evaluate the model on the test data
with torch.no_grad():
    y_eval = model.forward(X_test)
    loss = criterion(y_eval, y_test)

* `with torch.no_grad():`
    * Purpose: Temporarily turns off gradient tracking.
    * Why?
        * During evaluation, you don't need to calculate gradients (no weight updates).
        * Saves memory and speeds up computation.
    * Analogy: Like taking a test without a teacher grading your mistakes afterward.

* `y_eval = model.forward(X_test)`
    * What it does:
        * Passes the test data (X_test) through the model to get predictions (y_eval).
    * These are raw logits (unnormalized scores for each class).
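    * Example output for 3-class Iris: `tensor([-3.0885,  1.4384,  5.0835])` (the logits printed for the first test sample in the per-sample results further down).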

* `loss = criterion(y_eval, y_test)`
    * What it does:
        * Computes the loss (error) between predictions (y_eval) and true labels (y_test).
    * Uses CrossEntropyLoss, which:
        * Applies softmax to convert logits → probabilities.
        * Compares probabilities to true labels.
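Both points can be checked on a toy layer. The sketch below is an illustration under stated assumptions: a single `nn.Linear(4, 3)` stands in for the full model, and the inputs and targets are random or made up. It shows that `torch.no_grad()` changes gradient tracking but not the numbers, and that `nn.CrossEntropyLoss` matches a manual log-softmax calculation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(4, 3)            # stand-in for the trained model
x = torch.randn(5, 4)              # 5 made-up samples, 4 features each
targets = torch.tensor([0, 2, 1, 1, 0])

out_tracked = layer(x)             # builds a graph for backpropagation
with torch.no_grad():
    out_untracked = layer(x)       # same numbers, no graph

print(out_tracked.requires_grad)     # True
print(out_untracked.requires_grad)   # False

# CrossEntropyLoss = mean over samples of -log_softmax(logits)[true class]
loss = nn.CrossEntropyLoss()(out_untracked, targets)
log_probs = F.log_softmax(out_untracked, dim=1)
manual = -log_probs[torch.arange(5), targets].mean()

print(torch.allclose(loss, manual))  # True
```

Because the loss applies the softmax itself, the model's `forward` should return raw logits, as it does in this notebook.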

In [57]:
loss

tensor(0.0404)

In [58]:
correct = 0 
with torch.no_grad():
    for i, data in enumerate(X_test):
        y_val = model.forward(data)
        
        if y_test[i] == 0:
            x = "setosa"
        elif y_test[i] == 1:
            x = "virginica"
        else:
            x = "versicolor"
        
        
        # What type of flower class our network thinks it is 
        print(f'{i+1}.) {str(y_val)} \t {y_test[i]} \t {y_val.argmax().item()}') 

        # Correct or not
        if y_val.argmax().item() == y_test[i]:
            correct += 1

print(f'We got {correct} correct')

1.) tensor([-3.0885,  1.4384,  5.0835]) 	 2 	 2
2.) tensor([ 15.0765, -14.9693,   6.9545]) 	 0 	 0
3.) tensor([ 13.3655, -13.6654,   7.3508]) 	 0 	 0
4.) tensor([-3.3278,  1.4888,  5.6038]) 	 2 	 2
5.) tensor([-7.9918,  6.5556,  2.9374]) 	 1 	 1
6.) tensor([-8.8077,  6.8451,  4.3425]) 	 1 	 1
7.) tensor([ 12.6298, -13.0770,   7.4646]) 	 0 	 0
8.) tensor([ 13.7895, -13.9569,   7.1781]) 	 0 	 0
9.) tensor([-2.6056,  0.8098,  5.6859]) 	 2 	 2
10.) tensor([ 14.2774, -14.4782,   7.4792]) 	 0 	 0
11.) tensor([-3.5350,  1.5664,  5.9480]) 	 2 	 2
12.) tensor([-9.3633,  7.8234,  2.8512]) 	 1 	 1
13.) tensor([-0.6575, -0.9401,  5.6566]) 	 2 	 2
14.) tensor([ 0.0114, -1.7342,  6.3398]) 	 2 	 2
15.) tensor([-7.9024,  6.2620,  3.6053]) 	 1 	 1
16.) tensor([-9.3989,  8.0891,  2.1191]) 	 1 	 1
17.) tensor([-3.6551,  2.0553,  4.7556]) 	 2 	 2
18.) tensor([-6.7769,  5.1319,  3.9586]) 	 1 	 1
19.) tensor([-0.5989, -1.1255,  6.1067]) 	 2 	 2
20.) tensor([ 15.3395, -15.4729,   7.7586]) 	 0 	 0
21.) tensor

With `random_state=41` in the train/test split, one test sample is misclassified: `16.) tensor([-5.4799,  3.9468,  4.1003])  1  2` (true label 1, predicted 2). With `random_state=32`, every sample shown above is classified correctly, and the test loss `tensor(0.0404)` is close to the last training loss printed above, `0.06299241632223129` at epoch 90.
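The correct/incorrect tally can be turned into an accuracy figure. The sketch below is plain Python and covers only the 20 rows visible in the output above (true label and predicted label per row); the remaining test rows are cut off in the output, so this is not the accuracy of the full test set. The helper `accuracy` is defined here for illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# (true, predicted) columns copied from rows 1-20 of the printed output
true_labels = [2, 0, 0, 2, 1, 1, 0, 0, 2, 0, 2, 1, 2, 2, 1, 1, 2, 1, 2, 0]
predicted   = [2, 0, 0, 2, 1, 1, 0, 0, 2, 0, 2, 1, 2, 2, 1, 1, 2, 1, 2, 0]

print(f'{accuracy(true_labels, predicted):.0%}')   # 100%

# A single miss, like the true=1 / predicted=2 row seen with random_state=41
print(accuracy([1, 1], [1, 2]))                    # 0.5
```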

In [None]:
new_iris = torch.tensor([4.7, 3.2, 1.3, 0.2])
# Evaluate the model on new data
with torch.no_grad():
    print(model(new_iris))

The position of the largest value in the output (here `13.8397`) gives the predicted species for the input: setosa in our case.
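Picking the largest value and mapping it back to a name can be sketched in plain Python. The mapping below follows the label encoding used earlier in the notebook (0 = setosa, 1 = virginica, 2 = versicolor); the helper `predict_species` is hypothetical, and the example logits are copied from the test output printed above.

```python
SPECIES = {0: 'setosa', 1: 'virginica', 2: 'versicolor'}

def predict_species(logits):
    """Return the species whose logit is largest (argmax over the scores)."""
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return SPECIES[best_index]

print(predict_species([15.0765, -14.9693, 6.9545]))   # setosa
print(predict_species([-7.9918, 6.5556, 2.9374]))     # virginica
print(predict_species([-3.0885, 1.4384, 5.0835]))     # versicolor
```

With a tensor output, `y_val.argmax().item()` from the evaluation loop yields the same index.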