# Lab 07: Evaluation Loops
In lecture, we looked at how to evaluate machine learning algorithms. 

In this lab, we'll:
- Build on last week's Iris data set manipulation and SimpleNeuralNetwork
- Write the code for evaluating the neural network

The goal is to understand clearly what our evaluations are doing, in code.

If you haven't done so yet, run `pip install scikit-learn` to get access to the `sklearn` library.

**SUBMISSION**

You should submit a `lab07-yourname.py` file on Moodle, NOT a Jupyter Notebook. PLEASE DO NOT USE ARABIC IN THE FILENAME.

**LAB CLASS SOLUTIONS**

You may use the code that is covered in class time, but you _must_ (re-)type it yourself!!  So, during lab class, I recommend that you open a `lab07-yourname.py` file in VSCode and try to run bits of it as we go along.

**DUE DATE**

14th May, 2025 -- 1 week

**GRADING**

This Lab is worth 12.5% of your overall course grade. Completeness, correct output/answers, and style are all part of the criteria. There is an optional extra credit portion that is challenging, and can be worth an extra 50% on this assignment.

**LATE WORK**

Late work will be penalized by 25 points. However, it can be submitted until the day of the Final Exam.


# Importing data and SimpleNeuralNetwork
We'll start with essentially what we did last time. Here's the data.

In [None]:
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
from neuralnetworks import *

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Boolean mask of the samples to keep: classes 0 and 1 only
idx_keep_species = y < 2

# Limit to 2 flower species
y = y[idx_keep_species]
X = X[idx_keep_species]

# Limit to 2 features
X = X[:, [0, 2]]
print(f"Input has shape {X.shape}")
print(f"Output has shape {y.shape}")

## EXERCISE: Shuffle and split the data
Write a function that randomly splits the data `X` and `y`, with 80% going into a train set and 20% going into a test set. 

**Hint 1**: Instead of directly splitting the data, pick indices to be in one group or the other. This way, you can split both `X` and `y` the same way.

**Hint 2**: Use `np.random.shuffle()` for the randomization of indices. Note that it shuffles the array in place and returns `None`.
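
To make the two hints concrete, here is a minimal sketch of index-based splitting on toy arrays. The names `X_toy`, `y_toy`, `train_idx`, and `test_idx` are made up for illustration and are not part of the lab code; the fixed seed is only there so the run is repeatable.

```python
import numpy as np

# Toy stand-ins for the lab's X and y (hypothetical values, 10 samples).
# Row i of X_toy is [2*i, 2*i + 1], so rows are easy to trace after shuffling.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Shuffle the indices, not the data itself.
indices = np.arange(len(X_toy))
np.random.seed(0)            # fixed seed, for repeatability only
np.random.shuffle(indices)   # shuffles in place and returns None

# First 80% of the shuffled indices go to train, the rest to test.
n_train = int(0.8 * len(indices))
train_idx = indices[:n_train]
test_idx = indices[n_train:]

# Indexing X and y with the same index array keeps rows and labels aligned.
X_train_toy, y_train_toy = X_toy[train_idx], y_toy[train_idx]
X_test_toy, y_test_toy = X_toy[test_idx], y_toy[test_idx]

print(X_train_toy.shape, X_test_toy.shape)
```

Because the same index arrays are applied to both `X_toy` and `y_toy`, each feature row still sits next to its own label after the split.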



In [None]:
### YOUR CODE HERE


Now let's visualize that data. We'll plot the 2 remaining X (feature) values on a scatter plot and show what class they are part of with color.

Also, we'll define a SimpleNeuralNetwork with randomly initialized weights, and see what kind of decision surface it defines. We've implemented a `plot_decision_surface()` method in `SimpleNeuralNetwork` that is called with `X` and `y`. See below.

In [None]:
nn = SimpleNeuralNetwork(2, 3)
# nn.plot_decision_surface(X_train, y_train)


## EXERCISE (UNGRADED): Train the network, one epoch at a time
We have `forward` and `backward` methods in our `SimpleNeuralNetwork`, appropriate for updating the weights for one training epoch. 

* Write a simple for loop that calls `forward` and `backward` 1000 times. Use 0.03 for your `learning_rate`.
* After you're done, visualize the decision surface again with `nn.plot_decision_surface()`. 

THIS PORTION DOES NOT NEED TO BE TURNED IN, SINCE WE WILL MODIFY/UPDATE IT LATER.

In [None]:
### YOUR CODE HERE -- UNGRADED


## EXERCISE: Get performance on the dev set
It's great that we can learn weights and biases via the backprop algorithm (what we just did by calling `forward` and `backward`)! But instead of just looking at the plot, let's try to characterize our performance.

* Split the data again so you have 80% `train` + 10% `dev` + 10% `other`.
* Train your epochs again on the `train` set only.
* Run `predict` on your `dev` data to get predictions.
* Compare the predictions with the gold standard labels to calculate an accuracy score.
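
As a rough illustration of the last step, accuracy is the fraction of predictions that match the gold labels. The sketch below uses made-up values and assumes the network outputs one probability per example in a column of shape `(n, 1)`; the actual return format of `predict` in `neuralnetworks` may differ, so check it before reusing this.

```python
import numpy as np

# Hypothetical network outputs for 5 dev examples, shape (5, 1).
# ASSUMPTION: one probability of class 1 per row; your `predict` may
# already return 0/1 labels, in which case the thresholding is unnecessary.
probs = np.array([[0.1], [0.8], [0.7], [0.3], [0.9]])
y_gold = np.array([0, 1, 0, 0, 1])  # gold standard labels, shape (5,)

# Threshold at 0.5 and flatten to shape (5,) so it lines up with y_gold.
# Comparing a (5, 1) array with a (5,) array would broadcast to (5, 5).
y_pred = (probs >= 0.5).astype(int).ravel()

correct = (y_pred == y_gold)   # boolean array, True where prediction matches
accuracy = correct.mean()      # mean of booleans = fraction correct

print(f"Accuracy: {accuracy:.2f}")
```

The `ravel()` call matters: if the shapes of the predictions and the gold labels differ, NumPy may silently broadcast the comparison and produce a misleading score.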

In [None]:
### YOUR CODE HERE


## EXTRA CREDIT
There are many other things that could be done with this data and setup. Do any of the following to get 50 points of extra credit on the lab:

* Track the train/dev set loss, and test for convergence (when the loss stops changing much). What criteria did you find could work programmatically for convergence?
* Use more difficult data, for example, the harder-to-separate classes `1` and `2` (versicolor and virginica), and ensure the learner can separate these. What hyperparameter changes do you need to make to distinguish between them? Show a record of the different hyperparameters you tried.
* Change the underlying `neuralnetworks` module to accommodate 3 classes. This requires replacing the `sigmoid` activation at the output with `softmax`, and choosing the top probability (i.e., `argmax`) in the output layer.
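
For the third option, the prediction side of `softmax` plus `argmax` can be sketched in plain NumPy as follows. This is a standalone illustration with made-up scores, not code from the `neuralnetworks` module; the function name `softmax` and the array layout (one row per example, one column per class) are assumptions.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax for an array of shape (n_samples, n_classes)."""
    # Subtracting the row maximum does not change the result,
    # but avoids overflow in np.exp for large scores.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical output-layer scores for 2 examples and 3 classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])

probs = softmax(scores)                  # each row sums to 1
predicted_class = probs.argmax(axis=1)   # index of the top probability

print(probs.sum(axis=1))
print(predicted_class)
```

This only covers turning scores into a predicted class. The output layer would also need three units, and the `backward` pass would need to match the new activation.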

These are hard! Good luck.