# Class 3: Training Neural Networks

**Week 8: Introduction to Neural Networks and Deep Learning**

Welcome to Class 3 of Week 8! Today, we’ll dive into **training neural networks**, exploring how they learn from data using forward/backward propagation, gradient descent, and loss functions. We’ll train the Iris classification model from Class 2, evaluate its performance, and visualize the training process.

## Objectives
- Understand forward and backward propagation.
- Learn how gradient descent optimizes weights.
- Explore loss functions for classification.
- Train and evaluate a neural network using TensorFlow.
- Experiment with training hyperparameters (epochs, batch size).

## Agenda
1. How neural networks learn: Forward/backward propagation.
2. Gradient descent and loss functions.
3. Training the Iris model (demo).
4. Exercise: Train and tune the model.

Let’s get started!

## 1. How Neural Networks Learn: Forward/Backward Propagation

Training a neural network involves adjusting **weights** and **biases** to minimize errors in predictions. This happens in two steps:

- **Forward Propagation**:
  - Inputs pass through layers (weighted sums, activation functions) to produce predictions.
  - Example: For Iris, 4 features → hidden layer (ReLU) → output layer (softmax) → probabilities for 3 classes.
  - The **loss function** measures how far predictions are from true labels.

- **Backward Propagation** (Backpropagation):
  - Computes **gradients** of the loss with respect to weights/biases using the chain rule.
  - Gradients indicate how to adjust parameters to reduce loss.

**Analogy**: Forward propagation is like guessing an answer; backpropagation is like learning from the mistake to guess better next time.
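To make the two steps concrete, here is a minimal NumPy sketch of one forward and one backward pass through a 4 → 10 → 3 network. The input sample, its label, and the random weights are made up for illustration, and the hand-written gradients are a teaching aid rather than how TensorFlow computes them internally. The final lines compare one backpropagated gradient with a numerical estimate as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one sample with 4 features whose true class index is 2.
x = rng.normal(size=(1, 4))
y_true = 2

# Randomly initialised parameters for a 4 -> 10 -> 3 network (illustrative values).
W1, b1 = rng.normal(scale=0.5, size=(4, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.5, size=(10, 3)), np.zeros(3)


def forward(x, W1, b1, W2, b2):
    """Forward propagation: return pre-activations, hidden outputs, probabilities."""
    z1 = x @ W1 + b1                                  # weighted sum, hidden layer
    h = np.maximum(z1, 0.0)                           # ReLU activation
    z2 = h @ W2 + b2                                  # weighted sum, output layer
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))    # numerically stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    return z1, h, probs


def loss_fn(probs, y_true):
    """Cross-entropy for one sample: negative log-probability of the true class."""
    return -np.log(probs[0, y_true])


# Forward pass: prediction and loss.
z1, h, probs = forward(x, W1, b1, W2, b2)
loss = loss_fn(probs, y_true)

# Backward pass: apply the chain rule layer by layer, from output to input.
dz2 = probs.copy()
dz2[0, y_true] -= 1.0        # gradient of the loss w.r.t. z2 (softmax + cross-entropy)
dW2 = h.T @ dz2              # shape (10, 3), same as W2
db2 = dz2[0]
dh = dz2 @ W2.T              # pass the gradient back to the hidden layer
dz1 = dh * (z1 > 0)          # ReLU lets gradient through only where z1 > 0
dW1 = x.T @ dz1              # shape (4, 10), same as W1
db1 = dz1[0]

# Sanity check: nudge one weight and measure how the loss changes.
eps = 1e-6
W2_shift = W2.copy()
W2_shift[0, 0] += eps
numeric = (loss_fn(forward(x, W1, b1, W2_shift, b2)[2], y_true) - loss) / eps

print("probabilities:", probs.round(3), "sum:", probs.sum())
print("loss:", loss)
print("backprop dW2[0, 0]:", dW2[0, 0], "numerical estimate:", numeric)
```

The two gradient values should agree to several decimal places, which is what makes backpropagation trustworthy as a fast replacement for nudging every weight one at a time.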

## 2. Gradient Descent and Loss Functions

**Gradient Descent** updates weights/biases in the direction that reduces the loss:
- Formula: `weight = weight - learning_rate * gradient`
- **Learning Rate**: Controls step size (e.g., 0.001). Too big → overshoots; too small → slow learning.
- **Optimizer**: Algorithms like **Adam** (used in our model) improve on basic gradient descent by adapting the step size for each parameter and adding momentum.
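The update rule above can be tried on a toy problem. This sketch assumes a made-up one-parameter loss, `(w - 3)**2`, whose minimum is at `w = 3`; the function name and the three learning rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    """Minimise loss(w) = (w - 3)**2, whose gradient is 2 * (w - 3)."""
    for _ in range(steps):
        gradient = 2 * (w - 3)
        w = w - learning_rate * gradient   # the update rule from the text
    return w


for lr in (0.001, 0.1, 1.1):
    print(f"learning_rate={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With the smallest rate, `w` has barely moved from 0 after 20 steps; with 0.1 it lands close to 3; with 1.1 every step overshoots the minimum by more than the last, so `w` moves further away instead of converging.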

**Loss Functions** measure prediction error:
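- **Sparse Categorical Crossentropy**: Used for multi-class classification (like Iris) when labels are integer class indices (0, 1, 2) rather than one-hot vectors.
  - Compares predicted probabilities (softmax outputs) to true labels: the loss is the negative log of the probability assigned to the correct class.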
- Others: Mean squared error (regression), binary crossentropy (two classes).
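A hand computation shows what this loss rewards. The sketch below handles a single sample with made-up probability vectors; Keras averages the same quantity over all samples in a batch.

```python
import math


def sparse_categorical_crossentropy(probabilities, true_class):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probabilities[true_class])


# Made-up softmax outputs for one flower whose true class index is 2.
examples = {
    "confident and right": [0.05, 0.05, 0.90],
    "unsure": [0.30, 0.30, 0.40],
    "confident and wrong": [0.90, 0.05, 0.05],
}

for name, probs in examples.items():
    print(f"{name}: loss = {sparse_categorical_crossentropy(probs, true_class=2):.4f}")
```

The losses come out at roughly 0.11, 0.92, and 3.00, so a confident wrong answer is penalized far more heavily than an unsure one.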

**Training Process**:
- **Epoch**: One pass through the entire training dataset.
- **Batch Size**: Number of samples processed before updating weights (e.g., 32).
- Goal: Minimize loss while improving accuracy.
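Epochs and batch size together determine how many weight updates happen. The helper below is a hypothetical illustration, not a Keras function; the sample counts are assumptions chosen to show one case that divides evenly by the batch size and one that does not.

```python
def batch_bounds(num_samples, batch_size):
    """Return (start, stop) index pairs that cover the dataset once (one epoch)."""
    return [(start, min(start + batch_size, num_samples))
            for start in range(0, num_samples, batch_size)]


# Assumed sample counts for illustration: 96 divides evenly by 32, 100 does not.
for num_samples in (96, 100):
    batches = batch_bounds(num_samples, batch_size=32)
    sizes = [stop - start for start, stop in batches]
    print(f"{num_samples} samples -> {len(batches)} updates per epoch, batch sizes {sizes}")

epochs = 50
updates_per_epoch = len(batch_bounds(96, 32))
total_updates = epochs * updates_per_epoch
print(f"{epochs} epochs x {updates_per_epoch} updates = {total_updates} weight updates")
```

A smaller batch size means more updates per epoch (each based on fewer samples), while more epochs means more passes over the same data.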

Let’s train our Iris model to see this in action.

## 3. Training the Iris Model (Demo)

We’ll use the neural network from Class 2 (4 inputs → 10 hidden neurons → 3 outputs) and train it on the Iris dataset. We’ll:
- Load and preprocess the data.
- Build the model.
- Train it for 50 epochs.
- Evaluate on the test set.
- Plot training loss and accuracy.

Run the code below.

```python
# Import libraries
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np

# Load and preprocess Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'\nTest Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')

# Plot training history
plt.figure(figsize=(12, 4))

# Plot loss
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# Plot accuracy
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
```

**Explanation**:
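- **Data Prep**: Standardized features (like Class 2) for faster training. The scaler is fit on the training set only and then applied to the test set, so no test information leaks into training.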
- **Model**: Same 4-10-3 architecture (4 inputs, 10 hidden, 3 outputs).
- **Training**:
  - `epochs=50`: 50 passes through the data.
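  - `batch_size=32`: Updates weights after every 32 samples (3 updates per epoch for the 96 samples actually trained on).
  - `validation_split=0.2`: Holds out the last 20% of the training data (24 of 120 samples); the model never trains on these and uses them only to monitor performance after each epoch.
- **Evaluation**: Test loss and accuracy show how well the model generalizes to the 30 test flowers it never saw during training. With a test set this small, a single misclassification shifts accuracy by about 0.033.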
- **Plots**:
  - **Loss**: Should decrease as the model learns.
  - **Accuracy**: Should increase, ideally close to 1.0 for Iris.
  - **Validation**: Comparing the validation and training curves reveals overfitting; the warning sign is `val_loss` rising while `loss` keeps dropping.
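The overfitting check described above can also be done in code rather than by eye. This is a rough sketch: the function name and the heuristic are assumptions for illustration, not a Keras feature, and the loss curves are made up rather than taken from the demo.

```python
def summarize_history(train_loss, val_loss):
    """Return the epoch with the lowest validation loss and a rough overfitting flag."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    # Heuristic (an assumption): validation loss has risen since its best epoch
    # while training loss has kept falling.
    val_rose = val_loss[-1] > val_loss[best_epoch]
    train_fell = train_loss[-1] < train_loss[best_epoch]
    return best_epoch, val_rose and train_fell


# Made-up loss curves for illustration only (not results from the Iris demo).
train_loss = [1.10, 0.80, 0.60, 0.45, 0.35, 0.28, 0.22]
val_loss = [1.12, 0.85, 0.66, 0.55, 0.57, 0.62, 0.70]

best_epoch, overfitting = summarize_history(train_loss, val_loss)
print(f"Lowest validation loss at epoch {best_epoch} (0-based); overfitting suspected: {overfitting}")
```

To apply it to the demo, pass `history.history['loss']` and `history.history['val_loss']` instead of the made-up lists.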

What do you notice about the trends? Are training and validation metrics similar?

## 4. Exercise: Train and Tune the Model

Your turn! Train a new model by modifying the training process or architecture. Try one or more of:
- Change the number of **epochs** (e.g., 100 instead of 50).
- Adjust the **batch size** (e.g., 16 or 64).
- Modify the model (e.g., add a hidden layer with 8 neurons).
- Change the **optimizer** (e.g., `sgd` instead of `adam`).
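Before changing the architecture, it can help to see how much larger the model becomes. The helper below is a hypothetical illustration that counts the trainable parameters of a fully connected network from its layer sizes; the result should match the total that `model.summary()` reports for the same architecture.

```python
def count_parameters(layer_sizes):
    """Trainable parameters in a fully connected network: weights plus one bias per neuron."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(count_parameters([4, 10, 3]))      # demo architecture: 83
print(count_parameters([4, 10, 8, 3]))   # with an extra 8-neuron hidden layer: 165
```

More parameters give the model more capacity, which on a dataset as small as Iris also makes overfitting easier, so watch the validation curve after any change.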

**Task**:
1. Copy the code below and make at least one change.
2. Train the model and evaluate it.
3. Plot the loss and accuracy.
4. Answer the questions below.

Use the template to start.

```python
# Assumes the demo cell above has been run, so tf, plt, X_train, y_train,
# X_test, and y_test are already defined.

# Build your model
your_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Compile your model
your_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train your model
your_history = your_model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

# Evaluate
your_test_loss, your_test_accuracy = your_model.evaluate(X_test, y_test, verbose=0)
print(f'\nYour Test Loss: {your_test_loss:.4f}')
print(f'Your Test Accuracy: {your_test_accuracy:.4f}')

# Plot results
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(your_history.history['loss'], label='Training Loss')
plt.plot(your_history.history['val_loss'], label='Validation Loss')
plt.title('Your Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(your_history.history['accuracy'], label='Training Accuracy')
plt.plot(your_history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Your Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()
```

**Questions**:
1. What changes did you make (e.g., epochs, batch size, model)?
2. What’s your final test accuracy? How does it compare to the demo’s?
3. Look at the plots. Is the model overfitting? (Hint: Check if validation loss rises while training loss drops.)

Write your answers below.

## Your Answers

1. **Changes made**: ______
2. **Test accuracy and comparison**: ______
3. **Overfitting observation**: ______

## Wrap-Up

Great work! Today, you:
- Learned how forward/backward propagation works.
- Understood gradient descent and loss functions.
- Trained a neural network on Iris and evaluated it.
- Visualized training progress and tuned hyperparameters.

**Homework**:
- Try training on a new dataset, like `load_digits` from scikit-learn (8x8 digit images).
  - Hint: Adjust `input_shape` to `(64,)` and output to `Dense(10, activation='softmax')` for 10 classes.
- Experiment with more epochs or a different learning rate (e.g., `optimizer=tf.keras.optimizers.Adam(learning_rate=0.01)`).
- Optional: Read about optimizers in [TensorFlow’s optimizer API reference](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers).

**Next Class**: We’ll apply neural networks to a larger dataset (MNIST) and compare them to scikit-learn models for our mini-project!

**Tip**: If `import tensorflow` fails, install TensorFlow first:
```bash
pip install tensorflow
```