# What are Hyperparameters?

## Definition
Hyperparameters are configuration variables that control the learning process in machine learning and deep learning models. Unlike model parameters that are learned during training, hyperparameters must be set before training begins.
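
The distinction can be seen in a minimal sketch that fits a line by gradient descent. The toy data, the variable names, and the specific values below are assumptions chosen for illustration: `learning_rate` and `n_epochs` are fixed by hand before the loop runs, while `w` and `b` are updated from the data.

```python
# Hyperparameters: chosen by hand before training starts (illustrative values)
learning_rate = 0.05
n_epochs = 500

# Toy data generated from y = 2x + 1 (an assumption for this sketch)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

# Parameters: start at arbitrary values and are learned from the data
w, b = 0.0, 0.0

for _ in range(n_epochs):
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Gradient descent update, scaled by the learning-rate hyperparameter
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned parameters: w={w:.3f}, b={b:.3f}")
```

With these settings the learned `w` and `b` should end up close to the 2 and 1 used to generate the data; nothing in the loop ever changes `learning_rate` or `n_epochs`.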

## Key Characteristics
1. **External Configuration**
   - Set before training
   - Not learned from data
   - Chosen by manual tuning or automated search (e.g. grid search, random search, Bayesian optimization)

2. **Impact on Learning**
   - Control model complexity
   - Influence training speed
   - Affect model performance
   - Determine convergence behavior
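
As a small illustration of how a single hyperparameter affects training speed and convergence, the sketch below runs gradient descent on f(x) = x² with three learning rates. The function `minimize_quadratic` and the chosen values are hypothetical, picked only to show slow convergence, fast convergence, and divergence.

```python
def minimize_quadratic(learning_rate, n_steps=50, x0=1.0):
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(n_steps):
        gradient = 2 * x  # derivative of x**2
        x -= learning_rate * gradient
    return x

# Same problem, same number of steps, only the learning rate differs
for lr in (0.01, 0.1, 1.1):
    print(f"learning_rate={lr}: final x = {minimize_quadratic(lr):.4g}")
```

Each step multiplies x by (1 - 2 · learning_rate), so a small rate shrinks x slowly, a moderate rate shrinks it quickly, and a rate above 1 makes |x| grow at every step instead of approaching the minimum at 0.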

## Categories
1. **Model Hyperparameters**
   - Define the model architecture
   - Examples:
      - Number of layers (network depth)
         - More layers → can represent more complex patterns
         - But deeper networks are more prone to vanishing/exploding gradients
      - Number of hidden units (neurons) per layer (network width)
         - More units → more capacity to learn
         - Trade-off: higher computational cost

2. **Training Hyperparameters**
   - Control the learning process
   - Examples: learning rate, batch size, number of epochs

3. **Regularization Hyperparameters**
   - Help prevent overfitting
   - Examples: dropout rate, strength of L1/L2 penalties
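
To show where training and regularization hyperparameters typically enter a PyTorch script, here is a minimal sketch. The data is random stand-in data, the small network is hypothetical, and every value is illustrative rather than recommended. The L2 penalty is applied through the optimizer's `weight_decay` argument, which is one common way to do it.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Training hyperparameters (illustrative values)
learning_rate = 1e-3
batch_size = 32
n_epochs = 3

# Regularization hyperparameters (illustrative values)
dropout_rate = 0.2
weight_decay = 1e-4  # L2 penalty strength, applied by the optimizer

torch.manual_seed(0)

# Random stand-in data: 256 samples, 20 features, 3 classes
features = torch.randn(256, 20)
labels = torch.randint(0, 3, (256,))
loader = DataLoader(TensorDataset(features, labels),
                    batch_size=batch_size, shuffle=True)

model = nn.Sequential(
    nn.Linear(20, 16),
    nn.ReLU(),
    nn.Dropout(p=dropout_rate),  # randomly zeroes activations during training
    nn.Linear(16, 3),
)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            weight_decay=weight_decay)
loss_fn = nn.CrossEntropyLoss()

model.train()  # training mode, so dropout is active
n_updates = 0
for epoch in range(n_epochs):
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()
        n_updates += 1

print("optimizer updates performed:", n_updates)
```

The batch size and epoch count together fix the number of weight updates: 256 samples in batches of 32 give 8 updates per epoch, so 3 epochs perform 24 updates.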

## Importance
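- Critical for model success: poor choices can keep an otherwise suitable model from training well
- Different from learned parameters (weights and biases)
- Require experimentation, since good values are rarely known in advance
- Often problem-specific: values that work on one dataset may not transfer to another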
- Can significantly impact:
  - Training speed
  - Model performance
  - Generalization ability
  - Resource utilization
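
The experimentation mentioned above is often organized as a search over candidate values scored on held-out data. The following is a minimal grid-search sketch under stated assumptions: the synthetic data, the one-dimensional ridge model, the helper names, and the candidate grid are all made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples from y = 3x (a made-up relationship for this sketch)."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_ridge(xs, ys, l2_penalty):
    """Closed-form 1-D ridge regression without intercept."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + l2_penalty
    return numerator / denominator

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(10)
val_x, val_y = make_data(200)

# Candidate hyperparameter values; the grid itself is a manual choice
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]

results = {}
for l2_penalty in candidates:
    w = fit_ridge(train_x, train_y, l2_penalty)  # parameter learned from training data
    results[l2_penalty] = mse(w, val_x, val_y)   # scored on held-out data

best = min(results, key=results.get)
for l2_penalty, val_loss in results.items():
    print(f"l2_penalty={l2_penalty:>6}: validation MSE = {val_loss:.3f}")
print("best l2_penalty:", best)
```

Scoring on a separate validation set, rather than on the training data, is what ties the chosen value to generalization ability.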

```python
import torch
import torch.nn as nn

# Architecture hyperparameters, set by hand
n_input = 28 * 28  # input features (28x28 pixels for MNIST)
n_hidden_layers = 3
n_neurons = [128, 64, 32]  # neurons in each hidden layer
n_output = 10  # output classes (digits 0-9)

class NeuralNetwork(nn.Module):
    def __init__(self, n_input, n_hidden_layers, n_neurons, n_output):
        super().__init__()

        # The two architecture hyperparameters must agree with each other
        if len(n_neurons) != n_hidden_layers:
            raise ValueError("n_neurons needs one entry per hidden layer")

        self.layers = nn.ModuleList()

        # First hidden layer: maps the input features to n_neurons[0] units
        self.layers.append(nn.Linear(n_input, n_neurons[0]))
        self.layers.append(nn.ReLU())

        # Remaining hidden layers
        for i in range(n_hidden_layers - 1):
            self.layers.append(nn.Linear(n_neurons[i], n_neurons[i + 1]))
            self.layers.append(nn.ReLU())

        # Output layer (no activation: returns raw class scores)
        self.layers.append(nn.Linear(n_neurons[-1], n_output))

    def forward(self, x):
        # x is expected to be already flattened to shape (batch, n_input)
        for layer in self.layers:
            x = layer(x)
        return x

model = NeuralNetwork(n_input, n_hidden_layers, n_neurons, n_output)

print("Model Architecture:")
print(model)
print("\nTotal number of parameters:", sum(p.numel() for p in model.parameters()))
```

```
Model Architecture:
NeuralNetwork(
  (layers): ModuleList(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=32, bias=True)
    (5): ReLU()
    (6): Linear(in_features=32, out_features=10, bias=True)
  )
)

Total number of parameters: 111146
```
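
The parameter total follows directly from the architecture hyperparameters: each `Linear` layer has `in_features * out_features` weights plus `out_features` biases. The short check below reproduces the count by hand. The list `layer_sizes` is a helper introduced here for illustration and simply chains the input size, the hidden sizes, and the output size.

```python
# Layer widths taken from the hyperparameters above: input, hidden layers, output
layer_sizes = [784, 128, 64, 32, 10]

# Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of layers
per_layer = [n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [100480, 8256, 2080, 330]
print(sum(per_layer))  # 111146
```

The first layer accounts for most of the total, which shows how strongly the input size and the width of the first hidden layer drive the size of a fully connected network.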
