**Deep Learning Day 2**

### Administrative Notes
1. **Next Lecture**: 2nd January 2025.  
2. **Public Defense**: Scheduled for 10th February 2025. Ensure submission before 9th February 2025.

### Extra Information
1. **Handling Large Datasets**: If a dataset does not fit in RAM, use methods like `partial_fit` to train the model incrementally.
2. **Gradient Descent Alternatives**: Gradient-free methods such as Simulated Annealing can be used for optimization tasks. Kolmogorov-Arnold Networks (KANs) are an alternative network architecture rather than an optimizer.
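3. **Neural Network Layers**: A dense layer without an activation function computes only a linear (affine) transformation, so it behaves like Linear Regression. Stacking several such layers still yields a single linear model.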
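4. **Parallel Data Loading**: In PyTorch's `DataLoader`, the `num_workers` parameter sets the number of worker subprocesses that load data in parallel (`0` loads everything in the main process). It is usually chosen relative to the number of available CPU cores.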
5. **Activation Functions**:  
   - **ReLU**: Rectified Linear Unit, `max(0, x)`, commonly used for introducing non-linearity.  
   - **Clamped Activation Function**: A variation of ReLU with an upper limit, `min(max(0, x), c)` (e.g., ReLU6 with `c = 6`).
6. **Initial Model Fitting**: Start by fitting the model on a small batch (e.g., 8 records). A working model should be able to overfit it, driving the training loss close to zero. If it cannot, investigate potential coding errors.
7. **Model Saving/Loading**: Save trained models to a file and load them later for reuse, much like serializing Python objects with `pickle`.
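
Items 3 and 5 above can be checked numerically. The following is a minimal NumPy sketch, not framework code: the shapes, the seed, and the names `relu` and `clamped_relu` are assumptions made for illustration. It shows that two stacked layers without an activation collapse into one linear layer, and that inserting ReLU breaks that equivalence.

```python
import numpy as np

def relu(x):
    """ReLU: element-wise max(0, x)."""
    return np.maximum(0.0, x)

def clamped_relu(x, upper=6.0):
    """ReLU with an upper limit: min(max(0, x), upper)."""
    return np.clip(x, 0.0, upper)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                        # 4 records, 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two stacked layers with no activation in between...
stacked = (x @ W1 + b1) @ W2 + b2
# ...equal a single linear layer with merged weights and bias.
W_merged, b_merged = W1 @ W2, b1 @ W2 + b2
single = x @ W_merged + b_merged
print(np.allclose(stacked, single))                # → True

# With ReLU between the layers, the output is generally no longer
# reproduced by the merged linear layer.
nonlinear = relu(x @ W1 + b1) @ W2 + b2
print(np.allclose(nonlinear, single))

print(clamped_relu(np.array([-2.0, 3.0, 9.0])))    # → [0. 3. 6.]
```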

### Building Models: TensorFlow (TF) vs. PyTorch
1. **Object-Oriented Programming (OOP)**: A common approach for model-building in both frameworks.
2. **TensorFlow Functional API**:  
   - Supports multiple input/output configurations.  
   - Enables layer (weight) sharing and flexible combination of intermediate tensors, such as branching and merging.
3. **PyTorch Models**: Similar principles apply, with layers and models defined using OOP.
4. **Data Pipelines**:  
   - **TensorFlow**: Use `tf.data.Dataset` for efficient data handling.  
   - **PyTorch**: Leverage `Dataset` for access to individual samples and `DataLoader` for batching, shuffling, and parallel loading.
5. **Tutorials**:  
   - [TensorFlow Data Performance Guide](https://www.tensorflow.org/guide/data_performance).  
   - [PyTorch Data Pipeline Tutorial](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html).
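
To make the `Dataset`/`DataLoader` split concrete, here is a framework-independent sketch in NumPy. `ArrayDataset` and `iterate_batches` are hypothetical names, not part of either library. The dataset only answers "how many samples?" and "give me sample `i`", while the loader handles shuffling and batching. A real `DataLoader` or `tf.data.Dataset` adds parallel loading and prefetching on top of this.

```python
import numpy as np

class ArrayDataset:
    """Map-style dataset: supports len() and integer indexing."""

    def __init__(self, features, labels):
        assert len(features) == len(labels)
        self.features, self.labels = features, labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

def iterate_batches(dataset, batch_size, shuffle=True, seed=0):
    """Yield (features, labels) mini-batches in (optionally) shuffled order."""
    order = np.arange(len(dataset))
    if shuffle:
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        xs, ys = zip(*(dataset[i] for i in idx))   # fetch samples one by one
        yield np.stack(xs), np.array(ys)           # collate into arrays

data = ArrayDataset(np.arange(20, dtype=float).reshape(10, 2), np.arange(10))
batches = list(iterate_batches(data, batch_size=4))
print([len(y) for _, y in batches])                # → [4, 4, 2]
```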

### Bias-Variance Tradeoff and Error Analysis
1. **Regularization**:  
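   - **L1 Regularization**: Adds a penalty proportional to the absolute value of the weights, which shrinks them and can drive some to exactly zero (sparsity).  
   - **L2 Regularization**: Adds a penalty proportional to the squared weights, which discourages large weights.  
   - In Keras, penalties can be attached per layer through `kernel_regularizer`, `bias_regularizer`, and `activity_regularizer`.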
2. **Dropout**:  
   - Randomly deactivates units in a layer during training.  
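   - Specify the dropout rate as the fraction of units to deactivate (e.g., `0.2` drops 20% of the units).  
   - Note: Dropout must be inactive during inference. Keras disables it automatically outside training, and PyTorch disables it after `model.eval()` is called.  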
   - When applied just after the input layer, it acts like "feature selection" or "data denoising" but is not recommended for this purpose.
3. **Data Splitting**:  
   - Splits such as 70/30 or 80/20 are guidelines, not strict rules.  
   - For large datasets (e.g., 1M rows), a much smaller held-out share is enough: for example, 980K for training, 10K for validation, and 10K for testing.
4. **Bias-Variance Error Analysis**:  
   - **High Bias** (underfitting): Address with a more complex model, additional features, longer training, or weaker regularization. More data alone rarely helps.  
   - **High Variance** (overfitting): Mitigate with more data, simpler models, regularization, or improved data quality.
5. **Training/Validation Curves**: Use tools like TensorBoard (`%load_ext tensorboard`) and callbacks (e.g., `ModelCheckpoint`) for monitoring.
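
The regularization and dropout items above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used by Keras or PyTorch: it uses the common "inverted dropout" variant, which rescales the surviving units during training so that nothing needs to change at inference. The function names and the penalty strength `lam` are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations                       # inference: pass through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob        # keeps the expected value unchanged

def l1_penalty(weights, lam):
    """L1 term added to the loss: lam * sum(|w|)."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum(w^2)."""
    return lam * np.sum(weights ** 2)

rng = np.random.default_rng(0)
h = np.ones((1000, 100))
h_train = dropout(h, rate=0.2, training=True, rng=rng)
h_infer = dropout(h, rate=0.2, training=False, rng=rng)

print((h_train == 0).mean())      # fraction of dropped units, close to 0.2
print(h_train.mean())             # close to 1.0 thanks to the rescaling
print(np.array_equal(h_infer, h)) # → True

W = np.array([[0.5, -2.0], [0.0, 1.0]])
print(l1_penalty(W, 0.01), l2_penalty(W, 0.01))
```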

### Optimization
#### I. Vanishing/Exploding Gradients
1. Use appropriate weight initialization techniques (e.g., Xavier/Glorot initialization).
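2. Mini-batch gradient descent and `partial_fit` address memory limits, not gradient scale. For exploding gradients, apply gradient clipping; for vanishing gradients, prefer non-saturating activations such as ReLU.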
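
As a rough illustration of why initialization matters, the sketch below pushes a random batch through ten linear layers, once with Xavier/Glorot uniform initialization and once with unit-variance weights. The layer width, depth, and seed are arbitrary assumptions, and only the forward signal is shown; a similar argument applies to gradients flowing backward.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 512))          # batch of 256 records, 512 features

h_glorot, h_naive = x, x
for _ in range(10):                      # ten stacked linear layers
    h_glorot = h_glorot @ glorot_uniform(512, 512, rng)
    h_naive = h_naive @ rng.normal(size=(512, 512))   # std 1: far too large

# The Glorot signal keeps roughly unit scale; the naive one explodes.
print(h_glorot.std(), h_naive.std())
```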

#### II. Advanced Optimizers
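1. **Momentum**: Uses an exponential moving average of past gradients to smooth updates and accelerate gradient descent.
2. **RMSprop**: Scales each parameter's step by the root of an exponential moving average of its squared gradients (root mean square propagation).
3. **Adam**: Combines momentum and RMSprop, with bias correction of both moving averages, for adaptive learning.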
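
The three update rules can be compared on a one-dimensional toy loss. This is a minimal sketch: the loss `f(x) = (x - 3)^2`, the learning rates, and the step counts are assumptions chosen for illustration, and the momentum variant shown is the exponential-moving-average form described above.

```python
import math

def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def momentum(x, lr=0.1, beta=0.9, steps=500):
    v = 0.0
    for _ in range(steps):
        v = beta * v + (1 - beta) * grad(x)        # moving average of gradients
        x -= lr * v
    return x

def rmsprop(x, lr=0.05, beta=0.9, eps=1e-8, steps=500):
    s = 0.0
    for _ in range(steps):
        g = grad(x)
        s = beta * s + (1 - beta) * g ** 2         # moving average of squared gradients
        x -= lr * g / (math.sqrt(s) + eps)         # step scaled by RMS of gradients
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g            # momentum part
        v = beta2 * v + (1 - beta2) * g ** 2       # RMSprop part
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# All three start at x = 0 and should end near the minimum at x = 3.
print(momentum(0.0), rmsprop(0.0), adam(0.0))
```

RMSprop with a fixed learning rate keeps taking steps of roughly constant size near the minimum, so it tends to hover around the optimum rather than settle on it exactly.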

#### III. Hyperparameter Tuning
1. Activation functions are often the last hyperparameter to tune (default to ReLU unless issues arise).
2. Grid search is limited for large search spaces; prefer random or Bayesian search techniques.
3. Examples of using Optuna for hyperparameter optimization:  
   - [TensorFlow Example](https://github.com/optuna/optuna-examples/blob/main/tensorflow/tensorflow_eager_simple.py).  
   - [PyTorch Example](https://github.com/optuna/optuna-examples/blob/main/pytorch/pytorch_simple.py).
4. Key hyperparameters to tune:  
   - Learning rate.  
   - Number of hidden units.  
   - Number of hidden layers.  
   - Momentum term.  
   - Mini-batch size.
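
A random search over the hyperparameters listed above can be sketched with the standard library alone. Everything here is hypothetical: the search ranges are arbitrary, and `toy_validation_loss` is a stand-in for "train the model with this configuration and return its validation loss". Libraries such as Optuna provide more sample-efficient strategies around the same trial loop.

```python
import math
import random

def sample_config(rng):
    """Draw one random configuration; ranges are illustrative assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),   # log-uniform in [1e-5, 1e-1]
        "hidden_units": rng.choice([32, 64, 128, 256]),
        "hidden_layers": rng.randint(1, 4),           # inclusive on both ends
        "momentum": rng.uniform(0.8, 0.99),
        "batch_size": rng.choice([16, 32, 64, 128]),
    }

def toy_validation_loss(cfg):
    """Hypothetical stand-in for training a model and measuring validation loss."""
    return (math.log10(cfg["learning_rate"]) + 3) ** 2 + abs(cfg["hidden_layers"] - 2)

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        cfg = sample_config(rng)
        trials.append((toy_validation_loss(cfg), cfg))
    best = min(trials, key=lambda trial: trial[0])    # lowest loss wins
    return best, trials

(best_loss, best_cfg), all_trials = random_search(n_trials=50)
print(best_loss, best_cfg)
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which usually matters more than fine resolution within one.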

#### IV. Batch Normalization
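1. Normalize inputs using techniques like Z-score normalization (subtract the mean, divide by the standard deviation of each feature).
2. Apply batch normalization to standardize the activations of the previous layer over each mini-batch, followed by a learnable scale and shift.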
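
Both steps can be sketched in NumPy. This is a training-mode forward pass only, under simplifying assumptions: the names `zscore`, `batch_norm_forward`, `gamma`, and `beta` are illustrative, and framework layers additionally track running statistics that are used at inference time.

```python
import numpy as np

def zscore(x, eps=1e-8):
    """Z-score normalization per feature (column)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def batch_norm_forward(h, gamma, beta, eps=1e-5):
    """Standardize activations over the mini-batch, then scale and shift."""
    mean = h.mean(axis=0)
    var = h.var(axis=0)
    h_hat = (h - mean) / np.sqrt(var + eps)
    return gamma * h_hat + beta          # gamma and beta are learnable in practice

rng = np.random.default_rng(0)
h = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # 64 records, 4 units

out = batch_norm_forward(h, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))

x_norm = zscore(h)
print(x_norm.mean(axis=0).round(3), x_norm.std(axis=0).round(3))
```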
