A comprehensive collection of Jupyter notebooks demonstrating how to debug common machine learning problems. Each notebook provides hands-on examples, visualizations, and practical solutions to help you identify and fix issues in your ML models.
Learn to identify and fix bias-variance tradeoff problems:
- Topics Covered:
- Detecting overfitting vs underfitting using learning curves
- Impact of model complexity on performance
- Regularization techniques (L1/Lasso, L2/Ridge)
- Proper model selection and validation
- Key Techniques: Learning curves, cross-validation, regularization (see the sketch below)
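As a quick illustration of the learning-curve diagnosis, here is a minimal sketch using a synthetic dataset and a deliberately overfit-prone decision tree; it is illustrative only, not the notebook's exact code:

```python
# Minimal sketch: diagnosing overfitting with learning curves.
# Dataset and model are illustrative stand-ins for the notebook's examples.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# An unconstrained tree tends to overfit: training accuracy stays near 1.0
# while the cross-validated accuracy lags well behind it.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=42), X, y,
    cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy",
)

plt.plot(sizes, train_scores.mean(axis=1), "o-", label="training")
plt.plot(sizes, val_scores.mean(axis=1), "o-", label="cross-validation")
plt.xlabel("Training set size")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```

A large, persistent gap between the two curves points to overfitting; two curves that plateau at a low score point to underfitting.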
Understand and prevent data leakage that inflates model performance:
- Topics Covered:
- Target leakage identification
- Train-test contamination
- Temporal leakage in time-series
- Proper data splitting and preprocessing
- Key Techniques: Correlation analysis, pipeline usage, temporal validation (see the sketch below)
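A minimal sketch of the pipeline approach, assuming a synthetic dataset: wrapping preprocessing in a scikit-learn `Pipeline` ensures the scaler is fit on the training folds only during cross-validation, which prevents train-test contamination.

```python
# Minimal sketch: fit preprocessing inside a Pipeline so scaler statistics
# come from the training folds only. Dataset is a synthetic placeholder.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Leaky approach (avoid): calling scaler.fit_transform(X) before splitting
# lets validation-fold statistics influence the training data.
# Correct approach: the pipeline refits the scaler inside each CV split.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"Leakage-free CV accuracy: {scores.mean():.3f}")
```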
Debug gradient problems in deep neural networks:
- Topics Covered:
- Identifying vanishing/exploding gradients
- Weight initialization strategies
- Batch normalization
- Gradient clipping
- Key Techniques: Gradient monitoring, proper initialization (He/Xavier), ReLU activation (see the sketch below)
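A minimal sketch of gradient monitoring with TensorFlow/Keras, assuming a small illustrative architecture rather than the notebook's exact model:

```python
# Minimal sketch: inspect per-layer gradient norms in a ReLU network that
# uses He initialization. Architecture and data are illustrative.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(64, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()

X = np.random.randn(32, 20).astype("float32")
y = np.random.randint(0, 2, size=(32, 1)).astype("float32")

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(X, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# Norms collapsing toward 0 suggest vanishing gradients; very large norms
# suggest exploding gradients (consider clipping, e.g. Adam(clipnorm=1.0)).
for var, g in zip(model.trainable_variables, grads):
    print(f"{var.name}: {tf.norm(g).numpy():.2e}")
```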
Handle imbalanced datasets effectively:
- Topics Covered:
- Detecting class imbalance problems
- Resampling techniques (SMOTE, undersampling)
- Class weights and cost-sensitive learning
- Appropriate metrics for imbalanced data
- Key Techniques: SMOTE, class weights, ROC-AUC, precision-recall curves (see the sketch below)
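A minimal sketch contrasting SMOTE resampling with class weighting on synthetic imbalanced data; the model and metric choices are illustrative, not the notebook's exact setup:

```python
# Minimal sketch: two common fixes for class imbalance on synthetic data --
# SMOTE oversampling (imbalanced-learn) and cost-sensitive class weights.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Option 1: resample the training set only (never the test set).
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
clf_smote = LogisticRegression(max_iter=1000).fit(X_res, y_res)

# Option 2: cost-sensitive learning via class weights, no resampling needed.
clf_weighted = LogisticRegression(class_weight="balanced", max_iter=1000)
clf_weighted.fit(X_train, y_train)

# Evaluate with a metric suited to imbalance (average precision ~ PR curve).
for name, clf in [("SMOTE", clf_smote), ("class_weight", clf_weighted)]:
    ap = average_precision_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: average precision = {ap:.3f}")
```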
Master feature scaling for better model performance:
- Topics Covered:
- When and why to scale features
- Different scaling techniques (StandardScaler, MinMaxScaler, RobustScaler)
- Impact on various algorithms
- Common scaling mistakes
- Key Techniques: StandardScaler, MinMaxScaler, RobustScaler, proper pipeline usage (see the sketch below)
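A minimal sketch of leakage-free scaling, assuming synthetic data with a few injected outliers to hint at why RobustScaler can help:

```python
# Minimal sketch: fit the scaler on the training split only, then apply the
# same transform to the test split. Data and scaler choices are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X[:5] *= 10  # a few outliers, where RobustScaler is less distorted
X_train, X_test = train_test_split(X, random_state=0)

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    scaler.fit(X_train)                        # learn statistics from train only
    X_test_scaled = scaler.transform(X_test)   # never fit_transform on the test split
    print(type(scaler).__name__, np.round(X_test_scaled.mean(axis=0), 2))
```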
- Python 3.8 or higher
- pip package manager
- Clone this repository: `git clone https://github.com/macanderson/MLLanguageModels.git`, then `cd MLLanguageModels`
- Install required dependencies: `pip install -r requirements.txt`
- Launch Jupyter Notebook: `jupyter notebook`
- Open any notebook from the `notebooks/` directory and start learning!
The notebooks use the following main libraries:
- NumPy - Numerical computing
- Pandas - Data manipulation
- Scikit-learn - Machine learning algorithms
- TensorFlow - Deep learning (for gradient problems)
- Matplotlib & Seaborn - Visualization
- imbalanced-learn - Handling imbalanced datasets
See requirements.txt for the complete list.
Recommended order for beginners:
- Start with Overfitting/Underfitting to understand model performance basics
- Learn Feature Scaling to prepare data properly
- Study Data Leakage to avoid common pitfalls
- Tackle Class Imbalance for real-world scenarios
- Explore Gradient Problems for deep learning applications
For experienced practitioners:
- Jump to any notebook based on your current debugging needs
- Each notebook is self-contained with complete examples
Each notebook follows a consistent structure:
- Problem Overview - What the issue is and why it matters
- Symptoms - How to recognize the problem
- Hands-on Examples - Code demonstrating the problem
- Solutions - Multiple approaches to fix the issue
- Best Practices - Guidelines to prevent future occurrences
- Debugging Checklist - Quick reference for troubleshooting
- Exercises - Practice problems to reinforce learning
These notebooks are perfect for:
- Students learning machine learning fundamentals
- Data Scientists debugging model performance issues
- ML Engineers implementing production-ready models
- Researchers understanding common pitfalls
- Interview Preparation for ML/DS roles
Contributions are welcome! If you have suggestions for:
- Additional debugging scenarios
- Improved explanations
- New visualization techniques
- Bug fixes
Please open an issue or submit a pull request.
This project is open source and available under the MIT License.
For questions or feedback, please open an issue on GitHub.
Happy Debugging! 🐛🔧