ML Debugging Notebooks

A comprehensive collection of Jupyter notebooks demonstrating how to debug common machine learning problems. Each notebook provides hands-on examples, visualizations, and practical solutions to help you identify and fix issues in your ML models.

📚 Notebooks

Learn to identify and fix bias-variance tradeoff problems:

  • Topics Covered:
    • Detecting overfitting vs underfitting using learning curves
    • Impact of model complexity on performance
    • Regularization techniques (L1/Lasso, L2/Ridge)
    • Proper model selection and validation
  • Key Techniques: Learning curves, cross-validation, regularization
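
As a rough illustration of how a learning curve can be read, the sketch below classifies a model from its final training and validation scores. The function name `diagnose_fit` and the thresholds `min_score` and `max_gap` are assumptions made for this example; they are not taken from the notebooks, and suitable values depend on the task.

```python
def diagnose_fit(train_scores, val_scores, min_score=0.80, max_gap=0.10):
    """Classify a learning curve from its final train/validation scores.

    min_score and max_gap are illustrative thresholds, not universal constants.
    """
    if len(train_scores) != len(val_scores) or not train_scores:
        raise ValueError("need two non-empty score lists of equal length")
    train_final, val_final = train_scores[-1], val_scores[-1]
    if train_final < min_score:
        return "underfitting"   # the model cannot even fit the training data
    if train_final - val_final > max_gap:
        return "overfitting"    # fits the training data but does not generalize
    return "reasonable fit"


# Accuracy at increasing training-set sizes (made-up numbers for illustration).
print(diagnose_fit([0.62, 0.64, 0.65], [0.60, 0.62, 0.63]))  # low train score
print(diagnose_fit([1.00, 0.99, 0.98], [0.70, 0.74, 0.76]))  # large train/val gap
print(diagnose_fit([0.95, 0.92, 0.90], [0.80, 0.85, 0.87]))  # curves converge
```

A low training score points to high bias, while a large gap between the two curves points to high variance, which is where regularization or more data usually helps.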

Understand and prevent data leakage, which inflates measured model performance:

  • Topics Covered:
    • Target leakage identification
    • Train-test contamination
    • Temporal leakage in time-series
    • Proper data splitting and preprocessing
  • Key Techniques: Correlation analysis, pipeline usage, temporal validation
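
Temporal validation can be sketched in a few lines: split by time so that every training record precedes every test record, instead of shuffling. The helper `temporal_split` and the sample records are hypothetical and use only the standard library; the notebooks may approach this differently.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split (timestamp, value) records so training data strictly precedes the cutoff."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Made-up monthly observations, deliberately out of order.
records = [
    (date(2024, 3, 1), 12.0),
    (date(2024, 1, 1), 10.0),
    (date(2024, 4, 1), 13.0),
    (date(2024, 2, 1), 11.0),
]
train, test = temporal_split(records, cutoff=date(2024, 3, 1))

# Every training timestamp is earlier than every test timestamp,
# so the model never sees the future during training.
assert max(t for t, _ in train) < min(t for t, _ in test)
print(len(train), len(test))
```

A random shuffle of the same records could place the April observation in the training set and the January one in the test set, which is the temporal leakage this split avoids.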

Debug gradient problems in deep neural networks:

  • Topics Covered:
    • Identifying vanishing/exploding gradients
    • Weight initialization strategies
    • Batch normalization
    • Gradient clipping
  • Key Techniques: Gradient monitoring, proper initialization (He/Xavier), ReLU activation
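
Gradient monitoring and clipping can be illustrated without a deep learning framework. The sketch below computes the global L2 norm of a flat list of gradient values and rescales them when that norm exceeds a limit. The function name `clip_by_global_norm` is an assumption for this example; it mirrors the general idea of norm-based clipping rather than any specific TensorFlow call.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping,
    which is the value worth logging when monitoring training.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm or total_norm == 0.0:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# A made-up exploding gradient: norm 5.0, clipped down to norm 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, clipped, norm_after)
```

Clipping preserves the direction of the update and only shrinks its length. A norm that stays near zero across many steps is the opposite symptom and suggests vanishing gradients, which clipping does not address.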

Handle imbalanced datasets effectively:

  • Topics Covered:
    • Detecting class imbalance problems
    • Resampling techniques (SMOTE, undersampling)
    • Class weights and cost-sensitive learning
    • Appropriate metrics for imbalanced data
  • Key Techniques: SMOTE, class weights, ROC-AUC, precision-recall curves
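
To make class weighting concrete, the sketch below computes inverse-frequency weights with the formula `n_samples / (n_classes * class_count)`, the same heuristic scikit-learn applies for `class_weight="balanced"`. The helper name `balanced_class_weights` and the 90/10 label split are made up for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# A made-up dataset with 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)

# Each class contributes the same total weight to the loss.
totals = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(totals)
```

With these weights a misclassified minority example costs nine times as much as a majority one, so the model cannot reach a low loss by predicting the majority class everywhere.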

Master feature scaling for better model performance:

  • Topics Covered:
    • When and why to scale features
    • Different scaling techniques (StandardScaler, MinMaxScaler, RobustScaler)
    • Impact on various algorithms
    • Common scaling mistakes
  • Key Techniques: StandardScaler, MinMaxScaler, RobustScaler, proper pipeline usage
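
A common scaling mistake is computing the mean and standard deviation on the full dataset before splitting. The sketch below shows the intended order using only the standard library: fit the statistics on the training split, then reuse them for the test split. The helper names `fit_standardizer` and `standardize` are hypothetical; in practice a scikit-learn `Pipeline` with `StandardScaler` handles this ordering.

```python
from statistics import fmean, pstdev


def fit_standardizer(train_values):
    """Learn mean and population standard deviation from training data only."""
    mean = fmean(train_values)
    std = pstdev(train_values)
    # Guard against constant features to avoid division by zero.
    return mean, (std if std > 0 else 1.0)


def standardize(values, mean, std):
    """Apply previously fitted statistics to any split."""
    return [(v - mean) / std for v in values]


# Made-up feature values for illustration.
train = [10.0, 20.0, 30.0]
test = [40.0]

mean, std = fit_standardizer(train)               # statistics come from train only
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)        # test reuses the train statistics
print(train_scaled, test_scaled)
```

The scaled test value falls outside the range of the scaled training values, which is expected: the test set is allowed to look different from the training set, and refitting the scaler on it would hide that difference.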

🚀 Getting Started

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

  1. Clone this repository:
     git clone https://github.com/macanderson/MLLanguageModels.git
     cd MLLanguageModels
  2. Install required dependencies:
     pip install -r requirements.txt
  3. Launch Jupyter Notebook:
     jupyter notebook
  4. Open any notebook from the notebooks/ directory and start learning!

📋 Requirements

The notebooks use the following main libraries:

  • NumPy - Numerical computing
  • Pandas - Data manipulation
  • Scikit-learn - Machine learning algorithms
  • TensorFlow - Deep learning (for gradient problems)
  • Matplotlib & Seaborn - Visualization
  • imbalanced-learn - Handling imbalanced datasets

See requirements.txt for the complete list.

🎯 Learning Path

Recommended order for beginners:

  1. Start with Overfitting/Underfitting to understand model performance basics
  2. Learn Feature Scaling to prepare data properly
  3. Study Data Leakage to avoid common pitfalls
  4. Tackle Class Imbalance for real-world scenarios
  5. Explore Gradient Problems for deep learning applications

For experienced practitioners:

  • Jump to any notebook based on your current debugging needs
  • Each notebook is self-contained with complete examples

🔍 What You'll Learn

Each notebook follows a consistent structure:

  1. Problem Overview - What the issue is and why it matters
  2. Symptoms - How to recognize the problem
  3. Hands-on Examples - Code demonstrating the problem
  4. Solutions - Multiple approaches to fix the issue
  5. Best Practices - Guidelines to prevent future occurrences
  6. Debugging Checklist - Quick reference for troubleshooting
  7. Exercises - Practice problems to reinforce learning

💡 Use Cases

These notebooks are perfect for:

  • Students learning machine learning fundamentals
  • Data Scientists debugging model performance issues
  • ML Engineers implementing production-ready models
  • Researchers understanding common pitfalls
  • Interview Preparation for ML/DS roles

🤝 Contributing

Contributions are welcome! If you have suggestions for:

  • Additional debugging scenarios
  • Improved explanations
  • New visualization techniques
  • Bug fixes

Please open an issue or submit a pull request.

📝 License

This project is open source and available under the MIT License.

📧 Contact

For questions or feedback, please open an issue on GitHub.


Happy Debugging! 🐛🔧
