This is a collection of repos devoted to learning machine learning with Kaggle.
- Regression with Housing Prices
- Classification with Titanic
- MNIST Solutions
- Classification with Spaceship Titanic
- NLP with Disaster Tweets
Follow me on Kaggle
This repo is a deep dive into solving machine learning regression problems, that is, supervised learning with a continuous target value.
The data set comes from the Kaggle competition House Prices - Advanced Regression Techniques.
Given several dozen predictors/features, we want to accurately predict the sale price of a house.
- Quickstart
- Lasso, Ridge, and ElasticNet Regression
- Polynomial Features
- Target and Feature Distributions
- Simple Imputer and Label Encoding
- Robust Regression - RANSAC
- SelectKBest Features
- Variance Inflation Factor (VIF)
- Recursive Feature Elimination (RFE)
- Mutual Information Gain
- Forward Feature Selection
- Stochastic Gradient Descent
- Lasso, Ridge, and ElasticNet with log(target)
- Outliers
- Decision Tree and Random Forests
- GridSearchCV
- MLPRegressor
- Gradient Boosted Trees - XGBoost/CatBoost/LightGBM
- GBDT Feature Importance
- SHAP Values
- XGBoost + CV with OOF Results
- XGBoost + Optuna
- Data Transformation
- Support Vector Machines
- TensorFlow
- KerasTuner
- Target Encoding
- Ensemble Learning - Blending
- Ensemble Learning - Stacking
- Robust Regression - RANSAC
- Nonlinear Regression
- PyTorch
- Basic EDA
- Enhanced EDA
- Feature Engineering
- Linear Regression from Scratch
- DSML Feature Selection
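Several of the notebooks above deal with correlated predictors (VIF, Ridge, Lasso, RFE). As a hedged illustration of the VIF idea only, here is a minimal NumPy sketch, not code from the notebooks: each column is regressed on all the others, and VIF is `1 / (1 - R^2)`. The function name `vif` and the synthetic data are hypothetical; statsmodels also provides `variance_inflation_factor` if you prefer a library routine.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares
    regression of column j on all the other columns plus an intercept.
    """
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Design matrix: a column of ones followed by every column except j.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        # A perfectly explained column has unbounded VIF.
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


# Hypothetical synthetic data: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])
print(vif(X))
```

The first two scores come out far above 10 (a common rule-of-thumb cutoff), while the independent third column stays close to 1, which is the signal you would use to drop or combine one of the near-duplicate features.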
- Linear Regression
  - Lasso - L1
  - Ridge - L2
  - Polynomial
  - Residuals
  - Collinearity
  - Interactions
- Mathematics
  - Solving Ax=b using NumPy
  - Normal Equations
- Decision Trees
- Gradient Boosted Decision Trees (GBDT)
- Support Vector Machines
- Principal Component Analysis (PCA)
- Stochastic Gradient Descent
- Deep Neural Networks (DNN)
  - Activation Functions
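The "Solving Ax=b using NumPy" and "Normal Equations" topics reduce to a single linear solve. Here is a minimal NumPy sketch, assuming a design matrix `A` built by prepending a column of ones to `X`; the helper names are hypothetical. It fits the same noise-free data two ways: by solving the normal equations `A^T A w = A^T y`, and by calling `np.linalg.lstsq`, which minimizes `||A w - y||^2` directly.

```python
import numpy as np


def add_intercept(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_normal_equations(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve the normal equations (A^T A) w = A^T y for w."""
    A = add_intercept(X)
    # Solving the square system is preferred over computing inv(A.T @ A).
    return np.linalg.solve(A.T @ A, A.T @ y)


def fit_lstsq(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Minimize ||A w - y||^2 directly with NumPy's SVD-based least squares."""
    A = add_intercept(X)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


# Noise-free synthetic data, so both solvers should recover true_w.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([5.0, 2.0, -1.0, 0.5])  # [intercept, w1, w2, w3]
y = add_intercept(X) @ true_w

w_ne = fit_normal_equations(X, y)
w_ls = fit_lstsq(X, y)
print(w_ne, w_ls)
```

Both give the same weights here. Forming `A^T A` squares the condition number of the problem, so when features are strongly collinear the SVD-based `lstsq` route is usually the safer choice.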
In addition, we will cover other topics important to machine learning:
- Feature Engineering
- Data Transformation
  - Scaling
  - Gaussian Normal
  - Log transform
  - Skew, kurtosis
- Missing Values
- Outliers
  - Z-score
  - IQR Method
  - https://www.kaggle.com/code/nareshbhat/outlier-the-silent-killer
  - Hypothesis Testing
  - DBSCAN Clustering
- Loss Functions
  - MAE
  - RMSE
  - Huber
- Feature Selection
  - Forward Selection
  - Reverse Selection
  - SHAP
  - Permutation Importance
  - Mutual Information
- Hyperparameter Optimization
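The loss functions (MAE, RMSE, Huber) and the Z-score and IQR outlier checks listed above each fit in a few lines of NumPy. The sketch below is a hedged illustration rather than code from the notebooks; the function names and toy numbers are made up, and `delta=1.0` for Huber and `k=1.5` for IQR are conventional defaults you would normally tune.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    return np.mean(np.where(r <= delta, quadratic, linear))


def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x, threshold=3.0):
    """Flag points whose |x - mean| / std exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()  # population standard deviation (ddof=0)
    return np.abs(z) > threshold


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]  # one large miss of 4.0
print(mae(y_true, y_pred), rmse(y_true, y_pred), huber(y_true, y_pred))

prices = [10, 12, 11, 13, 12, 11, 100]
print(iqr_outliers(prices))
print(zscore_outliers(prices))
```

With one miss of 4.0, RMSE (2.0) penalizes it twice as hard as MAE (1.0), while Huber grows only linearly past `delta`, so that miss contributes 3.5 instead of the 8.0 a half-squared term would. The Z-score check misses the obvious 100 because, with only seven points, |z| can never exceed sqrt(6) ≈ 2.45; the IQR rule flags it. For House Prices specifically, the competition scores RMSE on the logarithms of predicted and observed prices, i.e. `rmse(np.log(y_true), np.log(y_pred))`.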