Python scripts for regression models, using the Scikit-Learn framework: Diagnostic plots, confidence intervals & approximate Shapley values

macemaclean/regression-model-tools

Regression model tools

Python scripts for regression models, using the Scikit-Learn framework:

  • Diagnostic plots
  • Bootstrapped confidence intervals for predictions
  • Approximate Shapley values

Diagnostic plots

While ML models do not generally make the same residual distribution assumptions as classical linear regression, residual plots are still worth examining: they can reveal systematic bias, error variance that changes with the size of the prediction, and outliers.

import lightgbm as lgb
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from regression_diagnostics import RegressionDiagnostics
import warnings
warnings.filterwarnings('ignore')

# Load the Boston house-prices dataset and fit a regression model
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
boston = load_boston()

X = pd.DataFrame(boston["data"], columns=boston.feature_names)
y = boston.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
lgb_model = lgb.LGBMRegressor()
lgb_model.fit(X_train, y_train)

# Generate diagnostic plots
diagnostics = RegressionDiagnostics(lgb_model)
diagnostics.fit(X_test, y_test)
# Fitted values against actual values
diagnostics.fitted_actual()

Fitted values against actual values

# Residuals against fitted values
diagnostics.residuals_fitted()

Residuals against fitted values

# Histogram of residuals
diagnostics.hist_residuals()

Histogram of residuals

# QQ plot of residuals
diagnostics.qq_plot()

QQ plot of residuals
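The plots can be complemented by a few numbers computed from the same residuals. The following is a minimal sketch, not part of this repository: `residual_summary` is a hypothetical helper, and it assumes residuals are defined as actual minus fitted values. It reports the mean residual (a rough check for bias) and the correlation between absolute residuals and fitted values (a rough check for error spread that grows or shrinks with the prediction).

```python
import numpy as np


def residual_summary(y_true, y_pred):
    """Numeric companions to the residual plots (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Assumed convention: residual = actual - fitted
    residuals = y_true - y_pred

    return {
        # A mean far from zero suggests the model is biased overall
        "mean": float(residuals.mean()),
        "std": float(residuals.std()),
        # Correlation of |residual| with the fitted value: a value well above
        # zero suggests the errors get larger as predictions get larger.
        # This is NaN if either input to corrcoef is constant.
        "spread_trend": float(np.corrcoef(np.abs(residuals), y_pred)[0, 1]),
    }


# Toy data in which the error grows with the fitted value
y_pred = np.array([1.0, 2.0, 3.0, 4.0])
y_true = np.array([1.1, 1.8, 3.3, 3.6])
summary = residual_summary(y_true, y_pred)
print(summary)
```

With a fitted Scikit-Learn style model, the same helper could be called as `residual_summary(y_test, lgb_model.predict(X_test))`.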

Bootstrapped confidence intervals for predictions

A script to generate local bootstrapped confidence intervals for predictions by resampling the observed residuals of the k nearest neighbours in a reference data set. Increasing k draws residuals from a wider region of the data, so the results move closer to a global error interval.
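The idea can be sketched as follows. This is one plausible reading of the description above, not the repository's script: `knn_bootstrap_interval` and all of its parameters are hypothetical names, and the sketch assumes Euclidean nearest neighbours, residuals defined as actual minus predicted, and a percentile interval taken from the resampled residuals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors


def knn_bootstrap_interval(model, X_ref, y_ref, X_new, k=50,
                           n_boot=1000, alpha=0.05, seed=0):
    """Local interval per prediction from residuals of its k nearest
    reference points (hypothetical helper, for illustration only)."""
    X_ref = np.asarray(X_ref, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    rng = np.random.default_rng(seed)

    # Observed residuals on the reference set (actual minus predicted)
    ref_residuals = np.asarray(y_ref, dtype=float) - model.predict(X_ref)

    # Indices of the k nearest reference rows for every new row
    neighbours = NearestNeighbors(n_neighbors=k).fit(X_ref)
    _, idx = neighbours.kneighbors(X_new)

    # Resample each row's k local residuals with replacement
    draws = rng.integers(0, k, size=(len(X_new), n_boot))
    sampled = np.take_along_axis(ref_residuals[idx], draws, axis=1)

    # Percentile interval around the point prediction
    preds = model.predict(X_new)
    lower = preds + np.quantile(sampled, alpha / 2, axis=1)
    upper = preds + np.quantile(sampled, 1 - alpha / 2, axis=1)
    return preds, lower, upper


# Synthetic data whose noise grows with x. For brevity the model is fitted
# on the reference set itself; a held-out reference set is preferable.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(2000, 1))
y = 2 * X[:, 0] + rng.normal(0, 1, 2000) * (0.2 + 0.3 * X[:, 0])
model = LinearRegression().fit(X, y)

X_new = np.array([[1.0], [9.0]])
_, lo_local, hi_local = knn_bootstrap_interval(model, X, y, X_new, k=100)
_, lo_global, hi_global = knn_bootstrap_interval(model, X, y, X_new, k=len(X))

local_widths = hi_local - lo_local
global_widths = hi_global - lo_global
print("local widths:", local_widths)
print("global widths:", global_widths)
```

In this toy setup the local interval at x = 9 should be much wider than at x = 1, because nearby residuals are larger there. With k equal to the size of the reference set, every point draws from the same pool of residuals, which illustrates the global error interval mentioned above.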

