kennethleungty/Logistic-Regression-Assumptions

Assumptions of Logistic Regression, Clearly Explained

Understanding and implementing the assumption checks behind one of the most important statistical techniques in data science - Logistic Regression

  • Link to TowardsDataScience article: https://towardsdatascience.com/assumptions-of-logistic-regression-clearly-explained-44d85a22b290
  • Logistic regression is a highly effective modeling technique that has remained a mainstay in statistics since its development in the 1940s.
  • Given its popularity and utility, data practitioners should understand the fundamentals of logistic regression before using it to tackle data and business problems.
  • In this project, we explore the key assumptions of logistic regression, pairing a theoretical explanation of each with a practical Python implementation of the corresponding check.

Contents

(1) Logistic_Regression_Assumptions.ipynb

  • The main notebook, containing the Python code (with explanations) for checking each of the 6 key assumptions of logistic regression

(2) Box-Tidwell-Test-in-R.ipynb

  • Notebook containing R code for running the Box-Tidwell test (to check the linearity-of-the-logit assumption)

(3) /data

  • Folder containing the public Titanic dataset (train set)

(4) /references

  • Folder containing several sets of lecture notes explaining advanced regression
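
For readers who want a feel for the Box-Tidwell idea referenced under (2) before opening the notebooks, here is a minimal Python sketch. It is not the notebook's code: it assumes only numpy, uses synthetic data, and fits the logistic model with a small hand-written Newton-Raphson routine. The names `fit_logit` and `box_tidwell_z` are hypothetical helpers introduced for illustration. The check adds an `x * ln(x)` term for a strictly positive continuous predictor; a large Wald z-statistic on that term suggests the logit is not linear in `x`.

```python
import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson (hypothetical helper).

    X must already contain an intercept column.
    Returns the coefficient estimates and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the estimate
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


def box_tidwell_z(x, y):
    """Wald z-statistic for the x*ln(x) term (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("x must be strictly positive to take ln(x)")
    X = np.column_stack([np.ones_like(x), x, x * np.log(x)])
    beta, se = fit_logit(X, np.asarray(y, dtype=float))
    return beta[2] / se[2]


# Synthetic data (an assumption for illustration): the true logit is
# quadratic in x, so the linearity assumption is deliberately violated.
rng = np.random.default_rng(42)
x = rng.uniform(0.5, 4.0, 5000)
true_logit = -2.0 + 0.4 * x**2
y = (rng.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

z = box_tidwell_z(x, y)
print(f"z-statistic for x*ln(x): {z:.2f}")
```

An absolute z-statistic well above roughly 1.96 would flag the predictor for a closer look, for example by trying a transformation. With several continuous predictors, one `x * ln(x)` term per predictor would be added to the same model.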

Special Thanks

  • @dataninj4 for correcting the imports and adding .loc indexing in the diagnosis_df cell so that it runs without errors in Python 3.6/3.8
  • @ArneTR for rightly pointing out that the VIF calculation should include a constant, and that the correlation matrix should exclude the target variable
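
The two corrections credited to @ArneTR can be illustrated with a short sketch. This is not the notebook's implementation: it assumes only numpy and synthetic data, and `vif_with_constant` is a hypothetical helper. Each predictor is regressed on the remaining predictors plus an intercept column, and VIF is computed as 1 / (1 - R²). The target column is dropped before computing either the VIFs or the correlation matrix.

```python
import numpy as np


def vif_with_constant(X):
    """Variance inflation factor per column of X (hypothetical helper).

    X holds predictors only (no target, no constant). An intercept column
    is added inside each auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y_j = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # the constant term
        coef, *_ = np.linalg.lstsq(A, y_j, rcond=None)
        resid = y_j - A @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((y_j - y_j.mean()) ** 2)
        vifs.append(ss_tot / ss_res)  # equals 1 / (1 - R^2)
    return np.array(vifs)


# Synthetic columns (assumptions for illustration): "a" and "b" are
# strongly collinear, "c" is independent, and "target" is the outcome.
rng = np.random.default_rng(0)
a = rng.normal(50, 5, 500)
data = {
    "a": a,
    "b": a + rng.normal(0, 1, 500),
    "c": rng.normal(0, 1, 500),
    "target": rng.integers(0, 2, 500).astype(float),
}

# Exclude the target: multicollinearity concerns predictors only.
feature_names = [name for name in data if name != "target"]
X = np.column_stack([data[name] for name in feature_names])

vifs = vif_with_constant(X)
corr = np.corrcoef(X, rowvar=False)  # predictor-only correlation matrix

for name, v in zip(feature_names, vifs):
    print(f"{name}: VIF = {v:.2f}")
```

Without the intercept, the auxiliary regressions are forced through the origin, which can distort R² and therefore the VIF whenever predictors are not mean-centered. If a library routine is used instead of this sketch, it is worth checking whether it adds the constant itself; statsmodels' `variance_inflation_factor`, for instance, expects the design matrix passed in to already contain it.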

References