# Logistic Regression Explained: A Beginner-Friendly Guide

This notebook walks through the fundamentals of logistic regression using the <a href="https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database">**Pima Indians Diabetes Database**</a> as an example. The goal is to provide a clear and practical introduction to the logistic regression algorithm, with an emphasis on understanding the results—not on performance optimization.

Topics covered include:

1. Logistic regression background  
2. Dataset and model
3. Interpretation of model output  
   - Odds ratio  
   - Predicted probability  
   - Level of impact for each feature

In [1]:
import math
import numpy as np
from IPython.display import Image, display
import pandas as pd
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler


# 1. Logistic Regression: Background

Logistic regression is a classification algorithm used to model the probability that a given input belongs to a specific class.

Given a set of features $(x_1, x_2, \dots, x_n)$, logistic regression assumes a linear relationship between the features and the **log-odds** of the positive class ($y = 1$):

$$
\text{log-odds} = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
$$

Where:
- $p$ is the probability of the input belonging to the positive class.
- $\beta_0, \beta_1, \dots, \beta_n$ are the model coefficients.

To convert the log-odds to a probability, we apply the **sigmoid function**:

$$
\text{sigmoid}(z) = \frac{1}{1 + e^{-z}}
$$

Substituting the log-odds equation into the sigmoid gives the final probability model:

$$
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)}}
$$

This equation is the foundation of logistic regression.
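
As a quick numerical check of the relationship above, the following minimal sketch converts a log-odds value to a probability with the sigmoid and back again. The helper names `sigmoid` and `log_odds`, as well as the coefficients and feature values, are illustrative assumptions and are not taken from the model fitted later in this notebook.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: map a probability to its log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and feature values, for illustration only
beta_0, beta_1, beta_2 = -1.0, 0.8, 0.5
x_1, x_2 = 1.5, 0.4

z = beta_0 + beta_1 * x_1 + beta_2 * x_2   # linear predictor (log-odds)
p = sigmoid(z)                              # predicted probability

print(f"log-odds = {z:.3f}, probability = {p:.3f}")
print(f"log-odds recovered from p = {log_odds(p):.3f}")
```

A log-odds of 0 corresponds to a probability of 0.5; positive log-odds give probabilities above 0.5 and negative log-odds give probabilities below it.
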


# 2. Dataset and model

### Pima Indians Diabetes dataset
This data is publicly available at: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

##### Context (copied from web)
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

##### Acknowledgements (copied from web)
Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261--265). IEEE Computer Society Press.

#### Input Features:
- **Pregnancies** *(Discrete)* – Number of times pregnant
- **Glucose** *(Continuous)* – Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- **BloodPressure** *(Continuous)* – Diastolic blood pressure (mm Hg)
- **SkinThickness** *(Continuous)* – Triceps skin fold thickness (mm)  
- **Insulin** *(Continuous)* – 2-Hour serum insulin (mu U/ml)
- **BMI** *(Continuous)* – Body mass index (weight in kg/(height in m)^2)
- **DiabetesPedigreeFunction** *(Continuous)* – Diabetes pedigree function
- **Age** *(Discrete)* – Age in years  

#### Output Variable:
- **Outcome** *(Binary)* – Indicates diabetes diagnosis (1 = diabetic, 0 = non-diabetic)

In [2]:
df = pd.read_csv('/Users/thung/Downloads/PimasIndians_diabetes.csv')

# Preview the first few rows of the dataset
display(df.head())

# Print dataset dimensions
print(f'The dataset has {df.shape[0]} rows and {df.shape[1]} columns')

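| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |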


The dataset has 768 rows and 9 columns


### 2.1 Feature Engineering

The original dataset contains only continuous and discrete numeric variables. To demonstrate how logistic regression handles binary and categorical features, we engineer the following new variables:

- **BP_hypertension_s1** *(Binary)* – Assigned 1 if diastolic blood pressure ≥ 80 mmHg, and 0 otherwise  
- **Age_bracket** *(Categorical)* – Age grouped into brackets:
    - 1: 21–25  
    - 2: 26–35  
    - 3: 36–45  
    - 4: 46–55  
    - 5: 56–65  
    - 6: 66+

> Note: There is no group for ages under 21 since the dataset's minimum age is 21.

In [3]:
def age_bracket(age):
    """
    Categorizes a given age into predefined age brackets.

    Args:
        age (int): Age in years

    Returns:
        int: Age bracket label
    """
    if age < 26:
        return 1  # Ages 21–25
    elif age < 36:
        return 2  # Ages 26–35
    elif age < 46:
        return 3  # Ages 36–45
    elif age < 56:
        return 4  # Ages 46–55
    elif age < 66:
        return 5  # Ages 56–65
    else:
        return 6  # Ages 66+

In [4]:
df = df.assign(
    BP_hypertension_s1 = lambda x: np.where(x['BloodPressure'] >= 80, 1, 0),
    Age_bracket = lambda x: x['Age'].apply(age_bracket) 
)

#Stage 1 hypertension count
print("Distribution of Stage 1 Hypertension (Binary Feature):")
display(df['BP_hypertension_s1'].value_counts())

#Age bracket count
print("Distribution of Age Brackets (Categorical Feature):")
display(df['Age_bracket'].value_counts())

Distribution of Stage 1 Hypertension (Binary Feature):


BP_hypertension_s1
0    563
1    205
Name: count, dtype: int64

Distribution of Age Brackets (Categorical Feature):


Age_bracket
1    267
2    231
3    152
4     68
5     37
6     13
Name: count, dtype: int64

In [5]:
def scale_single_sample(col_data):
    """
    Standardizes a single column using z-score normalization.

    Args:
        col_data (pd.Series): A column of numerical values

    Returns:
        np.ndarray: Scaled values (mean = 0, std = 1)
    """
    transformed_data = StandardScaler().fit_transform(col_data.values.reshape(-1, 1))
    return transformed_data
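
For intuition, z-score standardization subtracts the column mean and divides by the column standard deviation. `StandardScaler` uses the population standard deviation (dividing by n rather than n - 1). The sketch below reproduces that calculation with the standard library on a handful of example values; `z_scores` and `sample` are names introduced here for illustration and are not part of the notebook's pipeline.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # population std (ddof = 0)
    return [(v - mean) / std for v in values]

sample = [72, 66, 64, 66, 40]   # a few example values, for illustration only
scaled = z_scores(sample)

print([round(z, 3) for z in scaled])
print(f"mean = {statistics.fmean(scaled):.3f}, std = {statistics.pstdev(scaled):.3f}")
```

Because the mean and standard deviation come from the values passed in, scaling a small subset gives different z-scores than scaling the full column.
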

In [6]:
# Apply standard scaling to selected continuous variables
scaled_df = df.copy()

for col in ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction']:
    scaled_df[col] = scale_single_sample(df[col])

# Preview the scaled dataset
scaled_df.head()


| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome | BP_hypertension_s1 | Age_bracket |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 0.848324 | 0.149641 | 0.90727 | -0.692891 | 0.204013 | 0.468492 | 50 | 1 | 0 | 4 |
| 1 | 1 | -1.123396 | -0.160546 | 0.530902 | -0.692891 | -0.684422 | -0.365061 | 31 | 0 | 0 | 2 |
| 2 | 8 | 1.943724 | -0.263941 | -1.288212 | -0.692891 | -1.103255 | 0.604397 | 32 | 1 | 0 | 2 |
| 3 | 1 | -0.998208 | -0.160546 | 0.154533 | 0.123302 | -0.494043 | -0.920763 | 21 | 0 | 0 | 1 |
| 4 | 0 | 0.504055 | -1.504687 | 0.90727 | 0.765836 | 1.409746 | 5.484909 | 33 | 1 | 0 | 2 |


### 2.2 Baseline model
We'll build our baseline logistic regression model using all available features. This model will serve as our foundation for interpreting the relationship between predictor variables and diabetes outcomes.

In [7]:
X = scaled_df.drop(['Outcome'], axis=1)
y = scaled_df['Outcome']

X_const = sm.add_constant(X)
# Fit the model
LR_model = sm.Logit(y, X_const).fit()

# Print summary
print("\nModel Summary:")
print(LR_model.summary())

Optimization terminated successfully.
         Current function value: 0.470619
         Iterations 6

Model Summary:
                           Logit Regression Results                           
Dep. Variable:                Outcome   No. Observations:                  768
Model:                          Logit   Df Residuals:                      757
Method:                           MLE   Df Model:                           10
Date:                Mon, 09 Jun 2025   Pseudo R-squ.:                  0.2724
Time:                        23:57:24   Log-Likelihood:                -361.44
converged:                       True   LL-Null:                       -496.74
Covariance Type:            nonrobust   LLR p-value:                 2.483e-52
                               coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------
const                       -1.9473      0.481     -4.049      0

# 3. Interpreting Logistic Regression Results
Understanding logistic regression coefficients requires careful interpretation, as they represent changes in log-odds rather than direct probability changes. This section explores three key interpretation frameworks: odds ratios, probability impacts, and practical significance levels.

### 3.1 Odds ratio
The <b>odds</b> of an event are the ratio of the probability of success to the probability of no success:
$$
\text{odds} = \frac{P(\text{success})}{P(\text{no success})} = \frac{P(\text{success})}{1 - P(\text{success})}
$$

Since logistic regression assumes a linear relationship between the features and the log-odds of the positive class, exponentiating a coefficient gives an <b>odds ratio</b>: $\text{Odds Ratio}=e^{\beta}$, where $\beta$ is the coefficient of a given feature. The odds ratio is the factor by which the odds of the outcome are multiplied when that feature increases by one unit, which is easier to interpret than a change in log-odds.

### Binary Predictor Variables
For a binary predictor variable, the coefficient represents the difference in the <b>log-odds</b> of the outcome between the category coded 1 and the reference category (0), holding all other variables constant.

##### Example: BP_hypertension_s1
* Coefficient: $0.1870$
* Odds Ratio: $e^{0.1870}=1.206$
* Interpretation: having stage 1 hypertension (a change from 0 to 1) <b>multiplies</b> the odds of having diabetes by 1.206, i.e. about a 21% <b>increase</b> relative to not having stage 1 hypertension, holding all other variables constant.

### Continuous Predictor Variables
For continuous predictors, the coefficient represents the change in log-odds for a one-unit increase in the predictor.

##### Example: BloodPressure (standardized)
* Coefficient: $-0.3040$
* Odds Ratio: $e^{-0.3040}=0.738$
* Interpretation: a one-unit increase in standardized diastolic blood pressure (one standard deviation) <b>decreases</b> the odds of having diabetes by a factor of $0.738$, i.e. about a 26% decrease, holding all other variables constant.
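
The coefficient-to-odds-ratio conversion used in the two examples above can be reproduced in a few lines. This is a minimal sketch: the coefficients are the ones quoted in the examples, while the `coefficients` dictionary and the `odds_ratio` helper are names introduced here for illustration.

```python
import math

# Coefficients quoted in the two examples above
coefficients = {
    "BP_hypertension_s1": 0.1870,   # binary predictor
    "BloodPressure": -0.3040,       # standardized continuous predictor
}

def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return math.exp(beta)

for name, beta in coefficients.items():
    ratio = odds_ratio(beta)
    pct_change = (ratio - 1.0) * 100.0   # percent change in the odds
    print(f"{name}: odds ratio = {ratio:.3f} ({pct_change:+.1f}% change in odds)")
```

An odds ratio above 1 means the odds rise as the predictor increases; an odds ratio below 1 means they fall.
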
 

### 3.2 Probability
While odds ratios describe relative changes in the odds, looking at how coefficients change the predicted probability offers more practical insight into variable importance.

##### Coefficient Magnitude and Context Dependency
The impact of any predictor on probability depends heavily on the baseline probability level. The same coefficient produces different probability changes depending on whether the baseline probability is high or low.


### Demonstrating Probability Impact: Binary Variables


##### BP_hypertension_s1 (example)

The probability function of the logistic regression is:
$$
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n)}}
$$

Let $\alpha$ represent the sum of all other terms (treated as a constant baseline). Setting BP_hypertension_s1 to 1, we obtain:

$$
p = \frac{1}{1 + e^{-(0.1870 \cdot 1 + \alpha)}}
$$

Now, let's consider two cases: (1) the constant $\alpha$ is <b>large</b> and (2) the constant $\alpha$ is <b>small</b>:

Case (1), let $\alpha=3$, then:
$$
p_{baseline} = \frac{1}{1 + e^{-(3)}} = 0.953
$$

$$
\hat{p} = \frac{1}{1 + e^{-(0.1870 \cdot 1 + 3)}} = \frac{1}{1 + e^{-3.1870}} = 0.960, \qquad \mathrm{BP\_hypertension\_s1\ contribution} = \hat{p} - p_{baseline} = 0.960 - 0.953 = 0.007 = 0.7\%
$$

Case (2), let $\alpha=0.5$, then:

$$
p_{baseline} = \frac{1}{1 + e^{-(0.5)}} = 0.622
$$

$$
\hat{p} = \frac{1}{1 + e^{-(0.1870 \cdot 1 + 0.5)}} = \frac{1}{1 + e^{-0.6870}} = 0.665, \qquad \mathrm{BP\_hypertension\_s1\ contribution} = \hat{p} - p_{baseline} = 0.665 - 0.622 = 0.043 = 4.3\%
$$

When the constant is large ($\alpha=3$), having stage 1 hypertension <b>increases</b> the model's predicted probability of having diabetes by <b>0.7%</b>. On the other hand, when the constant is small ($\alpha=0.5$), having stage 1 hypertension <b>increases</b> the model's predicted probability of having diabetes by <b>4.3%</b>.
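
The two cases above can be checked numerically. The sketch below uses the same coefficient (0.1870) and the same two illustrative baselines; `probability_shift` is a hypothetical helper name. Because it works with unrounded probabilities, the last digit can differ slightly from the hand calculation.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def probability_shift(alpha, beta, x=1.0):
    """Change in predicted probability when beta * x is added to baseline log-odds alpha."""
    return sigmoid(alpha + beta * x) - sigmoid(alpha)

beta_bp = 0.1870  # coefficient of BP_hypertension_s1 quoted above

for alpha in (3.0, 0.5):
    shift = probability_shift(alpha, beta_bp)
    print(f"alpha = {alpha}: baseline p = {sigmoid(alpha):.3f}, shift = {shift:+.4f}")
```

The shift is largest when the baseline probability is near 0.5, where the sigmoid is steepest, and shrinks as the baseline approaches 0 or 1.
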

### Demonstrating Probability Impact: Continuous Variables

##### BloodPressure (example)
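Using a standardized blood pressure value of 1.5, which is 1.5 standard deviations above the mean (about 98 mmHg; for reference, the 72 mmHg reading in the first row of the scaled table corresponds to roughly 0.15):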

Now, let's consider two cases: (1) the constant $\alpha$ is <b>large</b> and (2) the constant $\alpha$ is <b>small</b>:

Case (1), let $\alpha=3$, then:
$$
p_{baseline} = \frac{1}{1 + e^{-(3)}} = 0.953
$$
* Taking <b>BloodPressure</b> into account:
$$
\hat{p} = \frac{1}{1 + e^{-(-0.304 \cdot 1.5 + 3)}} = \frac{1}{1 + e^{-2.544}} = 0.927, \qquad \text{BloodPressure contribution} = \hat{p} - p_{baseline} = 0.927 - 0.953 = -0.026 = -2.6\%
$$

Case (2), let $\alpha=0.5$, then:
$$
p_{baseline} = \frac{1}{1 + e^{-(0.5)}} = 0.622
$$
* Taking <b>BloodPressure</b> into account:
$$
\hat{p} = \frac{1}{1 + e^{-(-0.304 \cdot 1.5 + 0.5)}} = \frac{1}{1 + e^{-0.044}} = 0.511, \qquad \text{BloodPressure contribution} = \hat{p} - p_{baseline} = 0.511 - 0.622 = -0.111 = -11.1\%
$$

When the constant is large ($\alpha=3$), a standardized diastolic blood pressure of 1.5 <b>decreases</b> the model's predicted probability of having diabetes by <b>2.6%</b>. On the other hand, when the constant is small ($\alpha=0.5$), the same blood pressure value <b>decreases</b> the model's predicted probability of having diabetes by <b>11.1%</b>.

### 3.3 Evaluation framework to assess predictor variable's impact 
##### <b>Statistical significance</b>
Check whether the coefficient is statistically significant (p-value ≤ 0.05). The p-value is the probability of observing a coefficient estimate at least as extreme as the one obtained, assuming the null hypothesis (that the true coefficient is zero) is true.
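
For reference, the `z` and `P>|z|` columns of the model summary indicate a z-test on each coefficient: z is the coefficient divided by its standard error, and the two-sided p-value is read from the standard normal distribution. The sketch below reproduces this with the standard library, using the `const` row of the summary as input. `two_sided_p_value` is a name introduced here, and small differences from the summary are expected because the inputs are rounded.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

coef, std_err = -1.9473, 0.481   # 'const' row of the model summary (rounded)
z = coef / std_err
p = two_sided_p_value(z)

print(f"z = {z:.3f}, p = {p:.2e}, significant at 0.05: {p <= 0.05}")
```
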
##### <b>Coefficient magnitude comparison</b>
Relative comparison with other model coefficients
##### <b>Variable Scale Considerations</b>
Account for different predictor ranges and transformations
* In our model:
    - BP_hypertension_s1 (binary): Range 0-1
    - BloodPressure (standardized): Range approximately -3.6 to +2.7
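* Because a continuous variable spans a wider range than a binary one, the same coefficient can move the log-odds much further. A coefficient of 0.2 shifts the log-odds by at most 0.2 for a binary variable (range 0 to 1), but by up to about 1.26 across the standardized BloodPressure range (-3.6 to +2.7), so coefficient magnitudes must be read together with each variable's range.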
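
To make the scale consideration concrete, the sketch below computes the largest change in log-odds a predictor can produce, which is the coefficient magnitude times the predictor's range. The coefficient of 0.2 and the ranges are the illustrative values from the list above; `max_log_odds_swing` is a hypothetical helper.

```python
def max_log_odds_swing(coefficient, low, high):
    """Largest change in log-odds as the predictor moves across [low, high]."""
    return abs(coefficient) * (high - low)

coefficient = 0.2  # same illustrative coefficient for both predictors

binary_swing = max_log_odds_swing(coefficient, 0, 1)            # BP_hypertension_s1
continuous_swing = max_log_odds_swing(coefficient, -3.6, 2.7)   # standardized BloodPressure

print(f"binary: {binary_swing:.2f}, continuous: {continuous_swing:.2f}")
```

The same coefficient therefore has a different practical meaning for the two predictors, which is why coefficients are best compared alongside each variable's range.
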

### 4. Quantitative Impact Classification System
This section introduces a method to categorize predictor impact into <b>Small</b>, <b>Moderate</b>, or <b>Large</b>. Using the coefficients from our model summary, the fitted logistic regression equation is:

$$
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \beta_7 x_7 + \beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10})}} =  \frac{1}{1 + e^{-(-1.9473 + 0.1249 x_1 + 1.1197 x_2 - 0.3040 x_3 + 0.0161 x_4 - 0.1347 x_5 + 0.6951 x_6 + 0.3082 x_7 + 0.0207 x_8 + 0.1870 x_9 - 0.0641 x_{10})}}
$$

Where: <br>
$\beta_0$ = Constant,<br>
$\beta_1$ = Pregnancies,<br>
$\beta_2$ = Glucose,<br>
$\beta_3$ = BloodPressure,<br>
$\beta_4$ = SkinThickness,<br>
$\beta_5$ = Insulin,<br>
$\beta_6$ = BMI,<br>
$\beta_7$ = DiabetesPedigreeFunction,<br>
$\beta_8$ = Age,<br>
$\beta_9$ = BP_hypertension_s1,<br>
$\beta_{10}$ = Age_bracket
 

For each predictor variable (e.g. BP_hypertension_s1 ($\beta_9$)):
1. For every observation, compute the linear combination of all other predictors multiplied by their coefficient magnitudes, plus the magnitude of the intercept term
2. Calculate the 50th, 70th, and 85th percentiles of the values from step 1; the 70th and 85th percentiles serve as the threshold values
3. Multiply the target predictor (BP_hypertension_s1) by its associated coefficient magnitude and add the result to the 50th percentile value computed in (2). This gives us the benchmark value for BP_hypertension_s1
4. Compare the benchmark value with the threshold values:
    - Small: benchmark value < 70th percentile
    - Moderate: 70th percentile ≤ benchmark value < 85th percentile
    - Large: benchmark value ≥ 85th percentile

Note: The threshold values are adjustable.

In [8]:
def interpret_logistic_regression(model, outcomes_df):
    """
    Interpret the output of a logistic regression model by calculating statistical significance,
    odds ratios, and practical impact scale for each predictor variable.
    
    This function provides a comprehensive interpretation of logistic regression results,
    including statistical significance testing, odds ratio calculations, and a novel
    impact scale classification based on percentile analysis.
    
    Parameters
    ----------
    model : statsmodels logistic regression model
        The fitted logistic regression model using statsmodels package
    outcomes_df : pandas.DataFrame
        The dataset used to fit the logistic regression model. Must contain all 
        predictor variables used in the model.
    
    Returns
    -------
    None
        The function prints a report and displays the table of significant
        predictors instead of returning it. The displayed table has the columns:
        - Coefficient: coefficient value (rounded to 3 decimals)
        - OddRatio: Odds ratio (rounded to 2 decimals)
        - Pval: Formatted p-value ('<0.001', '<0.01', '<0.05', or actual value)
        - scale_of_impact: Impact classification ('Large', 'Moderate', 'Small', 'Minimal', or 'N/A')
    
    Notes
    -----
    Impact Scale Calculation:
    For each predictor variable, the function calculates a "baseline impact" by:
    1. Computing the linear combination of all other predictors multiplied by their coefficient magnitudes, plus the magnitude of the intercept term
    2. Calculating the 50th, 70th, and 85th percentiles of the values from step 1
    3. Adding the target predictor's coefficient term to the 50th percentile value computed in (2), which gives the benchmark value for that predictor
    4. Comparing the benchmark value with the threshold values:
    - Small: benchmark value < 70th percentile
    - Moderate: 70th percentile <= benchmark value < 85th percentile
    - Large: benchmark value ≥ 85th percentile
        
    The function only returns statistically significant predictors (p≤0.05) in the
    main results table, but reports non-significant variables separately.
    """
    
    # Extract p-values and determine statistical significance
    pval_df = model.pvalues.to_frame()
    pval_df.columns = ['Pval']
    pval_df['is_significant'] = pval_df['Pval'].apply(lambda x: 1 if x <= 0.05 else 0)
    
    # Extract coefficients and calculate odds ratios
    coef_df = model.params.to_frame()
    coef_df.columns = ['Coefficient']
    coef_df['OddRatio'] = coef_df['Coefficient'].apply(lambda x: math.exp(x))
    
    # Merge coefficient and p-value data
    results_df = coef_df.merge(pval_df, left_index=True, right_index=True, how='inner')
    
    # Format p-values according to reporting conventions
    def format_pvalue(pval):
        """Format p-values according to statistical reporting conventions."""
        if pval <= 0.001:
            return '<0.001'
        elif pval <= 0.01:
            return '<0.01'
        elif pval <= 0.05:
            return '<0.05'
        else:
            return f'{pval:.3f}'
    
    results_df['Pval'] = results_df['Pval'].apply(format_pvalue)
    results_df['Coefficient'] = np.round(results_df['Coefficient'], 3)
    results_df['OddRatio'] = np.round(results_df['OddRatio'], 2)
    
    # Calculate scale of impact for each predictor variable
    impact_classifications = []
    
    for predictor_var in results_df.index:
        if predictor_var == 'const':
            # Constant term doesn't have a meaningful impact scale
            impact_classifications.append('N/A')
            continue
        
        # Get all other predictor variables (excluding constant and current predictor)
        # Exact name comparison avoids substring matches (e.g. 'Age' inside 'Age_bracket')
        other_predictors = [var for var in results_df.index
                            if var != 'const' and var != predictor_var]
        
        # Calculate baseline impact from other variables
        baseline_impact = pd.Series(0.0, index=outcomes_df.index)  # align with the data's own index
        
        for other_var in other_predictors:
            if other_var in outcomes_df.columns:
                data_values = outcomes_df[other_var]
                coefficient_magnitude = np.abs(results_df.loc[other_var, "Coefficient"])
                baseline_impact += data_values * coefficient_magnitude
        
        # Add constant term to baseline
        if 'const' in results_df.index:
            baseline_impact += np.abs(results_df.loc['const', "Coefficient"])
        
        # Calculate percentiles for impact classification
        percentile_50 = np.percentile(baseline_impact, 50)
        percentile_70 = np.percentile(baseline_impact, 70)
        percentile_85 = np.percentile(baseline_impact, 85)
        
        # Benchmark value: 50th-percentile baseline plus the current predictor's term.
        # Note: both factors below are the coefficient magnitude, so the added term is |coefficient|**2.
        current_predictor_magnitude = np.abs(results_df.loc[predictor_var, 'Coefficient'])
        baseline_with_predictor = np.abs(results_df.loc[predictor_var, "Coefficient"]) * current_predictor_magnitude + percentile_50
        
        # Classify impact scale
        if baseline_with_predictor >= percentile_85:
            impact_classifications.append('Large')
        elif baseline_with_predictor >= percentile_70:
            impact_classifications.append('Moderate')
        elif baseline_with_predictor >= percentile_50:
            impact_classifications.append('Small')
        else:
            impact_classifications.append('Minimal')
    
    results_df['scale_of_impact'] = impact_classifications
    
    # Separate significant and non-significant predictors
    non_significant_predictors = results_df[results_df['is_significant'] == 0].index.tolist()
    significant_results = results_df[results_df['is_significant'] == 1].copy()
    
    # Sort by significance (already filtered) and then by impact
    impact_order = {'Large': 3, 'Moderate': 2, 'Small': 1, 'N/A': 0, 'Minimal': 0}
    if not significant_results.empty:
        significant_results['impact_rank'] = significant_results['scale_of_impact'].map(impact_order)
        significant_results = significant_results.sort_values('impact_rank', ascending=False)
        significant_results = significant_results.drop('impact_rank', axis=1)
    
    # Generate report
    print("\n" + "="*80)
    print("LOGISTIC REGRESSION MODEL INTERPRETATION")
    print("="*80)
    print(f"Total participants in the model: {len(outcomes_df):,}")
    print(f"Number of significant predictors: {len(significant_results)}")
    print(f"Number of non-significant predictors: {len(non_significant_predictors)}")
    print()
    
    if not significant_results.empty:
        print("SIGNIFICANT PREDICTORS:")
        print("-" * 40)
        display(significant_results.drop('is_significant', axis=1))
    else:
        print("No statistically significant predictors found (p ≤ 0.05)")
    
    print()
    if non_significant_predictors:
        print("NON-SIGNIFICANT PREDICTORS:")
        print("-" * 30)
        print(", ".join(non_significant_predictors))
    
    print()
    print("IMPACT SCALE LEGEND:")
    print("Large: ≥85th percentile impact")
    print("Moderate: 70th-85th percentile impact") 
    print("Small: 50th-70th percentile impact")
    print("="*80)

### 5. Result Interpretation
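Plasma glucose concentration emerges as the most practically significant predictor: it is the only significant predictor classified as Moderate, while the others are classified as Small.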

In [9]:
interpret_logistic_regression(model=LR_model, outcomes_df=scaled_df)


LOGISTIC REGRESSION MODEL INTERPRETATION
Total participants in the model: 768
Number of significant predictors: 6
Number of non-significant predictors: 5

SIGNIFICANT PREDICTORS:
----------------------------------------


| | Coefficient | OddRatio | Pval | scale_of_impact |
|---|---|---|---|---|
| Glucose | 1.12 | 3.06 | <0.001 | Moderate |
| Pregnancies | 0.125 | 1.13 | <0.001 | Small |
| BloodPressure | -0.304 | 0.74 | <0.05 | Small |
| BMI | 0.695 | 2.0 | <0.001 | Small |
| DiabetesPedigreeFunction | 0.308 | 1.36 | <0.01 | Small |
| const | -1.947 | 0.14 | <0.001 | |



NON-SIGNIFICANT PREDICTORS:
------------------------------
SkinThickness, Insulin, Age, BP_hypertension_s1, Age_bracket

IMPACT SCALE LEGEND:
Large: ≥85th percentile impact
Moderate: 70th-85th percentile impact
Small: 50th-70th percentile impact


### 6. Future Directions
Potential enhancements for this analytical framework include:
    
- Cross-validation of impact classifications
- Extension to other generalized linear models
- Incorporation of interaction effects in impact assessment

This comprehensive interpretation framework provides both statistical rigor and practical insights, making logistic regression results more accessible and actionable for decision-making purposes.