# Feature Importance Analysis (Logistic Regression)

Review of coefficient-driven feature importance for the baseline logistic regression model.

**Interpretation**
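- Positive coefficients increase predicted default risk (higher log-odds of default)
- Negative coefficients decrease predicted default risk (lower log-odds of default)
- `exp(coefficient)` is the odds ratio: the factor by which the odds of default change for a one-unit increase in the feature (for a 0/1 flag, switching it on), holding the other features fixed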
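
As a minimal sketch of how a coefficient translates into risk, the snippet below switches a single 0/1 flag on while holding the rest of the linear predictor fixed. `flag_effect` is a hypothetical helper, and the baseline log-odds of -2.0 is an assumed value chosen only for illustration. The odds ratio always equals `exp(coefficient)`, while the change in probability depends on the baseline.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def flag_effect(baseline_log_odds: float, coef: float) -> dict:
    """Predicted default probability with a 0/1 flag off and on.

    baseline_log_odds is a hypothetical stand-in for the intercept plus
    every other term of the linear predictor, held fixed.
    """
    p_off = sigmoid(baseline_log_odds)
    p_on = sigmoid(baseline_log_odds + coef)
    odds_off = p_off / (1.0 - p_off)
    odds_on = p_on / (1.0 - p_on)
    # The ratio of the two odds equals exp(coef), whatever the baseline is
    return {"p_off": p_off, "p_on": p_on, "odds_ratio": odds_on / odds_off}


# Coefficient of lump_sum_payment_lpsm from the results; the baseline is assumed
effect = flag_effect(-2.0, 2.61965)
print(effect)  # probability rises from about 0.12 to about 0.65; odds ratio about 13.73
```

Calling `flag_effect(1.0, 2.61965)` gives the same odds ratio but a smaller jump in probability, because the baseline risk is already high.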

In [1]:

from pathlib import Path
from IPython.display import display
import joblib
import numpy as np
import pandas as pd

PROJECT_ROOT = Path('..').resolve()
MODEL_PATH = PROJECT_ROOT / 'models/logistic_model.pkl'
FEATURE_NAMES_PATH = PROJECT_ROOT / 'models/feature_names.txt'
TOP_FEATURES_PATH = PROJECT_ROOT / 'results/top_features.csv'

# One feature name per line, in the column order used to train the model
feature_names = [line.strip() for line in FEATURE_NAMES_PATH.read_text().splitlines() if line.strip()]
model = joblib.load(MODEL_PATH)  # only unpickle models from a trusted source
coefficients = model.coef_.ravel()  # binary model: coef_ has shape (1, n_features)

importance_df = pd.DataFrame({'feature': feature_names, 'coefficient': coefficients})
importance_df['abs_coefficient'] = importance_df['coefficient'].abs()
importance_df['effect'] = np.where(importance_df['coefficient'] >= 0, 'increases_risk', 'decreases_risk')
# Odds ratio = exp(coefficient); clipping to [-50, 50] keeps exp() finite,
# so odds ratios for |coefficient| > 50 would be capped
importance_df['odds_ratio'] = np.exp(np.clip(importance_df['coefficient'], -50, 50))
importance_df = importance_df.sort_values('abs_coefficient', ascending=False)
importance_df.head(10)


| | feature | coefficient | abs_coefficient | effect | odds_ratio |
|---|---|---|---|---|---|
| 48 | credit_type_EQUI | 37.671285 | 37.671285 | increases_risk | 2.293143e+16 |
| 49 | credit_type_EXP | -11.701564 | 11.701564 | decreases_risk | 8.280858e-06 |
| 46 | credit_type_CIB | -11.653939 | 11.653939 | decreases_risk | 8.684772e-06 |
| 47 | credit_type_CRIF | -11.61487 | 11.61487 | decreases_risk | 9.030799e-06 |
| 35 | construction_type_mh | 5.310043 | 5.310043 | increases_risk | 202.3589 |
| 65 | security_type_Indriect | 5.310043 | 5.310043 | increases_risk | 202.3589 |
| 41 | secured_by_land | 5.310043 | 5.310043 | increases_risk | 202.3589 |
| 33 | lump_sum_payment_lpsm | 2.61965 | 2.61965 | increases_risk | 13.73092 |
| 40 | secured_by_home | -2.609131 | 2.609131 | decreases_risk | 0.07359848 |
| 36 | construction_type_sb | -2.609131 | 2.609131 | decreases_risk | 0.07359848 |
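
Several rows in this output share exactly the same coefficient (5.310043 three times, -2.609131 twice). Equal coefficients prove nothing on their own, but if the model was fit with an L2 penalty, exactly duplicated columns receive equal weights, so ties are worth checking against the training matrix. A minimal sketch for listing them; `tied_coefficient_groups` is a hypothetical helper, and the example values are copied from the output above.

```python
import pandas as pd


def tied_coefficient_groups(importance: pd.DataFrame, decimals: int = 6) -> pd.DataFrame:
    """Return groups of two or more features whose coefficients agree to `decimals` places."""
    rounded = importance["coefficient"].round(decimals)
    groups = (
        importance.assign(rounded=rounded)
        .groupby("rounded")["feature"]
        .agg(list)  # one list of feature names per distinct rounded coefficient
    )
    groups = groups[groups.map(len) > 1]  # keep only genuine ties
    return groups.rename("features").reset_index().rename(columns={"rounded": "coefficient"})


# Coefficients copied from the output above
example = pd.DataFrame({
    "feature": ["construction_type_mh", "security_type_Indriect", "secured_by_land",
                "lump_sum_payment_lpsm", "secured_by_home", "construction_type_sb"],
    "coefficient": [5.310043, 5.310043, 5.310043, 2.61965, -2.609131, -2.609131],
})
print(tied_coefficient_groups(example))
```

In the notebook itself the call would be `tied_coefficient_groups(importance_df)`.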


In [2]:

top_features = pd.read_csv(TOP_FEATURES_PATH)
top_features


| | feature | coefficient | abs_coefficient | effect | odds_ratio |
|---|---|---|---|---|---|
| 0 | credit_type_EQUI | 37.671285 | 37.671285 | increases_risk | 2.293143e+16 |
| 1 | credit_type_EXP | -11.701564 | 11.701564 | decreases_risk | 8.280858e-06 |
| 2 | credit_type_CIB | -11.653939 | 11.653939 | decreases_risk | 8.684772e-06 |
| 3 | credit_type_CRIF | -11.61487 | 11.61487 | decreases_risk | 9.030799e-06 |
| 4 | construction_type_mh | 5.310043 | 5.310043 | increases_risk | 202.3589 |
| 5 | security_type_Indriect | 5.310043 | 5.310043 | increases_risk | 202.3589 |
| 6 | secured_by_land | 5.310043 | 5.310043 | increases_risk | 202.3589 |
| 7 | lump_sum_payment_lpsm | 2.61965 | 2.61965 | increases_risk | 13.73092 |
| 8 | secured_by_home | -2.609131 | 2.609131 | decreases_risk | 0.07359848 |
| 9 | construction_type_sb | -2.609131 | 2.609131 | decreases_risk | 0.07359848 |
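
The saved `top_features.csv` is expected to match the top rows recomputed in the first cell. A hedged sketch of a consistency check; `check_saved_ranking` is a hypothetical helper, and it assumes the CSV was written from the same sorted frame.

```python
import pandas as pd


def check_saved_ranking(saved: pd.DataFrame, recomputed: pd.DataFrame) -> None:
    """Raise AssertionError if a saved top-N table disagrees with a fresh ranking.

    Rows are compared positionally (indexes are discarded) and only the columns
    of `recomputed` are checked, so a stray 'Unnamed: 0' index column written by
    DataFrame.to_csv is ignored.
    """
    expected = recomputed.head(len(saved)).reset_index(drop=True)
    actual = saved.reset_index(drop=True)[list(expected.columns)]
    # Relative tolerance absorbs float round-off from the CSV round trip
    pd.testing.assert_frame_equal(actual, expected, check_exact=False, rtol=1e-6)


# Usage in the notebook: check_saved_ranking(top_features, importance_df)
```

If tied coefficients were ordered differently when the CSV was written, this check fails even though the values agree; sorting both frames with the same secondary key first avoids that.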


In [3]:

# Ties (e.g. several features at -2.609131) are not returned in a guaranteed order
top_positive = importance_df.sort_values('coefficient', ascending=False).head(5)
top_negative = importance_df.sort_values('coefficient').head(5)

display(top_positive[['feature', 'coefficient', 'odds_ratio']])
display(top_negative[['feature', 'coefficient', 'odds_ratio']])


| | feature | coefficient | odds_ratio |
|---|---|---|---|
| 48 | credit_type_EQUI | 37.671285 | 2.293143e+16 |
| 35 | construction_type_mh | 5.310043 | 202.3589 |
| 65 | security_type_Indriect | 5.310043 | 202.3589 |
| 41 | secured_by_land | 5.310043 | 202.3589 |
| 33 | lump_sum_payment_lpsm | 2.61965 | 13.73092 |


| | feature | coefficient | odds_ratio |
|---|---|---|---|
| 49 | credit_type_EXP | -11.701564 | 8e-06 |
| 46 | credit_type_CIB | -11.653939 | 9e-06 |
| 47 | credit_type_CRIF | -11.61487 | 9e-06 |
| 66 | security_type_direct | -2.609131 | 0.073598 |
| 36 | construction_type_sb | -2.609131 | 0.073598 |
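
The two listings disagree on which tied features appear: the ranking by `abs_coefficient` includes `secured_by_home`, while `top_negative` includes `security_type_direct`, all at -2.609131. A single-key sort does not guarantee an order among ties. A minimal sketch of a deterministic alternative, using a hypothetical `top_k` helper that breaks ties by feature name:

```python
import pandas as pd


def top_k(importance: pd.DataFrame, k: int, largest: bool = True) -> pd.DataFrame:
    """Top-k rows by coefficient, breaking ties alphabetically by feature name."""
    return importance.sort_values(
        ["coefficient", "feature"],
        ascending=[not largest, True],  # feature names are always ascending
    ).head(k)


# Usage with the notebook's frame:
# top_positive = top_k(importance_df, 5, largest=True)
# top_negative = top_k(importance_df, 5, largest=False)
```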


**Findings**
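- `credit_type_EQUI` carries by far the largest coefficient (~37.7, odds ratio ~2.3e16). An odds multiplier of that size is not a plausible real-world effect; it most likely signals (quasi-)complete separation, i.e. the EQUI flag almost perfectly identifies defaulters in the training data, so the fitted coefficient keeps growing until the regularization penalty (or the solver's iteration limit) stops it. Read it as "EQUI is a near-deterministic marker of default in this data" rather than as a literal odds multiplier.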
- The other credit bureau levels (`credit_type_EXP`, `credit_type_CIB`, `credit_type_CRIF`) have large, nearly identical negative coefficients (about -11.6 to -11.7), so the model treats them as roughly interchangeable and assigns them far lower risk than EQUI.
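- Collateral- and structure-related flags (`construction_type_mh`, `security_type_Indriect`, `secured_by_land`) raise risk (coefficient 5.31, odds ratio about 202), while `secured_by_home`, `security_type_direct`, and `construction_type_sb` lower it (coefficient -2.61, odds ratio about 0.074). Within each trio the coefficients are identical to six decimal places, which suggests the columns flag the same loans (duplicated or mirror-image indicators); the model then spreads one effect across them, so each trio is best read as a single signal rather than three independent effects.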
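- `lump_sum_payment_lpsm` raises risk (coefficient 2.62, odds ratio about 13.7): loans with a lump-sum payment flag are associated with higher predicted default odds, other features held fixed. This is an association in the training data, not evidence that choosing a lump-sum payment causes default.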
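- Large magnitudes reflect strong separation between categories, and coefficients of one-hot encoded columns are only meaningful relative to the other levels of the same group (or its reference level, if one was dropped during encoding). For example, `credit_type_EQUI` sits roughly 49.3 log-odds units above `credit_type_EXP`, `credit_type_CIB`, and `credit_type_CRIF`. Ranking by `abs_coefficient` also assumes features are on comparable scales, which is true of 0/1 flags but holds for numeric features only if they were standardized before fitting.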
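
To test the separation explanation directly, one option is to check, for each 0/1 indicator, whether the outcome rate among flagged rows is exactly 0 or 1. The sketch below assumes a hypothetical training matrix `X_train` (a DataFrame with the model's columns) and a 0/1 target `y_train` in the same row order; neither is loaded in this notebook, and `separation_report` is a hypothetical helper. It only inspects single flags, so separation driven by combinations of features would not show up.

```python
import numpy as np
import pandas as pd


def separation_report(X: pd.DataFrame, y: pd.Series, min_count: int = 1) -> pd.DataFrame:
    """Outcome rate among rows where each 0/1 indicator column equals 1.

    A rate of exactly 0.0 or 1.0 means the flag alone perfectly predicts the
    outcome for those rows (quasi-complete separation), which lets a logistic
    coefficient grow until only the penalty holds it back.
    Assumes X and y are in the same row order.
    """
    target = np.asarray(y)
    rows = []
    for col in X.columns:
        values = X[col]
        if not values.isin([0, 1]).all():
            continue  # skip numeric, non-indicator columns
        mask = (values == 1).to_numpy()
        n_flagged = int(mask.sum())
        if n_flagged < min_count:
            continue
        rate = float(target[mask].mean())
        rows.append({
            "feature": col,
            "n_flagged": n_flagged,
            "positive_rate": rate,
            "separated": rate in (0.0, 1.0),
        })
    return pd.DataFrame(rows, columns=["feature", "n_flagged", "positive_rate", "separated"])


# Usage (hypothetical names): separation_report(X_train, y_train).query("separated")
```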