# Model Post-Fitting: Detecting Collinearity

### by

# Jeff Gross

based on SAS e-learning

In [2]:
libname statdata "/folders/myfolders/ECST131"; 
libname library "/folders/myfolders/ECST131";

<img src="files/collin_1.png">

<img src="files/collin_2.png">

<img src="files/collin_3.png">

<img src="files/collin_4.png">

<img src="files/collin_5.png">

### VIF Cutoff: >10
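As a hedged illustration of how variance inflation factors can be requested, the sketch below runs PROC REG with the VIF option on the two numeric predictors in statdata.safety. Running PROC REG against the binary response and treating Size as a numeric score are assumptions made only for this illustration. VIFs depend on the predictors alone, so the response acts as a placeholder.

```sas
/* VIF_j = 1 / (1 - R_j**2), where R_j**2 is the R-square from regressing
   predictor j on the remaining predictors. A VIF above 10 therefore
   corresponds to R_j**2 above 0.90. */
proc reg data=statdata.safety;
   /* Assumption: Weight and Size are stored as numeric variables.
      Region is character and would need a dummy variable to be included. */
   model Unsafe = Weight Size / vif;
   title 'Collinearity Check: Variance Inflation Factors';
run;
quit;

title;
```

The VIF column appears in the Parameter Estimates table of the PROC REG output, where each value can be compared against the cutoff.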

<img src="files/modeling.png">

### Task: Fit a multiple logistic regression model with Unsafe as the outcome variable and Weight, Size, and Region as the predictor variables. Use backward elimination to choose the final model, save it with the STORE statement, and then generate predicted probabilities for the cars in the following DATA step.

### Results: Backward elimination removes Region and then Weight, leaving a Size-only model. Its AIC (92.629) and SC (100.322) are both smaller than those of the Region-only model fit previously, which indicates that the Size-only model fits better. The c statistic also improves, from 0.598 for the Region-only model to 0.818 for this model.

In [4]:
ods graphics on;

proc logistic data=statdata.safety plots(only)=(effect oddsratio);
   class Region (param=ref ref='Asia')
         Size (param=ref ref='Small');
   model Unsafe(event='1')=Weight Region Size / clodds=pl selection=backward;
   units weight=-1;
   store isSafe;
   format Size sizefmt.;
   title 'LOGISTIC MODEL: Backwards Elimination';
run;

title;

data checkSafety;
   length Region $9.;
   input Weight Size Region $ 5-13;
   datalines;
4 1 N America
3 1 Asia     
5 3 Asia     
5 2 N America
;
run;

proc plm restore=isSafe;
   score data=checkSafety out=scored_cars / ilink;
   title 'Safety Predictions using PROC PLM';
run;

proc print data=scored_cars;
run;

title;

Model Information

|  |  |
|---|---|
| Data Set | STATDATA.SAFETY |
| Response Variable | Unsafe |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

|  |  |
|---|---|
| Number of Observations Read | 96 |
| Number of Observations Used | 96 |

Response Profile

| Ordered Value | Unsafe | Total Frequency |
|---|---|---|
| 1 | 0 | 66 |
| 2 | 1 | 30 |

Class Level Information

| Class | Value | Design Variable 1 | Design Variable 2 |
|---|---|---|---|
| Region | Asia | 0 |  |
|  | N America | 1 |  |
| Size | Large | 1 | 0 |
|  | Medium | 0 | 1 |
|  | Small | 0 | 0 |

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics (Step 0: Weight, Region, and Size in the model)

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 121.249 | 94.004 |
| SC | 123.813 | 106.826 |
| -2 Log L | 119.249 | 84.004 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 35.2441 | 4 | <.0001 |
| Score | 32.8219 | 4 | <.0001 |
| Wald | 23.9864 | 4 | <.0001 |

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics (Step 1: Region removed)

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 121.249 | 92.455 |
| SC | 123.813 | 102.712 |
| -2 Log L | 119.249 | 84.455 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 34.7937 | 3 | <.0001 |
| Score | 32.4658 | 3 | <.0001 |
| Wald | 23.9471 | 3 | <.0001 |

Residual Chi-Square Test

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 0.4526 | 1 | 0.5011 |

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics (Step 2: Weight removed, Size-only model)

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 121.249 | 92.629 |
| SC | 123.813 | 100.322 |
| -2 Log L | 119.249 | 86.629 |
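The AIC and SC values for the final model can be checked by hand from -2 Log L. The DATA step below is a minimal sketch of that arithmetic, assuming the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k log(n), with k = 3 estimated parameters (the intercept plus two Size design variables) and n = 96 observations. The variable names are illustrative only.

```sas
/* Recompute AIC and SC for the Size-only model from -2 Log L. */
data _null_;
   neg2LogL = 86.629;            /* -2 Log L, intercept and covariates */
   k = 3;                        /* intercept + 2 Size design variables */
   n = 96;                       /* number of observations used */
   aic = neg2LogL + 2*k;
   sc  = neg2LogL + k*log(n);    /* log() is the natural logarithm in SAS */
   put aic= 8.3 sc= 8.3;         /* written to the SAS log */
run;
```

The values written to the log should match the AIC and SC reported for the intercept-and-covariates model. SC penalizes each parameter by log(96), roughly 4.56, rather than by 2, which is why it favors smaller models more strongly than AIC does.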

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 32.6199 | 2 | <.0001 |
| Score | 31.3081 | 2 | <.0001 |
| Wald | 24.2875 | 2 | <.0001 |

Residual Chi-Square Test

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 2.5983 | 2 | 0.2728 |

Summary of Backward Elimination

| Step | Effect Removed | DF | Number In | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| 1 | Region | 1 | 2 | 0.4506 | 0.502 |
| 2 | Weight | 1 | 1 | 2.1565 | 0.142 |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Size | 2 | 24.2875 | <.0001 |

Analysis of Maximum Likelihood Estimates

| Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept |  | 1 | 0.6506 | 0.3561 | 3.3377 | 0.0677 |
| Size | Large | 1 | -3.3585 | 0.8125 | 17.088 | <.0001 |
| Size | Medium | 1 | -2.2192 | 0.607 | 13.3654 | 0.0003 |

Association of Predicted Probabilities and Observed Responses

|  |  |  |  |
|---|---|---|---|
| Percent Concordant | 70.3 | Somers' D | 0.636 |
| Percent Discordant | 6.7 | Gamma | 0.827 |
| Percent Tied | 23.0 | Tau-a | 0.276 |
| Pairs | 1980 | c | 0.818 |
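The association statistics can be traced back to the concordance percentages. The sketch below assumes the standard definitions: the number of pairs is the product of the two response frequencies, c is the concordant share plus half the tied share, and Somers' D is the concordant share minus the discordant share. The variable names are illustrative only.

```sas
/* Rebuild Pairs, c, and Somers' D from the reported percentages. */
data _null_;
   nUnsafe = 30;                 /* frequency of Unsafe=1 */
   nSafe   = 66;                 /* frequency of Unsafe=0 */
   pairs   = nUnsafe*nSafe;      /* every event paired with every nonevent */
   pctConc = 70.3;
   pctDisc = 6.7;
   pctTied = 23.0;
   c       = (pctConc + 0.5*pctTied)/100;
   somersD = (pctConc - pctDisc)/100;
   put pairs= c= 6.3 somersD= 6.3;   /* written to the SAS log */
run;
```

The large share of tied pairs is expected here because the final model has a single three-level predictor, so only three distinct predicted probabilities are possible.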

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

| Effect | Unit | Estimate | 95% Lower Confidence Limit | 95% Upper Confidence Limit |
|---|---|---|---|---|
| Size Large vs Small | 1.0 | 0.035 | 0.005 | 0.141 |
| Size Medium vs Small | 1.0 | 0.109 | 0.03 | 0.336 |
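With reference-cell coding and Small as the reference level, each odds ratio is the exponentiated Size coefficient from the Analysis of Maximum Likelihood Estimates table. The short DATA step below sketches that calculation. The variable names are illustrative only.

```sas
/* Odds ratio = exp(coefficient) under reference-cell coding. */
data _null_;
   orLarge  = exp(-3.3585);      /* Large vs Small */
   orMedium = exp(-2.2192);      /* Medium vs Small */
   put orLarge= 6.3 orMedium= 6.3;   /* written to the SAS log */
run;
```

Both values are well below 1, so the odds of a below-average safety rating are much lower for large and medium cars than for small cars.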

Store Information

|  |  |
|---|---|
| Item Store | WORK.ISSAFE |
| Data Set Created From | STATDATA.SAFETY |
| Created By | PROC LOGISTIC |
| Date Created | 05DEC17:03:50:25 |
| Response Variable | Unsafe |
| Link Function | Logit |
| Distribution | Binary |
| Class Variables | Region Size Unsafe |
| Model Effects | Intercept Size |

| Obs | Region | Weight | Size | Predicted |
|---|---|---|---|---|
| 1 | N America | 4 | Small | 0.65714 |
| 2 | Asia | 3 | Small | 0.65714 |
| 3 | Asia | 5 | Large | 0.06251 |
| 4 | N America | 5 | Medium | 0.17241 |
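The ILINK option asks PROC PLM to apply the inverse link function, so the Predicted column is a probability rather than a logit. Because the final model contains only Size, the predictions depend on Size alone, which is why the two small cars share the same value. The sketch below reproduces them by hand from the reported estimates. The data set name manual_check and its variable names are hypothetical.

```sas
/* Inverse logit: p = 1 / (1 + exp(-(intercept + beta))). */
data manual_check;
   input Size $ beta;            /* beta = 0 for the reference level, Small */
   intercept = 0.6506;
   logit     = intercept + beta;
   predicted = 1/(1 + exp(-logit));
   datalines;
Small 0
Large -3.3585
Medium -2.2192
;
run;

proc print data=manual_check;
run;
```

Because the estimates are rounded to four decimal places, the hand-computed probabilities should agree with the PROC PLM output only to about four decimals.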
