# Notes on the SAS Course
# Predictive Modeling Using Logistic Regression (15.1)

This course covers predictive modeling using SAS/STAT software with emphasis on the LOGISTIC procedure. It also discusses selecting variables and interactions, recoding categorical variables based on the smooth weight of evidence, assessing models, treating missing values, and using efficiency techniques for massive data sets. These notes are based on the course materials; some code and images are copyrighted by SAS Institute. I created this Jupyter Notebook using JupyterLab with SAS University Edition.

```sas
/* Run this script to configure the session */

%let InicioCurso=/folders/myfolders/Cursos/EPMLR51;
%include "&InicioCurso/setup.sas";

/*
Practices:
Note: If you started a new SAS session after you performed the previous practice, do the following before you continue:

Make sure you have set up your practice files in the Course Overview.
Open l4_all.sas. It contains the solution code for all practices in Lessons 1, 2, 3, and 4. Locate the code for the previous practice(s),
review the comments to see if any modifications are needed, and then submit the code.
*/
```

## Lesson 4: Measuring Model Performance
You know how to prepare your input variables by addressing common problems and how to then use the prepared data to develop a family of increasingly complex models. Now you need a way to measure how well your models generalize to new data, and then you need to choose the best model.
In this lesson, you learn how to do an honest assessment of your model and use a variety of metrics to assess model performance and select the best model. In addition to the most common metrics, you learn about profit-based metrics, the Kolmogorov-Smirnov statistic, and model selection plots.

### 4.1 Honest Assessment of the Model

After you prepare input variables and fit a model, you need to ask yourself, "Does my model generalize well?" You need to do an honest assessment of how well your model performs on a different sample of data than you used to develop the model.

In this section, you learn to:

* explain the benefit of comparing training and validation fit statistics against model complexity
* prepare the input variables in the validation data set

#### Fit versus Complexity
After you create a family of increasingly complex models, you need to compare and evaluate them on the training and validation data. You can use a variety of metrics to measure model performance. The course illustrates this with a graph of model fit versus complexity on both the training and validation data sets, with fit on the Y axis and complexity on the X axis.

First, consider the line for a model of increasing complexity that is fit to the training data set. The fit statistics tend to increase as complexity increases. Some of this increase happens because the model captures relevant trends in the data, but some of it is due to overfitting: the model is identifying peculiarities of the training data set. To see the point at which overfitting begins, compare the training fit line with the validation fit line. As you would expect, the validation fit statistics tend to be lower than the training fit statistics. Initially, the validation fit line rises with complexity, because more complex models detect more usable patterns in the data. Then the line tends to plateau, indicating that additional complexity no longer improves fit. When the model becomes very complex, the line starts to decrease, and this decrease in fit is due to overfitting.

Notice that the most complex model has the greatest difference between the training fit line and the validation fit line. This difference, known as shrinkage, is another statistic that some modelers use when measuring a model's overall predictive power. So, when comparing models, you might use a rule that says **"Choose the simplest model that has the highest validation fit measure, with no more than 10% shrinkage from the training to the validation results."**
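The selection rule can be sketched in a few lines. Python is used here only as a neutral illustration; the course itself works in SAS. This minimal sketch assumes a fit statistic where higher is better (for example, the c statistic) and measures shrinkage as the *relative* drop from training to validation fit. The model names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str          # hypothetical model label
    complexity: int    # e.g., number of effects in the model
    train_fit: float   # fit statistic on the training data (higher is better)
    valid_fit: float   # same statistic on the validation data


def shrinkage(m: ModelFit) -> float:
    """Relative drop in fit from the training data to the validation data."""
    return (m.train_fit - m.valid_fit) / m.train_fit


def choose_model(models, max_shrinkage=0.10):
    """Simplest model with the highest validation fit, among models within the shrinkage limit."""
    eligible = [m for m in models if shrinkage(m) <= max_shrinkage]
    if not eligible:
        return None
    # Highest validation fit wins; ties go to the less complex model.
    return min(eligible, key=lambda m: (-m.valid_fit, m.complexity))


# Hypothetical family of increasingly complex models
models = [
    ModelFit("small", 2, 0.60, 0.58),
    ModelFit("medium", 5, 0.66, 0.63),
    ModelFit("large", 9, 0.68, 0.63),
    ModelFit("huge", 15, 0.80, 0.60),
]
best = choose_model(models)
print(best.name, round(shrinkage(best), 3))
```

In this toy family, "large" only ties "medium" on validation fit, so the rule keeps the simpler model. "huge" is excluded because of its 25% shrinkage.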

If the fit measure is an error rate, the plot looks like the one described above but flipped about the horizontal axis. Training error keeps falling as complexity grows, while validation error eventually rises again. When there is no profit or cost information, the Mean Squared Error (MSE) is one such fit statistic. It measures how poorly a model fits, so smaller is better.
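For a binary target, MSE can be computed from the predicted event probabilities, in which case it is also known as the Brier score. Here is a minimal sketch with hypothetical predictions from two models:

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between 0/1 outcomes and predicted probabilities."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


y = [1, 0, 1, 0]              # observed target values
p_a = [0.8, 0.1, 0.6, 0.3]    # hypothetical model A
p_b = [0.6, 0.4, 0.5, 0.5]    # hypothetical model B

mse_a = mean_squared_error(y, p_a)
mse_b = mean_squared_error(y, p_b)
print(round(mse_a, 4), round(mse_b, 4))  # smaller is better
```

Model A has the lower MSE (0.075 versus 0.205), so it fits these four cases better.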

#### Assessing Models when Target Event Data Is Rare
Data splitting is a simple technique. However, when the target event is rare, you might not be able to afford to split your data because you want to use all of the target event cases to fit the model. Furthermore, when the test set is small, the performance measures might be unreliable because of high variability. In this situation, you can use other honest assessment approaches, such as bootstrapping and k-fold cross validation.

One approach that is frugal with the data is to assess the model on the same data set that was used for training but to penalize the assessment for optimism (Ripley 1996). The appropriate penalty can be determined theoretically or by using computationally intensive methods such as the bootstrap method. Bootstrapping is repeated sampling with replacement. A model is fit to each sample, the assessment statistics are calculated for each model, and the average of the assessment statistics is calculated. It is possible to write a macro to do bootstrapping. However, bootstrapping is not covered in this course.
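The course does not cover bootstrapping, but the optimism idea can be sketched. The version below is one common variant, an optimism-corrected bootstrap; it is not the course's macro. Each replicate fits a model on a resample. The gap between that model's error on the full data and its error on its own resample estimates the optimism. The average gap is then added to the apparent (training) error. To keep the sketch self-contained, the "model" is a deliberately trivial mean predictor, and `fit_mean` and `mse_of` are hypothetical helpers.

```python
import random
from statistics import mean


def fit_mean(ys):
    """Toy model: predict the sample mean for every case."""
    return mean(ys)


def mse_of(model, ys):
    """Mean squared error of a constant prediction."""
    return mean((y - model) ** 2 for y in ys)


def bootstrap_corrected_error(ys, fit, assess, n_boot=200, seed=1):
    """Apparent error plus the average bootstrap estimate of optimism."""
    rng = random.Random(seed)
    apparent = assess(fit(ys), ys)  # scored on its own training data: optimistic
    gaps = []
    for _ in range(n_boot):
        boot = rng.choices(ys, k=len(ys))  # sample with replacement, same size
        model = fit(boot)
        # Error on the original data minus error on the resample it was fit to
        gaps.append(assess(model, ys) - assess(model, boot))
    return apparent + mean(gaps)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mse_of(fit_mean(data), data))                       # apparent error
print(bootstrap_corrected_error(data, fit_mean, mse_of))  # penalized for optimism
```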

For small and moderate data sets, k-fold cross validation, also called v-fold cross validation (Breiman et al. 1984; Ripley 1996; Hand 1997), is a better strategy than data splitting. In k-fold cross validation, you split your data into k parts, also called folds. The benefit of k-fold cross validation is that every observation is used for both training and validation. Let's look at an example.

Suppose you split your data into five folds: A, B, C, D, and E. First, you train your model on parts B, C, D, and E, and then validate the model on part A. Next, you train your model on parts A, C, D, and E, and then validate the model on part B. You repeat this process until each part has served once as the validation part. Because there are five parts, you get five validation statistics, one for each part. Finally, you average the five validation statistics and use this average as the overall honest assessment of the model's ability to generalize. K-fold cross validation gives you a reliable estimate of validation performance, but it doesn't give you a final model. You get your final model by fitting the model to the entire development data set.
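The procedure can be sketched generically in Python. `fit` and `assess` are any callables you supply. A toy mean predictor stands in for a real logistic model, and every helper name here is hypothetical.

```python
import random
from statistics import mean


def kfold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(ys, fit, assess, k=5, seed=0):
    """Average validation statistic over k train/validate rounds."""
    scores = []
    for held_out in kfold_indices(len(ys), k, seed):
        held = set(held_out)
        train = [y for i, y in enumerate(ys) if i not in held]  # the other k-1 folds
        valid = [ys[i] for i in held_out]                       # the held-out fold
        model = fit(train)
        scores.append(assess(model, valid))
    return mean(scores)


def fit_mean(ys):
    """Toy model: predict the training mean."""
    return mean(ys)


def mse_of(model, ys):
    """Mean squared error of a constant prediction."""
    return mean((y - model) ** 2 for y in ys)


data = [float(v) for v in range(1, 11)]
print(cross_validate(data, fit_mean, mse_of, k=5))  # honest assessment
final_model = fit_mean(data)                        # final model: refit on all data
```

The cross-validated score only assesses the modeling procedure. As the text says, the model you actually deploy is refit on all of the development data.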

#### Preparing the Validation Data
Note: The demonstrations in this course build on each other. If you want to perform this demonstration in your own SAS software and you started a new SAS session after you performed the previous demonstration, open l4_demos.sas. It contains the solution code for all demos in Lessons 1, 2, 3, and 4. Locate the code for the previous demos, review the comments to see if any modifications are needed, and then submit the code.

Early in the target marketing project, we split the data into training and validation data sets. Now we are ready to prepare the validation data set that we'll score later to assess our model. The validation data must be prepared for scoring the same way that the training data was prepared for model building. Data preparation includes imputing missing values, creating new inputs, and applying any necessary transformations. In this demonstration, we do the following:

* Identify inputs that need imputation by using PROC MEANS.
* Create an output data set with the medians of the inputs that have missing values by using PROC UNIVARIATE.
* Impute values, create inputs, and apply a transformation by using the DATA step.

Let's look at the code. The PROC MEANS step reads work.valid (our validation data set) and requests the number of missing values. The VAR statement lists the variables in the model that we built on the training data set, and a RUN statement ends the step.

When we submit the code, we see that three inputs have missing values: credit card balance, investment, and credit card. **In the validation data set, missing values should be replaced with the medians from the training data set**, so PROC UNIVARIATE is used to create an output data set with the medians of those variables.

The PROC UNIVARIATE step reads the prepared training data set, work.train_imputed_swoe_bins (swoe stands for smoothed weight of evidence), with the NOPRINT option. The VAR statement lists the three variables that had missing values. The OUTPUT statement creates an output data set called work.medians. In it, the PCTLPTS= option requests the 50th percentile, and the PCTLPRE= option specifies a prefix for each variable name in the output data set. In this case, the prefixes are CC, CCBAL, and Investment. Let's run the PROC UNIVARIATE code.

The DATA step combines the single observation in one data set (here, work.medians) with every observation in another data set, work.valid. It creates work.valid_imputed_swoe_bins and drops the medians and the loop index. The statements if _N_=1 then set work.medians; set work.valid; perform a one-to-many combination. The medians are read only once, on the first iteration. Because variables read with a SET statement are retained, those values remain available for every observation of work.valid.

The DATA step defines two arrays: array x contains the three variables that have missing values, and array med contains the three medians. The DO loop replaces the missing values with the medians: do i=1 to dim(x); (here, dim(x) is 3) if x(i)=. then x(i)=med(i); (a period is the SAS missing value for a numeric variable).

Consider the first iteration. x(1) is credit card, and med(1) is CC50, the median of credit card. So if credit card is missing, it is replaced with the credit card median from the training data. The END statement closes the DO loop.
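The three steps (count missing values, take medians from the *training* data, fill the *validation* data) can be mirrored in Python. This is a minimal sketch in which `None` plays the role of the SAS missing value. The rows are hypothetical, and only the variable names CC, CCBal, and Inv come from the course data.

```python
from statistics import median


def count_missing(rows, variables):
    """Like PROC MEANS NMISS: number of missing (None) values per variable."""
    return {v: sum(r.get(v) is None for r in rows) for v in variables}


def training_medians(rows, variables):
    """Like PROC UNIVARIATE with PCTLPTS=50: median of the non-missing training values."""
    return {v: median(r[v] for r in rows if r.get(v) is not None) for v in variables}


def impute(rows, medians):
    """Like the DATA step DO loop: replace each missing value with its training median."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the input rows stay unchanged
        for v, med in medians.items():
            if row.get(v) is None:
                row[v] = med
        out.append(row)
    return out


# Hypothetical rows; only the variable names come from the course data
train = [
    {"CC": 1, "CCBal": 100.0, "Inv": 0},
    {"CC": 0, "CCBal": 300.0, "Inv": 1},
    {"CC": 1, "CCBal": 200.0, "Inv": 0},
]
valid = [
    {"CC": None, "CCBal": 50.0, "Inv": None},
    {"CC": 0, "CCBal": None, "Inv": 1},
]
inputs = ["CC", "CCBal", "Inv"]
print(count_missing(valid, inputs))
meds = training_medians(train, inputs)  # medians come from the TRAINING data
print(impute(valid, meds))
```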

The DO loop runs three times, replacing the missing values for the three variables. The Branch smoothed weight of evidence variable and the bins for checking account balance (the rank-transformed input) are added with %INCLUDE statements. One includes the scoring code that created the Branch smoothed weight of evidence variable, and the other includes the scoring code that created the bins for checking account balance.
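Including the scoring code, rather than recomputing the transformations, gives the validation data exactly the mappings learned on the training data. Here is a minimal sketch of that idea. It assumes the Branch SWOE values and the checking-balance bin boundaries were already computed on the training data. All names and numbers are hypothetical, and so is the fallback value for an unseen branch.

```python
from bisect import bisect_left

# Hypothetical values "learned" on the training data; not taken from the course output.
BRANCH_SWOE = {"B1": -0.12, "B2": 0.35, "B3": 0.08}
DDABAL_CUTS = [0.0, 500.0, 1500.0, 4000.0]  # bin boundaries from the training ranks


def branch_swoe(branch, fallback=0.0):
    """Map a branch to its training SWOE value; unseen branches get a fallback value."""
    return BRANCH_SWOE.get(branch, fallback)


def ddabal_bin(balance):
    """Bin 1..5 using the fixed training boundaries (a boundary value falls in the lower bin)."""
    return bisect_left(DDABAL_CUTS, balance) + 1


def score_prep(row):
    """Add the derived inputs to one validation row, as the included scoring code would."""
    out = dict(row)
    out["Branch_swoe"] = branch_swoe(row["Branch"])
    out["DDABal_bin"] = ddabal_bin(row["DDABal"])
    return out


print(score_prep({"Branch": "B2", "DDABal": 750.0}))
```

The key design point is that the lookup table and the cut points are frozen after training. Recomputing them on the validation data would leak validation information into the assessment.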

In addition, customers who don't have a checking account are assigned the overall mean as their checking account balance. When we submit the DATA step, the validation data is ready for scoring.

Instead of using PROC UNIVARIATE and the DATA step to replace the missing values in the validation data set with the medians from the training data set, you can use PROC STDIZE. For more information, see Imputation with PROC STDIZE in the Resources section.

The question arises: What metrics can we use to measure model performance? You'll learn about that next.

```sas
/* Run this script to configure the session and restore the programs required */

%let InicioCurso=/folders/myfolders/Cursos/EPMLR51;
%include "&InicioCurso/programs/l3_all.sas";
%include "&InicioCurso/programs/l3_demos.sas";
```

| | | | |
|---|---|---|---|
| Data Set Name | PMLR.PVA | Observations | 19372 |
| Member Type | DATA | Variables | 58 |
| Engine | V9 | Indexes | 0 |
| Created | 06/07/2020 16:44:39 | Observation Length | 432 |
| Last Modified | 06/07/2020 16:44:39 | Deleted Observations | 0 |
| Protection | | Compressed | NO |
| Data Set Type | | Sorted | NO |
| Label | | | |
| Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | | |
| Encoding | utf-8 Unicode (UTF-8) | | |

| Engine/Host Dependent Information | |
|---|---|
| Data Set Page Size | 65536 |
| Number of Data Set Pages | 129 |
| First Data Page | 1 |
| Max Obs per Page | 151 |
| Obs in First Data Page | 129 |
| Number of Data Set Repairs | 0 |
| Filename | /folders/myfolders/Cursos/EPMLR51/data/pva.sas7bdat |
| Release Created | 9.0401M6 |
| Host Created | Linux |
| Inode Number | 9441 |

**Alphabetic List of Variables and Attributes**

| # | Variable | Type | Len |
|---|---|---|---|
| 42 | CARD_PROM_12 | Num | 8 |
| 8 | CLUSTER_CODE | Char | 2 |
| 4 | DONOR_AGE | Num | 8 |
| 10 | DONOR_GENDER | Char | 3 |
| 26 | FREQUENCY_STATUS_97NK | Num | 8 |
| 9 | HOME_OWNER | Char | 3 |
| 11 | INCOME_GROUP | Num | 8 |
| 5 | IN_HOUSE | Num | 8 |
| 41 | LAST_GIFT_AMT | Num | 8 |
| 37 | LIFETIME_AVG_GIFT_AMT | Num | 8 |

| Variable | Mean | N Miss | Maximum | Minimum |
|---|---|---|---|---|
| TARGET_B | 0.2500000 | 0 | 1.0000000 | 0 |
| TARGET_D | 15.6243444 | 14529 | 200.0000000 | 1.0000000 |
| MONTHS_SINCE_ORIGIN | 73.4099732 | 0 | 137.0000000 | 5.0000000 |
| DONOR_AGE | 58.9190506 | 4795 | 87.0000000 | 0 |
| IN_HOUSE | 0.0731984 | 0 | 1.0000000 | 0 |
| INCOME_GROUP | 3.9075434 | 4392 | 7.0000000 | 1.0000000 |
| PUBLISHED_PHONE | 0.4977287 | 0 | 1.0000000 | 0 |
| MOR_HIT_RATE | 3.3616560 | 0 | 241.0000000 | 0 |
| WEALTH_RATING | 5.0053967 | 8810 | 9.0000000 | 0 |
| MEDIAN_HOME_VALUE | 1079.87 | 0 | 6000.00 | 0 |
| MEDIAN_HOUSEHOLD_INCOME | 341.9702147 | 0 | 1500.00 | 0 |
| PCT_OWNER_OCCUPIED | 69.6989986 | 0 | 99.0000000 | 0 |
| PCT_MALE_MILITARY | 1.0290109 | 0 | 97.0000000 | 0 |
| PCT_MALE_VETERANS | 30.5739211 | 0 | 99.0000000 | 0 |
| PCT_VIETNAM_VETERANS | 29.6032934 | 0 | 99.0000000 | 0 |
| PCT_WWII_VETERANS | 32.8524675 | 0 | 99.0000000 | 0 |
| PEP_STAR | 0.5044394 | 0 | 1.0000000 | 0 |
| RECENT_STAR_STATUS | 0.9311377 | 0 | 22.0000000 | 0 |
| FREQUENCY_STATUS_97NK | 1.9839975 | 0 | 4.0000000 | 1.0000000 |
| RECENT_RESPONSE_PROP | 0.1901275 | 0 | 1.0000000 | 0 |
| RECENT_AVG_GIFT_AMT | 15.3653959 | 0 | 260.0000000 | 0 |
| RECENT_CARD_RESPONSE_PROP | 0.2308077 | 0 | 1.0000000 | 0 |
| RECENT_AVG_CARD_GIFT_AMT | 11.6854703 | 0 | 300.0000000 | 0 |
| RECENT_RESPONSE_COUNT | 3.0431034 | 0 | 16.0000000 | 0 |
| RECENT_CARD_RESPONSE_COUNT | 1.7305389 | 0 | 9.0000000 | 0 |
| LIFETIME_CARD_PROM | 18.6680776 | 0 | 56.0000000 | 2.0000000 |
| LIFETIME_PROM | 47.5705141 | 0 | 194.0000000 | 5.0000000 |
| LIFETIME_GIFT_AMOUNT | 104.4257165 | 0 | 3775.00 | 15.0000000 |
| LIFETIME_GIFT_COUNT | 9.9797646 | 0 | 95.0000000 | 1.0000000 |
| LIFETIME_AVG_GIFT_AMT | 12.8583383 | 0 | 450.0000000 | 1.3600000 |
| LIFETIME_GIFT_RANGE | 11.5878758 | 0 | 997.0000000 | 0 |
| LIFETIME_MAX_GIFT_AMT | 19.2088081 | 0 | 1000.00 | 5.0000000 |
| LIFETIME_MIN_GIFT_AMT | 7.6209323 | 0 | 450.0000000 | 0 |
| LAST_GIFT_AMT | 16.5841988 | 0 | 450.0000000 | 0 |
| CARD_PROM_12 | 5.3671278 | 0 | 17.0000000 | 0 |
| NUMBER_PROM_12 | 12.9018687 | 0 | 64.0000000 | 2.0000000 |
| MONTHS_SINCE_LAST_GIFT | 18.1911522 | 0 | 27.0000000 | 4.0000000 |
| MONTHS_SINCE_FIRST_GIFT | 69.4820875 | 0 | 260.0000000 | 15.0000000 |
| PER_CAPITA_INCOME | 15857.33 | 0 | 174523.00 | 0 |
| STATUS_FL | 0.0833161 | 0 | 1.0000000 | 0 |
| STATUS_ES | 0.2399339 | 0 | 1.0000000 | 0 |
| home01 | 0.5474912 | 0 | 1.0000000 | 0 |
| nses1 | 0.3058022 | 0 | 1.0000000 | 0 |
| nses3 | 0.1715362 | 0 | 1.0000000 | 0 |
| nses4 | 0.0199773 | 0 | 1.0000000 | 0 |
| nses_ | 0.0234359 | 0 | 1.0000000 | 0 |
| nurbr | 0.2067417 | 0 | 1.0000000 | 0 |
| nurbu | 0.1267809 | 0 | 1.0000000 | 0 |
| nurbs | 0.2318294 | 0 | 1.0000000 | 0 |
| nurbt | 0.2035928 | 0 | 1.0000000 | 0 |
| nurb_ | 0.0234359 | 0 | 1.0000000 | 0 |

**Number of Variable Levels**

| Variable | Levels |
|---|---|
| URBANICITY | 6 |
| SES | 5 |
| CLUSTER_CODE | 54 |
| HOME_OWNER | 2 |
| DONOR_GENDER | 4 |
| OVERLAY_SOURCE | 4 |
| RECENCY_STATUS_96NK | 6 |

| URBANICITY | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| ? | 454 | 2.34 | 454 | 2.34 |
| C | 4022 | 20.76 | 4476 | 23.11 |
| R | 4005 | 20.67 | 8481 | 43.78 |
| S | 4491 | 23.18 | 12972 | 66.96 |
| T | 3944 | 20.36 | 16916 | 87.32 |
| U | 2456 | 12.68 | 19372 | 100.00 |

| SES | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| 1 | 5924 | 30.58 | 5924 | 30.58 |
| 2 | 9284 | 47.92 | 15208 | 78.51 |
| 3 | 3323 | 17.15 | 18531 | 95.66 |
| 4 | 387 | 2.00 | 18918 | 97.66 |
| ? | 454 | 2.34 | 19372 | 100.00 |

| CLUSTER_CODE | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| . | 454 | 2.34 | 454 | 2.34 |
| 01 | 239 | 1.23 | 693 | 3.58 |
| 02 | 380 | 1.96 | 1073 | 5.54 |
| 03 | 300 | 1.55 | 1373 | 7.09 |
| 04 | 113 | 0.58 | 1486 | 7.67 |
| 05 | 199 | 1.03 | 1685 | 8.70 |
| 06 | 123 | 0.63 | 1808 | 9.33 |
| 07 | 184 | 0.95 | 1992 | 10.28 |
| 08 | 378 | 1.95 | 2370 | 12.23 |
| 09 | 153 | 0.79 | 2523 | 13.02 |

| HOME_OWNER | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| H | 10606 | 54.75 | 10606 | 54.75 |
| U | 8766 | 45.25 | 19372 | 100.00 |

| DONOR_GENDER | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| A | 1 | 0.01 | 1 | 0.01 |
| F | 10401 | 53.69 | 10402 | 53.70 |
| M | 7953 | 41.05 | 18355 | 94.75 |
| U | 1017 | 5.25 | 19372 | 100.00 |

| OVERLAY_SOURCE | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| B | 8732 | 45.08 | 8732 | 45.08 |
| M | 1480 | 7.64 | 10212 | 52.72 |
| N | 4392 | 22.67 | 14604 | 75.39 |
| P | 4768 | 24.61 | 19372 | 100.00 |

| RECENCY_STATUS_96NK | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|
| A | 11918 | 61.52 | 11918 | 61.52 |
| E | 427 | 2.20 | 12345 | 63.73 |
| F | 1521 | 7.85 | 13866 | 71.58 |
| L | 93 | 0.48 | 13959 | 72.06 |
| N | 1192 | 6.15 | 15151 | 78.21 |
| S | 4221 | 21.79 | 19372 | 100.00 |

| Model Information | |
|---|---|
| Data Set | PMLR.PVA_TRAIN |
| Response Variable | TARGET_B |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

| | |
|---|---|
| Number of Observations Read | 9687 |
| Number of Observations Used | 9687 |

**Response Profile**

| Ordered Value | TARGET_B | Total Frequency |
|---|---|---|
| 1 | 0 | 7265 |
| 2 | 1 | 2422 |

**Class Level Information**

| Class | Value | Design Variables |
|---|---|---|
| PEP_STAR | 0 | 0 |
| | 1 | 1 |

**Model Convergence Status:** Convergence criterion (GCONV=1E-8) satisfied.

**Model Fit Statistics**

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 10897.23 | 10663.061 |
| SC | 10904.409 | 10691.776 |
| -2 Log L | 10895.23 | 10655.061 |

**Testing Global Null Hypothesis: BETA=0**

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 240.169 | 3 | <.0001 |
| Score | 242.9486 | 3 | <.0001 |
| Wald | 237.2875 | 3 | <.0001 |

**Type 3 Analysis of Effects**

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| PEP_STAR | 1 | 43.4902 | <.0001 |
| RECENT_AVG_GIFT_AMT | 1 | 3.9559 | 0.0467 |
| FREQUENCY_STATUS_97N | 1 | 83.8209 | <.0001 |

**Analysis of Maximum Likelihood Estimates**

| Parameter | | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | | 1 | -1.6454 | 0.0831 | 392.448 | <.0001 |
| PEP_STAR | 1 | 1 | 0.3371 | 0.0511 | 43.4902 | <.0001 |
| RECENT_AVG_GIFT_AMT | | 1 | -0.00579 | 0.00291 | 3.9559 | 0.0467 |
| FREQUENCY_STATUS_97N | | 1 | 0.2179 | 0.0238 | 83.8209 | <.0001 |

**Association of Predicted Probabilities and Observed Responses**

| | | | |
|---|---|---|---|
| Percent Concordant | 59.9 | Somers' D | 0.208 |
| Percent Discordant | 39.0 | Gamma | 0.211 |
| Percent Tied | 1.1 | Tau-a | 0.078 |
| Pairs | 17595830 | c | 0.604 |

**Odds Ratio Estimates and Profile-Likelihood Confidence Intervals**

| Effect | Unit | Estimate | Lower 95% Confidence Limit | Upper 95% Confidence Limit |
|---|---|---|---|---|
| PEP_STAR 1 vs 0 | 1 | 1.401 | 1.267 | 1.549 |
| RECENT_AVG_GIFT_AMT | 1 | 0.994 | 0.988 | 1.000 |
| FREQUENCY_STATUS_97N | 1 | 1.243 | 1.187 | 1.303 |

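| Obs | P_1 | PEP_STAR | RECENT_AVG_GIFT_AMT | FREQUENCY_STATUS_97NK |
|---|---|---|---|---|
| 1 | 0.04639 | 1 | 15.0 | 1 |
| 2 | 0.033094 | 0 | 17.5 | 1 |
| 3 | 0.06489 | 0 | 8.33 | 4 |
| 4 | 0.090167 | 1 | 5.0 | 4 |
| 5 | 0.059152 | 1 | 8.33 | 2 |
| 6 | 0.058117 | 1 | 11.57 | 2 |
| 7 | 0.046941 | 1 | 12.86 | 1 |
| 8 | 0.031733 | 0 | 25.0 | 1 |
| 9 | 0.045126 | 1 | 20.0 | 1 |
| 10 | 0.032091 | 0 | 23.0 | 1 |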

| grp_resp | grp_amt | N Obs | Median DONOR_AGE | Median INCOME_GROUP | Median WEALTH_RATING |
|---|---|---|---|---|---|
| 0 | 0 | 487 | 65.0000000 | 4.0000000 | 5.0000000 |
| | 1 | 1147 | 58.0000000 | 4.0000000 | 5.0000000 |
| | 2 | 1612 | 58.0000000 | 4.0000000 | 6.0000000 |
| 1 | 0 | 671 | 65.0000000 | 4.0000000 | 4.5000000 |
| | 1 | 1270 | 59.0000000 | 4.0000000 | 5.0000000 |
| | 2 | 1202 | 57.0000000 | 4.0000000 | 5.0000000 |
| 2 | 0 | 2155 | 63.0000000 | 4.0000000 | 5.0000000 |
| | 1 | 733 | 61.0000000 | 4.0000000 | 6.0000000 |
| | 2 | 410 | 58.5000000 | 4.0000000 | 6.0000000 |

**Eigenvalues of the Covariance Matrix**

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 0.00145225 | | 1.0 | 1.0 |

Root-Mean-Square Total-Sample Standard Deviation: 0.038108

Root-Mean-Square Distance Between Observations: 0.053893

**Cluster History**

| Number of Clusters | Clusters Joined | | Freq | Semipartial R-Square | R-Square | Tie |
|---|---|---|---|---|---|---|
| 53 | 16 | 25 | 325 | 0.0 | 1.0 | |
| 52 | 19 | 45 | 291 | 0.0 | 1.0 | |
| 51 | 03 | 05 | 254 | 0.0 | 1.0 | |
| 50 | 18 | 23 | 455 | 0.0 | 1.0 | |
| 49 | CL52 | 44 | 473 | 0.0 | 1.0 | |
| 48 | 47 | 49 | 432 | 0.0 | 1.0 | |
| 47 | 22 | 53 | 265 | 0.0 | 1.0 | |
| 46 | 15 | CL50 | 565 | 0.0 | 1.0 | |
| 45 | 26 | 35 | 472 | 0.0 | 1.0 | |
| 44 | 07 | 48 | 192 | 0.0 | 1.0 | |

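Number of Clusters: 4

| CLUSNAME | CLUSTER_CODE | CLUSTER |
|---|---|---|
| CL4 | 16 | 1 |
| CL4 | 25 | 1 |
| CL4 | 03 | 1 |
| CL4 | 05 | 1 |
| CL4 | 18 | 1 |
| CL4 | 23 | 1 |
| CL4 | 22 | 1 |
| CL4 | 53 | 1 |
| CL4 | 15 | 1 |
| CL4 | 26 | 1 |

| CLUSNAME | CLUSTER_CODE | CLUSTER |
|---|---|---|
| CL6 | 09 | 3 |
| CL6 | 52 | 3 |
| CL6 | 06 | 3 |
| CL6 | 10 | 3 |
| CL6 | 41 | 3 |
| CL6 | 51 | 3 |
| CL6 | 37 | 3 |
| CL6 | 08 | 3 |
| CL6 | 32 | 3 |

| CLUSNAME | CLUSTER_CODE | CLUSTER |
|---|---|---|
| CL7 | 19 | 2 |
| CL7 | 45 | 2 |
| CL7 | 44 | 2 |
| CL7 | 47 | 2 |
| CL7 | 49 | 2 |
| CL7 | 07 | 2 |
| CL7 | 48 | 2 |
| CL7 | 21 | 2 |
| CL7 | 43 | 2 |
| CL7 | 29 | 2 |

| CLUSNAME | CLUSTER_CODE | CLUSTER |
|---|---|---|
| CL9 | 13 | 4 |
| CL9 | 38 | 4 |
| CL9 | . | 4 |
| CL9 | 20 | 4 |
| CL9 | 28 | 4 |

| Obs | CLUSTER_CODE | cluster_swoe |
|---|---|---|
| 267 | 01 | -0.98447 |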

| Cluster | Variable | 1 - RSquare Ratio |
|---|---|---|
| Cluster 1 | MONTHS_SINCE_ORIGIN | 0.1694 |
| | LIFETIME_CARD_PROM | 0.0964 |
| | LIFETIME_PROM | 0.1097 |
| | LIFETIME_GIFT_AMOUNT | 0.6593 |
| | LIFETIME_GIFT_COUNT | 0.4943 |
| | MONTHS_SINCE_FIRST_GIFT | 0.1536 |
| | mi_WEALTH_RATING | 0.5208 |
| Cluster 2 | RECENT_AVG_GIFT_AMT | 0.4247 |
| | RECENT_AVG_CARD_GIFT_AMT | 0.6359 |
| | LIFETIME_GIFT_RANGE | 0.3966 |

| Obs | Number of Clusters | Total Variation Explained by Clusters | Proportion of Variation Explained by Clusters | Minimum Proportion Explained by a Cluster | Maximum Second Eigenvalue in a Cluster | Minimum R-squared for a Variable | Maximum 1-R\*\*2 Ratio for a Variable |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 7.932328 | 0.1497 | 0.1497 | 5.826522 | 0.0 | _ |
| 2 | 2 | 12.645612 | 0.2386 | 0.2177 | 4.131285 | 0.0 | 1.0000 |
| 3 | 3 | 16.730624 | 0.3157 | 0.2603 | 3.075172 | 0.0007 | 1.0041 |
| 4 | 4 | 19.665398 | 0.371 | 0.2603 | 2.436935 | 0.0005 | 1.0278 |
| 5 | 5 | 21.840212 | 0.4121 | 0.3119 | 1.949434 | 0.0005 | 1.0616 |
| 6 | 6 | 23.77593 | 0.4486 | 0.3119 | 1.79511 | 0.0005 | 1.1896 |
| 7 | 7 | 25.538296 | 0.4819 | 0.3119 | 1.486987 | 0.0005 | 1.3666 |
| 8 | 8 | 26.833836 | 0.5063 | 0.3576 | 1.416451 | 0.0005 | 1.2155 |
| 9 | 9 | 28.070094 | 0.5296 | 0.3576 | 1.303067 | 0.0005 | 1.2155 |
| 10 | 10 | 28.813747 | 0.5437 | 0.3576 | 1.089054 | 0.002 | 1.2155 |

| Obs | variable | Spearman rank of variables | Hoeffding rank of variables | Spearman Correlation | Spearman p-value | Hoeffding Correlation | Hoeffding p-value |
|---|---|---|---|---|---|---|---|
| 1 | FREQUENCY_STATUS_97NK | 1 | 1 | 0.13777 | <.0001 | 0.00213 | <.0001 |
| 2 | LAST_GIFT_AMT | 2 | 2 | -0.12345 | <.0001 | 0.00197 | <.0001 |
| 3 | LIFETIME_AVG_GIFT_AMT | 3 | 3 | -0.11888 | <.0001 | 0.00188 | <.0001 |
| 4 | PEP_STAR | 4 | 6 | 0.11235 | <.0001 | 0.00099 | <.0001 |
| 5 | LIFETIME_GIFT_COUNT | 5 | 4 | 0.10943 | <.0001 | 0.00156 | <.0001 |
| 6 | MONTHS_SINCE_LAST_GIFT | 6 | 5 | -0.0919 | <.0001 | 0.00103 | <.0001 |
| 7 | cluster_swoe | 7 | 7 | 0.0815 | <.0001 | 0.0008 | <.0001 |
| 8 | RECENT_STAR_STATUS | 8 | 9 | 0.07289 | <.0001 | 0.00035 | 0.0006 |
| 9 | MEDIAN_HOME_VALUE | 9 | 8 | 0.06225 | <.0001 | 0.00044 | 0.0001 |
| 10 | STATUS_FL | 10 | 22 | -0.04935 | <.0001 | -0.00008 | 1.0000 |

0.002449

| Model Information | |
|---|---|
| Data Set | PMLR.PVA_TRAIN_IMPUTED_SWOE |
| Response Variable | TARGET_B |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

| | |
|---|---|
| Number of Observations Read | 9687 |
| Number of Observations Used | 9687 |

**Response Profile**

| Ordered Value | TARGET_B | Total Frequency |
|---|---|---|
| 1 | 0 | 7265 |
| 2 | 1 | 2422 |

**Model Convergence Status:** Convergence criterion (GCONV=1E-8) satisfied.

**Model Fit Statistics**

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 10897.23 | 10551.519 |
| SC | 10904.409 | 10745.34 |
| -2 Log L | 10895.23 | 10497.519 |

**Testing Global Null Hypothesis: BETA=0**

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 397.7113 | 26 | <.0001 |
| Score | 395.1719 | 26 | <.0001 |
| Wald | 378.4073 | 26 | <.0001 |

**Residual Chi-Square Test**

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 325.7358 | 307 | 0.2212 |

**Model Convergence Status:** Convergence criterion (GCONV=1E-8) satisfied.

**Model Fit Statistics**

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 10897.23 | 10538.651 |
| SC | 10904.409 | 10739.65 |
| -2 Log L | 10895.23 | 10482.651 |

**Testing Global Null Hypothesis: BETA=0**

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 412.5793 | 27 | <.0001 |
| Score | 409.9007 | 27 | <.0001 |
| Wald | 390.7876 | 27 | <.0001 |

**Residual Chi-Square Test**

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 302.6965 | 306 | 0.5426 |

**Model Convergence Status:** Convergence criterion (GCONV=1E-8) satisfied.

**Model Fit Statistics**

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 10897.23 | 10529.687 |
| SC | 10904.409 | 10737.865 |
| -2 Log L | 10895.23 | 10471.687 |

**Testing Global Null Hypothesis: BETA=0**

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 423.5434 | 28 | <.0001 |
| Score | 420.3558 | 28 | <.0001 |
| Wald | 398.3173 | 28 | <.0001 |

**Residual Chi-Square Test**

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 290.1709 | 305 | 0.7202 |

**Model Convergence Status:** Convergence criterion (GCONV=1E-8) satisfied.

**Model Fit Statistics**

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 10897.23 | 10521.304 |
| SC | 10904.409 | 10736.66 |
| -2 Log L | 10895.23 | 10461.304 |

**Testing Global Null Hypothesis: BETA=0**

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 433.9266 | 29 | <.0001 |
| Score | 436.1775 | 29 | <.0001 |
| Wald | 408.5931 | 29 | <.0001 |

**Residual Chi-Square Test**

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 282.3477 | 304 | 0.8086 |

**Summary of Forward Selection**

| Step | Effect Entered | DF | Number In | Score Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| 1 | LAST_GIFT_AMT*LIFETIME_AVG_GIFT_AMT | 1 | 27 | 23.198 | <.0001 |
| 2 | LIFETIME_AVG_GIFT_AMT*RECENT_STAR_STATUS | 1 | 28 | 12.8362 | 0.0003 |
| 3 | LIFETIME_GIFT_COUNT*MONTHS_SINCE_LAST_GIFT | 1 | 29 | 10.1872 | 0.0014 |

**Analysis of Maximum Likelihood Estimates**

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -0.6647 | 0.3584 | 3.4393 | 0.0637 |
| LIFETIME_GIFT_COUNT | 1 | 0.0363 | 0.0108 | 11.2376 | 0.0008 |
| LAST_GIFT_AMT | 1 | -0.0157 | 0.00429 | 13.423 | 0.0002 |
| MEDIAN_HOME_VALUE | 1 | 0.0001 | 0.000029 | 11.8486 | 0.0006 |
| FREQUENCY_STATUS_97NK | 1 | 0.1466 | 0.0284 | 26.6538 | <.0001 |
| MONTHS_SINCE_LAST_GIFT | 1 | -0.00729 | 0.0105 | 0.4868 | 0.4854 |
| nses_ | 1 | 0.3404 | 0.175 | 3.7837 | 0.0518 |
| mi_DONOR_AGE | 1 | -0.0669 | 0.0619 | 1.1678 | 0.2798 |
| PCT_MALE_VETERANS | 1 | -0.001 | 0.00257 | 0.1506 | 0.6979 |
| PCT_MALE_MILITARY | 1 | 0.00231 | 0.00526 | 0.1924 | 0.6609 |

**Association of Predicted Probabilities and Observed Responses**

| | | | |
|---|---|---|---|
| Percent Concordant | 63.5 | Somers' D | 0.269 |
| Percent Discordant | 36.5 | Gamma | 0.269 |
| Percent Tied | 0.0 | Tau-a | 0.101 |
| Pairs | 17595830 | c | 0.635 |

**Odds Ratio Estimates and Profile-Likelihood Confidence Intervals**

| Effect | Unit | Estimate | Lower 95% Confidence Limit | Upper 95% Confidence Limit |
|---|---|---|---|---|
| MEDIAN_HOME_VALUE | 1 | 1.000 | 1.000 | 1.000 |
| FREQUENCY_STATUS_97NK | 1 | 1.158 | 1.095 | 1.224 |
| nses_ | 1 | 1.406 | 0.994 | 1.975 |
| mi_DONOR_AGE | 1 | 0.935 | 0.828 | 1.056 |
| PCT_MALE_VETERANS | 1 | 0.999 | 0.994 | 1.004 |
| PCT_MALE_MILITARY | 1 | 1.002 | 0.991 | 1.012 |
| PCT_WWII_VETERANS | 1 | 1.003 | 0.999 | 1.006 |
| cluster_swoe | 1 | 2.607 | 1.853 | 3.672 |
| PEP_STAR | 1 | 1.348 | 1.186 | 1.531 |
| nurbu | 1 | 0.945 | 0.776 | 1.150 |

Model Information,Model Information.1
Data Set,PMLR.PVA_TRAIN_IMPUTED_SWOE
Response Variable,TARGET_B
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

0,1
Number of Observations Read,9687
Number of Observations Used,9687

Response Profile,Response Profile,Response Profile
Ordered Value,TARGET_B,Total Frequency
1,0,7265
2,1,2422

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics,Model Fit Statistics,Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,10897.23,10521.304
SC,10904.409,10736.66
-2 Log L,10895.23,10461.304

Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,433.9266,29,<.0001
Score,436.1775,29,<.0001
Wald,408.5931,29,<.0001

Analysis of Effects Removed by Fast Backward Elimination,Analysis of Effects Removed by Fast Backward Elimination,Analysis of Effects Removed by Fast Backward Elimination,Analysis of Effects Removed by Fast Backward Elimination,Analysis of Effects Removed by Fast Backward Elimination,Analysis of Effects Removed by Fast Backward Elimination,Analysis of Effects Removed by Fast Backward Elimination
Effect Removed,Chi-Square,DF,Pr > ChiSq,Residual Chi-Square,DF.1,Pr > Residual ChiSq
STATUS_FL STATUS_FL,0.0003,1,0.9859,0.0003,1,0.9859
home01 home01,0.002,1,0.9645,0.0023,2,0.9989
nurbs nurbs,0.0289,1,0.865,0.0312,3,0.9985
PCT_MALE_VETERANS PCT_MALE_VETE,0.16,1,0.6892,0.1912,4,0.9957
PCT_MALE_MILITARY PCT_MALE_MILI,0.1767,1,0.6742,0.3679,5,0.9962
nurbt nurbt,0.3333,1,0.5637,0.7012,6,0.9945
nurbr nurbr,0.1894,1,0.6634,0.8906,7,0.9964
MOR_HIT_RATE MOR_HIT_RATE,0.4537,1,0.5006,1.3443,8,0.995
WEALTH_RATING WEALTH_RATING,0.498,1,0.4804,1.8423,9,0.9937
nses4 nses4,0.6128,1,0.4337,2.4551,10,0.9915

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics,Model Fit Statistics,Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,10897.23,10518.858
SC,10904.409,10590.643
-2 Log L,10895.23,10498.858

Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,396.3724,9,<.0001
Score,393.6389,9,<.0001
Wald,375.9447,9,<.0001

Residual Chi-Square Test,Residual Chi-Square Test,Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
39.133,20,0.0064

Summary of Backward Elimination,Summary of Backward Elimination,Summary of Backward Elimination,Summary of Backward Elimination,Summary of Backward Elimination,Summary of Backward Elimination
Step,Effect Removed,DF,Number In,Wald Chi-Square,Pr > ChiSq
1,STATUS_FL,1,28,0.0003,0.9859
1,home01,1,27,0.002,0.9645
1,nurbs,1,26,0.0289,0.865
1,PCT_MALE_VETERANS,1,25,0.16,0.6892
1,PCT_MALE_MILITARY,1,24,0.1767,0.6742
1,nurbt,1,23,0.3333,0.5637
1,nurbr,1,22,0.1894,0.6634
1,MOR_HIT_RATE,1,21,0.4537,0.5006
1,WEALTH_RATING,1,20,0.498,0.4804
1,nses4,1,19,0.6128,0.4337

Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates
Parameter,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq
Intercept,1,0.1482,0.2498,0.352,0.5530
LAST_GIFT_AMT,1,-0.0139,0.00426,10.6085,0.0011
MEDIAN_HOME_VALUE,1,9.5e-05,2.6e-05,13.4329,0.0002
FREQUENCY_STATUS_97NK,1,0.1565,0.0258,36.6681,<.0001
MONTHS_SINCE_LAST_GIFT,1,-0.0334,0.0062,29.1079,<.0001
LIFETIME_AVG_GIFT_AMT,1,-0.0138,0.00615,5.0581,0.0245
cluster_swoe,1,1.0032,0.1492,45.1805,<.0001
PEP_STAR,1,0.3191,0.0533,35.8039,<.0001
INCOME_GROUP,1,0.0484,0.0154,9.8491,0.0017
LAST_GIFT_AMT*LIFETIME_AVG_GIFT_AMT,1,0.000227,6.1e-05,13.9023,0.0002

Association of Predicted Probabilities and Observed Responses,Association of Predicted Probabilities and Observed Responses.1,Association of Predicted Probabilities and Observed Responses.2,Association of Predicted Probabilities and Observed Responses.3
Percent Concordant,63.1,Somers' D,0.262
Percent Discordant,36.9,Gamma,0.262
Percent Tied,0.0,Tau-a,0.098
Pairs,17595830.0,c,0.631

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect,Unit,Estimate,Lower 95% Confidence Limit,Upper 95% Confidence Limit
MEDIAN_HOME_VALUE,1.0,1.0,1.0,1.0
FREQUENCY_STATUS_97NK,1.0,1.169,1.112,1.23
MONTHS_SINCE_LAST_GIFT,1.0,0.967,0.955,0.979
cluster_swoe,1.0,2.727,2.037,3.656
PEP_STAR,1.0,1.376,1.239,1.528
INCOME_GROUP,1.0,1.05,1.018,1.082
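A quick hand check of this table (a worked sketch, not part of the course output): each odds ratio is the exponentiated parameter estimate from the Analysis of Maximum Likelihood Estimates table above, for a one-unit increase in the input. For cluster_swoe, PEP_STAR and MONTHS_SINCE_LAST_GIFT, in that order:

$$
\widehat{OR} = e^{\hat\beta}, \qquad e^{1.0032} \approx 2.727, \qquad e^{0.3191} \approx 1.376, \qquad e^{-0.0334} \approx 0.967
$$

The 1.0 reported for MEDIAN_HOME_VALUE does not mean the input has no effect (its Wald test is significant): $e^{0.000095} \approx 1.0001$ per one-unit change simply rounds to 1.0. Requesting a larger unit, for example with the UNITS statement, gives a more readable odds ratio.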

Obs,model,AUC,AIC,BIC,MisClass,AdjRSquare,BrierScore
1,9,0.631652,14779.62,14851.4,0.2498,0.063068,0.222593
2,10,0.632213,14774.72,14853.68,0.2498,0.063921,0.222532
3,8,0.630349,14789.23,14853.84,0.2499,0.061631,0.222709
4,11,0.632442,14771.15,14857.29,0.2498,0.064609,0.222463
5,12,0.632987,14764.68,14858.01,0.2497,0.065654,0.222331
6,7,0.628417,14802.91,14860.34,0.2499,0.059688,0.222876
7,13,0.633555,14760.16,14860.66,0.2497,0.066458,0.222274
8,6,0.626502,14814.34,14864.59,0.2499,0.058021,0.223026
9,14,0.633561,14759.11,14866.79,0.2497,0.066835,0.222234
10,5,0.625815,14826.13,14869.2,0.25,0.056308,0.223329

Variables Included in Model
LIFETIME_GIFT_COUNT LAST_GIFT_AMT MEDIAN_HOME_VALUE FREQUENCY_STATUS_97NK cluster_swoe PEP_STAR INCOME_GROUP LAST_GIFT_AMT*LIFETIME_AVG_GIFT_AMT LIFETIME_GIFT_COUNT*MONTHS_SINCE_LAST_GIFT

Variable,Label,N,N Miss,Mean,Minimum,Maximum
AcctAge,Age of Oldest Account,30194,2070,5.9086772,0.3000000,61.5000000
DDA,Checking Account,32264,0,0.8156459,0,1.0000000
DDABal,Checking Balance,32264,0,2170.02,-774.8300000,278093.83
Dep,Checking Deposits,32264,0,2.1346082,0,28.0000000
DepAmt,Amount Deposited,32264,0,2232.76,0,484893.67
CashBk,Number Cash Back,32264,0,0.0159621,0,4.0000000
Checks,Number of Checks,32264,0,4.2599182,0,49.0000000
DirDep,Direct Deposit,32264,0,0.2955616,0,1.0000000
NSF,Number Insufficient Fund,32264,0,0.0870630,0,1.0000000
NSFAmt,Amount NSF,32264,0,2.2905464,0,666.8500000
Phone,Number Telephone Banking,28131,4133,0.4056024,0,30.0000000
Teller,Teller Visits,32264,0,1.3652678,0,27.0000000
Sav,Saving Account,32264,0,0.4668981,0,1.0000000
SavBal,Saving Balance,32264,0,3170.60,0,700026.94
ATM,ATM,32264,0,0.6099368,0,1.0000000
ATMAmt,ATM Withdrawal Amount,32264,0,1235.41,0,427731.26
POS,Number Point of Sale,28131,4133,1.0756816,0,54.0000000
POSAmt,Amount Point of Sale,28131,4133,48.9261782,0,3293.49
CD,Certificate of Deposit,32264,0,0.1258368,0,1.0000000
CDBal,CD Balance,32264,0,2530.71,0,1053900.00
IRA,Retirement Account,32264,0,0.0532792,0,1.0000000
IRABal,IRA Balance,32264,0,617.5704550,0,596497.60
LOC,Line of Credit,32264,0,0.0633833,0,1.0000000
LOCBal,Line of Credit Balance,32264,0,1175.22,-613.0000000,523147.24
Inv,Investment,28131,4133,0.0296826,0,1.0000000
InvBal,Investment Balance,28131,4133,1599.17,-2214.92,8323796.02
ILS,Installment Loan,32264,0,0.0495909,0,1.0000000
ILSBal,Loan Balance,32264,0,517.5692344,0,29162.79
MM,Money Market,32264,0,0.1148959,0,1.0000000
MMBal,Money Market Balance,32264,0,1875.76,0,120801.11
MMCred,Money Market Credits,32264,0,0.0563786,0,5.0000000
MTG,Mortgage,32264,0,0.0493429,0,1.0000000
MTGBal,Mortgage Balance,32264,0,8081.74,0,10887573.28
CC,Credit Card,28131,4133,0.4830969,0,1.0000000
CCBal,Credit Card Balance,28131,4133,9586.55,-2060.51,10641354.78
CCPurc,Credit Card Purchases,28131,4133,0.1541716,0,5.0000000
SDB,Safety Deposit Box,32264,0,0.1086660,0,1.0000000
Income,Income,26482,5782,40.5889283,0,233.0000000
HMOwn,Owns Home,26731,5533,0.5418802,0,1.0000000
LORes,Length of Residence,26482,5782,7.0056642,0.5000000,19.5000000
HMVal,Home Value,26482,5782,110.9121290,67.0000000,754.0000000
Age,Age,25907,6357,47.9283205,16.0000000,94.0000000
CRScore,Credit Score,31557,707,666.4935197,509.0000000,820.0000000
Moved,Recent Address Change,32264,0,0.0296305,0,1.0000000
InArea,Local Address,32264,0,0.9602963,0,1.0000000

Ins,Frequency,Percent,Cumulative Frequency,Cumulative Percent
0,21089,65.36,21089,65.36
1,11175,34.64,32264,100.0

Branch of Bank
Branch,Frequency,Percent,Cumulative Frequency,Cumulative Percent
B1,2819,8.74,2819,8.74
B10,273,0.85,3092,9.58
B11,247,0.77,3339,10.35
B12,549,1.7,3888,12.05
B13,535,1.66,4423,13.71
B14,1072,3.32,5495,17.03
B15,2235,6.93,7730,23.96
B16,1534,4.75,9264,28.71
B17,850,2.63,10114,31.35
B18,541,1.68,10655,33.02

Area Classification
Res,Frequency,Percent,Cumulative Frequency,Cumulative Percent
R,8077,25.03,8077,25.03
S,11506,35.66,19583,60.7
U,12681,39.3,32264,100.0

Table of Ins by Selected (Selected = Selection Indicator)
Cell contents: Frequency Percent Row Pct Col Pct (Total cells: Frequency Percent)
Ins,Selected = 0,Selected = 1,Total
0,7028 21.78 33.33 65.36,14061 43.58 66.67 65.36,21089 65.36
1,3724 11.54 33.32 34.64,7451 23.09 66.68 34.64,11175 34.64
Total,10752 33.33,21512 66.67,32264 100.00

Model Information
Data Set,WORK.TRAIN
Response Variable,Ins
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

Number of Observations Read,21512
Number of Observations Used,21512

Response Profile
Ordered Value,Ins,Total Frequency
1,0,14061
2,1,7451

Class Level Information
Class,Value,Design Variable 1,Design Variable 2
Res,R,1,0
,S,0,0
,U,0,1
DDA,0,0,
,1,1,

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,26284.098
SC,27767.651,26355.885
-2 Log L,27757.675,26266.098

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,1491.5772,8,<.0001
Score,1315.6105,8,<.0001
Wald,1256.8282,8,<.0001
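All three criteria in the Model Fit Statistics table are functions of −2 Log L, so they can be reproduced by hand. Here the model has k = 9 parameters (the intercept plus the 8 DF of the global test) and n = 21,512 observations were used:

$$
\begin{aligned}
\text{AIC} &= -2\log L + 2k = 26266.098 + 2(9) = 26284.098 \\
\text{SC} &= -2\log L + k\ln n = 26266.098 + 9\ln(21512) \approx 26355.885 \\
\chi^2_{\text{LR}} &= (-2\log L_0) - (-2\log L) = 27757.675 - 26266.098 = 1491.577
\end{aligned}
$$

The likelihood ratio statistic is the drop in −2 Log L from the intercept-only model, tested on 8 degrees of freedom. SC (the Schwarz criterion, also called BIC) charges each parameter ln n ≈ 9.98 instead of 2, which is why it tends to select smaller models than AIC.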

Type 3 Analysis of Effects
Effect,DF,Wald Chi-Square,Pr > ChiSq
DDA,1,484.002,<.0001
DDABal,1,317.1284,<.0001
Dep,1,26.0277,<.0001
DepAmt,1,10.1271,0.0015
CashBk,1,19.8706,<.0001
Checks,1,0.0309,0.8604
Res,2,0.1229,0.9404

Analysis of Maximum Likelihood Estimates
Parameter,Level,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq,Standardized Estimate
Intercept,,1,0.1706,0.0374,20.8591,<.0001,
DDA,1,1,-1.041,0.0473,484.002,<.0001,-0.2226
DDABal,,1,7.5e-05,4.188e-06,317.1284,<.0001,0.3135
Dep,,1,-0.0682,0.0134,26.0277,<.0001,-0.0648
DepAmt,,1,1.2e-05,3.819e-06,10.1271,0.0015,0.046
CashBk,,1,-0.6393,0.1434,19.8706,<.0001,-0.0468
Checks,,1,-0.00068,0.00384,0.0309,0.8604,-0.00193
Res,R,1,-0.0129,0.0388,0.1106,0.7395,-0.00308
Res,U,1,-0.00191,0.0343,0.0031,0.9557,-0.00051

Association of Predicted Probabilities and Observed Responses
Percent Concordant,67.2,Somers' D,0.357
Percent Discordant,31.5,Gamma,0.362
Percent Tied,1.3,Tau-a,0.162
Pairs,104768511.0,c,0.679
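The association statistics follow from the pair counts. Every (event, non-event) pair is compared, so the number of pairs is 7,451 × 14,061 (from the Response Profile above). Writing $P_c$, $P_d$ and $P_t$ for the percent concordant, discordant and tied pairs (notation introduced here for illustration):

$$
\begin{aligned}
\text{Pairs} &= 7451 \times 14061 = 104{,}768{,}511 \\
D_{\text{Somers}} &= \frac{P_c - P_d}{100} = \frac{67.2 - 31.5}{100} = 0.357 \\
\gamma &= \frac{P_c - P_d}{P_c + P_d} = \frac{35.7}{98.7} \approx 0.362 \\
c &= \frac{P_c + 0.5\,P_t}{100} = \frac{67.2 + 0.65}{100} \approx 0.679
\end{aligned}
$$

Tied pairs count as half concordant in c, which is also why $c = (D_{\text{Somers}} + 1)/2$; the c statistic equals the area under the ROC curve.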

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Label,Odds Ratio,Estimate,Lower 95% Confidence Limit,Upper 95% Confidence Limit
Comparisons of Residential Classification,Res R vs S,0.987,0.915,1.065
Comparisons of Residential Classification 2,Res R vs U,0.989,0.918,1.066
Comparisons of Residential Classification 3,Res U vs S,0.998,0.933,1.068

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect,Unit,Estimate,Lower 95% Confidence Limit,Upper 95% Confidence Limit
DDA 1 vs 0,1.0,0.353,0.322,0.387
DDABal,1000.0,1.077,1.069,1.086
Dep,1.0,0.934,0.91,0.959
DepAmt,1000.0,1.012,1.005,1.02
CashBk,1.0,0.528,0.394,0.693
Checks,1.0,0.999,0.992,1.007
Res R vs S,1.0,0.987,0.915,1.065
Res U vs S,1.0,0.998,0.933,1.068
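The Unit column matters when reading this table. For a continuous input, the odds ratio for a change of $u$ units is $e^{u\hat\beta}$, and here $u = 1000$ was used for DDABal and DepAmt (presumably requested with a UNITS statement, which is not shown in this output). For DDA, whose reference level is 0 in the Class Level Information table, the odds ratio 1 vs 0 is $e^{\hat\beta}$; a comparison between two non-reference levels of Res uses the difference of their estimates:

$$
\begin{aligned}
\text{DDABal},\ u = 1000: &\quad e^{1000 \times 0.000075} = e^{0.075} \approx 1.078 \\
\text{DDA 1 vs 0}: &\quad e^{-1.041} \approx 0.353 \\
\text{Res R vs U}: &\quad e^{-0.0129 - (-0.00191)} = e^{-0.011} \approx 0.989
\end{aligned}
$$

The table shows 1.077 for DDABal, presumably because SAS uses the unrounded estimate rather than the displayed 7.5e-05.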

Obs,P_1,DDA,DDABal,Dep,DepAmt,CashBk,Checks,Res
1,0.27023,1,56.29,2,955.51,0,1,U
2,0.32036,1,3292.17,2,961.6,0,1,U
3,0.29822,1,1723.86,2,2108.65,0,2,U
4,0.54208,0,0.0,0,0.0,0,0,U
5,0.26946,1,67.91,2,519.24,0,3,S
6,0.32228,1,2554.58,1,501.36,0,2,S
7,0.27038,1,0.0,2,2883.08,0,12,R
8,0.30399,1,2641.33,3,4521.61,0,8,S
9,0.54255,0,0.0,0,0.0,0,0,S
10,0.27956,1,52.22,1,75.59,0,0,R

Analysis Variable : P_1
Mean
0.3448203

Obs,P_1,DDA,DDABal,Dep,DepAmt,CashBk,Checks,Res
1,0.27023,1,56.29,2,955.51,0,1,U
2,0.32036,1,3292.17,2,961.6,0,1,U
3,0.29822,1,1723.86,2,2108.65,0,2,U
4,0.54208,0,0.0,0,0.0,0,0,U
5,0.26946,1,67.91,2,519.24,0,3,S
6,0.32228,1,2554.58,1,501.36,0,2,S
7,0.27038,1,0.0,2,2883.08,0,12,R
8,0.30399,1,2641.33,3,4521.61,0,8,S
9,0.54255,0,0.0,0,0.0,0,0,S
10,0.27956,1,52.22,1,75.59,0,0,R

Obs,P_Ins1,DDA,DDABal,Dep,DepAmt,CashBk,Checks,Res
1,0.27023,1,56.29,2,955.51,0,1,U
2,0.32036,1,3292.17,2,961.6,0,1,U
3,0.29822,1,1723.86,2,2108.65,0,2,U
4,0.54208,0,0.0,0,0.0,0,0,U
5,0.26946,1,67.91,2,519.24,0,3,S
6,0.32228,1,2554.58,1,501.36,0,2,S
7,0.27038,1,0.0,2,2883.08,0,12,R
8,0.30399,1,2641.33,3,4521.61,0,8,S
9,0.54255,0,0.0,0,0.0,0,0,S
10,0.27956,1,52.22,1,75.59,0,0,R

Obs,P_1,DDA,DDABal,Dep,DepAmt,CashBk,Checks,Res
1,0.01406,1,56.29,2,955.51,0,1,U
2,0.01783,1,3292.17,2,961.6,0,1,U
3,0.016102,1,1723.86,2,2108.65,0,2,U
4,0.043602,0,0.0,0,0.0,0,0,U
5,0.014007,1,67.91,2,519.24,0,3,S
6,0.017985,1,2554.58,1,501.36,0,2,S
7,0.014071,1,0.0,2,2883.08,0,12,R
8,0.016543,1,2641.33,3,4521.61,0,8,S
9,0.043682,0,0.0,0,0.0,0,0,S
10,0.014724,1,52.22,1,75.59,0,0,R

Analysis Variable : P_1
Mean
0.0249733
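These probabilities are much smaller than the unadjusted ones above because they have been corrected for oversampling. The values match the usual prior-correction formula, assuming a population event rate $\pi_1 = 0.02$ (inferred here by matching the output; it is not stated in this section) and the training-sample event rate $\rho_1 = 7451/21512 \approx 0.3464$:

$$
\tilde p = \frac{\hat p\,\pi_1/\rho_1}{\hat p\,\pi_1/\rho_1 + (1-\hat p)(1-\pi_1)/(1-\rho_1)}
$$

For the first observation, $\hat p = 0.27023$:

$$
\tilde p = \frac{0.27023 \times 0.02/0.3464}{0.27023 \times 0.02/0.3464 + 0.72977 \times 0.98/0.6536} \approx \frac{0.01560}{0.01560 + 1.09421} \approx 0.01406
$$

which is the adjusted P_1 reported for observation 1.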

Obs,prob,DDA,DDABal,Dep,DepAmt,CashBk,Checks,Res
1,0.01406,1,56.29,2,955.51,0,1,U
2,0.01783,1,3292.17,2,961.6,0,1,U
3,0.016102,1,1723.86,2,2108.65,0,2,U
4,0.043602,0,0.0,0,0.0,0,0,U
5,0.014007,1,67.91,2,519.24,0,3,S
6,0.017985,1,2554.58,1,501.36,0,2,S
7,0.014071,1,0.0,2,2883.08,0,12,R
8,0.016543,1,2641.33,3,4521.61,0,8,S
9,0.043682,0,0.0,0,0.0,0,0,S
10,0.014724,1,52.22,1,75.59,0,0,R

Obs,CCBal,CCPurc,Income,HMOwn
1,0.00,1,4,1
2,65.76,0,125,1
3,85202.99,0,55,1
4,.,.,20,0
5,0.00,0,25,1
6,0.00,0,8,1
7,0.00,0,100,1
8,323.13,0,13,1
9,32366.86,0,.,1
10,0.00,0,9,0

Obs,CCBal,MICCBal,CCPurc,MICCPurc,Income,MIIncome,HMOwn,MIHMOwn,nummiss
1,0.0,0,1,0,4,0,1,0,0
2,65.76,0,0,0,125,0,1,0,0
3,85202.99,0,0,0,55,0,1,0,0
4,0.0,1,0,1,20,0,0,0,8
5,0.0,0,0,0,25,0,1,0,8
6,0.0,0,0,0,8,0,1,0,9
7,0.0,0,0,0,100,0,1,0,9
8,323.13,0,0,0,13,0,1,0,9
9,32366.86,0,0,0,35,1,1,0,13
10,0.0,0,0,0,9,0,0,0,13

Obs,Branch,_TYPE_,_FREQ_,prop
1,B1,1,1930,0.36995
2,B10,1,182,0.41758
3,B11,1,160,0.41875
4,B12,1,368,0.36957
5,B13,1,369,0.4065
6,B14,1,712,0.19663
7,B15,1,1510,0.23179
8,B16,1,1040,0.28558
9,B17,1,544,0.34007
10,B18,1,370,0.36757

Eigenvalues of the Covariance Matrix
,Eigenvalue,Difference,Proportion,Cumulative
1,0.00262967,,1.0,1.0

Root-Mean-Square Total-Sample Standard Deviation,0.05128

Root-Mean-Square Distance Between Observations,0.072521

Cluster History
Number of Clusters,Joined Cluster 1,Joined Cluster 2,Freq,Semipartial R-Square,R-Square,Tie
18,B1,B12,2298,0.0,1.0,
17,B18,B9,738,0.0,1.0,
16,B10,B11,342,0.0,1.0,
15,B19,B8,1069,0.0,1.0,
14,CL17,B6,1676,0.0,1.0,
13,B2,B7,4491,0.0,1.0,
12,CL15,B3,2974,0.0001,1.0,
11,CL18,CL14,3974,0.0002,1.0,
10,CL16,B13,711,0.0004,0.999,
9,CL12,B5,4793,0.0006,0.999,

Number of Clusters
5

CLUSNAME,Branch,CLUSTER
B16,B16,5

CLUSNAME,Branch,CLUSTER
CL5,B14,4
CL5,B15,4

CLUSNAME,Branch,CLUSTER
CL6,B10,2
CL6,B11,2
CL6,B19,2
CL6,B8,2
CL6,B3,2
CL6,B13,2
CL6,B5,2

CLUSNAME,Branch,CLUSTER
CL7,B2,3
CL7,B7,3
CL7,B17,3

CLUSNAME,Branch,CLUSTER
CL8,B1,1
CL8,B12,1
CL8,B18,1
CL8,B9,1
CL8,B6,1
CL8,B4,1

Cluster,Variable,1 - RSquare Ratio,Variable Label
Cluster 1,branch_swoe,0.4189,
,MIPhone,0.0042,
,MIPOS,0.0042,
,MIPOSAmt,0.0042,
,MIInv,0.0042,
,MIInvBal,0.0042,
,MICC,0.0042,
,MICCBal,0.0042,
,MICCPurc,0.0042,
Cluster 2,MIIncome,0.0074,

Obs,Number of Clusters,Total Variation Explained by Clusters,Proportion of Variation Explained by Clusters,Minimum Proportion Explained by a Cluster,Maximum Second Eigenvalue in a Cluster,Minimum R-squared for a Variable,Maximum 1-R**2 Ratio for a Variable
1,1,8.839653,0.1449,0.1449,5.021094,0.0,_
2,2,13.846715,0.227,0.1956,3.457352,0.0,1.0000
3,3,17.207611,0.2821,0.1373,2.625736,0.0,1.4229
4,4,19.690396,0.3228,0.1373,2.314577,0.0001,1.4229
5,5,21.919904,0.3593,0.2239,2.059159,0.0001,1.3331
6,6,23.915604,0.3921,0.2239,1.965075,0.0003,1.2875
7,7,25.812779,0.4232,0.2239,1.607659,0.0003,1.2875
8,8,27.29802,0.4475,0.2239,1.476805,0.0003,1.3890
9,9,28.676857,0.4701,0.2239,1.410293,0.0003,1.4518
10,10,30.055834,0.4927,0.2211,1.383226,0.0003,1.4518

Obs,Variable,Spearman rank of variables,Hoeffding rank of variables,Spearman Correlation,Spearman p-value,Hoeffding Correlation,Hoeffding p-value
1,SavBal,1,1,0.2509,<.0001,0.00981,<.0001
2,CD,2,7,0.20283,<.0001,0.00186,<.0001
3,DDA,3,5,-0.19512,<.0001,0.00237,<.0001
4,MM,4,12,0.15949,<.0001,0.00103,<.0001
5,Dep,5,2,-0.15414,<.0001,0.00362,<.0001
6,Sav,6,4,0.15154,<.0001,0.00238,<.0001
7,CC,7,6,0.14636,<.0001,0.00216,<.0001
8,ATM,8,8,-0.1229,<.0001,0.00147,<.0001
9,IRA,9,17,0.1123,<.0001,0.0002,0.0001
10,IRABal,10,18,0.11122,<.0001,0.00018,0.0002

Obs,DDABal,bin
1,1986.81,76
2,1594.84,71
3,1437.57,69
4,190.03,33
5,1772.13,73
6,375.62,42
7,324.94,40
8,13.85,21
9,9644.48,95
10,284.88,38

Obs,bin,_TYPE_,_FREQ_,ins,DDABal
1,0,1,135,12,-32.2597
2,9,1,3994,2161,0.0
3,19,1,173,12,2.8347
4,20,1,215,19,8.8542
5,21,1,215,26,17.2156
6,22,1,215,40,28.3862
7,23,1,216,25,41.2978
8,24,1,215,26,53.9779
9,25,1,215,32,68.166
10,26,1,215,45,83.2786

Checking Account,N Obs,Variable,Label,Mean,Median,Minimum,Maximum
0,3968,DDABal,Checking Balance,0,0,0,0
,,Ins,,0.5415827,1.0000000,0,1.0000000
1,17544,DDABal,Checking Balance,2713.45,890.5900000,-774.8300000,278093.83
,,Ins,,0.3022116,0,0,1.0000000

Obs,bin,_TYPE_,_FREQ_,max
1,0,1,215,1.75
2,1,1,215,8.45
3,2,1,214,16.44
4,3,1,216,27.54
5,4,1,215,41.08
6,5,1,215,53.01
7,6,1,215,66.42
8,7,1,216,82.96
9,8,1,215,96.55
10,9,1,215,111.59

Analysis Variable : DDABal Checking Balance
B_DDABal,N Obs,Minimum,Maximum
0,215,-774.83,1.75
1,215,1.89,8.45
2,214,8.46,16.44
3,216,16.46,27.54
4,215,27.62,41.08
5,215,41.19,53.01
6,215,53.02,66.42
7,216,66.5,82.96
8,215,83.06,96.55
9,215,96.62,111.59

0.001586

Model Information
Data Set,WORK.TRAIN_IMPUTED_SWOE_BINS
Response Variable,Ins
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

Number of Observations Read,21512
Number of Observations Used,21512

Response Profile
Ordered Value,Ins,Total Frequency
1,0,14061
2,1,7451

Class Level Information
Class,Value,Design Variable 1,Design Variable 2
Res,R,1,0
,S,0,0
,U,0,1

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,23374.668
SC,27767.651,23613.959
-2 Log L,27757.675,23314.668

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,4443.0066,29,<.0001
Score,4025.8134,29,<.0001
Wald,3240.8442,29,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
1519.1514,384,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,23042.784
SC,27767.651,23290.051
-2 Log L,27757.675,22980.784

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,4776.8913,30,<.0001
Score,4228.8323,30,<.0001
Wald,3415.5578,30,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
1030.7985,383,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22982.529
SC,27767.651,23237.773
-2 Log L,27757.675,22918.529

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,4839.1455,31,<.0001
Score,4249.0303,31,<.0001
Wald,3482.1332,31,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
945.1677,382,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22925.941
SC,27767.651,23189.161
-2 Log L,27757.675,22859.941

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,4897.7342,32,<.0001
Score,4280.9816,32,<.0001
Wald,3500.4542,32,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
870.2437,381,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22873.358
SC,27767.651,23144.554
-2 Log L,27757.675,22805.358

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,4952.3173,33,<.0001
Score,4301.4385,33,<.0001
Wald,3508.8916,33,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
869.2315,380,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22828.869
SC,27767.651,23108.041
-2 Log L,27757.675,22758.869

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,4998.8063,34,<.0001
Score,4307.7237,34,<.0001
Wald,3469.6034,34,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
850.2275,379,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22811.515
SC,27767.651,23098.664
-2 Log L,27757.675,22739.515

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5018.1599,35,<.0001
Score,4311.6398,35,<.0001
Wald,3467.6629,35,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
809.9998,378,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22791.943
SC,27767.651,23087.069
-2 Log L,27757.675,22717.943

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5039.7318,36,<.0001
Score,4343.237,36,<.0001
Wald,3474.7437,36,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
758.5702,377,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22778.653
SC,27767.651,23081.755
-2 Log L,27757.675,22702.653

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5055.0223,37,<.0001
Score,4384.3023,37,<.0001
Wald,3415.373,37,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
747.8028,376,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22752.814
SC,27767.651,23063.892
-2 Log L,27757.675,22674.814

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5082.8613,38,<.0001
Score,4402.1131,38,<.0001
Wald,3432.3592,38,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
716.8324,375,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22742.376
SC,27767.651,23061.431
-2 Log L,27757.675,22662.376

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5095.2985,39,<.0001
Score,4409.0129,39,<.0001
Wald,3431.1292,39,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
695.8205,374,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22729.148
SC,27767.651,23056.179
-2 Log L,27757.675,22647.148

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5110.5274,40,<.0001
Score,4421.238,40,<.0001
Wald,3428.7744,40,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
680.1339,373,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22713.192
SC,27767.651,23048.199
-2 Log L,27757.675,22629.192

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5128.483,41,<.0001
Score,4448.6802,41,<.0001
Wald,3425.5025,41,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
661.5405,372,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22700.945
SC,27767.651,23043.928
-2 Log L,27757.675,22614.945

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5142.7304,42,<.0001
Score,4460.5014,42,<.0001
Wald,3425.2923,42,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
645.6204,371,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22689.322
SC,27767.651,23040.282
-2 Log L,27757.675,22601.322

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5156.353,43,<.0001
Score,4475.15,43,<.0001
Wald,3415.5694,43,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
630.2558,370,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22677.014
SC,27767.651,23035.951
-2 Log L,27757.675,22587.014

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5170.6605,44,<.0001
Score,4485.4356,44,<.0001
Wald,3423.071,44,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
615.8777,369,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22665.286
SC,27767.651,23032.199
-2 Log L,27757.675,22573.286

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5184.3889,45,<.0001
Score,4495.803,45,<.0001
Wald,3425.555,45,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
601.8428,368,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22654.661
SC,27767.651,23029.55
-2 Log L,27757.675,22560.661

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5197.0137,46,<.0001
Score,4504.7703,46,<.0001
Wald,3440.1085,46,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
585.8285,367,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22645.631
SC,27767.651,23028.497
-2 Log L,27757.675,22549.631

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5208.0438,47,<.0001
Score,4513.099,47,<.0001
Wald,3444.0515,47,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
577.4575,366,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22636.784
SC,27767.651,23027.626
-2 Log L,27757.675,22538.784

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5218.8909,48,<.0001
Score,4518.6104,48,<.0001
Wald,3450.5367,48,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
566.0713,365,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22627.86
SC,27767.651,23026.678
-2 Log L,27757.675,22527.86

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5229.8152,49,<.0001
Score,4522.7746,49,<.0001
Wald,3448.0013,49,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
559.5328,364,<.0001

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,22619.644
SC,27767.651,23026.439
-2 Log L,27757.675,22517.644

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,5240.031,50,<.0001
Score,4539.5156,50,<.0001
Wald,3457.4404,50,<.0001

Residual Chi-Square Test
Chi-Square,DF,Pr > ChiSq
551.3525,363,<.0001

Summary of Forward Selection
Step,Effect Entered,DF,Number In,Score Chi-Square,Pr > ChiSq,Variable Label
1,SavBal*B_DDABal,1,29,342.3565,<.0001,
2,SavBal*DDA,1,30,75.6237,<.0001,
3,MM*B_DDABal,1,31,61.2113,<.0001,
4,branch_swoe*ATMAmt,1,32,55.0544,<.0001,
5,Sav*B_DDABal,1,33,46.3236,<.0001,
6,ATMAmt*DepAmt,1,34,36.9443,<.0001,
7,SavBal*SDB,1,35,28.9771,<.0001,
8,SavBal*ATMAmt,1,36,24.4441,<.0001,
9,B_DDABal*ATMAmt,1,37,28.2743,<.0001,
10,SavBal*IRA,1,38,18.1867,<.0001,

Type 3 Analysis of Effects,Type 3 Analysis of Effects,Type 3 Analysis of Effects,Type 3 Analysis of Effects
Effect,DF,Wald Chi-Square,Pr > ChiSq
SavBal,1,299.0089,<.0001
Dep,1,3.2378,0.0720
DDA,1,7.9296,0.0049
CD,1,219.7287,<.0001
Sav,1,121.8246,<.0001
CC,1,6.1556,0.0131
ATM,1,0.2451,0.6205
MM,1,142.9057,<.0001
branch_swoe,1,129.4792,<.0001
Phone,1,3.3792,0.0660

Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates
Parameter,Unnamed: 1_level_1,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq
Intercept,,1,-2.0116,0.1406,204.7488,<.0001
SavBal,,1,0.000167,9.646e-06,299.0089,<.0001
Dep,,1,-0.0457,0.0254,3.2378,0.0720
DDA,,1,-0.1794,0.0637,7.9296,0.0049
CD,,1,1.1361,0.0766,219.7287,<.0001
Sav,,1,1.01,0.0915,121.8246,<.0001
CC,,1,0.1248,0.0503,6.1556,0.0131
ATM,,1,0.0314,0.0633,0.2451,0.6205
MM,,1,1.7606,0.1473,142.9057,<.0001
branch_swoe,,1,0.9275,0.0815,129.4792,<.0001

Association of Predicted Probabilities and Observed Responses,Association of Predicted Probabilities and Observed Responses.1,Association of Predicted Probabilities and Observed Responses.2,Association of Predicted Probabilities and Observed Responses.3
Percent Concordant,78.9,Somers' D,0.578
Percent Discordant,21.1,Gamma,0.578
Percent Tied,0.0,Tau-a,0.262
Pairs,104768511.0,c,0.789

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect,Unit,Estimate,95% Confidence Limits,95% Confidence Limits.1
Phone,1.0,0.966,0.931,1.002
ILS,1.0,0.781,0.67,0.909
POS,1.0,1.0,0.987,1.014
CCPurc,1.0,0.989,0.914,1.071
CCBal,1.0,1.0,1.0,1.0
Inv,1.0,1.552,1.281,1.886
InArea,1.0,0.957,0.813,1.127
Age,1.0,0.999,0.997,1.002
MICRScor,1.0,0.962,0.776,1.189
Income,1.0,1.001,0.999,1.002

Model Information

| Item | Value |
|---|---|
| Data Set | WORK.TRAIN_IMPUTED_SWOE_BINS |
| Response Variable | Ins |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

| Observations | Count |
|---|---|
| Number of Observations Read | 21512 |
| Number of Observations Used | 21512 |

Response Profile

| Ordered Value | Ins | Total Frequency |
|---|---|---|
| 1 | 0 | 14061 |
| 2 | 1 | 7451 |

Class Level Information

| Class | Value | Design Variable 1 | Design Variable 2 |
|---|---|---|---|
| Res | R | 1 | 0 |
| | S | 0 | 0 |
| | U | 0 | 1 |

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 27759.675 | 22619.644 |
| SC | 27767.651 | 23026.439 |
| -2 Log L | 27757.675 | 22517.644 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 5240.031 | 50 | <.0001 |
| Score | 4539.5156 | 50 | <.0001 |
| Wald | 3457.4404 | 50 | <.0001 |

Analysis of Effects Removed by Fast Backward Elimination

| Effect Removed | Chi-Square | DF | Pr > ChiSq | Residual Chi-Square | Residual DF | Pr > Residual ChiSq |
|---|---|---|---|---|---|---|
| POS | 0.0017 | 1 | 0.9673 | 0.0017 | 1 | 0.9673 |
| Res | 0.3178 | 2 | 0.8531 | 0.3195 | 3 | 0.9563 |
| CCPurc | 0.0713 | 1 | 0.7895 | 0.3907 | 4 | 0.9832 |
| MICRScor | 0.1243 | 1 | 0.7244 | 0.5151 | 5 | 0.9916 |
| Age | 0.2614 | 1 | 0.6091 | 0.7765 | 6 | 0.9927 |
| InArea | 0.2819 | 1 | 0.5954 | 1.0584 | 7 | 0.9938 |
| Income | 1.0045 | 1 | 0.3162 | 2.0629 | 8 | 0.979 |
| Phone | 3.4393 | 1 | 0.0637 | 5.5023 | 9 | 0.7885 |
| CCBal | 4.2769 | 1 | 0.0386 | 9.7791 | 10 | 0.4601 |
| MM\*IRABal | 8.6585 | 1 | 0.0033 | 18.4377 | 11 | 0.072 |

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 27759.675 | 22619.337 |
| SC | 27767.651 | 22930.415 |
| -2 Log L | 27757.675 | 22541.337 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 5216.3382 | 38 | <.0001 |
| Score | 4520.7969 | 38 | <.0001 |
| Wald | 3448.4351 | 38 | <.0001 |

Residual Chi-Square Test

| Chi-Square | DF | Pr > ChiSq |
|---|---|---|
| 28.3337 | 12 | 0.0049 |

Summary of Backward Elimination

| Step | Effect Removed | DF | Number In | Wald Chi-Square | Pr > ChiSq | Variable Label |
|---|---|---|---|---|---|---|
| 1 | POS | 1 | 48 | 0.0017 | 0.9673 | Number Point of Sale |
| 1 | Res | 2 | 47 | 0.3178 | 0.8531 | Area Classification |
| 1 | CCPurc | 1 | 46 | 0.0713 | 0.7895 | Credit Card Purchases |
| 1 | MICRScor | 1 | 45 | 0.1243 | 0.7244 | |
| 1 | Age | 1 | 44 | 0.2614 | 0.6091 | Age |
| 1 | InArea | 1 | 43 | 0.2819 | 0.5954 | Local Address |
| 1 | Income | 1 | 42 | 1.0045 | 0.3162 | Income |
| 1 | Phone | 1 | 41 | 3.4393 | 0.0637 | Number Telephone Banking |
| 1 | CCBal | 1 | 40 | 4.2769 | 0.0386 | Credit Card Balance |
| 1 | MM\*IRABal | 1 | 39 | 8.6585 | 0.0033 | |

Type 3 Analysis of Effects

| Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|
| SavBal | 1 | 298.8926 | <.0001 |
| Dep | 1 | 4.2465 | 0.0393 |
| DDA | 1 | 8.3313 | 0.0039 |
| CD | 1 | 222.9731 | <.0001 |
| Sav | 1 | 121.9672 | <.0001 |
| CC | 1 | 5.1065 | 0.0238 |
| ATM | 1 | 0.2032 | 0.6522 |
| MM | 1 | 142.6023 | <.0001 |
| branch_swoe | 1 | 128.1414 | <.0001 |
| IRA | 1 | 32.2143 | <.0001 |

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -2.0561 | 0.0996 | 426.2679 | <.0001 |
| SavBal | 1 | 0.000166 | 9.605e-06 | 298.8926 | <.0001 |
| Dep | 1 | -0.0517 | 0.0251 | 4.2465 | 0.0393 |
| DDA | 1 | -0.181 | 0.0627 | 8.3313 | 0.0039 |
| CD | 1 | 1.1427 | 0.0765 | 222.9731 | <.0001 |
| Sav | 1 | 1.0092 | 0.0914 | 121.9672 | <.0001 |
| CC | 1 | 0.1103 | 0.0488 | 5.1065 | 0.0238 |
| ATM | 1 | 0.028 | 0.0622 | 0.2032 | 0.6522 |
| MM | 1 | 1.7524 | 0.1467 | 142.6023 | <.0001 |
| branch_swoe | 1 | 0.9132 | 0.0807 | 128.1414 | <.0001 |

Association of Predicted Probabilities and Observed Responses

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Percent Concordant | 78.9 | Somers' D | 0.577 |
| Percent Discordant | 21.1 | Gamma | 0.578 |
| Percent Tied | 0.0 | Tau-a | 0.261 |
| Pairs | 104768511 | c | 0.789 |

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

| Effect | Unit | Estimate | Lower 95% CL | Upper 95% CL |
|---|---|---|---|---|
| ILS | 1.0 | 0.772 | 0.663 | 0.898 |
| Inv | 1.0 | 1.53 | 1.263 | 1.857 |

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals

| Odds Ratio | Estimate | Lower 95% CL | Upper 95% CL |
|---|---|---|---|
| B_DDABal at SavBal=0 Sav=0.4684 MM=0.1137 IRA=0.0529 ATMAmt=1252.6 | 1.02 | 1.019 | 1.021 |
| B_DDABal at SavBal=1211 Sav=0.4684 MM=0.1137 IRA=0.0529 ATMAmt=1252.6 | 1.018 | 1.017 | 1.019 |
| B_DDABal at SavBal=52299 Sav=0.4684 MM=0.1137 IRA=0.0529 ATMAmt=1252.6 | 0.935 | 0.923 | 0.946 |

Model Information

| Item | Value |
|---|---|
| Data Set | WORK.TRAIN_IMPUTED_SWOE_BINS |
| Response Variable | Ins |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

| Observations | Count |
|---|---|
| Number of Observations Read | 21512 |
| Number of Observations Used | 21512 |

Response Profile

| Ordered Value | Ins | Total Frequency |
|---|---|---|
| 1 | 0 | 14061 |
| 2 | 1 | 7451 |

Regression Models Selected by Score Criterion

| Number of Variables | Score Chi-Square | Variables Included in Model |
|---|---|---|
| 1 | 1872.2915 | B_DDABal |
| 2 | 2490.9205 | CD B_DDABal |
| 3 | 2935.4526 | SavBal CD B_DDABal |
| 4 | 3240.0673 | SavBal CD B_DDABal SavBal\*B_DDABal |
| 5 | 3501.6203 | SavBal CD branch_swoe B_DDABal SavBal\*B_DDABal |
| 6 | 3731.896 | SavBal CD Sav MM branch_swoe B_DDABal |
| 7 | 3952.2174 | SavBal CD Sav MM branch_swoe B_DDABal SavBal\*B_DDABal |
| 8 | 4039.4342 | SavBal Dep CD Sav MM branch_swoe B_DDABal SavBal\*B_DDABal |
| 9 | 4086.6501 | SavBal Dep CD Sav MM branch_swoe IRA B_DDABal SavBal\*B_DDABal |
| 10 | 4129.3415 | SavBal CD Sav MM branch_swoe IRA B_DDABal SavBal\*B_DDABal DDA\*ATMAmt Dep\*ATM |

| Obs | model | AUC | AIC | BIC | MisClass | AdjRSquare | BrierScore |
|---|---|---|---|---|---|---|---|
| 1 | 35 | 0.788365 | 50279.66 | 50566.81 | 0.3398 | 0.354277 | 0.30715 |
| 2 | 30 | 0.787561 | 50321.71 | 50568.98 | 0.34 | 0.352545 | 0.307317 |
| 3 | 33 | 0.788091 | 50299.66 | 50570.86 | 0.3399 | 0.353479 | 0.307291 |
| 4 | 36 | 0.788464 | 50275.9 | 50571.03 | 0.3398 | 0.354468 | 0.307156 |
| 5 | 28 | 0.787377 | 50340.06 | 50571.38 | 0.3399 | 0.3518 | 0.307333 |
| 6 | 27 | 0.787237 | 50348.13 | 50571.47 | 0.3396 | 0.351464 | 0.307294 |
| 7 | 37 | 0.788533 | 50271.74 | 50574.84 | 0.3397 | 0.354673 | 0.307166 |
| 8 | 34 | 0.788171 | 50295.93 | 50575.11 | 0.3399 | 0.353669 | 0.307297 |
| 9 | 32 | 0.788012 | 50313.5 | 50576.72 | 0.3399 | 0.352952 | 0.307405 |
| 10 | 26 | 0.787156 | 50364.17 | 50579.53 | 0.3397 | 0.350862 | 0.307379 |

Variables Included in Model
`SavBal DDA CD Sav MM branch_swoe IRA IRABal B_DDABal ATMAmt ILS NSF SDB CCBal Inv SavBal*B_DDABal MM*B_DDABal branch_swoe*ATMAmt Sav*B_DDABal SavBal*SDB SavBal*DDA ATMAmt*DepAmt B_DDABal*ATMAmt SavBal*ATMAmt SavBal*IRA SavBal*MM SavBal*CC Sav*NSF DDA*ATMAmt Dep*ATM IRA*B_DDABal CD*MM MM*IRABal CD*Sav Sav*CC`


In [4]:
title1 "Variables with Missing Values on the Validation Data Set";
proc means data=work.valid nmiss;
   var SavBal DDA CD Sav MM IRA IRABal ATMAmt ILS NSF SDB CCBal Inv
       DepAmt Dep ATM CC;
run;

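/* Compute the medians on the TRAINING data so that the validation
   data are imputed with training-based values */
proc univariate data=work.train_imputed_swoe_bins noprint;
   var cc ccbal inv;
   output out=work.medians
          pctlpts=50
          pctlpre=cc ccbal inv;   /* creates cc50 ccbal50 inv50 */
run;

/* Apply the same preparation steps to the validation data */
data work.valid_imputed_swoe_bins(drop=cc50 ccbal50 inv50 i);
   if _N_=1 then set work.medians;   /* read the training medians once */
   set work.valid;
   array x(*) cc ccbal inv;
   array med(*) cc50 ccbal50 inv50;
   do i=1 to dim(x);                 /* replace missing values with the medians */
      if x(i)=. then x(i)=med(i);
   end;
   %include brswoe;                  /* code saved earlier that creates branch_swoe */
   if not dda then ddabal=&mean;     /* same DDABal recoding as on the training data */
   %include rank;                    /* code saved earlier that creates the binned B_DDABal */
run;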

| Variable | Label | N Miss |
|---|---|---|
| SavBal | Saving Balance | 0 |
| DDA | Checking Account | 0 |
| CD | Certificate of Deposit | 0 |
| Sav | Saving Account | 0 |
| MM | Money Market | 0 |
| IRA | Retirement Account | 0 |
| IRABal | IRA Balance | 0 |
| ATMAmt | ATM Withdrawal Amount | 0 |
| ILS | Installment Loan | 0 |
| NSF | Number Insufficient Fund | 0 |
| SDB | Safety Deposit Box | 0 |
| CCBal | Credit Card Balance | 1350 |
| Inv | Investment | 1350 |
| DepAmt | Amount Deposited | 0 |
| Dep | Checking Deposits | 0 |
| ATM | ATM | 0 |
| CC | Credit Card | 1350 |


### 4.2 Common Metrics for Model Performance

You know the importance of doing an honest assessment of your model on a validation data set. Now you need to decide which model performance measures to use. In this section, you learn how to:

* describe several model performance measures
* adjust the confusion matrix for oversampling
* create ROC curves, gains charts, and lift charts on the validation data set

#### Understanding the Confusion Matrix
Typically, you use predictive models to categorize cases using a cutoff (or decision boundary). For example, suppose you're fitting a model to predict fraud. If the probability of fraud is above the cutoff, you'll investigate it. If it's below the cutoff, you will not investigate it. For a given cutoff, you need to assess how well your predictive model performs. The fundamental assessment tool for model performance is the confusion matrix. The confusion matrix is simply a cross tabulation of predicted classes and actual classes. It quantifies the confusion of the classifier. To create a confusion matrix, you need to know the cutoff you are using. Later in this lesson, you learn a formula for determining an optimal cutoff.

Let's take a look at the structure of a confusion matrix. The column on the left represents the cases with a predicted class of 0—in other words, the cases with a predicted probability that is below the cutoff. The column on the right represents the cases with a predicted class of 1—in other words, the cases with a predicted probability above the cutoff. The top row represents the cases with the actual class of 0—in other words, the cases that have an actual target variable value of 0, meaning the target event did not occur. The bottom row represents the cases with an actual class of 1—in other words, the cases that have an actual target variable value of 1, meaning the target event did occur. The target event might be unfavorable (like fraud or churn) or favorable (like a response to an offer). Whether it is unfavorable or favorable, the target event is typically called a positive when classified as 1. It is called a negative when it is classified as 0. So, a case with a predicted class of 0 is a predicted negative; a case with a predicted class of 1 is a predicted positive; a case with an actual class of 0 is an actual negative; and a case with an actual class of 1 is an actual positive.

Let's use the target marketing business scenario at the bank to explore how the confusion matrix works. Each case has one of four possible outcomes:

* The customer has a predicted class of 1 and actually bought the variable annuity. The case is placed in the bottom-right quadrant and is a true positive.
* The customer has a predicted class of 1 but did not buy a variable annuity. The case is placed in the top-right quadrant and is a false positive.
* The customer has a predicted class of 0 and actually did not buy a variable annuity. The case is placed in the top-left quadrant and is a true negative.
* The customer has a predicted class of 0 but actually bought a variable annuity. The case is placed in the bottom-left quadrant and is a false negative.
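One way to see this layout in SAS is to cross-tabulate actual against predicted classes with PROC FREQ. This is a minimal sketch, not part of the course code. It assumes the scored validation data set work.scoval that the demonstration later in this section creates with the SCORE statement, in which P_1 holds the predicted probability that Ins=1. The cutoff of 0.5 and the names work.scoval_class and pred_class are hypothetical, chosen only for illustration; the formula for an optimal cutoff comes later in this lesson.

```sas
/* Hypothetical cutoff, chosen only for illustration */
%let cutoff=0.5;

data work.scoval_class;            /* hypothetical data set name */
   set work.scoval;
   /* Predicted class 1 when the predicted probability exceeds the cutoff */
   pred_class = (P_1 > &cutoff);
run;

/* Rows = actual class (Ins), columns = predicted class:
   top-left TN, top-right FP, bottom-left FN, bottom-right TP */
proc freq data=work.scoval_class;
   tables ins*pred_class / norow nocol nopercent;
run;
```

Because PROC FREQ orders the values 0 and then 1, the rows and columns follow the quadrants described above. On oversampled validation data, these counts still need the adjustment for oversampling described later in this section.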

You can use the confusion matrix to calculate statistics that measure model performance. Some of the most common statistics are accuracy, error rate, sensitivity, positive predicted value, specificity, and negative predicted value. The simplest statistics for measuring the performance of a model are accuracy and error rate. The accuracy of the model equals the number of true positives plus the number of true negatives over the total number of cases. The error rate of the model equals the number of false positives plus the number of false negatives over the total number of cases. That is equivalent to 1 − the accuracy. Two specialized measures of classifier model performance are sensitivity and positive predicted value (also called PV+). You calculate sensitivity by dividing the true positives by the total actual positives. You calculate positive predicted value by dividing the true positives by the total predicted positives. Specificity and negative predicted value (also called PV-) are measurements involving true negatives. You calculate specificity by dividing the true negatives by the total actual negatives. You calculate negative predicted value by dividing the true negatives by the total predicted negatives.
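In symbols, with TP, TN, FP, and FN denoting the cells of the confusion matrix and N = TP + TN + FP + FN, the statistics described above are:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{N}, & \text{Error rate} &= \frac{FP + FN}{N} = 1 - \text{Accuracy},\\
\text{Sensitivity} &= \frac{TP}{TP + FN}, & PV^{+} &= \frac{TP}{TP + FP},\\
\text{Specificity} &= \frac{TN}{TN + FP}, & PV^{-} &= \frac{TN}{TN + FN}.
\end{aligned}
$$

The denominators show the key difference: sensitivity and specificity divide by an actual class total, whereas PV+ and PV- divide by a predicted class total.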

You might be wondering which of these measures will be the most useful to you. The answer depends on the context of your problem. For example, if you are working on a target marketing project, you might be most concerned with getting a high positive predicted value because you want to maximize the response rate for customers who receive an offer. However, suppose you are working on a fraud detection project. In this case, you might be most concerned with obtaining high sensitivity such that the model is detecting a high proportion of the actual fraud cases.

#### Measuring Performance across Cutoffs by Using the ROC Curve
You know how to use model performance measures, like sensitivity and specificity, to calculate model performance for a specified cutoff. However, what if you don't know which cutoff to use? You can measure model performance across all cutoffs by using the receiver-operating characteristic curve, also called the ROC curve. The ROC curve was adapted from signal detection theory for the assessment of classifiers. The ROC curve displays the sensitivity (also known as the true positive rate) and 1 minus specificity (also known as the false positive rate) for the entire range of cutoffs.

Let's look at the structure of an ROC curve and how it works. On the ROC curve graph, sensitivity is on the Y axis, and 1 minus specificity is on the X axis. Remember that sensitivity equals the true positives divided by the total actual positives. Suppose you have 1,000 actual positives. If you have zero true positives, then you have a sensitivity of 0. If all 1,000 of the actual positives are also true positives, then you have a sensitivity of 1. This is why the Y axis goes from 0 to 1.

Now, let's look at the X axis, which represents 1 minus the specificity. Remember that specificity equals the true negatives divided by the total actual negatives. Suppose you have 1,000 actual negatives. If you have zero true negatives, then you have a specificity of 0. One minus a specificity of 0 equals 1, so a specificity of 0 is located at 1 on the X axis. If all 1,000 of the actual negatives are also true negatives, then you have a specificity of 1. One minus a specificity of 1 equals 0, so a specificity of 1 is located at 0 on the X axis. This is why the X axis goes from 0 to 1. Remember that the ROC curve displays the sensitivity and 1 minus specificity for the entire range of cutoff values.

You know that predicted probabilities are constrained to be between 0 and 1, so the range of all possible cutoff values is 0 to 1. At the lowest possible cutoff of 0, all cases have a predicted probability above the cutoff, so all cases are in predicted class 1. Because all of the actual positives are also true positives, the sensitivity is 1. Because there are zero true negatives, the specificity is 0, and 1 minus a specificity of 0 is 1, so the lowest possible cutoff of 0 is located at coordinate (1,1). At the highest possible cutoff of 1, all cases have a predicted probability below the cutoff, so all cases are in predicted class 0. Because there are zero true positives, the sensitivity is 0. Because all of the actual negatives are also true negatives, the specificity is 1, and 1 minus a specificity of 1 is 0, so the highest possible cutoff of 1 is located at coordinate (0,0).

As the cutoff decreases, more cases are allocated to class 1, the sensitivity increases, and the specificity decreases. As the cutoff increases, more cases are allocated to class 0, the sensitivity decreases, and the specificity increases.

If the posterior probabilities were arbitrarily assigned to the cases, then the ratio of false positives to true positives would be the same as the ratio of the total actual negatives to the total actual positives. In that case, sensitivity equals 1 minus specificity at every cutoff, so the ROC curve follows a 45-degree baseline from (0,0) to (1,1). The area under this baseline, which is what the c statistic measures, is 0.50: the model predicts classes no better than flipping a coin. The more the ROC curve bulges away from the 45-degree line, the more accurately the model predicts compared to a random model. A model that predicted perfectly would have an ROC curve that reaches the point (0,1), so you want your model's ROC curve to bulge as much as possible toward (0,1).
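PROC LOGISTIC displays the ROC curve for you, but drawing it from an OUTROC= data set makes the axes and the random-model baseline explicit. This is a minimal sketch that assumes the work.roc data set produced by the OUTROC= option in the demonstration later in this section (variables _SENSIT_ and _1MSPEC_); the title and axis settings are arbitrary choices.

```sas
/* Sketch: plot an ROC curve from an OUTROC= data set (assumed: work.roc) */
title1 "ROC Curve for Validation Data";
proc sgplot data=work.roc aspect=1;
   /* OUTROC= rows run from the highest to the lowest cutoff,
      so 1 - specificity increases along the series */
   series x=_1MSPEC_ y=_SENSIT_;
   /* 45-degree baseline: a model no better than random assignment */
   lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);
   xaxis label="1 - Specificity" values=(0 to 1 by 0.1);
   yaxis label="Sensitivity" values=(0 to 1 by 0.1);
run;
title1;
```

The dashed line corresponds to an area under the curve of 0.50; the further the solid curve bulges toward the upper-left corner, the larger the c statistic.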

#### Choosing Depth by Using the Gains Chart
One widely used measure of model performance is the gains chart. The cumulative gains chart displays the positive predicted value on the Y axis and the depth on the X axis. Depth equals the total percentage of cases that are allocated to class 1. The gains chart is widely used in target marketing to decide what percentage of cases to solicit, in other words, how deep in the database to go with a promotion. Cutoff values range from the highest at the far left to the lowest at the far right. If the cutoff is 1, then all cases are classified as 0, and so the depth equals 0%. If the cutoff is 0, then all cases are classified as 1, so the depth equals 100%. So, as the cutoff increases, the depth decreases. The marginal rate equals the proportion of events in the sample adjusted to the true population event rate, such as the rate of response to a promotion. If you had a random model that arbitrarily assigned posterior probabilities to cases, then the gains chart would be a horizontal line at the marginal rate. If your model's gains chart follows the marginal rate, then your model does not predict any better than a random model. The simplest way to construct this curve is to sort the cases by predicted posterior probability and bin them (for example, into deciles). The gains chart is easily augmented with revenue and cost information.
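The sort-and-bin construction can be sketched as follows: bin the scored validation cases into deciles of predicted probability and compute the cumulative response rate (PV+) as the depth grows. This assumes the scored data set work.scoval with P_1 and Ins, as created in the demonstration later in this section; work.ranked, work.gains, decile, and the other new variable names are hypothetical. Because the rates are computed directly on the sample, they are not adjusted for oversampling; the demonstration later in this section derives adjusted PV+ and lift from the OUTROC= data set instead.

```sas
/* Sketch only: rates are computed on the (oversampled) sample as-is.
   work.ranked, work.gains, and the new variable names are hypothetical. */

/* Bin cases into deciles of predicted probability;
   with DESCENDING, decile 0 holds the highest probabilities */
proc rank data=work.scoval out=work.ranked groups=10 descending;
   var P_1;
   ranks decile;
run;

/* Number of cases and response rate (mean of the 0/1 target) per decile */
proc means data=work.ranked noprint nway;
   class decile;
   var ins;
   output out=work.gains(drop=_type_ _freq_) n=cases mean=resp_rate;
run;

/* Cumulative response rate (PV+) as depth increases */
data work.gains;
   set work.gains;
   cum_cases + cases;                 /* sum statements retain across rows */
   cum_resp + resp_rate*cases;        /* cumulative number of responders */
   cum_pospv = cum_resp / cum_cases;
   approx_depth = (decile + 1) / 10;  /* ties can make deciles uneven */
run;

proc print data=work.gains noobs;
run;
```

For a model with good predictive power, cum_pospv should be highest in the first rows and decline toward the overall response rate as approx_depth approaches 1.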

For a model with good predictive power, the positive predicted value increases as the depth decreases, creating a curve shaped like a steep ski slope. For example, in target marketing, a gains chart with a ski slope shape shows that people with the highest probabilities have the highest response rates. A lift chart is a variation of the gains chart. The lift equals the positive predicted value divided by the marginal rate. For a given depth, there are lift times more responders targeted by the model than by random chance. For example, suppose you want to target the top 10% of cases, and your lift chart shows that you will get a lift of 4. That means that when you target the top 10%, you get four times more responders than if you took a random sample of 10% of the cases. Just like with a regular gains chart, a lift chart for a model with good predictive power has a curve shaped like a steep ski slope. This shape shows that cases with the highest predicted probabilities have the highest lift. You can use either a gains chart or a lift chart to show how well your model generalizes.
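In symbols, with π1 denoting the population event rate (the marginal rate) and d the depth:

$$
\text{Lift}(d) = \frac{PV^{+}(d)}{\pi_1}
$$

For instance, with a hypothetical π1 = 0.02, a lift of 4 at a depth of 10% corresponds to PV+ = 4 × 0.02 = 0.08: 8% of the targeted cases respond, compared with 2% in a random 10% sample.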

A plot of sensitivity versus depth is sometimes called a Lorenz curve, concentration curve, or a lift curve (although the lift value is not explicitly displayed).

#### Effects of Oversampled Data on Performance Measures
If you create a validation data set by splitting oversampled data, then the validation data is also a biased sample. You know that if you correct for oversampling when fitting a model, it will generate correct predicted posterior probabilities. But does oversampling affect model performance measures? Let's find out by looking at an example that compares two confusion matrices: one for a population and one for an oversampled sample. Each confusion matrix displays the proportions (in percent) of true negatives, true positives, false negatives, and false positives, as well as the proportions of predicted negatives, predicted positives, actual negatives, and actual positives. In the population, 97% of the cases are actual negatives and 3% are actual positives; that is, the population is made up of 97% non-events and 3% events. The events were oversampled so that the sample is made up of 50% non-events and 50% events. Clearly, the sample is biased.

Let's calculate several performance measures to see which are affected by the bias. First, let's look at sensitivity and specificity. Remember, sensitivity equals the true positives divided by the total actual positives. For the population, the sensitivity is 2 divided by 3 (about 0.67); for the sample, it is 33 divided by 50 (0.66). The two are equal up to the rounding of the cell percentages. Specificity equals the true negatives divided by the total actual negatives. For the population, the specificity is 56 divided by 97 (about 0.58); for the sample, it is 29 divided by 50 (0.58). So, oversampling does not affect sensitivity or specificity. Both are computed within a single actual class, and oversampling changes only the relative sizes of the classes, not the distribution of predictions within each class. Because the ROC curve relies on sensitivity and specificity, it also is not affected by oversampling.

Now, let's look at positive predicted value and negative predicted value. Remember that positive predicted value equals the true positives divided by the total predicted positives. For the population, it is 2 divided by 43 (about 0.05); for the sample, it is 33 divided by 54 (about 0.61). The negative predicted value equals the true negatives divided by the total predicted negatives. For the population, it is 56 divided by 57 (about 0.98); for the sample, it is 29 divided by 46 (about 0.63). As you can see, oversampling does affect positive and negative predicted values, because each is computed within a predicted class, which mixes cases from both actual classes in proportions that oversampling changes. Because gains charts and lift charts rely on positive predicted values, they are also affected by oversampling. Before you create gains charts and lift charts, you need to adjust the confusion matrix for oversampling.

#### Adjusting a Confusion Matrix for Oversampling
If you oversampled, you need to adjust your confusion matrix so that it matches your population. To do this, you need to know the population proportions of events (π1) and non-events (π0 = 1 − π1). You also need to know the sensitivity (represented by Se) and the specificity (represented by Sp), which are not affected by oversampling.

To get the true proportion of true positives, you multiply π1 by the sensitivity. The true proportion of true negatives equals π0 times specificity; the true proportion of false positives equals π0 times 1 minus specificity; and the true proportion of false negatives is π1 times 1 minus sensitivity.

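Notice that these adjustments are equivalent to multiplying the sample cell proportions by their sample weights. For example, for the true positives:

$$
\pi_1 \, Se = \frac{\pi_1}{\rho_1}\, TP
$$

where TP is the proportion of true positives in the sample, ρi is the proportion of class i in the sample, and the sample weights are defined as πi / ρi for class i.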
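As a worked check, apply the adjustment formulas to the sample from the oversampling example above, where Se = 33/50 = 0.66 and Sp = 29/50 = 0.58, with π1 = 0.03 and π0 = 0.97:

$$
\begin{aligned}
TP &= \pi_1\,Se = 0.03 \times 0.66 = 0.0198\\
FN &= \pi_1\,(1 - Se) = 0.03 \times 0.34 = 0.0102\\
TN &= \pi_0\,Sp = 0.97 \times 0.58 = 0.5626\\
FP &= \pi_0\,(1 - Sp) = 0.97 \times 0.42 = 0.4074
\end{aligned}
$$

Rounded to whole percentages, these are 2%, 1%, 56%, and 41%, which recovers the population confusion matrix.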

#### Demo: Measuring Model Performance Based on Commonly Used Metrics
Note: The demonstrations in this course build on each other. If you want to perform this demonstration in your own SAS software and you started a new SAS session after you performed the previous demonstration, open l4_demos.sas. It contains the solution code for all demos in Lessons 1, 2, 3, and 4. Locate the code for the previous demos, review the comments to see if any modifications are needed, and then submit the code.

In this demonstration, we calculate performance measures on the validation data set for the target marketing project. We do the following:

* score the validation data set using PROC LOGISTIC
* adjust the confusion matrix for oversampling using a DATA step
* generate a lift chart using PROC SGPLOT

Let's look at the code. We have: proc logistic data=work.train_imputed_swoe_bins; model ins (event='1') equals the variables we selected in Lesson 3. Then we have a SCORE statement: score data=work.valid_imputed_swoe_bins (the data set we just created in the last demonstration) out=work.scoval. We set the PRIOREVENT= option to &pi1, the population proportion of events, so that the predicted probabilities are adjusted for oversampling, and then we add an OUTROC= option.

What is this OUTROC= option? The OUTROC= option creates an output data set with Sensitivity and 1 minus Specificity calculated for a full range of cutoff probabilities. The other statistics in the OUTROC= data set are not useful when the data are oversampled. Here, the output data set is named work.roc.

When we specify the OUTROC= option in the SCORE statement, the ROC curve for the scored data set is displayed. That's why I'm using ods select roccurve, so we'll select the ROC curve for the validation data set.

We're also using the FITSTAT option, which generates fit statistics for the scored data, and we include ScoreFitStat in the ODS SELECT statement to display that table.

In [6]:
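/* Show only the validation ROC curve and the score fit statistics */
ods select roccurve scorefitstat;
proc logistic data=work.train_imputed_swoe_bins;
   model ins(event='1')=&selected;
   /* Score the validation data: PRIOREVENT= adjusts the predicted
      probabilities for oversampling, OUTROC= saves sensitivity and
      1 - specificity across cutoffs, FITSTAT computes fit statistics */
   score data=work.valid_imputed_swoe_bins out=work.scoval 
         priorevent=&pi1 outroc=work.roc fitstat;
run;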

Fit Statistics for SCORE Data

| Data Set | Total Frequency | Log Likelihood | Error Rate | AIC | AICC | BIC | SC | R-Square | Max-Rescaled R-Square | AUC | Brier Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WORK.VALID_IMPUTED_SWOE_BINS | 10752 | -12661.1 | 0.3406 | 25394.13 | 25394.38 | 25656.31 | 25656.31 | 0.316954 | 0.338919 | 0.78197 | 0.308581 |


So let's highlight this code and submit it. Here we have the ROC curve for the validation data set, with a c statistic of 0.7820. The fit statistics table for the validation data set shows the same value: an AUC of 0.78197, or 0.7820. Remember, the c statistic for the training data set was 0.788, which is only slightly higher than 0.782. **This small difference between the training and validation c statistics suggests that the model is not overfitting and should generalize well to new data**.

In [7]:
title1 "Statistics in the ROC Data Set";
proc print data=work.roc(obs=10);
   var _prob_ _sensit_ _1mspec_;
run;

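/* Rebuild the population confusion matrix at every cutoff
   and derive the other performance measures */
data work.roc;
   set work.roc;
   cutoff=_PROB_;
   specif=1-_1MSPEC_;
   tp=&pi1*_SENSIT_;              /* pi1 x Se */
   fn=&pi1*(1-_SENSIT_);          /* pi1 x (1 - Se) */
   tn=(1-&pi1)*specif;            /* pi0 x Sp */
   fp=(1-&pi1)*_1MSPEC_;          /* pi0 x (1 - Sp) */
   depth=tp+fp;                   /* proportion allocated to class 1 */
   if depth>0 then do;            /* guard against division by zero */
      pospv=tp/depth;
      lift=pospv/&pi1;
   end;
   if depth<1 then negpv=tn/(1-depth);
   acc=tp+tn;
   keep cutoff tn fp fn tp 
        _SENSIT_ _1MSPEC_ specif depth
        pospv negpv acc lift;
run;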

| Obs | `_PROB_` | `_SENSIT_` | `_1MSPEC_` |
|---|---|---|---|
| 1 | 1.0 | 0.000537057 | 0.0 |
| 2 | 1.0 | 0.000805585 | 0.0 |
| 3 | 1.0 | 0.001074114 | 0.0 |
| 4 | 0.99999 | 0.001342642 | 0.0 |
| 5 | 0.99997 | 0.001611171 | 0.0 |
| 6 | 0.99948 | 0.001879699 | 0.0 |
| 7 | 0.99896 | 0.002148228 | 0.0 |
| 8 | 0.9989 | 0.002416756 | 0.0 |
| 9 | 0.99875 | 0.002416756 | 0.000142288 |
| 10 | 0.99823 | 0.002416756 | 0.000284576 |


Now, let's print out the roc data set that we generated with the OUTROC= option. So we have a title, PROC PRINT, and a VAR statement showing the cutoff probabilities, the Sensitivity, and 1 minus Specificity.

The two variables, Sensitivity and 1 minus Specificity in the OUTROC= data set are correct whether or not the validation data are oversampled. The variable, _Prob_, is correct, provided the PRIOREVENT= option was set to &pi1, which is what we did. If the posterior probabilities were not corrected, then _Prob_ needs to be adjusted. To see the formula used for the adjustment, see Adjusting the Posterior Probabilities in the Resources section.
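For reference, the standard prior-correction formula for oversampling is sketched below; the course's Resources section may present it in a different but equivalent form. Here p̂ is the posterior probability from a model fitted to the oversampled data, πi is the population proportion of class i, and ρi is its proportion in the sample:

$$
\hat{p}^{*} = \frac{\hat{p}\,\pi_1/\rho_1}{\hat{p}\,\pi_1/\rho_1 + (1 - \hat{p})\,\pi_0/\rho_0}
$$

When ρi = πi (no oversampling), the two weights equal 1, the denominator reduces to 1, and p̂* = p̂.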

Knowledge of the population priors and Sensitivity and Specificity is sufficient to fill in the confusion matrices. Several additional statistics can be calculated in a DATA step.

So here, we bring in the work.roc data set. The cutoff is _Prob_. The Specificity is 1 minus (1 minus Specificity). The true positive proportion is &pi1 times Sensitivity, and the false negative proportion is &pi1 times (1 minus Sensitivity). The true negative proportion is π0 times Specificity, and the false positive proportion is π0 times (1 minus Specificity); in the code, π0 is written as 1-&pi1.

The depth is true positives plus false positives. The positive predicted value is true positives over the depth, and the negative predicted value is true negatives over 1 minus the depth. The accuracy is true positives plus true negatives, and the lift is the positive predicted value over &pi1. The KEEP statement retains these variables. So let me submit the DATA step.

Now, let's create a lift chart by plotting lift against depth. We'll add a reference line at the baseline, which is a lift of 1, and restrict the focus to the region where depth is greater than 0.5% and less than 50%. Let's look at the code.

We have: proc sgplot data=work.roc; with a WHERE statement that keeps depths between 0.005 and 0.50. We draw a series plot where the Y axis is lift and the X axis is depth. The REFLINE statement draws a reference line at 1 on the Y axis, and YAXIS VALUES= sets the tick marks; I just picked 0 to 9 by 1 to make the plot look prettier. We finish with RUN and QUIT, with a TITLE statement before and after the step. So let's submit the code for PROC SGPLOT.

In [8]:
/* Create a lift chart */
title1 "Lift Chart for Validation Data";
proc sgplot data=work.roc;
   where 0.005 <= depth <= 0.50;
   series y=lift x=depth;
   refline 1.0 / axis=y;
   yaxis values=(0 to 9 by 1);
run; quit;
title1 ;

The lift chart shows that at a depth of 0.1, the lift is approximately 4. This means that if you targeted the top 10% of your customers based on the predicted probabilities, you would get four times as many responses as you would by targeting a random 10% sample.

We've assessed our model based on commonly used metrics. Next, we'll assess the model using the profit matrix.

#### Practice: Assessing Model Performance
For the veterans' organization project, do the following:

* prepare the validation data set to be scored by the model fitted on the training data set
* fit a logistic model on the training data set
* score the validation data set
* compute model performance statistics and generate graphs on the validation data set

Write a PROC MEANS step to examine which variables in pmlr.pva_valid (the validation data set) have missing values. Use the inputs from the model fitted on the training data set. Note: Exclude Cluster_Swoe, which needs to be created, but specify the inputs involved in the interactions.

In [9]:
title1 "Variables with Missing Values on the Validation Data Set";
proc means data=pmlr.pva_valid nmiss;
   var LIFETIME_GIFT_COUNT LAST_GIFT_AMT MEDIAN_HOME_VALUE 
       FREQUENCY_STATUS_97NK PEP_STAR INCOME_GROUP 
       LIFETIME_AVG_GIFT_AMT MONTHS_SINCE_LAST_GIFT;
run;

Variable,N Miss
LIFETIME_GIFT_COUNT,0
LAST_GIFT_AMT,0
MEDIAN_HOME_VALUE,0
FREQUENCY_STATUS_97NK,0
PEP_STAR,0
INCOME_GROUP,2229
LIFETIME_AVG_GIFT_AMT,0
MONTHS_SINCE_LAST_GIFT,0


Which input variables have missing values? **R. The results show that the input variable Income_Group has missing values.**


Write a PROC UNIVARIATE step to create a data set with the medians from pmlr.pva_train_imputed_swoe (the training data set). Name the new data set work.medians. Use the NOPRINT option in the PROC UNIVARIATE statement. Store the medians in a variable whose name is the original variable name followed by 50.

Add a PROC PRINT step to print the output data set.

In [11]:
proc univariate data=pmlr.pva_train_imputed_swoe noprint;
   var INCOME_GROUP;
   output out=work.medians
          pctlpts=50
          pctlpre=income_group;
run;

title1 "Medians for Variables with Missing Values";
proc print data=work.medians;
run;
title1;

Obs,income_group50
1,4


What are the medians for the variables with missing values? **R. As shown in the results, the median for Income_Group is 4.**


Write a DATA step that does the following:
* imputes the variables with missing values using two ARRAY statements and a DO loop with index i
* includes the scoring code to create the smoothed weight of evidence for Cluster_Code
* performs a one-to-many merge to create the final version of the pmlr.pva_valid_imputed_swoe data set
* drops the variables Income_Group50 and i

In [12]:
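data pmlr.pva_valid_imputed_swoe(drop=income_group50 i);
   /* Read the training medians once; variables read by SET are retained */
   if _N_=1 then set work.medians;
   set pmlr.pva_valid;
   /* Pair each input that has missing values with its training median */
   array x(*) income_group;
   array med(*) income_group50;
      do i=1 to dim(x);
         if x(i)=. then x(i)=med(i);
      end;
   /* Scoring code that creates the smoothed weight of evidence for Cluster_Code */
   %include clswoe;
run;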

How many observations are in pmlr.pva_valid_imputed_swoe? **R. The log indicates that the pmlr.pva_valid_imputed_swoe data set has 9685 observations.**


Write a PROC LOGISTIC step that does the following:
* fits a logistic regression model on pmlr.pva_train_imputed_swoe with Target_B as the target variable and the ex_selected macro variable (created in the previous practice) specifying the input variables
* uses the EVENT= option to model the probability that Target_B=1
* uses the SCORE statement to score pmlr.pva_valid_imputed_swoe with an adjustment for oversampling using the PRIOREVENT= option
* uses the OUTROC= option to create a data set named work.roc with many of the statistics that are necessary for model assessment and for creating a lift chart for the validation data set
* uses the FITSTAT option to generate model fit statistics

In [14]:
title1 "Training Data Set Model";
proc logistic data= pmlr.pva_train_imputed_swoe;
   model target_b(event='1')=&ex_selected;
   score data= pmlr.pva_valid_imputed_swoe priorevent=&ex_pi1
         outroc=work.roc fitstat; 
run;
title1;

Model Information
Data Set,PMLR.PVA_TRAIN_IMPUTED_SWOE
Response Variable,TARGET_B
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

Number of Observations Read,9687
Number of Observations Used,9687

Response Profile
Ordered Value,TARGET_B,Total Frequency
1,0,7265
2,1,2422

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,10897.23,10514.106
SC,10904.409,10585.892
-2 Log L,10895.23,10494.106

Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,401.124,9,<.0001
Score,405.4144,9,<.0001
Wald,382.4438,9,<.0001

Analysis of Maximum Likelihood Estimates
Parameter,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq
Intercept,1,-0.6217,0.2112,8.6698,0.0032
LIFETIME_GIFT_COUNT,1,0.0401,0.0067,35.9259,<.0001
LAST_GIFT_AMT,1,-0.0183,0.00398,21.1735,<.0001
MEDIAN_HOME_VALUE,1,9.5e-05,2.6e-05,13.4529,0.0002
FREQUENCY_STATUS_97N,1,0.172,0.0253,46.3852,<.0001
cluster_swoe,1,0.9869,0.1493,43.6931,<.0001
PEP_STAR,1,0.3248,0.0614,27.9318,<.0001
INCOME_GROUP,1,0.0471,0.0154,9.3146,0.0023
LAST_GIFT*LIFETIME_A,1,0.000167,5e-05,11.125,0.0009
LIFETIME_*MONTHS_SIN,1,-0.00211,0.000366,33.3864,<.0001

Odds Ratio Estimates
Effect,Point Estimate,Lower 95% Wald Confidence Limit,Upper 95% Wald Confidence Limit
MEDIAN_HOME_VALUE,1.0,1.0,1.0
FREQUENCY_STATUS_97N,1.188,1.13,1.248
cluster_swoe,2.683,2.002,3.595
PEP_STAR,1.384,1.227,1.561
INCOME_GROUP,1.048,1.017,1.08

Association of Predicted Probabilities and Observed Responses
Percent Concordant,63.2,Somers' D,0.263
Percent Discordant,36.8,Gamma,0.263
Percent Tied,0.0,Tau-a,0.099
Pairs,17595830,c,0.632

Fit Statistics for SCORE Data
Data Set,Total Frequency,Log Likelihood,Error Rate,AIC,AICC,BIC,SC,R-Square,Max-Rescaled R-Square,AUC,Brier Score
PMLR.PVA_VALID_IMPUTED_SWOE,9685,-7444.5,0.2499,14908.97,14909,14980.76,14980.76,0.036643,0.046213,0.608916,0.223326


What is the c statistic for the validation data set? **R. In the results, the plot of the ROC curve for the validation data set shows that the c statistic for the validation data set is 0.6089.**

Using the data set created by the OUTROC= option, write a DATA step to compute the proportion of true positives, the proportion of false negatives, the proportion of true negatives, the proportion of false positives, the positive predicted value, the negative predicted value, the accuracy, the proportion allocated to class 1 (depth), and the lift.

Add a PROC SGPLOT step that creates a lift chart. Add a reference line at a lift of 1, and restrict the focus to the region where depth is greater than 0.5% and less than 50%. Restrict the Y axis from 0 to 4 by 1.

In [15]:
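data work.roc;
   set work.roc;
   cutoff=_PROB_;                 /* cutoff probability */
   specif=1-_1MSPEC_;             /* specificity */
   /* Population proportions of the four confusion-matrix cells */
   tp=&ex_pi1*_SENSIT_;
   fn=&ex_pi1*(1-_SENSIT_);
   tn=(1-&ex_pi1)*specif;
   fp=(1-&ex_pi1)*_1MSPEC_;
   depth=tp+fp;                   /* proportion allocated to class 1 */
   pospv=tp/depth;                /* positive predicted value */
   negpv=tn/(1-depth);            /* negative predicted value */
   acc=tp+tn;                     /* accuracy */
   lift=pospv/&ex_pi1;            /* lift over the population event rate */
   keep cutoff tn fp fn tp 
        _SENSIT_ _1MSPEC_ specif depth
        pospv negpv acc lift;
run;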
title1 "Lift Chart for Validation Data";
proc sgplot data=work.roc;
   where 0.005 <= depth <= 0.50;
   series y=lift x=depth;
   refline 1.0 / axis=y;
   yaxis values=(0 to 4 by 1);
run; 
quit;
title1;

What is the lift at a depth of 10%? **R. As shown in the results, the lift at a depth of 10% is approximately 1.9.**

### 4.3 Profit-Based Metrics
You know how to generate metrics that assess how well your model performs at different cutoffs. However, statistical measures alone probably will not be the only criteria important to business decision makers. In a business setting, you will likely need to give an estimate of the profit your model can expect to generate.

In this section, you learn to do the following:

* explain the decision rule that maximizes the expected profit
* explain the profit matrix and how to use it to estimate the profit per scored customer
* graph the average validation profit against different cutoffs and different depths

#### Understanding the Effect of Cutoffs on Confusion Matrices
You know that a cutoff is a decision rule for allocating cases to classes. Different cutoffs produce different classification allocations and different confusion matrices. If the goal were to increase the sensitivity of the classifier, then the optimal classifier would allocate all cases to class 1. If the goal were to increase specificity, then the optimal classifier would be to allocate all cases to class 0. Let's look at an example. Here is the response surface of a model on the probability scale showing a low cutoff, a medium cutoff, and a high cutoff. Each cutoff produces a confusion matrix that differs from the other two. Also, each cutoff has a different sensitivity and a different specificity. Higher cutoffs decrease sensitivity and increase specificity. Lower cutoffs decrease specificity and increase sensitivity.

#### Understanding the Profit Matrix
In business, you typically choose cutoffs based on profit, rather than sensitivity or specificity. The optimal cutoff maximizes the total expected profit. A formal approach to determining the optimal cutoff uses statistical decision theory (McLachlan 1992, Ripley 1996, Hand 1997). To choose the optimal cutoff, you generate a profit matrix. Let's look at an example.

Here is a profit matrix for a target marketing project. The numbers in this particular profit matrix are based on the cost and profit figures supplied by the business analyst. The average cost to send an offer is $1. That's the average cost of the marketing effort to send an offer to one potential customer. When a customer responds to an offer, the average revenue is $100. The profit matrix displays the expected profit for each true negative, false negative, false positive, and true positive. Profit equals revenue minus cost.

The column on the left side represents cases that are predicted to not respond to an offer. The business will not solicit these customers, so there is no cost and no revenue for these cases. The true negatives and false negatives will generate a profit of $0. The column on the right represents cases that are predicted to respond to an offer. These are the customers that the business will solicit. For each false positive, the customer will not respond, so the profit will be negative $1. That's the revenue, $0, minus the average cost, $1. For each true positive, the customer will respond, so the average profit will be $99. That's the average revenue, $100, minus the average cost, $1.

Notice that some businesses might choose to calculate the cost of lost opportunity for the false negatives.

#### Choosing the Optimal Cutoff by Using the Profit Matrix
Now, let's look at the confusion matrices for three cutoffs (low, medium, and high) and calculate the total profit for each. At the high cutoff, 16 cases respond to the offer and 5 do not. 16 times $99, minus 5 times $1, equals a total expected profit of $1579. At the medium cutoff, 21 cases respond to an offer and 9 do not. 21 times $99, minus 9 times $1, equals a total expected profit of $2070. At the low cutoff, 24 cases respond to the offer and 18 do not. 24 times $99, minus 18 times $1, equals a total expected profit of $2358. Of these three cutoffs, the lowest one is the best because it has the highest expected total profit. This example illustrates how you can use the profit matrix to calculate the total expected profit for a cutoff.
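The arithmetic for the three cutoffs can be reproduced in a short DATA step. This is a sketch that simply encodes the counts and the profit matrix from this example ($99 per true positive, -$1 per false positive, $0 otherwise); the data set name work.cutoff_profit is made up for illustration.

```sas
/* Sketch: total expected profit for the three example cutoffs.       */
/* Profit matrix: TP = $99, FP = -$1, TN = FN = $0.                   */
/* work.cutoff_profit is a hypothetical data set name.                */
data work.cutoff_profit;
   length cutoff $6;
   input cutoff $ tp fp;
   /* Only solicited cases (predicted positives) generate profit or cost */
   total_profit = 99*tp + (-1)*fp;
   datalines;
high 16 5
medium 21 9
low 24 18
;
run;

title1 "Total Expected Profit by Cutoff";
proc print data=work.cutoff_profit noobs;
run;
title1;
```

The printed totals are 1579, 2070, and 2358, so the low cutoff is the most profitable of the three.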

A typical decision rule would be that you should solicit if the expected profit for soliciting, given the posterior probability, is higher than the expected profit for ignoring the customer.

Solicit if:

     E(Profit | pi, solicit) > E(Profit | pi, do not solicit)

     pi*99 + (1-pi)*(-1) > pi*0 + (1-pi)*(0)

     99*pi -1 + pi > 0

     100*pi -1 > 0

     pi > 0.01

This cutoff of 0.01 can be used to calculate the expected profit of using this rule with the current model.

To maximize profits, you need to find the decision point that has the highest expected profit across all cutoffs. You use Bayes' rule to do this. First, let's look at the symbols used in Bayes' rule. Here is the profit matrix with symbols that represent the expected profit for each true negative (δTN), false negative (δFN), false positive (δFP), and true positive (δTP). Bayes' rule says you will make decision 1, which, in this example, means you will solicit a customer, if the predicted probability is greater than 1 divided by 1 plus the ratio of true positive profit minus false negative profit over true negative profit minus false positive profit.

Using the symbols in this profit matrix, if a customer is solicited then the expected profit is

$$p\,\delta_{TP} + (1 - p)\,\delta_{FP}$$

where p is the true posterior probability that a case belongs to class 1. If a customer is not solicited, then the expected profit is

      
$$p\,\delta_{FN} + (1 - p)\,\delta_{TN}$$

Therefore, the optimal rule allocates a case to class 1 if

      
$$p\,\delta_{TP} + (1 - p)\,\delta_{FP} > p\,\delta_{FN} + (1 - p)\,\delta_{TN}$$

otherwise, allocate the case to class 0. Solving for p gives the optimal cutoff probability. Because p must be estimated from the data, the plug-in Bayes' rule, which substitutes the estimate p̂ for p, is used in practice. Consequently, the plug-in Bayes' rule might not achieve the maximum profit if the posterior probability is poorly estimated.

      
$$\hat{p} > \frac{1}{1 + \dfrac{\delta_{TP} - \delta_{FN}}{\delta_{TN} - \delta_{FP}}}$$

Now, let's use Bayes' rule to calculate the optimal cutoff for our example. The predicted probability is greater than 1 divided by 1 plus the ratio of 99 minus 0 over 0 minus negative 1. So for this example, Bayes' rule says that to maximize the expected profit, the optimal cutoff is .01. This means the optimal decision is to solicit customers with a predicted probability above .01.

      
$$\hat{p} > \frac{1}{1 + \dfrac{99 - 0}{0 - (-1)}} = \frac{1}{100} = 0.01$$
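The plug-in Bayes' rule cutoff can be computed directly from the four profit-matrix entries. A minimal sketch, assuming the example's profit matrix; the macro variable names d_tp, d_fp, d_tn, and d_fn and the data set work.bayes_cutoff are hypothetical. Each macro reference is wrapped in parentheses so the expression stays well formed when an entry is negative, such as -1.

```sas
/* Sketch: Bayes' rule cutoff from a profit matrix.                 */
/* Hypothetical macro variables holding the example's entries.      */
%let d_tp = 99;   /* profit per true positive  */
%let d_fp = -1;   /* profit per false positive */
%let d_tn = 0;    /* profit per true negative  */
%let d_fn = 0;    /* profit per false negative */

data work.bayes_cutoff;
   /* Solicit when p-hat > 1 / (1 + (dTP - dFN) / (dTN - dFP)) */
   cutoff = 1 / (1 + ((&d_tp) - (&d_fn)) / ((&d_tn) - (&d_fp)));
   put cutoff=;
run;
```

The log shows cutoff=0.01. The formula assumes δTP > δFN and δTN > δFP, which holds for any profit matrix in which correct decisions pay more than incorrect ones.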


#### Understanding the Confusion Matrix
Typically, you use predictive models to categorize cases using a cutoff (or decision boundary). For example, suppose you're fitting a model to predict fraud. If the probability of fraud is above the cutoff, you'll investigate it. If it's below the cutoff, you will not investigate it. For a given cutoff, you need to assess how well your predictive model performs. The fundamental assessment tool for model performance is the confusion matrix. The confusion matrix is simply a cross tabulation of predicted classes and actual classes. It quantifies the confusion of the classifier. To create a confusion matrix, you need to know the cutoff you are using. Later in this lesson, you learn a formula for determining an optimal cutoff.

Let's take a look at the structure of a confusion matrix. The column on the left represents the cases with a predicted class of 0—in other words, the cases with a predicted probability that is below the cutoff. The column on the right represents the cases with a predicted class of 1—in other words, the cases with a predicted probability above the cutoff. The top row represents the cases with the actual class of 0—in other words, the cases that have an actual target variable value of 0, meaning the target event did not occur. The bottom row represents the cases with an actual class of 1—in other words, the cases that have an actual target variable value of 1, meaning the target event did occur. The target event might be unfavorable (like fraud or churn) or favorable (like a response to an offer). Whether it is unfavorable or favorable, the target event is typically called a positive when classified as 1. It is called a negative when it is classified as 0. So, a case with a predicted class of 0 is a predicted negative; a case with a predicted class of 1 is a predicted positive; a case with an actual class of 0 is an actual negative; and a case with an actual class of 1 is an actual positive.

Let's use the target marketing business scenario at the bank to explore how the confusion matrix works. Each case will have one of four possible outcomes: The customer has a predicted class of 1 and the customer actually did buy the variable annuity. The case is placed in the bottom, right quadrant and is considered to be a true positive. The customer has a predicted class of 1 but did not buy a variable annuity. The case is placed in the top, right quadrant and is called a false positive. The customer has a predicted class of 0 and the customer actually did not buy a variable annuity. The case is placed in the top, left quadrant and is considered a true negative. The customer has a predicted class of 0 but actually did buy a variable annuity. The case is placed in the bottom, left quadrant and is considered a false negative.

You can use the confusion matrix to calculate statistics that measure model performance. Some of the most common statistics are accuracy, error rate, sensitivity, positive predicted value, specificity, and negative predicted value. The simplest statistics for measuring the performance of a model are accuracy and error rate. The accuracy of the model equals the number of true positives plus the number of true negatives over the total number of cases. The error rate of the model equals the number of false positives plus the number of false negatives over the total number of cases. That is equivalent to 1 − the accuracy. Two specialized measures of classifier model performance are sensitivity and positive predicted value (also called PV+). You calculate sensitivity by dividing the true positives by the total actual positives. You calculate positive predicted value by dividing the true positives by the total predicted positives. Specificity and negative predicted value (also called PV-) are measurements involving true negatives. You calculate specificity by dividing the true negatives by the total actual negatives. You calculate negative predicted value by dividing the true negatives by the total predicted negatives.
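Each of these statistics is a simple ratio of the four cells. The sketch below computes them in a DATA step from cell counts; the counts (TP=33, FN=17, TN=29, FP=21) are borrowed from the 50/50 oversampled sample matrix discussed in the oversampling section of this lesson, and the data set name work.cm_stats is hypothetical.

```sas
/* Sketch: confusion-matrix statistics from cell counts.         */
/* Counts taken from the lesson's oversampled sample example.    */
data work.cm_stats;
   tp = 33; fn = 17; tn = 29; fp = 21;
   n = tp + fn + tn + fp;
   accuracy    = (tp + tn) / n;
   error_rate  = (fp + fn) / n;      /* equals 1 - accuracy      */
   sensitivity = tp / (tp + fn);     /* TP / actual positives    */
   pospv       = tp / (tp + fp);     /* TP / predicted positives */
   specificity = tn / (tn + fp);     /* TN / actual negatives    */
   negpv       = tn / (tn + fn);     /* TN / predicted negatives */
run;

proc print data=work.cm_stats noobs;
run;
```

For these counts, accuracy is 0.62, the error rate is 0.38, sensitivity is 0.66, specificity is 0.58, PV+ is 33/54 (about 0.61), and PV- is 29/46 (about 0.63).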

You might be wondering which of these measures will be the most useful to you. The answer depends on the context of your problem. For example, if you are working on a target marketing project, you might be most concerned with getting a high positive predicted value because you want to maximize the response rate for customers who receive an offer. However, suppose you are working on a fraud detection project. In this case, you might be most concerned with obtaining high sensitivity such that the model is detecting a high proportion of the actual fraud cases.

#### Measuring Performance across Cutoffs by Using the ROC Curve
You know how to use model performance measures, like sensitivity and specificity, to calculate model performance for a specified cutoff. However, what if you don't know which cutoff to use? You can measure model performance across all cutoffs by using the receiver-operating characteristic curve, also called the ROC curve. The ROC curve was adapted from signal detection theory for the assessment of classifiers. The ROC curve displays the sensitivity (also known as the true positive rate) and 1 minus specificity (also known as the false positive rate) for the entire range of cutoffs.

Let's look at the structure of an ROC curve and how it works. On the ROC curve graph, sensitivity is on the Y axis, and 1 minus specificity is on the X axis. Remember that sensitivity equals the true positives divided by the total actual positives. Suppose you have 1,000 actual positives. If you have zero true positives, then you have a sensitivity of 0. If all 1,000 of the actual positives are also true positives, then you have a sensitivity of 1. This is why the Y axis goes from 0 to 1.

Now, let's look at the X axis, which represents 1 minus the specificity. Remember that specificity equals the true negatives divided by the total actual negatives. Suppose you have 1,000 actual negatives. If you have zero true negatives, then you have a specificity of 0. One minus a specificity of 0 equals 1, so a specificity of 0 is located at 1 on the X axis. If all 1,000 of the actual negatives are also true negatives, then you have a specificity of 1. One minus a specificity of 1 equals 0, so a specificity of 1 is located at 0 on the X axis. This is why the X axis goes from 0 to 1. Remember that the ROC curve displays the sensitivity and 1 minus specificity for the entire range of cutoff values.

You know that predicted probabilities are constrained to be between 0 and 1, so the range of all possible cutoff values is 0 to 1. At the lowest possible cutoff of 0, all cases have a predicted probability above the cutoff, so all cases are in predicted class 1. Because all of the actual positives are also true positives in this case, the sensitivity is 1. Because there are zero true negatives in this case, specificity is 0. One minus a specificity of 0 is 1, so the lowest possible cutoff of 0 is located at coordinate (1,1). At the highest possible cutoff of 1, all cases have a predicted probability below the cutoff, so all cases are in predicted class 0. Because there are zero true positives in this case, the sensitivity is 0. Because all of the actual negatives are also true negatives in this case, specificity is 1. One minus a specificity of 1 is 0, so the highest possible cutoff of 1 is located at coordinate (0,0). As the cutoff decreases, more cases are allocated to class 1, the sensitivity increases, and the specificity decreases. As the cutoff increases, more cases are allocated to class 0, the sensitivity decreases, and the specificity increases. If the posterior probabilities were arbitrarily assigned to the cases, then the ratio of false positives to true positives would be the same as the ratio of the total actual negatives to the total actual positives. A 45-degree diagonal baseline from (0,0) to (1,1) represents such a random model. The area under the ROC curve is measured by the c statistic, and for this baseline it is 0.50. This baseline represents a model that predicts classes no better than flipping a coin. The more the ROC curve bulges away from the 45-degree line, the more accurately the model predicts compared to a random model. If you had a model that predicted perfectly, its ROC curve would reach the (0,1) point. You want your model's ROC curve to bulge as much as possible toward (0,1).
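To see where the ROC points come from, the sketch below sorts a tiny, made-up scored data set by predicted probability, treats each case's probability as a cutoff, and accumulates sensitivity and 1 minus specificity; a trapezoid sum gives the area under the curve. PROC LOGISTIC computes these for you (the OUTROC= data set holds them as _SENSIT_ and _1MSPEC_), so this is only an illustration. It assumes the predicted probabilities are distinct; tied probabilities would need to be grouped. The data set names and the macro variables toy_pos and toy_neg are hypothetical.

```sas
/* Hypothetical scored data: actual class y, predicted probability p */
data work.toy_scored;
   input y p @@;
   datalines;
1 0.90 1 0.80 0 0.70 1 0.60 0 0.55 1 0.50 0 0.40 0 0.30 1 0.20 0 0.10
;
run;

/* Highest predicted probabilities first */
proc sort data=work.toy_scored;
   by descending p;
run;

/* Totals of actual positives and actual negatives */
proc sql noprint;
   select sum(y), sum(1 - y) into :toy_pos, :toy_neg
   from work.toy_scored;
quit;

data work.toy_roc;
   set work.toy_scored;
   /* With this case's p as the cutoff, every case read so far is */
   /* predicted positive                                          */
   tp + y;
   fp + (1 - y);
   sensit     = tp / &toy_pos;     /* true positive rate  */
   one_m_spec = fp / &toy_neg;     /* false positive rate */
   /* Trapezoid between this ROC point and the previous one */
   prev_sens = lag(sensit);
   prev_fpr  = lag(one_m_spec);
   if _n_ = 1 then do;
      prev_sens = 0;
      prev_fpr  = 0;
   end;
   auc + (one_m_spec - prev_fpr) * (sensit + prev_sens) / 2;
run;
```

For these ten cases the final value of auc is 0.72, which equals the share of positive-negative pairs that are concordant (18 of 25).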

#### Choosing Depth by Using the Gains Chart
One widely used measure of model performance is the gains chart. The cumulative gains chart displays the positive predicted value on the Y axis and the depth on the X axis. Depth equals the total percentage of cases that are allocated to class 1. The gains chart is widely used in target marketing to decide what percentage of cases to solicit—in other words, how deep in the database to go with a promotion. Cutoff values range from the highest at the far left to the lowest at the far right. If the cutoff is 1, then all cases are classified as 0, and so the depth equals 0%. If the cutoff is 0, then all cases are classified as 1, so the depth equals 100%. So, as the cutoff increases, the depth decreases. The marginal rate equals the proportion of events in the sample adjusted to the true population event rate, such as the rate of response to a promotion. If you had a random model that arbitrarily assigned posterior probabilities to cases, then the gains chart would be a horizontal line at the marginal rate. If your model's gains chart follows the marginal rate, then your model does not predict any better than a random model. The simplest way to construct this curve is to sort and bin the predicted posterior probabilities (for example, deciles). The gains chart is easily augmented with revenue and cost information.

For a model with good predictive power, the positive predicted value increases as the depth decreases, creating a curve shaped like a steep ski slope. For example, in target marketing, a gains chart with a ski slope shape shows that people with the highest probabilities have the highest response rates. A lift chart is a variation of the gains chart. The lift equals the positive predicted value divided by the marginal rate. At a given depth, the model targets lift times as many responders as random selection would. For example, suppose you want to target the top 10% of cases, and your lift chart shows a lift of 4. That means that when you target the top 10%, you get four times as many responders as you would from a random sample of 10% of the cases. Just as with a regular gains chart, a lift chart for a model with good predictive power has a curve shaped like a steep ski slope. This shape shows that cases with the highest predicted probabilities have the highest lift. You can use either a gains chart or a lift chart to show how well your model generalizes.
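Following the sort-and-bin idea above, a cumulative gains and lift table can be built from a scored data set. The sketch uses a made-up ten-case data set, so each "bin" is a single case; with real data you would typically group into deciles. It assumes the data are not oversampled; with oversampled data, PV+ and lift would first need the π1 adjustment described in this lesson. The data set names and the macro variable toy_pos are hypothetical.

```sas
/* Hypothetical scored data: actual class y, predicted probability p */
data work.toy_scored;
   input y p @@;
   datalines;
1 0.90 1 0.80 0 0.70 1 0.60 0 0.55 1 0.50 0 0.40 0 0.30 1 0.20 0 0.10
;
run;

proc sort data=work.toy_scored;
   by descending p;
run;

/* Number of events, used for the marginal rate */
proc sql noprint;
   select sum(y) into :toy_pos
   from work.toy_scored;
quit;

data work.toy_gains;
   set work.toy_scored nobs=n_cases;
   cum_pos + y;                           /* events among targeted cases     */
   depth = _n_ / n_cases;                 /* proportion allocated to class 1 */
   pospv = cum_pos / _n_;                 /* cumulative PV+ (gains chart)    */
   lift  = pospv / (&toy_pos / n_cases);  /* PV+ over the marginal rate      */
run;

title1 "Lift Chart for Toy Data";
proc sgplot data=work.toy_gains;
   series y=lift x=depth;
   refline 1.0 / axis=y;
run;
title1;
```

Here the lift starts at 2 for the top case (PV+ of 1 against a marginal rate of 0.5) and falls to 1 at a depth of 100%, where targeting everyone is no better than random.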

A plot of sensitivity versus depth is sometimes called a Lorenz curve, concentration curve, or a lift curve (although the lift value is not explicitly displayed).

#### Effects of Oversampled Data on Performance Measures
If you create a validation data set by splitting oversampled data, then the validation data is also a biased sample. You know that if you correct for oversampling when fitting a model, it will generate predicted posterior probabilities that are correct. But does oversampling affect model performance measures? Let's find out by looking at an example. On the left is the confusion matrix for a population and on the right is the confusion matrix for the sample. Each confusion matrix displays the proportion of true negatives, true positives, false negatives, and false positives, as well as the proportion of predicted negatives, predicted positives, actual negatives, and actual positives. For the population, the proportion of actual negatives is 97% and the proportion of actual positives is 3%, meaning that the population is made up of 97% non-events and 3% events. You oversampled the events so that the sample data is made up of 50% non-events and 50% events. Clearly, the sample is biased.

Let's calculate several performance measures to see which are affected by the bias. First, let's look at sensitivity and specificity. Remember, sensitivity equals the true positives divided by the total actual positives. For both the population and the sample, the sensitivity is approximately the same: 2 divided by 3 (about 0.67) for the population and 33 divided by 50 (0.66) for the sample. Specificity equals the true negatives divided by the total actual negatives. For both the population and the sample, the specificity is approximately the same: 56 divided by 97 (about 0.58) for the population and 29 divided by 50 (0.58) for the sample. So, oversampling does not affect sensitivity or specificity measures. Because the ROC curve relies on sensitivity and specificity, it also is not affected by oversampling.

Now, let's look at positive predicted value and negative predicted value. Remember that positive predicted value equals the true positives divided by the total predicted positives. The positive predicted values for the population and the sample are not the same. 2 divided by 43 for the population and 33 divided by 54 for the sample are not equal. The negative predicted value equals the true negatives divided by the total predicted negatives. The negative predicted value is not the same for the population and the sample. 56 divided by 57 for the population and 29 divided by 46 for the sample are not equal. As you can see, oversampling does have an effect on positive predicted values and negative predicted values. Because gains charts and lift charts rely on positive predicted values, they are also affected by oversampling. Before you create gains charts and lift charts, you need to adjust the confusion matrix for oversampling.

#### Adjusting a Confusion Matrix for Oversampling
If you oversampled, you need to adjust your confusion matrix so that it matches your population. To do this, you need to know the population proportions of class 1 and class 0, π1 and π0. You also need to know the values for sensitivity (represented by Se) and specificity (represented by Sp).

To get the true proportion of true positives, you multiply π1 by the sensitivity. The true proportion of true negatives equals π0 times specificity; the true proportion of false positives equals π0 times 1 minus specificity; and the true proportion of false negatives is π1 times 1 minus sensitivity.

Notice that these adjustments are equivalent to multiplying the cell counts by their sample weights; for example, π1·Se = (π1/ρ1)·TP,
where TP is the proportion of true positives in the sample, ρ1 is the proportion of class 1 in the sample, and sample weights are defined as πi / ρi for class i.
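The adjustment can be checked against the oversampling example in the previous section. This sketch applies the formulas with π1 = 0.03 and the sample's sensitivity (33/50) and specificity (29/50). The macro variable toy_pi1 and the data set work.adj_cm are hypothetical, named so they do not overwrite the course's &pi1.

```sas
/* Sketch: adjust an oversampled confusion matrix to the population */
%let toy_pi1 = 0.03;            /* hypothetical population prior for class 1 */

data work.adj_cm;
   pi0 = 1 - &toy_pi1;
   se  = 33/50;                 /* sensitivity from the oversampled sample */
   sp  = 29/50;                 /* specificity from the oversampled sample */
   tp  = &toy_pi1 * se;         /* adjusted proportion of true positives   */
   fn  = &toy_pi1 * (1 - se);   /* adjusted proportion of false negatives  */
   tn  = pi0 * sp;              /* adjusted proportion of true negatives   */
   fp  = pi0 * (1 - sp);        /* adjusted proportion of false positives  */
   depth = tp + fp;             /* proportion predicted positive           */
   pospv = tp / depth;          /* population PV+                          */
   negpv = tn / (1 - depth);    /* population PV-                          */
run;

proc print data=work.adj_cm noobs;
run;
```

The adjusted cells come out to 0.0198, 0.0102, 0.5626, and 0.4074, which agree, up to rounding, with the population matrix (TP 2%, FN 1%, TN 56%, FP 41%). The adjusted PV+ (about 0.046) is close to the population value of 2/43, not the sample value of 33/54.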

#### Demo: Measuring Model Performance Based on Commonly-Used Metrics
Note: The demonstrations in this course build on each other. If you want to perform this demonstration in your own SAS software and you started a new SAS session after you performed the previous demonstration, open l4_demos.sas. It contains the solution code for all demos in Lesson 1, 2, 3 and 4. Locate the code for the previous demos, review the comments to see if any modifications are needed, and then submit the code.

In this demonstration, we calculate performance measures on the validation data set for the target marketing project. We score the validation data set using PROC LOGISTIC, adjust the confusion matrix for oversampling using a DATA step, and generate a lift chart using PROC SGPLOT.

Let's look at the code. We have: proc logistic data=work.train_imputed_swoe_bins; model ins (event='1') equals the variables we selected in Lesson 3. Then we have a SCORE statement: score data=work.valid_imputed_swoe_bins (the data set we just created in the last demonstration) out=work.scoval. We have the PRIOREVENT= option. We set that to &pi1, and then we have an OUTROC= option.

What is this OUTROC= option? The OUTROC= option creates an output data set with Sensitivity and 1 minus Specificity calculated for a full range of cutoff probabilities. The other statistics in the OUTROC= data set are not useful when the data are oversampled. Then we're creating a data set called work.roc.

When we specify the OUTROC= option in the SCORE statement, the ROC curve for the scored data set is displayed. That's why I'm using ods select roccurve, so we'll select the ROC curve for the validation data set.

We're also using the FITSTAT option, and that's going to generate model fit statistics. And we're using the ods select scorefitstat to capture the table that shows the fit statistics.

#### Using the Central Cutoff
Of course, you must have the cost and profit information for your business problem to apply Bayes' rule. In many situations, gathering profit information can be difficult. When this is the case, many business analysts use a cutoff of π1, the proportion of events in the population. This central cutoff tends to maximize the mean of sensitivity and specificity. Because increasing sensitivity usually corresponds to decreasing specificity, the central cutoff tends to equalize sensitivity and specificity.

If separate samples were taken with equal allocation (50% events and 50% nonevents), then using the unadjusted cutoff of 0.5 on the biased sample is equivalent to using the central cutoff, π1, on the population.
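You can check this equivalence with the standard prior-correction formula. The formula rescales a sample-based probability by the sampling weights πi / ρi, which is the same as shifting the logit by ln(π1ρ0 / (π0ρ1)). The Python sketch below assumes that formula. The function names and the value π1 = 0.02 are illustrative.

```python
import math


def adjust_for_oversampling(p, pi1, rho1):
    """Map a probability estimated on an oversampled sample to the population.

    p:    predicted probability from a model fit to the oversampled sample
    pi1:  proportion of events in the population
    rho1: proportion of events in the sample
    """
    num = p * pi1 / rho1                           # event term scaled by pi1/rho1
    den = num + (1 - p) * (1 - pi1) / (1 - rho1)   # plus non-event term scaled by pi0/rho0
    return num / den


def logit(p):
    return math.log(p / (1 - p))


pi1, rho1 = 0.02, 0.5   # equal allocation: 50% events in the sample
# The unadjusted cutoff 0.5 on the sample maps to the central cutoff pi1
print(adjust_for_oversampling(0.5, pi1, rho1))

# The same adjustment expressed as a constant shift of the logit
offset = math.log(pi1 * (1 - rho1) / ((1 - pi1) * rho1))
print(logit(adjust_for_oversampling(0.3, pi1, rho1)) - logit(0.3), offset)
```

The adjustment is monotone in p, so it preserves the ranking of cases. That is why a cutoff of 0.5 on the biased sample and a cutoff of π1 on the adjusted scale select the same cases.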

#### Using Profit to Assess Fit
If you have a profit matrix, you can also generate an empirical profit plot, with average profit on the Y axis and depth (the proportion of cases targeted) on the X axis. A profit plot is a simple way to see how deep into the ranked sample you should go to maximize profit. In this example, the plot shows that the maximum average profit occurs when you target the top 42% of the cases. Using total or average profit as an assessment statistic might skirt those issues.

#### Calculating Sampling Weights
When your sample is not representative, you have learned how to adjust the model to reflect the population. Another technique is to adjust the data by computing sampling weights. When a rare target event is oversampled, class 0 is under-represented in the sample. Consequently, a class-0 case should count more in the analysis than a class-1 case. The sampling weights, π0/ρ0 for class 0 and π1/ρ1 for class 1, adjust the number of cases in a sample of size n to nπ0 and nπ1, respectively. The classes are then in the same proportion in the adjusted sample as they are in the population. In the next demonstration, you see how to calculate sampling weights to compute profits that reflect the population.
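Here is a minimal Python sketch of the weighting. It uses the same formula as the sampwt variable in the demonstration code: each event gets weight π1/ρ1 and each non-event gets π0/ρ0, so the weighted class totals become nπ1 and nπ0. The 30% sample event rate and π1 = 0.02 are made-up illustration values.

```python
def sampling_weights(targets, pi1):
    """Return the weight pi_i / rho_i for each case in a list of 0/1 targets."""
    n = len(targets)
    rho1 = sum(targets) / n                 # proportion of events in the sample
    w1 = pi1 / rho1                         # weight for class 1 (events)
    w0 = (1 - pi1) / (1 - rho1)             # weight for class 0 (non-events)
    return [w1 if t == 1 else w0 for t in targets]


targets = [1] * 300 + [0] * 700             # oversampled: 30% events
weights = sampling_weights(targets, pi1=0.02)
weighted_events = sum(w for w, t in zip(weights, targets) if t == 1)
weighted_nonevents = sum(w for w, t in zip(weights, targets) if t == 0)
# Weighted totals are n*pi1 = 20 and n*pi0 = 980; they still sum to n = 1000
print(weighted_events, weighted_nonevents, sum(weights))
```

The weights rescale the class totals without changing the total sample size. Weighted means, such as the average profit, therefore estimate population values.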


#### Using a Profit Matrix to Measure Model Performance

Let's measure the performance of the target marketing model by calculating profit on the scored validation data set. To calculate the profit per individual, we use a cutoff of 0.01 and a profit matrix of -$1 for soliciting a non-responder and $99 for soliciting a responder. In this demonstration, we do the following:

* sum and average the profits using PROC MEANS
* account for oversampling using the WEIGHT statement in PROC MEANS
* generate plots of average profit as a function of the solicited depth and of the cutoff using PROC SGPLOT
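The 0.01 cutoff is what Bayes' rule gives for this profit matrix. Soliciting has the larger expected profit when 99p − 1(1 − p) > 0, that is, when p > 1/100. The Python sketch below generalizes that calculation to any 2 × 2 profit matrix. The function names are hypothetical. The profitTD indexing (T = actual response, D = decision) matches the PROFIT11, PROFIT01, PROFIT10, and PROFIT00 parameters of the Assess macro.

```python
def bayes_cutoff(profit11, profit10, profit01, profit00):
    """Posterior-probability cutoff above which deciding 1 maximizes expected profit.

    profitTD is the profit when the actual target is T and the decision is D.
    Deciding 1 is better when
        p*profit11 + (1-p)*profit01 > p*profit10 + (1-p)*profit00,
    which rearranges to p > cutoff (assuming the denominator below is positive).
    """
    gain_on_event = profit11 - profit10      # extra profit from deciding 1 for an event
    loss_on_nonevent = profit00 - profit01   # extra cost of deciding 1 for a non-event
    return loss_on_nonevent / (gain_on_event + loss_on_nonevent)


def expected_profit(p, decision, profit11, profit10, profit01, profit00):
    """Expected profit of a decision (0 or 1) for a case with P(event) = p."""
    if decision == 1:
        return p * profit11 + (1 - p) * profit01
    return p * profit10 + (1 - p) * profit00


matrix = dict(profit11=99, profit10=0, profit01=-1, profit00=0)
cutoff = bayes_cutoff(**matrix)
print(cutoff)
```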

Let's look at the code. First, we create a global macro variable, rho1, for the proportion of events in the sample. PROC SQL with the NOPRINT option selects mean(INS) into :rho1 from our data set pmlr.develop. Let's submit the code.

Now we have a DATA step. We're bringing in the scored validation data set, and we're applying our sample weights.

Remember, to calculate total and average profit comparable to what would be achieved in the population, we must apply sampling weights. Sampling weights adjust the data so that they better represent the population. The sampling weight for the responders is pi1 over rho1, and the sampling weight for the non-responders is pi0 over rho0.

The decision variable is a flag indicating whether the predicted probability is greater than the cutoff. Using the decision and the response, we calculate the profit per individual. If the predicted probability was greater than 0.01, we solicited the person. If that person responded, we gain $99; if not, we lose $1. Individuals who were not solicited contribute $0.

Let's submit the DATA step. Now let's calculate the total and average profit. We run PROC MEANS on the scored validation data set, asking for the sum and the mean, with the statement weight sampwt;. So again, we're weighting our observations. These weights make the proportion of events in the weighted sample the same as the proportion of events in the population.

And we're analyzing the variable profit. Let's submit the PROC MEANS code and look at the mean: about $1.25 per individual (1.2494791). Using this model to score a population of, say, a million individuals and soliciting only those with a predicted probability greater than 0.01 would yield a total expected profit of about $1,249,479.10 (1,000,000 × 1.2494791). Other models can be compared to the current model with this statistic.

To see how other cutoffs fare, let's use the information in the roc data set to plot how the average profit changes as a function of the solicited depth and of the cutoff. First, a DATA step reads work.roc and computes the average profit for each cutoff in the roc data set. It uses the true positive and false positive proportions (tp and fp) calculated earlier. Later, for the plot against the cutoff, we restrict the plot to the region around the cutoff of 0.01.

A false positive is an individual who is solicited but does not respond. Hence, the average cost of soliciting the false positives is the $1 solicitation cost times the proportion of false positives. Likewise, the average profit from individuals who are solicited and respond is $99 times the proportion of true positives. The difference between these two terms is the average profit: AveProf = 99 × tp − 1 × fp.
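The same calculation appears below as a short Python sketch. For each cutoff, tp = π1 × sensitivity and fp = π0 × (1 − specificity) are population proportions. Depth (the proportion solicited) is tp + fp, and average profit is 99 × tp − 1 × fp. The ROC points and π1 = 0.02 are hypothetical values for illustration, not the course's roc data set.

```python
def average_profit_curve(points, pi1, profit_tp=99.0, cost_fp=1.0):
    """Return (cutoff, depth, average profit) for each ROC point.

    points: (cutoff, sensitivity, 1 - specificity) triples, similar to the
            rows of an OUTROC= data set.
    pi1:    proportion of events in the population.
    """
    pi0 = 1 - pi1
    rows = []
    for cutoff, sens, one_minus_spec in points:
        tp = pi1 * sens                 # population proportion of true positives
        fp = pi0 * one_minus_spec       # population proportion of false positives
        depth = tp + fp                 # proportion of the population solicited
        ave_prof = profit_tp * tp - cost_fp * fp
        rows.append((cutoff, depth, ave_prof))
    return rows


# Hypothetical ROC points, for illustration only
roc_points = [
    (0.005, 0.95, 0.80),
    (0.010, 0.85, 0.55),
    (0.020, 0.60, 0.25),
    (0.050, 0.25, 0.05),
]
curve = average_profit_curve(roc_points, pi1=0.02)
for cutoff, depth, ave_prof in curve:
    print(f"cutoff={cutoff:.3f}  depth={depth:.3f}  average profit={ave_prof:.3f}")
best_cutoff, best_depth, best_profit = max(curve, key=lambda row: row[2])
```

Plotting ave_prof against depth, or against cutoff, gives the two profit plots in this demonstration. The best cutoff is the row with the largest average profit.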

So let's submit the DATA step to calculate the average profit. Now let's plot the average profit against depth. We have proc sgplot data=work.roc; with a SERIES statement for a line plot where y is the average profit (aveProf) and x is depth, and a YAXIS statement with label="Average Profit". Let's submit the code. The plot shows that the highest average profit would occur at a solicited depth of around 50% to 60%.

Now let's plot the average profit against the cutoffs. We have proc sgplot data=work.roc; with where cutoff le 0.05;, so that, again, the plot is restricted to the region around the cutoff of 0.01. A REFLINE statement draws a reference line at 0.01 on the X axis. A SERIES statement plots the average profit (y) against the cutoff (x), with the Y axis labeled "Average Profit". Let's submit the PROC SGPLOT code. The plot shows that the highest average profit would occur at a cutoff of around 0.011.

In [18]:
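/* Proportion of events in the (oversampled) sample: rho1 */
%global rho1;
proc SQL noprint;
  select mean(INS) into :rho1 from pmlr.develop;
quit;

/* Add the sampling weight and the decision variable  */
/* (based on the profit matrix), and calculate profit */
data work.scoval;
   set work.scoval;
   /* pi1/rho1 for responders, pi0/rho0 for non-responders */
   sampwt=(&pi1/&rho1)*(INS) 
            + ((1-&pi1)/(1-&rho1))*(1-INS);
   /* solicit when the predicted probability exceeds the 0.01 cutoff */
   decision=(p_1 > 0.01);
   /* +$99 for a solicited responder, -$1 for a solicited non-responder */
   profit=decision*INS*99
            - decision*(1-INS)*1;
run;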

/* Calculate total and average profit */

title1 "Total and Average Profit";
proc means data=work.scoval sum mean;
   weight sampwt;
   var profit;
run;

/* Investigate the true positive and */
/* false positive rates              */
data work.roc;
   set work.roc;
   AveProf=99*tp - 1*fp;
run;

title1 "Average Profit Against Depth";
proc sgplot data=work.roc;
   series y=aveProf x=depth;
   yaxis label="Average Profit";
run;

title1 "Average Profit Against Cutoff";
proc sgplot data=work.roc;
   where cutoff le 0.05;
   refline .01 / axis=x;
   series y=aveProf x=cutoff;
   yaxis label="Average Profit";
run;



Analysis Variable : profit

| Sum | Mean |
|---|---|
| 13434.53 | 1.2494791 |


### 4.4 Kolmogorov-Smirnov Statistic

You already know how to calculate several measures of your model's performance. Another measure is the Kolmogorov-Smirnov statistic, which is commonly called the K-S statistic. The K-S statistic examines the overall predictive power of a model. It is widely used in modeling for financial services.

* explain the purpose of the Kolmogorov-Smirnov statistic
* compute the Kolmogorov-Smirnov statistic in PROC NPAR1WAY

#### Plotting Class Separation
In order to assess the overall predictive power of a model, you need to look at how well the model discriminates between events and non-events. Let's look at an example. Here is a class separation graph showing smoothed probability density functions on the Y axis and posterior probability on the X axis. The lines on the graph represent the distributions for class 0 (the actual non-events) and class 1 (the actual events). As you can see, the distributions overlap quite a lot. When the distributions overlap a lot, it means that the model does a poor job of discriminating between events and non-events. In other words, this model has low overall predictive power. The more the distributions overlap, the weaker the model is.

Now let's look at a class separation graph for another model. The graph shows that there is quite a difference between the distributions for the non-events and the events, which means that this model has much higher overall predictive power than the first model. So, what test do you use to measure the differences in class distributions? A common test for comparing two distributions is the t test. However, the t test has parametric assumptions, and one of the assumptions is normality. Because posterior probabilities are unlikely to have a normal distribution, the t test is inappropriate. Fortunately, there are other tests that you can use when distributions are not normal.

Note: The simplest statistics are based on the difference between the means of the two distributions. In credit scoring, the divergence statistic is a scaled difference between the means (Nelson 1997). Hand (1997) discusses several summary measures based on the difference between the means. Many other two-sample tests have been devised for nonnormal distributions (Conover 1980).

#### Assessing Overall Predictive Power
The Kolmogorov-Smirnov two-sample test is commonly used to assess how well a model distinguishes between events and non-events. It is based on the distance between the empirical cumulative distribution functions (Conover 1980). The Kolmogorov-Smirnov test produces a value called the K-S statistic. Use of the K-S statistic for comparing predictive models is popular in credit risk modeling. On the left is the class separation graph for a model with a smoothed probability density function on the Y axis. The same classification model is plotted in the empirical cumulative distributions graph on the right, where the Y axis represents the proportion of observations that have predicted probabilities below a given posterior probability. For instance, for class 1, when x=0.5, y=0.85. This means that 85% of the events have predicted probabilities below 0.5. In contrast, for class 0, when x=0.5, y=0.98. This means that 98% of the non-events have predicted probabilities below 0.5. Of course, both lines start at (0,0) because none of the cases have predicted probabilities below 0, and both lines end at (1,1) because all of the cases have predicted probabilities at or below 1.

The K-S statistic that we are interested in is called the D statistic in the NPAR1WAY procedure. The D statistic equals the maximum vertical difference between the cumulative distributions for class 0 and class 1. In this example, D=0.49. The higher the value of D is, the better the model distinguishes between events and non-events. You might wonder, "How do I know if my model's K-S statistic is high enough?" The interpretation of the strength of the D statistic is context specific. In some situations, a model with a D statistic of 0.5 is considered adequate. In other situations, a D value of 0.3 might be good enough. The K-S test is concerned with the shape, variance, and central tendency of a distribution. However, when you assess the predictive power of a model, differences in central tendency are more important than differences in shape and variance.
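Here is a minimal Python sketch of D. It evaluates both empirical CDFs at every observed predicted probability and keeps the largest vertical gap. The scores are made up; in the course, PROC NPAR1WAY does this calculation. The sketch also checks a related fact. At the score where the gap occurs, D equals sensitivity + specificity − 1 for the rule "classify as an event when the probability exceeds that score".

```python
from bisect import bisect_right


def ks_d(scores_nonevents, scores_events):
    """Two-sample K-S D: maximum vertical gap between the two empirical CDFs.

    Returns (D, score at which the maximum gap first occurs).
    """
    s0, s1 = sorted(scores_nonevents), sorted(scores_events)
    n0, n1 = len(s0), len(s1)
    best_d, best_x = 0.0, None
    for x in sorted(set(s0) | set(s1)):     # the ECDFs change only at observed values
        f0 = bisect_right(s0, x) / n0       # share of non-events with score <= x
        f1 = bisect_right(s1, x) / n1       # share of events with score <= x
        if abs(f0 - f1) > best_d:
            best_d, best_x = abs(f0 - f1), x
    return best_d, best_x


nonevents = [0.1, 0.2, 0.3, 0.4]
events = [0.3, 0.5, 0.6, 0.7]
d, at = ks_d(nonevents, events)

# Sensitivity and specificity when "p > at" is classified as an event
sensitivity = sum(p > at for p in events) / len(events)
specificity = sum(p <= at for p in nonevents) / len(nonevents)
print(d, at, sensitivity + specificity - 1)
```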

Although widely used, the K-S statistic turns out not to be the most powerful test for differences in location. The Wilcoxon-Mann-Whitney test statistic has more power than the K-S statistic to detect differences in central tendency. So, how do you calculate this test statistic? Remarkably, the c statistic is the Wilcoxon-Mann-Whitney statistic rescaled to lie between 0 and 1 (Hand 1997). The c statistic and the K-S statistic are both unaffected when you oversample the rare events. This is because the empirical cumulative distribution function of each class is unchanged if each case represents more than one case in the population. Also, D equals the maximum over all cutoffs of sensitivity + specificity − 1. So the cutoff at which D occurs is the cutoff that maximizes the mean of sensitivity and specificity, which the central cutoff, π1, tends to do.

The Wilcoxon version of this popular two-sample test is based on the ranks of the data. In the predictive modeling context, the predicted posterior probabilities would be ranked from smallest to largest. The test statistic is based on the sum of the ranks in the classes. The area under the ROC curve, c, can be determined from the rank-sum in class 1.

      
$$c = \frac{\sum_{\{i \mid y_i = 1\}} R_i - \frac{1}{2} n_1 (n_1 + 1)}{n_1 \cdot n_0}$$
The first term in the numerator is the sum of the ranks in class 1.
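To see that the rank-sum formula and the pairwise definition of c agree, here is a small Python sketch. It uses average ranks for ties and counts tied event/non-event pairs as one half. That is the usual Mann-Whitney convention, assumed here because the text does not discuss ties. The last lines duplicate every non-event, mimicking cases that each represent more than one case in the population, and c does not change. Function names and data are hypothetical.

```python
def midranks(values):
    """Ranks from smallest to largest (1-based), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def c_from_ranks(probs, targets):
    """c from the rank sum in class 1, as in the formula above."""
    ranks = midranks(probs)
    n1 = sum(targets)
    n0 = len(targets) - n1
    rank_sum_1 = sum(r for r, t in zip(ranks, targets) if t == 1)
    return (rank_sum_1 - n1 * (n1 + 1) / 2) / (n1 * n0)


def c_from_pairs(probs, targets):
    """Share of (event, non-event) pairs ordered correctly; ties count 1/2."""
    ev = [p for p, t in zip(probs, targets) if t == 1]
    ne = [p for p, t in zip(probs, targets) if t == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in ev for b in ne)
    return score / (len(ev) * len(ne))


probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6]
targets = [0, 0, 1, 1, 0, 1]
c_rank = c_from_ranks(probs, targets)
c_pair = c_from_pairs(probs, targets)

# Duplicate each non-event, so each one now "counts twice"
extra = [p for p, t in zip(probs, targets) if t == 0]
probs2 = probs + extra
targets2 = targets + [0] * len(extra)
c_dup = c_from_ranks(probs2, targets2)
print(c_rank, c_pair, c_dup)
```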

A perfect ROC curve would rise vertically from the origin to a sensitivity of one and then run horizontally at one. That is, at some cutoff, sensitivity and specificity would both equal one. In this case, the c statistic would equal one. The c statistic technically ranges from zero to one, but in practice, it should not get much lower than one-half. A perfectly random model, where the posterior probabilities were assigned arbitrarily, would give a straight ROC curve along the 45° diagonal from (0,0) to (1,1). Hence, it would give a c statistic of 0.5.

The area under the ROC curve is also equivalent to the Gini coefficient, which is used to summarize the performance of a Lorenz curve (Hand 1997). The two are related by Gini = 2c − 1, which is also the Somers' D that PROC LOGISTIC reports.

#### Practice: Using the K-S Statistic to Measure Model Performance

In this demonstration, we measure the overall predictive power of the target marketing model. Using PROC NPAR1WAY, we calculate the K-S statistic on the scored validation data set and create an empirical distribution function plot. Let's go to the code.

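We have PROC NPAR1WAY with the EDF option, which requests statistics based on the empirical distribution function, including the K-S statistic. The output also includes the Cramer-von Mises and Kuiper tests. The DATA= option names the scored validation data set, the CLASS statement names ins, and the VAR statement names p_1.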

So what is this test doing? It's comparing the values of the variable listed in the VAR statement, which is the predicted probabilities, between the groups listed in the CLASS statement, which is the responders and non-responders. Let's submit the PROC NPAR1WAY code.

The first table shows that the maximum separation occurs at a predicted probability of 0.020933. The two-sample test table shows that the D statistic is 0.43 (0.434791). D is the largest separation between the two cumulative distribution functions.

The empirical distribution function plot shows, separately for responders and non-responders, the proportion of observations with predicted probabilities at or below a given value. The D statistic is the maximum vertical difference between the two distributions, which occurs at a predicted probability of 0.020933.

In [19]:
title1 "K-S Statistic for the Validation Data Set";
proc npar1way edf data=work.scoval;
   class ins;
   var p_1;
run;


Kolmogorov-Smirnov Test for Variable P_1 Classified by Variable Ins

| Ins | N | EDF at Maximum | Deviation from Mean at Maximum |
|---|---|---|---|
| 0 | 7028 | 0.710302 | 12.624590 |
| 1 | 3724 | 0.275510 | -17.343164 |
| Total | 10752 | 0.559710 | |

Maximum Deviation Occurred at Observation 2898

Value of P_1 at Maximum = 0.020933

Kolmogorov-Smirnov Two-Sample Test (Asymptotic)

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| KS | 0.206877 | D | 0.434791 |
| KSa | 21.451471 | Pr > KSa | <.0001 |

Cramer-von Mises Test for Variable P_1 Classified by Variable Ins

| Ins | N | Summed Deviation from Mean |
|---|---|---|
| 0 | 7028 | 81.042697 |
| 1 | 3724 | 152.945241 |

Cramer-von Mises Statistics (Asymptotic)

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| CM | 0.021762 | CMa | 233.987939 |

Kuiper Test for Variable P_1 Classified by Variable Ins

| Ins | N | Deviation from Mean |
|---|---|---|
| 0 | 7028 | 0.434791 |
| 1 | 3724 | 0.000269 |

Kuiper Two-Sample Test (Asymptotic)

| Statistic | Value | Statistic | Value | Statistic | Value |
|---|---|---|---|---|---|
| K | 0.43506 | Ka | 21.464719 | Pr > Ka | <.0001 |
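As a consistency check on this output, you can reproduce the reported K-S statistics from D, the EDF values, and the class sizes. This Python sketch uses three relationships:

* KS = D·√(n0·n1)/n
* KSa = KS·√n
* deviation from mean = √ni·(EDF of class i − pooled EDF)

These are inferred from the numbers above and from standard two-sample K-S algebra, not quoted from the PROC NPAR1WAY documentation. Treat them as assumptions.

```python
import math

# Values copied from the PROC NPAR1WAY output above
n0, n1 = 7028, 3724            # non-responders, responders
n = n0 + n1
edf0, edf1 = 0.710302, 0.275510
d = 0.434791

# Assumed relationships (see the lead-in)
pooled_edf = (n0 * edf0 + n1 * edf1) / n     # EDF of all cases combined
ks = d * math.sqrt(n0 * n1) / n
ksa = ks * math.sqrt(n)
dev0 = math.sqrt(n0) * (edf0 - pooled_edf)
dev1 = math.sqrt(n1) * (edf1 - pooled_edf)

print(round(edf0 - edf1, 6), round(pooled_edf, 6))
print(round(ks, 6), round(ksa, 3), round(dev0, 3), round(dev1, 3))
```

D is simply the gap between the two class EDFs at the maximum. KS rescales D by the class sizes, so KS alone is not comparable across data sets of different sizes.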


### 4.5 Model Selection Plots

The modern strategy for predictive modeling is to create a family of increasingly complex predictive models and choose the one that generalizes the best. In this course, we show you how to do this by using some custom macros.

* use the ROC and ROCCONTRAST statements in PROC LOGISTIC
* use the Assess and Fitandscore macros to generate and evaluate many models

#### Comparing ROC Curves of Several Models
You know that the ROC curve is a measure of a model's predictive accuracy. To create and compare ROC curves for multiple models, you use both the ROC and ROCCONTRAST statements in PROC LOGISTIC.

Let's look at the syntax of these statements and some sample code. You will see this code in the next demonstration. You use one ROC statement for each of the models that you want to compare. In this example, you want to compare two models, so you use two ROC statements. These statements create ROC curves for each of the specified models. You can use a label to identify the output for each ROC statement. The specification in each statement specifies the models to be compared. The specification can be either a list of input variables that have previously been specified in the MODEL statement, or PRED=variable, where the variable does not have to be specified in the MODEL statement, such as the predicted probability. The PRED= option enables you to input a criterion produced outside PROC LOGISTIC.

The ROCCONTRAST statement compares the ROC curves for the models that you specified in the ROC statements. You can specify only one ROCCONTRAST statement. You specify a label to identify the contrast statistics in the output. You provide a contrast specification to specify how the models will be compared. If no contrast is specified, then, by default, SAS produces a contrast matrix of the differences between each ROC curve and a reference curve.

#### Comparing ROC Curves to Measure Model Performance

In this demonstration, we create and compare the ROC curves on the validation data set for the two models we fit earlier. The first is the preliminary model from Lesson 2, which has six inputs. The second is the model that we are working with in this lesson, which has 35 inputs. Let's look at the code. The first PROC LOGISTIC step runs on the training data set with the NOPRINT option, a CLASS statement for res, and model ins(event='1')= the six inputs from Lesson 2.

Then we have a SCORE statement. It scores the validation data set, writes the scored validation data set (work.sco_validate), and renames the predicted probabilities to p_ch2. Next, a second PROC LOGISTIC step runs on the training data set with model ins(event='1')= the variables we selected earlier in Lesson 3. Its SCORE statement scores work.sco_validate, writes the result back to work.sco_validate, and renames the new predicted probabilities to p_sel.

Now we use the ROC and ROCCONTRAST statements to create and compare the ROC curves for the two models. PROC LOGISTIC runs on the scored validation data set. Its MODEL statement is model ins(event='1')= the predicted probabilities from the preliminary model (p_ch2) and from the model that we're working on (p_sel). The NOFIT option performs the global score test without fitting the model. This option is used so that only the models specified in the ROC statements are compared.

And what are those models? The first ROC statement, labeled "Chapter 2 Model", refers to the model that we fit in Lesson 2 (p_ch2). The second, labeled "Chapter 4 Model", refers to the model that we fit in this lesson (p_sel). The ROCCONTRAST statement compares the two models; because there are only two models, all we need is a label. The ODS SELECT statement keeps only the ROC overlay plot, the ROC association statistics table, and the ROC contrast test table.

Let's submit the code. Here we're showing the ROC curves on an overlay plot, and it clearly shows that the model we fit in Lesson 2 is inferior to the model that we're working with in Lesson 4.

Here are the ROC association statistics. They show the c statistic (the Mann-Whitney area) for the Lesson 2 model and for the Lesson 4 model, each with its standard error and 95% Wald confidence limits, followed by Somers' D, gamma, and tau-a.

The ROC contrast test results clearly show a difference between the two ROC curves (chi-square = 528.43 with 1 DF, p < .0001). The validation c statistic is much lower for the Lesson 2 model (0.6715) than for the Lesson 4 model (0.782). This is expected. Because the Lesson 2 inputs were selected arbitrarily, it would be highly unlikely that the Lesson 2 model would outperform the model that we are working with in Lesson 4.

In [20]:
/* Lesson 2 (preliminary, six-input) model: score the validation data */
proc logistic data=work.train_imputed_swoe_bins noprint;
   class res;
   model ins(event='1')=dda ddabal dep depamt checks res;
   score data=work.valid_imputed_swoe_bins 
         out=work.sco_validate(rename=(p_1=p_ch2));         
run;

/* Current model (&selected inputs): add its predictions to the same data set */
proc logistic data=work.train_imputed_swoe_bins noprint;
   model ins(event='1')=&selected;
   score data=work.sco_validate out=work.sco_validate(rename=(p_1=p_sel));         
run;

/* Compare the two ROC curves on the validation data without refitting */
title1 "Validation Data Set Performance";
ods select ROCOverlay ROCAssociation ROCContrastTest;
proc logistic data=work.sco_validate;
   model ins(event='1')=p_ch2 p_sel / nofit;
   roc "Chapter 2 Model" p_ch2;
   roc "Chapter 4 Model" p_sel;
   roccontrast "Comparing the Two Models";
run;

ROC Association Statistics

| ROC Model | Mann-Whitney Area | Standard Error | 95% Wald Confidence Limits (Lower) | 95% Wald Confidence Limits (Upper) | Somers' D | Gamma | Tau-a |
|---|---|---|---|---|---|---|---|
| Chapter 2 Model | 0.6715 | 0.00543 | 0.6609 | 0.6821 | 0.343 | 0.3475 | 0.1553 |
| Chapter 4 Model | 0.782 | 0.00456 | 0.773 | 0.7909 | 0.5639 | 0.564 | 0.2554 |

ROC Contrast Test Results

| Contrast | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| Comparing the Two Models | 1 | 528.4259 | <.0001 |


#### Using Macros to Compare Many Models
The process of selecting inputs for a model, fitting that model, and evaluating its fit on the validation data set can be time consuming. However, you can automate the process with macro programming. The Assess and Fitandscore macros enable you to consider many candidate models in a short time. These two macros take a series of models generated by best-subsets logistic regression and compare their performance on the validation data. You can generate plots of the results, which show the performance gains as a function of model complexity. These plots can be a helpful tool as you make your final model selection.

#### Comparing and Evaluating Many Models, Part 1

For the target marketing project, we want to generate a series of models of varying complexity. In this part of the demonstration, we generate a series of models on the prepared training data set using macros and best-subsets selection.

The validation data must be prepared in the same way as the training data set. Our process begins here with the screened variables from the demonstration in Lesson 3. So here I use %PUT to display &screened and see which variables were selected in Lesson 3. Let's submit the code.

In [21]:
%put &screened;

We can see the variables in the log. One of these variables, the missing indicator for credit score (MICRScor), doesn't exist yet on the validation data set, so it needs to be created.

So here we have a DATA step: data work.valid_imputed_swoe_bins;. It reads the same data set, work.valid_imputed_swoe_bins, and creates the missing indicator MICRScor, which equals 1 if the credit score (crscore) is missing and 0 otherwise.

We also need to create two dummy variables for Res: Resr equals 1 if Res equals 'R', and Resu equals 1 if Res equals 'U'. We need these because best-subsets selection does not allow a CLASS statement in PROC LOGISTIC.

Let's submit the DATA step. 

In [22]:
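data work.valid_imputed_swoe_bins;
   set work.valid_imputed_swoe_bins;
   /* missing indicator for credit score: 1 if crscore is missing, else 0 */
   MICRScor=(crscore=.);
   /* dummy codes for Res (best-subsets selection cannot use CLASS) */
   resr=(res='R');
   resu=(res='U');
run;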

Now let's see which variables have missing values on this validation data set. 

We have proc means data=work.valid_imputed_swoe_bins with the NMISS option, which asks for the number of missing values, and a VAR statement that lists the macro variable &screened.

Let's highlight the code and submit it. It looks like Number Telephone Banking, Number Point of Sale, Credit Card Purchases, Age, and Income have missing values.

In [23]:
title1 "Variables with Missing Values on the Validation Data Set";
proc means data=work.valid_imputed_swoe_bins nmiss;
   var &screened; 
run;

| Variable | Label | N Miss |
|---|---|---|
| SavBal | Saving Balance | 0 |
| Dep | Checking Deposits | 0 |
| DDA | Checking Account | 0 |
| CD | Certificate of Deposit | 0 |
| Sav | Saving Account | 0 |
| CC | Credit Card | 0 |
| ATM | ATM | 0 |
| MM | Money Market | 0 |
| branch_swoe | | 0 |
| Phone | Number Telephone Banking | 1350 |
| IRA | Retirement Account | 0 |
| IRABal | IRA Balance | 0 |
| B_DDABal | | 0 |
| ATMAmt | ATM Withdrawal Amount | 0 |
| ILS | Installment Loan | 0 |
| POS | Number Point of Sale | 1350 |
| NSF | Number Insufficient Fund | 0 |
| CCPurc | Credit Card Purchases | 1350 |
| SDB | Safety Deposit Box | 0 |
| DepAmt | Amount Deposited | 0 |
| CCBal | Credit Card Balance | 0 |
| Inv | Investment | 0 |
| InArea | Local Address | 0 |
| Age | Age | 2061 |
| CashBk | Number Cash Back | 0 |
| MICRScor | | 0 |
| Income | Income | 1866 |


Instead of using PROC UNIVARIATE to impute missing values for the selected variables, we're going to try something different: PROC STDIZE. PROC STDIZE can output a SAS data set that contains the imputation values for the selected inputs.

We then use PROC STDIZE again to apply those values, computed on the training data set, to the missing values in the validation data set. Let's look at the code.

We have proc stdize data=work.train_imputed_swoe_bins method=median reponly (REPONLY replaces only the missing values), and here's something new: outstat=med. The OUTSTAT= option saves the imputation values in a separate data set called med. So med will contain the medians for the variables listed in the VAR statement, &screened.

Let's submit the first PROC STDIZE. 

In [24]:
proc stdize data=work.train_imputed_swoe_bins method=median reponly
            OUTSTAT=med;
   var &screened;
run;

Then after the med data set has been created, it can be used to impute for missing values in a different data set.

We have proc stdize data=work.valid_imputed_swoe_bins with the REPONLY (replace only) option and method=in(med). So essentially, we're replacing the missing values on the validation data set with the medians computed from the imputed training data set.

The METHOD=IN(med) option reads those medians from the med data set. The OUT= option writes the result back to work.valid_imputed_swoe_bins, and the VAR statement lists the screened variables.

So let's highlight the second PROC STDIZE and submit, and **now we should have no more missing values on the validation data set**.

In [25]:
proc stdize data=work.valid_imputed_swoe_bins reponly method=in(med)
            out=work.valid_imputed_swoe_bins;
   var &screened;
run;
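The pattern here is to compute imputation values on the training data and reuse them on the validation data. It can be sketched in a few lines of Python. The variable names and values are hypothetical. fit_medians plays the role of OUTSTAT=med, and impute plays the role of METHOD=IN(med) with REPONLY, so only missing values (represented as None) are replaced.

```python
from statistics import median


def fit_medians(rows, variables):
    """Training-set median of each variable, ignoring missing values (None)."""
    return {v: median(r[v] for r in rows if r[v] is not None) for v in variables}


def impute(rows, medians):
    """Replace only missing values with the stored training medians."""
    return [
        {k: (medians[k] if v is None and k in medians else v) for k, v in r.items()}
        for r in rows
    ]


train = [{"age": 30, "income": 50}, {"age": 40, "income": None}, {"age": 50, "income": 70}]
valid = [{"age": None, "income": 65}, {"age": 45, "income": None}]

med = fit_medians(train, ["age", "income"])    # like OUTSTAT=med on the training data
valid_imputed = impute(valid, med)               # like METHOD=IN(med) REPONLY on validation
print(med, valid_imputed)
```

The validation medians are never used. That keeps the validation assessment honest, because the validation data are treated the way new data would be at scoring time.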


This demonstration uses two macros. One is called the Assess macro, and that will assess the performance of a model on a particular data set, and append a record summarizing that model's performance to the results data set.

The Assess macro is called in the Fitandscore macro. The Fitandscore macro will fit and score many different models. Here, the series of models is created by the best-subsets selection. For more information, see Assess and Fitandscore Macros in the Resources section.
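The Assess macro computes c in a single pass over the scored data sorted by descending p_1 (the csum accumulation): each non-event adds the weighted number of events already read. Below is a Python sketch of that idea; the function name and data are hypothetical. Every concordant pair contributes w1 × w0 to the numerator, and the denominator is (weighted non-events) × (weighted events). So the sampling weights cancel, consistent with c being unaffected by oversampling. Unlike a rank-based calculation, this one-pass version breaks ties by sort order instead of counting tied pairs as one half.

```python
def one_pass_c(probs, targets, w0=1.0, w1=1.0):
    """c statistic from one pass over cases sorted by descending probability.

    Mirrors the csum accumulation in the Assess macro: each non-event adds the
    weighted count of events read before it. Tied probabilities are ordered
    arbitrarily by the sort, so tied pairs are not split one half each.
    """
    data = sorted(zip(probs, targets), key=lambda pt: pt[0], reverse=True)
    events_seen = 0.0   # weighted events read so far (ranked at or above)
    csum = 0.0
    total0 = total1 = 0.0
    for _, t in data:
        if t == 1:
            events_seen += w1
            total1 += w1
        else:
            csum += events_seen * w0
            total0 += w0
    return csum / (total0 * total1)


probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6]
targets = [0, 0, 1, 1, 0, 1]
c_unweighted = one_pass_c(probs, targets)
c_weighted = one_pass_c(probs, targets, w0=1.96, w1=0.04)   # illustrative weights
print(c_unweighted, c_weighted)
```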

So I include the first macro, the Assess macro, and I include a second macro, the Fitandscore macro. Let's highlight these two and submit.

And here we have the %fitandscore macro call. The first parameters are the data set names: data_train=train_imputed_swoe_bins and data_validate=valid_imputed_swoe_bins.

Another parameter is TARGET=, the target variable; in this case, it's Ins. The PREDICTORS= parameter lists the screened variables, the two dummy codes for Res, and all the interactions that we uncovered with the forward selection method.

Another parameter is BEST=, which specifies how many of the best models of each size (number of inputs) the best-subsets selection should return for fitting and scoring.

Other parameters are the profit matrix values. Again, if the person was solicited and didn't respond, you lose $1. If the person was solicited and did respond, you gain $99. If the person was not solicited, the profit is zero. The Assess macro names these PROFITTD, where the first digit T is the actual response and the second digit D is the decision: PROFIT11=99, PROFIT01=-1, and PROFIT10=PROFIT00=0.

And the last parameter is the PI1= parameter, which is the proportion of events in the population.

Okay, so let's highlight this and run. So it's calling PROC LOGISTIC over and over, and it's scoring all the models selected in the best-subsets selection method.

We compare these models in the next part of the demonstration.

In [26]:

/* Assess macro code */
%macro assess(data=,inputcount=,inputsinmodel=,index=,pi1=,rho1=,target=,
              profit11=,profit01=,profit10=,profit00=);
    %let rho0 = %sysevalf(1-&rho1);
    %let pi0 = %sysevalf(1-&pi1);

    proc sort data=scored&data;
        by descending p_1;
    run;

    /* create assessment data set */
    data assess;
        attrib DATAROLE length=$5;
        retain sse 0 csum 0 DATAROLE "&data";

        /* 2 x 2 count array, or count matrix */
        array n[0:1,0:1] _temporary_ (0 0 0 0);

        /* sample weights array */
        array w[0:1] _temporary_ 
            (%sysevalf(&pi0/&rho0) %sysevalf(&pi1/&rho1));
        keep DATAROLE INPUT_COUNT INDEX 
            TOTAL_PROFIT OVERALL_AVG_PROFIT ASE C;
        set scored&data end=last;

        /* profit associated with each decision */
        d1=&Profit11*p_1+&Profit01*p_0;
        d0=&Profit10*p_1+&Profit00*p_0;

        /* T is a flag for response */
        t=(strip(&target)="1");

        /* D is the decision, based on profit. */
        d=(d1>d0);

        /* update the count matrix, sse, and c */
        n[t,d] + w[t];
        sse + (&target-p_1)**2;
        csum + ((n[1,1]+n[1,0])*(1-t)*w[0]);

        if last then
            do;
                INPUT_COUNT=&inputcount;
                TOTAL_PROFIT = sum(&Profit11*n[1,1],&Profit10*n[1,0],
                    &Profit01*n[0,1],&Profit00*n[0,0]);
                OVERALL_AVG_PROFIT = TOTAL_PROFIT/sum(n[0,0],n[1,0],n[0,1],n[1,1]);
                ASE = sse/sum(n[0,0],n[1,0],n[0,1],n[1,1]);
                C = csum/(sum(n[0,0],n[0,1])*sum(n[1,0],n[1,1]));
                index=&index;
                output;
            end;
    run;

    proc append base=results data=assess force;
    run;

%mend assess;


/* Fitandscore macro code */
/*Usage:*/
/*%fitandscore(data_train=,                             training data set*/
/*             data_validate=,                          validation data set*/
/*             target=,                                 target variable*/
/*             predictors=,                             predictor variable list*/
/*             best=,                                   # of best subset models to try*/
/*             profit00=,profit01=,profit10=,profit11=, values of the profit matrix*/
/*             pi1=);                                   actual population proportion*/

%macro fitandscore(data_train=,
                   data_validate=,
                   target=,
                   predictors=,
                   best=,
                   profit00=,profit01=,profit10=,profit11=,
                   pi1=);

    ods select none;

    ods output bestsubsets=score;

    proc logistic data=&data_train;
        model &target(event='1')=&predictors 
            / selection=SCORE best=&best;
    run;

    %global nmodels;

    /* number of models kept by the best-subsets selection */
    proc sql noprint;
        select count(*) into :nmodels from score;
    quit;

    %global rho1;

    /* event rate in the training data (rho1), used for the sample weights */
    proc sql noprint;
        select mean(&target) into :rho1 from &data_train;
    quit;

    %do i=1 %to &nmodels;
        %global inputs&i;
        %global ic&i;
    %end;

    /* one macro variable per model: input list and input count */
    proc sql noprint;
        select variablesinmodel into :inputs1 -  
            from score;
        select NumberOfVariables into :ic1 - 
            from score;
    quit;

    /* start each run with an empty results data set */
    proc datasets 
        library=work 
        nodetails 
        nolist;
        delete results;
    quit;

    %do model_indx=1 %to &nmodels;

        %let im=&&inputs&model_indx;
        %let ic=&&ic&model_indx;

        /* fit the model, then score both data sets with prior-adjusted probabilities */
        proc logistic data=&data_train;
            model &target(event='1')=&im;
            score data=&data_train 
                out=scored&data_train(keep=&target p_1 p_0)
                priorevent=&pi1;
            score data=&data_validate 
                out=scored&data_validate(keep=&target p_1 p_0)
                priorevent=&pi1;
        run;

        %assess(data=&data_train,inputcount=&ic,inputsinmodel=&im,index=&model_indx,
            pi1=&pi1,rho1=&rho1,target=&target,profit11=&profit11,profit01=&profit01,
            profit10=&profit10,profit00=&profit00);
        %assess(data=&data_validate,inputcount=&ic,inputsinmodel=&im,index=&model_indx,
            pi1=&pi1,rho1=&rho1,target=&target,profit11=&profit11,profit01=&profit01,
            profit10=&profit10,profit00=&profit00);

    %end;

    ods select all;

%mend fitandscore;

/* End of macro code */

%fitandscore(data_train=train_imputed_swoe_bins,
             data_validate=valid_imputed_swoe_bins,
             target=ins,predictors=&screened resr resu SavBal*B_DDABal 
             MM*B_DDABal branch_swoe*ATMAmt B_DDABal*Sav SavBal*SDB 
             SavBal*DDA ATMAmt*DepAmt B_DDABal*ATMAmt SavBal*ATMAmt
             SavBal*IRA SavBal*MM SavBal*CC Sav*NSF DDA*ATMAmt Dep*ATM
             IRA*B_DDABal CD*MM MM*IRABal CD*Sav B_DDABal*CashBk Sav*CC,
             best=2,profit00=0,profit01=-1,profit10=0,profit11=99,pi1=0.02);



In [None]:
%showLog

#### Comparing and Evaluating Many Models, Part 2

Note: The demonstrations in this course build on each other. If you want to perform this demonstration in your own SAS software and you started a new SAS session after you performed the previous demonstration, open l4_demos.sas. It contains the solution code for all demos in Lessons 1, 2, 3, and 4. Locate the code for the previous demos, review the comments to see if any modifications are needed, and then submit the code.

In the first part of this demonstration, we generated a series of models on the prepared training data set. In this part of the demonstration, we compare the model performance measures for the training and validation data sets, and select the model with the highest profit.

The output of the Fitandscore macro is the work.results data set. Let's print the first 24 observations of the results data set. For each model and data role (train or valid), it shows the input count, the total profit, the overall average profit, the average squared error (ASE), the c statistic, and the model number (index). The ASE is related to the mean squared error (MSE), but it has a different divisor. ASE divides the sum of squared errors by the number of observations (in the Assess macro, the sum of the sample weights), whereas MSE divides by the error degrees of freedom.
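The different divisors can be illustrated in a few lines of Python (illustration only). The definitions below are the textbook ones: ASE = SSE / N and MSE = SSE / (N - p) for a model with p parameters. The parameter count in the example is an assumption, chosen to match the 28-input model (plus intercept) selected later in this lesson.

```python
def ase(sse: float, n_obs: float) -> float:
    """Average squared error: SSE divided by the number of observations."""
    return sse / n_obs


def mse(sse: float, n_obs: float, n_params: int) -> float:
    """Mean squared error: SSE divided by the error degrees of freedom."""
    return sse / (n_obs - n_params)


# 21,512 training observations and 29 parameters (intercept + 28 inputs).
# SSE cancels in the ratio, so any placeholder value works.
n_obs, n_params = 21512, 29
ratio = mse(1.0, n_obs, n_params) / ase(1.0, n_obs)
print(round(ratio, 5))  # 1.00135
```

With this many observations, the two measures differ by only about 0.13%. The distinction matters more for small data sets or for models with many parameters.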

In [28]:
title1 "Model Performance Measures for Training and Validation Data Sets";
proc print data=work.results(obs=24);
run;

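| Obs | DATAROLE | INPUT_COUNT | TOTAL_PROFIT | OVERALL_AVG_PROFIT | ASE | C | index |
|---|---|---|---|---|---|---|---|
| 1 | train | 1 | 23129.89 | 1.07521 | 0.32753 | 0.65653 | 1 |
| 2 | valid | 1 | 11480.13 | 1.0677 | 0.32773 | 0.65483 | 1 |
| 3 | train | 1 | 21511.97 | 1.0 | 0.3292 | 0.48375 | 2 |
| 4 | valid | 1 | 10751.16 | 0.99991 | 0.32932 | 0.47975 | 2 |
| 5 | train | 2 | 23760.31 | 1.10451 | 0.32416 | 0.68932 | 3 |
| 6 | valid | 2 | 11883.25 | 1.1052 | 0.32438 | 0.6885 | 3 |
| 7 | train | 2 | 24629.62 | 1.14492 | 0.32564 | 0.69135 | 4 |
| 8 | valid | 2 | 12105.74 | 1.12589 | 0.32588 | 0.68752 | 4 |
| 9 | train | 3 | 24966.72 | 1.16059 | 0.31447 | 0.73595 | 5 |
| 10 | valid | 3 | 12572.67 | 1.16931 | 0.31586 | 0.73638 | 5 |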
10,valid,3,12572.67,1.16931,0.31586,0.73638,5


Now let's graph the c statistic by model and the overall average profit by model, with separate lines for the training and validation data sets. We only show models with an index greater than 18, that is, models with 10 or more inputs. The PROC SGPLOT step reads work.results, the output data set from the Fitandscore macro, with WHERE index > 18. The SERIES statement draws a line plot with the c statistic on the Y axis and the model number on the X axis. GROUP=datarole produces one line for the training results and one for the validation results. MARKERATTRS=(symbol=circle), together with the MARKERS option, displays a circle at every data point.

Then the YAXIS statement sets the label to "C Statistic" and the tick values from 0.770 to 0.790 by 0.01, just to make the graph look nicer. The XAXIS statement sets the label to "Model Number" and the tick values from 20 to 80 by 5. So let's submit the PROC SGPLOT code.
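The filter `index > 18` works because of how the models are numbered. Here is a Python sketch of the mapping. It assumes the best-subsets output lists BEST=2 models for each model size, in order of increasing size, which is what the printout of work.results shows for the first five models. `input_count` is a hypothetical helper.

```python
import math

BEST = 2  # BEST= value passed to %fitandscore


def input_count(index: int, best: int = BEST) -> int:
    """Model size for a model number (index) in work.results, assuming the
    best-subsets output lists `best` models per size, smallest size first."""
    return math.ceil(index / best)


for idx in (1, 2, 5, 19, 41, 56):
    print(idx, input_count(idx))
# 1 1
# 2 1
# 5 3
# 19 10
# 41 21
# 56 28
```

Model 19 is the first model with 10 inputs. Model 56 maps to 28 inputs, which agrees with the 28 degrees of freedom in the global null hypothesis tests reported for that model later in this lesson.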

In [29]:
title1 "C Statistics by Model";
proc sgplot data=work.results;
   where index > 18;
   series y=c x=index / group=datarole markerattrs=(symbol=circle) markers;
   yaxis label="C Statistic" Values=(0.770 to 0.790 by 0.01);
   xaxis label="Model Number" Values=(20 to 80 by 5);
run;

The plot seems to reach an early peak around index 41. Because the goal is a model that generalizes well, the simplest model with good performance on the validation data is probably the best candidate. So model 41 seems to be a good candidate.

Let's go back to the code. Many people also want to see the overall average profit by model, so **we're going to plot profit by model**. The PROC SGPLOT step again reads work.results with WHERE index > 18. The SERIES statement plots overall_avg_profit on the Y axis against the model number on the X axis, with GROUP=datarole (train versus valid). This time MARKERATTRS= uses plus symbols, and again the MARKERS option adds a marker at each data point of the series.

The YAXIS label is "Overall Average Profit", with tick values from 1.21 to 1.26 by 0.01. The XAXIS label is "Model Number", with tick values from 20 to 80 by 5. Let's submit the code for the average profit plot.

In [30]:
title1 "Overall Average Profit by Model";
proc sgplot data=work.results;
   where index > 18;
   series y=overall_avg_profit x=index / 
           group=datarole markerattrs=(symbol=plus) markers;
   yaxis label="Overall Average Profit" Values=(1.21 to 1.26 by 0.010);
   xaxis label="Model Number" Values=(20 to 80 by 5);
run;

The plot seems to show that the validation data performance peaks at index 56. If profit is one of your most important metrics for model performance, this might be the candidate model to move forward with.

So let's **show the model with the highest profit**. First, we declare a global macro variable named index. Then, in PROC SQL, we select index into :index from work.results. We keep only the rows where datarole='valid', and among those, only the row whose overall average profit equals the maximum overall average profit. Because there is no GROUP BY clause, PROC SQL computes the maximum over all validation rows and remerges it with each row.
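The query's logic can be mirrored in Python: filter to the validation rows, compute the maximum profit, and keep the rows that reach it. This sketch runs only on the ten rows printed earlier, so it picks model 5 rather than model 56. `Row` and `best_models` are hypothetical names. If several rows tied, the HAVING clause would return all of them, and `SELECT ... INTO :index` would store only the first.

```python
from typing import NamedTuple


class Row(NamedTuple):
    datarole: str
    input_count: int
    overall_avg_profit: float
    index: int


# First ten observations of work.results, copied from the printout above.
results = [
    Row("train", 1, 1.07521, 1), Row("valid", 1, 1.0677, 1),
    Row("train", 1, 1.0, 2), Row("valid", 1, 0.99991, 2),
    Row("train", 2, 1.10451, 3), Row("valid", 2, 1.1052, 3),
    Row("train", 2, 1.14492, 4), Row("valid", 2, 1.12589, 4),
    Row("train", 3, 1.16059, 5), Row("valid", 3, 1.16931, 5),
]


def best_models(rows):
    """Indexes of validation rows whose profit equals the validation maximum,
    mirroring WHERE datarole='valid' HAVING overall_avg_profit=max(...)."""
    valid = [r for r in rows if r.datarole == "valid"]
    top = max(r.overall_avg_profit for r in valid)
    return [r.index for r in valid if r.overall_avg_profit == top]


print(best_models(results))  # [5]
```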

In [31]:
title1 "Model Number with Highest Profit";
%global index;
proc sql;
   select index into :index
   from work.results
   where datarole='valid' 
   having overall_avg_profit=max(overall_avg_profit);
quit;

| index |
|---|
| 56 |


Based on the plot, I expected **model 56**, and the query confirms that model 56 has the highest overall average profit on the validation data.

Next, %let index=%cmpres(&index); cleans up the macro variable. PROC SQL stores the numeric value with leading blanks. The %CMPRES autocall macro removes leading and trailing blanks (and collapses any repeated internal blanks), so that &&inputs&index resolves to &inputs56. We do not want any blanks in the macro variable index. Then PROC LOGISTIC fits the model on the training data set with model ins(event='1')=&&inputs&index, which supplies the variables from the model with the highest profit. The SCORE statement scores the validation data set and creates an output data set called scoval2 (score validation 2), and the FITSTAT option asks for fit statistics on the scored data.
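Why the blanks matter can be shown with a Python analogy. This is an illustration under a stated assumption, not an exact reproduction of SAS behavior. Suppose the number 56 arrives right-aligned in a width-8 field. Gluing it onto the prefix `inputs` then produces a name with embedded blanks, while stripping the blanks first produces `inputs56`, the name that `&&inputs&index` is meant to resolve to. In SAS 9.3 and later, the TRIMMED option of the INTO clause is another way to avoid the blanks.

```python
index_value = 56

# Assumption for illustration: the number arrives right-aligned in a
# width-8 field, roughly what a BEST8.-style rendering produces.
raw = f"{index_value:>8}"
print(repr(raw))                   # '      56'

bad_name = "inputs" + raw          # 'inputs      56' is not a valid name
good_name = "inputs" + raw.strip()
print(good_name)                   # inputs56
```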

So let's submit the code for PROC LOGISTIC and see which variables made it into the model.

In [32]:
/* Remove leading and trailing blanks from index */
%let index=%cmpres(&index);
title1 "Logistic Model with Highest Profit";
proc logistic data=work.train_imputed_swoe_bins;
   model ins(event='1')=&&inputs&index;
   score data=work.valid_imputed_swoe_bins out=work.scoval2 fitstat; 
run;

Model Information

| | |
|---|---|
| Data Set | WORK.TRAIN_IMPUTED_SWOE_BINS |
| Response Variable | Ins |
| Number of Response Levels | 2 |
| Model | binary logit |
| Optimization Technique | Fisher's scoring |

| | |
|---|---|
| Number of Observations Read | 21512 |
| Number of Observations Used | 21512 |

Response Profile

| Ordered Value | Ins | Total Frequency |
|---|---|---|
| 1 | 0 | 14061 |
| 2 | 1 | 7451 |

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

| Criterion | Intercept Only | Intercept and Covariates |
|---|---|---|
| AIC | 27759.675 | 22665.951 |
| SC | 27767.651 | 22897.265 |
| -2 Log L | 27757.675 | 22607.951 |

Testing Global Null Hypothesis: BETA=0

| Test | Chi-Square | DF | Pr > ChiSq |
|---|---|---|---|
| Likelihood Ratio | 5149.7241 | 28 | <.0001 |
| Score | 4479.4862 | 28 | <.0001 |
| Wald | 3438.5476 | 28 | <.0001 |

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|
| Intercept | 1 | -1.9217 | 0.0926 | 430.2719 | <.0001 |
| SavBal | 1 | 0.000164 | 9.545e-06 | 293.4418 | <.0001 |
| DDA | 1 | -0.2544 | 0.0491 | 26.8705 | <.0001 |
| CD | 1 | 0.9672 | 0.0522 | 343.7165 | <.0001 |
| Sav | 1 | 0.8459 | 0.0844 | 100.5593 | <.0001 |
| MM | 1 | 1.7664 | 0.1462 | 146.0518 | <.0001 |
| branch_swoe | 1 | 0.9408 | 0.0795 | 139.8803 | <.0001 |
| IRA | 1 | 1.1164 | 0.1968 | 32.1891 | <.0001 |
| B_DDABal | 1 | 0.0284 | 0.000996 | 811.3105 | <.0001 |
| ATMAmt | 1 | 0.000201 | 2.9e-05 | 47.0271 | <.0001 |

Odds Ratio Estimates

| Effect | Point Estimate | 95% Wald Lower Limit | 95% Wald Upper Limit |
|---|---|---|---|
| ILS | 0.784 | 0.674 | 0.911 |
| NSF | 1.441 | 1.268 | 1.637 |
| Inv | 1.555 | 1.284 | 1.885 |

Association of Predicted Probabilities and Observed Responses

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Percent Concordant | 78.7 | Somers' D | 0.574 |
| Percent Discordant | 21.3 | Gamma | 0.575 |
| Percent Tied | 0.0 | Tau-a | 0.26 |
| Pairs | 104768511 | c | 0.787 |

Fit Statistics for SCORE Data

| Data Set | Total Frequency | Log Likelihood | Error Rate | AIC | AICC | BIC | SC | R-Square | Max-Rescaled R-Square | AUC | Brier Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| WORK.VALID_IMPUTED_SWOE_BINS | 10752 | -5750.7 | 0.2672 | 11559.46 | 11559.62 | 11770.66 | 11770.66 | 0.197976 | 0.273139 | 0.782223 | 0.1768 |


Boy, that's a number of variables. Savings balance (SavBal) made it, along with CD, checking account (DDA), Savings (Sav), and Money market (MM). There's my branch smoothed weight of evidence (branch_swoe), IRA, the binned checking account balance (B_DDABal), ATM amount (ATMAmt), and several other variables.

A number of interactions made it into the model as well. This is the model with the highest profit. Next come the odds ratio estimates. The c statistic is 0.787, but that's for the training data set. For the validation data set, the AUC in the Fit Statistics for SCORE Data table is 0.782223. The training and validation c statistics differ very little, so we are fairly comfortable saying that this model will generalize well to new data. There doesn't seem to be any problem with overfitting here.
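The training and validation numbers can be cross-checked with a little Python arithmetic (illustration only). For a binary response, c counts ties as half-concordant. Somers' D therefore equals the percent concordant minus the percent discordant, which is also 2c - 1. The values below are copied from the output above. Percentages are rounded there, so the check holds to three decimals.

```python
# Values read from the PROC LOGISTIC output above (training data),
# with percentages converted to proportions.
concordant = 0.787
discordant = 0.213
tied = 0.0

c_train = concordant + 0.5 * tied      # c counts a tie as half a concordant pair
somers_d = concordant - discordant     # Somers' D; equals 2*c - 1

auc_valid = 0.782223                   # AUC from Fit Statistics for SCORE Data
gap = c_train - auc_valid

print(round(somers_d, 3), round(2 * c_train - 1, 3))  # 0.574 0.574
print(round(gap, 4))                                   # 0.0048
```

The Somers' D of 0.574 in the output agrees with 2(0.787) - 1. The gap of roughly 0.005 between the training c and the validation AUC is the small difference that the overfitting judgment above relies on.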