# Notes on the SAS Course
# Predictive Modeling Using Logistic Regression (15.1)

This course covers predictive modeling using SAS/STAT software, with emphasis on the LOGISTIC procedure. It also discusses selecting variables and interactions, recoding categorical variables based on the smooth weight of evidence, assessing models, treating missing values, and using efficiency techniques for massive data sets. These notes are based on the course materials; some code and images are copyrighted by SAS Institute. I wrote them as a Jupyter Notebook using JupyterLab with SAS University Edition.

In [4]:
/*Run this script to configure the session*/

%let InicioCurso=/folders/myfolders/Cursos/EPMLR51/;
%include "&InicioCurso/setup.sas";

## Lesson 2: Fitting the Model


#### Fitting a Basic Logistic Regression Model, Part 1

First create the training and validation data.

In [5]:
data work.develop;
   set pmlr.develop;
run;

%global inputs;
%let inputs=ACCTAGE DDA DDABAL DEP DEPAMT CASHBK 
            CHECKS DIRDEP NSF NSFAMT PHONE TELLER 
            SAV SAVBAL ATM ATMAMT POS POSAMT CD 
            CDBAL IRA IRABAL LOC LOCBAL INV 
            INVBAL ILS ILSBAL MM MMBAL MMCRED MTG 
            MTGBAL CC CCBAL CCPURC SDB INCOME 
            HMOWN LORES HMVAL AGE CRSCORE MOVED 
            INAREA;

proc means data=work.develop n nmiss mean min max;
   var &inputs;
run;

proc freq data=work.develop;
   tables ins branch res;
run;



/***************************************************************************/

/* Sort the data by the target in preparation for stratified sampling. */

proc sort data=work.develop out=work.develop_sort; 
   by ins; 
run;

/* The SURVEYSELECT procedure will perform stratified sampling 
   on any variable in the STRATA statement. The OUTALL option 
   specifies that you want a flag appended to the file to 
   indicate selected records, not simply a file comprised 
   of the selected records. */

proc surveyselect noprint data=work.develop_sort 
                  samprate=.6667 stratumseed=restore
                  out=work.develop_sample
                  seed=44444 outall;
   strata ins;
run;

/* Verify stratification. */

proc freq data=work.develop_sample;
   tables ins*selected;
run;

/* Create training and validation data sets. */

data work.train(drop=selected SelectionProb SamplingWeight) 
     work.valid(drop=selected SelectionProb SamplingWeight);
   set work.develop_sample;
   if selected then output work.train;
   else output work.valid;
run;

Variable,Label,N,N Miss,Mean,Minimum,Maximum
AcctAge,Age of Oldest Account,30194,2070,5.9086772,0.3000000,61.5000000
DDA,Checking Account,32264,0,0.8156459,0,1.0000000
DDABal,Checking Balance,32264,0,2170.02,-774.8300000,278093.83
Dep,Checking Deposits,32264,0,2.1346082,0,28.0000000
DepAmt,Amount Deposited,32264,0,2232.76,0,484893.67
CashBk,Number Cash Back,32264,0,0.0159621,0,4.0000000
Checks,Number of Checks,32264,0,4.2599182,0,49.0000000
DirDep,Direct Deposit,32264,0,0.2955616,0,1.0000000
NSF,Number Insufficient Fund,32264,0,0.0870630,0,1.0000000
NSFAmt,Amount NSF,32264,0,2.2905464,0,666.8500000
Phone,Number Telephone Banking,28131,4133,0.4056024,0,30.0000000
Teller,Teller Visits,32264,0,1.3652678,0,27.0000000
Sav,Saving Account,32264,0,0.4668981,0,1.0000000
SavBal,Saving Balance,32264,0,3170.60,0,700026.94
ATM,ATM,32264,0,0.6099368,0,1.0000000
ATMAmt,ATM Withdrawal Amount,32264,0,1235.41,0,427731.26
POS,Number Point of Sale,28131,4133,1.0756816,0,54.0000000
POSAmt,Amount Point of Sale,28131,4133,48.9261782,0,3293.49
CD,Certificate of Deposit,32264,0,0.1258368,0,1.0000000
CDBal,CD Balance,32264,0,2530.71,0,1053900.00
IRA,Retirement Account,32264,0,0.0532792,0,1.0000000
IRABal,IRA Balance,32264,0,617.5704550,0,596497.60
LOC,Line of Credit,32264,0,0.0633833,0,1.0000000
LOCBal,Line of Credit Balance,32264,0,1175.22,-613.0000000,523147.24
Inv,Investment,28131,4133,0.0296826,0,1.0000000
InvBal,Investment Balance,28131,4133,1599.17,-2214.92,8323796.02
ILS,Installment Loan,32264,0,0.0495909,0,1.0000000
ILSBal,Loan Balance,32264,0,517.5692344,0,29162.79
MM,Money Market,32264,0,0.1148959,0,1.0000000
MMBal,Money Market Balance,32264,0,1875.76,0,120801.11
MMCred,Money Market Credits,32264,0,0.0563786,0,5.0000000
MTG,Mortgage,32264,0,0.0493429,0,1.0000000
MTGBal,Mortgage Balance,32264,0,8081.74,0,10887573.28
CC,Credit Card,28131,4133,0.4830969,0,1.0000000
CCBal,Credit Card Balance,28131,4133,9586.55,-2060.51,10641354.78
CCPurc,Credit Card Purchases,28131,4133,0.1541716,0,5.0000000
SDB,Safety Deposit Box,32264,0,0.1086660,0,1.0000000
Income,Income,26482,5782,40.5889283,0,233.0000000
HMOwn,Owns Home,26731,5533,0.5418802,0,1.0000000
LORes,Length of Residence,26482,5782,7.0056642,0.5000000,19.5000000
HMVal,Home Value,26482,5782,110.9121290,67.0000000,754.0000000
Age,Age,25907,6357,47.9283205,16.0000000,94.0000000
CRScore,Credit Score,31557,707,666.4935197,509.0000000,820.0000000
Moved,Recent Address Change,32264,0,0.0296305,0,1.0000000
InArea,Local Address,32264,0,0.9602963,0,1.0000000

Ins,Frequency,Percent,Cumulative Frequency,Cumulative Percent
0,21089,65.36,21089,65.36
1,11175,34.64,32264,100.0

Branch of Bank,Branch of Bank,Branch of Bank,Branch of Bank,Branch of Bank
Branch,Frequency,Percent,Cumulative Frequency,Cumulative Percent
B1,2819,8.74,2819,8.74
B10,273,0.85,3092,9.58
B11,247,0.77,3339,10.35
B12,549,1.7,3888,12.05
B13,535,1.66,4423,13.71
B14,1072,3.32,5495,17.03
B15,2235,6.93,7730,23.96
B16,1534,4.75,9264,28.71
B17,850,2.63,10114,31.35
B18,541,1.68,10655,33.02

Area Classification,Area Classification,Area Classification,Area Classification,Area Classification
Res,Frequency,Percent,Cumulative Frequency,Cumulative Percent
R,8077,25.03,8077,25.03
S,11506,35.66,19583,60.7
U,12681,39.3,32264,100.0

Table of Ins by Selected,Table of Ins by Selected,Table of Ins by Selected,Table of Ins by Selected
Ins,Selected(Selection Indicator),Selected(Selection Indicator),Selected(Selection Indicator)
Ins,0,1,Total
Frequency Percent Row Pct Col Pct,,,
0,7028 21.78 33.33 65.36,14061 43.58 66.67 65.36,21089 65.36
1,3724 11.54 33.32 34.64,7451 23.09 66.68 34.64,11175 34.64
Total,10752 33.33,21512 66.67,32264 100.00
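
As a cross-check outside SAS, the following minimal Python sketch recomputes the row percentages and the event rates from the cell counts in the Ins-by-Selected table. The helper names (`train_share`, `event_rate`) are made up for this illustration; only the counts come from the output above.

```python
# Cell counts copied from the Ins-by-Selected table (Selected=0 -> valid, Selected=1 -> train).
counts = {
    0: {"valid": 7028, "train": 14061},   # Ins = 0
    1: {"valid": 3724, "train": 7451},    # Ins = 1
}

def train_share(row):
    """Fraction of one stratum (one level of Ins) that was selected for training."""
    return row["train"] / (row["train"] + row["valid"])

def event_rate(split):
    """Proportion of Ins = 1 within one split ("train" or "valid")."""
    return counts[1][split] / (counts[0][split] + counts[1][split])

# Percentages rounded to two decimals, like the PROC FREQ output.
shares = {ins: round(100 * train_share(row), 2) for ins, row in counts.items()}
rates = {split: round(100 * event_rate(split), 2) for split in ("train", "valid")}

print("percent of each stratum sent to training:", shares)
print("percent of events in each split:", rates)
```

Both strata send about two thirds of their rows to training, and the event rate is essentially the same in both splits, which is what stratifying on Ins is meant to achieve.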


In [4]:
title 'Show the data source';
proc print data=work.train(obs=3); run; 
title;


title1 "Logistic Regression Model for the Variable Annuity Data Set";
proc logistic data=work.train 
              plots(only maxpoints=none)=(effect(clband x=(ddabal depamt checks res))
              
              /*ONLY suppresses the default plots, so only the specifically requested plots are displayed.*/
              /*MAXPOINTS=NONE removes the default cutoff of 5000 observations, above which PROC LOGISTIC suppresses some plots and omits the observations from the effect plots.*/
              /*EFFECT requests effect plots, which show the predicted probabilities on the Y axis and the values of a predictor variable on the X axis.*/
              /*CLBAND displays confidence limits on the effect plots.*/
              /*X= specifies the effects to be used on the X axis of the effect plots. You can list several effects.*/
              
              oddsratio (type=horizontalstat));
              
              /*ODDSRATIO requests an odds ratio plot, which is produced when the CLODDS= option or
              ODDSRATIO statements are also specified.*/
              /*TYPE=HORIZONTALSTAT places the odds ratio values on the X axis and displays the odds ratios and their confidence limits 
              on the right side of the graphic.*/
              
   class res (param=ref ref='S') dda (param=ref ref='0');
   
   /*The CLASS statement specifies the classification variables to be used in the analysis. It must precede the MODEL 
   statement.*/
   /*The PARAM= option (a CLASS statement option) specifies the parameterization method for the classification variable 
   or variables.*/
   /*The REF= option specifies the reference level. For RES, the reference level is S, for suburb.*/
   /*DDA (checking account) is a class variable with reference cell parameterization and a reference level of 0.*/
   
   model ins(event='1')=dda ddabal dep depamt 
               cashbk checks res / stb clodds=pl;
   
   /*The MODEL statement specifies the response variable and the predictor variables, which can be character or numeric.*/
   /*The EVENT= option specifies the event category for the binary response model. It has no effect when there are more than 
   two response categories.*/
   /*STB requests standardized estimates for the parameters of the predictor variables.*/
   /*CLODDS=PL requests confidence intervals for the odds ratios, here profile likelihood confidence intervals.*/
   
   units ddabal=1000 depamt=1000 / default=1;
   
   /*The UNITS statement gives an odds ratio estimate for a specified change in a predictor variable. The change can be a number, a standard deviation, or a 
   number times the standard deviation. Here it is a 1,000-unit increase in checking account balance and a 1,000-unit increase in deposit amount.*/
   /*DEFAULT= gives a list of units of change for all the predictor variables that are not specified in the UNITS statement. Here, DEFAULT=1.*/
   /*If the DEFAULT= option is not specified, PROC LOGISTIC does not produce customized odds ratio estimates for any predictor variable that is not listed in the 
   UNITS statement.*/

   oddsratio 'Comparisons of Residential Classification' res / diff=all cl=pl;
   
   /*The ODDSRATIO statement produces odds ratios for a variable. You can specify several ODDSRATIO statements.*/
   /*The DIFF= option specifies whether the odds ratios for a classification variable are computed against the reference level (DIFF=REF) or for all pairs of levels 
   (DIFF=ALL). Here, DIFF=ALL requests all possible comparisons.*/
   /*CL=PL requests profile likelihood confidence limits for the odds ratios.*/
   
   
   effectplot slicefit(sliceby=dda x=ddabal) / noobs;
   effectplot slicefit(sliceby=dda x=depamt) / noobs;   
   
   /*The EFFECTPLOT statement displays the fitted model and provides options for changing and enhancing the displays.*/
   /*SLICEFIT displays a curve of predicted values versus a continuous variable, grouped by the levels of a class effect.*/
   /*SLICEBY=DDA displays the fitted values at each level of DDA (checking account), which is why DDA is declared as a class variable.*/
   /*X= specifies the effect to display on the X axis.*/
   /*NOOBS suppresses the display of observations.*/
   /*The second EFFECTPLOT statement (sliceby=dda x=depamt) puts the deposit amount on the X axis, again with the NOOBS option.*/

run;
title1;


Model Information,Model Information.1
Data Set,WORK.TRAIN
Response Variable,Ins
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

Number of Observations Read,21512
Number of Observations Used,21512

Response Profile,Response Profile,Response Profile
Ordered Value,Ins,Total Frequency
1,0,14061
2,1,7451

Class Level Information,Class Level Information,Class Level Information,Class Level Information
Class,Value,Design Variables,Design Variables.1
Res,R,1,0.0
,S,0,0.0
,U,0,1.0
DDA,0,0,
,1,1,

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics,Model Fit Statistics,Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,27759.675,26284.098
SC,27767.651,26355.885
-2 Log L,27757.675,26266.098

Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,1491.5772,8,<.0001
Score,1315.6105,8,<.0001
Wald,1256.8282,8,<.0001
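
The fit statistics and the likelihood ratio test above are linked by simple arithmetic. The Python sketch below, a cross-check outside SAS, assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), where k is the number of estimated parameters and n is the number of observations used. The function names are made up for this illustration; the inputs are copied from the tables.

```python
import math

n_obs = 21512              # Number of Observations Used
neg2ll_null = 27757.675    # -2 Log L, intercept only
neg2ll_full = 26266.098    # -2 Log L, intercept and covariates
k_null, k_full = 1, 9      # parameters: intercept alone vs. intercept + 8 slopes

def aic(neg2ll, k):
    """Akaike information criterion."""
    return neg2ll + 2 * k

def sc(neg2ll, k, n):
    """Schwarz criterion (also known as BIC)."""
    return neg2ll + k * math.log(n)

# Likelihood ratio chi-square: drop in -2 Log L when the covariates are added.
lr_chisq = neg2ll_null - neg2ll_full
lr_df = k_full - k_null

print(f"AIC: null={aic(neg2ll_null, k_null):.3f} full={aic(neg2ll_full, k_full):.3f}")
print(f"SC:  null={sc(neg2ll_null, k_null, n_obs):.3f} full={sc(neg2ll_full, k_full, n_obs):.3f}")
print(f"LR chi-square={lr_chisq:.3f} on {lr_df} DF")
```

The recomputed values agree with the printed AIC, SC, and likelihood ratio statistic up to rounding of the displayed -2 Log L.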

Type 3 Analysis of Effects,Type 3 Analysis of Effects,Type 3 Analysis of Effects,Type 3 Analysis of Effects
Effect,DF,Wald Chi-Square,Pr > ChiSq
DDA,1,484.002,<.0001
DDABal,1,317.1284,<.0001
Dep,1,26.0277,<.0001
DepAmt,1,10.1271,0.0015
CashBk,1,19.8706,<.0001
Checks,1,0.0309,0.8604
Res,2,0.1229,0.9404

Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates
Parameter,,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq,Standardized Estimate
Intercept,,1,0.1706,0.0374,20.8591,<.0001,
DDA,1,1,-1.041,0.0473,484.002,<.0001,-0.2226
DDABal,,1,7.5e-05,4.188e-06,317.1284,<.0001,0.3135
Dep,,1,-0.0682,0.0134,26.0277,<.0001,-0.0648
DepAmt,,1,1.2e-05,3.819e-06,10.1271,0.0015,0.046
CashBk,,1,-0.6393,0.1434,19.8706,<.0001,-0.0468
Checks,,1,-0.00068,0.00384,0.0309,0.8604,-0.00193
Res,R,1,-0.0129,0.0388,0.1106,0.7395,-0.00308
Res,U,1,-0.00191,0.0343,0.0031,0.9557,-0.00051

Association of Predicted Probabilities and Observed Responses,Association of Predicted Probabilities and Observed Responses.1,Association of Predicted Probabilities and Observed Responses.2,Association of Predicted Probabilities and Observed Responses.3
Percent Concordant,67.2,Somers' D,0.357
Percent Discordant,31.5,Gamma,0.362
Percent Tied,1.3,Tau-a,0.162
Pairs,104768511.0,c,0.679

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Label,Odds Ratio,Estimate,95% Confidence Limits,95% Confidence Limits.1
Comparisons of Residential Classification,Res R vs S,0.987,0.915,1.065
Comparisons of Residential Classification 2,Res R vs U,0.989,0.918,1.066
Comparisons of Residential Classification 3,Res U vs S,0.998,0.933,1.068

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect,Unit,Estimate,95% Confidence Limits,95% Confidence Limits.1
DDA 1 vs 0,1.0,0.353,0.322,0.387
DDABal,1000.0,1.077,1.069,1.086
Dep,1.0,0.934,0.91,0.959
DepAmt,1000.0,1.012,1.005,1.02
CashBk,1.0,0.528,0.394,0.693
Checks,1.0,0.999,0.992,1.007
Res R vs S,1.0,0.987,0.915,1.065
Res U vs S,1.0,0.998,0.933,1.068
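
Each odds ratio in the table above is the exponential of the parameter estimate multiplied by the units of change. The Python sketch below (a cross-check outside SAS, not part of the course code) recomputes them from the rounded estimates in the Analysis of Maximum Likelihood Estimates table, so small differences in the third decimal are expected. The function name `odds_ratio` is made up for this illustration.

```python
import math

# name: (estimate, units of change, odds ratio printed by PROC LOGISTIC)
rows = {
    "DDA 1 vs 0": (-1.041, 1, 0.353),
    "DDABal":     (7.5e-05, 1000, 1.077),
    "Dep":        (-0.0682, 1, 0.934),
    "DepAmt":     (1.2e-05, 1000, 1.012),
    "CashBk":     (-0.6393, 1, 0.528),
    "Checks":     (-0.00068, 1, 0.999),
    "Res R vs S": (-0.0129, 1, 0.987),
    "Res U vs S": (-0.00191, 1, 0.998),
}

def odds_ratio(estimate, units=1.0):
    """Odds ratio for a change of `units` in the predictor."""
    return math.exp(estimate * units)

for name, (est, units, printed) in rows.items():
    print(f"{name:12s} computed={odds_ratio(est, units):.3f} printed={printed:.3f}")

# With reference cell coding, a comparison of two non-reference levels
# (R vs U, as requested by DIFF=ALL) uses the difference of their estimates.
r_vs_u = math.exp(rows["Res R vs S"][0] - rows["Res U vs S"][0])
print(f"Res R vs U   computed={r_vs_u:.3f}")
```

This also shows why the UNITS statement matters: the per-dollar estimate for DDABal is tiny, but over a 1,000-unit change it corresponds to roughly an 8% increase in the odds.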


* The Type 3 Analysis of Effects table shows which input variables are statistically significant, controlling for all of the other input variables in the model.
* The c statistic is the probability that a randomly chosen observation with the event has a higher predicted probability than a randomly chosen observation without the event, with tied pairs counted as one half.
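
To make the c statistic concrete, here is a minimal Python sketch (outside SAS) that counts concordant, discordant, and tied pairs directly, with ties counted as one half. The toy probabilities are invented for illustration; the last lines apply the same formula to the rounded percentages from the Association table above.

```python
def c_statistic(event_probs, nonevent_probs):
    """c = (concordant + 0.5 * tied) / pairs over all event/nonevent pairs."""
    concordant = discordant = tied = 0
    for p_event in event_probs:
        for p_nonevent in nonevent_probs:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    pairs = concordant + discordant + tied
    return (concordant + 0.5 * tied) / pairs

# Toy predicted probabilities (made up): 3 events and 4 nonevents give 12 pairs.
events = [0.8, 0.6, 0.4]
nonevents = [0.7, 0.4, 0.2, 0.1]
print(f"toy c = {c_statistic(events, nonevents):.4f}")

# Same formula applied to the rounded percentages printed by PROC LOGISTIC.
pct_concordant, pct_discordant, pct_tied = 67.2, 31.5, 1.3
c_from_table = (pct_concordant + 0.5 * pct_tied) / 100
somers_d = (pct_concordant - pct_discordant) / 100
print(f"c from table percentages = {c_from_table:.4f}, Somers' D = {somers_d:.3f}")
```

The value recovered from the percentages matches the printed c of 0.679 up to rounding, and Somers' D is the concordant share minus the discordant share.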

#### Scoring New Cases

One of the main purposes of predictive modeling is to score new cases—in other words, to build a model and then apply the model to new data to get a predicted probability for each new observation. You simply enter the new data input values into your logistic regression model to get a new predicted probability.
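
As a rough illustration of scoring by hand, the Python sketch below plugs one new case into the model fitted above, using the rounded estimates from the Analysis of Maximum Likelihood Estimates table. The customer's input values are hypothetical, the function name `predict_ins` is made up, and no oversampling adjustment is applied here.

```python
import math

# Estimates copied from the Analysis of Maximum Likelihood Estimates table.
INTERCEPT = 0.1706
COEF = {
    "dda": -1.041, "ddabal": 7.5e-05, "dep": -0.0682,
    "depamt": 1.2e-05, "cashbk": -0.6393, "checks": -0.00068,
}
# Reference cell coding for Res: S is the reference level and contributes 0.
RES_COEF = {"R": -0.0129, "U": -0.00191, "S": 0.0}

def predict_ins(case):
    """Predicted probability that Ins = 1 for one case (a dict of input values)."""
    logit = INTERCEPT + RES_COEF[case["res"]]
    logit += sum(COEF[name] * case[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical new customer: these values are invented for illustration.
new_case = {"dda": 1, "ddabal": 1500.0, "dep": 2, "depamt": 2000.0,
            "cashbk": 0, "checks": 4, "res": "U"}
p_hat = predict_ins(new_case)
print(f"predicted probability of Ins=1: {p_hat:.4f}")
```

In SAS itself, the SCORE statement of PROC LOGISTIC does this for a whole data set, as in the practice that follows.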

...


#### Correcting for Oversampling

...

#### Practice: Fitting a Logistic Regression Model
For the veterans' organization project, fit a logistic regression model to the pmlr.pva_train data set and use ODS statistical graphics to display the results.
1. Create a global macro variable named ex_pi1 that stores π1, the proportion of responders in the population. Note: To find the proportion of responders in the population, review the pva_raw_data data set description.

In [6]:
%global ex_pi1;
%let ex_pi1=0.05;

2. Add a PROC LOGISTIC step that does the following:
* fits a logistic regression model with Target_B as the target variable, and Pep_Star, Recent_Avg_Gift_Amt, and Frequency_Status_97NK as the input variables.
* models the probability that the target variable equals 1, and requests 95% profile likelihood confidence intervals for the odds ratio.
* uses the SCORE statement to create a temporary data set called scopva_train, which contains the data from the pva_train data set and the predicted probability of the event, correcting for oversampling.
* creates effect plots with confidence bands for the three input variables
* creates effect plots of Recent_Avg_Gift_Amt by Pep_Star and Frequency_Status_97NK by Pep_Star
* creates an odds ratio plot with a horizontal orientation and displays the statistics
* uses the ONLY global plot option to suppress the default plots

Note: To avoid a warning in the log about the suppression of plots that have more than 5000 observations, you can add the following MAXPOINTS= option to the PROC LOGISTIC statement: plots(maxpoints=none only). This change is optional. Omitting the MAXPOINTS= option does not affect the results of the practices in this course.

In [7]:
title1 "Logistic Regression Model of the Veterans' Organization Data";
proc logistic data=pmlr.pva_train plots(only)=
              (effect(clband x=(pep_star recent_avg_gift_amt
              frequency_status_97nk)) oddsratio (type=horizontalstat));
   class pep_star (param=ref ref='0');
   model target_b(event='1')=pep_star recent_avg_gift_amt
                  frequency_status_97nk / clodds=pl;
   effectplot slicefit(sliceby=pep_star x=recent_avg_gift_amt) / noobs; 
   effectplot slicefit(sliceby=pep_star x=frequency_status_97nk) / noobs; 
   score data=pmlr.pva_train out=work.scopva_train priorevent=&ex_pi1;
run;

Model Information,Model Information.1
Data Set,PMLR.PVA_TRAIN
Response Variable,TARGET_B
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

Number of Observations Read,9687
Number of Observations Used,9687

Response Profile,Response Profile,Response Profile
Ordered Value,TARGET_B,Total Frequency
1,0,7265
2,1,2422

Class Level Information,Class Level Information,Class Level Information
Class,Value,Design Variables
PEP_STAR,0,0
,1,1

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics,Model Fit Statistics,Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,10897.23,10663.061
SC,10904.409,10691.776
-2 Log L,10895.23,10655.061

Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,240.169,3,<.0001
Score,242.9486,3,<.0001
Wald,237.2875,3,<.0001

Type 3 Analysis of Effects,Type 3 Analysis of Effects,Type 3 Analysis of Effects,Type 3 Analysis of Effects
Effect,DF,Wald Chi-Square,Pr > ChiSq
PEP_STAR,1,43.4902,<.0001
RECENT_AVG_GIFT_AMT,1,3.9559,0.0467
FREQUENCY_STATUS_97N,1,83.8209,<.0001

Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates
Parameter,,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq
Intercept,,1,-1.6454,0.0831,392.448,<.0001
PEP_STAR,1.0,1,0.3371,0.0511,43.4902,<.0001
RECENT_AVG_GIFT_AMT,,1,-0.00579,0.00291,3.9559,0.0467
FREQUENCY_STATUS_97N,,1,0.2179,0.0238,83.8209,<.0001

Association of Predicted Probabilities and Observed Responses,Association of Predicted Probabilities and Observed Responses.1,Association of Predicted Probabilities and Observed Responses.2,Association of Predicted Probabilities and Observed Responses.3
Percent Concordant,59.9,Somers' D,0.208
Percent Discordant,39.0,Gamma,0.211
Percent Tied,1.1,Tau-a,0.078
Pairs,17595830.0,c,0.604

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect,Unit,Estimate,95% Confidence Limits,95% Confidence Limits.1
PEP_STAR 1 vs 0,1.0,1.401,1.267,1.549
RECENT_AVG_GIFT_AMT,1.0,0.994,0.988,1.0
FREQUENCY_STATUS_97N,1.0,1.243,1.187,1.303


3. Based on the results, you interpret the specified metrics as follows:


**the c statistic**
The c statistic is shown in the Association of Predicted Probabilities and Observed Responses table. The c statistic, 0.604, can be interpreted as the probability of a customer who donated to the national veterans' organization having a higher predicted probability (of donating to the organization) compared to a customer who did not donate.


**the odds ratio for Pep_Star**
The odds ratio for Pep_Star is shown in the Odds Ratio Estimates and Profile-Likelihood Confidence Intervals table. This odds ratio indicates that customers who are consecutive donors have 1.40 times the odds of responding to a solicitation compared to customers who are not consecutive donors.


**the effect plot for Recent_Avg_Gift_Amt by Pep_Star**
The effect plot for Recent_Avg_Gift_Amt by Pep_Star shows a negative relationship between the average donation amount in response to promotions since June 1994 and the predicted probabilities of responding to the solicitation in June 1997. The consecutive donors have the higher probabilities across all values of Recent_Avg_Gift_Amt.


**the effect plot for Recent_Avg_Gift_Amt**
The effect plot for Recent_Avg_Gift_Amt shows a negative relationship between the average donation amount in response to promotions since June 1994 and the predicted probabilities of responding to the solicitation in June 1997. The widest confidence intervals correspond to the largest values of Recent_Avg_Gift_Amt.


4. Print the first ten adjusted probabilities.

In [8]:
title1 "Adjusted Predicted Probabilities of the Veterans' Organization Data";
proc print data=work.scopva_train(obs=10);
   var p_1 pep_star recent_avg_gift_amt frequency_status_97nk;
run;
title;

Obs,P_1,PEP_STAR,RECENT_AVG_GIFT_AMT,FREQUENCY_STATUS_97NK
1,0.04639,1,15.0,1
2,0.033094,0,17.5,1
3,0.06489,0,8.33,4
4,0.090167,1,5.0,4
5,0.059152,1,8.33,2
6,0.058117,1,11.57,2
7,0.046941,1,12.86,1
8,0.031733,0,25.0,1
9,0.045126,1,20.0,1
10,0.032091,0,23.0,1
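
The adjusted probabilities printed above can be reproduced by hand. The Python sketch below (a cross-check outside SAS) assumes the standard prior correction: the unadjusted probability is reweighted by the ratio of the population proportion π1 to the sample proportion ρ1 for events, and by the matching ratio for nonevents. The function names are made up; the estimates are the rounded values from the practice output, so agreement is only up to rounding.

```python
import math

# Rounded estimates from the practice model fitted on PMLR.PVA_TRAIN.
INTERCEPT, B_PEP, B_GIFT, B_FREQ = -1.6454, 0.3371, -0.00579, 0.2179
RHO1 = 2422 / 9687   # proportion of events in the training sample (Response Profile)
PI1 = 0.05           # population proportion of responders (&ex_pi1)

def unadjusted_prob(pep_star, gift, freq):
    """Predicted probability on the oversampled scale."""
    logit = INTERCEPT + B_PEP * pep_star + B_GIFT * gift + B_FREQ * freq
    return 1.0 / (1.0 + math.exp(-logit))

def adjust_for_prior(p, rho1=RHO1, pi1=PI1):
    """Rescale p from the sample event rate rho1 to the population rate pi1."""
    num = p * pi1 / rho1
    den = num + (1.0 - p) * (1.0 - pi1) / (1.0 - rho1)
    return num / den

# (PEP_STAR, RECENT_AVG_GIFT_AMT, FREQUENCY_STATUS_97NK, printed P_1), first four rows.
rows = [
    (1, 15.0, 1, 0.04639),
    (0, 17.5, 1, 0.033094),
    (0, 8.33, 4, 0.06489),
    (1, 5.0, 4, 0.090167),
]
for pep, gift, freq, printed in rows:
    p_adj = adjust_for_prior(unadjusted_prob(pep, gift, freq))
    print(f"computed={p_adj:.5f} printed={printed:.5f}")
```

The computed values match the printed P_1 column closely, which is consistent with PRIOREVENT= applying this kind of correction. Note how much smaller the adjusted probabilities are than the roughly 25% event rate in the oversampled training data.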


#### Quiz

In [9]:

data work.pva(drop=CONTROL_NUMBER MONTHS_SINCE_LAST_PROM_RESP 
              FILE_AVG_GIFT FILE_CARD_GIFT);
   set pmlr.pva_raw_data;
run;

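/* Create 0/1 indicator variables: in SAS, a true comparison evaluates to 1 and a false one to 0. */
data work.pva;
   set work.pva;
   STATUS_FL=RECENCY_STATUS_96NK in("F","L");
   STATUS_ES=RECENCY_STATUS_96NK in("E","S");
run;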

proc logistic data=work.pva plots(only)=(effect(clband
              x=(lifetime_card_prom recent_response_prop
                 months_since_last_gift       
                 recent_avg_gift_amt status_es)) 
                 oddsratio (type=horizontalstat)) 
                 namelen=25;
   model target_b(event='1')= lifetime_card_prom
         recent_response_prop months_since_last_gift
         recent_avg_gift_amt status_es / clodds=pl stb;
   units lifetime_card_prom=10 months_since_last_gift=6
         recent_avg_gift_amt=25 / default=1;
run;

Model Information,Model Information.1
Data Set,WORK.PVA
Response Variable,TARGET_B
Number of Response Levels,2
Model,binary logit
Optimization Technique,Fisher's scoring

Number of Observations Read,19372
Number of Observations Used,19372

Response Profile,Response Profile,Response Profile
Ordered Value,TARGET_B,Total Frequency
1,0,14529
2,1,4843

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics,Model Fit Statistics,Model Fit Statistics
Criterion,Intercept Only,Intercept and Covariates
AIC,21789.113,21345.336
SC,21796.984,21392.566
-2 Log L,21787.113,21333.336

Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0,Testing Global Null Hypothesis: BETA=0
Test,Chi-Square,DF,Pr > ChiSq
Likelihood Ratio,453.7764,5,<.0001
Score,461.6635,5,<.0001
Wald,446.9425,5,<.0001

Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates,Analysis of Maximum Likelihood Estimates
Parameter,DF,Estimate,Standard Error,Wald Chi-Square,Pr > ChiSq,Standardized Estimate
Intercept,1,-0.9669,0.1131,73.0559,<.0001,
LIFETIME_CARD_PROM,1,0.0141,0.00216,42.8106,<.0001,0.0667
RECENT_RESPONSE_PROP,1,1.5328,0.1696,81.6972,<.0001,0.0963
MONTHS_SINCE_LAST_GIFT,1,-0.0325,0.0043,57.2378,<.0001,-0.0723
RECENT_AVG_GIFT_AMT,1,-0.0103,0.00206,25.1344,<.0001,-0.058
STATUS_ES,1,0.1412,0.0453,9.7289,0.0018,0.0333

Association of Predicted Probabilities and Observed Responses,Association of Predicted Probabilities and Observed Responses.1,Association of Predicted Probabilities and Observed Responses.2,Association of Predicted Probabilities and Observed Responses.3
Percent Concordant,59.8,Somers' D,0.196
Percent Discordant,40.2,Gamma,0.196
Percent Tied,0.0,Tau-a,0.073
Pairs,70363947.0,c,0.598

Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals,Odds Ratio Estimates and Profile-Likelihood Confidence Intervals
Effect,Unit,Estimate,95% Confidence Limits,95% Confidence Limits.1
LIFETIME_CARD_PROM,10.0,1.152,1.104,1.202
RECENT_RESPONSE_PROP,1.0,4.631,3.321,6.457
MONTHS_SINCE_LAST_GIFT,6.0,0.823,0.782,0.865
RECENT_AVG_GIFT_AMT,25.0,0.772,0.697,0.853
STATUS_ES,1.0,1.152,1.054,1.258
