* Make a working copy of the dataset in WORK.
* Assumes the libref LIBRARY was assigned earlier with a LIBNAME statement;
data helpmkh;
  set library.helpmkh;
run;
* Encoding: UTF-8.
* ============================================.
* LESSON 19 - Logistic Regression
*
* Melinda Higgins, PhD
* dated 10/31/2017
* ============================================;
* ============================================.
* This lesson uses the helpmkh dataset.
*
* The main outcome variable is homeless, a dichotomous
* variable coded 0 and 1. Logistic regression is used to
* predict whether someone was homeless from these variables:
* age, female (gender), pss_fr, pcs, mcs, cesd and indtot
* ============================================;
* ============================================.
* First, look at the correlations between these variables
* ============================================;
proc corr data=helpmkh;
  var homeless age female pss_fr pcs mcs cesd indtot;
run;
* ============================================.
* Given the stronger correlation between indtot
* and homeless, run a t-test comparing mean indtot
* between the two homeless groups
* ============================================;
proc ttest data=helpmkh;
  class homeless;
  var indtot;
run;
* ============================================.
* Run a logistic regression of indtot to predict
* the probability of being homeless (homeless=1).
* By default PROC LOGISTIC models the lower ordered
* response level (here 0), so the event level 1 is
* requested explicitly.
*
* The OUTPUT statement saves the predicted probabilities
* to dataset m1 as variable prob.
*
* ctable gives the classification table, and pprob
* lists the probability thresholds to evaluate.
*
* plots=roc requests the ROC curve
* ============================================;
proc logistic data=helpmkh plots=roc;
  model homeless(event='1') = indtot / ctable pprob=(0.2 to 0.8 by 0.1);
  output out=m1 p=prob;
run;
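* ============================================.
* A minimal sketch, not part of the original lesson:
* derive a predicted group from the saved probabilities
* and cross-tabulate it against the observed outcome.
* The 0.5 cutoff and the names m1class and predgrp are
* assumptions chosen for illustration. It also assumes
* prob is the probability of homeless=1, so check the
* Probability modeled note in the log before relying on it.
* ============================================;
data m1class;
  set m1;
  /* predgrp is 1 when prob meets the assumed cutoff, else 0 */
  if not missing(prob) then predgrp = (prob >= 0.5);
run;

proc freq data=m1class;
  tables homeless*predgrp;
run;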
* ============================================.
* Using the saved probabilities, plot them
* against the indtot predictor
* ============================================;
proc gplot data=m1;
  plot prob*indtot;
run;
quit;
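* ============================================.
* A possible alternative, not part of the original lesson:
* PROC GPLOT needs SAS/GRAPH, so where that product is
* not available the same plot can be sketched with
* PROC SGPLOT. This assumes dataset m1 from the OUTPUT
* statement above, with variables prob and indtot.
* ============================================;
proc sgplot data=m1;
  scatter x=indtot y=prob;
run;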
* ============================================.
* Given the correlation matrix above, it looks like
* female (gender), pss_fr, pcs, and indtot are all
* significantly associated with being homeless.
*
* Put all of these together into one model,
* again modeling the probability that homeless=1
* ============================================;
proc logistic data=helpmkh;
  model homeless(event='1') = female pss_fr pcs indtot;
run;
* ============================================.
* Also run the same model using forward variable selection
* ============================================;
proc logistic data=helpmkh;
  model homeless(event='1') = female pss_fr pcs indtot / selection=forward;
run;