Regression influence diagnostics #7044
Comments
Arun Aryasomayajula commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] and team to discuss and assign an engineer
Wendy Wong commented: I have written a write-up describing my implementation: [^Regression Influence Diagnostics.pdf]
JIRA Issue Details Jira Issue: PUBDEV-8638
Attachments From Jira Attachment Name: Regression Influence Diagnostics.pdf Attachment Name: Regression influence diagnostics feature request.docx
Linked PRs from JIRA
Regression influence diagnostics
The SAS documentation provides a conceptual overview and formulas for the diagnostics. The one we need in particular is the DFBETA statistic. This statistic measures how much the coefficient for a variable changes when an observation is deleted.
Proc Logistic: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect042.htm
Proc Reg: https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_reg_sect040.htm
The minimal requirement is to implement the DFBETA statistic for a model with a binary target (Proc Logistic). Conceptually, the DFBETA statistic can also be calculated for a model with a continuous target (Proc Reg).
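For reference, here is the continuous-target (ordinary least squares) case written out. The notation is assumed for illustration and is not taken from the attached write-up: $b_j$ is the full-data coefficient, $b_{(i)j}$ is the coefficient with observation $i$ deleted, $e_i$ is the residual, $h_{ii}$ is the leverage, and $s_{(i)}$ is the residual standard error with observation $i$ deleted. The scaled form is usually called DFBETAS.

```latex
\mathrm{DFBETA}_{ij} = b_j - b_{(i)j},
\qquad
\mathbf{b} - \mathbf{b}_{(i)} = \frac{(X^\top X)^{-1}\,\mathbf{x}_i\, e_i}{1 - h_{ii}},
\qquad
\mathrm{DFBETAS}_{ij} = \frac{b_j - b_{(i)j}}{s_{(i)}\sqrt{\left[(X^\top X)^{-1}\right]_{jj}}}
```

The middle identity means no refit is needed for a continuous target. For a binary target there is no exact closed form; as far as I understand the linked Proc Logistic documentation, SAS uses a one-step approximation there rather than refitting the model for each deleted row.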
The desired output is a data frame that has the same number of rows as the data frame used to estimate the model. It should have columns that correspond to the variables included in the final model after any selection algorithms have been run. The columns should contain the DFBETA values and could perhaps be named DFBETA_. The ordering of the rows should correspond to the ordering of the rows of the input data frame used to estimate the model, so that the DFBETA values can be combined back with the input data.
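To make the requested output shape concrete, here is a minimal sketch in plain Python for the continuous-target case. It refits an ordinary least squares model with each row deleted and reports the raw coefficient change (full fit minus leave-one-out fit), without the standard-error scaling SAS applies. The function names, the `DFBETA_` plus variable name column naming, and the `Intercept` column are assumptions made for illustration, not the H2O API or the implementation in the attached write-up.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (rows already include the intercept 1)."""
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * t for r, t in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)


def dfbeta_frame(data, predictors, target):
    """Return one dict per input row, in input order, holding the raw DFBETA per coefficient.

    `data` is a list of dicts (a stand-in for a data frame); column names are hypothetical.
    """
    names = ["Intercept"] + list(predictors)
    rows = [[1.0] + [float(rec[c]) for c in predictors] for rec in data]
    y = [float(rec[target]) for rec in data]
    full = ols(rows, y)
    out = []
    for i in range(len(data)):
        # Refit without observation i and record how much each coefficient moved.
        reduced = ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        out.append({"DFBETA_" + n: b - bi for n, b, bi in zip(names, full, reduced)})
    return out
```

Because the result has one entry per input row in the original order, it can be joined back to the training data by position. Refitting once per row is only practical for small data; a real implementation would presumably use a closed-form or one-step update instead.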