Regression influence diagnostics #7044

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 5 comments

@exalate-issue-sync

Regression influence diagnostics

The SAS documentation provides a conceptual overview of, and formulas for, these diagnostics. The one we need in particular is the DFBETA statistic, which measures how much the estimated coefficient for each variable changes when a single observation is deleted from the data.
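To make the definition concrete, here is a minimal sketch in Python with NumPy. It assumes an ordinary least-squares model whose design matrix `X` already includes an intercept column. It computes the unscaled DFBETA, b_j minus b_j(i), in two ways. The first literally refits the model without each row. The second uses the standard closed form (X'X)^-1 x_i e_i / (1 - h_ii), which gives the same values without n refits.

The function names are hypothetical. The sign convention (full fit minus deleted fit) and the absence of any scaling are also assumptions. The SAS pages linked below additionally describe a variant scaled by the coefficient's standard error (DFBETAS).

```python
import numpy as np


def dfbeta_ols_refit(X, y):
    """Brute-force DFBETA for OLS (hypothetical helper).

    Row i holds b - b_(i): full-data coefficients minus the coefficients
    refit with observation i deleted. Unscaled; sign convention assumed.
    """
    n, p = X.shape
    b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i  # boolean mask that drops row i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        out[i] = b_full - b_i
    return out


def dfbeta_ols_closed_form(X, y):
    """Same quantity without refitting: (X'X)^-1 x_i e_i / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # leverages h_ii = x_i' (X'X)^-1 x_i
    # Row i of X @ XtX_inv is ((X'X)^-1 x_i)' because X'X is symmetric.
    return (X @ XtX_inv) * (resid / (1.0 - h))[:, None]


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=50)
print(np.allclose(dfbeta_ols_refit(X, y), dfbeta_ols_closed_form(X, y)))  # True
```

The closed form follows from the Sherman-Morrison update of (X'X)^-1 when one row is removed. It is the reason deletion diagnostics for linear models rarely need an actual refit.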

Proc Logistic: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect042.htm

Proc Reg: https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_reg_sect040.htm

The minimum requirement is to implement the DFBETA statistic for a binary-target model (Proc Logistic). Conceptually, the DFBETA statistic can also be calculated for a continuous-target model (Proc Reg).
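For the binary-target case, refitting a logistic model once per row is expensive. A common shortcut is Pregibon's one-step approximation. It takes a single Newton step from the full-data estimate with observation i removed, which works out to (X'WX)^-1 x_i (y_i - p_i) / (1 - h_ii), where W = diag(p_i (1 - p_i)) and h_ii = w_i x_i' (X'WX)^-1 x_i.

The sketch below assumes an unpenalized logistic model fit by plain Newton-Raphson in NumPy. It illustrates that approximation only; it is not SAS's or H2O's implementation, and `fit_logistic` and `dfbeta_logistic_one_step` are hypothetical helper names.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Unpenalized logistic regression by Newton-Raphson / IRLS (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)  # IRLS weights p_i (1 - p_i)
        # Newton step: (X'WX)^-1 X'(y - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def dfbeta_logistic_one_step(X, y):
    """One-step approximation of b - b_(i) for every row i.

    Row i is (X'WX)^-1 x_i (y_i - p_i) / (1 - h_ii), with
    h_ii = w_i x_i' (X'WX)^-1 x_i, all evaluated at the full-data fit.
    """
    beta = fit_logistic(X, y)
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    info_inv = np.linalg.inv(X.T @ (X * w[:, None]))  # (X'WX)^-1
    h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)  # hat-matrix diagonal
    return (X @ info_inv) * ((y - p) / (1.0 - h))[:, None]


# Illustrative data only: intercept column plus two simulated predictors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.3, 0.8, -0.5]))))
y = (rng.uniform(size=n) < p_true).astype(float)
approx = dfbeta_logistic_one_step(X, y)  # one row per observation, one column per coefficient
```

On small data, a cheap sanity check is to compare the result against an exact leave-one-out refit, calling `fit_logistic` on the data with row i removed. On well-behaved data without separation, the two should agree closely.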

The desired output is a data frame with the same number of rows as the data frame used to estimate the model. It should have one column for each variable included in the final model, after any variable-selection algorithms have been run. The columns should contain the DFBETA values and could perhaps be named DFBETA_. The rows should be in the same order as the rows of the input data frame used to estimate the model, so that the DFBETA values can be combined back with the input data.
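Here is a minimal sketch of the requested output shape. It assumes pandas and a DFBETA array that has already been computed, with one row per training row and one column per coefficient. The naming scheme (the `DFBETA_` prefix followed by the coefficient name) is an assumption, as is the inclusion of an intercept column. The `dfbeta_frame` helper and the sample data are hypothetical. Reusing the input frame's index is what keeps the rows aligned, so the result can be joined directly back onto the training data.

```python
import numpy as np
import pandas as pd


def dfbeta_frame(dfbeta, coef_names, index):
    """Wrap an (n_rows, n_coefs) DFBETA array as a frame aligned with the input rows.

    Hypothetical helper; the "DFBETA_<name>" column naming is an assumption.
    """
    dfbeta = np.asarray(dfbeta)
    if dfbeta.shape != (len(index), len(coef_names)):
        raise ValueError("need one row per input row and one column per model coefficient")
    columns = ["DFBETA_" + name for name in coef_names]
    # Reusing the training frame's index preserves the input row order.
    return pd.DataFrame(dfbeta, columns=columns, index=index)


# Hypothetical training frame and DFBETA values, purely for illustration.
train = pd.DataFrame({"x1": [0.2, 1.5, -0.7], "x2": [3.0, 2.1, 0.4], "y": [0, 1, 1]})
values = np.array([[0.01, -0.02, 0.00],
                   [-0.03, 0.05, 0.01],
                   [0.02, -0.03, -0.01]])
# Including an "Intercept" column is an assumption, not part of the request.
diag = dfbeta_frame(values, ["Intercept", "x1", "x2"], train.index)
combined = train.join(diag)  # index-aligned join back onto the input rows
```

Because the frame carries the same index as the input, the join does not depend on positional order, even if the training frame has a non-default index.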

@exalate-issue-sync
Author

Arun Aryasomayajula commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] and team to discuss and assign an engineer

@exalate-issue-sync
Author

Wendy Wong commented: I have written a write-up describing my implementation:

Attachment: Regression Influence Diagnostics.pdf

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8638
Assignee: Wendy Wong
Reporter: Arun Aryasomayajula
State: Resolved
Fix Version: 3.40.0.1
Attachments: Available (Count: 2)
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Attachments From Jira

Attachment Name: Regression Influence Diagnostics.pdf
Attached By: Wendy Wong
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-8638/Regression Influence Diagnostics.pdf

Attachment Name: Regression influence diagnostics feature request.docx
Attached By: Arun Aryasomayajula
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-8638/Regression influence diagnostics feature request.docx

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#6468

@h2o-ops h2o-ops closed this as completed May 14, 2023