-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/ENH vif variance inflation factor and feature selection #4582
base: main
Are you sure you want to change the base?
Conversation
…m.add_constant process for proper results.
---------- | ||
http://en.wikipedia.org/wiki/Variance_inflation_factor | ||
|
||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" instead of ''' please
data : DataFrame, (rows: observed values, columns: multivariate variables) | ||
design dataframe with all explanatory variables, as for example used in | ||
regression | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No extra line
------- | ||
Filtered_data : DataFrame | ||
A subset of the input DataFame | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no extra line
dropped = pd.DataFrame(columns=['var', 'vif']) | ||
|
||
# Startswith 'drop = True'(Assume that some variables will be dropped) | ||
dropCondition = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming conventions: dropCondition --> drop_condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto vifDict --> vif_dict, etc
while dropCondition: | ||
|
||
# 1. Calculate a VIF | ||
vifDict = {col: vif(data.loc[:, col], data.loc[:, data.columns != col]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any scenarios where data.columns
is not unique? Where None
or NaN
is among the columns?
Mostly style comments. Are there unit tests in the works? |
Codecov Report
@@ Coverage Diff @@
## master #4582 +/- ##
==========================================
+ Coverage 80.26% 80.26% +<.01%
==========================================
Files 564 565 +1
Lines 85592 85637 +45
Branches 9679 9689 +10
==========================================
+ Hits 68702 68740 +38
Misses 14662 14662
- Partials 2228 2235 +7
Continue to review full report at Codecov.
|
This pull request introduces 1 alert when merging 447582b into 3d87728 - view on lgtm.com new alerts:
Comment posted by lgtm.com |
This pull request introduces 1 alert when merging 9298d88 into 3d87728 - view on lgtm.com new alerts:
Comment posted by lgtm.com |
This pull request introduces 1 alert when merging 6bdfeb7 into 3d87728 - view on lgtm.com new alerts:
Comment posted by lgtm.com |
This pull request introduces 1 alert when merging efc98bf into 3d87728 - view on lgtm.com new alerts:
Comment posted by lgtm.com |
I'm confused by the result of |
AFAICS, you can ignore codecov/patch in this case. Code coverage went up according to both other code coverage measures. there might be some details to check e.g. is pandas.as_matrix creating an numpy array, and not a numpy matrix? |
I try to start reviewing this soon(ish). But I need to get back into what the plans for this are. e.g. I'm adding fit_collinear to the models which needs to have access to helper functions for dropping collinear columns, currently I'm using a stand-in for dropping in exog sequence using QR |
as_matrix method returns numpy.array, not matrix. |
I'm moving this to 0.11 see also #2380 another one of my 2015 PRs that didn't get finished/reviewed and merged |
Hello @pydemia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-07-27 06:10:49 UTC |
@bashtage I've almost forgotten this request for a long time. |
Thanks. I'll double check and close of it is. |
I have fixed
variance_inflation_factor
function andfeature_selection_vif
function.In
variance_inflation_factor
function,add
add_constant
argument to choose explicitly.In
feature_selection_vif
function,All
print
functions were deleted as you mentioned before.