REF/ENH vif variance inflation factor and feature selection #4582

Open. pydemia wants to merge 3 commits into main.

pydemia commented May 2, 2018

I have fixed the variance_inflation_factor and feature_selection_vif functions.

In variance_inflation_factor, I added an add_constant argument so that adding a constant is an explicit choice.

In feature_selection_vif, all print calls were removed, as you requested before.
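
For readers following the thread, here is a minimal sketch of what an add_constant switch on a VIF computation could look like. It is not the code from this PR: the function name `vif_for_column`, its signature, and the choice of centered R² when a constant is added (uncentered otherwise) are assumptions made for illustration. It relies only on the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one column on all the others.

```python
import numpy as np


def vif_for_column(X, idx, add_constant=True):
    """Hypothetical helper: VIF of column `idx` of the 2-D array `X`.

    Assumption: with add_constant=True an intercept is prepended to the
    other columns and R^2 is centered; otherwise R^2 is uncentered.
    """
    X = np.asarray(X, dtype=float)
    y = X[:, idx]
    others = np.delete(X, idx, axis=1)
    if add_constant:
        others = np.column_stack([np.ones(len(y)), others])

    # Least-squares fit of the chosen column on the remaining columns.
    coef, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum() if add_constant else y @ y

    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r_squared)


# Small illustrative design: x1 and x2 are orthogonal, x3 is almost 2 * x1.
x1 = np.array([1.0, -1.0, 1.0, -1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])
x3 = 2.0 * x1 + 0.1 * np.array([1.0, -1.0, -1.0, 1.0])
X = np.column_stack([x1, x2, x3])

vifs = [vif_for_column(X, i) for i in range(X.shape[1])]
print(vifs)
```

In this toy design the column that is unrelated to the others gets a VIF close to 1, while the two nearly collinear columns get large values.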

coveralls commented May 2, 2018

Coverage increased (+0.008%) to 82.877% when pulling efc98bf on pydemia:pydemia into 3d87728 on statsmodels:master.

----------
http://en.wikipedia.org/wiki/Variance_inflation_factor

'''
Contributor

""" instead of ''' please

data : DataFrame, (rows: observed values, columns: multivariate variables)
design dataframe with all explanatory variables, as for example used in
regression

Contributor

No extra line

-------
Filtered_data : DataFrame
A subset of the input DataFrame

Contributor

no extra line

dropped = pd.DataFrame(columns=['var', 'vif'])

# Startswith 'drop = True'(Assume that some variables will be dropped)
dropCondition = True
Contributor

naming conventions: dropCondition --> drop_condition

Contributor

ditto vifDict --> vif_dict, etc

while dropCondition:

# 1. Calculate a VIF
vifDict = {col: vif(data.loc[:, col], data.loc[:, data.columns != col])
Contributor

Are there any scenarios where data.columns is not unique? Where None or NaN is among the columns?
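
To make the review points above concrete, here is a hedged sketch of a VIF-based elimination loop that uses snake_case names and guards against non-unique or missing column labels. It is not the PR's feature_selection_vif: the name `select_by_vif`, the threshold of 10, the rule of dropping the single largest VIF per pass, and the added intercept are all assumptions, since the fragments shown do not confirm them.

```python
import numpy as np
import pandas as pd


def _vif(y, X):
    """VIF of `y` given regressors `X`; an intercept is added (assumption)."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    # ss_tot / ss_res equals 1 / (1 - R^2) for centered R^2.
    return np.inf if ss_res == 0 else ss_tot / ss_res


def select_by_vif(data, threshold=10.0):
    """Hypothetical sketch: drop the highest-VIF column until all are <= threshold."""
    # Guards for the label questions raised in review.
    if not data.columns.is_unique:
        raise ValueError("column labels must be unique")
    if data.columns.isna().any():
        raise ValueError("column labels must not be missing")

    kept = data.copy()
    dropped = []
    drop_condition = True
    while drop_condition and kept.shape[1] > 1:
        values = kept.to_numpy(dtype=float)
        # Position-based slicing avoids label comparisons entirely.
        vif_dict = {
            col: _vif(values[:, i], np.delete(values, i, axis=1))
            for i, col in enumerate(kept.columns)
        }
        worst = max(vif_dict, key=vif_dict.get)
        drop_condition = vif_dict[worst] > threshold
        if drop_condition:
            dropped.append((worst, vif_dict[worst]))
            kept = kept.drop(columns=worst)

    return kept, pd.DataFrame(dropped, columns=["var", "vif"])
```

Selecting columns by position rather than with `data.columns != col` is one way to sidestep the duplicate-label and NaN-label cases; whether the PR should raise or handle them differently is a design question for the maintainers.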

@jbrockmendel
Contributor

Mostly style comments. Are there unit tests in the works?

codecov-io commented May 3, 2018

Codecov Report

Merging #4582 into master will increase coverage by <.01%.
The diff coverage is 72.34%.

@@            Coverage Diff             @@
##           master    #4582      +/-   ##
==========================================
+ Coverage   80.26%   80.26%   +<.01%     
==========================================
  Files         564      565       +1     
  Lines       85592    85637      +45     
  Branches     9679     9689      +10     
==========================================
+ Hits        68702    68740      +38     
  Misses      14662    14662              
- Partials     2228     2235       +7

| Impacted Files | Coverage Δ |
|---|---|
| statsmodels/stats/tests/test_vif.py | 100% <100%> (ø) |
| statsmodels/stats/outliers_influence.py | 78.7% <60%> (-1.11%) ⬇️ |
| statsmodels/stats/multivariate_tools.py | 25.27% <65.38%> (+25.27%) ⬆️ |
| statsmodels/stats/descriptivestats.py | 24.13% <0%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3d87728...efc98bf.

@josef-pkt
Member

This pull request introduces 1 alert when merging 447582b into 3d87728 - view on lgtm.com

new alerts:

  • 1 for Unreachable code

Comment posted by lgtm.com

@josef-pkt
Member

This pull request introduces 1 alert when merging 9298d88 into 3d87728 - view on lgtm.com

new alerts:

  • 1 for Unreachable code

Comment posted by lgtm.com

@josef-pkt
Member

This pull request introduces 1 alert when merging 6bdfeb7 into 3d87728 - view on lgtm.com

new alerts:

  • 1 for Variable defined multiple times

Comment posted by lgtm.com

@josef-pkt
Member

This pull request introduces 1 alert when merging efc98bf into 3d87728 - view on lgtm.com

new alerts:

  • 1 for Unreachable code

Comment posted by lgtm.com

@pydemia
Author

pydemia commented May 4, 2018

I'm confused by the result of codecov/patch. What can I do?

@josef-pkt
Member

AFAICS, you can ignore codecov/patch in this case. Code coverage went up according to both other code coverage measures.

There might be some details to check, e.g. does pandas.as_matrix create a NumPy array rather than a NumPy matrix?
https://coveralls.io/builds/16829769/source?filename=statsmodels%2Fstats%2Foutliers_influence.py#L187

@josef-pkt
Member

I'll try to start reviewing this soon(ish), but I need to get back into what the plans for this are.

E.g. I'm adding fit_collinear to the models, which needs access to helper functions for dropping collinear columns. Currently I'm using a stand-in that drops columns in exog sequence using QR:
https://github.com/statsmodels/statsmodels/pull/4576/files#diff-b165b4bd4e10edd0dfb485e47c562b2cR645
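
The idea of dropping collinear columns in exog sequence can be illustrated independently of the linked stand-in. The sketch below is not the code from #4576: it walks the columns from left to right and keeps a column only if it is not, numerically, a linear combination of the columns already kept. It uses modified Gram-Schmidt orthogonalization (the construction behind a QR factorization) instead of calling a QR routine, and the function name and `tol` default are hypothetical.

```python
import numpy as np


def independent_columns_in_sequence(exog, tol=1e-10):
    """Hypothetical helper: indices of columns to keep, scanning left to right.

    A column is dropped when its residual, after projecting out the
    previously kept columns, is negligible relative to its own norm.
    """
    exog = np.asarray(exog, dtype=float)
    basis = []  # orthonormal vectors spanning the kept columns
    keep = []
    for j in range(exog.shape[1]):
        col = exog[:, j]
        resid = col.copy()
        for q in basis:
            # Modified Gram-Schmidt: remove the component along q.
            resid -= (q @ resid) * q
        norm = np.linalg.norm(resid)
        if norm > tol * max(np.linalg.norm(col), 1.0):
            basis.append(resid / norm)
            keep.append(j)
    return keep


# Example design: the third column is the sum of the first two.
const = np.ones(4)
x = np.array([1.0, 2.0, 3.0, 4.0])
z = np.array([0.0, 0.0, 1.0, 0.0])
exog = np.column_stack([const, x, const + x, z])
kept_idx = independent_columns_in_sequence(exog)
print(kept_idx)
```

Because the scan is sequential, earlier columns take priority: of a collinear group, the later columns are the ones dropped, which differs from VIF-based selection, where the column with the largest VIF goes first.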

@pydemia
Author

pydemia commented May 5, 2018

The as_matrix method returns a NumPy array, not a matrix.
The name looks odd, I agree.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html
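
A quick way to check this kind of question is to inspect the returned type directly. The sketch below uses `DataFrame.to_numpy()` rather than `as_matrix`, because `as_matrix` was deprecated in pandas 0.23 and removed in pandas 1.0; the example frame and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})

# to_numpy() is the current replacement for the removed as_matrix().
arr = df.to_numpy()

is_ndarray = isinstance(arr, np.ndarray)
is_matrix = isinstance(arr, np.matrix)

# With a plain ndarray, * is elementwise; with np.matrix it would be
# a matrix product, which is why the distinction matters for the VIF code.
product = arr * arr

print(type(arr).__name__, is_ndarray, is_matrix)
```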

josef-pkt changed the title from "Pydemia" to "REF/ENH vif variance inflation factor and feature selection" on Dec 31, 2018
josef-pkt added this to the 0.11 milestone on Dec 31, 2018
@josef-pkt
Member

I'm moving this to 0.11.
It will take some work to evaluate and think about the design.

See also #2380, another one of my 2015 PRs that didn't get finished, reviewed, and merged.

josef-pkt changed the milestone from 0.11 to josef on Oct 5, 2019
pydemia closed this on Jul 27, 2021
pydemia reopened this on Jul 27, 2021
pep8speaks commented Jul 27, 2021

Hello @pydemia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 321:1: W293 blank line contains whitespace
Line 354:1: W391 blank line at end of file

Line 188:1: W293 blank line contains whitespace

Line 107:1: W391 blank line at end of file

Comment last updated at 2021-07-27 06:10:49 UTC

pydemia closed this on Jul 27, 2021
bashtage reopened this on Jul 27, 2021
@pydemia
Author

pydemia commented Jul 27, 2021

@bashtage I had almost forgotten about this request; it has been a long time.
It looks like it was already merged in a different branch.

@bashtage
Member

Thanks. I'll double-check and close it if it is.

josef-pkt changed the milestone from josef to 0.14 on Sep 7, 2021
josef-pkt changed the milestone from 0.14 to 0.15 on Nov 30, 2022