
Permutations #721

Merged: merged 3 commits into rasbt:master from DarthTrevis:permutations on Aug 20, 2020

Conversation

@trevismd (Contributor) commented Aug 20, 2020

Description

Permutation tests (mlxtend/evaluate/permutation.py) should return as p-value the proportion of permutations for which the computed statistic is at least as extreme as the one observed in the experiment (see in particular reference [3] of the Wikipedia p-value page). The current code counts only strictly more extreme values.
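To illustrate, here is a minimal sketch of a two-sample permutation test that uses >= in the comparison; the names and the mean-difference statistic are my own for illustration, not mlxtend's API:

```python
import numpy as np

# Minimal sketch (not mlxtend's implementation): two-sample permutation
# test for a difference in means. The p-value counts permutations whose
# statistic is at least as extreme (>=) as the observed one.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 20)
y = rng.normal(0.5, 1.0, 20)

observed = abs(x.mean() - y.mean())
pooled = np.concatenate([x, y])

num_rounds = 10_000
at_least_as_extreme = 0
for _ in range(num_rounds):
    perm = rng.permutation(pooled)
    stat = abs(perm[:len(x)].mean() - perm[len(x):].mean())
    if stat >= observed:  # ">=", not ">": equally extreme values count too
        at_least_as_extreme += 1

print(at_least_as_extreme / num_rounds)
```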

When approximating the p-value by running only a subset of all possible permutations, it is also common to count the actual experiment as one additional permutation. This avoids a possible p-value of 0 and is closer to the exact p-value when the number of rounds approaches the number of possible combinations, as explained in detail in this paper:

Phipson B, Smyth GK. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat Appl Genet Mol Biol. 2010;9:Article39. doi:10.2202/1544-6115.1585
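A minimal sketch of that correction (the helper name is mine, not from this PR):

```python
def corrected_p_value(at_least_as_extreme: int, num_rounds: int) -> float:
    """Phipson & Smyth (2010): count the observed statistic as one extra
    permutation, so a randomly sampled p-value can never be exactly zero."""
    return (at_least_as_extreme + 1) / (num_rounds + 1)
```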

The function could raise a warning when the user requests a number of simulations close (> 50%?) to the total number of possible combinations, so that they can switch to the exact method in that case. I have not implemented this here, but I could if it is found useful; a hypothetical sketch follows below.
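For a two-sample test, such a check might look like this (the helper and its default threshold are assumptions, not part of this PR):

```python
import math
import warnings

def maybe_warn_num_rounds(num_rounds: int, n1: int, n2: int,
                          threshold: float = 0.5) -> None:
    """Hypothetical helper: warn when the requested number of sampled
    permutations exceeds `threshold` of the math.comb(n1 + n2, n1)
    distinct group assignments, where the exact method is preferable."""
    total = math.comb(n1 + n2, n1)  # total possible combinations
    if num_rounds > threshold * total:
        warnings.warn(
            f"num_rounds={num_rounds} exceeds {threshold:.0%} of the "
            f"{total} possible combinations; consider the exact method."
        )
```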

Finally, I ran into a test failure in mlxtend.feature_extraction.test_principal_analysis.py, where test_variance_explained_ratio expects a float to be exactly 1.0, whereas on my machine it was 1.0000000000000002. Fixed with assert_almost_equal.
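For reference, assert_almost_equal from numpy.testing compares up to a decimal tolerance (7 decimal places by default) instead of requiring exact equality:

```python
from numpy.testing import assert_almost_equal

# Exact equality fails: 1.0000000000000002 is the next double above 1.0.
# assert 1.0000000000000002 == 1.0   # would raise AssertionError
assert_almost_equal(1.0000000000000002, 1.0)  # passes (7 decimals by default)
```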

Related issues or pull requests

Fixes #718

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to run only the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

@coveralls commented Aug 20, 2020

Coverage increased (+0.003%) to 90.68% when pulling 4bcc3e9 on DarthTrevis:permutations into ac50506 on rasbt:master.

@trevismd marked this pull request as ready for review on August 20, 2020 at 13:43
@rasbt (Owner) commented Aug 20, 2020

Thanks for the PR and the very detailed description. I really appreciate it!

I also like the np.isclose enhancement; I think this will improve the stability of the function. The only issue I can see is if, for some reason, a dataset consists of very small values: e.g., take a given dataset, scale all the values by 10^(-8), and the p-value will likely differ substantially from the p-value on the unscaled data. However, I assume this issue would be rare, and most people will be aware of floating-point issues.
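A minimal sketch of that concern, assuming the comparison relies on np.isclose with its default absolute tolerance (atol=1e-8):

```python
import numpy as np

# Two clearly different statistics ...
print(np.isclose(3.0, 4.0))        # False
# ... look "close" once scaled to the order of the default atol=1e-8.
print(np.isclose(3.0e-8, 4.0e-8))  # True
```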

> The function could raise a warning when the user requests a number of simulations close (> 50%?) to the total number of possible combinations, so that they can switch to the exact method in that case.

Great idea. I will add that to the issue list as a future enhancement.

@rasbt merged commit 8eb6954 into rasbt:master on Aug 20, 2020