
Permutations #721

Merged: merged 3 commits into rasbt:master from DarthTrevis:permutations on Aug 20, 2020

Conversation

@trevismd (Contributor) commented Aug 20, 2020

Description

Permutation tests (mlxtend/evaluate/permutation.py) should return as p-value the proportion of permutations for which the computed statistic is at least as extreme as the one observed in the experiment (see in particular reference [3] of the Wikipedia p-value page). The current code counts only strictly more extreme values.
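To illustrate, here is a minimal sketch of a two-sample permutation test that uses >= in the comparison; the names and the mean-difference statistic are my own for illustration, not mlxtend's API:

```python
import numpy as np

# Minimal sketch (not mlxtend's implementation): two-sample permutation
# test for a difference in means. The p-value counts permutations whose
# statistic is at least as extreme (>=) as the observed one.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 20)
y = rng.normal(0.5, 1.0, 20)

observed = abs(x.mean() - y.mean())
pooled = np.concatenate([x, y])

num_rounds = 10_000
at_least_as_extreme = 0
for _ in range(num_rounds):
    perm = rng.permutation(pooled)
    stat = abs(perm[:len(x)].mean() - perm[len(x):].mean())
    if stat >= observed:  # ">=", not ">": equally extreme values count too
        at_least_as_extreme += 1

print(at_least_as_extreme / num_rounds)
```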

When approximating the p-value by running only a subset of all possible permutations, it is also common to count the actual experiment as one additional permutation. This avoids a possible p-value of 0 and is closer to the exact p-value when the number of rounds approaches the number of possible combinations, as explained in detail in this paper:

Phipson B, Smyth GK. Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Stat Appl Genet Mol Biol. 2010;9:Article39. doi:10.2202/1544-6115.1585
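A minimal sketch of that correction (the helper name is mine, not from this PR):

```python
def corrected_p_value(at_least_as_extreme: int, num_rounds: int) -> float:
    """Phipson & Smyth (2010): count the observed statistic as one extra
    permutation, so a randomly sampled p-value can never be exactly zero."""
    return (at_least_as_extreme + 1) / (num_rounds + 1)
```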

The function could raise a warning when the user requests a number of simulations close (> 50%?) to the total number of possible combinations, so that they can switch to the exact method in that case. I have not implemented this here, but I could if it is found useful; a hypothetical sketch follows below.
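For a two-sample test, such a check might look like this (the helper and its default threshold are assumptions, not part of this PR):

```python
import math
import warnings

def maybe_warn_num_rounds(num_rounds: int, n1: int, n2: int,
                          threshold: float = 0.5) -> None:
    """Hypothetical helper: warn when the requested number of sampled
    permutations exceeds `threshold` of the math.comb(n1 + n2, n1)
    distinct group assignments, where the exact method is preferable."""
    total = math.comb(n1 + n2, n1)  # total possible combinations
    if num_rounds > threshold * total:
        warnings.warn(
            f"num_rounds={num_rounds} exceeds {threshold:.0%} of the "
            f"{total} possible combinations; consider the exact method."
        )
```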

Finally, I ran into a test failure in mlxtend.feature_extraction.test_principal_analysis.py, where test_variance_explained_ratio expects a float to be exactly 1.0, whereas on my machine it was 1.0000000000000002. Fixed with assert_almost_equal.
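For reference, assert_almost_equal from numpy.testing compares up to a decimal tolerance (7 decimal places by default) instead of requiring exact equality:

```python
from numpy.testing import assert_almost_equal

# Exact equality fails: 1.0000000000000002 is the next double above 1.0.
# assert 1.0000000000000002 == 1.0   # would raise AssertionError
assert_almost_equal(1.0000000000000002, 1.0)  # passes (7 decimals by default)
```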

Related issues or pull requests

Fixes #718

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to run only the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

@coveralls commented Aug 20, 2020

Coverage increased (+0.003%) to 90.68% when pulling 4bcc3e9 on DarthTrevis:permutations into ac50506 on rasbt:master.

@trevismd marked this pull request as ready for review on August 20, 2020 at 13:43
@rasbt (Owner) commented Aug 20, 2020

Thanks for the PR and the very detailed description. I really appreciate it!

I also like the np.isclose enhancement; I think this will improve the stability of the function. The only issue I can see is if, for some reason, a dataset consists of very small values: e.g., take a given dataset, scale all the values by 10^(-8), and the p-value will likely differ substantially from the p-value on the unscaled data. However, I assume this issue would be rare, and most people will be aware of floating-point issues.
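A minimal sketch of that concern, assuming the comparison relies on np.isclose with its default absolute tolerance (atol=1e-8):

```python
import numpy as np

# Two clearly different statistics ...
print(np.isclose(3.0, 4.0))        # False
# ... look "close" once scaled to the order of the default atol=1e-8.
print(np.isclose(3.0e-8, 4.0e-8))  # True
```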

> The function could raise a warning when the user requests a number of simulations close (> 50%?) to the total number of possible combinations, so that they can switch to the exact method in that case.

Great idea. I will add that to the issue list as a future enhancement.

@rasbt merged commit 8eb6954 into rasbt:master on Aug 20, 2020