
Draft: New feature: Paired permutation test #768

Merged
merged 8 commits into rasbt:master from fjhheras:paired_permutation
Mar 3, 2021

Conversation

fjhheras
Contributor

@fjhheras fjhheras commented Jan 19, 2021

Code of Conduct

Description

I would like to add the option to perform permutation tests when the two populations are paired.
http://axon.cs.byu.edu/Dan/478/assignments/permutation_test.php

The current pull request is just a first proposal. If you think this feature is useful enough to be implemented in mlxtend, I can improve the documentation/style and add non-trivial tests (it is not well tested at the moment, so it may still be buggy).

Related issues or pull requests

Fixes #767

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • [PARTIAL] Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran `PYTHONPATH='.' pytest ./mlxtend -sv` and made sure that all unit tests pass (for small modifications, it might be sufficient to run only the specific test file, e.g., `PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv`)
  • Checked for style issues by running `flake8 ./mlxtend`

@pep8speaks

pep8speaks commented Jan 19, 2021

Hello @fjhheras! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-03 01:33:55 UTC

@fjhheras fjhheras mentioned this pull request Jan 19, 2021
@coveralls

coveralls commented Jan 19, 2021

Coverage Status

Coverage decreased (-0.04%) to 90.734% when pulling f51e028 on fjhheras:paired_permutation into 894dce8 on rasbt:master.

@rasbt
Owner

rasbt commented Mar 1, 2021

Sorry for getting back to this so late. I had a weirdly configured email client and hadn't been getting GitHub notifications last month.

Anyway, thanks for the PR, I really appreciate it! I think there were some issues with the methodology, though. Coincidentally, I taught a stats course last semester (unfortunately, per the syllabus, I had to use R :P) where I implemented a paired permutation (randomization) test. As far as I know, the standard variant just flips the sign within each pair. Below is my R reference implementation, which matches the textbook results for a small toy dataset:

pairedSamples <- function(diff, nSim=1000000){
  d <- diff
  ld <- length(d)
  obsmeand <- mean(d)
  meandvec <- rep(NA, nSim)
  for (i in 1:nSim){
    # Randomly flip the sign of each within-pair difference
    rsigns <- 2 * rbinom(ld, 1, .5) - 1
    newd <- rsigns * d
    meandvec[i] <- mean(newd)
  }
  pl <- sum(meandvec >= obsmeand)  # upper-tail count
  pg <- sum(meandvec <= obsmeand)  # lower-tail count
  # Two-sided p-value: double the smaller tail
  pval <- ifelse(pl < pg, 2 * pl/nSim, 2 * pg/nSim)
  options(scipen=1)
  if (pval == 0){
    pval <- paste("p <", 1/nSim)
  } else {
    pval <- paste("p =", pval)
  }
  options(scipen=0)

  hhh <- hist(meandvec, plot=FALSE)
  hist(meandvec, xlab="Mean of differences",
       main=paste("Paired Sample Randomization Test,", pval),
       probability=TRUE, xlim=range(c(hhh$breaks, obsmeand)),
       breaks=100, col="tan")
  abline(v=obsmeand, col="red")
  mtext(paste(round(obsmeand, digits=4)), side=1, at=obsmeand)
}

nineteen80 <- c(3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01)
nineteen90 <- c(2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01)
pairedSamples(nineteen80 - nineteen90)

[Image: histogram of the simulated mean differences from pairedSamples(), with the observed mean marked in red]
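For Python readers, here is a rough NumPy port of the R reference above: an illustrative sketch, not the code that was merged into mlxtend, and the function name and defaults are my own. It uses the same sign-flip scheme and the same "double the smaller tail" two-sided p-value.

```python
import numpy as np

def paired_sign_flip_test(x, y, n_sim=100_000, seed=0):
    """Two-sided Monte Carlo paired permutation (sign-flip) test.

    Mirrors the R pairedSamples() reference: randomly flip the sign of
    each within-pair difference and compare the simulated means against
    the observed mean of differences.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    obs = d.mean()
    # One row of random +/-1 signs per simulation round
    signs = rng.choice([-1.0, 1.0], size=(n_sim, d.size))
    sim_means = (signs * d).mean(axis=1)
    p_upper = np.mean(sim_means >= obs)
    p_lower = np.mean(sim_means <= obs)
    # Two-sided p-value: double the smaller tail, as in the R code
    return min(1.0, 2 * min(p_upper, p_lower))

nineteen80 = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
nineteen90 = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]
print(paired_sign_flip_test(nineteen80, nineteen90))
```

For this toy dataset the Monte Carlo estimate should land close to the exact value of 4/128 = 0.03125, since only two of the 2^7 sign assignments per tail reach the observed mean of differences.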

For some reason, I couldn't reproduce these results using the PR's code. I think the problem was the arrays of different lengths. I tried to debug it but couldn't figure out how to fix it, so I rewrote it using a more "naive" implementation (probably too many for-loops). However, it seems to work and reproduces the expected results now (I added some unit tests).

Please let me know your feedback!

@fjhheras
Contributor Author

fjhheras commented Mar 1, 2021

Well, that was embarrassing. My local version works fine, but somehow I forgot to add a np.where when putting together the PR. My original implementation in the PR outputs the right values after this change:

indices_1 = np.where(rng.randn(n) > 0)[0]

instead of

indices_1 = rng.randn(n) > 0

In any case, the amended implementation you propose is much easier to follow. I compared the run times, and it is also a bit faster!
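As a hedged illustration of why the missing np.where mattered (a toy example, not the PR's actual code): a boolean array and an integer index array are interchangeable only while they index an array of the same length, which would explain the different-length-array failures noted above.

```python
import numpy as np

rng = np.random.RandomState(42)
n = 5
mask = rng.randn(n) > 0    # boolean array: [True, False, True, True, False]
idx = np.where(mask)[0]    # integer positions of the True entries: [0, 2, 3]

a = np.arange(5)
print(a[mask], a[idx])     # identical results while len(a) == len(mask)

b = np.arange(10)
print(b[idx])              # still fine: positional indexing works for any length
# b[mask] would raise IndexError: a length-5 boolean mask cannot index length 10
```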

@rasbt
Owner

rasbt commented Mar 3, 2021

No worries, and thanks for double-checking! Wow, I am positively surprised that this is actually not that slow, so maybe we should keep the current version then ;). I'll add the docs.

@rasbt rasbt merged commit d87638c into rasbt:master Mar 3, 2021
@Sandy4321

Can you please share a full Python code example showing how to use this new option?
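As an editorial sketch of what such an example could look like: the snippet below hand-rolls the exact variant of the test (enumerating all 2**n sign flips) rather than importing mlxtend, so it is self-contained; per this PR, the library entry point is presumably mlxtend.evaluate.permutation_test(x, y, paired=True), but that call is an assumption and is not exercised here.

```python
from itertools import product
import numpy as np

def exact_paired_permutation_test(x, y):
    """Exact two-sided paired permutation test over all 2**n sign flips."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    obs = abs(d.mean())
    hits, total = 0, 0
    # Enumerate every assignment of +/- signs to the pairwise differences
    for signs in product((-1.0, 1.0), repeat=d.size):
        total += 1
        if abs((np.array(signs) * d).mean()) >= obs:
            hits += 1
    return hits / total

# Toy data from the R example earlier in this thread
before = [3.67, 1.72, 3.46, 2.60, 2.03, 2.10, 3.01]
after = [2.11, 1.79, 2.71, 1.89, 1.69, 1.71, 2.01]
print(exact_paired_permutation_test(before, after))  # 0.03125
```

The exact enumeration is feasible here because n = 7 gives only 128 sign assignments; for larger samples, the Monte Carlo variant discussed above is the practical choice.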

@rasbt
Owner

rasbt commented Sep 3, 2021
