[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PennNGG/Quantitative-Neuroscience/blob/master/Hypothesis%20Testing/Python/Simple%20Non%2dParametric%20Tests.ipynb)

# Definitions

When comparing the central tendencies (e.g., means) of two samples, if you know how the samples are distributed and/or *n* is large enough in each sample so that you can assume that their means are normally distributed via the Central Limit Theorem, then it is reasonable to use parametric hypothesis tests like a [*t*-test](https://colab.research.google.com/drive/1M7xjaMwJUEyULPHfXc3tWG6-WVjCl-uQ?usp=sharing). Otherwise, nonparametric tests should be used. 

There is an increasing understanding by neuroscientists and others that parametric tests, while they tend to be simple and convenient, often have assumptions that are not well justified and thus should be used only when appropriate. See, for example:

[Running the Numbers](https://www.nature.com/articles/nn0205-123) from *Nature Neuroscience*, which discusses using nonparametric approaches when appropriate.

[An evaluation of nonparametric approaches in clinical trials](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-5-35), which covers several of the tests described below.

Here we will not dive deep into the math but instead will provide a more practical guide for when to use three common forms of nonparametric one- and two-sample tests.


# Getting started with code

Matlab code is found in the [NGG Statistics GitHub Repository](https://github.com/PennNGG/Statistics.git) under "Hypothesis Testing/SimpleNonParametricTests.m".

Python code is included below. First run the code cell just below to make sure all of the required Python modules are loaded, then you can run the other cell(s).

In [3]:
pip install statsmodels



In [4]:
import numpy as np
import scipy.stats as st
from statsmodels.stats.descriptivestats import sign_test

# Sign test: one sample, skewed distribution

This test is applied to a single sample but is typically used for paired measurements to test the hypothesis that there is no systematic direction of a treatment effect (positive=the treatment resulted in a bigger value, negative=the treatment resulted in a smaller value), regardless of magnitude. The null hypothesis is an equal probability of an effect in either direction, so the signs of the differences are treated as draws from a [binomial distribution](https://colab.research.google.com/drive/1q1KaEjkAzUKRFSLPQ0SFdqU_byc70Oi2?usp=sharing) with *p*=0.5.

In [5]:
# Make some paired data
a = [3,10,4,20,4,7,50,3,5,5,7]
b = [5,9,10,15,6,5,43,6,2,1,0]
diff = [bi-ai for ai, bi in zip(a,b)]
_, p = sign_test(diff)
print(f'p={p:.2f}')

p=0.55
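
Because the null hypothesis is binomial with *p*=0.5, the same p-value can be recovered by counting the positive and negative differences and running a two-sided binomial test on those counts. The following is a minimal sketch of that idea, assuming SciPy 1.7 or later (for `scipy.stats.binomtest`) and assuming that zero differences are dropped before counting; the variable names `n_pos`, `n_neg`, and `p_binom` are introduced here for illustration only.

```python
import numpy as np
from scipy.stats import binomtest

# Same paired data as in the cell above
a = [3, 10, 4, 20, 4, 7, 50, 3, 5, 5, 7]
b = [5, 9, 10, 15, 6, 5, 43, 6, 2, 1, 0]
diff = np.array(b) - np.array(a)

# Count the signs of the differences (zeros, if any, are ignored)
n_pos = int(np.sum(diff > 0))
n_neg = int(np.sum(diff < 0))

# Two-sided binomial test: is the number of positives consistent with p=0.5?
p_binom = binomtest(n_pos, n=n_pos + n_neg, p=0.5, alternative='two-sided').pvalue
print(f'positives={n_pos}, negatives={n_neg}, p={p_binom:.2f}')
# prints: positives=4, negatives=7, p=0.55
```

With 4 positive and 7 negative differences out of 11, the split is well within what a fair coin would produce, which is why the sign test does not reject the null hypothesis here.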


# Wilcoxon signed-rank test: one sample or paired samples, symmetric distribution(s)

This test is typically used as a substitute for a [one-sample *t*-test](https://colab.research.google.com/drive/1M7xjaMwJUEyULPHfXc3tWG6-WVjCl-uQ?usp=sharing) and can be used to test the (null) hypothesis that a sample came from a population with a particular median value, or that the median values of two paired samples are equal to each other (i.e., the median of the difference distribution is zero).


In [6]:
samples = np.random.randint(0, high=51, size=200)
null_hypothesis_median = 24

# Unlike in Matlab, the scipy implementation does not handle the case of comparing
#  to a median other than zero, so we make this a (fake) paired two-sample test
#  by subtracting the median from each value
_, p = st.wilcoxon(samples-null_hypothesis_median)
print(f'p = {p:.2f}')


p = 0.20
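
The cell above covers the one-sample case. For the paired case, `st.wilcoxon` can also be given two samples of equal length, in which case it operates on their element-wise differences. The sketch below illustrates this with synthetic data made up for this example (the `before` and `after` arrays and the seed are assumptions, not real measurements) and checks that the two-argument call matches the call on explicit differences.

```python
import numpy as np
import scipy.stats as st

# Synthetic paired measurements (illustrative only, not real data)
rng = np.random.default_rng(0)
before = rng.normal(loc=10.0, scale=2.0, size=30)
after = before + rng.normal(loc=0.5, scale=1.0, size=30)

# Paired form: pass both samples
stat_paired, p_paired = st.wilcoxon(after, before)

# Equivalent one-sample form: pass the differences directly
stat_diff, p_diff = st.wilcoxon(after - before)

print(f'paired call:      p = {p_paired:.4f}')
print(f'difference call:  p = {p_diff:.4f}')
```

Both calls test whether the median of the difference distribution is zero, so they should report the same statistic and p-value.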


# Mann-Whitney: unpaired, two sample


This is the test that is typically used as a substitute for a [two-sample *t*-test](https://colab.research.google.com/drive/1M7xjaMwJUEyULPHfXc3tWG6-WVjCl-uQ?usp=sharing). It tests the (null) hypothesis that two unpaired samples come from the same distribution; if the two distributions are assumed to have the same shape and differ only by a shift, this is equivalent to testing that their medians are equal.



In [7]:
X = np.random.randint(0, high=51, size=200)
Y = 2 + np.random.randint(0, high=51, size=200)
_, p = st.mannwhitneyu(X,Y)
print(f'p = {p:.2f}')

p = 0.25
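
By default the comparison above is two-sided. If the question is directional (for example, whether values in `X` tend to be smaller than values in `Y`), `st.mannwhitneyu` accepts an `alternative` argument. The sketch below uses a seeded generator so the result is repeatable; the seed and the variable names ending in `_xy`, `_yx`, and `_less` are choices made for this illustration. It also shows that the U statistics obtained with the samples in either order sum to the product of the sample sizes.

```python
import numpy as np
import scipy.stats as st

# Seeded synthetic samples, mirroring the cell above (Y is shifted up by 2)
rng = np.random.default_rng(0)
X = rng.integers(0, 51, size=200)
Y = 2 + rng.integers(0, 51, size=200)

# Two-sided test, with the samples given in both orders
u_xy, p_xy = st.mannwhitneyu(X, Y, alternative='two-sided')
u_yx, p_yx = st.mannwhitneyu(Y, X, alternative='two-sided')

# One-sided test: is X stochastically less than Y?
_, p_less = st.mannwhitneyu(X, Y, alternative='less')

print(f'U(X,Y) + U(Y,X) = {u_xy + u_yx:.0f} (n_x * n_y = {len(X) * len(Y)})')
print(f'two-sided p = {p_xy:.2f}, one-sided p = {p_less:.2f}')
```

A shift of 2 is small relative to the 0 to 50 spread of the data, so it is not surprising that a test on 200 samples per group often fails to detect it.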


# Exercises

### Scenario 1 (Question):

You are a behavioral geologist measuring the reaction time of rocks in response to a tone. Specifically, you want to compare the median reaction time of geodes to that of limestone. You recruit 20 rocks per group and run your reaction-time experiment. What test would you use to compare median reaction times between geodes and limestone, and why?

## Scenario 1 (Answer):

Use a two-sample Wilcoxon rank-sum (Mann–Whitney U) test. Why: you’re comparing a continuous outcome between two independent groups and care about central tendency without assuming normality (reaction times are often skewed). The rank-sum test compares distributions’ locations; if the groups have similar shapes/spreads, it’s effectively a test of median differences. (If you want a test that targets the median specifically regardless of shape—but with less power—use Mood’s median test.)
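
As a rough illustration of this answer, the sketch below runs both the Mann-Whitney U test and Mood's median test on simulated reaction times. The data are entirely synthetic: the log-normal shape, its parameters, the seed, and the names `geode_rt` and `limestone_rt` are assumptions made for this example, not measurements.

```python
import numpy as np
import scipy.stats as st

# Simulated, right-skewed reaction times in ms (hypothetical values)
rng = np.random.default_rng(2)
geode_rt = rng.lognormal(mean=6.0, sigma=0.4, size=20)
limestone_rt = rng.lognormal(mean=6.2, sigma=0.4, size=20)

# Rank-based comparison of the two independent groups
u_stat, p_mw = st.mannwhitneyu(geode_rt, limestone_rt, alternative='two-sided')

# Mood's median test: counts values above/below the pooled median in each group
_, p_mood, grand_median, table = st.median_test(geode_rt, limestone_rt)

print(f'Mann-Whitney U = {u_stat:.1f}, p = {p_mw:.3f}')
print(f"Mood's median test: pooled median = {grand_median:.1f}, p = {p_mood:.3f}")
```

Mood's test uses only whether each value falls above or below the pooled median, which is why it generally has less power than the rank-sum test.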

### Scenario 2 (Question):

You are a brilliant scientist working at a biotech firm developing a vaccine that reverses aging. Wow! To test the efficacy of the vaccine, you recruit 50 people, give them a course of your vaccine, and measure their age with a (very) special scale before and after treatment. You want to start by refuting the simple null hypothesis that the participants' measured ages are not changed by the treatment. What test do you use and why?

## Scenario 2 (Answer)

Use a paired t-test on the pre–post differences. Why: the same 50 participants are measured twice (within-subject design). The paired t-test directly tests whether the mean change (post − pre) differs from zero, leveraging each person as their own control and increasing power.
If the difference scores are clearly non-normal or have heavy outliers, use the nonparametric Wilcoxon signed-rank test to test whether the median change ≠ 0.

### Scenario 3 (Question)

You are a neuroeconomist and believe you have developed a wearable device that can change consumer preferences about a given product. To test your device, you present product X to a group of 40 individuals and ask them to fill out a survey assessing how much they like the product (a larger score means they like it more). Then, you have the individuals wear the device, present product X again, and assess how much they like the product. You want to know if the device reliably increases, decreases, or does not affect their liking of product X. What test would you use and why? What result would indicate that their liking has increased?

## Scenario 3 (Answer)

Use a paired t-test on the within-person change scores (After − Before). Why: the same 40 people are rated twice, so observations are paired; the paired t-test asks whether the mean change differs from 0. (If the change scores are very non-normal/outlier-ridden, use the Wilcoxon signed-rank test for the median change instead.)

How to read the result: Define Δᵢ = Afterᵢ − Beforeᵢ. Test H₀: μΔ = 0 (two-sided). If p < α and $\bar{\Delta} > 0$ (or the 95% CI for μΔ is entirely > 0), liking increased. If p < α and $\bar{\Delta} < 0$ (CI entirely < 0), liking decreased. If p ≥ α (CI includes 0), no reliable change. (If your sole question is “did it increase?”, a one-sided paired t-test with Hₐ: μΔ > 0 is appropriate.)
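
The decision rule in this answer can be sketched in code. Everything about the data below is hypothetical: the scores are simulated, and the rating scale, effect size, seed, and variable names are assumptions made only to show the mechanics of the paired t-test, its 95% confidence interval, and the signed-rank alternative.

```python
import numpy as np
import scipy.stats as st

# Simulated liking scores for 40 people (hypothetical, not real data)
rng = np.random.default_rng(1)
n = 40
before = rng.integers(1, 8, size=n).astype(float)  # assumed 1-7 rating scale
after = before + rng.normal(loc=0.5, scale=1.0, size=n)
delta = after - before  # within-person change scores

# Paired t-test on the change scores (two-sided)
t_stat, p_t = st.ttest_rel(after, before)

# 95% confidence interval for the mean change
mean_delta = delta.mean()
ci_low, ci_high = st.t.interval(0.95, df=n - 1, loc=mean_delta, scale=st.sem(delta))

# Nonparametric alternative on the same change scores
_, p_w = st.wilcoxon(delta)

# Apply the decision rule described above
alpha = 0.05
if p_t < alpha and mean_delta > 0:
    verdict = 'liking increased'
elif p_t < alpha and mean_delta < 0:
    verdict = 'liking decreased'
else:
    verdict = 'no reliable change'

print(f'mean change = {mean_delta:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]')
print(f'paired t-test p = {p_t:.4f}, signed-rank p = {p_w:.4f} -> {verdict}')
```

The two-sided test at α = 0.05 and the 95% confidence interval always agree: p < α exactly when the interval excludes 0.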

# Additional Resources


# Credits

Copyright 2021 by Joshua I. Gold, University of Pennsylvania