In [2]:
# | hidden: true
# | echo: false
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# some personal style settings to make the plots look nice
# and save some space in the notebook
plt.style.use("../style.mplstyle")

## Permutation Tests

One more widely applicable resampling method is especially useful in statistical inference: permutation (or randomization) tests.

Permutation tests, like the bootstrap, are non-parametric -- meaning they do not rely on assumptions about the underlying distribution of the data. Like the bootstrap, permutation tests rely on resampling -- only instead of resampling with replacement, we resample without replacement.

Say we have two samples, $X$ and $Y$, and we want to test if they have different underlying distributions (e.g. different means). Then, our null hypothesis ($H_0$) is that the two samples are drawn from the same distribution.

So, we can combine the two samples into one larger sample, $Z = X \cup Y$, and then randomly split $Z$ into two new samples, $X'$ and $Y'$, of the same size as the original samples. We can then compute the test statistic (e.g. the difference in means) for the new samples and repeat this process many times to build a distribution of test statistics under the null hypothesis.

The only assumption we need to make is that the two samples are **exchangeable** under the null hypothesis. This means that, if the null hypothesis is true, the two samples could be shuffled without changing the underlying distribution.
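
To make a single relabeling concrete, here is a minimal sketch with made-up numbers (the arrays `x_toy` and `y_toy` are hypothetical, not the NBA data). It pools the two samples, shuffles once, and splits the result back into groups of the original sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy samples, for illustration only
x_toy = np.array([31.0, 28.0, 35.0, 40.0])
y_toy = np.array([25.0, 30.0, 27.0])

# Pool the samples, then shuffle without replacement
pooled = np.concatenate([x_toy, y_toy])
shuffled = rng.permutation(pooled)

# Split back into groups with the original sizes
x_new = shuffled[: len(x_toy)]
y_new = shuffled[len(x_toy):]

# One draw of the test statistic under the null hypothesis
perm_diff = x_new.mean() - y_new.mean()
print(x_new, y_new, perm_diff)
```

Every value appears exactly once across `x_new` and `y_new`; only the group labels change. Repeating this many times gives the distribution of the statistic under the null hypothesis.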

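The `permutation` method of a NumPy random `Generator` (the legacy equivalent is `np.random.permutation`) is useful for this. It returns a randomly shuffled copy of an array, which we can split to create our new samples. Let's define a function that performs a permutation test for any test statistic we pass in as `test_func`.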

In [None]:
#| code-fold: show
def permutation_test(test_func, x, y, num_permutations=10000, rng=None):
    # Compute the observed test statistic
    observed_stat = test_func(x, y)

    # Combine the two samples
    combined = np.concatenate([x, y])
    count = 0

    if rng is None:
        rng = np.random.default_rng()
    for _ in range(num_permutations):
        # Permute the combined array
        permuted = rng.permutation(combined)

        # Split the permuted array into two new samples
        x_perm = permuted[:len(x)]
        y_perm = permuted[len(x):]

        # Compute the test statistic for the permuted samples
        permuted_stat = test_func(x_perm, y_perm)

        # Count permuted statistics at least as large as the observed one
        # (one-sided test: only the upper tail counts as "extreme")
        if permuted_stat >= observed_stat:
            count += 1

    # Compute the p-value
    p_value = count / num_permutations
    return p_value


We can apply this to our NBA data to test whether SGA scores more points per game than Giannis. Because `permutation_test` counts only permuted statistics at least as large as the observed one, this is a one-sided test. It is quite similar to the bootstrap hypothesis test we did in the previous lecture, but instead of resampling with replacement, we resample without replacement.

Both versions work, but the permutation test tends to be more powerful. 

:::{.callout-note title="Statistical Power"}
Power
: The probability of correctly rejecting the null hypothesis when it is false. 

A more "powerful" test is more likely to detect a true effect. The power of a test is often written as $1 - \beta$, where $\beta$ is the probability of a Type II error (failing to reject the null hypothesis when it is false).

We usually want our tests to have high power, so we can detect true effects when they exist. We try to balance this against the risk of **falsely** rejecting the null hypothesis (Type I error) when it is actually true.
:::
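
Power can be estimated by simulation: generate many datasets with a known effect, run the test on each, and record how often it rejects. The sketch below assumes normally distributed data, a one-sided difference in means, and a significance level of 0.05. The helper names (`perm_p_value`, `estimate_power`) and all parameter values are illustrative choices, not part of the lecture code. It uses a vectorized shuffle (`Generator.permuted`, which requires NumPy 1.20 or later) to keep the run time short.

```python
import numpy as np

def perm_p_value(x, y, rng, n_perm=500):
    """One-sided permutation p-value for mean(x) - mean(y)."""
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    # Each row is an independent shuffle of the pooled sample
    shuffled = rng.permuted(np.tile(pooled, (n_perm, 1)), axis=1)
    diffs = shuffled[:, : len(x)].mean(axis=1) - shuffled[:, len(x):].mean(axis=1)
    return np.mean(diffs >= observed)

def estimate_power(effect, n=30, n_trials=200, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which the test rejects H0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_trials):
        x = rng.normal(loc=effect, scale=1.0, size=n)  # group with shifted mean
        y = rng.normal(loc=0.0, scale=1.0, size=n)
        if perm_p_value(x, y, rng) <= alpha:
            rejections += 1
    return rejections / n_trials

power_effect = estimate_power(effect=1.0)  # a true difference exists
rate_null = estimate_power(effect=0.0)     # no difference: rejections are Type I errors
print(f"Estimated power: {power_effect:.2f}, false rejection rate: {rate_null:.2f}")
```

With no true effect, the rejection rate should land near `alpha`; with a real effect, it rises toward 1 as the effect size or sample size grows.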


In [10]:
#| code-fold: show

### Data import and preparation ###
sga_df = pd.read_csv("../data/sga-stats-24-25.csv")
giannis_df = pd.read_csv("../data/giannis-stats-24-25.csv")
# combine the dataframes and clean up the data
sga_df["player"] = "Shai Gilgeous-Alexander"
giannis_df["player"] = "Giannis Antetokounmpo"
compare_df = pd.concat([sga_df, giannis_df], ignore_index=True)
# filter out rows where the player did not play or was inactive
compare_df = compare_df.replace(
    {"Did Not Dress": np.nan, "Inactive": np.nan, "Did Not Play": np.nan, "": np.nan}
)
compare_df.dropna(subset=["PTS"], inplace=True)
# convert PTS to float/numeric and Date to datetime
compare_df["PTS"] = compare_df["PTS"].astype(float)
compare_df["Date"] = pd.to_datetime(compare_df["Date"])

rng = np.random.default_rng(42) 
# run the permutation test
p_value = permutation_test(
    lambda x, y: np.mean(x) - np.mean(y), # lambda functions can be used inline without naming the function
    compare_df[compare_df["player"] == "Shai Gilgeous-Alexander"]["PTS"],
    compare_df[compare_df["player"] == "Giannis Antetokounmpo"]["PTS"],
    rng=rng
)
print(
    f"P-value for the hypothesis that SGA scores more points than Giannis: {p_value:.4f}"
)


P-value for the hypothesis that SGA scores more points than Giannis: 0.0336


## There is really only one test!

The more statistics you learn and the more you are exposed to work in quantitative fields, the more you will see a wide variety of complicated statistical techniques and methods. 

Ultimately, they all represent the same process:

1. Compute a test statistic on the observed data.
2. Choose a null hypothesis / model. Either specify the null distribution explicitly or use a simulation-based method to generate a distribution of test statistics under the null hypothesis.
3. Compute a p-value by comparing the observed test statistic to the distribution of test statistics under the null hypothesis. 
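
The three steps can be sketched with a deliberately simple example. The scenario is hypothetical (60 heads in 100 coin flips, with a fair coin as the null model), and it handles step 2 both ways: once with an explicit null distribution from `scipy.stats.binom` and once by simulation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1: test statistic on the observed data (hypothetical numbers)
n_flips = 100
observed_heads = 60

# Step 2a: explicit null distribution, heads ~ Binomial(100, 0.5)
# sf(k) is P(X > k), so sf(observed - 1) is P(X >= observed)
p_exact = stats.binom.sf(observed_heads - 1, n_flips, 0.5)

# Step 2b: simulate the test statistic under the same null model
n_sims = 100_000
simulated_heads = rng.binomial(n_flips, 0.5, size=n_sims)

# Step 3: p-value = share of null statistics at least as extreme as observed
p_sim = np.mean(simulated_heads >= observed_heads)

print(f"analytic p-value: {p_exact:.4f}, simulated p-value: {p_sim:.4f}")
```

The two p-values should agree up to simulation noise, since they differ only in how the null distribution is obtained.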

Most of the literature in classical statistics focuses on mathematically deriving analytical solutions for steps 2 and 3.

Check out [this blog post](https://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html) by Allen Downey for more on this idea and an explanation of the advantages of simulation-based methods for hypothesis testing.