# Non-parametric Tests

In an ideal world, we could use the tests described above, and others built on similar assumptions, whenever our data are normally distributed. In fact, because of how data are generally distributed in nature and because of the central limit theorem (CLT), we can often get away with this assumption even when the distribution of our data is not exactly normal. There are also various methods for checking whether your data approximately follow a normal distribution, or for transforming your data to be closer to normal. But every now and then, we observe cases where the underlying distribution diverges too much from normality and does not follow any other useful distribution. Hence, we must touch upon the concept of __non-parametric tests__.
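As a rough illustration of checking normality and transforming data, the sketch below applies the Shapiro-Wilk test (`scipy.stats.shapiro`) to a skewed sample before and after a log transform. The sample is synthetic and hypothetical, generated only for this example; it is not the delivery time data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical right-skewed sample (log-normal), for illustration only
skewed_sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal
_, p_raw = stats.shapiro(skewed_sample)

# The log of log-normal data is normal, so the transform should help here
_, p_log = stats.shapiro(np.log(skewed_sample))

print(f"p-value before transform: {p_raw:.4f}")
print(f"p-value after log transform: {p_log:.4f}")
```

A small p-value before the transform suggests the raw data are not normal. Whether a transform is appropriate depends on the data at hand.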

__Parametric tests__ are statistical tests that make some kind of assumption about the underlying distribution of the data, the most common one being normality. When these assumptions are no longer valid, we turn to __non-parametric tests__, also referred to as __distribution-free tests.__ Rather than testing hypotheses about the mean of our data, the non-parametric tests covered here test hypotheses about the median.

Let us consider the non-parametric test most commonly used instead of the one-sample t-test: the __Wilcoxon signed rank test.__ We will use this non-parametric test on the data we used for the one-sample t-test: the Amazon offset delivery time data.

### 1. Formulate a hypothesis
As we are no longer assuming the data has a normal underlying distribution, we will now test a hypothesis about the median of the data:

$\mathbf{H_{0}}$: The median offset time is 30 minutes

$\mathbf{H_{a}}$: The median offset time is not 30 minutes

### 2. Find the appropriate statistical test
As mentioned, since we are no longer assuming our data follows a normal distribution, we will be using a two-tailed Wilcoxon signed rank test.

### 3. Choose a significance level
As before, we will use a significance level of $\alpha=0.1$.

### 4. Collect data and compute test statistic
We will read the data from our file once again.

In [2]:
import pandas as pd
import numpy as np

# Reading data from file
offset_data = pd.read_csv("amazon_data.csv")

We can now use this data, taking the hypothesized median to be $\mu_{0}=30$ minutes, to determine whether the median is truly 30 minutes. The test statistic for a Wilcoxon signed rank test is the $\mathbf{s_{+}}$ statistic, which can be calculated as follows:
1. Compute the difference between each data point and the hypothesized median
2. Take the absolute value of each difference and rank the absolute differences in ascending order
3. Compute $s_{+}$ as the sum of the ranks of the positive differences

These steps are shown below for our data:

In [3]:
# Wilcoxon Signed Rank Test
median = 30
data = offset_data["0"]  # offset times, in file order (not sorted)
diffs = data - median

# Ordering the differences by absolute value, smallest first
# (this simple ranking assumes no zero differences and no tied absolute values)
abs_diffs = np.abs(diffs).to_numpy()
order = np.argsort(abs_diffs)
sorted_absolute_diffs = abs_diffs[order]

# Tracking the sign of each difference, in the same sorted order
ordered_signs = np.where(diffs.to_numpy()[order] < 0, '-', '+')
ranks = np.arange(1, len(diffs) + 1)

# Computing s+ statistic: sum of the ranks of the positive differences
s_plus = int(ranks[ordered_signs == '+'].sum())

# Creating structured dataframe
# Note: "data" and "differences" are listed in file order, while the last
# three columns are sorted by absolute difference, so a given row does not
# describe a single observation across all five columns
processed_data = pd.DataFrame({
    "data": data,
    "differences": diffs,
    "absolute differences": sorted_absolute_diffs,
    "signs": ordered_signs,
    "ranks": ranks,
})
print("s_plus:", s_plus)
print(processed_data)

s_plus: 76
         data  differences  absolute differences signs  ranks
0   57.925694    27.925694              0.232974     +      1
1   40.574152    10.574152              0.970240     +      2
2   30.232974     0.232974              1.696030     +      3
3   20.853756    -9.146244              1.737354     -      4
4   21.355134    -8.644866              4.707712     -      5
5   14.037427   -15.962573              8.644866     -      6
6   31.696030     1.696030              9.054106     -      7
7    5.784466   -24.215534              9.146244     -      8
8   65.960599    35.960599             10.574152     +      9
9   50.994944    20.994944             15.962573     -     10
10  30.970240     0.970240             17.649881     -     11
11   9.928767   -20.071233             20.071233     -     12
12  25.292288    -4.707712             20.994944     +     13
13  12.350119   -17.649881             24.215534     -     14
14  28.262646    -1.737354             24.300015     +     
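For readers without access to `amazon_data.csv`, the calculation can be reproduced from the printed output alone. The sketch below hard-codes the 17 signed differences, rounded to six decimals as printed; the two that do not appear in the "differences" column are taken from the "absolute differences" and "signs" columns. It ranks the absolute values with `scipy.stats.rankdata`, which would also assign average ranks if there were ties.

```python
import numpy as np
from scipy.stats import rankdata

# Signed differences from the hypothesized median of 30 minutes,
# copied from the printed output (rounded to six decimals)
signed_diffs = np.array([
    27.925694, 10.574152, 0.232974, -9.146244, -8.644866, -15.962573,
    1.696030, -24.215534, 35.960599, 20.994944, 0.970240, -20.071233,
    -4.707712, -17.649881, -1.737354, -9.054106, 24.300015,
])

# Rank the absolute differences (1 = smallest)
abs_ranks = rankdata(np.abs(signed_diffs))

# Sum the ranks separately for positive and negative differences
s_plus = abs_ranks[signed_diffs > 0].sum()
s_minus = abs_ranks[signed_diffs < 0].sum()

print("s_plus:", s_plus)    # s_plus: 76.0
print("s_minus:", s_minus)  # s_minus: 77.0
```

The two sums add up to $153$, the sum of the ranks from 1 to 17.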

We can also confirm our results with external Python libraries.

In [4]:
from scipy import stats

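# Wilcoxon Signed Rank Test via scipy.stats
# Note: for a two-sided test, scipy reports the smaller of the positive and
# negative rank sums, which coincides with s_plus for this data
statistic, p = stats.wilcoxon(diffs, alternative="two-sided")
print("s_plus:", statistic)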

s_plus: 76.0


### 5. Determine the p-value (probability)/critical value
Given the sample size and the significance level, we can use the Wilcoxon signed rank table to look up the critical value for a two-tailed test with $n=17$ and $\alpha=0.1$, which is $c=41$.
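The tabulated value can also be checked by enumerating the exact null distribution of $s_{+}$. The sketch below is one possible way to do this, assuming no ties and no zero differences: under the null hypothesis each rank is equally likely to carry a positive or negative sign, so the number of sign assignments giving each value of $s_{+}$ can be counted with a small dynamic program. The function names are hypothetical, introduced only for this example.

```python
def signed_rank_null_counts(n):
    """counts[s] = number of the 2**n sign assignments with s_plus == s."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1  # no rank is positive
    for rank in range(1, n + 1):
        # Go downwards so that each rank is used at most once
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def lower_critical_value(n, alpha):
    """Largest c with P(s_plus <= c) <= alpha / 2 under the null hypothesis."""
    counts = signed_rank_null_counts(n)
    total = 2 ** n
    cumulative = 0
    critical = None
    for s, count in enumerate(counts):
        cumulative += count
        if cumulative / total > alpha / 2:
            break
        critical = s
    return critical


c = lower_critical_value(n=17, alpha=0.1)
print("critical value:", c)
```

Published tables sometimes use slightly different conventions, so small discrepancies between tables are possible for other values of $n$ and $\alpha$.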

### 6. Reject/accept null hypothesis
By construction, the maximum possible statistic is the sum of all the ranks:
$$s_{+,max} = 1 + 2 + 3 + ... + n = \frac{n(n+1)}{2}$$

For $n=17$ this gives $s_{+,max}=153$. Under the null hypothesis the distribution of $s_{+}$ is symmetric about $s_{+,max}/2$, so for a two-tailed test the rejection region is given by $s_{+}\leq 41$ or $s_{+}\geq 153-41=112$. Given that our statistic is $s_{+} = 76$, which lies between these bounds, we fail to reject the null hypothesis.
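As a quick arithmetic check of the decision rule, the sketch below takes $n=17$, the tabulated lower critical value $c=41$ and the observed $s_{+}=76$, and derives the upper bound from the symmetry of the null distribution about $n(n+1)/4$. The variable names are illustrative only.

```python
n = 17
c_lower = 41  # tabulated critical value for alpha = 0.1, two-tailed
s_plus = 76   # observed statistic

# Largest possible statistic: the sum of the ranks 1..n
s_max = n * (n + 1) // 2

# By symmetry, the upper bound mirrors the lower one
c_upper = s_max - c_lower

reject_null = s_plus <= c_lower or s_plus >= c_upper
print("s_max:", s_max)              # s_max: 153
print("upper bound:", c_upper)      # upper bound: 112
print("reject null:", reject_null)  # reject null: False
```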

### 7. Make a decision
Just as we observed before with the mean of our data in the one-sample t-test, the median of our data does not show a significant difference from 30 minutes. Therefore, even using a non-parametric test, we arrive at the same conclusion.

We have just shown that a non-parametric test can address the same kind of question as a parametric test. A logical question to ask is: "if non-parametric tests can be applied regardless of the underlying distribution, why don't we always use them instead of parametric tests?". The first reason is that __parametric tests generally have more statistical power than non-parametric tests when their assumptions hold.__ This means that you are more likely to detect a significant effect, if there is one, by carrying out a parametric test.
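Statistical power can be estimated by simulation. The sketch below is a rough Monte Carlo comparison, not a definitive benchmark: it repeatedly draws normal samples whose true mean differs from the hypothesized value and records how often each test rejects at $\alpha=0.1$. The true mean, standard deviation and number of simulations are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n, alpha, mu0 = 17, 0.1, 30.0
true_mean, sigma = 36.0, 10.0  # hypothetical: a real shift of 6 minutes
n_sims = 1000

t_rejections = 0
w_rejections = 0
for _ in range(n_sims):
    sample = rng.normal(true_mean, sigma, size=n)
    # Parametric: one-sample t-test against mu0
    if stats.ttest_1samp(sample, popmean=mu0).pvalue < alpha:
        t_rejections += 1
    # Non-parametric: Wilcoxon signed rank test on the differences from mu0
    if stats.wilcoxon(sample - mu0).pvalue < alpha:
        w_rejections += 1

# Estimated power: the fraction of simulations in which the null is rejected
power_t = t_rejections / n_sims
power_w = w_rejections / n_sims
print(f"t-test power:   {power_t:.3f}")
print(f"Wilcoxon power: {power_w:.3f}")
```

For normally distributed data the gap between the two estimates is usually small, and it can reverse for heavy-tailed data, so the size of the advantage depends on the situation.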

The second reason is that even though non-parametric tests do not make any assumptions about the underlying distribution, they do make other assumptions about the data at hand. For instance, when comparing groups, non-parametric tests may not provide valid results if the dispersion of the data for each group is not similar, which is not the case for parametric tests as we have seen for the one-way ANOVA.

Some more comparable non-parametric tests are shown below:

| Parametric test of means | Non-parametric test of medians       |
|--------------------------|--------------------------------------|
| one-sample t-test        | one-sample Wilcoxon signed rank test |
| two-sample t-test        | Mann-Whitney test                    |
| one-way ANOVA            | Kruskal-Wallis, Mood's median test   |
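The tests in the right-hand column are available in `scipy.stats`. The sketch below shows how they might be called; the three groups are synthetic, hypothetical samples generated only for this example. The one-sample Wilcoxon signed rank test corresponds to `scipy.stats.wilcoxon`, as used earlier.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical skewed samples for three groups, for illustration only
group_a = rng.exponential(scale=10.0, size=20)
group_b = rng.exponential(scale=12.0, size=20)
group_c = rng.exponential(scale=15.0, size=20)

# Two groups: Mann-Whitney test (counterpart of the two-sample t-test)
_, p_mw = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Three or more groups: Kruskal-Wallis test (counterpart of one-way ANOVA)
_, p_kw = stats.kruskal(group_a, group_b, group_c)

# Three or more groups: Mood's median test
_, p_mood, grand_median, _ = stats.median_test(group_a, group_b, group_c)

print(f"Mann-Whitney p-value:   {p_mw:.4f}")
print(f"Kruskal-Wallis p-value: {p_kw:.4f}")
print(f"Mood's median p-value:  {p_mood:.4f}")
```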

There are more parametric and non-parametric tests out there that we have not covered today and that may be applied in different contexts, as well as other applications of the tests covered today. However, all of them work under the same framework and logic learned in this section.