# Data Transformations and Non-Parametric Statistical Tests

Learning Goals:
1. Understand the concept and applications of non-parametric tests.
2. Learn about common non-parametric tests and their parametric counterparts.
3. Explore the advantages and limitations of non-parametric tests.
4. Utilize Python code examples to perform non-parametric tests.

## Introduction
Up to this point, we have been testing whether our data meet the assumptions of a particular test. However, there is no guarantee that those assumptions will be met in every case. If we find that the assumptions have been violated, we have a few options.

1. Ignore violations of assumptions: This usually only works when we are comparing means and the **normality assumption** is violated. If our sample size (*n*) is large, we can ignore slight non-normality. We can't really do this for differences in variance, especially if our samples are small or our sample sizes are uneven.

2. Transform your data: We can use different transformations on the data and then retest the assumptions. **Important:** While analyses should be done using transformed data, data should be presented and reported untransformed, *and* all data used in a particular study needs to be transformed using the same technique. Examples:
   
   - Natural log transformation: ln(y + 1)
   - Log 10 transformation: log10(y + 1)
   - Square-root transformation: sqrt(y + 0.5)
   - Arcsine square root transformation (for proportions): arcsin(sqrt(p))
   - Reciprocal transformation: 1 / y

3. Use non-parametric tests: Non-parametric tests are statistical tests that do not assume the data follow a particular distribution, such as the normal distribution. They are used when data do not meet the assumptions of parametric tests or when the research question involves categorical or ranked data. Non-parametric tests tend to have **lower power** than parametric tests when the parametric assumptions are actually met.
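
As a minimal sketch of option 2, the transformations listed above can be applied with NumPy and the normality assumption retested with the Shapiro-Wilk test. The arrays `y` and `p` are made-up values for illustration, not data from this chapter, and dropping zeros before the reciprocal is only one possible way to handle them.

```python
import numpy as np
from scipy.stats import shapiro

# Hypothetical right-skewed counts (made up for illustration)
y = np.array([0, 1, 1, 2, 2, 3, 4, 6, 9, 15, 28, 60], dtype=float)
# Hypothetical proportions between 0 and 1
p = np.array([0.05, 0.10, 0.20, 0.50, 0.90])

ln_y = np.log(y + 1)            # natural log; the +1 keeps zeros defined
log10_y = np.log10(y + 1)       # log base 10 with the same offset
sqrt_y = np.sqrt(y + 0.5)       # square root with a 0.5 offset
asin_p = np.arcsin(np.sqrt(p))  # arcsine square root, result in radians
recip_y = 1 / y[y > 0]          # 1 / y is undefined at 0, so zeros are dropped here

# Retest the normality assumption before and after transforming
_, p_raw = shapiro(y)
_, p_ln = shapiro(ln_y)
print("Shapiro-Wilk p-value (raw):", round(p_raw, 4))
print("Shapiro-Wilk p-value (ln):", round(p_ln, 4))
```

If the transformed values pass the assumption checks, the parametric test is then run on the transformed values.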

### Applications of Non-Parametric Tests
Non-parametric tests are commonly used in various scenarios, including:
- Comparing medians or distributions across groups.
- Analyzing ranked or ordinal data.
- Testing for independence or association between variables.

## Common Non-Parametric Tests

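### Mann-Whitney U Test (Replacement for Two-Sample t-test)

The Mann-Whitney U test compares two independent groups using the ranks of the observations. It is commonly described as a comparison of medians, which is an accurate reading when the two distributions have a similar shape.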

```python
from scipy.stats import mannwhitneyu

# Example data (replace with your own data)
group1 = [6, 7, 8, 9, 10]
group2 = [3, 5, 6, 7, 9]

# Perform Mann-Whitney U test
statistic, p_value = mannwhitneyu(group1, group2)
print("Mann-Whitney U Test Results:")
print("Test Statistic:", round(statistic, 2))
print("p-value:", round(p_value, 2))
```

```
Mann-Whitney U Test Results:
Test Statistic: 19.5
p-value: 0.17
```
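
To see where the U statistic comes from, here is a rough sketch that computes it by hand from pooled ranks, using the same example data. The variable names (`ranks`, `r1`, `u1`, `u2`) are illustrative, and this is the textbook rank-sum formula rather than SciPy's internal code.

```python
from scipy.stats import rankdata

group1 = [6, 7, 8, 9, 10]
group2 = [3, 5, 6, 7, 9]

# Rank all observations together; tied values share the average rank
ranks = rankdata(group1 + group2)

n1, n2 = len(group1), len(group2)
r1 = ranks[:n1].sum()            # rank sum for group1
u1 = r1 - n1 * (n1 + 1) / 2      # U statistic for group1
u2 = n1 * n2 - u1                # U statistic for group2

print("Rank sum for group1:", r1)
print("U1:", u1, "U2:", u2)
```

The value of `u1` matches the test statistic of 19.5 printed above. Only the ranks enter the calculation, which is why the size of an extreme value has no effect beyond its rank position.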


### Wilcoxon Signed-Rank Test (Replacement for Paired t-test)

The Wilcoxon signed-rank test compares paired or matched samples by testing whether the median of the paired differences is zero.

```python
from scipy.stats import wilcoxon

# Example data (replace with your own data)
before = [5, 7, 6, 4, 9]
after = [6, 8, 7, 5, 11]

# Perform Wilcoxon Signed-Rank test
statistic, p_value = wilcoxon(before, after)
print("Wilcoxon Signed-Rank Test Results:")
print("Test Statistic:", round(statistic, 2))
print("p-value:", round(p_value, 2))
```

```
Wilcoxon Signed-Rank Test Results:
Test Statistic: 0.0
p-value: 0.06
```
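
The signed-rank statistic can also be sketched by hand, which shows why the result above is 0. This is an illustrative calculation with made-up variable names (`d`, `w_plus`, `w_minus`). It assumes zero differences are dropped, which is the behavior of SciPy's default `zero_method="wilcox"`.

```python
import numpy as np
from scipy.stats import rankdata

before = np.array([5, 7, 6, 4, 9])
after = np.array([6, 8, 7, 5, 11])

d = before - after               # paired differences
d = d[d != 0]                    # drop zero differences (none in this example)
ranks = rankdata(np.abs(d))      # rank the absolute differences, averaging ties

w_plus = ranks[d > 0].sum()      # sum of ranks for positive differences
w_minus = ranks[d < 0].sum()     # sum of ranks for negative differences
statistic = min(w_plus, w_minus)

print("W+:", w_plus, "W-:", w_minus, "statistic:", statistic)
```

Every difference here is negative, so `w_plus` is 0 and the statistic is as extreme as it can be. With only five pairs there are 2^5 = 32 possible sign patterns, so the smallest exact two-sided p-value is 2/32 = 0.0625. A sample this small can never reach significance at the 0.05 level with this test.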


### Kruskal-Wallis H Test (Replacement for One-Way ANOVA)

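The Kruskal-Wallis H test compares two or more independent groups using the ranks of the observations. As with the Mann-Whitney U test, it can be read as a comparison of medians when the group distributions have a similar shape.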

```python
from scipy.stats import kruskal

# Example data (replace with your own data)
group1 = [5, 6, 7, 8, 9]
group2 = [3, 5, 6, 7, 9]
group3 = [2, 4, 6, 5, 7]

# Perform Kruskal-Wallis H test
statistic, p_value = kruskal(group1, group2, group3)
print("Kruskal-Wallis H Test Results:")
print("Test Statistic:", round(statistic, 2))
print("p-value:", round(p_value, 2))
```

```
Kruskal-Wallis H Test Results:
Test Statistic: 2.83
p-value: 0.24
```
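
As a rough sketch of what happens under the hood, the H statistic can be computed from the pooled ranks, corrected for ties, and compared against a chi-squared distribution with (number of groups - 1) degrees of freedom. The variable names are illustrative, and this follows the standard textbook formula rather than SciPy's source code.

```python
import numpy as np
from scipy.stats import rankdata, chi2

groups = [[5, 6, 7, 8, 9], [3, 5, 6, 7, 9], [2, 4, 6, 5, 7]]

pooled = np.concatenate(groups)
ranks = rankdata(pooled)         # pooled ranks, ties get the average rank
n = len(pooled)

# Split the pooled ranks back into their original groups
sizes = [len(g) for g in groups]
rank_groups = np.split(ranks, np.cumsum(sizes)[:-1])

# Uncorrected H statistic
h = 12 / (n * (n + 1)) * sum(r.sum() ** 2 / len(r) for r in rank_groups) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group
_, counts = np.unique(pooled, return_counts=True)
correction = 1 - (counts ** 3 - counts).sum() / (n ** 3 - n)
h_corrected = h / correction

p_value = chi2.sf(h_corrected, df=len(groups) - 1)
print("H (tie-corrected):", round(h_corrected, 2))
print("p-value:", round(p_value, 2))
```

The tie-corrected statistic and p-value agree with the 2.83 and 0.24 printed above.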


## Advantages and Limitations of Non-Parametric Tests

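### Advantages
- No assumption of a specific distribution, such as normality.
- Robustness against outliers, because the tests use ranks rather than raw values.

### Limitations
- Less statistical power than parametric tests when the parametric assumptions hold.
- They test hypotheses about ranks or whole distributions rather than means, which restricts the research questions they can answer.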
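
The robustness to outliers can be illustrated with a small sketch. The data are made up for this example: the second group is analyzed once as recorded and once with its largest value replaced by an extreme outlier. The comparison uses `ttest_ind` and `mannwhitneyu` from SciPy, and it illustrates only the outlier point, not the difference in power.

```python
from scipy.stats import mannwhitneyu, ttest_ind

# Hypothetical data: the two groups do not overlap at all
a = [1, 2, 3, 4, 5]
b_clean = [6, 7, 8, 9, 10]
b_outlier = [6, 7, 8, 9, 1000]   # same ranks, but one extreme value

t_clean = ttest_ind(a, b_clean)
t_outlier = ttest_ind(a, b_outlier)
mw_clean = mannwhitneyu(a, b_clean, alternative="two-sided")
mw_outlier = mannwhitneyu(a, b_outlier, alternative="two-sided")

print("t-test p-value (clean):", round(t_clean.pvalue, 4))
print("t-test p-value (outlier):", round(t_outlier.pvalue, 4))
print("Mann-Whitney p-value (clean):", round(mw_clean.pvalue, 4))
print("Mann-Whitney p-value (outlier):", round(mw_outlier.pvalue, 4))
```

The outlier inflates the variance of the second group, so the t-test loses its significant result even though the groups are still completely separated. The Mann-Whitney p-value does not change, because the ranks of the observations are the same in both versions.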

## End-of-Chapter Question

Using the Palmer Penguins data set, compare the body mass of male penguins across all three species using the non-parametric alternative to one-way ANOVA.