# Non-Parametric Test: Mann-Whitney U Test

The Mann-Whitney U Test is a non-parametric statistical test used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed. It assesses whether one group tends to have higher values than the other group or vice versa, without making assumptions about the shape of the distribution in the underlying populations.

#### Mann-Whitney U Test Formula and Approach

The Mann-Whitney U statistic is calculated as follows:

$$U = R_1 - \frac{n_1(n_1 + 1)}{2}$$

where:

- $U$ is the Mann-Whitney U statistic.
- $R_1$ is the sum of ranks in the first group.
- $n_1$ is the number of observations in the first group.

The test essentially compares the ranks of the observations in the two groups. The null hypothesis ($H_0$) states that the distributions of the two groups are equal.
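
As a small worked illustration of the formula, the rank sum and the U statistic can be computed by hand. The two samples below are made up for this example and are not part of the PayPal scenario. `scipy.stats.rankdata` assigns the pooled ranks, and the statistic for the second group uses the standard identity $U_2 = n_1 n_2 - U_1$.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical toy samples (no ties), chosen only to illustrate the formula
group1 = np.array([1, 4, 6, 8])
group2 = np.array([2, 3, 5, 7, 9])
n1, n2 = len(group1), len(group2)

# Rank the pooled observations; tied values would receive average ranks
ranks = rankdata(np.concatenate([group1, group2]))

# R_1 is the sum of the ranks that belong to the first group
r1 = ranks[:n1].sum()

# U = R_1 - n_1 (n_1 + 1) / 2, and the complementary statistic for group 2
u1 = r1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1

print(r1, u1, u2)
```

For these values the rank sum is 19, which gives $U_1 = 9$ and $U_2 = 11$.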

#### Steps

1. **Rank all observations:** Combine the data from both groups, rank them from smallest to largest, regardless of the group.
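2. **Calculate U statistic:** Use the formula to calculate $U_1$ from the rank sum of the first group; the statistic for the second group follows as $U_2 = n_1 n_2 - U_1$, where $n_2$ is the number of observations in the second group.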
3. **Determine Significance:** Compare the calculated U statistic to critical values from the U distribution table or use a p-value obtained from statistical software to determine if the difference between groups is statistically significant.
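
For larger samples, the significance step is commonly carried out with a normal approximation: under $H_0$, $U$ has mean $n_1 n_2 / 2$ and, ignoring ties, standard deviation $\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}$. The sketch below shows only that arithmetic. The helper name `mann_whitney_z` and its inputs are illustrative assumptions, and the tie and continuity corrections that statistical software typically applies are left out.

```python
import math
from scipy.stats import norm

def mann_whitney_z(u, n1, n2):
    """Normal approximation for U (hypothetical helper, no tie or continuity correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p-value from the standard normal distribution
    p_two_sided = 2 * norm.sf(abs(z))
    return z, p_two_sided

# Hypothetical inputs: U = 9 from samples of size 4 and 5
z, p = mann_whitney_z(9, 4, 5)
print(f"z = {z:.4f}, p = {p:.4f}")
```

Samples this small would normally be checked against exact critical values; the numbers are kept small here only so the calculation is easy to follow.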

#### Assumptions

- **Independence:** Observations between the two groups are independent.
- **Ordinal or Continuous Data:** The dependent variable should be ordinal or continuous.
- **Shape of Distribution:** No assumption is made about the form of the distribution, making it suitable for non-normally distributed data.

#### Scenario Description: PayPal Transaction Satisfaction by Region

PayPal aims to explore transaction satisfaction across two regions, North America and Europe, using a dataset of 10,000 transactions. This analysis will help understand regional differences in user satisfaction, contributing to enhanced user experience and operational efficiency.

#### Business Problem

The goal is to statistically determine if there's a significant difference in transaction satisfaction between users in North America and Europe, based on user feedback scores.

#### Python Code to Simulate Data and to Perform Non-Parametric Test: Mann-Whitney U Test


```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Simulate the dataset
# Scores are drawn independently of region, so no true regional difference is built in
np.random.seed(42)
data = {
    'transaction_id': range(10000),
    'region': np.random.choice(['North America', 'Europe'], 10000, p=[0.5, 0.5]),
    'user_feedback_score': np.concatenate([np.random.randint(1, 11, 5000), np.random.randint(1, 11, 5000)])
}

df = pd.DataFrame(data)

# Filter data for North America and Europe
scores_na = df[df['region'] == 'North America']['user_feedback_score']
scores_eu = df[df['region'] == 'Europe']['user_feedback_score']

# Apply Mann-Whitney U Test
stat, p_value = mannwhitneyu(scores_na, scores_eu)

# Interpretation
if p_value < 0.05:
    conclusion = 'significant difference'
else:
    conclusion = 'no significant difference'

# Report the computed conclusion rather than a hard-coded string
print(f"{stat:.4f}, {p_value:.4f}, Conclusion: {conclusion}")
```

```
12544699.5000, 0.7404, Conclusion: no significant difference
```


#### Interpretation of Results

The Mann-Whitney U Test applied to the PayPal transaction satisfaction data between North America and Europe yielded the following results:

- **U Statistic:** 12544699.5  
- **P-Value:** 0.7404  
- **Conclusion:** No significant difference  

Because the p-value of 0.7404 is well above the conventional significance level of 0.05, we fail to reject the null hypothesis: the data provide no evidence of a difference in transaction satisfaction between users in North America and Europe. This is the expected outcome here, because the simulated feedback scores were generated independently of region.

For PayPal, a result like this would be consistent with similar satisfaction levels across North America and Europe. Any observed differences in feedback scores are plausibly due to random variation rather than systematic differences in service delivery or user experience. Note that failing to reject the null hypothesis is not proof that the two regions are identical; it only means this test did not detect a difference.

Strategic Implications for PayPal:
- **Targeted Interventions Not Required by Region:** Since the transaction satisfaction does not significantly differ between the two regions, PayPal might not need to prioritize or customize service improvements specifically for North America or Europe based on the current analysis.

- **Focus on Universal Improvements:** PayPal can consider focusing on service enhancements that benefit all users, regardless of their region, to further increase overall satisfaction.

- **Continuous Monitoring:** Despite the current lack of significant regional differences, PayPal should continue to monitor user feedback and satisfaction scores regularly. This ensures that any emerging trends or shifts in user satisfaction can be addressed promptly.

This analysis highlights the value of applying non-parametric statistical tests like the Mann-Whitney U Test to make informed decisions based on empirical data, guiding strategic actions in a data-driven manner.

# Non-Parametric Test: Wilcoxon Signed-Rank Test

The Wilcoxon Signed-Rank Test is a non-parametric statistical test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ. It's an alternative to the paired Student's T-test when the data cannot be assumed to be normally distributed.

#### Mathematical Formula and Approach

The Wilcoxon Signed-Rank Test ranks the absolute differences between paired observations, ignoring their signs and discarding zero differences. The ranks are then summed separately for the positive and the negative differences.

The test statistic $W$ is defined as the smaller of the two rank sums:

$$
W = \min(W^+, W^-)
$$

where:

- $W^+$ is the sum of ranks for positive differences ($d_i > 0$),
- $W^-$ is the sum of ranks for negative differences ($d_i < 0$).

For large sample sizes, the distribution of $W$ can be approximated as normal, and a z-score can be calculated as:

$$
z = \frac{W - \mu_W}{\sigma_W}
$$

where $\mu_W$ and $\sigma_W$ are the mean and standard deviation of $W$ under the null hypothesis, calculated as:

$$
\mu_W = \frac{n(n + 1)}{4}
$$

$$
\sigma_W = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}
$$

where $n$ is the number of non-zero differences. The p-value is then derived from the z-score, comparing it against the standard normal distribution to determine the probability of observing such a $W$ (or more extreme) under the null hypothesis.
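
The normal approximation above can be sketched in a few lines. The helper name `wilcoxon_z` and the inputs ($W = 120$ with $n = 30$ non-zero differences) are hypothetical values chosen for illustration, and the sketch leaves out the tie and continuity corrections that statistical software may apply.

```python
import math
from scipy.stats import norm

def wilcoxon_z(w, n):
    """z-score and two-sided p-value for W (hypothetical helper, no corrections)."""
    mu_w = n * (n + 1) / 4
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu_w) / sigma_w
    # Two-sided p-value from the standard normal distribution
    p_two_sided = 2 * norm.sf(abs(z))
    return z, p_two_sided

# Hypothetical inputs: W = 120 from n = 30 non-zero differences
z, p = wilcoxon_z(w=120, n=30)
print(f"z = {z:.4f}, p = {p:.4f}")
```

With these inputs, $\mu_W = 232.5$ and $\sigma_W \approx 48.62$, so $W$ lies a little more than two standard deviations below its expected value under the null hypothesis.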

#### Steps for Conducting the Wilcoxon Signed-Rank Test

1. **Calculate Differences:** For each pair of observations, calculate the difference.
2. **Rank the Absolute Differences:** Ignore the signs and rank the absolute differences from smallest to largest.
3. **Assign Signs to Ranks:** Assign the sign (+ or -) from the original difference to each rank.
4. **Sum Positive and Negative Ranks:** Calculate the sum of positive ranks ($W^+$) and the sum of negative ranks ($W^-$).
5. **Test Statistic:** The test statistic ($W$) is the smaller of $W^+$ and $W^-$.
6. **Determine Significance:** Use tables or statistical software to determine if the observed $W$ is significantly small. If so, the null hypothesis of no difference is rejected.
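
The steps above can be traced on a small example. The paired values below are invented for illustration only. The last line compares the hand-computed value with `scipy.stats.wilcoxon`, which for the default two-sided test reports the smaller of the two rank sums as its statistic.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Hypothetical paired measurements (illustrative only)
before = np.array([10, 12, 9, 15, 11, 14, 8, 13])
after = np.array([11, 10, 12, 19, 6, 20, 15, 21])

# Step 1: differences for each pair, dropping zero differences
diff = after - before
diff = diff[diff != 0]

# Step 2: rank the absolute differences (ties would get average ranks)
ranks = rankdata(np.abs(diff))

# Steps 3-4: sum the ranks separately by the sign of the difference
w_plus = ranks[diff > 0].sum()
w_minus = ranks[diff < 0].sum()

# Step 5: the test statistic is the smaller of the two sums
w = min(w_plus, w_minus)

print(w_plus, w_minus, w)
print(wilcoxon(before, after).statistic)
```

Here $W^+ = 29$ and $W^- = 7$, so $W = 7$.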

#### Assumptions

- The data are paired and come from the same population.
- Each pair is chosen randomly and independently.
- The data are measured on at least an ordinal scale, but need not be normally distributed.

#### Business Scenario: PayPal Transaction Satisfaction Improvement

#### Scenario Description

Following an update to their transaction processing system, PayPal wishes to evaluate the effectiveness of this update in improving transaction satisfaction across regions. The company has pre-update and post-update satisfaction scores for a sample of transactions.

#### Business Problem

PayPal needs to ascertain whether the update led to a significant improvement in transaction satisfaction, assessed by comparing pre-update and post-update user feedback scores.

#### Generating Relevant Data

```python
import numpy as np
from scipy.stats import wilcoxon

# Simulate pre-update and post-update satisfaction scores
np.random.seed(42)
pre_update_scores = np.random.randint(1, 11, size=1000)  # Pre-update scores on a 1-10 scale
# Post-update scores: pre-update scores plus symmetric integer noise in [-2, 2].
# No systematic improvement is built in, and values are not clipped to the 1-10 scale.
post_update_scores = pre_update_scores + np.random.randint(-2, 3, size=1000)

# Apply the Wilcoxon Signed-Rank Test
stat, p_value = wilcoxon(pre_update_scores, post_update_scores)
print(f"{stat:.4f}, {p_value:.4f}")
```

```
162314.5000, 0.7197
```


#### Interpretation of Wilcoxon Signed-Rank Test Results

The results of the Wilcoxon Signed-Rank Test, with a test statistic of 162314.5000 and a p-value of 0.7197, suggest the following interpretation:

Because the p-value of 0.7197 is well above the conventional alpha level of 0.05, we fail to reject the null hypothesis. The data provide no evidence that the paired differences between pre-update and post-update scores are shifted away from zero. This is what the simulation should produce, since the post-update scores were generated by adding symmetric noise (integers from -2 to 2) to the pre-update scores.

In practical terms, this result implies that the system update did not lead to a statistically significant improvement in user satisfaction scores. For PayPal, this finding suggests that while the update may have aimed to enhance transaction efficiency or security, it did not perceptibly alter the overall satisfaction levels among users, at least not to a degree detectable by this test.

PayPal may consider exploring additional factors that could influence user satisfaction or investigating other aspects of the transaction process that the update may have impacted. Furthermore, the company might also look into more targeted feedback or user experience studies to identify specific areas for improvement that could directly affect satisfaction scores.


# Non-Parametric Test: Kruskal-Wallis Test

The Kruskal-Wallis Test is a non-parametric statistical test used to compare the medians of three or more independent groups to determine if at least one of the groups differs from the others. It's an extension of the Mann-Whitney U Test for more than two groups and is used when the assumptions of ANOVA are not met, particularly the assumption of normality.

#### Mathematical Formula and Approach

The Kruskal-Wallis test statistic $H$ is calculated as follows:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{g} \frac{R_i^2}{n_i} - 3(N+1)$$

where:

- $N$ is the total number of observations across all groups.
- $g$ is the number of groups.
- $R_i$ is the sum of ranks in the $i^{th}$ group.
- $n_i$ is the number of observations in the $i^{th}$ group.

The null hypothesis ($H_0$) is that the medians of all groups are equal.

#### Steps for Conducting the Kruskal-Wallis Test

1. **Rank All Observations:** Combine all group observations and rank them together from smallest to largest.
2. **Calculate Group Rank Sums:** Calculate the sum of ranks for each group ($R_i$).
3. **Compute the Test Statistic:** Use the formula to calculate $H$.
4. **Determine Significance:** Compare the calculated $H$ to the chi-square distribution with $g-1$ degrees of freedom. A significant result indicates that at least one group's median significantly differs from the others.
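
The steps above can be sketched on a small example. The three groups below are invented for illustration and contain no ties, so the hand-computed $H$ should agree with `scipy.stats.kruskal` (which additionally applies a tie correction when ties are present).

```python
import numpy as np
from scipy.stats import rankdata, chi2, kruskal

# Hypothetical toy groups (no ties), for illustration only
groups = [
    np.array([2.1, 3.4, 1.9, 4.0]),
    np.array([5.2, 6.1, 4.8, 5.9]),
    np.array([3.0, 7.3, 6.6, 8.1]),
]
sizes = [len(g) for g in groups]

# Step 1: rank all observations together
pooled = np.concatenate(groups)
ranks = rankdata(pooled)
N = len(pooled)

# Step 2: split the pooled ranks back into groups and sum them (R_i)
split_points = np.cumsum(sizes)[:-1]
rank_sums = [r.sum() for r in np.split(ranks, split_points)]

# Step 3: H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
h = 12 / (N * (N + 1)) * sum(R**2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

# Step 4: compare H to a chi-square distribution with g - 1 degrees of freedom
p = chi2.sf(h, df=len(groups) - 1)

print(rank_sums, h, p)
print(kruskal(*groups))
```

The rank sums are 12, 30, and 36, which give $H = 6$ and a p-value just under 0.05 on 2 degrees of freedom.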

#### Assumptions

- Observations are independent within and across groups.
- The dependent variable should be measured at least at the ordinal level.
- The groups are sampled from populations with the same shape of distribution.

#### Business Scenario: PayPal Transaction Satisfaction by Region

#### Scenario Description

PayPal seeks to compare transaction satisfaction across multiple regions (e.g., North America, Europe, Asia) following updates to their payment system, using a dataset of 10,000 transactions.

#### Business Problem

Identify if there are significant differences in transaction satisfaction across different regions, guiding regional service improvements.

#### Generate Relevant Data

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

# Simulate transaction satisfaction scores for three regions
np.random.seed(42)
data = {
    'region': np.random.choice(['North America', 'Europe', 'Asia'], 10000, p=[0.3, 0.4, 0.3]),
    'user_feedback_score': np.random.randint(1, 11, 10000)
}

df = pd.DataFrame(data)

# Separate scores by region
scores_na = df[df['region'] == 'North America']['user_feedback_score']
scores_eu = df[df['region'] == 'Europe']['user_feedback_score']
scores_asia = df[df['region'] == 'Asia']['user_feedback_score']

# Apply Kruskal-Wallis Test
stat, p_value = kruskal(scores_na, scores_eu, scores_asia)
print(f"{stat:.4f}, {p_value:.4f}")
```

```
1.1961, 0.5499
```


#### Interpretation of Kruskal-Wallis Test Results

Given the Kruskal-Wallis test statistic of 1.1961 and a p-value of 0.5499, the interpretation is as follows:

The p-value is greater than the conventional alpha level of 0.05, indicating that we fail to reject the null hypothesis of the Kruskal-Wallis test. This suggests that there is no statistically significant difference in the median transaction satisfaction scores across the regions analyzed (North America, Europe, Asia).

#### Conclusion

For PayPal, this finding implies that regional factors might not significantly impact transaction satisfaction among users. This uniformity in satisfaction across regions suggests that PayPal's services are perceived similarly by users in different geographical areas, indicating consistency in service quality and user experience. PayPal can leverage this insight to maintain and further enhance their global service standards, ensuring a uniformly positive experience for users worldwide.

# Non-Parametric Test: Friedman Test

The Friedman Test is a non-parametric statistical test used to detect differences in treatments across multiple test attempts. It is essentially the non-parametric alternative to the one-way ANOVA with repeated measures, ideal for situations where the response variable does not meet the assumption of normality. The test assesses whether the rankings of different treatments are statistically different.

#### Mathematical Formula and Approach

The Friedman test statistic ($\chi^2_F$) is calculated using the formula:

$$
\chi^2_F = \frac{12}{Nk(k+1)}\sum_{j=1}^{k}R_j^2 - 3N(k+1)
$$

where:
- $N$ is the number of subjects,
- $k$ is the number of treatments,
- $R_j$ is the sum of ranks for the $j^{th}$ treatment.

The null hypothesis ($H_0$) is that there are no differences among the treatments.

#### Steps

1. **Rank the Data:** Within each block (subject), rank the treatments from 1 to $k$ (lowest to highest).
2. **Sum Ranks for Each Treatment:** Calculate the sum of ranks for each treatment across all blocks.
3. **Compute the Friedman Test Statistic:** Use the formula provided.
4. **Determine Significance:** Compare the calculated $\chi^2_F$ statistic to the critical value from the chi-square distribution with $k-1$ degrees of freedom.
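
The steps above can be traced on a small example. The score matrix below is invented for illustration (rows are subjects, columns are treatments, no ties within a row). The statistic is computed in its rank-sum form, $\frac{12}{Nk(k+1)}\sum_j R_j^2 - 3N(k+1)$, and then compared with `scipy.stats.friedmanchisquare`.

```python
import numpy as np
from scipy.stats import rankdata, chi2, friedmanchisquare

# Hypothetical scores: 5 subjects (rows) x 3 treatments (columns), illustrative only
scores = np.array([
    [7, 8, 9],
    [5, 7, 6],
    [8, 9, 10],
    [6, 5, 8],
    [4, 6, 7],
])
n_subjects, k = scores.shape

# Step 1: rank the treatments within each subject (row)
ranks = np.apply_along_axis(rankdata, 1, scores)

# Step 2: sum the ranks for each treatment across subjects (R_j)
rank_sums = ranks.sum(axis=0)

# Step 3: Friedman statistic in rank-sum form
stat = 12 / (n_subjects * k * (k + 1)) * np.sum(rank_sums**2) - 3 * n_subjects * (k + 1)

# Step 4: compare to a chi-square distribution with k - 1 degrees of freedom
p = chi2.sf(stat, df=k - 1)

print(rank_sums, stat, p)
print(friedmanchisquare(*scores.T))
```

The rank sums are 6, 10, and 14, which give a statistic of 6.4 on 2 degrees of freedom. With only five subjects the chi-square approximation is rough, so the example is meant to show the arithmetic rather than a realistic analysis.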

#### Assumptions

- The observations are paired across treatments due to the repeated measures design.
- The dependent variable should be ordinal or continuous.
- Observations within each pair (block) are ranked based solely on the treatment effect, implying that treatments are comparable within blocks.

#### Business Scenario: PayPal Transaction Satisfaction Improvement

#### Scenario Description

PayPal is interested in evaluating the effectiveness of three sequential updates (treatments) to their transaction processing system across a sample of users to identify which update yields the highest satisfaction.

#### Business Problem

Determine if there is a statistically significant difference in transaction satisfaction across the three updates to guide future improvements.

#### Generate Relevant Data and Analysis

```python
import numpy as np
from scipy.stats import friedmanchisquare

np.random.seed(42)

# Simulate satisfaction scores for 100 users across three system updates
# Base scores are on a scale of 1 to 10; the -1/0/+1 shifts added for updates 2 and 3
# are not clipped, so a few values can fall at 0 or 11
n_users = 100
update1_scores = np.random.randint(1, 11, size=n_users)
update2_scores = np.random.randint(1, 11, size=n_users) + np.random.choice([-1, 0, 1], size=n_users)
update3_scores = np.random.randint(1, 11, size=n_users) + np.random.choice([-1, 0, 1], size=n_users)

# Conducting the Friedman Test
stat, p_value = friedmanchisquare(update1_scores, update2_scores, update3_scores)

print(f"{stat:.4f}, {p_value:.4f}")
```

```
2.3842, 0.3036
```


#### Interpretation of Friedman Test Results

Given the Friedman test statistic of 2.3842 and a p-value of 0.3036, the interpretation of the results is as follows:

The p-value (0.3036) is greater than the conventional significance level of 0.05. This indicates that we fail to reject the null hypothesis of the Friedman test, suggesting that there are no statistically significant differences in transaction satisfaction scores across the three updates analyzed.

#### Conclusion for PayPal Scenario

For PayPal, this result implies that the sequential updates to the transaction processing system did not lead to statistically significant differences in user satisfaction. This finding suggests that, from the perspective of user satisfaction, the updates were perceived similarly by the users. PayPal might consider this outcome as an indication that further improvements or different strategies may be necessary to significantly enhance user satisfaction. It also highlights the importance of exploring other factors or areas of the transaction process that could be optimized to improve the overall user experience.