# Non-Parametric Test: Mann-Whitney U Test

The Mann-Whitney U Test (also known as the Wilcoxon rank-sum test) is a non-parametric statistical test for comparing two independent groups when the dependent variable is ordinal, or continuous but not normally distributed. It assesses whether values in one group tend to be larger than values in the other. Because it works on ranks rather than raw values, it does not assume a particular distributional shape (such as normality) for the underlying populations.

#### Mann-Whitney U Test Formula and Approach

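The U statistic for the first group is calculated as follows:

$$U_1 = R_1 - \frac{n_1(n_1 + 1)}{2}$$

where:

- $U_1$ is the Mann-Whitney U statistic for the first group.
- $R_1$ is the sum of the ranks of the first group's observations in the pooled ranking.
- $n_1$ is the number of observations in the first group.

The statistic for the second group follows from $U_2 = n_1 n_2 - U_1$, where $n_2$ is the number of observations in the second group; printed critical-value tables are usually indexed by $U = \min(U_1, U_2)$. The test therefore compares the ranks of the observations in the two groups. The null hypothesis ($H_0$) states that the two groups come from the same distribution, so a randomly chosen observation from one group is equally likely to be larger or smaller than one from the other.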
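As a quick check of the formula, the sketch below ranks a tiny pooled sample with `scipy.stats.rankdata`, computes the U statistic for the first group by hand, and compares it with SciPy's `mannwhitneyu`. The two samples are hypothetical values chosen for illustration, and the comparison assumes SciPy 1.7 or later, where `mannwhitneyu` reports U for its first argument.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

# Hypothetical toy samples (not from the PayPal data)
group1 = np.array([3, 5, 8, 9])
group2 = np.array([1, 2, 4, 6, 7])
n1, n2 = len(group1), len(group2)

# Rank the pooled observations; tied values (none here) would get average ranks
ranks = rankdata(np.concatenate([group1, group2]))
R1 = ranks[:n1].sum()  # group1 holds ranks 3, 5, 8 and 9

# U for the first group: U = R1 - n1(n1 + 1) / 2
U1 = R1 - n1 * (n1 + 1) / 2
U2 = n1 * n2 - U1  # the two U values always sum to n1 * n2

print(f"R1 = {R1}, U1 = {U1}, U2 = {U2}")  # R1 = 25.0, U1 = 15.0, U2 = 5.0

# SciPy's statistic corresponds to its first argument, so it should equal U1
res = mannwhitneyu(group1, group2, alternative='two-sided')
print(f"SciPy U = {res.statistic}")
```

Here $U_1 = 15$ also equals the number of (group1, group2) pairs in which the group1 value is larger, which is the intuition behind the statistic.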

#### Steps

1. **Rank all observations:** Combine the data from both groups and rank them from smallest to largest, regardless of group. Tied values receive the average of the ranks they span.
2. **Calculate the U statistic:** Sum the ranks within each group and apply the formula to obtain each group's U statistic; the two U values always sum to $n_1 n_2$.
3. **Determine significance:** Compare the U statistic with critical values from a Mann-Whitney U table, or obtain a p-value from statistical software (for large samples this usually relies on a normal approximation), to decide whether the difference between groups is statistically significant.
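The three steps can be sketched end to end in Python. This is a minimal illustration, assuming the large-sample normal approximation with a tie correction and no continuity correction; `mann_whitney_sketch` is a hypothetical helper and the scores are randomly generated stand-ins. Its result is compared with SciPy's asymptotic method (the `method` argument requires SciPy 1.7 or later).

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm, rankdata

def mann_whitney_sketch(x, y):
    """Hypothetical helper: two-sided Mann-Whitney U test via the normal approximation."""
    x, y = np.asarray(x), np.asarray(y)
    n1, n2 = len(x), len(y)
    pooled = np.concatenate([x, y])

    # Step 1: rank all observations together; ties share their average rank
    ranks = rankdata(pooled)

    # Step 2: U for each group; U1 + U2 == n1 * n2
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Step 3: under H0, U1 is approximately normal with this mean and variance
    mu = n1 * n2 / 2
    n = n1 + n2
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = (counts**3 - counts).sum()  # ties reduce the variance of U
    sigma = np.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    p_value = 2 * norm.sf(abs(z))  # two-sided p-value
    return u1, u2, p_value

# Hypothetical 1-10 scores, so many ties are present
rng = np.random.default_rng(0)
a = rng.integers(1, 11, 200)
b = rng.integers(1, 11, 220)

u1, u2, p = mann_whitney_sketch(a, b)
ref = mannwhitneyu(a, b, alternative='two-sided',
                   method='asymptotic', use_continuity=False)
print(f"Sketch: U1 = {u1}, U2 = {u2}, p = {p:.4f}")
print(f"SciPy:  U  = {ref.statistic}, p = {ref.pvalue:.4f}")
```

Real software usually also applies a continuity correction (SciPy does by default), which changes the p-value slightly for small samples.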

#### Assumptions

- **Independence:** Observations between the two groups are independent.
- **Ordinal or Continuous Data:** The dependent variable should be ordinal or continuous.
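- **Shape of Distribution:** No particular distributional form (such as normality) is assumed, which makes the test suitable for skewed or otherwise non-normal data. If the two distributions have a similar shape, a significant result can be read as a shift in location (for example, in the median).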
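Because the test uses only the ordering of the values, any strictly increasing transformation of the data leaves U, and therefore the p-value, unchanged. The sketch below illustrates this with hypothetical right-skewed (lognormal) samples and a log transform; the sample sizes and distribution parameters are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
# Hypothetical right-skewed samples, clearly not normally distributed
x = rng.lognormal(mean=0.0, sigma=1.0, size=300)
y = rng.lognormal(mean=0.2, sigma=1.0, size=300)

raw = mannwhitneyu(x, y, alternative='two-sided')
# log is strictly increasing, so it preserves every rank and hence U and p
logged = mannwhitneyu(np.log(x), np.log(y), alternative='two-sided')

print(f"raw:    U = {raw.statistic}, p = {raw.pvalue:.4f}")
print(f"logged: U = {logged.statistic}, p = {logged.pvalue:.4f}")
```

A t-test on the same data would generally give different answers before and after the transformation, since it depends on the raw values.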

#### Scenario Description: PayPal Transaction Satisfaction by Region

PayPal aims to explore transaction satisfaction across two regions, North America and Europe, using a dataset of 10,000 transactions. This analysis helps PayPal understand regional differences in user satisfaction, which can inform improvements to user experience and operational efficiency.

#### Business Problem

The goal is to statistically determine if there's a significant difference in transaction satisfaction between users in North America and Europe, based on user feedback scores.

#### Python Code to Simulate Data and to Perform Non-Parametric Test: Mann-Whitney U Test


```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Simulate the dataset: integer feedback scores from 1 to 10, generated
# independently of region (so there is no true regional difference)
np.random.seed(42)
data = {
    'transaction_id': range(10000),
    'region': np.random.choice(['North America', 'Europe'], 10000, p=[0.5, 0.5]),
    'user_feedback_score': np.concatenate([np.random.randint(1, 11, 5000), np.random.randint(1, 11, 5000)])
}

df = pd.DataFrame(data)

# Extract the feedback scores for each region
scores_na = df.loc[df['region'] == 'North America', 'user_feedback_score']
scores_eu = df.loc[df['region'] == 'Europe', 'user_feedback_score']

# Apply the two-sided Mann-Whitney U test (U is reported for the first sample, North America)
stat, p_value = mannwhitneyu(scores_na, scores_eu, alternative='two-sided')

# Interpretation at the 5% significance level
alpha = 0.05
if p_value < alpha:
    conclusion = 'significant difference'
else:
    conclusion = 'no significant difference'

# Report the computed conclusion rather than a hard-coded string
print(f"{stat:.4f}, {p_value:.4f}, Conclusion: {conclusion}")
```

```
12544699.5000, 0.7404, Conclusion: no significant difference
```


#### Interpretation of Results

The Mann-Whitney U Test applied to the PayPal transaction satisfaction data between North America and Europe yielded the following results:

- **U Statistic:** 12544699.5  
- **P-Value:** 0.7404  
- **Conclusion:** No significant difference  

The p-value of 0.7404 is far above the conventional significance level of 0.05, so we fail to reject the null hypothesis: the data provide no evidence of a difference in transaction satisfaction between users in North America and Europe. A non-significant result does not prove that the two distributions are identical; it means that the variation observed in feedback scores is consistent with chance.

For PayPal, this finding implies that, on this data, user satisfaction with transactions appears consistent across North America and Europe: any observed differences in feedback scores are more plausibly explained by random variation than by systemic differences in service delivery or user experience between these regions. Because the simulated scores were generated independently of region, this outcome is expected here; the same procedure applies unchanged to real feedback data.
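The raw U value (about 12.5 million) is hard to read on its own. One common way to put it on an interpretable scale is to divide it by $n_1 n_2$, which gives the probability that a randomly chosen North American score exceeds a randomly chosen European one, counting ties as one half; a value of 0.5 means neither region tends to score higher. The sketch below re-creates the simulated data with the same seed and random calls as the example code (assuming NumPy's legacy `np.random.seed` stream) and computes this ratio.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Re-create the simulated data (same seed and same order of random calls)
np.random.seed(42)
df = pd.DataFrame({
    'region': np.random.choice(['North America', 'Europe'], 10000, p=[0.5, 0.5]),
    'user_feedback_score': np.concatenate([np.random.randint(1, 11, 5000),
                                           np.random.randint(1, 11, 5000)]),
})

scores_na = df.loc[df['region'] == 'North America', 'user_feedback_score']
scores_eu = df.loc[df['region'] == 'Europe', 'user_feedback_score']
n1, n2 = len(scores_na), len(scores_eu)

res = mannwhitneyu(scores_na, scores_eu, alternative='two-sided')

# U / (n1 * n2) = P(NA > EU) + 0.5 * P(NA == EU); 0.5 means no tendency either way
effect = res.statistic / (n1 * n2)
print(f"n1 = {n1}, n2 = {n2}, U / (n1 * n2) = {effect:.4f}")
```

A ratio close to 0.5 complements the large p-value: it shows that the regions are not only statistically indistinguishable here but also practically similar in how their scores compare.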

Strategic Implications for PayPal:
- **Targeted Interventions Not Required by Region:** Since transaction satisfaction does not differ significantly between the two regions, PayPal may not need to prioritize or customize service improvements specifically for North America or Europe, at least on the basis of the current analysis.

- **Focus on Universal Improvements:** PayPal can consider focusing on service enhancements that benefit all users, regardless of their region, to further increase overall satisfaction.

- **Continuous Monitoring:** Despite the current lack of significant regional differences, PayPal should continue to monitor user feedback and satisfaction scores regularly. This ensures that any emerging trends or shifts in user satisfaction can be addressed promptly.

This analysis highlights the value of applying non-parametric statistical tests like the Mann-Whitney U Test to make informed decisions based on empirical data, guiding strategic actions in a data-driven manner.
