<a href="https://colab.research.google.com/github/sivanujands/StatisticalTests/blob/main/RelatedSamples/NonParametricTests/Friedman_Test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#%pip install scikit-posthocs
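
The Friedman test used in this notebook ranks the three brands within each taster and then checks whether the per-brand rank sums differ more than chance would allow. The sketch below recomputes the statistic by hand as a sanity check on `stats.friedmanchisquare`. The array `ratings` repeats the notebook's data, and the names `rank_sums`, `tie_term`, and `correction` are illustrative only. It assumes the usual tie-corrected chi-square form of the statistic.

```python
import numpy as np
from scipy import stats

# Same ratings as the notebook's DataFrame.
# Rows are tasters 1-10; columns are Brand_A, Brand_B, Brand_C.
ratings = np.array([
    [7, 8, 6], [6, 7, 5], [8, 9, 7], [5, 6, 4], [9, 8, 7],
    [7, 7, 5], [6, 8, 7], [8, 7, 6], [7, 9, 8], [6, 7, 6],
])
n, k = ratings.shape  # n tasters (blocks), k brands (treatments)

# Rank the brands within each taster; tied ratings share the average rank.
ranks = np.apply_along_axis(stats.rankdata, 1, ratings)
rank_sums = ranks.sum(axis=0)

# Friedman statistic before the tie correction.
chi2_raw = 12.0 / (n * k * (k + 1)) * np.sum(rank_sums ** 2) - 3 * n * (k + 1)

# Tie correction: sum (t^3 - t) over every group of t tied ratings within a taster.
tie_term = 0
for row in ratings:
    _, counts = np.unique(row, return_counts=True)
    tie_term += np.sum(counts ** 3 - counts)
correction = 1 - tie_term / (n * k * (k ** 2 - 1))

chi2 = chi2_raw / correction
p_value = stats.chi2.sf(chi2, df=k - 1)  # chi-square approximation, k - 1 degrees of freedom

print("Rank sums (A, B, C):", rank_sums)
print(f"Friedman statistic: {chi2:.3f}")  # 11.842, matching the notebook output
print(f"P-value: {p_value:.3f}")          # 0.003
```

Brand_B collects the largest rank sum and Brand_C the smallest, which is the pattern the test flags as significant. With only 10 tasters the chi-square approximation is rough, so the p-value is best read as indicative.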

In [6]:
import pandas as pd
from scipy import stats
# import scikit_posthocs as sp  # Post-hoc tests; disabled because the import was failing
import statsmodels.sandbox.stats.multicomp as smm  # Candidate alternative for post-hoc tests

# 1. Data
data = {
    'Taster': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Brand_A': [7, 6, 8, 5, 9, 7, 6, 8, 7, 6],
    'Brand_B': [8, 7, 9, 6, 8, 7, 8, 7, 9, 7],
    'Brand_C': [6, 5, 7, 4, 7, 5, 7, 6, 8, 6]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print("\n")

# 2. Prepare data for Friedman test
# The `friedmanchisquare` function takes each condition as a separate argument,
# so we pass the Brand_A, Brand_B, and Brand_C columns individually.
statistic, p_value = stats.friedmanchisquare(df['Brand_A'], df['Brand_B'], df['Brand_C'])

print(f"Friedman Chi-squared statistic: {statistic:.3f}")
print(f"P-value: {p_value:.3f}")
print("\n")

# 3. Set the Significance Level
alpha = 0.05

# 4. Make a Decision and Draw a Conclusion
print(f"Significance Level (alpha): {alpha}")

if p_value < alpha:
    print(f"Since p_value ({p_value:.3f}) < alpha ({alpha}), we reject the null hypothesis.")
    print("Conclusion: There is a statistically significant difference in taste preference ratings among the three coffee brands.")
    print("Proceeding with post-hoc tests to identify specific differences.")

    # 5. Post-hoc Analysis (Using statsmodels or other alternative)
    # We need to reshape the data to 'long' format for post-hoc tests.
    df_long = pd.melt(df, id_vars=['Taster'], var_name='Brand', value_name='Rating')

    # Convert 'Taster' and 'Brand' columns to categorical type (still good practice)
    df_long['Taster'] = df_long['Taster'].astype('category')
    df_long['Brand'] = df_long['Brand'].astype('category')

    print("\nDataFrame in long format for post-hoc analysis:")
    print(df_long.head()) # Show first few rows
    print("\nDataFrame dtypes before post-hoc test:")
    print(df_long.dtypes)
    print("\n")

    # Post-hoc test: not implemented yet.
    # The intended follow-up to a significant Friedman test is the Nemenyi test
    # or a similar non-parametric pairwise comparison for related samples.
    # A suitable statsmodels function has not been identified so far.

    # Not suitable here: Tukey HSD is a follow-up to one-way ANOVA and assumes
    # independent groups, whereas these ratings come from the same tasters.
    # mc = smm.MultiComparison(df_long['Rating'], df_long['Brand'])
    # result = mc.tukeyhsd()

    print("Attempting post-hoc test using statsmodels (Nemenyi or equivalent)...")
    # Add the post-hoc code here once a suitable function is identified.
    print("Please refer to statsmodels documentation for the correct post-hoc test after Friedman.")


else:
    print(f"Since p_value ({p_value:.3f}) >= alpha ({alpha}), we fail to reject the null hypothesis.")
    print("Conclusion: There is no statistically significant difference in taste preference ratings among the three coffee brands.")

Original DataFrame:
   Taster  Brand_A  Brand_B  Brand_C
0       1        7        8        6
1       2        6        7        5
2       3        8        9        7
3       4        5        6        4
4       5        9        8        7
5       6        7        7        5
6       7        6        8        7
7       8        8        7        6
8       9        7        9        8
9      10        6        7        6


Friedman Chi-squared statistic: 11.842
P-value: 0.003


Significance Level (alpha): 0.05
Since p_value (0.003) < alpha (0.05), we reject the null hypothesis.
Conclusion: There is a statistically significant difference in taste preference ratings among the three coffee brands.
Proceeding with post-hoc tests to identify specific differences.

DataFrame in long format for post-hoc analysis:
  Taster    Brand  Rating
0      1  Brand_A       7
1      2  Brand_A       6
2      3  Brand_A       8
3      4  Brand_A       5
4      5  Brand_A       9

DataFrame dtypes before