# Post-hoc Analysis following ANOVA

We will analyze the chemical_reactions dataset, which records a Reaction_Time for each observation along with the categorical variable Catalyst, which takes three possible values. Using these three groups, we will perform a one-way ANOVA test to determine whether the mean reaction times of the groups differ significantly.

In [1]:
import pandas as pd
from scipy.stats import f_oneway

In [6]:
chemical_data = pd.read_csv('../data/chemical_reactions.csv')
chemical_data.head(3)

Unnamed: 0,Catalyst,Reaction_Time
0,Palladium,47.483571
1,Palladium,44.308678
2,Palladium,48.238443


In [7]:
pivot_table = chemical_data.pivot_table(
    values='Reaction_Time',
    index='Catalyst',
    aggfunc='mean'
)
pivot_table

Catalyst,Reaction_Time
Nickel,50.782139
Palladium,45.105022
Platinum,39.716107


In [8]:
# Create groups to prepare the data for ANOVA
catalyst_types = ['Nickel', 'Palladium', 'Platinum']
groups = [chemical_data[chemical_data['Catalyst'] == catalyst]['Reaction_Time'] for catalyst in catalyst_types]

# Conduct ANOVA
f_stat, p_val = f_oneway(*groups)
print(p_val)

4.710677600047866e-151


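Assuming an alpha of 0.05, the p-value (about 4.7e-151) is far below the threshold, so we reject the null hypothesis that all three catalysts have the same mean reaction time. ANOVA alone does not tell us which catalysts differ from which.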
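To make the F statistic behind this test less of a black box, here is a minimal sketch that computes it by hand as the ratio of between-group to within-group variance. The function name `one_way_f` and the toy samples are illustrative assumptions, not part of the dataset or of SciPy.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples (hypothetical helper)."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Toy samples, made up purely for illustration
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_f(toy_groups)
print(f_value)
```

A large F value means the group means are spread out much more than the noise within each group would explain, which is what drives the tiny p-value above.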

## Applying Tukey's HSD
The ANOVA suggested significant differences among the three types of catalyst, but it does not identify which pairs differ. This is where Tukey's Honest Significant Difference (HSD) test comes into play. It is a post-hoc test that makes pairwise comparisons between group means after an ANOVA has shown a significant difference, identifying the specific pairs of groups whose means differ significantly.

In [10]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Perform Tukey's HSD test
tukey_results = pairwise_tukeyhsd(
    chemical_data['Reaction_Time'], 
    chemical_data['Catalyst'], 
    alpha=0.05
)

print(tukey_results)

    Multiple Comparison of Means - Tukey HSD, FWER=0.05    
  group1    group2  meandiff p-adj  lower    upper   reject
-----------------------------------------------------------
   Nickel Palladium  -5.6771   0.0  -6.5165  -4.8377   True
   Nickel  Platinum  -11.066   0.0 -11.9054 -10.2267   True
Palladium  Platinum  -5.3889   0.0  -6.2283  -4.5495   True
-----------------------------------------------------------
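The meandiff column can be cross-checked against the group means from the pivot table. The sketch below assumes, as the printed numbers suggest, that meandiff is the mean of group2 minus the mean of group1; the dictionary `group_means` is a hypothetical name that simply copies the means shown earlier.

```python
from itertools import combinations

# Group means copied from the pivot table output above
group_means = {
    'Nickel': 50.782139,
    'Palladium': 45.105022,
    'Platinum': 39.716107,
}

# Pairwise differences in sorted group order, computed as group2 - group1
mean_diffs = {
    (g1, g2): round(group_means[g2] - group_means[g1], 4)
    for g1, g2 in combinations(sorted(group_means), 2)
}

for (g1, g2), diff in mean_diffs.items():
    print(f'{g1:>9} {g2:>9} {diff:9.4f}')
```

Under that assumption, a negative meandiff means group2 has the shorter mean reaction time, so Platinum is fastest and Nickel slowest. None of the confidence intervals in the table contains zero, which is consistent with reject being True for every pair.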


## Applying the Bonferroni correction
After identifying significant differences between catalyst groups with Tukey's HSD, we want to confirm our findings with the Bonferroni correction. The Bonferroni correction is a conservative statistical adjustment used to counteract the problem of multiple comparisons: the more tests we run, the more likely it is that at least one produces a false positive. It reduces that risk by dividing the significance level by the number of comparisons (or, equivalently, multiplying each p-value by that number). Applying the Bonferroni correction helps ensure that the significant differences we observe between catalyst groups are not due to chance.
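To illustrate why a correction is needed, the sketch below estimates the family-wise error rate for three comparisons with and without the Bonferroni adjustment. It assumes the tests are independent, which is a simplification for pairwise comparisons on shared data, and the function name `family_wise_error_rate` is a hypothetical helper.

```python
def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05
n_tests = 3  # three pairwise catalyst comparisons

# Running every test at alpha inflates the overall false-positive risk
uncorrected = family_wise_error_rate(alpha, n_tests)

# Bonferroni: run each test at alpha / n_tests instead
bonferroni_alpha = alpha / n_tests
corrected = family_wise_error_rate(bonferroni_alpha, n_tests)

print(f'FWER without correction:   {uncorrected:.4f}')
print(f'Bonferroni per-test alpha: {bonferroni_alpha:.4f}')
print(f'FWER with Bonferroni:      {corrected:.4f}')
```

Under this independence assumption, three uncorrected tests at 0.05 carry roughly a 14% chance of at least one false positive, while the Bonferroni threshold keeps the overall rate just under 0.05.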

In [13]:
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

p_values = []

catalyst_pairs = [('Palladium', 'Platinum'), ('Palladium', 'Nickel'), ('Platinum', 'Nickel')]

# Conduct t-tests and collect P-values
for pair in catalyst_pairs:
    group1 = chemical_data[chemical_data['Catalyst'] == pair[0]]['Reaction_Time']
    group2 = chemical_data[chemical_data['Catalyst'] == pair[1]]['Reaction_Time']
    t_stat, p_val = ttest_ind(group1, group2)
    p_values.append(p_val)

# Apply Bonferroni correction
# multipletests returns: reject flags, corrected p-values,
# the Sidak-corrected alpha, and the Bonferroni-corrected alpha
reject, pvals_corr, alpha_sidak, alpha_bonf = multipletests(p_values, alpha=0.05, method='bonferroni')

print(f'Reject the H0 {reject}')
print(f'p-values: {pvals_corr}')

Reject the H0 [ True  True  True]
p-values: [3.45766580e-044 2.19849625e-050 1.21027069e-132]
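The p-values printed above are Bonferroni-adjusted rather than raw. As a rough sketch of what the adjustment does, the snippet below multiplies each raw p-value by the number of tests and caps the result at 1. The function `bonferroni_adjust` and the example p-values are hypothetical and chosen only for illustration; they are not taken from the catalyst data.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return (reject flags, adjusted p-values) under a Bonferroni correction."""
    m = len(p_values)                                 # number of comparisons
    adjusted = [min(1.0, p * m) for p in p_values]    # scale up, cap at 1
    reject = [p_adj <= alpha for p_adj in adjusted]
    return reject, adjusted

# Hypothetical raw p-values, for illustration only
raw_p_values = [0.001, 0.02, 0.04]
reject_flags, adjusted_p_values = bonferroni_adjust(raw_p_values)

print(reject_flags)
print(adjusted_p_values)
```

In this made-up example, the second and third p-values would be significant on their own at 0.05 but no longer are after adjustment. In the catalyst data, the adjusted p-values remain vanishingly small, so all three comparisons stay significant.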


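## Conclusion
Tukey's HSD and the Bonferroni-corrected t-tests agree: the null hypothesis of equal means is rejected for every pair of catalysts, so we conclude that mean reaction time differs significantly between all three groups.
Correcting for multiple comparisons is critical for controlling the family-wise Type I error rate, which makes these pairwise findings more reliable.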