# Choosing the right test

In [2]:
import pandas as pd
from scipy.stats import ttest_ind, f_oneway, chi2_contingency

## Petrochemicals
In a chemistry research lab, scientists are examining the efficiency of three well-known catalysts—Palladium (Pd), Platinum (Pt), and Nickel (Ni)—in facilitating a particular reaction. Each catalyst is used in a set of identical reactions under controlled conditions, and the time taken for each reaction to reach completion is meticulously recorded. Your goal is to *compare the mean reaction times across the three catalyst groups* to identify which catalyst, if any, has a significantly different reaction time.

The key phrase in the text is *"compare the mean reaction times across the three catalyst groups"*, which indicates that a one-way ANOVA is the appropriate statistical test, because it is designed to compare the means of three or more independent groups.

In [3]:
chemical_data = pd.read_csv('../data/chemical_reactions.csv')
chemical_data.head(3)

Unnamed: 0,Catalyst,Reaction_Time
0,Palladium,47.483571
1,Palladium,44.308678
2,Palladium,48.238443


In [4]:
catalyst_types = ['Palladium', 'Platinum', 'Nickel']

# Collect reaction times for each catalyst into a list
groups = [chemical_data[chemical_data['Catalyst'] == catalyst]['Reaction_Time'] for catalyst in catalyst_types]

# Perform the one-way ANOVA across the three groups
f_stat, p_val = f_oneway(*groups)
print(p_val)

4.710677600047866e-151


#### Conclusion:
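Assuming a significance level of 0.01, the p-value is far smaller than alpha, so we reject the null hypothesis that all three catalysts have the same mean reaction time. Note that ANOVA only shows that at least one group mean differs; on its own it does not identify which catalyst is responsible.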
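Because the stated goal is to identify *which* catalyst differs, a post-hoc pairwise comparison is a natural follow-up to a significant ANOVA. The sketch below uses Tukey's HSD via `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). The reaction times are synthetic values generated for illustration, not the lab data, and the group means, spread, and sample sizes are assumptions.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Synthetic reaction times (hypothetical values, NOT the lab data):
# Palladium and Platinum share a mean, Nickel is slower.
rng = np.random.default_rng(0)
samples = {
    'Palladium': rng.normal(45, 5, 100),
    'Platinum': rng.normal(45, 5, 100),
    'Nickel': rng.normal(60, 5, 100),
}
names = list(samples)

# Omnibus test: is at least one group mean different?
f_stat, p_val = f_oneway(*samples.values())
print('ANOVA p-value:', p_val)

# Post-hoc test: which pairs of groups differ?
result = tukey_hsd(*samples.values())
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(names[i], 'vs', names[j], 'p =', result.pvalue[i, j])
```

On the real data, the same call could be made with `tukey_hsd(*groups)`, reusing the `groups` list built for the ANOVA.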

## Human resources
In human resources, it's essential to understand the relationships between variables that might influence employee satisfaction or turnover. Consider a scenario where an HR department wants to understand the association between the department in which employees work and their participation in a new workplace wellness program. The HR team has compiled this data over the past two years and has asked you whether there is a significant association between an employee's department and whether they enrolled in the wellness program.

Because the analysis concerns the association between two categorical variables, a *Chi-square test of association* is appropriate.

In [5]:
wellness_data = pd.read_csv('../data/hr_wellness.csv')
wellness_data.head(3)

Unnamed: 0,Department,Wellness_Program_Status
0,Marketing,Enrolled
1,Sales,Enrolled
2,Marketing,Not Enrolled


In [6]:
# Create a contingency table
contingency_table = pd.crosstab(
  wellness_data['Department'], 
  wellness_data['Wellness_Program_Status']
)

# Perform the chi-square test of association
chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table)
print(p_val)


0.17573344450112738


#### Conclusion:
Assume a significance level of 0.05. The p-value (about 0.18) is larger than 0.05, so we fail to reject the null hypothesis of independence: the data show no statistically significant association between department and enrollment in the wellness program.
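The chi-square test relies on an approximation that is usually considered reliable only when the expected cell counts are not too small (a common rule of thumb is at least 5 per cell). `chi2_contingency` already returns those expected counts, so they are easy to inspect. The sketch below uses a small made-up table; the counts are hypothetical and only the column names mirror the HR data.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data (NOT the HR dataset): 40 Marketing, 60 Sales employees
toy = pd.DataFrame({
    'Department': ['Marketing'] * 40 + ['Sales'] * 60,
    'Wellness_Program_Status': (
        ['Enrolled'] * 25 + ['Not Enrolled'] * 15      # Marketing
        + ['Enrolled'] * 30 + ['Not Enrolled'] * 30    # Sales
    ),
})

table = pd.crosstab(toy['Department'], toy['Wellness_Program_Status'])
chi2_stat, p_val, dof, expected = chi2_contingency(table)

# Label the expected counts so they can be compared with the observed table
expected_df = pd.DataFrame(expected, index=table.index, columns=table.columns)
print(expected_df)
print('All expected counts >= 5:', bool((expected >= 5).all()))
```

For a 2x2 table like this one, `chi2_contingency` applies Yates' continuity correction by default (`correction=True`); the correction affects the statistic and p-value but not the expected counts.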

## Finance
In finance, investment strategists continually evaluate different approaches to maximize returns. Consider a financial firm that wishes to assess the effectiveness of two investment strategies: "Quantitative Analysis" and "Fundamental Analysis". The firm has applied each strategy to a separate set of investment portfolios for a year and now asks you to compare the mean annual returns of the two groups to determine whether the strategies differ.

Because the comparison involves the means of two independent groups, an independent samples t-test is appropriate.

In [10]:
investment_data = pd.read_csv('../data/investment_returns.csv')
print(investment_data.head(3),'\n')
print(investment_data['Strategy_Type'].unique())

  Strategy_Type  Annual_Return
0  Quantitative      10.597379
1  Quantitative       1.656248
2  Quantitative       9.202100 

['Quantitative' 'Fundamental']


In [12]:
# Separate the annual returns by strategy type
quantitative_returns = investment_data[investment_data['Strategy_Type'] == 'Quantitative']['Annual_Return']
fundamental_returns = investment_data[investment_data['Strategy_Type'] == 'Fundamental']['Annual_Return']

print('quantitative mean returns: ', quantitative_returns.mean())
print('fundamental mean returns: ', fundamental_returns.mean(),'\n')

# Perform the independent samples t-test between the two groups
t_stat, p_val = ttest_ind(quantitative_returns, fundamental_returns)
print('p-value: ', p_val)

quantitative mean returns:  8.351169678745862
fundamental mean returns:  5.706740072154287 

p-value:  2.0567003424807146e-14


#### Conclusion:
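Assume a significance level of 0.1. The p-value is far smaller than alpha, so the difference in mean annual returns between the two strategies is statistically significant. In this sample, the quantitative strategy has the higher mean return (about 8.35 versus 5.71).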
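By default, `ttest_ind` performs the pooled-variance (Student) t-test, which assumes both groups have the same variance. When that assumption is doubtful, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below compares the two on synthetic returns; the means, standard deviations, and sample sizes are assumptions chosen to make the groups visibly unequal, not properties of the firm's data.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic annual returns (hypothetical values, NOT the firm's data):
# the two groups differ in both spread and sample size.
rng = np.random.default_rng(1)
quant = rng.normal(8.0, 6.0, 50)    # small, high-variance group
fund = rng.normal(6.0, 2.0, 200)    # large, low-variance group

# Student's t-test: assumes equal variances (the scipy default)
t_pooled, p_pooled = ttest_ind(quant, fund)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = ttest_ind(quant, fund, equal_var=False)

print('pooled: t =', t_pooled, 'p =', p_pooled)
print('Welch:  t =', t_welch, 'p =', p_welch)
```

On the real data, the equivalent call would be `ttest_ind(quantitative_returns, fundamental_returns, equal_var=False)`. If the two p-values lead to the same decision, the conclusion does not hinge on the equal-variance assumption.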