# Two-Sample T-Tests

Use to test for an association between a quantitative variable and a binary categorical variable.

In [2]:
from scipy.stats import ttest_ind
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('08_a_b_test_time_spent_on_website.csv')

# Quantitative variable (time spent on website) and a binary categorical variable (old color scheme or new color scheme)

old = data.time_minutes[data.version=='old']
new = data.time_minutes[data.version=='new']

# By default, ttest_ind() runs a two-sided test.

tstat, pval = ttest_ind(old, new)
pval


0.0020408264429903995

Assuming a significance threshold of 0.05, the p-value of about 0.002 indicates a significant difference in the average time spent on the website between the old and new color schemes.

# ANOVA and Tukey Tests

Use to test for an association between a quantitative variable and a non-binary (i.e. more than 2 categories) categorical variable. We could run multiple two-sample t-tests (A vs B, B vs C, A vs C, etc.), but every extra comparison raises the chance of a false positive, so it is easier and safer to use ANOVA (Analysis of Variance).

Use `from scipy.stats import f_oneway` to perform an ANOVA.

ANOVA returns a p-value. If it is below our significance threshold, we conclude that at least one pair of groups differs significantly, but ANOVA doesn't tell us which pair. We use Tukey's range test to find out which pairs differ.
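As a minimal sketch of the ANOVA step, the example below passes one sequence of values per group to `f_oneway()`. The three groups and their values are made up for illustration and are not from the dataset used earlier.

```python
from scipy.stats import f_oneway

# Hypothetical data: minutes spent on three versions of a website
# (values invented for illustration only)
version_a = [30.1, 28.4, 31.9, 29.7, 30.5, 27.8]
version_b = [29.6, 30.2, 28.9, 31.0, 29.1, 30.4]
version_c = [35.2, 36.8, 34.1, 37.5, 35.9, 36.3]

# f_oneway() takes one array-like per group and returns the F statistic and p-value
fstat, pval = f_oneway(version_a, version_b, version_c)

# Compare against the significance threshold of 0.05
significant = pval < 0.05
print(pval, significant)
```

Here `significant` only says that some pair of groups differs; it does not identify the pair.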

Use `from statsmodels.stats.multicomp import pairwise_tukeyhsd` to run Tukey's range test.
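A rough sketch of the Tukey step follows. Unlike `f_oneway()`, `pairwise_tukeyhsd()` takes all values in one sequence plus a parallel sequence of group labels. The DataFrame, its column names, and its values are hypothetical.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data in "long" format: one row per observation
# (values and column names invented for illustration only)
data = pd.DataFrame({
    'time_minutes': [30.1, 28.4, 31.9, 29.7, 30.5, 27.8,
                     29.6, 30.2, 28.9, 31.0, 29.1, 30.4,
                     35.2, 36.8, 34.1, 37.5, 35.9, 36.3],
    'version': ['a'] * 6 + ['b'] * 6 + ['c'] * 6,
})

# Arguments: the values, the group labels, and the significance threshold
tukey_results = pairwise_tukeyhsd(data.time_minutes, data.version, 0.05)

# The printed table has one row per pair of groups;
# the "reject" column is True where that pair differs significantly
print(tukey_results)
```

The same information is available programmatically as `tukey_results.reject`, with one boolean per pair.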

# Chi-Square Tests

Use to test for an association between two categorical variables.

If we want to understand whether the outcomes of two categorical variables are associated, we can use a Chi-Square test. It is useful in situations like:

* An A/B test where half of users were shown a green submit button and the other half were shown a purple submit button. Was one group more likely to click the submit button?
* People under and over age 40 were given a survey asking “Which of the following three products is your favorite?” Did these age groups have significantly different preferences?

In SciPy, we can use the function `chi2_contingency()` to perform a Chi-Square test.

`chi2_contingency()` expects a contingency table (cross tab) of counts. Build one with `table = pd.crosstab(variable_1, variable_2)`.
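The two steps above can be sketched as follows, using the submit-button scenario. The click counts and the column names `button_color` and `clicked` are made up for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical A/B test: 100 users per button color
# (counts and column names invented for illustration only)
data = pd.DataFrame({
    'button_color': ['green'] * 100 + ['purple'] * 100,
    'clicked': ['yes'] * 30 + ['no'] * 70 + ['yes'] * 50 + ['no'] * 50,
})

# Count how often each combination of the two categorical variables occurs
table = pd.crosstab(data.button_color, data.clicked)
print(table)

# Returns the test statistic, the p-value, the degrees of freedom,
# and the counts expected if the variables were unrelated.
# For a 2x2 table, SciPy applies Yates' continuity correction by default.
chi2, pval, dof, expected = chi2_contingency(table)
print(pval)
```

A p-value below the significance threshold suggests that clicking is associated with button color.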

# Review

<img src="testing_for_an_association-review.png" height="700" />