- Two-Sample T-Tests (for an association between a quantitative variable and a binary categorical variable)
- ANOVA and Tukey Tests (for an association between a quantitative variable and a non-binary categorical variable)
- Chi-Square Tests (for an association between two categorical variables)

Two-Sample T-Test
Suppose that a company is considering a new color scheme for its website. The company thinks that visitors will spend more time on the site if it is brightly colored. To test this theory, it shows the old version of the website to 50 site visitors and the new version to 50 others, and finds that, on average, visitors spent 2 minutes longer on the new version than on the old one. Will this be true of future visitors as well? Or could this difference have arisen by random chance among the 100 people in this sample?

One way of testing this is with a 2-sample t-test. The null hypothesis for this test is that the average length of a visit does not differ based on the color of the website. In other words, if we could observe all site visitors in two alternate universes (one where every visitor sees the old version and one where every visitor sees the new version), the average visiting times in these two universes would be equal.
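Stated formally, with the two population means standing for the average visit time across all visitors under each version, the null hypothesis and the two-sided alternative are:

```latex
H_0:\ \mu_{\text{new}} = \mu_{\text{old}}
\qquad \text{versus} \qquad
H_1:\ \mu_{\text{new}} \neq \mu_{\text{old}}
```

The two-sided alternative is what SciPy's `ttest_ind()` uses by default. Because the company's theory is directional, the one-sided alternative (new mean greater than old mean) is also a reasonable choice; SciPy 1.6 and later support it through `alternative='greater'` when the new-version times are passed as the first argument.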

We can use SciPy's `ttest_ind()` function to perform a 2-sample t-test. It takes the values for each group as inputs and returns the t-statistic and the corresponding p-value.
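By default, `ttest_ind()` assumes the two groups share a common variance (its `equal_var` parameter defaults to `True`) and computes a pooled-variance t-statistic. The sketch below recomputes that statistic by hand and checks it against SciPy. The visit times are made up for illustration and are not the company's data; the helper name `pooled_t_statistic` is hypothetical. Passing `equal_var=False` would instead run Welch's t-test, which drops the equal-variance assumption.

```python
import numpy as np
from scipy.stats import ttest_ind


def pooled_t_statistic(a, b):
    """Hypothetical helper: Student's two-sample t-statistic with pooled variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Sample variances with ddof=1 (divide by n - 1)
    v1, v2 = a.var(ddof=1), b.var(ddof=1)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (a.mean() - b.mean()) / se


# Made-up visit times in minutes (illustrative only)
times_a = [22.1, 25.4, 27.8, 24.0, 29.3, 26.5]
times_b = [21.0, 23.2, 22.7, 24.9, 20.4, 23.8]

t_manual = pooled_t_statistic(times_a, times_b)
t_scipy, p_scipy = ttest_ind(times_a, times_b)  # equal_var=True by default
print(t_manual, t_scipy, p_scipy)
```

The two t-statistics should agree up to floating-point rounding, which confirms what the default call computes.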

`from scipy.stats import ttest_ind`

`tstat, pval = ttest_ind(times_version1, times_version2)`
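One way to see what the returned p-value measures is to simulate a world where the null hypothesis is true. The sketch below draws both groups of 50 from the same normal distribution (the mean of 25 minutes and standard deviation of 5 minutes are arbitrary assumptions), runs `ttest_ind()` many times, and counts how often the p-value falls below 0.05. Because the null hypothesis holds by construction, this should happen in roughly 5% of the simulated experiments. That is exactly the kind of "random chance" difference the test is meant to guard against.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility
n_sims, n_per_group, alpha = 2000, 50, 0.05

false_positives = 0
for _ in range(n_sims):
    # Both "versions" come from the same distribution, so H0 is true
    group_old = rng.normal(loc=25, scale=5, size=n_per_group)
    group_new = rng.normal(loc=25, scale=5, size=n_per_group)
    _, pval = ttest_ind(group_new, group_old)
    if pval < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_sims
print(f"Share of p-values below {alpha}: {false_positive_rate:.3f}")
```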

The company randomly sampled 100 site visitors. They showed the old version of their website to half of their sample and the new version to the other half. The amount of time (in minutes) that each visitor spent on the website was recorded.
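Random assignment is what lets us attribute a difference in average time to the website version rather than to who happened to see it. A minimal sketch of such an assignment, assuming the 100 sampled visitors are labeled simply by the integers 0 to 99 (a hypothetical labeling), is to shuffle the labels and show the first half the old version:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the split is reproducible
visitor_ids = np.arange(100)          # hypothetical IDs for the 100 sampled visitors

shuffled = rng.permutation(visitor_ids)
old_group = np.sort(shuffled[:50])  # these visitors see the old version
new_group = np.sort(shuffled[50:])  # these visitors see the new version

print(len(old_group), len(new_group))
```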



In [1]:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

In [2]:
data = pd.read_csv('../Datasets/version_time.csv')
data.head()

Unnamed: 0  time_minutes  version
0           11.92         new
1           12.90         old
2           13.76         old
3           15.68         old
4           16.28         old


In [3]:
new = data[data['version'] == 'new']['time_minutes']
old = data[data['version'] == 'old']['time_minutes']
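Splitting by `version` and then testing can be wrapped in a small helper. The sketch below assumes a DataFrame with the same `version` and `time_minutes` columns as `version_time.csv`. The function name `compare_versions`, the 0.05 significance level, and the tiny demo DataFrame are hypothetical choices for illustration, not the company's data or results.

```python
import pandas as pd
from scipy.stats import ttest_ind


def compare_versions(df, alpha=0.05):
    """Hypothetical helper: 2-sample t-test of time_minutes, new vs. old."""
    new = df.loc[df['version'] == 'new', 'time_minutes']
    old = df.loc[df['version'] == 'old', 'time_minutes']
    tstat, pval = ttest_ind(new, old)
    return {
        'mean_difference': new.mean() - old.mean(),  # minutes, new minus old
        'tstat': tstat,
        'pval': pval,
        'reject_null': pval < alpha,  # True if significant at level alpha
    }


# Tiny made-up example (not the company's data)
demo = pd.DataFrame({
    'time_minutes': [30.1, 28.4, 31.7, 29.9, 22.3, 21.8, 23.5, 22.9],
    'version': ['new'] * 4 + ['old'] * 4,
})
result = compare_versions(demo)
print(result)
```

Returning the mean difference alongside the p-value keeps the size of the effect (for example, "2 minutes longer") next to the evidence that it is not just chance.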

In [4]:
new

0     11.92
6     16.93
7     17.20
13    19.52
14    19.65
19    20.91
22    21.18
23    21.38
25    21.64
26    21.70
31    22.06
32    22.22
34    22.31
37    23.13
40    23.49
41    23.54
43    23.63
46    23.92
52    25.00
56    25.51
57    25.73
58    25.95
60    26.38
62    27.23
63    27.25
64    27.26
65    27.69
66    27.75
69    28.39
70    28.69
72    28.92
75    29.57
76    29.72
77    29.88
78    29.90
81    30.73
82    30.76
83    31.00
84    31.57
85    31.74
86    32.04
87    32.10
88    32.34
90    33.17
91    33.46
94    33.94
96    35.35
97    35.37
98    36.19
99    37.29
Name: time_minutes, dtype: float64