In [8]:
from datascience import *
import numpy as np
from math import *
from scipy import stats
import matplotlib.pyplot as plt
%matplotlib inline

## Lesson 23: Hypothesis Testing, Continued

Recall that in Lesson 22 we covered hypothesis testing. The structure of a hypothesis test is largely the same regardless of the context of the problem: we state the hypotheses, decide on a test statistic, calculate the $p$-value, and reach a conclusion. To calculate a $p$-value, we need the distribution of the test statistic under the null hypothesis.

### Example 1: The Lady Tasting Tea

The "lady tasting tea" problem comes from a now-famous story. At a gathering one summer afternoon in Cambridge, some friends drank tea with milk. One of them, a woman, claimed she could tell by taste whether the milk or the tea had been added to the cup first. The statistician Ronald Fisher was at the gathering and set out to test the claim. The woman was offered 8 cups of tea mixed with milk (4 with milk added first and 4 with tea added first) and correctly identified 6 of them (3 of each kind). What can we say about her ability to tell the two preparations apart?

Step 1:  
(Hypothesis)    
null: H0: she is guessing   
alternate: Ha:  she knows which was added first  

Step 2:  
(Test Statistic)  
x = number of cups, out of 8, that the lady correctly identifies  
note: x must be a multiple of 2. Because there are four cups of each kind, once she picks the four cups she thinks are milk-first, the other four are labeled tea-first by default. She is really making four decisions, not eight, and every milk-first cup she misses forces one tea-first cup to be wrong as well.
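
The claim that x only takes even values can be checked by brute force. The sketch below enumerates every way she could label four of the eight cups as milk-first and counts the correct identifications for each labeling. The names `truth` and `correct_counts` are illustrative and not part of the lesson's code.

```python
from itertools import combinations
from collections import Counter

# True preparation of the 8 cups: four milk-first, then four tea-first.
truth = ['MF'] * 4 + ['TF'] * 4

correct_counts = Counter()
# She labels exactly four cups 'MF'; the other four are 'TF' by default.
for mf_labels in combinations(range(8), 4):
    guess = ['MF' if i in mf_labels else 'TF' for i in range(8)]
    n_correct = sum(g == t for g, t in zip(guess, truth))
    correct_counts[n_correct] += 1

print(sorted(correct_counts.items()))
# [(0, 1), (2, 16), (4, 36), (6, 16), (8, 1)]
```

Only 0, 2, 4, 6 and 8 appear, and the 70 possible labelings are equally likely if she is guessing.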

Step 3:  
(p-value)  
We need P(x = 6 or x = 8, given that she is just guessing); this probability is the p-value.  
x = 6 is our observed data, and x = 8 is the only outcome more extreme than our data.  
Add the two probabilities together to get the p-value.  
note: the p-value is the probability, under the null, of getting our result or something more extreme.  
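
Under the null, every choice of four cups to call milk-first is equally likely, so the two probabilities can be counted directly. A minimal sketch using only the standard library (the variable names are illustrative):

```python
from math import comb

total = comb(8, 4)  # ways to choose which 4 cups to call 'MF'

# x = 6: she picks 3 of the 4 true 'MF' cups and 1 of the 4 true 'TF' cups.
p_6 = comb(4, 3) * comb(4, 1) / total
# x = 8: she picks all 4 true 'MF' cups.
p_8 = comb(4, 4) * comb(4, 0) / total

p_value = p_6 + p_8  # equals 17/70, roughly 0.243
print(p_value)
```

This exact count gives a reference value to compare against the simulation.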

In [3]:
# creates an array of four instances where milk was poured first ('MF') and four where tea was poured first ('TF')
my_data = np.repeat(['MF','TF'],[4,4])
my_data

array(['MF', 'MF', 'MF', 'MF', 'TF', 'TF', 'TF', 'TF'], dtype='<U2')

In [4]:
# random choices without replacement
np.random.choice(my_data,8,replace=False)

array(['MF', 'MF', 'TF', 'TF', 'MF', 'TF', 'MF', 'TF'], dtype='<U2')

In [5]:
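np.random.seed(1016)
num_sim = 100000
results = []  # plain list: appending is much cheaper than np.append in a loop
for _ in range(num_sim):
    # shuffle the cups; the first four are the ones she labels 'MF'
    guess = np.random.choice(my_data,8,replace=False)
    # correct out of 8 = 2 * (true 'MF' cups among her four 'MF' labels)
    results.append(2*np.count_nonzero(guess[:4]=='MF'))
results = np.array(results)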

In [6]:
np.count_nonzero(results>=6)/num_sim

0.24381

In [15]:
# exact calculation with the hypergeometric distribution
# random variable: number of true 'MF' cups among the four she labels 'MF'
my_rv=stats.hypergeom(8,4,4) # 8 cups in total, 4 truly 'MF', 4 chosen
1-my_rv.cdf(2) # P(3 or 4 correct), i.e. the probability to the right of 2

0.24285714285714288

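note: this can also be approached as a hypergeometric problem, as in the cell above; the exact answer (about 0.243) agrees closely with the simulation (about 0.244).  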
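
The simulated value (0.24381) and the exact value (about 0.24286) differ slightly, which is what simulation noise would produce. A rough check, assuming the usual binomial standard error for a proportion estimated from `num_sim` independent trials:

```python
from math import sqrt

p_exact = 17 / 70   # exact hypergeometric p-value, about 0.24286
p_sim = 0.24381     # simulated estimate reported above
num_sim = 100000

# Standard error of a simulated proportion based on num_sim independent trials.
std_err = sqrt(p_exact * (1 - p_exact) / num_sim)
# How many standard errors separate the simulation from the exact value.
z = (p_sim - p_exact) / std_err
print(std_err, z)
```

The gap is less than one standard error, so the two approaches are consistent with each other.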

Step 4:
(Conclude)  
Fail to reject the null. With alpha = 0.05 we would reject only if the p-value were below 0.05; here the p-value is about 0.24, so the data do not provide enough evidence that she can tell which was added first.

### Example 2: iris dataset

The `iris` dataset is common in introductory statistics. It records sepal and petal lengths and widths for three species of iris. Let's determine whether the virginica species has a larger mean sepal width than versicolor.

In [10]:
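# read_table is a class method, so no Table instance is needed
iris=Table.read_table("iris.csv")
# column 4 is species; average every other column within each species
iris.group(4,np.mean)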

species    | sepal_length mean | sepal_width mean | petal_length mean | petal_width mean
setosa     | 5.006             | 3.418            | 1.464             | 0.244
versicolor | 5.936             | 2.77             | 4.26              | 1.326
virginica  | 6.588             | 2.974            | 5.552             | 2.026


Step 1:  
(Hypothesis)  
null:  H0: virginica and versicolor have the same mean sepal width  
alternate:  Ha: virginica has a larger mean sepal width than versicolor  

Step 2:  
(Test Statistic)  
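x = mean virginica sepal width - mean versicolor sepal width (the signed difference, not the absolute value, because the alternative is one-sided)  
note: if the null is true x should be close to 0; large positive values of x support the alternative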

Step 3: 
(p-value)  

In [11]:
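# keep species and sepal_width, and drop the setosa rows
iris_sub=iris.select(4,1).where(0,are.not_containing('setosa'))
# group means are sorted by species (versicolor, virginica), so the
# difference is mean(virginica) - mean(versicolor)
obs=np.diff(iris_sub.group(0,np.mean).column(1))[0]
obs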

0.20399999999999974
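
One way to turn an observed difference into a p-value is a permutation test: under the null the species labels are interchangeable, so shuffling them shows how large a difference arises by chance alone. The sketch below is a generic NumPy version and not the lesson's own code. `perm_test_diff_means` is a hypothetical helper, and the two short arrays are made-up numbers used only to show the call, not iris measurements.

```python
import numpy as np

def perm_test_diff_means(a, b, num_sim=10000, seed=0):
    """One-sided permutation p-value for H_a: mean(a) > mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    obs = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(num_sim):
        # Shuffle the pooled values and split them into two pseudo-groups.
        shuffled = rng.permutation(pooled)
        sim = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        # Small tolerance so float round-off does not hide exact ties.
        if sim >= obs - 1e-12:
            count += 1
    return obs, count / num_sim

# Made-up illustration data (NOT the iris values).
group_a = [3.1, 2.9, 3.3, 3.0, 3.2, 2.8]
group_b = [2.7, 2.9, 2.6, 3.0, 2.8, 2.5]
obs_demo, p_demo = perm_test_diff_means(group_a, group_b)
print(obs_demo, p_demo)
```

For the lesson's data, `a` and `b` would be the virginica and versicolor sepal widths. Assuming the columns are named `species` and `sepal_width`, as the grouped table suggests, they could be extracted with something like `iris.where('species', 'virginica').column('sepal_width')`.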

Step 4:  
(Conclude)  
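With alpha = 0.05, the decision rule is to reject the null only if the p-value is below 0.05. These notes record the p-value as high, in which case we do not have sufficient evidence and fail to reject the null. Because the cell in Step 3 computes only the observed difference (about 0.204), the p-value should be calculated before relying on this conclusion.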