### Problem 1
To test whether a coin is a fair coin, we decide to flip this coin 50 times and count the numbers of Heads (H) and Tails (T) we observe. Let X be the number of Hs observed from 50 tosses and p be the actual probability of getting H from flipping this coin. If this is a fair coin, then p = .5. Suppose the significance level $\alpha$ is .05. Here are the hypotheses:

$H_0: p = .5$ (This is a fair coin)

$H_a: p \neq .5$ (This is a biased coin)


In [2]:
# 1-a: Compute the p-value if we have observed 40 Ts and draw your conclusion whether this is a fair coin. (2 points) 

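from scipy.stats import binom

# 40 tails means X = 10 heads. Under H0 the distribution of X is symmetric about 25,
# so P(X <= 10) = P(X >= 40) and the two-sided p-value is 2 * P(X >= 40).
p_val = 2*(1-binom.cdf(39, 50, 0.5))
print(p_val,'\n')

if p_val < 0.05:
    print('Reject H0 and conclude that this is a biased coin because the p-value is less than significance level.')
else:
    print('Fail to reject H0: there is not enough evidence that the coin is biased because the p-value is >= significance level.')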


2.38613316768e-05 

Reject H0 and conclude that this is a biased coin because the p-value is less than significance level.
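
As a cross-check on the p-value above, the same tail probabilities can be summed directly from the binomial pmf using only the standard library. This is a minimal sketch; the helper name `binom_pmf_fair` is hypothetical and not part of the original solution.

```python
from math import comb

def binom_pmf_fair(k, n):
    """P(X = k) for X ~ Binomial(n, 0.5); hypothetical helper for illustration."""
    return comb(n, k) / 2**n

n = 50
heads = 10  # 40 tails observed means 10 heads

# Two-sided p-value: outcomes at least as far from n/2 as the one observed.
lower_tail = sum(binom_pmf_fair(k, n) for k in range(0, heads + 1))
upper_tail = sum(binom_pmf_fair(k, n) for k in range(n - heads, n + 1))
p_value_exact = lower_tail + upper_tail
print(p_value_exact)
```

Because the null distribution is symmetric, the two tails are equal, which is why doubling one tail gives the same answer.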


In [3]:
# 1-b: Compute the probability of Type II error given p = 0.55 for n = 50, 75, and 100, respectively. (3 points)
# What impact does n have on probability of Type II error?

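# Let c1, c2 and c3 be the critical values for n = 50, 75 and 100, respectively.
# Rejection region: at least c heads or at least c tails, i.e. X >= c or X <= n - c.
# X is discrete, so the tail areas cannot hit 0.025 exactly; c is the smallest value with
# prob(X >= c | p = 0.5) <= 0.025, i.e. prob(X <= c-1 | p = 0.5) >= 0.975.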
c1 = 1 + binom.ppf(1-0.05/2, 50, 0.5)
c2 = 1 + binom.ppf(1-0.05/2, 75, 0.5)
c3 = 1 + binom.ppf(1-0.05/2, 100, 0.5)

# beta = prob(accept H0 given H0 is false) = prob(conclude fair coin given p is NOT 0.5)
# beta = prob(n-c+1 <= X <= c-1  | p = 0.55) 
beta1 = binom.cdf(c1-1, 50, 0.55) - binom.cdf(50-c1, 50, 0.55)
beta2 = binom.cdf(c2-1, 75, 0.55) - binom.cdf(75-c2, 75, 0.55)
beta3 = binom.cdf(c3-1, 100, 0.55) - binom.cdf(100-c3, 100, 0.55)

print(beta1, beta2, beta3)

print('Probability of making Type II error decreases as sample size n increases.')

0.921234502171 0.887416199985 0.864807727046
Probability of making Type II error decreases as sample size n increases.
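
The critical-value and beta calculation above can be sketched without scipy to see how the attained significance level and the Type II error behave for other sample sizes. The function names `binom_cdf`, `critical_value`, and `type_ii_error` are hypothetical helpers, and n = 400 is an extra sample size chosen only for illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed directly from the pmf."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, min(k, n) + 1))

def critical_value(n, alpha=0.05):
    """Smallest c with P(X >= c | p = 0.5) <= alpha/2."""
    c = 0
    while 1 - binom_cdf(c - 1, n, 0.5) > alpha / 2:
        c += 1
    return c

def type_ii_error(n, p_true, alpha=0.05):
    """beta = P(n-c+1 <= X <= c-1 | p = p_true)."""
    c = critical_value(n, alpha)
    return binom_cdf(c - 1, n, p_true) - binom_cdf(n - c, n, p_true)

for n in (50, 75, 100, 400):
    c = critical_value(n)
    size = 2 * (1 - binom_cdf(c - 1, n, 0.5))  # attained significance level
    print(n, c, round(size, 4), round(type_ii_error(n, 0.55), 4))
```

Since X is discrete, the attained significance level printed in the third column is at most 0.05 rather than exactly 0.05.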


### Problem 2 (9-31, P413)
The Coca-Cola Company reported that the mean per capita annual sales of its beverages in the United States was 423 eight-ounce servings. Suppose you are curious whether the consumption of Coca-Cola beverages is higher in Atlanta, Georgia, the location of Coca-Cola’s corporate headquarters. A sample of 36 individuals from the Atlanta area showed a sample mean annual consumption of 460.4 eight-ounce servings with a standard deviation of s = 101.9 eight-ounce servings. Using $\alpha$ = .05, do the sample results support the conclusion that mean annual consumption of Coca-Cola beverage products is higher in Atlanta?


##### 2-a: In this Markdown cell, please clearly specify your hypotheses. (1 point)

$H_0: \mu \le 423$ 

$H_a: \mu > 423$ 

In [4]:
# 2-b: Compute the corresponding t value or z value. (1 point)

xbar=460.4; mu_0=423; n=36; s=101.9; alpha=0.05

t_val = (xbar - mu_0)/s*pow(n, 0.5)
print(t_val)

2.202158979391559
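
Written out with the sample values, the statistic computed above is:

$$
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{460.4 - 423}{101.9/\sqrt{36}} = \frac{37.4}{16.983} \approx 2.202
$$

The reference $t$ distribution has $n - 1 = 35$ degrees of freedom.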


In [5]:
# 2-c: Compute the corresponding p-value based on the t or z value from 2-b. Draw your conclusion to the test. (2 points)

from scipy.stats import t
p_val = 1 - t.cdf(t_val, n-1)

print(p_val)

if p_val < alpha:
    print('Reject H0 because the p-value is less than significance level.')
else:
    print('Fail to reject H0 because the p-value is >= significance level.')


0.0171673742785
Reject H0 because the p-value is less than significance level.


In [6]:
# 2-d: Compute the critical t value or z value based on the significance level of 5%. Draw your conclusion to the test. (1 point)

t_crit = t.ppf(1-alpha, n-1)
print(t_crit)

if t_val > t_crit:
    print('Reject H0 because the t-value is greater than critical t value.')
else:
    print('Fail to reject H0 because the t-value is <= critical t value.')


1.68957245396
Reject H0 because the t-value is greater than critical t value.


### Problem 3 (9-42, P418)
According to the University of Nevada Center for Logistics Management, 6% of all merchandise sold in the United States gets returned. A Houston department store sampled 80 items sold in January and found that 12 of the items were returned. 

In [7]:
# 3-a: Construct a point estimate of the proportion of items returned for the population of sales transactions at the Houston store.
# 1 point.
n=80; p_0 = 0.06
p_bar = 12/n
print(p_bar)

0.15


In [8]:
# 3-b: Construct a 95% confidence interval for the proportion of returns at the Houston store. (2 points)
CL = 0.95
from scipy.stats import norm
z_crit = norm.ppf(0.5+CL/2) # This is the upper critical z value.
se =pow(p_bar*(1-p_bar)/n, 0.5)
print(z_crit, se)
print('95% confidence interval is [{0}, {1}]'.format(p_bar-z_crit*se, p_bar+z_crit*se))

1.95996398454 0.03992179855667828
95% confidence interval is [0.07175471263084746, 0.22824528736915253]


In [17]:
# 3-c: Is the proportion of returns at the Houston store significantly different from the returns for the nation as a whole? 
# Provide statistical support for your answer. (2 points)

# H0: p = 0.06; Ha: p != 0.06
se_ht = pow(p_0*(1-p_0)/n, 0.5)
z_val = (p_bar - p_0)/se_ht
p_val = 2*(1-norm.cdf(abs(z_val)))

print(z_val, p_val)

if abs(z_val) > abs(z_crit):
    print('Reject H0 because the absolute value of z-value is greater than the absolute value of critical z value.')
else:
    print('Fail to reject H0 because the absolute value of z-value is <= the absolute value of critical z value.')


3.3895960971961925 0.000699956709323
Reject H0 because the absolute value of z-value is greater than the absolute value of critical z value.
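
The z test above relies on the normal approximation to the binomial. Here $n p_0 = 80 \times 0.06 = 4.8$, slightly below the common rule-of-thumb threshold of 5, so an exact binomial tail probability is a reasonable cross-check. The sketch below uses only the standard library. Doubling the upper tail is one common convention for a two-sided exact p-value and is an assumption here, not the method used in the original solution; `binom_pmf` is a hypothetical helper.

```python
from math import comb

n, p_0, returned = 80, 0.06, 12

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Upper tail: probability of 12 or more returns if the true rate were 6%.
upper_tail = sum(binom_pmf(k, n, p_0) for k in range(returned, n + 1))

# Two-sided p-value by doubling the observed tail (capped at 1).
p_val_exact = min(1.0, 2 * upper_tail)
print(upper_tail, p_val_exact)
```

If the exact p-value is also below 0.05, the conclusion of the z test does not hinge on the normal approximation.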


### Problem 4 (9-49, P424)
A consumer research group is interested in testing an automobile manufacturer’s claim that a new economy model will travel at least 25 miles per gallon of gasoline ($H_0: \mu \ge 25$).

##### 4-a: With a .02 level of significance and a sample of 30 cars, what is the rejection rule based on the value of x-bar for the test to determine whether the manufacturer’s claim should be rejected? Assume that $\sigma$ is 3 miles per gallon. (2 points)

In [10]:
from scipy.stats import norm
# Determine the critical value of the test statistic for alpha = 0.02, n = 30, and sigma = 3.

crit_z= norm.ppf(0.02)

print(crit_z)

print("This is a left-tailed test. We reject the null hypothesis if the observed z score is lower than the critical z score.\n")

crit_xbar = 25+crit_z*3/(30**0.5)
print(crit_xbar)
print("This is a left-tailed test. We reject the null hypothesis if the observed xbar is lower than the critical xbar.\n")


-2.05374891063
This is a left-tailed test. We reject the null hypothesis if the observed z score is lower than the critical z score.

23.8751153942
This is a left-tailed test. We reject the null hypothesis if the observed xbar is lower than the critical xbar.



In [11]:
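# 4-b: What is the probability of committing a Type II error if the actual mileage is 23 miles per gallon? (1 point)

# z score of the critical xbar when the true mean is 23
print((crit_xbar-23)/3*(30**0.5))

# beta = prob(fail to reject H0 | mu = 23) = prob(xbar >= crit_xbar | mu = 23)
1-norm.cdf(crit_xbar, 23, 3/(30**0.5))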


1.59773480607


0.055051004553206773

In [12]:
# 4-c: What is the probability of committing a Type II error if the actual mileage is 24 miles per gallon? (1 point)

# beta = prob(fail to reject H0 | mu = 24) = prob(xbar >= crit_xbar | mu = 24)
1-norm.cdf(crit_xbar, 24, 3/(30**0.5))


0.59017962100490717

In [13]:
# 4-d: What is the probability of committing a Type II error if the actual mileage is 25.5 miles per gallon? (1 point)

# Cannot compute beta because when actual mileage is 25.5, the null hypothesis is true. 
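
The three parts above can be folded into one helper that returns beta for any true mean below 25 and refuses means for which the null hypothesis holds. This is a sketch using only the standard library's `statistics.NormalDist`; the function name `type_ii_error` and the extra means 23.5 and 24.5 are illustrative assumptions.

```python
from statistics import NormalDist

mu_0, sigma, n, alpha = 25, 3, 30, 0.02
se = sigma / n**0.5
std_normal = NormalDist()

# Reject H0 when xbar falls below this value (left-tailed test).
crit_xbar = mu_0 + std_normal.inv_cdf(alpha) * se

def type_ii_error(mu_true):
    """P(xbar >= crit_xbar | mu = mu_true); only defined when mu_true < mu_0."""
    if mu_true >= mu_0:
        raise ValueError("H0 is true for this mean, so a Type II error cannot occur")
    return 1 - NormalDist(mu_true, se).cdf(crit_xbar)

for mu_true in (23, 23.5, 24, 24.5):
    beta = type_ii_error(mu_true)
    print(mu_true, round(beta, 4), round(1 - beta, 4))  # true mean, beta, power
```

Beta grows as the true mean approaches 25, because a small departure from the hypothesized value is harder to detect with the same sample size.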

### Problem 5
Create a function named “compute_power_in_two_tailed_test”. (5 points)
The function takes the following inputs: 
1. mu_b: a real number for population mean based on which we compute beta value and power; 
    * Recall that Type II error is accepting H0 when it is false.
2. mu_h: a real number for hypothesized population mean in the original hypotheses; 
3. n: a positive integer for sample size; 
4. alpha: a real number between 0 and 1 for significance level
5. sd: a positive real number for population or sample standard deviation; 
6. pop: a boolean indicating whether sd is population standard deviation or sample sd. The default is True.

The function returns a real number between 0 and 1 for the power of the hypothesis test. 
* Note that the returned value is power, not beta.

In [14]:

def compute_power_in_two_tailed_test(mu_b, mu_h, n, alpha, sd, pop=True):
    from scipy.stats import norm, t
    
    # start your code below
    se = sd/pow(n, 0.5) # standard error of the sample mean
    if pop:
        crit_z = norm.ppf(1 - alpha/2.) # upper critical z value for the two-tailed test
        crit_Xbar1 = mu_h - crit_z*se # lower bound of the non-rejection region
        crit_Xbar2 = mu_h + crit_z*se # upper bound of the non-rejection region
        z1 = (crit_Xbar1 - mu_b)/se
        z2 = (crit_Xbar2 - mu_b)/se
        beta = norm.cdf(z2) - norm.cdf(z1)
    else:
        # sd is a sample sd, so use Student's t with n-1 degrees of freedom.
        # Evaluating beta with the central t distribution is an approximation;
        # the exact distribution of the test statistic under mu_b is noncentral t.
        crit_t = t.ppf(1 - alpha/2., n-1)
        crit_Xbar1 = mu_h - crit_t*se
        crit_Xbar2 = mu_h + crit_t*se
        t1 = (crit_Xbar1 - mu_b)/se
        t2 = (crit_Xbar2 - mu_b)/se
        beta = t.cdf(t2, n-1) - t.cdf(t1, n-1)
    return 1 - beta

In [16]:
print(compute_power_in_two_tailed_test(16.5, 16, 30, 0.05, 0.8, False))
print(compute_power_in_two_tailed_test(16.5, 16, 30, 0.05, 0.8))

0.910637011996
0.928307656198
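
One way to sanity-check the known-sd result (the second value above) is a small Monte Carlo simulation: draw many samples from a population whose mean is mu_b, run the two-tailed z test on each, and count how often H0 is rejected. This is a rough sketch assuming normally distributed observations; `simulate_power`, the number of trials, and the seed are illustrative choices, not part of the assignment.

```python
import random
from statistics import NormalDist, fmean

def simulate_power(mu_b, mu_h, n, alpha, sd, trials=20000, seed=0):
    """Fraction of simulated samples for which the two-tailed z test rejects H0."""
    rng = random.Random(seed)
    crit_z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd / n**0.5
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n from the population with true mean mu_b.
        sample = [rng.gauss(mu_b, sd) for _ in range(n)]
        z = (fmean(sample) - mu_h) / se
        if abs(z) > crit_z:
            rejections += 1
    return rejections / trials

estimate = simulate_power(16.5, 16, 30, 0.05, 0.8)
print(estimate)
```

The estimate should land close to the analytic power for `pop=True`, with a simulation error that shrinks as `trials` grows.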
