# Chapter 21 - Exercises

In [1]:
import math
from scipy.stats import norm

## 21.1 

### Answers

* a)
    * two-sided
    * H0: p that prefer Diet Coke = 0.5
    * HA: p that prefer Diet Coke <> 0.5
    * the wording of this question is ambiguous; I think what they're implicitly asking is "Do students have a preference?", _not_ "Do students prefer diet coke over diet pepsi, or vice-versa?"
    * A better phrasing would have been: "A business student conducts a taste test to see whether students show a preference for either Diet Coke or Diet Pepsi."
* b)
    * one-sided
    * H0: p (prefer new formula) = 0.5
    * HA: p > 0.5
* c)
    * one-sided
    * H0: p (vote in favor) = 2/3
    * HA: p > 2/3
    * shouldn't this instead be "p = 2/3n - 1" and "p > 2/3n - 1"? does p = 2/3 => passing or not passing? (assume it passes)
* d)
    * two-sided
    * H0: p (proportion of increases) = 0.5
    * HA: p <> 0.5
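
The one-sided versus two-sided choice in the answers above changes how a P-value is computed from the same test statistic. A minimal sketch, assuming a z statistic from a one-proportion test; the helper name `p_value_from_z` and the example value of 1.8 are hypothetical and not taken from the exercises:

```python
from scipy.stats import norm


def p_value_from_z(z, alternative):
    """P-value for a z statistic under the given alternative hypothesis."""
    if alternative == "greater":      # HA: p > p0
        return norm.sf(z)
    if alternative == "less":         # HA: p < p0
        return norm.cdf(z)
    if alternative == "two-sided":    # HA: p != p0
        return 2 * norm.sf(abs(z))
    raise ValueError("unknown alternative: {}".format(alternative))


z_example = 1.8  # hypothetical test statistic
for alt in ("greater", "less", "two-sided"):
    print("{}: {:.4f}".format(alt, p_value_from_z(z_example, alt)))
```

For a positive z, the two-sided P-value is exactly twice the upper-tail one, which is why a one-sided test is easier to "pass" when the effect lies in the hypothesized direction.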

## 21.3

### Answers

* Under conditions of the null hypothesis (i.e. the new treatment has the same effectiveness as the traditional ointment), we'd expect to see the results we obtained (or more extreme results) purely due to sampling variability only 4.7% of the time.

## 21.5

### Answers

* He would have made the same decision at $\alpha = 0.10$: since he rejected H0 at $\alpha = 0.05$, his P-value is below 0.05 and therefore also below 0.10.  We'd need to know his results (the P-value) to answer whether he would have made the same decision under $\alpha = 0.01$.
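
The decision rule behind this answer can be sketched in a few lines. The P-values below are hypothetical, since the exercise does not give his results; both lead to rejection at $\alpha = 0.05$, but only one of them does at $\alpha = 0.01$:

```python
def reject_h0(p_value, alpha):
    """Reject H0 when the P-value falls below the significance level."""
    return p_value < alpha


# Two hypothetical P-values, both of which lead to rejection at alpha = 0.05
for p_value in (0.03, 0.004):
    decisions = {alpha: reject_h0(p_value, alpha) for alpha in (0.10, 0.05, 0.01)}
    print("P-value {}: {}".format(p_value, decisions))
```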

## 21.7

### Answers

* a) If the true proportion is 90%, we would only expect to see these (or more extreme) results 1.1% of the time due to pure sampling variability.
* b) The size of the effect is a difference of only 0.6%.  Although this is fairly small, applied across a huge population (i.e. all children in the US?) it could have a large practical impact.

## 21.9 

### Answers

* a)
  - randomized telephone poll; may be response bias; n_success/n_failure both > 10; assume 1302 is < 10% of entire pop.
  - [1.9%, 4.1%]
* b) The CI indicates that it has fallen below the 5% mark, in that 5% isn't included in the interval.
* c) H0: p = 0.05, HA: p < 0.05 (one-sided); the 98% two-sided CI leaves 1% in each tail, so this corresponds to a one-sided test at alpha = 0.01 (a 1% significance level).

In [11]:
# a)
n_s = 1302
n_x = 39
alpha = 0.02

p = n_x / n_s
sd = math.sqrt(p * (1 - p) / n_s)
ci = norm.ppf([alpha / 2, (1 - alpha / 2)], p, sd)

print("p: {}".format(p))
print("sd: {}".format(sd))
print("ci: {}".format(ci))

p: 0.029953917050691243
sd: 0.004724082815914828
ci: [ 0.01896406  0.04094378]
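
The one-sided test described in c) can also be run directly. This is a sketch and not part of the original notebook: it reuses the counts from a) and, as is the usual convention for a one-proportion z-test, uses the null value p = 0.05 (named `p_0` here) for the standard deviation.

```python
import math
from scipy.stats import norm

n_s = 1302    # sample size, from a)
n_x = 39      # number of successes, from a)
p_0 = 0.05    # null-hypothesis proportion, from c)
alpha = 0.01  # one-sided level matching the 98% two-sided CI

p_hat = n_x / n_s
sd_0 = math.sqrt(p_0 * (1 - p_0) / n_s)  # standard deviation under H0
z = (p_hat - p_0) / sd_0
p_value = norm.cdf(z)                     # lower tail, since HA: p < 0.05

print("z: {}".format(z))
print("pValue: {}".format(p_value))
print("reject H0: {}".format(p_value < alpha))
```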


## 21.11

### Answers

* a) [27.3%, 32.7%]
* b) reject H0 - evidence supports claim that his approval rating was better

In [2]:
p = 0.3
n = 1125
alpha = 0.05

sd = math.sqrt(p * (1 - p) / n)
ci = norm.ppf([alpha / 2, (1 - alpha / 2)], p, sd)

print("p: {}".format(p))
print("sd: {}".format(sd))
print("ci: {}".format(ci))

p: 0.3
sd: 0.013662601021279464
ci: [ 0.27322179  0.32677821]


## 21.13

### Answers

* a) fewer than 10 successes in the sample (only 5) => fails the success/failure condition
* b) [4.8%, 25.6%]

In [5]:
n_s = 42
n_x = 5
alpha = 0.05  # 95% CI; set here so the cell does not depend on an earlier cell

# Agresti-Coull "plus-four"
p_tilde = (n_x + 2) / (n_s + 4)
p = p_tilde  # use p_tilde for p in follow on calcs
n = n_s + 4

sd = math.sqrt(p * (1 - p) / n)
ci = norm.ppf([alpha / 2, (1 - alpha / 2)], p, sd)

print("p: {}".format(p))
print("sd: {}".format(sd))
print("ci: {}".format(ci))

p: 0.15217391304347827
sd: 0.052959585336062626
ci: [ 0.04837503  0.25597279]
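
The condition check in a) is easy to automate. A minimal sketch: the helper name `success_failure_ok` is made up for illustration, and the default threshold of 10 is the rule of thumb referred to in a).

```python
def success_failure_ok(n_successes, n_trials, minimum=10):
    """True when both the success and failure counts reach the minimum."""
    n_failures = n_trials - n_successes
    return n_successes >= minimum and n_failures >= minimum


print(success_failure_ok(5, 42))      # counts from this exercise
print(success_failure_ok(39, 1302))   # counts from exercise 21.9
```

When the check fails, as it does for 5 successes out of 42, the "plus-four" adjustment used above is one common fallback.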


## 21.15

### Answers

* H0: applicant will repay the loan
* HA: applicant will not repay the loan (score below threshold)

* a) Type II
* b) Type I
* c) lower alpha; H0 is rejected (loan denied) when the score falls below the cutoff, so the rejection region is the area under the curve to the left of the cutoff, and that area corresponds to alpha
  NOTE: this agrees with the answer key. Moving the cutoff down (to the left) shrinks the area in which a score leads us to reject the null hypothesis, so alpha decreases (just as this area is smaller at alpha = 0.01 than at alpha = 0.05). It is also consistent with d): a lower alpha means a lower chance of a Type I error.
* d) lower chance of Type I, greater chance of Type II
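
The trade-off in d) can be illustrated numerically. This is a sketch under loudly stated assumptions: the scores of applicants who would repay and of applicants who would default are modeled as normal distributions with made-up means and standard deviation, and the two cutoffs are likewise hypothetical. A loan is denied (H0 rejected) when the score is below the cutoff.

```python
from scipy.stats import norm

# Hypothetical score distributions (not from the exercise)
REPAY_MEAN, DEFAULT_MEAN, SCORE_SD = 300, 200, 50


def error_rates(cutoff):
    """Return (Type I rate, Type II rate) for a given cutoff score."""
    # Type I: a repayer scores below the cutoff and is denied
    type_1 = norm.cdf(cutoff, REPAY_MEAN, SCORE_SD)
    # Type II: a defaulter scores at or above the cutoff and is approved
    type_2 = norm.sf(cutoff, DEFAULT_MEAN, SCORE_SD)
    return type_1, type_2


for cutoff in (250, 200):  # hypothetical "before" and "after" cutoffs
    type_1, type_2 = error_rates(cutoff)
    print("cutoff {}: Type I = {:.3f}, Type II = {:.3f}".format(cutoff, type_1, type_2))
```

Under these assumptions, lowering the cutoff reduces the Type I rate and raises the Type II rate.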

## 21.17

### Answers

* a) the probability of correctly denying a loan to someone who would not repay the loan
* b) increase the cutoff point
* c) that increases the chance of a Type I error (false positive - rejecting a candidate that would have repaid)

## 21.19

### Answers

* a) H0: p = low value; HA: p > low value
* b) There is no real increase, but they decide to continue the tax breaks
* c) There is a real increase, but they decide to discontinue the tax breaks
* d) 
  - Type I: city is harmed by applying tax revenues towards a program that isn't really working;
  - Type II: first-time buyers are harmed by not getting a benefit that appears to actually help
* e) The probability of identifying a real increase when it's present  

## 21.21

### Answers

* a) shop is "honest", but they determine the shop is "cheating"
* b) shop is "cheating", but they determine the shop is "honest"
* c) Type I
* d) Type II

## 21.23

### Answers

* a) the probability that they will identify the shop as cheating (under the condition that it actually is cheating)
* b) 40 cars - power increases with sample size
* c) 10% - higher alpha leads to greater chance of rejecting H0
* d) a lot - power increases with increase in effect size
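
The three effects in b) to d) can be checked with an approximate power calculation. This is a sketch, not the textbook's method: it assumes an upper-tailed one-proportion z-test with a normal approximation (rough at these sample sizes), and every number in it (an "honest" rate of 0.20, true rates of 0.35 and 0.50, a baseline of 20 cars at alpha = 0.05) is hypothetical.

```python
import math
from scipy.stats import norm


def power_one_sided(p_0, p_true, n, alpha):
    """Approximate power of an upper-tailed one-proportion z-test."""
    sd_0 = math.sqrt(p_0 * (1 - p_0) / n)            # spread under H0
    sd_true = math.sqrt(p_true * (1 - p_true) / n)   # spread under the true p
    critical = norm.ppf(1 - alpha, p_0, sd_0)        # reject H0 above this value
    return norm.sf(critical, p_true, sd_true)


baseline = power_one_sided(0.20, 0.35, 20, 0.05)
more_cars = power_one_sided(0.20, 0.35, 40, 0.05)      # larger sample
higher_alpha = power_one_sided(0.20, 0.35, 20, 0.10)   # larger alpha
bigger_effect = power_one_sided(0.20, 0.50, 20, 0.05)  # larger effect size

for name, value in [("baseline", baseline), ("more cars", more_cars),
                    ("higher alpha", higher_alpha), ("bigger effect", bigger_effect)]:
    print("{}: {:.3f}".format(name, value))
```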

## 21.25

### Answers

* a) one-tailed: the implication is that a lower proportion of minorities are being hired than actually applied
* b) Type I: hiring is fair, but test determines that it's not
* c) Type II: hiring is unfair, but test determines that it's fair
* d) The probability of identifying unfair hiring when it's present
* e) Power will increase
* f) Lower power with a lower sample size

## 21.27

### Answers

* a) one-tailed: interested in whether the software causes a decrease in drop-out rate
* b) H0: p = 0.13, HA: p < 0.13
* c) professor determines the software helped, even though it doesn't
* d) professor determines the software didn't help, even though it does
* e) the probability that the professor determines that the software helps when it actually does

## 21.29

### Answers

* a) yes - the sample shows a drop-out rate of 5.4%, with a 95% CI of 2.3% to 8.5%, which lies entirely below the 13% baseline
* b) if the software _didn't_ actually help, we would expect to see these (or more extreme) results due to sampling variability < 0.07% of the time

In [15]:
p_0 = 0.13
n_s = 203
n_x = 11
alpha = 0.05

p = n_x / n_s
sd = math.sqrt(p * (1 - p) / n_s)
ci = norm.ppf([alpha / 2, (1 - alpha / 2)], p, sd)

print("p: {}".format(p))
print("sd: {}".format(sd))
print("ci: {}".format(ci))

p: 0.054187192118226604
sd: 0.01588923177336244
ci: [ 0.02304487  0.08532951]


In [16]:
sd2 = math.sqrt(p_0 * (1 - p_0) / n_s)
z = (p - p_0) / sd2
print("z: {}".format(z))
# lower-tail P-value, since HA: p < p_0
pValue = norm.cdf(p, p_0, sd2)
print("pValue: {}".format(pValue))

z: -3.2118799061285754
pValue: 0.0006593474382024779
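
As a cross-check on the normal approximation, the same one-sided P-value can be computed from the exact binomial distribution. This is an addition rather than part of the original notebook, and it assumes that under H0 each student drops out independently with probability 0.13.

```python
from scipy.stats import binom

p_0 = 0.13   # null-hypothesis drop-out rate
n_s = 203    # students using the software
n_x = 11     # students who dropped out

# Exact lower-tail probability P(X <= 11) when X ~ Binomial(203, 0.13)
exact_p_value = binom.cdf(n_x, n_s, p_0)
print("exact pValue: {}".format(exact_p_value))
```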

