# Chapter 20 - Exercises

In [1]:
import math
from scipy.stats import norm

## 20.1

### Answers

* a)
    * H0: p = 30%
    * HA: p < 30%
* b)
    * H0: p = 0.5
    * HA: p ≠ 0.5
* c)
    * H0: p = 0.2
    * HA: p > 0.2

## 20.3

### Answers

* (D)

## 20.5

### Answers

* No. If the new formula were exactly as effective as the old one, natural sampling variation would still produce results at least this extreme about 27% of the time. That is not evidence that the two formulas are equally effective: we fail to reject H0, but we don't _accept_ it as a result of this.

## 20.7

### Answers

* a) No: That particular outcome could happen 25% of the time purely due to chance.
* b) 12.5%
* c) No: The inability of people to control the outcome of a coin toss is fairly well established -- I'd expect stronger evidence to convince me otherwise.
* d) 7 - at this point, the P-value is < 0.01
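
The percentages above come from multiplying 0.5 once per toss. A minimal sketch, assuming (a) and (b) refer to two and three correct calls in a row (which is what 25% and 12.5% imply) and that (d) asks for a P-value below 0.01; `p_all_correct` is a hypothetical helper name, not something from the exercise:

```python
def p_all_correct(n, p=0.5):
    """Chance of calling n independent fair tosses correctly by guessing."""
    return p ** n

# Parts (a) and (b): two and three correct calls in a row (assumed)
for tosses in (2, 3):
    print("{} tosses: {}".format(tosses, p_all_correct(tosses)))

# Part (d): smallest run of correct calls whose P-value drops below alpha
alpha = 0.01
n = 1
while p_all_correct(n) >= alpha:
    n += 1
print("n: {}, P-value: {}".format(n, p_all_correct(n)))
```

Six correct calls give 0.5^6 ≈ 0.0156, which is still above 0.01, so the loop stops at seven.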

## 20.9

### Answers

* HA should be: p < 0.96
* sd should be calculated using the H0 p (0.96) not 0.94
* P-Value of 0.12 isn't "strong evidence"
* this should use a one-sided alternative: the only question is whether the true rate falls short of the claimed 0.96
* failure condition not met: 0.04 * 200 = 8 < 10
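
Two of these points are easy to check numerically. The sketch below assumes that 0.94 is the observed sample proportion and that n = 200, as the answers above suggest; the variable names are illustrative only:

```python
import math

p0 = 0.96     # proportion claimed under H0
p_hat = 0.94  # assumed to be the observed sample proportion
n = 200       # sample size used in the failure-condition check above

# The standard deviation for a hypothesis test uses the H0 value, not p_hat
sd_null = math.sqrt(p0 * (1 - p0) / n)
sd_from_sample = math.sqrt(p_hat * (1 - p_hat) / n)

# Success/failure condition: both expected counts should be at least 10
expected_successes = n * p0
expected_failures = n * (1 - p0)

print("sd under H0: {:.4f}".format(sd_null))
print("sd using p_hat: {:.4f}".format(sd_from_sample))
print("expected successes: {:.1f}".format(expected_successes))
print("expected failures: {:.1f}".format(expected_failures))
```

Using the sample proportion gives a larger standard deviation than the H0 value does, so the choice affects the z-score as well as being conceptually wrong for a test.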

## 20.11

### Answers

* a) 
  * H0: p = 0.3
  * HA: p > 0.3
* b) 
  * assume his customers are randomly sampled
  * expect 24 successes / 56 failures - ok
  * presumably less than 10% of entire population
* c) 
  * sd = 0.0512
  * p-value = 0.232
* d) 
  * There's a 23.2% chance that natural sampling variation would produce results this good or better, even if his true success rate were 30% (i.e. no better than that of the typical approach for locating a well).
* e)
  * We fail to reject the null hypothesis -- the results don't provide enough evidence to support his claim.

In [9]:
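n = 80                           # sample size
p = 0.3                          # success rate under H0
p_hat = 27 / n                   # observed sample proportion
sd = math.sqrt(p * (1 - p) / n)  # SD of the sampling distribution under H0
print("p_hat: {}".format(p_hat))
print("sd: {}".format(sd))

z = (p_hat - p) / sd             # standardized distance of p_hat from p
print("z: {}".format(z))

p_value = 1 - norm.cdf(z, 0, 1)  # upper-tail area, matching HA: p > 0.3
print("p-value: {}".format(p_value))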

p_hat: 0.3375
sd: 0.05123475382979799
z: 0.7319250547114006
p-value: 0.2321071563855155
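
The same steps (sample proportion, standard deviation under H0, z-score, tail area) recur in the remaining exercises, so they can be wrapped in one function. This is only a sketch: `one_prop_ztest` and its `alternative` argument are hypothetical names introduced here, not part of the original notebook or of SciPy.

```python
import math
from scipy.stats import norm

def one_prop_ztest(successes, n, p0, alternative="greater"):
    """One-proportion z-test. Returns (p_hat, sd, z, p_value)."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)  # SD under the null hypothesis
    z = (p_hat - p0) / sd
    if alternative == "greater":
        p_value = norm.sf(z)            # upper tail, same as 1 - cdf(z)
    elif alternative == "less":
        p_value = norm.cdf(z)           # lower tail
    elif alternative == "two-sided":
        p_value = 2 * norm.sf(abs(z))   # both tails, valid for either sign of z
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return p_hat, sd, z, p_value

# Same inputs as the cell above: 27 successes out of 80, H0: p = 0.3
p_hat, sd, z, p_value = one_prop_ztest(27, 80, 0.3, alternative="greater")
print("p_hat: {}\nsd: {}\nz: {}\np-value: {}".format(p_hat, sd, z, p_value))
```

With these inputs the function should reproduce the values printed above, up to floating-point rounding.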


## 20.13

### Answers

* a)
  * H0: p = 0.34
  * HA: p < 0.34
* b) we'll assume students are independent with respect to attendance and were randomly selected; 8302 should be less than 10% of all students nationally; with this sample size we can expect at least 10 successes and 10 failures
* c) see below
* d) If the attendance rate were still 34% (the null hypothesis), there would be only about a 2.7% chance that natural sampling variability would produce a sample proportion this low or lower. This provides evidence that the attendance rate has _decreased_.
* e) The result appears to be statistically significant, but it's not clear whether it has practical significance: the actual change (from 34% to 33%) is small.

In [6]:
n = 8302
p = 0.34
pHat = 0.33

sd = math.sqrt(p * (1 - p) / n)
z = (pHat - p) / sd
pValue = norm.cdf(z, 0, 1)

print("sd: {}\nz: {}\npValue: {}".format(sd, z, pValue))

sd: 0.005199002924996011
z: -1.9234457345506653
pValue: 0.027212047670411802
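
The remark about practical significance can be illustrated by holding the observed one-point drop fixed and varying the sample size. This is a sketch only: the sample sizes other than 8302 are hypothetical, and `lower_tail_p_value` is a name introduced here for illustration.

```python
import math
from scipy.stats import norm

p0 = 0.34     # attendance rate under H0
p_hat = 0.33  # observed rate, held fixed for every sample size

def lower_tail_p_value(n):
    """P-value for HA: p < p0 if p_hat had come from a sample of size n."""
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    return norm.cdf(z)

# 500 and 2000 are hypothetical sizes; 8302 is the size used in the exercise
for n in (500, 2000, 8302):
    print("n = {:5d}  P-value = {:.4f}".format(n, lower_tail_p_value(n)))
```

The same one-point difference is far from significant in a small sample and only crosses the 0.05 threshold because the sample is very large, which is why significance alone says little about whether the change matters.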


## 20.15

### Answers

* a)
  * H0: p = 0.05
  * HA: p < 0.05
* b) 
  * random sample, 100K - not sure if this is < 10% of entire pop; should have >= 10 success/failures
* c) yes - P-Value is < 0.001 => supports claim of a potential drop in rate

In [11]:
n = 100000
nd = 4781
p = 0.05
pHat = nd / n

sd = math.sqrt(p * (1 - p) / n)
z = (pHat - p) / sd
pValue = norm.cdf(z, 0, 1)

print("pHat: {}\nsd: {}\nz: {}\npValue: {}".format(pHat, sd, z, pValue))

pHat: 0.04781
sd: 0.0006892024376045111
z: -3.1775859754818576
pValue: 0.0007425332600565135


## 20.17

### Answers

* a)
  * H0: p = 0.63
  * HA: p > 0.63
* b) problem states the sample is representative (i.e. random vs biased condition met); assume independent; less than 10% of pop.; 240 × 0.63 and 240 × 0.37 are both > 10
* c) No - P-Value = 0.06; fail to reject the null hypothesis at significance of alpha = 0.05

In [10]:
p = 0.63
n = 240
nx = 163
pHat = nx / n

sd = math.sqrt(p * (1 - p) / n)
z = (pHat - p) / sd
pValue = 1 - norm.cdf(z, 0, 1)

print("pHat: {}\nsd: {}\nz: {}\npValue: {}".format(pHat, sd, z, pValue))

pHat: 0.6791666666666667
sd: 0.031164884084494842
z: 1.5776303397556386
pValue: 0.05732527904291429


## 20.19

### Answers

* H0: p = 0.20
* HA: p > 0.20
* conditions aren't met: 
  * sample size (22) is greater than 10% of entire population (150)
  * 22 * .2 = 4.4 < 10
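
Both failed checks can be reproduced in a few lines. A minimal sketch, assuming the usual thresholds of 10 expected successes and failures and a sample smaller than 10% of the population; `check_conditions` is a hypothetical helper, not a library function:

```python
def check_conditions(n, p0, population=None, min_count=10):
    """Check the success/failure and 10% conditions for a one-proportion z-test."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    results = {
        "success_failure": (expected_successes >= min_count
                            and expected_failures >= min_count),
    }
    if population is not None:
        # The sample should be less than 10% of the population
        results["ten_percent"] = n / population < 0.10
    return results

# Values from the answer above: sample of 22 from a population of 150, p0 = 0.20
print(check_conditions(22, 0.20, population=150))
```

For these values both entries come out `False`, so a normal-model test is not appropriate here.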

## 20.21

### Answers

* H0: p = 0.03
* HA: p ≠ 0.03
* conditions: sample is self-selected rather than random; otherwise independent, < 10% of entire pop; at least 10 expected successes (469 × 0.03 ≈ 14) and failures in sample
  * since not randomly selected, this may lead to biased results
* P-Value (two-sided test) = 0.0557 => fail to reject the null hypothesis at alpha = 0.05, though the result is borderline; the data don't provide sufficient evidence that the rate of twin deliveries to teenagers at this hospital is different from the national average for all mothers

In [19]:
p = 0.03

n_samp = 469
nx = 7

pHat = nx / n_samp

sd = math.sqrt(p * (1 - p) / n_samp)
z = (pHat - p) / sd
pValue = 2 * norm.cdf(-abs(z), 0, 1)  # two-sided: double the tail area beyond |z|

print("pHat: {}\nsd: {}\nz: {}\npValue: {}".format(pHat, sd, z, pValue))

pHat: 0.014925373134328358
sd: 0.007876985991835013
z: -1.9137557031709123
pValue: 0.05565137792171398


## 20.23

### Answers

* 
