# Chapter 19 - Exercises

In [1]:
import math
from scipy.stats import norm

## 19.1 

### Answers

* It means the newscaster is confident (at some unstated level) that the interval from {estimate} - 4% to {estimate} + 4% captures the _true_ proportion.

## 19.3

### Answers

* a)
    * population: all cars (or at least, all cars within their jurisdiction?)
    * sample: 134 cars stopped at an auto checkpoint
    * $p$: the % of all cars that have at least one safety violation
    * $\hat{p}$: the % of the 134 cars stopped that have at least one safety violation: (14/134): 10.4%
    * can use methods of this chapter?: yes - random and appropriate sample size
* b)
    * population: general public
    * sample: 602 viewers of a TV talk show who voluntarily logged onto a website to vote
    * $p$: % of general public that favors prayer in schools
    * $\hat{p}$: % of the sample described above that favors prayer in schools: 488 / 602 = 81.1%
    * can use methods of this chapter?: no - the sample is neither randomly selected nor representative of the general population; restricting it to viewers of the TV talk show and relying on self-selection both introduce bias.
* c)
    * population: all parents in the district
    * sample: all parents in the district that voluntarily responded to the questionnaire
    * $p$: % of parents that favor uniforms
    * $\hat{p}$: % of the responding parents that favor uniforms: 228 / 380: 60%
    * can use methods of this chapter?: no - self-selection + high non-response introduces bias
* d)
    * population: all freshmen admitted to this college
    * sample: all freshmen admitted to this college in year $y$
    * $p$: % of freshmen who graduate on time
    * $\hat{p}$: % of freshmen admitted in year $y$ that graduate on time: 85.1%
    * can use methods of this chapter?: assuming the proportion in year $y$ is representative of other years, then yes
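The sample-size part of "can use methods of this chapter?" can be checked numerically for parts a) through c), where counts are given. The sketch below assumes the common rule of thumb of at least 10 successes and 10 failures; the helper name `success_failure_ok` and the threshold are illustrative assumptions, not something stated in the answers above. It says nothing about randomness or bias, which still have to be judged from how the sample was collected.

```python
# (successes, sample size) for parts a), b), c) as given in the answers above
samples = {"a": (14, 134), "b": (488, 602), "c": (228, 380)}


def success_failure_ok(successes, n, threshold=10):
    """Hypothetical helper: True if both successes and failures reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold


results = {}
for label, (successes, n) in samples.items():
    p_hat = successes / n
    results[label] = (p_hat, success_failure_ok(successes, n))
    print("{}) pHat = {:.3f}, sample size ok: {}".format(label, p_hat, results[label][1]))
```

All three samples pass the size check, so the "no" answers for b) and c) rest entirely on how the respondents were selected.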


## 19.5

### Answers

* a) no: This conclusion is expressing certainty about the interval
* b) no: this conclusion makes a statement about other samples rather than about the true proportion; also, far fewer than 95% of samples will show exactly 88%, so the statement is simply wrong
* c) no: similar to the previous one
* d) no: the statement is about the _sample_, whose proportion we know with certainty (my initial answer of "yes" was wrong)
* e) no: this implies that the true proportion changes from day to day

## 19.7

### Answers

* a) false
* b) true
* c) true
* d) false: requires a sample 4X larger
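The "4X larger" answer in d) follows from the margin of error being proportional to $1/\sqrt{n}$: halving it requires quadrupling $n$. A minimal sketch, assuming a hypothetical $\hat{p}$ of 0.5 and a hypothetical sample size of 400 (neither comes from the exercise):

```python
import math


def margin_of_error(p_hat, n, z_star=1.96):
    """Margin of error for a one-proportion z-interval."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.5  # hypothetical value, for illustration only
n = 400      # hypothetical sample size, for illustration only

me_n = margin_of_error(p_hat, n)
me_4n = margin_of_error(p_hat, 4 * n)

# The ratio is 2 regardless of the p_hat and n chosen above
print("ME at n: {:.4f}, ME at 4n: {:.4f}, ratio: {:.2f}".format(me_n, me_4n, me_n / me_4n))
```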

## 19.9

### Answers

* We are 90% confident that the true proportion of cars manufactured in Japan is captured by the interval from 29.9% to 47.0%.

## 19.11

### Answers

* a) see below
* b) I'm 95% confident that the interval from 79.7% to 86.3% captures the true proportion.  OR I'm 95% confident that between 79.7% and 86.3% of all broiler chickens sold in the United States are contaminated with Campylobacter.
* c) assuming the sample was representative of the overall population, then no: the chickens were randomly selected, the sample size is sufficient, and the sample is no more than 10% of the overall population. Beyond that, the effect of the sample size is captured in the CI itself.

In [2]:
def standard_error(pHat, n):
    num = pHat * (1 - pHat)
    return math.sqrt(num / n)

In [3]:
n = 525
pHat = 0.83

se = standard_error(pHat, n)
# 2 is the rule-of-thumb approximation of z* = 1.96 for 95% confidence
interval = (pHat - 2 * se, pHat + 2 * se)

# a)
print("95% CI: {}".format(interval))


95% CI: (0.7972120812028635, 0.8627879187971365)
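The interval above uses 2 standard errors. For comparison, here is a sketch of the same interval with the exact critical value for 95% confidence. It uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` so that it runs on its own; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 525
p_hat = 0.83

se = math.sqrt(p_hat * (1 - p_hat) / n)

# Exact critical value for 95% confidence (about 1.96) instead of the rule-of-thumb 2
z_star = NormalDist().inv_cdf(0.975)
exact_interval = (p_hat - z_star * se, p_hat + z_star * se)

print("z*: {:.4f}".format(z_star))
print("95% CI: ({:.3f}, {:.3f})".format(*exact_interval))
```

The exact interval is slightly narrower, roughly 79.8% to 86.2%, which does not change the interpretation.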


## 19.13

### Answers

* a) see below: 2.5%
* b) We can be 90% confident that the interval from 36% - 2.5% to 36% + 2.5% captures the true proportion of adult baseball fans.
* c) Larger - without an increase in sample size, we'd need to decrease the precision of the estimate in order to increase our confidence in the estimate.
* d) see below: 3.9%
* e) Less confidence
* f) Probably not - the difference of 1% could easily be the result of sample variability.
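The trade-off in c) and e) can be seen by holding the sample fixed and varying only the confidence level. A small sketch, using the standard library's `statistics.NormalDist` in place of `scipy.stats.norm` (the 95% row is added only for comparison and is not part of the exercise):

```python
import math
from statistics import NormalDist

n = 1006
p_hat = 0.36
se = math.sqrt(p_hat * (1 - p_hat) / n)

margins = {}
for confidence in (0.90, 0.95, 0.99):
    # Two-sided critical value for the given confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margins[confidence] = z_star * se
    print("{:.0%} confidence: margin of error = {:.4f}".format(confidence, margins[confidence]))
```

With the same data, more confidence always costs a wider interval, and a narrower interval always costs confidence.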

In [20]:
n = 1006
pHat = 0.36

# a)
se = standard_error(pHat, n)

# approach 1: z* times the standard error
# norm.ppf(.95, 0, 1) * se

# approach 2: 95th percentile of N(pHat, se), minus pHat
upper = norm.ppf(.95, pHat, se)
upper - pHat

0.024892556629847462

In [21]:
# d) margin of error at 99% confidence
norm.ppf(.995, 0, 1) * se

0.038981570005258107

## 19.15

### Answers

* a) (4.6%, 4.9%)
* b) depends on what we mean by "plausible" - if we consider the 95% CI our driver of plausibility, then it argues against 5%

In [27]:
n = 100000
pHat = 4781 / n
se = standard_error(pHat, n)

# 2 is the rule-of-thumb approximation of z* = 1.96 for 95% confidence
interval = (pHat - 2 * se, pHat + 2 * se)

# a)
print("95% CI: {}".format(interval))

95% CI: (0.046460567468896645, 0.04915943253110335)


## 19.17

### Answers

* a) see below
* b) We are 95% confident that between 12.7% and 18.6% of all auto accidents involve teenage drivers.
* c) About 95% of random samples of this size will produce intervals that capture the true population proportion.
* d) No - closer to 1 in 6 through 1 in 8
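What "95% confidence" means in c) can be illustrated by simulation: draw many samples from a population with a known proportion and count how often the interval captures it. This is only a sketch. The true proportion of 0.156 is a hypothetical value chosen to resemble the sample, and the function names, trial count, and seed are arbitrary choices. It uses the standard library rather than `scipy`.

```python
import math
import random
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """One-proportion z-interval from a count of successes."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - me, p_hat + me)


def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Simulate one sample of n independent yes/no observations
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


# true_p is a hypothetical population value, not a known fact about accidents
rate = coverage(true_p=0.156, n=582, trials=2000)
print("share of intervals capturing the true proportion: {:.3f}".format(rate))
```

The printed share should land close to 0.95. The confidence level describes the long-run behavior of the method across samples.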

In [8]:
def ci(pHat, se, alpha):
    me = norm.ppf(1 - (alpha / 2), 0, 1) * se
    return (pHat - me, pHat + me)

In [48]:
n = 582
pHat = 91 / n
print("pHat: {}".format(pHat))
se = standard_error(pHat, n)
print("se: {}".format(se))

#a)
ci(pHat, se, 0.05)

pHat: 0.1563573883161512
se: 0.015054868459122572


(0.12685038834428294, 0.18586438828801946)

## 19.19

### Answers

* Given the selection bias, probably not much.  

## 19.21

### Answers

* a) response bias
* b) (53.96%, 60.04%)
* c) the pooled sample should have a narrower margin of error than either group alone, given its larger sample size (1020 vs. 510)
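The comparison in c) can be checked directly by computing the 95% margin of error for each group of 510 and for the pooled sample of 1020. A minimal sketch, using the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`; the helper name `margin_of_error` is illustrative.

```python
import math
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence


def margin_of_error(p_hat, n):
    """95% margin of error for a one-proportion z-interval."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


me_a = margin_of_error(0.54, 510)        # group A alone
me_b = margin_of_error(0.60, 510)        # group B alone
me_pooled = margin_of_error(0.57, 1020)  # both groups combined

print("A: {:.4f}, B: {:.4f}, pooled: {:.4f}".format(me_a, me_b, me_pooled))
```

The pooled margin comes out near 3%, against a little over 4% for either group on its own.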

In [9]:
# b)

# number of "yes" responses in each group of 510
yesB = 510 * .6
yesA = 510 * .54
yes = yesA + yesB
yes

n = 1020
pHat = yes / n

se = standard_error(pHat, n)

ci(pHat, se, 0.05)

(0.53961776904511116, 0.60038223095488896)

## 19.23

### Answers

* a) (18.2%, 21.8%)
* b) We are 98% confident that between 18.2% and 21.8% of children in England are deficient in vitamin D
* c) About 98% of intervals constructed from random samples of this size will capture the true proportion of the overall population.

In [11]:
#a)
n = 2700
pHat = 0.2

se = standard_error(pHat, n)
ci(pHat, se, 0.02)

(0.18209176571591743, 0.2179082342840826)

## 19.25

### Answers

* a) wider: the sample size is smaller, since it's a subset of the larger sample
* b) smaller margin of error: its sample size is larger than the first poll's

## 19.27

### Answers

* a) (15.5%, 26.3%)
* b) n * 4 = 612
* c) The deer taken by hunters may not be representative of the overall population of deer - hunting restrictions, hunter selection, etc.

In [15]:
#a) 
n = 153
pHat = 32 / n

se = standard_error(pHat, n)
print("se: {}".format(se))
ci(pHat, se, 0.1)

se: 0.032879903033857905


(0.15506769903833173, 0.26323295455643952)

In [16]:
n * 4

612

## 19.29

### Answers

* a) at least 141
* b) at least 318 (317.06 rounds up)
* c) at least 564

In [26]:
pHat = 0.25
me = [0.06, 0.04, 0.03]
zStar = norm.ppf(0.95, 0, 1)
print("z*: {}".format(zStar))

[(margin, pHat * (1 - pHat) * ((zStar / margin) ** 2)) for margin in me]

z*: 1.6448536269514722


[(0.06, 140.91372156746942),
 (0.04, 317.05587352680618),
 (0.03, 563.65488626987769)]
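Because a sample size must be a whole number large enough to meet the margin, each raw value has to be rounded up rather than to the nearest integer. A sketch of the calculation wrapped in a function; the name `required_sample_size` is hypothetical, and it uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`.

```python
import math
from statistics import NormalDist


def required_sample_size(p_guess, margin, confidence):
    """Smallest n whose margin of error is at most `margin`, given a guess for p."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw = p_guess * (1 - p_guess) * (z_star / margin) ** 2
    return math.ceil(raw)  # always round up: rounding down would miss the target margin


# Same inputs as the cell above: pHat = 0.25 at 90% confidence
sizes = [required_sample_size(0.25, m, 0.90) for m in (0.06, 0.04, 0.03)]
print(sizes)
```

Rounding up matters even when the fractional part is small: a raw value of about 317.06 still calls for 318 respondents.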

## 19.31

### Answers

* at least 1801

In [28]:
pHat = 0.25
me = [0.02]
zStar = norm.ppf(0.975, 0, 1)
print("z*: {}".format(zStar))

[(margin, pHat * (1 - pHat) * ((zStar / margin) ** 2)) for margin in me]

z*: 1.959963984540054


[(0.02, 1800.683822200371)]

## 19.33

### Answers

* at least 384 

In [29]:
pHat = 9 / 60
me = 0.03
zStar = norm.ppf(0.95, 0, 1)
pHat * (1 - pHat) * (zStar / me) ** 2

383.28532266351687

## 19.35

### Answers

* 90%

In [34]:
pHat = 0.65
qHat = 1 - pHat
n = 972
me = 0.025

se = math.sqrt(pHat * qHat / n)

zStar = me / se
2 * norm.cdf(zStar, 0, 1) - 1

0.89776515526451206