# Chapter 18 - Exercises

## 18.1

### Answers

* shape is approximately normal (and approximation improves as sample size increases)
* center is at $p$ (0.05)
* spread, $\sigma$, depends on $n$ (see below)

In [34]:
import math
p = 0.05
sigmas = [math.sqrt((p * (1 - p)) / n) for n in [20, 50, 100, 200]]
sigmas

[0.04873397172404482,
 0.030822070014844882,
 0.021794494717703367,
 0.015411035007422441]

## 18.3 

### Answers

* a) mu = 0.05; sigma (see above)
* b) see below
* c) 200 is the first that shows no discernible skew
* d) 200 is also the first that we'd expect at least 10 successes and 10 failures in each sample

In [35]:
meandiffs = [m - p for m in [0.0497, 0.0516, 0.0497, 0.0501]]
meandiffs

[-0.00030000000000000165,
 0.0015999999999999973,
 -0.00030000000000000165,
 9.999999999999593e-05]

In [36]:
sigmadiffs = [a-b for (a,b) in zip(sigmas, [0.0479, 0.0309, 0.0215, 0.0152])]
sigmadiffs

[0.00083397172404482,
 -7.792998515511809e-05,
 0.00029449471770336827,
 0.00021103500742244118]
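
The success/failure claim in part d) can be checked directly. This is a minimal sketch, assuming p = 0.05 and the four sample sizes used in the cells above; the name `checks` is only for illustration.

```python
# Expected successes and failures per sample, assuming p = 0.05
p = 0.05
sizes = [20, 50, 100, 200]

# Each row: (n, expected successes, expected failures, condition met?)
# round() guards against floating-point noise right at the threshold of 10
checks = [
    (n, round(n * p, 9), round(n * (1 - p), 9),
     round(n * p, 9) >= 10 and round(n * (1 - p), 9) >= 10)
    for n in sizes
]

for row in checks:
    print(row)
```

Only n = 200 reaches 10 expected successes, which matches the answer to d).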

## 18.5

### Answers

* a) Approximately normal; underlying distribution is uniform, n * p is less than 10, though
* b) At or near p = 0.5
* c) Sigma = 0.125
* d) Success/Failure condition not met.
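
The values in c) and d) can be reproduced with a short sketch. The sample size n = 16 is an assumption here: it is inferred from sigma = 0.125 and is not stated in the answers above.

```python
import math

p = 0.5
n = 16  # assumption: inferred from sigma = 0.125, not given in the answers

# Standard deviation of the sampling distribution of the proportion
sigma = math.sqrt(p * (1 - p) / n)

# Expected successes and failures for the success/failure condition
successes = n * p
failures = n * (1 - p)

print("sigma: {}".format(sigma))
print("successes: {}, failures: {}".format(successes, failures))
```

Under that assumption both expected counts are 8, below the usual threshold of 10.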

## 18.7

### Answers

* a) 
  * 68% of sampled proportions fall between 0.4 and 0.6
  * 95% of sampled proportions fall between 0.3 and 0.7
  * 99.7% of sampled proportions fall between 0.2 and 0.8
* b) yes: success/failure condition met: n * p = 12.5
* c) N(0.5, 0.0625) : see below
* d) the spread decreases, and distribution narrows, as the sample size increases

In [37]:
# a)
mu = 0.5
sigma = math.sqrt(0.25 / 25)
[(delta, mu + (delta * sigma)) for delta in range(-3,4)]

[(-3, 0.19999999999999996),
 (-2, 0.3),
 (-1, 0.4),
 (0, 0.5),
 (1, 0.6),
 (2, 0.7),
 (3, 0.8)]

In [38]:
# c)
mu = 0.5
sigma = math.sqrt(.25 / 64)
[(delta, mu + (delta * sigma)) for delta in range(-3,4)]

[(-3, 0.3125),
 (-2, 0.375),
 (-1, 0.4375),
 (0, 0.5),
 (1, 0.5625),
 (2, 0.625),
 (3, 0.6875)]

## 18.9

### Answers

* Her result is about 2.26 standard deviations below the mean. There's about a 1.2% chance of getting a result this low or lower.

In [39]:
n = 200
p = 0.5

mu = p
sigma = math.sqrt((p * (1 - p)) / n)


z = (mu - 0.42) / sigma
z

2.2627416997969525

In [40]:
from scipy.stats import norm
norm.cdf(0.42, mu, sigma)

0.011825808327677984
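
The z-score and the probability are two views of the same calculation. The sketch below uses `statistics.NormalDist` from the standard library as a stand-in for `scipy.stats.norm` (a swapped-in tool, not what the cells above use) and shows that both routes give the same tail probability.

```python
import math
from statistics import NormalDist

n, p, observed = 200, 0.5, 0.42
sigma = math.sqrt(p * (1 - p) / n)

# Signed z-score: negative because the observed proportion is below the mean
z = (observed - p) / sigma

# Route 1: standardize first, then use the standard normal
prob_from_z = NormalDist().cdf(z)

# Route 2: use the sampling distribution N(p, sigma) directly
prob_direct = NormalDist(p, sigma).cdf(observed)

print(z, prob_from_z, prob_direct)
```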

## 18.11

### Answers

* a) see below
* b) yes: more than 10 successes and failures; probably independent, but some factors may affect that

In [41]:
# a)
p = 0.7
n = 80

mu = p
sigma = math.sqrt((p * (1 - p)) / n)

[(delta, mu + (delta * sigma)) for delta in range(-3,4)]

[(-3, 0.546295738510606),
 (-2, 0.597530492340404),
 (-1, 0.6487652461702019),
 (0, 0.7),
 (1, 0.751234753829798),
 (2, 0.8024695076595959),
 (3, 0.8537042614893939)]

In [42]:
# b)
[80 * p for p in [.7, .3]]

[56.0, 24.0]

## 18.13

### Answers

* p = 0.12
* n = 170

* a) yes: 
  * assumptions: 
    - independence: +
    - sample size is large enough: (see conditions)
  * conditions:
    - randomization: + (assume yes with respect to this question)
    - 10% condition: + sample size is less than 10% of population
    - success/failure criteria + (20+ success, more failures)
* b) see below
* c) they should expect between 12 and 29 students to be nearsighted (+/- 2SD -- see below)

In [43]:
#b)
import math
p = 0.12
n = 170
mu = p
sigma = math.sqrt((p * (1 - p)) / n)

print("mean: {}".format(mu))
print("sigma: {}".format(sigma))

[(delta, mu + (delta * sigma), n * (mu + (delta * sigma))) for delta in range(-3,4)]

mean: 0.12
sigma: 0.024923412097628914


[(-3, 0.04522976370711325, 7.689059830209253),
 (-2, 0.07015317580474217, 11.92603988680617),
 (-1, 0.09507658790237108, 16.163019943403086),
 (0, 0.12, 20.4),
 (1, 0.1449234120976289, 24.636980056596915),
 (2, 0.16984682419525782, 28.873960113193828),
 (3, 0.19477023629288676, 33.11094016979075)]
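
The success/failure condition in a) and the range in c) can be tied together in one sketch, assuming the same p = 0.12 and n = 170. The variable names are only illustrative.

```python
import math

p, n = 0.12, 170

# Expected counts for the success/failure condition
successes = n * p
failures = n * (1 - p)

# Range of counts within 2 standard deviations of the mean
sigma = math.sqrt(p * (1 - p) / n)
low = n * (p - 2 * sigma)
high = n * (p + 2 * sigma)

print("successes: {:.1f}, failures: {:.1f}".format(successes, failures))
print("expected count range: {:.0f} to {:.0f}".format(low, high))
```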

## 18.15

### Answers

* a) see below
* b)
  - assumptions: independence (yes - maybe?); sample size ok =>
  - conditions: random (yes); sample less than 10% of pop (yes - if pop is _all_ borrowers); success/failure min 10: yes
* c) ~ 5%, see below

In [44]:
p = 0.07
n = 200

mu = p
sigma = math.sqrt((p * (1 - p)) / n)

print("mean: {}".format(mu))
print("std dev: {}".format(sigma))

[(delta, mu + (delta * sigma)) for delta in range(-3,4)]

mean: 0.07
std dev: 0.01804161855266872


[(-3, 0.01587514434199385),
 (-2, 0.03391676289466257),
 (-1, 0.05195838144733129),
 (0, 0.07),
 (1, 0.08804161855266873),
 (2, 0.10608323710533744),
 (3, 0.12412485565800616)]

In [45]:
# c)
from scipy.stats import norm
1 - norm.cdf(.1, mu, sigma)

0.048174037094531386

## 18.17

### Answers

* see below - should expect a return rate between about 70% and 78% (within 2 SDs of the mean);
* conditions met:
  - independent
  - sample size is "correct": < 10% of pop; random; success/failures > 10 ea.
  
  
* [note: the question states that they're random samples, while the answer key states that they're not; question also asks "... we expect to return to _that_ school for their..." -- maybe they left out a detail -- e.g these students are all from a single school?  -- not really clear what the question is asking.]

In [46]:
p = 0.74
n = 400

mu = p
sigma = math.sqrt( (p * (1 - p)) / n  )

print("mean: {}".format(mu))
print("stddev: {}".format(sigma))

[(delta, mu + (delta * sigma)) for delta in range(-3,4)]

mean: 0.74
stddev: 0.02193171219946131


[(-3, 0.674204863401616),
 (-2, 0.6961365756010773),
 (-1, 0.7180682878005387),
 (0, 0.74),
 (1, 0.7619317121994613),
 (2, 0.7838634243989226),
 (3, 0.805795136598384)]

## 18.19

### Answers

* yes - this is more than 7 SDs above the mean

In [47]:
n = 603
p = 0.74  # carried over from the 18.17 cell
mu = p
sigma = math.sqrt( (p * (1 - p)) / n  )

print("mean: {}".format(mu))
print("stddev: {}".format(sigma))

pct = 522 / 603
print("{:%}".format(pct))

(pct - mu) / sigma
#1 - norm.cdf(pct, mu, sigma)

mean: 0.74
stddev: 0.01786256728793726
86.567164%


7.035474787317489

## 18.21

### Answers

* 21.2%
* assuming a large enough voter population, assumptions and conditions are met: independent, random, < 10% of pop, at least 10 s/f

In [48]:
n = 400
p = 0.52

mu = p
sigma = math.sqrt(p * (1 - p) / n)

norm.cdf(.5, mu, sigma)

0.2116698207912216

## 18.23

### Answers

* n = 150
* reject if > 5% blemished
* p = 0.08

* what's the probability that the sample will _not_ be rejected? 8.8%

In [49]:
n = 150 
p = 0.08
mu = p
sigma = math.sqrt( p * (1 - p) / n)

print("mu: {}".format(mu))
print("sigma: {}".format(sigma))

norm.cdf(0.05, mu, sigma)

mu: 0.08
sigma: 0.022150996967781535


0.087813830030137985

## 18.25

### Answers

* p = 0.6
* n = 120
* see below

In [50]:
n = 120
p = 0.6
mu = p
sigma = math.sqrt( p * (1 - p) / n)
print("mu: {}".format(mu))
print("sigma: {}".format(sigma))

norm.ppf(0.9985, mu, sigma) * n

mu: 0.6
sigma: 0.044721359549995794


87.926552977124274

In [51]:
(mu + 3 * sigma) * n

88.09968943799848
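
The two cells above give nearly the same count because 0.9985 is the upper cutoff of the 99.7% rule, which sits close to 3 standard deviations. A small sketch, using the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`:

```python
import math
from statistics import NormalDist

n, p = 120, 0.6
sigma = math.sqrt(p * (1 - p) / n)

# The 99.7% rule leaves 0.15% in each tail
upper_tail = (1 - 0.997) / 2

# Exact z-score for that cutoff; the rule of thumb rounds it to 3
z = NormalDist().inv_cdf(1 - upper_tail)

# Count at the exact cutoff versus the 3-SD rule of thumb
count_exact = (p + z * sigma) * n
count_rule = (p + 3 * sigma) * n

print(z, count_exact, count_rule)
```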

## 18.27

### Answers

* a) normal dist.; centered at mean of population; std dev sqrt(p * (1 - p) / n)
* b) same center; spread narrows
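
How fast the spread narrows follows from the formula in a). The sketch below uses a hypothetical p = 0.3 and hypothetical sample sizes, since the exercise values are not given here; `sd_of_proportion` is an illustrative helper name.

```python
import math

def sd_of_proportion(p, n):
    """Standard deviation of the sampling distribution of a proportion."""
    return math.sqrt(p * (1 - p) / n)

p = 0.3  # hypothetical value, for illustration only

# Each step quadruples the sample size
sds = {n: sd_of_proportion(p, n) for n in [25, 100, 400]}

for n, sd in sds.items():
    print("n: {}, sd: {:.4f}".format(n, sd))
```

Quadrupling n halves the standard deviation, because n sits under a square root.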

## 18.29

### Answers

* a) centered at pop mean, skewed to the right, otherwise, approximately normal
* b) as sample size increases, spread "narrows" and distribution becomes closer to normal, more symmetric, and less skewed
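
A quick simulation can illustrate both answers. This is a sketch under stated assumptions: the population is exponential with mean 1 (a stand-in for any right-skewed distribution, not the one in the exercise), and `skewness` and `sample_means` are illustrative helpers.

```python
import random
import statistics

random.seed(18)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from an exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

results = {}
for n in [2, 10, 50]:
    means = sample_means(n)
    # (center, spread, skewness) of the simulated sampling distribution
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    print(n, results[n])
```

The center stays near the population mean while the spread and the skew both shrink as n grows.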

## 18.31

### Answers

* a) see below
* b) pretty close to the theoretical values
* c) 10 looks to be the first where there's almost no visible skew
* d) the skew

In [52]:
n = 250
mu = 36.33
sigma = 4.019

In [57]:
#a) 

samples = [(n, sigma / math.sqrt(n)) for n in [2, 5, 10, 20]]
for s in samples:
    print("n: {}, mu: {}, sigma: {}".format(s[0], mu, s[1]))

n: 2, mu: 36.33, sigma: 2.8418621535887345
n: 5, mu: 36.33, sigma: 1.797351440314331
n: 10, mu: 36.33, sigma: 1.2709193916216717
n: 20, mu: 36.33, sigma: 0.8986757201571655


## 18.33

### Answers

* mu = 3.4
* sigma = 0.35

* randomly assigned, so assumed independence; each sample is presumably less than 10% of entire population; since unimodal and only slightly skewed, n=25 is probably sufficient

In [62]:
mu = 3.4
sigma = 0.35
n = 25

s_mu = mu
s_sigma = sigma / math.sqrt(n)

[(d, s_mu + d * s_sigma) for d in range(-3, 4)]

[(-3, 3.19),
 (-2, 3.26),
 (-1, 3.33),
 (0, 3.4),
 (1, 3.4699999999999998),
 (2, 3.54),
 (3, 3.61)]

## 18.35

### Answers

* a) as sample size increases (i.e. larger total sales => larger sample size), variance of the sampling distribution decreases, while the sampling mean is centered around the underlying population mean (~ 62.5% payout) -- [not clear on how the NY lottery is making money off this...]
* b) each case within a sample is random, with the same chance across all cases

## 18.37

### Answers

* a) 21.1%
* b) 276.8
* c) N(266, 2.07)
* d) < 0.2%

In [63]:
#a)
mu = 266
std = 16
norm.cdf(280, mu, std) - norm.cdf(270, mu, std)

0.21050672146456562

In [64]:
#b)
norm.ppf(.75, mu, std)

276.79183600313729

In [65]:
#c)
s_mu = mu
s_sigma = std / math.sqrt(60)
print("N({}, {})".format(s_mu, s_sigma))

N(266, 2.065591117977289)


In [67]:
norm.cdf(260, s_mu, s_sigma)

0.0018378060588706117

## 18.39 

### Answers

* a) c-section delivery and naturally occurring premature delivery
* b) a & b - yes; c - no: a & b are questions about the underlying population, while c is a question about a _sampling distribution_ from the underlying population, where the CLT supports use of the normal model

## 18.41

### Answers

* 1,3,5 -> \$0
* 2,4 -> \$1
* 6 -> \$10

* see below

In [21]:
#a)
vmap = { 
    1: 0,
    3: 0,
    5: 0,
    2: 1,
    4: 1,
    6: 10    
}

vweighted = [v * 1/6 for v in vmap.values()]
mu = sum(vweighted)
print("expected value: {}".format(mu))

import math
sigma = math.sqrt(sum([(v - mu) ** 2 for v in vmap.values()]) / 6)
print("std dev: {}".format(sigma))


expected value: 2.0
std dev: 3.605551275463989


In [22]:
#b)
print("expected value: {}".format(2 * mu))
print("std dev: {}".format(2 * sigma / math.sqrt(2)))

expected value: 4.0
std dev: 5.0990195135927845


In [23]:
#c)
mean = 40 * mu
stddev = 40 * sigma / math.sqrt(40)

print("mean: {}".format(mean))
print("stddev: {}".format(stddev))

from scipy.stats import norm
1 - norm.cdf(100, mean, stddev)

mean: 80.0
stddev: 22.80350850198276


0.19022756262519425
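
The scaling used in b), where the standard deviation of a sum of two rolls is sigma * sqrt(2), can be checked by enumerating all 36 outcomes. A minimal sketch, assuming the payouts listed above and independent rolls:

```python
import itertools
import math

# Payout for faces 1 through 6
payouts = [0, 1, 0, 1, 0, 10]

# Total winnings for every equally likely pair of rolls
totals = [a + b for a, b in itertools.product(payouts, repeat=2)]

mean_two = sum(totals) / len(totals)
sd_two = math.sqrt(sum((t - mean_two) ** 2 for t in totals) / len(totals))

print("expected value: {}".format(mean_two))
print("std dev: {}".format(sd_two))
```

The enumeration agrees with the formula-based values in the b) cell.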

## 18.43

### Answers

* a) see below
* b) no - bimodal, non-normal distribution
* c) approx. normal, mean @ 2.859, std = .209

In [31]:
#a)
vmap = {
    5: 12.6,
    4: 22.2,
    3: 25.3,
    2: 18.3,
    1: 21.6
}

vweighted = [v * p / 100 for (v,p) in vmap.items()]
mu = sum(vweighted) 
print("expected value: {}".format(mu))

import math
sigma = math.sqrt(sum([((v - mu) ** 2) * p / 100 for (v,p) in vmap.items()]))
print("std dev: {}".format(sigma))


expected value: 2.859
std dev: 1.3240540019198612


In [33]:
#c)
sigma / math.sqrt(40)

0.20935131955638586

## 18.45

### Answers

* 19.9% (see below)

In [36]:
# mu and sigma carry over from the 18.43 cells
s_mu = mu
s_sigma = sigma / math.sqrt(63)

print("mean: {}".format(s_mu))
print("stddev: {}".format(s_sigma))

1 - norm.cdf(3, s_mu, s_sigma)


mean: 2.859
stddev: 0.1668151243571329


0.19898644513207286

## 18.47

### Answers

* a) mean: 2.9, sd: 0.0447
* b) 1.27%
* c) 2.97

In [37]:
#a)
mean = 2.9
sd = 0.4

y_bar = mean
s = sd / math.sqrt(80)

print("mean: {}".format(y_bar))
print("sd: {}".format(s))

mean: 2.9
sd: 0.044721359549995794


In [41]:
#b)
norm.cdf(3.1, y_bar, s) - norm.cdf(3.0, y_bar, s)

0.012669787230518592

In [39]:
#c)
norm.ppf(0.95, y_bar, s)

2.9735600904580113

## 18.49

### Answers

* a) distribution is not normal, so can't use normal to estimate probability
* b) yes: because the sampling distribution will be approximately normal; sample size might be too small, though
* c) no; prob is less than 1% : see below

In [47]:
mean = 9.6
sd = 5.4
n = 10

y_bar = mean
s = sd / math.sqrt(n)

print("probability: {}".format(1 - norm.cdf(15, y_bar, s)))
print("that would be {} sd's above the mean".format((15 - 9.6) / s))

probability: 0.0007827011290012509
that would be 3.1622776601683795 sd's above the mean


## 18.51

### Answers

* a) the probability that the total across 40 is >= \$500 is the same as the probability that the mean is >= \$500 / 40 = \$12.50; see below
* b) \$427.77

In [50]:
#a) 
check = 500 / 40

s = sd / math.sqrt(40)

1 - norm.cdf(check, mean, s)

0.00034124228767662412

In [52]:
#b)
norm.ppf(.90, mean, s) * 40

427.76831636961538
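
The equivalence claimed in a) can be verified by working on both scales. This sketch assumes the mean of 9.6 and SD of 5.4 carried over from 18.49, and uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`.

```python
import math
from statistics import NormalDist

mean, sd, n = 9.6, 5.4, 40  # mean and sd carried over from 18.49

# Scale of the sample mean: N(mean, sd / sqrt(n))
via_mean = 1 - NormalDist(mean, sd / math.sqrt(n)).cdf(500 / n)

# Scale of the total: N(n * mean, sd * sqrt(n))
via_total = 1 - NormalDist(n * mean, sd * math.sqrt(n)).cdf(500)

print(via_mean, via_total)
```

Both routes standardize to the same z-score, so the probabilities match.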

## 18.53 

### Answers

* see below

In [53]:
me = 130
sde = 8
mw = 120
sdw = 10

In [54]:
#a)
1 - norm.cdf(125, me, sde)

0.73401447095129946

In [56]:
#b)
# We select a student at random from each school. 
# Find the probability that the East State student’s IQ is at least 5 points higher than the West State student’s IQ.

mean_diff = me - mw
sd_diff = math.sqrt(sde ** 2 + sdw ** 2)

1 - norm.cdf(5, mean_diff, sd_diff)

0.65189232437813005

In [57]:
#c) 
# We select 3 West State students at random.  
# Find the probability that this group’s average IQ is at least 125 points.

1 - norm.cdf(125, mw, sdw / math.sqrt(3))

0.19323811538561642

In [58]:
#d)
# We also select 3 East State students at random. 
# What’s the probability that their average IQ is at least 5 points higher than the average for the 3 West Staters?

sd_diff = math.sqrt(
  (sde / math.sqrt(3)) ** 2 +
  (sdw / math.sqrt(3)) ** 2
)

1 - norm.cdf(5, mean_diff, sd_diff)

0.75055974046294038
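
Parts b) and d) both rely on the rule that variances of independent normal variables add. As a cross-check, `statistics.NormalDist` from the standard library (used here instead of `scipy.stats.norm`) supports subtracting two independent normal distributions directly:

```python
import math
from statistics import NormalDist

east = NormalDist(130, 8)
west = NormalDist(120, 10)

# b) difference of two independent individuals: variances add
diff = east - west
p_b = 1 - diff.cdf(5)

# d) difference of two independent sample means, 3 students each
east_mean3 = NormalDist(130, 8 / math.sqrt(3))
west_mean3 = NormalDist(120, 10 / math.sqrt(3))
diff_means = east_mean3 - west_mean3
p_d = 1 - diff_means.cdf(5)

print(diff.mean, diff.stdev, p_b)
print(diff_means.mean, diff_means.stdev, p_d)
```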