# Hypothesis Testing Exercises

In [None]:
from pyreadr import read_r
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest, proportion_confint

## 1) Radon Detectors

A sample of 12 radon detectors of a certain type was selected, and each was exposed to 100 pCi/L of radon.
The resulting readings are given in the accompanying data. (Data: ex08.32)

**Does this data suggest that the population mean reading under these conditions differs from 100?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

In [93]:
df = read_r('../data/devore7/ex08.32.rda')['ex08.32']
df.head()

| rownames | C1 |
|---|---|
| 1 | 105.6 |
| 2 | 90.9 |
| 3 | 91.2 |
| 4 | 96.9 |
| 5 | 96.5 |


* $H_0: \mu = 100$
* $H_A: \mu \neq 100$

In [70]:
stats.ttest_1samp(df['C1'], popmean=100, alternative='two-sided')

TtestResult(statistic=-0.9213828271018268, pvalue=0.37661608746499975, df=11)
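The decision rule is the same in every exercise: reject $H_0$ when the p-value is below $\alpha$. A minimal sketch of that rule, paired with the type of error each decision risks; the helper name `decide` is hypothetical and not part of SciPy:

```python
def decide(pvalue: float, alpha: float) -> tuple[str, str]:
    """Return the test decision and the type of error that decision could be."""
    if pvalue < alpha:
        # Rejecting H0 is a mistake only if H0 is actually true
        return "reject H0", "Type I error (rejecting a true H0)"
    # Not rejecting H0 is a mistake only if H0 is actually false
    return "fail to reject H0", "Type II error (not rejecting a false H0)"


# p-value printed by stats.ttest_1samp for the radon detectors (alpha = .05)
decision, possible_error = decide(0.37661608746499975, alpha=0.05)
print(decision, "|", possible_error)
```

For the radon data this gives "fail to reject H0": at the 5% level the sample gives no evidence that the mean reading differs from 100, and the only error that could have been made is a Type II error.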

## 8) Effect of Temperature

An experiment to determine the effects of temperature on the survival of insect eggs was described in the article “Development Rates and a Temperature-Dependent Model of Pales Weevil” (Environ. Entomology, 1987: 956–962).
At 11°C, 73 of 91 eggs survived to the next stage of development.
At 30°C, 102 of 110 eggs survived.

**Do the results of this experiment suggest that the survival rate (proportion surviving in the population) is higher at 30°C?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \pi_{11} \geq \pi_{30}$
* $H_A: \pi_{11} < \pi_{30}$

In [71]:
# proportions_ztest is already imported in the first cell
proportions_ztest(
    count=[73, 102],
    nobs=[91, 110],
    alternative='smaller',
)

(-2.630144568163633, 0.004267428251156687)
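For two samples `proportions_ztest` uses the pooled proportion under $H_0$: $z = (\hat p_1 - \hat p_2)/\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}$ with $\hat p = (x_1 + x_2)/(n_1 + n_2)$. A short sketch that reproduces the printed result by hand:

```python
import math
from scipy import stats

x11, n11 = 73, 91    # survivors / eggs at 11 °C
x30, n30 = 102, 110  # survivors / eggs at 30 °C

p11, p30 = x11 / n11, x30 / n30
p_pooled = (x11 + x30) / (n11 + n30)  # common proportion assumed under H0

se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n11 + 1 / n30))
z = (p11 - p30) / se

# alternative='smaller' (H_A: pi_11 < pi_30) -> lower-tail p-value
p_value = stats.norm.cdf(z)
print(round(z, 4), round(p_value, 5))
```

Since p ≈ 0.0043 < .05, $H_0$ is rejected: the data suggest a higher survival rate at 30°C, and a Type I error is the error that could have been made.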

## 6) Robots

Scientists think that robots will play a crucial role in factories in the next several decades.
Suppose that in an experiment to determine whether the use of robots to weave computer cables is feasible,
a robot was used to assemble 500 cables.
The cables were examined and there were 10 defectives.

**If human assemblers have a defect rate of .035 (3.5%), does this data support the hypothesis that the proportion of defectives is lower for robots than for humans?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \pi \geq 0.035$
* $H_A: \pi < 0.035$

In [72]:
proportions_ztest(
    count=10,
    nobs=500,
    value=0.035,
    alternative='smaller'
)

(-2.3957871187497752, 0.008292359711399333)
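By default `proportions_ztest` estimates the standard error from the sample proportion ($\hat p = 10/500 = 0.02$). Many textbooks instead use the hypothesized value $\pi_0 = 0.035$ in the standard error, $\sqrt{\pi_0(1-\pi_0)/n}$; the `prop_var` argument selects that variant. A sketch comparing both:

```python
from statsmodels.stats.proportion import proportions_ztest

count, nobs, pi0 = 10, 500, 0.035

# Standard error from the sample proportion (the default, prop_var=False)
z_sample, p_sample = proportions_ztest(count, nobs, value=pi0, alternative='smaller')

# Standard error from the null value: sqrt(pi0 * (1 - pi0) / n)
z_null, p_null = proportions_ztest(
    count, nobs, value=pi0, alternative='smaller', prop_var=pi0
)

print(z_sample, p_sample)
print(z_null, p_null)
```

The null-variance version gives a smaller |z| (about -1.83, p ≈ 0.034), but both p-values are below .05, so $H_0$ is rejected either way: the data support a lower defect rate for the robot, with a Type I error as the possible mistake.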

## 4) Drywall

With domestic sources of building supplies running low several years ago, roughly 60,000 homes were built with imported Chinese drywall.
According to the article “Report Links Chinese Drywall to Home Problems” (New York Times, Nov. 24, 2009),
federal investigators identified a strong association between chemicals in the drywall and electrical problems,
and there is also strong evidence of respiratory difficulties due to the emission of hydrogen sulfide gas.
An extensive examination of 51 homes found that 41 had such problems.
Suppose these 51 were randomly sampled from the population of all homes having Chinese drywall.

**Does the data provide strong evidence for concluding that more than 50% of all homes with Chinese drywall have electrical/environmental problems?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .01$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

Extra: Calculate a confidence interval using a confidence level of 99% for the percentage of all such homes that have electrical/environmental problems.

* $H_0: \pi \leq 0.5$
* $H_A: \pi > 0.5$

In [73]:
proportions_ztest(count=41, nobs=51, value=0.5, alternative='larger')

(5.466695171450571, 2.292517684598636e-08)
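For the Extra, `proportion_confint` (already imported in the first cell) returns the interval bounds; `alpha=0.01` corresponds to 99% confidence. A sketch showing the normal-approximation (Wald) interval, which matches the z-test above, next to the Wilson score interval as a common alternative:

```python
from statsmodels.stats.proportion import proportion_confint

count, nobs = 41, 51

# 99% Wald (normal-approximation) interval: p_hat ± z_0.995 * sqrt(p_hat (1 - p_hat) / n)
low, high = proportion_confint(count=count, nobs=nobs, alpha=0.01, method='normal')

# 99% Wilson score interval, often preferred for moderate n
low_w, high_w = proportion_confint(count=count, nobs=nobs, alpha=0.01, method='wilson')

print(f"Wald:   {low:.3f} to {high:.3f}")
print(f"Wilson: {low_w:.3f} to {high_w:.3f}")
```

The Wald interval is roughly 66% to 95%. Both intervals lie entirely above 50%, which agrees with rejecting $H_0$ at $\alpha = .01$ (p ≈ 2.3e-8).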

## 5) Sweetgum Lumber

The article “Development of Novel Industrial Laminated Planks from Sweetgum Lumber” (J. of Bridge Engr., 2008: 64–66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.
The data set contains the modulus of elasticity obtained 1 minute after loading in a certain configuration and also 4 weeks after loading for the same lumber specimens. (Data: ex09.44)

**Does the data provide strong evidence for concluding that on average the 4-week modulus is lower than the 1-minute modulus?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

Extra: Calculate and interpret a 95%-confidence interval for the true average difference between 1-minute modulus and 4-week modulus.

In [74]:
df = read_r('../data/devore7/ex09.44.rda')['ex09.44']
df['diff'] = df['X1min'] - df['X4weeks']
df.head()

| rownames | X1min | X4weeks | diff |
|---|---|---|---|
| 1 | 10490 | 9110 | 1380 |
| 2 | 16620 | 13250 | 3370 |
| 3 | 17300 | 14720 | 2580 |
| 4 | 15480 | 12740 | 2740 |
| 5 | 12970 | 10120 | 2850 |


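With $diff$ = 1-minute modulus minus 4-week modulus, a lower 4-week modulus means a positive mean difference:

* $H_0: \mu_{diff} \leq 0$
* $H_A: \mu_{diff} > 0$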

In [75]:
stats.ttest_1samp(df['diff'], popmean=0, alternative='greater')

TtestResult(statistic=20.726642571925268, pvalue=9.400604048000547e-13, df=15)

In [77]:
# 95% t-interval for the mean difference: mean ± t(0.975, n-1) * SE
stats.t.interval(0.95, df=len(df) - 1, loc=df['diff'].mean(), scale=df['diff'].sem())

(2364.587291455535, 2906.662708544465)
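The interval is $\bar d \pm t_{0.975,\,15}\cdot SE(\bar d)$, so its midpoint and half-width recover $\bar d$ and the standard error, and $\bar d / SE$ should reproduce the t statistic of the paired test. A sketch of that consistency check that uses only the printed numbers:

```python
from scipy import stats

low, high = 2364.587291455535, 2906.662708544465  # printed 95% CI
n = 16                                            # df=15 in the t-test output

d_bar = (low + high) / 2               # sample mean difference (1 min - 4 weeks)
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-sided 95% critical value
se = (high - low) / 2 / t_crit         # estimated standard error of d_bar

t_stat = d_bar / se                    # should match the printed statistic
print(d_bar, se, t_stat)
```

Because the whole interval lies above 0, we can be 95% confident that the true average 4-week modulus is between roughly 2365 and 2907 lower than the 1-minute modulus, consistent with rejecting $H_0$.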

## 3) Organic Matter in Soil

A random sample of soil specimens was obtained, and the amount of organic matter (%) in the soil was determined for each specimen, resulting in the accompanying data (from “Engineering Properties of Soil,” Soil Science, 1998: 93–102). (Data: ex08.54)

**Does this data suggest that the true average percentage of organic matter in such soil is something other than 3%?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .10$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

Extra: Calculate the sample mean, sample standard deviation, and (estimated) standard error of the mean.

Extra: Would your conclusion be different if $\alpha = .05$ had been used?

In [78]:
df = read_r('../data/devore7/ex08.54.rda')['ex08.54']
df.head()

| rownames | percorg |
|---|---|
| 1 | 1.1 |
| 2 | 5.09 |
| 3 | 0.97 |
| 4 | 1.59 |
| 5 | 4.6 |


In [79]:
# Sample mean, sample standard deviation (ddof=1), and standard error of the mean
mean = df['percorg'].mean()
std = df['percorg'].std()
sem = df['percorg'].sem()
mean, std, sem

(2.481333333333333, 1.615640650839065, 0.2949742764289613)

* $H_0: \mu = 3$
* $H_A: \mu \neq 3$

In [80]:
# Two-sided one-sample t-test against mu_0 = 3
stats.ttest_1samp(df['percorg'], popmean=3, alternative='two-sided')

TtestResult(statistic=-1.7583454155588971, pvalue=0.08923961541442524, df=29)
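The t statistic is $t = (\bar x - \mu_0)/(s/\sqrt{n})$, so it can be rebuilt from the summary statistics computed above (n = 30, since df = 29). A sketch that does this and then applies both significance levels from the exercise:

```python
import math
from scipy import stats

# Summary statistics printed above (n = 30 specimens, df = 29)
mean, std, n, mu0 = 2.481333333333333, 1.615640650839065, 30, 3.0

sem = std / math.sqrt(n)                          # estimated standard error of the mean
t_stat = (mean - mu0) / sem                       # one-sample t statistic
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # two-sided p-value

for alpha in (0.10, 0.05):
    verdict = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha={alpha:.2f}: p={p_value:.4f} -> {verdict}")
```

With p ≈ 0.089, $H_0$ is rejected at $\alpha = .10$ (possible Type I error) but not at $\alpha = .05$ (possible Type II error), so the conclusion does depend on the chosen level.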

## 2) Food Contamination

Recent incidents of food contamination have caused great concern among consumers.
The article “How Safe Is That Chicken?” (Consumer Reports, Jan. 2010: 19–23) reported that 35 of 80 randomly selected Perdue brand broilers tested positively for either campylobacter or salmonella (or both),
the leading bacterial causes of food-borne disease,
whereas 66 of 80 Tyson brand broilers tested positive.

**Does it appear that the true proportion of non-contaminated Perdue broilers differs from that for the Tyson brand?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .01$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \pi_{Perdue} = \pi_{Tyson}$
* $H_A: \pi_{Perdue} \neq \pi_{Tyson}$

In [81]:
proportions_ztest(count=[35, 66], nobs=[80, 80], alternative='two-sided')

(-5.079664071409531, 3.7810285696151036e-07)
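The question asks about the *non-contaminated* proportions, while the test above uses the contaminated counts. Since the non-contaminated proportion is $1 - \pi$, both formulations give the same two-sided p-value and only the sign of $z$ flips. A quick check:

```python
from statsmodels.stats.proportion import proportions_ztest

nobs = [80, 80]                          # Perdue, Tyson
contaminated = [35, 66]
non_contaminated = [80 - 35, 80 - 66]    # 45 and 14

z_c, p_c = proportions_ztest(contaminated, nobs, alternative='two-sided')
z_n, p_n = proportions_ztest(non_contaminated, nobs, alternative='two-sided')
print(z_c, p_c)
print(z_n, p_n)
```

With p ≈ 3.8e-7 < .01, $H_0$ is rejected: the proportion of non-contaminated broilers (45/80 ≈ 56% for Perdue vs. 14/80 ≈ 18% for Tyson) differs between the brands, and a Type I error is the possible mistake.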

## 5) Soil Heat

The article “Orchard Floor Management Utilizing Soil-Applied Coal Dust for Frost Protection” (Agri. and Forest Meteorology, 1988: 71–82) reports the following values for soil heat flux of eight plots covered with coal dust. (Data: ex08.66)
The mean soil heat flux for plots covered only with grass is 29.0.

**Assuming that the heat-flux distribution is approximately normal, does the data suggest that the coal dust is effective in increasing the mean heat flux over that for grass?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \mu \leq 29.0$
* $H_A: \mu > 29.0$

In [82]:
df = read_r('../data/devore7/ex08.66.rda')['ex08.66']
stats.ttest_1samp(df['SoilHeat'], popmean=29, alternative='greater')

TtestResult(statistic=0.7742408478324565, pvalue=0.2320653906988781, df=7)
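Equivalently, the decision can be made with a rejection region: for an upper-tailed test, reject $H_0$ when $t > t_{\alpha,\,n-1}$. A sketch using the printed statistic (n = 8 plots, df = 7):

```python
from scipy import stats

t_stat, df_, alpha = 0.7742408478324565, 7, 0.05   # from the output above

# Upper-tail critical value for H_A: mu > 29.0
t_crit = stats.t.ppf(1 - alpha, df=df_)
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}")
print("reject H0" if t_stat > t_crit else "fail to reject H0")
```

Since 0.774 is well below the critical value of about 1.895, $H_0$ is not rejected: the data do not show that coal dust increases the mean heat flux over 29.0, and a Type II error is possible.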

## 2) Batteries

A manufacturer of nickel-hydrogen batteries randomly selects 100 nickel plates for test cells, cycles them a specified number of times, and determines that 14 of the plates have blistered. 

**Does this provide compelling evidence for concluding that more than 10% of all plates blister under such circumstances?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \pi \leq 10\%$
* $H_A: \pi > 10\%$

In [83]:
proportions_ztest(count=14, nobs=100, value=0.1, alternative='larger')

(1.1527808354084703, 0.12450017622604997)
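Here $n\pi_0 = 10$, so the normal approximation is only borderline; an exact binomial test is a simple cross-check. A sketch with `scipy.stats.binomtest` (available in SciPy 1.7 and later):

```python
from scipy import stats

# Exact one-sided test of H0: pi <= 0.10 vs H_A: pi > 0.10
result = stats.binomtest(k=14, n=100, p=0.10, alternative='greater')
print(result.pvalue)

# The same quantity directly: P(X >= 14) for X ~ Binomial(100, 0.10)
print(stats.binom.sf(13, 100, 0.10))
```

Like the z-test, the exact p-value is well above .05, so $H_0$ is not rejected: the data do not provide compelling evidence that more than 10% of plates blister, and a Type II error is possible.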

## 3) Bearings

The derailment of a freight train due to the catastrophic failure of a traction motor armature bearing provided the impetus for a study reported in the article “Locomotive Traction Motor Armature Bearing Life Study” (Lubrication Engr., Aug. 1997: 12–19).
A sample of 17 high-mileage traction motors was selected,
and the amount of cone penetration (mm/10) was determined both for the pinion bearing and for the commutator armature bearing, resulting in the accompanying data. (Data: ex09.72)

**Does the population mean penetration differ for the two types of bearings?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

Extra: Calculate a 95% confidence interval estimate of the population mean difference between penetration for the commutator armature bearing and penetration for the pinion bearing.

In [84]:
df = read_r('../data/devore7/ex09.72.rda')['ex09.72']
df.head()

| rownames | Motor | Commutator | Pinion |
|---|---|---|---|
| 1 | 1 | 211 | 226 |
| 2 | 2 | 273 | 278 |
| 3 | 3 | 305 | 259 |
| 4 | 4 | 258 | 244 |
| 5 | 5 | 270 | 273 |


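Both penetrations were measured on the same 17 motors, so the observations are paired: the `ttest_ind` call below treats the two columns as independent samples (hence `df=32.0`), whereas a paired test such as `stats.ttest_rel(df['Commutator'], df['Pinion'])` works with the per-motor differences $d = C - P$.

* $H_0: \mu_C = \mu_P$ (equivalently $\mu_d = 0$)
* $H_A: \mu_C \ne \mu_P$ (equivalently $\mu_d \ne 0$)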

In [85]:
stats.ttest_ind(df['Commutator'], df['Pinion'], alternative='two-sided')

TtestResult(statistic=-0.4140944939853426, pvalue=0.681567462952498, df=32.0)
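A sketch of the paired analysis and of the Extra's 95% interval for the mean difference (commutator minus pinion). It uses only the five rows printed by `df.head()` above, so its numbers are illustrative and will not match the full 17-motor data; with the full data, substitute `df['Commutator']` and `df['Pinion']`.

```python
import numpy as np
from scipy import stats

# Only the five rows printed by df.head() above, NOT the full 17-motor sample
commutator = np.array([211, 273, 305, 258, 270])
pinion = np.array([226, 278, 259, 244, 273])

diff = commutator - pinion  # one paired difference per motor

# Paired t-test; identical to a one-sample t-test of the differences against 0
paired = stats.ttest_rel(commutator, pinion, alternative='two-sided')
one_sample = stats.ttest_1samp(diff, popmean=0, alternative='two-sided')

# 95% confidence interval for the mean difference (commutator - pinion)
ci = stats.t.interval(0.95, diff.size - 1, loc=diff.mean(), scale=stats.sem(diff))
print(paired.statistic, paired.pvalue, ci)
```

The paired test has $n - 1$ degrees of freedom (17 - 1 = 16 for the full data) instead of the $2n - 2 = 32$ of the independent-samples test, and it removes the motor-to-motor variation from the comparison.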

## 6) Survey Response Rate

It is thought that the front cover and the nature of the first question on mail surveys influence the response rate.
The article “The Impact of Cover Design and First Questions on Response Rates for a Mail Survey of Skydivers” (Leisure Sciences, 1991: 67–76) tested this theory by experimenting with different cover designs.
One cover was plain; the other used a picture of a skydiver.
The researchers speculated that the return rate would be lower for the plain cover.

| Cover    | Number Sent | Number Returned |
|----------|-------------|-----------------|
| Plain    | 207 | 104 |
| Skydiver | 213 | 109 |

**Does this data support the researchers’ hypothesis?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \pi_P \geq \pi_S$
* $H_A: \pi_P < \pi_S$

In [86]:
proportions_ztest(count=[104, 109], nobs=[207, 213], alternative='smaller')

(-0.19103657276130578, 0.42424846993179954)

## 7) Flame Time

The accompanying observations on residual flame time (sec) for strips of treated children’s nightwear were given in the article “An Introduction to Some Precision and Accuracy of Measurement Problems” (J. of Testing and Eval., 1982: 132–140).
Suppose a true average flame time of at most 9.75 had been mandated. (Data: ex08.70)

**Does the data suggest that this condition has not been met?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

In [87]:
df = read_r('../data/devore7/ex08.70.rda')['ex08.70']
df.head()

| rownames | time |
|---|---|
| 1 | 9.85 |
| 2 | 9.93 |
| 3 | 9.75 |
| 4 | 9.77 |
| 5 | 9.67 |


* $H_0: \mu \leq 9.75$
* $H_A: \mu > 9.75$

In [88]:
stats.ttest_1samp(df['time'], popmean=9.75, alternative='greater')

TtestResult(statistic=4.7523152326745155, pvalue=6.928587717491128e-05, df=19)

## 1) Cement Mortar

An experiment compared the tension bond strength of polymer latex modified mortar (Portland cement mortar to which polymer latex emulsions have been added during mixing) to that of unmodified mortar.

Generate similar data with:

In [89]:
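# Note: reusing random_state=42 in both calls makes the two samples share the same
# underlying standard-normal draws, so they are not truly independent samples.
unmodified = stats.norm.rvs(size=32, loc=16.8, scale=1.4, random_state=42)
modified = stats.norm.rvs(size=40, loc=18, scale=1.6, random_state=42)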

Assume that the bond strength distributions are both normal.

**Does the data provide compelling evidence to conclude that the modified mortar on average has a higher bond strength than the unmodified mortar?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .01$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

* $H_0: \mu_{mod} \leq \mu_{unmod}$
* $H_A: \mu_{mod} > \mu_{unmod}$

In [90]:
stats.ttest_ind(modified, unmodified, alternative='greater')

TtestResult(statistic=3.0536735228662875, pvalue=0.0015977456674382724, df=70.0)
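`ttest_ind` pools the two variances by default (`equal_var=True`). Because the mortars are simulated with different spreads (1.4 vs. 1.6), Welch's test (`equal_var=False`) is a reasonable alternative. The sketch below draws both samples from a single `numpy` Generator so that they are independent, and checks SciPy's result against the Welch formula $t = (\bar x_1 - \bar x_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$ with Welch–Satterthwaite degrees of freedom. The seed and generator are assumptions, so the numbers differ from the cell above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # one generator -> independent samples
unmodified = rng.normal(loc=16.8, scale=1.4, size=32)
modified = rng.normal(loc=18.0, scale=1.6, size=40)

# Welch's t-test, H_A: mu_mod > mu_unmod
welch = stats.ttest_ind(modified, unmodified, equal_var=False, alternative='greater')

# Same statistic and p-value from the Welch formula and Welch-Satterthwaite df
v1 = modified.var(ddof=1) / modified.size
v2 = unmodified.var(ddof=1) / unmodified.size
t_manual = (modified.mean() - unmodified.mean()) / np.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (modified.size - 1) + v2**2 / (unmodified.size - 1))
p_manual = stats.t.sf(t_manual, df_welch)

print(welch.statistic, welch.pvalue)
```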

## 7) Arsenic in Water

Arsenic is a known carcinogen and poison.
The standard laboratory procedures for measuring arsenic concentration (μg/L) in water are expensive.
A new relatively quick and inexpensive field laboratory method has been introduced.
See the article “Evaluation of a New Field Measurement Method for Arsenic in Drinking Water Samples” (J. of Envir. Engr., 2008: 382–388).
Suppose the arsenic concentration was measured with two methods.

Generate exercise data with:

In [91]:
# No random_state is set, so the data (and the test results) change on every run
method1 = stats.norm.rvs(size=20, loc=19.70, scale=1.1)
method2 = stats.norm.rvs(size=20, loc=19.70, scale=1.1)

**Is there a significant difference between the means of arsenic concentration measured by method1 and method2?**

* Write the relevant hypotheses, select the applicable test and write the relevant Python code.
* Mark your decision regarding the null hypothesis ($\alpha = .05$) and write a conclusion with regard to the original question.
* Which type of error could you have made with your decision?

**Extra:** Is this an appropriate way of comparing the two methods?

* $H_0: \mu_1 = \mu_2$
* $H_A: \mu_1 \ne \mu_2$

In [92]:
# Two-sample t-test of method1 against method2
stats.ttest_ind(method1, method2, alternative='two-sided')

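No, this is **not** an appropriate way of comparing the methods.
Assuming each water sample is measured with both methods, the observations are paired, so a valid comparison tests the mean of the pairwise differences (a paired t-test) instead of the difference between two independent sample means, i.e.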

* $H_0: \mu_{diff} = 0$
* $H_A: \mu_{diff} \ne 0$
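A sketch of why pairing matters, on hypothetical deterministic data (not from the article): 20 water samples with widely varying true concentrations, where method 2 reads about 0.5 μg/L higher than method 1 on every sample. The independent-samples test is swamped by the sample-to-sample variation, while the paired test on the differences detects the shift.

```python
import numpy as np
from scipy import stats

# Hypothetical data: true concentrations of 20 water samples (ug/L)
true_conc = np.linspace(5, 40, 20)
method1 = true_conc
# method2 reads ~0.5 higher, with a small deterministic wobble
method2 = true_conc + 0.5 + 0.1 * np.sin(np.arange(20))

independent = stats.ttest_ind(method1, method2, alternative='two-sided')
paired = stats.ttest_rel(method1, method2, alternative='two-sided')

print(f"independent samples: p = {independent.pvalue:.3f}")
print(f"paired differences:  p = {paired.pvalue:.2e}")
```

In the paired test the large differences between water samples cancel out, leaving only the small spread of the method differences in the standard error.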