# Tools and Methods of Data Analysis
## Session 8 - Part 2

Niels Hoppe <niels.hoppe.extern@srh.de>

In [1]:
from pyreadr import read_r
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

## 1) Cement Mortar

An experiment to compare the tension bond strength of polymer latex modified mortar (Portland cement mortar to which polymer latex emulsions have been added during mixing) to that of unmodified mortar resulted in 

Generate similar data by:

In [2]:
modified = stats.norm.rvs(size=40, loc=18, scale=1.6)
unmodified = stats.norm.rvs(size=32, loc=16.8, scale=1.4)

Assume that the bond strength distributions are both normal.

1. Test $H_0: \mu_{mod} \leq \mu_{unmod}$ versus $H_1: \mu_{mod} > \mu_{unmod}$ at a significance level of .01.
2. Interpret your test result and the possible test error type.

In [3]:
stat, pval = stats.ttest_ind(modified, unmodified, alternative='greater')
pval

1.3193987019000564e-07
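For part 2, the decision can be made explicit by comparing the p-value with the significance level. The following is a minimal sketch; the seeds, the names `alpha` and `reject_h0`, and the additional Welch variant (`equal_var=False`) are assumptions added for illustration and are not part of the original exercise.

```python
from scipy import stats

alpha = 0.01  # significance level from the exercise

# Seeded draws so the sketch is reproducible (the seeds are arbitrary choices).
modified = stats.norm.rvs(size=40, loc=18, scale=1.6, random_state=1)
unmodified = stats.norm.rvs(size=32, loc=16.8, scale=1.4, random_state=2)

# Pooled-variance t-test, as in the cell above.
_, pval_pooled = stats.ttest_ind(modified, unmodified, alternative='greater')

# Welch's t-test drops the equal-variance assumption (shown for comparison).
_, pval_welch = stats.ttest_ind(modified, unmodified, equal_var=False,
                                alternative='greater')

reject_h0 = pval_pooled < alpha
print(f"pooled p = {pval_pooled:.3g}, Welch p = {pval_welch:.3g}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

If $H_0$ is rejected, the only error that could have been made is a type I error (rejecting a true $H_0$), and its probability is bounded by the significance level. If $H_0$ is not rejected, the possible error is a type II error.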

## 2) Food Contamination

Recent incidents of food contamination have caused great concern among consumers.
The article “How Safe Is That Chicken?” (Consumer Reports, Jan. 2010: 19–23) reported that 35 of 80 randomly selected Perdue brand broilers tested positive for campylobacter, salmonella, or both,
the leading bacterial causes of food-borne disease,
whereas 66 of 80 Tyson brand broilers tested positive.

Does it appear that the true proportion of non-contaminated Perdue broilers differs from that for the Tyson brand?
Carry out a test of hypotheses using a significance level .01.

In [4]:
stat, pval = proportions_ztest(count=[35, 66], nobs=[80, 80])
pval

3.7810285696151036e-07
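To see where this p-value comes from, the pooled two-proportion z-test can be computed by hand. This is a sketch using only the standard library; the variable names are chosen here for illustration.

```python
from math import erfc, sqrt

x1, n1 = 35, 80  # Perdue: positive tests, sample size
x2, n2 = 66, 80  # Tyson: positive tests, sample size

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 = p2

# Standard error of p1 - p2 under H0, then the z statistic.
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
p_value = erfc(abs(z) / sqrt(2))
print(z, p_value)
```

Because the test is two-sided, using the non-contaminated counts (45 and 14) instead of the contaminated ones only flips the sign of `z` and leaves the p-value unchanged.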

## 3) Bearings

The derailment of a freight train due to the catastrophic failure of a traction motor armature bearing provided the impetus for a study reported in the article “Locomotive Traction Motor Armature Bearing Life Study” (Lubrication Engr., Aug. 1997: 12–19).
A sample of 17 high-mileage traction motors was selected,
and the amount of cone penetration (mm/10) was determined both for the pinion bearing and for the commutator armature bearing, resulting in the given data. (Data: ex09.72)

Calculate a 95% confidence interval estimate of the population mean difference between penetration for the commutator armature bearing and penetration for the pinion bearing.

Does the population mean penetration differ for the two types of bearings? (α = .05)

In [5]:
data = read_r('../data/devore7/ex09.72.rda')
df = data['ex09.72']

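# Caution: ttest_ind treats the two columns as independent samples.
# Both bearings were measured on the same 17 motors, so the data are
# paired and stats.ttest_rel is the matching test for this design.
stat, pval = stats.ttest_ind(df['Commutator'], df['Pinion'])
pval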

0.681567462952498
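The exercise also asks for a 95% confidence interval of the mean difference. Since both bearings are measured on the same motors, a paired analysis fits the design. The sketch below uses hypothetical, randomly generated values (not the ex09.72 data) so that it runs without the data file; the names `diff`, `low`, and `high` are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements for 17 motors (NOT the ex09.72 data).
rng = np.random.default_rng(0)
pinion = rng.normal(loc=80, scale=10, size=17)
commutator = pinion + rng.normal(loc=-2, scale=6, size=17)

# Work on the per-motor differences (commutator minus pinion).
diff = commutator - pinion
n = diff.size
mean_diff = diff.mean()
sem = diff.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference

# 95% t-interval for the mean difference with n - 1 degrees of freedom.
low, high = stats.t.interval(0.95, n - 1, loc=mean_diff, scale=sem)

# Paired t-test on the same data.
stat, pval = stats.ttest_rel(commutator, pinion)
print((low, high), pval)
```

The two results are consistent by construction: the paired t-test rejects at α = .05 exactly when 0 lies outside the 95% interval.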

## 4) Ultimate Strength of Alloys

Two different types of alloy, A and B, have been used to manufacture experimental specimens of a small tension link to be used in a certain engineering application.
The ultimate strength (kilopounds per square inch, ksi) of each specimen was determined,
and the results are summarized in the following frequency distribution.

|       | A  | B  |
|-------|----|----|
|  < 34 | 18 | 13 |
| >= 34 | 22 | 29 |
| Sum   | 40 | 42 |

Compute a 95% CI for the difference between the true proportions of all specimens of alloys A and B that have an ultimate strength of at least 34 ksi.

In [6]:
stat, pval = proportions_ztest(count=[22, 29], nobs=[40, 42])
pval

0.18976020932219406
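The task asks for a confidence interval, which the z-test above does not return. One common choice is the large-sample (Wald) interval for a difference of proportions; the sketch below computes it with the standard library only. The choice of the Wald method and the variable names are assumptions made here.

```python
from math import sqrt
from statistics import NormalDist

x_a, n_a = 22, 40  # alloy A: specimens with strength >= 34 ksi, sample size
x_b, n_b = 29, 42  # alloy B: specimens with strength >= 34 ksi, sample size

p_a, p_b = x_a / n_a, x_b / n_b
diff = p_a - p_b

# Unpooled standard error, as used for a confidence interval.
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# Two-sided 95% critical value of the standard normal distribution.
z_crit = NormalDist().inv_cdf(0.975)

lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"difference = {diff:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The resulting interval contains zero, which agrees with the non-significant p-value of the z-test.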

## 5) Sweetgum Lumber

The article “Development of Novel Industrial Laminated Planks from Sweetgum Lumber” (J. of Bridge Engr., 2008: 64–66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.
The data set contains the modulus of elasticity obtained 1 minute after loading in a certain configuration and also 4 weeks after loading for the same lumber specimens. (Data: ex09.44)

Calculate and interpret a 95% confidence interval for the true average difference between 1-minute modulus and 4-week modulus. Is the difference significant? (α = .05)

## 5) Sweetgum Lumber (cont.)

* $H_0: \mu_{1m} = \mu_{4w}$
* $H_1: \mu_{1m} \neq \mu_{4w}$

In [7]:
data = read_r('../data/devore7/ex09.44.rda')
df = data['ex09.44']
df['Difference'] = df['X1min'] - df['X4weeks']

stat, pval = stats.ttest_1samp(df['Difference'], popmean=0.)
pval

1.8801208096001095e-12

In [8]:
stat, pval = stats.ttest_rel(df['X1min'], df['X4weeks'])
pval

1.8801208096001095e-12
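The two cells return the same p-value because a paired t-test is a one-sample t-test on the pairwise differences. The sketch below checks this on hypothetical, randomly generated values (not the ex09.44 data); the array names mirror the column names only for readability.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (NOT the ex09.44 data).
rng = np.random.default_rng(42)
x1min = rng.normal(loc=15000, scale=2000, size=16)
x4weeks = x1min - rng.normal(loc=1000, scale=300, size=16)

# Paired test on the two columns ...
res_rel = stats.ttest_rel(x1min, x4weeks)
# ... and one-sample test of the differences against zero.
res_one = stats.ttest_1samp(x1min - x4weeks, popmean=0.0)

print(res_rel.statistic, res_one.statistic)
print(np.isclose(res_rel.pvalue, res_one.pvalue))
```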

## 6) Survey Response Rate

It is thought that the front cover and the nature of the first question on mail surveys influence the response rate.
The article “The Impact of Cover Design and First Questions on Response Rates for a Mail Survey of Skydivers” (Leisure Sciences, 1991: 67–76) tested this theory by experimenting with different cover designs.
One cover was plain; the other used a picture of a skydiver.
The researchers speculated that the return rate would be lower for the plain cover.

| Cover    | Number Sent | Number Returned |
|----------|-------------|-----------------|
| Plain    | 207 | 104 |
| Skydiver | 213 | 109 |

Does this data support the researchers’ hypothesis? Test the relevant hypotheses using $\alpha = .05$.

* $H_0: p_{plain} \geq p_{skydiver}$
* $H_1: p_{plain} < p_{skydiver}$

In [9]:
stat, pval = proportions_ztest(count=[104, 109], nobs=[207, 213],
                               alternative='smaller')
pval

0.42424846993179954

## 7) Arsenic in Water

Arsenic is a known carcinogen and poison.
The standard laboratory procedures for measuring arsenic concentration (μg/L) in water are expensive.
A new relatively quick and inexpensive field laboratory method has been introduced.
(See the article “Evaluation of a New Field Measurement Method for Arsenic in Drinking Water Samples,” J. of Envir. Engr., 2008: 382–388.)
Suppose the arsenic concentration was measured with two methods.

Generate exercise data by:

In [10]:
method1 = stats.norm.rvs(size=20, loc=19.70, scale=1.1)
method2 = stats.norm.rvs(size=20, loc=19.70, scale=1.1)

1. Is there a significant difference between the means of arsenic concentration measured by method1 and method2? (α = .05)
2. Is this an appropriate way of comparing the two methods?

* $H_0: \mu_2 = \mu_1$
* $H_1: \mu_2 \neq \mu_1$

In [11]:
stat, pval = stats.ttest_rel(method1, method2)
pval

0.022059506204102356

In [12]:
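# Note: absolute differences are never negative, so their mean is positive
# by construction and a test against 0 rejects almost regardless of the data.
stat, pval = stats.ttest_1samp(abs(method1 - method2), popmean=0.)
pval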

0.00021057692318965517
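Both samples in In [10] are drawn from the same distribution, so $H_0$ is true by construction, and a p-value below .05 as in In [11] is a type I error. The simulation below is a sketch of how often that happens when the data generation is repeated; the number of repetitions, the seed, and the names `n_sim` and `rejection_rate` are assumptions for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05
n_sim, n = 2000, 20  # number of repeated experiments, sample size per method
rng = np.random.default_rng(123)

# Each row is one simulated experiment with identical true means.
method1 = rng.normal(loc=19.70, scale=1.1, size=(n_sim, n))
method2 = rng.normal(loc=19.70, scale=1.1, size=(n_sim, n))

# Paired t-test per row; axis=1 runs all experiments at once.
_, pvals = stats.ttest_rel(method1, method2, axis=1)

# Share of experiments that reject H0 although it is true.
rejection_rate = np.mean(pvals < alpha)
print(rejection_rate)
```

The rejection rate should come out close to the significance level, which is what a test at level α = .05 permits.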

## 8) Effect of Temperature

An experiment to determine the effects of temperature on the survival of insect eggs was described in the article “Development Rates and a Temperature-Dependent Model of Pales Weevil” (Environ. Entomology, 1987: 956–962).
At 11°C, 73 of 91 eggs survived to the next stage of development.
At 30°C, 102 of 110 eggs survived.

Do the results of this experiment suggest that the survival rate (proportion surviving in the population) is higher at 30°C than at 11°C?
Calculate the P-value and use it to test the appropriate hypotheses. (α = .05)

* $H_0: p_{11} \geq p_{30}$
* $H_1: p_{11} < p_{30}$

In [13]:
stat, pval = proportions_ztest(count=[73, 102], nobs=[91, 110],
                               alternative='smaller')
pval

0.004267428251156687