import numpy as np
import pandas as pd
from scipy import stats
from IPython.display import Markdown

# Q8.1
The mutans streptococci (MS) are bacteria (all related to Streptococcus mutans)
that can cause tooth decay. 167 persons with gum disease (elevated oral MS
levels) were recruited into a study with three treatment arms: chewing gum with
an active drug, chewing gum with no active drug, and no gum. Randomization to
the three groups was 1:1:1 (equal allocation) within blocks defined by current
smoker status. Participants in the gum groups were asked to chew the gum three
times daily for a minimum of 5 min each time and to carry out their usual oral
hygiene (tooth brushing, mouthwash, etc.). Participants in the group without
gum were asked to carry out their usual oral hygiene. During the 14 days prior to
randomization, subjects rinsed their mouths twice daily with a 0.12 % chlorhexidine
gluconate mouthrinse. They were asked to follow their assigned treatment
for three weeks. The outcome (“colony‐forming units” (CFU)/ml, a count of
blotches on a standard‐sized petri dish after standard preparation) was recorded
at randomization and after 1, 2, and 3 weeks. The primary outcome was the CFU
ratio, week 3 divided by week 0. The question of interest is whether the active
gum treatment caused a decline in the level of oral MS. There are some missing
CFU data, corresponding to participants who missed visits.

(a) Examine the distribution of the primary outcome, CFU at week 3 divided
by CFU at week 0. In your judgement, is it sufficiently close to normally
distributed to consider using an ANOVA model? (We will revisit these data
in Chapter 11, where we consider models for count data.)

(b) Make a means plot (as in Figure 8.2). Just from visual inspection, do there
appear to be any differences between treatment groups? Now make the
same plot but separately for each block. Do the treatment group differences
appear to be similar between the two blocks? (If so, then we may not expect
to see a significant block by treatment interaction.)

(c) Fit a two‐way ANOVA with fixed effects for treatment group, for blocks,
and for their interaction. Justify whether treating blocks as fixed is
appropriate.

(d) Test the block by treatment interaction and justify whether or not it can be
removed from the model. Remove the interaction, and re‐fit the model
without it, if you feel it is appropriate to do so.

(e) Using your final model from (d), prepare diagnostic plots (as in Figures 8.3,
8.4, and 8.5) to assess whether the data fit the two‐way ANOVA model
assumptions. For each plot, describe the conclusions you are drawing
about the assumptions based on that plot. Also justify whether or not you
feel the independence assumption is likely to be satisfied by these data.

(f) Using your final model from (d), compute the least squares means for the
three treatment groups. Which groups are significantly different from
which other groups? Use a multiple comparisons adjustment procedure to
control for type I error inflation.

(g) Write a brief summary of the study’s conclusions.

## A8.1
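For part (a), one possible starting point is to compare the CFU ratio on the raw and log scales. The sketch below is only an illustration: the column names `cfu0` and `cfu3` are hypothetical, and the values are simulated placeholders, not the study data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated placeholder data; `cfu0` and `cfu3` are hypothetical column names.
rng = np.random.default_rng(0)
n = 150
ms = pd.DataFrame({
    "cfu0": rng.lognormal(mean=10, sigma=1, size=n),
    "cfu3": rng.lognormal(mean=10, sigma=1, size=n),
})
# Set a few week-3 values to missing to mimic missed visits.
ms.loc[ms.index[:5], "cfu3"] = np.nan

# Primary outcome: week 3 divided by week 0, dropping missing values.
ratio = (ms["cfu3"] / ms["cfu0"]).dropna()
log_ratio = np.log(ratio)

# Skewness and Shapiro-Wilk p-value on the raw and log scales.
summary = pd.DataFrame(
    {
        "skewness": [stats.skew(ratio), stats.skew(log_ratio)],
        "shapiro_p": [stats.shapiro(ratio)[1], stats.shapiro(log_ratio)[1]],
    },
    index=["ratio", "log_ratio"],
)
print(summary)
```

A ratio of positive counts is typically right-skewed, so a histogram or normal quantile plot of both scales is worth inspecting alongside these numbers before settling on an ANOVA model.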

# Q8.2
A study of residential 60‐Hz magnetic field levels was conducted in the Midwest.
51 homes in the Twin Cities (Minnesota, N = 24) and Detroit (Michigan, N = 27)
were selected to participate, based on a random digit dial sampling scheme.
Each home was visited seven times, approximately every two months. At each
visit, an electromagnetic field (EMF) data‐logging meter (Emdex‐C) was used
to collect EMF levels over a 24‐hour period. 24‐hour measurements are taken
from spot measurements every 30 seconds. An Emdex‐C was placed in a child’s
bedroom under the bed and in the kitchen of each home. The response of interest
is the (base 10) log‐transformed 24‐hour mean EMF measurement, multiplied
by 100. The wiring configuration of each house was also recorded and is coded
on the four point Wertheimer–Leeper scale: 1 = very low current configuration
(VLCC), 2 = ordinary low current configuration (OLCC), 3 = ordinary high
current configuration (OHCC), and 4 = very high current configuration
(VHCC).
The researchers want to quantify how the response differs across the room types
(bedroom, kitchen) and wiring configurations. Any differences between the two
states are not of direct interest.
For this exercise, use the baseline data only (visit = 1).

(a) Make a means plot (as in Figure 8.2) with horizontal axis for wiring
configuration and separate lines for kitchen and bedroom. You can do this
in SAS using PROC GLM PLOTS=(MEANPLOT); when you fit a two‐way
ANOVA with wiring configuration, room type, and their interaction. You can do
this in R using the function interaction.plot. Just from visual
inspection, do there appear to be any differences between room types? Do
the room type differences appear to be consistent across the levels of wiring
configuration? (If so, we may not expect to see a significant room by wiring
interaction.)

(b) Now make the same plots as in (a) but separately for each state. Do the
patterns in EMF across room types and wiring configurations appear to be
similar between the two states? (If so, we may not expect to see a significant
room by wiring by state interaction.) Be cautious in reading too much into
what you see: there are relatively few observations per group when we get
down to all combinations of the three factors: room type, wiring configuration,
and state.

(c) Fit a three‐way ANOVA with fixed effects for room type, wiring configuration,
state, and all interactions. Justify whether or not treating each of
these factors as fixed is appropriate.

(d) Prepare diagnostic plots (as in Figures 8.3, 8.4, and 8.5) to assess whether
these visit‐1 data fit the three‐way ANOVA model assumptions. For each
plot, describe the conclusions you are drawing about the assumptions
based on that plot. Also justify whether or not you feel the independence
assumption is likely to be satisfied by these data. (Hint: We will revisit
these data in Chapter 12.)

## A8.2
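The means plots in parts (a) and (b) are drawn from a table of cell means, which pandas can produce directly. This is a minimal sketch under assumptions: the column names (`state`, `wiring`, `room`, `logemf`) are hypothetical and the data are simulated stand-ins for the visit 1 records.

```python
import numpy as np
import pandas as pd

# Simulated stand-in for the visit 1 data; all column names are hypothetical.
rng = np.random.default_rng(1)
rows = []
for home in range(51):
    state = "MN" if home < 24 else "MI"   # 24 Twin Cities homes, 27 Detroit homes
    wiring = home % 4 + 1                 # Wertheimer-Leeper code 1..4 (assigned arbitrarily here)
    for room in ["bedroom", "kitchen"]:
        rows.append({
            "home": home, "state": state, "wiring": wiring, "room": room,
            "logemf": rng.normal(loc=10 * wiring, scale=20),
        })
emf = pd.DataFrame(rows)

# Part (a): mean response by wiring configuration (rows) and room type (columns).
cell_means = emf.pivot_table(index="wiring", columns="room",
                             values="logemf", aggfunc="mean")
cell_counts = emf.pivot_table(index="wiring", columns="room",
                              values="logemf", aggfunc="count")

# Part (b): the same table computed separately within each state.
cell_means_by_state = emf.pivot_table(index=["state", "wiring"], columns="room",
                                      values="logemf", aggfunc="mean")
print(cell_means.round(1))
print(cell_counts)
print(cell_means_by_state.round(1))
```

With matplotlib installed, `cell_means.plot(marker="o")` draws one line per room type against wiring configuration. Printing the counts next to the means helps guard against over-reading cells that contain only a few homes.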

# Q8.3
Consider the electromagnetic field data of Exercise 8.2. Again use the baseline
data only (visit = 1).

(a) Fit a three‐way ANOVA with fixed effects for room type, wiring configuration,
state, and all interactions.

(b) Now fit another three‐way ANOVA with fixed effects for room type, wiring
configuration, state, and only the interaction of room with wiring.
Carry out a general linear F‐test to compare this model with the model of
(a). What is your conclusion: is it appropriate to drop all those interactions
with state?

(c) Now fit another three‐way ANOVA with fixed effects for room type, wiring
configuration, and state (no interactions at all). Carry out a general
linear F‐test to compare this model with the model of (a). What is your
conclusion: is it appropriate to drop all the interactions?

(d) Using your preferred model [choose from (a), (b), and (c)], compute the
least squares means for each level of each of the three factors. Within
factor, which groups are significantly different from which other groups?
Use a multiple comparisons adjustment procedure to control for type I
error inflation for each factor.

(e) Write a brief summary of the study’s conclusions about how room type,
wiring configuration, and state are associated with EMF levels.

## A8.3
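The general linear F-test in parts (b) and (c) compares the residual sums of squares of a reduced and a full model: F = [(RSS_R - RSS_F) / (df_R - df_F)] / (RSS_F / df_F). The sketch below implements that comparison with NumPy least squares. The helper names, the column names, and the simulated balanced data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

def rss_and_rank(X, y):
    """Residual sum of squares and column rank of the design matrix X."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), int(rank)

def general_linear_f_test(X_reduced, X_full, y):
    """F-test of a reduced model nested within a full model."""
    rss_r, p_r = rss_and_rank(X_reduced, y)
    rss_f, p_f = rss_and_rank(X_full, y)
    df_num = p_f - p_r
    df_den = len(y) - p_f
    f_stat = ((rss_r - rss_f) / df_num) / (rss_f / df_den)
    return f_stat, df_num, df_den, stats.f.sf(f_stat, df_num, df_den)

# Simulated balanced data: 2 rooms x 4 wiring codes x 2 states, 6 per cell.
rng = np.random.default_rng(2)
d = pd.DataFrame({
    "room": np.tile(["bedroom", "kitchen"], 48),
    "wiring": np.repeat([1, 2, 3, 4], 24),
    "state": np.tile(np.repeat(["MN", "MI"], 2), 24),
})
y = d["wiring"].to_numpy() * 10.0 + rng.normal(scale=20.0, size=len(d))

# Full model (all interactions) is equivalent to one indicator per cell.
full_cells = d["room"] + "_" + d["wiring"].astype(str) + "_" + d["state"]
X_full = pd.get_dummies(full_cells).to_numpy(dtype=float)

# Reduced model: room x wiring cells plus a main effect for state.
rw_cells = d["room"] + "_" + d["wiring"].astype(str)
X_reduced = np.hstack([
    pd.get_dummies(rw_cells).to_numpy(dtype=float),
    (d["state"] == "MN").to_numpy(dtype=float)[:, None],
])

f_stat, df_num, df_den, p_value = general_linear_f_test(X_reduced, X_full, y)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}, p = {p_value:.3f}")
```

The numerator degrees of freedom count the dropped terms: 1 for room by state, 3 for wiring by state, and 3 for the three-way interaction. With the real data, empty cells would reduce the rank of the full design, which is why the degrees of freedom are taken from the computed rank instead of being hard-coded.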

# Q8.4
A mold was grown in each of 12 culture dishes under three moisture levels for
the environment in which they were grown (four plates at each moisture level);
other environmental conditions, specifically temperature, light, and nutrients,
were held constant across all dishes. Growth (measured as the diameter from
starting edge to farthest edge of the mold within the dish) was measured every
24 hours for 9 days. The diameter was measured twice each time, across the
dish at each of two reference marks on the rim of the dish, 90 degrees apart
(so the two measurements were taken at right angles to each other). We will
refer to these two measurements as “replicate” measurements. For this exercise,
use the last observation time only (week = 9).

(a) Make a means plot (as in Figure 8.2) with horizontal axis for moisture
level and separate lines for the two replicates. You can do this in SAS
using PROC GLM PLOTS=(MEANPLOT); when you fit a two‐way
ANOVA with moisture level, replicate, and their interaction. You can do
this in R using the function interaction.plot. Just from visual inspection,
do there appear to be any differences between the moisture levels? Do the
moisture level differences appear to be consistent across the two replicates?
(If so, we may not expect to see a significant moisture by replicate
interaction.)

(b) Fit a two‐way ANOVA with fixed effects for moisture, replicate, and their
interaction. Justify whether or not treating each of these factors as fixed is
appropriate.

(c) Fit another two‐way ANOVA with fixed effects for moisture and replicate,
no interaction. Justify whether or not removing the interaction from the
model is appropriate.

(d) Using your preferred model [choose from (b) and (c)], compute the least
squares means for each level of moisture. Which groups are significantly
different from which other groups? Use a multiple comparisons adjustment
procedure to control for type I error inflation.

(e) Based on your model in (d), prepare diagnostic plots (as in Figures 8.3, 8.4,
and 8.5) to assess whether these week 9 data fit the two‐way ANOVA model
assumptions. For each plot, describe the conclusions you are drawing about
the assumptions based on that plot. Also justify whether or not you feel the
independence assumption is likely to be satisfied by these data.

(f) Write a brief summary of the study’s conclusions about how mold quantity
differs by moisture level and replicate.

## A8.4
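For part (d), the sketch below computes least squares means for moisture from the additive model and compares them pairwise with a Bonferroni adjustment. Bonferroni is chosen here only for simplicity, and Tukey's procedure is a common alternative. The shortcut formulas rely on the design being balanced. Column names are hypothetical and the data are simulated.

```python
import itertools
import numpy as np
import pandas as pd
from scipy import stats

# Simulated balanced data: 3 moisture levels x 2 replicates x 4 plates.
rng = np.random.default_rng(3)
mold = pd.DataFrame({
    "moisture": np.repeat(["low", "medium", "high"], 8),
    "replicate": np.tile([1, 2], 12),
})
true_mean = mold["moisture"].map({"low": 20.0, "medium": 30.0, "high": 45.0})
mold["diameter"] = true_mean + rng.normal(scale=3.0, size=len(mold))

# Additive-model residuals (valid shortcut for a balanced design).
grand = mold["diameter"].mean()
m_mean = mold.groupby("moisture")["diameter"].transform("mean")
r_mean = mold.groupby("replicate")["diameter"].transform("mean")
resid = mold["diameter"] - m_mean - r_mean + grand

a = mold["moisture"].nunique()
b = mold["replicate"].nunique()
df_error = len(mold) - a - b + 1
mse = (resid ** 2).sum() / df_error

# In a balanced design the LS means equal the marginal means.
lsmeans = mold.groupby("moisture")["diameter"].mean()
n_per_level = len(mold) // a
se_diff = np.sqrt(2 * mse / n_per_level)

pairs = list(itertools.combinations(lsmeans.index, 2))
rows = []
for g1, g2 in pairs:
    diff = lsmeans[g1] - lsmeans[g2]
    p_raw = 2 * stats.t.sf(abs(diff / se_diff), df_error)
    rows.append({"pair": f"{g1} - {g2}", "diff": diff,
                 "p_bonferroni": min(1.0, p_raw * len(pairs))})
comparisons = pd.DataFrame(rows)
print(lsmeans.round(2))
print(comparisons.round(4))
```

If the real data turn out to be unbalanced, the LS means no longer coincide with the marginal means and should be obtained from the fitted model coefficients instead.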

# Q8.5
Consider the mold data of Exercise 8.4. Again use the last observation only
(week = 9).

(a) Fit a two‐way ANOVA with a fixed effect for moisture, and random effects
for replicate and the moisture by replicate interaction (as in Example 8.4).
What is a justification for treating replicate as random? Determine, and
then comment on, the sizes of the three variances: for replicate, for replicate
by moisture interaction, and for error.

(b) Remove the random replicate by moisture interaction and re-fit the model.
Compute the least squares means for each level of moisture. How do the
estimated means, and standard errors of the means, from this model compare
to those from the model fit in Exercise 8.4(c)?

(c) Write a brief summary of the study’s conclusions about how mold quantity
differs by moisture level.

## A8.5
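Mixed-model software usually estimates the variance components in part (a) by REML. As a rough cross-check for a balanced design, the ANOVA (method-of-moments) estimators can be computed by hand from the mean squares. That is a swapped-in technique, not necessarily the one used in Example 8.4. The sketch assumes the unrestricted mixed model, hypothetical column names, and simulated data.

```python
import numpy as np
import pandas as pd

# Simulated balanced data: a = 3 moisture levels, b = 2 replicates, n = 4 plates.
rng = np.random.default_rng(4)
a, b, n = 3, 2, 4
mold = pd.DataFrame({
    "moisture": np.repeat(["low", "medium", "high"], b * n),
    "replicate": np.tile(np.repeat([1, 2], n), a),
})
true_mean = mold["moisture"].map({"low": 20.0, "medium": 30.0, "high": 45.0})
mold["diameter"] = true_mean + rng.normal(scale=3.0, size=a * b * n)

y = mold["diameter"]
grand = y.mean()
m_mean = mold.groupby("moisture")["diameter"].transform("mean")
r_mean = mold.groupby("replicate")["diameter"].transform("mean")
c_mean = mold.groupby(["moisture", "replicate"])["diameter"].transform("mean")

# Sums of squares for replicate, interaction, and error.
ss_rep = ((r_mean - grand) ** 2).sum()
ss_int = ((c_mean - m_mean - r_mean + grand) ** 2).sum()
ss_err = ((y - c_mean) ** 2).sum()

ms_rep = ss_rep / (b - 1)
ms_int = ss_int / ((a - 1) * (b - 1))
ms_err = ss_err / (a * b * (n - 1))

# Expected mean squares (unrestricted model):
#   E[MS_err] = s2_e
#   E[MS_int] = s2_e + n * s2_int
#   E[MS_rep] = s2_e + n * s2_int + a * n * s2_rep
var_error = ms_err
var_int = max(0.0, (ms_int - ms_err) / n)        # truncate negative estimates at 0
var_rep = max(0.0, (ms_rep - ms_int) / (a * n))
print(var_rep, var_int, var_error)
```

With only two replicate positions, the replicate variance rests on a single degree of freedom, so its estimate is very imprecise. An estimate truncated at zero is a common outcome and is worth commenting on.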

# Q8.6
Magnetic resonance spectroscopy (MRS) is a magnetic resonance imaging technique
that quantifies levels of certain biochemicals. Spinocerebellar ataxia
(SCA) is a genetically linked disease characterized by progressive degeneration of
muscle control. MRS has been used to quantify changes in the brain of people with
SCA. A mouse model of SCA has also been developed, in which the SCA can be
“turned off” by giving a drug that blocks the genetic cause of the SCA. In this
experiment, 24 SCA mice were randomly assigned 1:1 (equal allocation) to two
groups: drug and placebo. 12 control mice (same background mouse strain but
without the genetic cause of the SCA) were also studied. All mice had MRS of the
brain at week 12 and at week 24 after birth. Several biochemicals were quantified,
including total creatine (creatine plus phosphocreatine); higher creatine may
reflect changes in energy metabolism. Mouse sex was also recorded.
For this exercise, use the week 24 measurements only.

(a) Make a means plot (as in Figure 8.2) with horizontal axis for the three
groups and separate lines for the two sexes. You can do this in SAS using
PROC GLM PLOTS=(MEANPLOT); when you fit a two‐way ANOVA
with group, sex, and their interaction. You can do this in R using the
function interaction.plot. Just from visual inspection, do there appear to
be any differences between the groups? Do the group differences appear to
be consistent across the two sexes? (If so, we may not expect to see a
significant group by sex interaction.)

(b) Fit a two‐way ANOVA with fixed effects for group, sex, and their interaction.
Justify whether or not treating each of these factors as fixed is appropriate.

(c) Fit another two‐way ANOVA with fixed effects for group and sex, no interaction.
Justify whether or not removing the interaction from the model is appropriate.

(d) Using your preferred model [choose from (b) and (c)], compute the least
squares means for each group. Which groups are significantly different
from which other groups? Use a multiple comparisons adjustment
procedure to control for type I error inflation.

(e) Based on your model in (d), prepare diagnostic plots (as in Figures 8.3, 8.4,
and 8.5) to assess whether these week 24 data fit the two‐way ANOVA
model assumptions. For each plot, describe the conclusions you are drawing
about the assumptions based on that plot. Also justify whether or not you
feel the independence assumption is likely to be satisfied by these data.

(f) Write a brief summary of the study’s conclusions about how creatine differs
by group and sex.

## A8.6
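The diagnostic plots in part (e) can be supplemented with numeric checks on the residuals. The sketch below uses residuals from the cell-means (interaction) model, a Shapiro-Wilk test, the normal quantile plot correlation, and Levene's test across cells. Column names are hypothetical and the data are simulated, so the printed values say nothing about the real study.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated week 24 data: 3 groups x 2 sexes, 6 mice per cell.
rng = np.random.default_rng(5)
mrs = pd.DataFrame({
    "group": np.repeat(["control", "drug", "placebo"], 12),
    "sex": np.tile(["F", "M"], 18),
})
true_mean = mrs["group"].map({"control": 8.0, "drug": 8.3, "placebo": 9.0})
mrs["creatine"] = true_mean + rng.normal(scale=0.5, size=len(mrs))

# Residuals from the interaction model are deviations from the cell means.
fitted = mrs.groupby(["group", "sex"])["creatine"].transform("mean")
resid = (mrs["creatine"] - fitted).to_numpy()

# Normality: Shapiro-Wilk test and correlation from the normal quantile plot.
shapiro_stat, shapiro_p = stats.shapiro(resid)
(osm, osr), (slope, intercept, qq_r) = stats.probplot(resid, dist="norm")

# Equal variance: Levene's test across the six group-by-sex cells.
cells = [g["creatine"].to_numpy() for _, g in mrs.groupby(["group", "sex"])]
levene_stat, levene_p = stats.levene(*cells)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, QQ correlation = {qq_r:.3f}")
print(f"Levene p = {levene_p:.3f}")
```

Plotting `resid` against `fitted`, and `osr` against `osm`, gives the residual-versus-fitted and normal quantile plots. With six mice per cell the tests have little power, so the plots deserve more weight than the p-values.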

# Q8.7
Consider the magnetic resonance spectroscopy data of Exercise 8.6. Here we
will use all the data: both week 12 and week 24.

(a) Make a means plot (as in Figure 8.2) with horizontal axis for week and separate
lines for the groups. You can do this in SAS using PROC GLM
PLOTS=(MEANPLOT); when you fit a two‐way ANOVA with week, group,
and their interaction. You can do this in R using the function interaction.plot.
Just from visual inspection, do there appear to be any differences between
groups in the trajectories across the weeks? (If there are no apparent differences,
we may not expect to see a significant group by week interaction.)

(b) Fit a three‐way ANOVA with fixed effects for group, week, sex, and all
interactions. Justify whether or not treating each of these factors as fixed is
appropriate.

(c) Prepare diagnostic plots (as in Figures 8.3, 8.4, and 8.5) to assess whether
these data fit the three‐way ANOVA model assumptions. For each plot,
describe the conclusions you are drawing about the assumptions based on
that plot. Also justify whether or not you feel the independence assumption
is likely to be satisfied by these data. (Hint: We will revisit these data in
Chapter 12.)

(d) Write a brief summary of the study’s conclusions about how creatine differs
by group and week.

## A8.7
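For the independence question in part (c), note that each mouse contributes two measurements. One way to look at this is to correlate the week 12 and week 24 residuals within mouse. The sketch below does so on simulated data that deliberately includes a mouse-level effect. Column names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Simulated long-format data: 36 mice, each measured at weeks 12 and 24.
rng = np.random.default_rng(6)
n_mice = 36
mouse_effect = rng.normal(scale=1.0, size=n_mice)  # shared by both visits of a mouse
groups = ["control", "drug", "placebo"]
rows = []
for m in range(n_mice):
    for week in (12, 24):
        rows.append({
            "mouse": m, "group": groups[m % 3], "week": week,
            "creatine": 8.0 + mouse_effect[m] + rng.normal(scale=0.5),
        })
mrs = pd.DataFrame(rows)

# Residuals from the group-by-week cell means.
cell_mean = mrs.groupby(["group", "week"])["creatine"].transform("mean")
mrs["resid"] = mrs["creatine"] - cell_mean

# One row per mouse, one column per week, then correlate the two columns.
wide = mrs.pivot(index="mouse", columns="week", values="resid")
r, p = stats.pearsonr(wide[12], wide[24])
print(f"within-mouse residual correlation = {r:.2f} (p = {p:.3g})")
```

A clearly positive correlation indicates that the two measurements on a mouse are not independent, which a fixed-effects ANOVA treating all observations as independent does not account for.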