# R Coding Concepts for Design and Analysis of Experiments

## General Coding

### Obtain Data


```
# importing data from a file
data <- read.table("path/to/file/data.txt", header=TRUE)
# make predictors into factors
treatmentFact <- as.factor(data$treatment)
```

### Distribution Values
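R suffix for each distribution; prefix it with `q` for a quantile or `p` for a probability (for example `qt`, `pf`, `qchisq`):
- t: t
- F: f
- chi-squared: chisq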

```
# critical value
qf(alpha, v-1, n-v, lower.tail = FALSE)
# p value
pf(Fobs, v-1, n-v, lower.tail = FALSE)
```
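
As a quick illustration of the pattern above, the sketch below applies the same `q`/`p` calls to all three distributions. The values of `alpha`, `v`, `n`, and `Fobs` are made-up numbers chosen only for the example.

```r
# hypothetical settings: 3 treatments, 15 observations
alpha <- 0.05
v <- 3
n <- 15
Fobs <- 4.2  # made-up observed F statistic

# critical value and p value for the F test
Fcrit <- qf(alpha, v-1, n-v, lower.tail = FALSE)
pval <- pf(Fobs, v-1, n-v, lower.tail = FALSE)

# the same pattern for t and chi-squared
tcrit <- qt(alpha/2, n-v, lower.tail = FALSE)      # two-sided t critical value
chicrit <- qchisq(alpha, n-v, lower.tail = FALSE)  # upper-tail chi-squared critical value

# the two decision rules agree: Fobs > Fcrit exactly when pval < alpha
c(Fcrit = Fcrit, pval = pval, reject = Fobs > Fcrit)
```

Using `lower.tail = FALSE` throughout means every quantile is indexed by its upper-tail probability, which matches the subscript convention used in these notes.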

## Check Model Assumptions
Assumptions
- fit: mean responses are adequately described by $E(Y_{ij})$
- outliers: no unusually small or large values in the data
- independence: errors are independent
- constant variance: errors have a constant variance
- normality: errors are a sample from a normal distribution

```
# standardized residuals
n <- length(data$yVar)
SSE <- modelTable$'Sum Sq'[2]
z <- model$residuals/sqrt(SSE/(n-1))
```
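
The snippet above assumes `model` and `modelTable` already exist. A minimal end-to-end sketch on simulated data is shown below; the data frame, its column names, and the group means are hypothetical and only serve to make the example runnable.

```r
# simulated one-way data (hypothetical): 3 treatments, 5 observations each
set.seed(1)
data <- data.frame(
  treatment = factor(rep(c('A', 'B', 'C'), each = 5)),
  yVar = rnorm(15, mean = rep(c(10, 12, 15), each = 5), sd = 1)
)
model <- aov(yVar ~ treatment, data = data)
modelTable <- anova(model)

# standardized residuals, using the same formula as above
n <- length(data$yVar)
SSE <- modelTable$'Sum Sq'[2]  # row 2 is the residual row in a one-way table
z <- model$residuals/sqrt(SSE/(n-1))

# sanity checks: residuals sum to zero and their squares sum to SSE
sum(z)
sum(model$residuals^2) - SSE
# indices of potential outliers (empty if none)
which(abs(z) > 3)
```

The row index `[2]` is only correct for a one-way model; with more terms in the model the residual row moves further down the ANOVA table.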

### Standardized Residuals vs Treatment
- fit: if model is a good fit, residuals are evenly scattered with no pattern
    - correction: add required independent variable to model and refit
- outliers: if there are no outliers, nearly all points are between (-3, 3), about 95% are between (-2, 2), and about 68% are between (-1, 1)
    - correction: remove outliers

```
# standardized residuals vs treatment
plot(data$treatment, z, xlab='Treatment', ylab='Standardized Residuals',
    main='Scatterplot for Model Fit and Outliers')
abline(h=0)
```

### Standardized Residuals vs Fitted Values
- constant variance: residuals fall evenly on both sides of the y=0 line with no fanning or funneling pattern
    - correction: variance stabilizing transformation
    
```
# standardized residuals vs fitted values
plot(model$fitted.values, z, xlab='Fitted Values', ylab='Standardized Residuals',
    main='Scatterplot for Constant Variance')
abline(h=0)
```

### Standardized Residuals vs Time or Spatial Order
- independence: residuals are evenly scattered with no pattern
    - correction: divide data into factors for time or space
    
```
# standardized residuals vs time order
plot(data$time, z, xlab='Time', ylab='Standardized Residuals',
    main='Scatterplot for Independence')
abline(h=0)
```

### Normal Probability Plot
- normality: Q-Q plot is linear with slope 1
    - correction: transform response

```
# Q-Q plot
    # method 1
nscore <- qnorm((rank(z)-0.375)/(n+0.25))
plot(nscore, z, xlab='Theoretical Standard Normal Quantile', ylab='Observed Quantile', 
    main='Normal Probability Plot')
qqline(z)
    # method 2
qqnorm(z)
qqline(z)
```

## Variance Stabilizing Transformation
- $var(\epsilon_{ij}) = \sigma^2_i = k(\mu+\tau_i)^q$ or $s_i^2 = k\bar{Y}_{i\cdot}^q$
- estimate $q$ by $\ln(s_i^2) = constant + q\ln(\bar{Y}_{i\cdot})$
    - the slope of this regression is the estimate of $q$
- $h(Y_{ij})=\begin{cases}Y_{ij}^{1-q/2} &\text{if } q \neq 2\\ \ln(Y_{ij}) &\text{if } q = 2 \text{ and all } Y_{ij} \text{ are nonzero}\\ \ln(Y_{ij}+1) &\text{if } q = 2 \text{ and some } Y_{ij} \text{ are zero}\end{cases}$
- fit the model $h(Y_{ij})=\mu^*+\tau_i^*+\epsilon_{ij}^*$

```
# find group means and variances for each treatment
groupMeans <- tapply(data$response, data$treatment, mean)
groupVars <- tapply(data$response, data$treatment, var)
# regress log variances on log means; the slope estimates q
logLM <- lm(log(groupVars) ~ log(groupMeans))
q <- unname(coef(logLM)[2])
# transform data (power case; use log(response) instead if q is close to 2)
yNew <- data$response^(1-(q/2))
# refit model
newModel <- aov(yNew ~ xVar1 + xVar2 + xVar1*xVar2)  # use appropriate equation for model
newModelTable <- anova(newModel)
```
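
The case distinction in $h(Y_{ij})$ can be sketched as follows on simulated data in which the standard deviation grows with the mean. The data, the column names, and the tolerance of 0.25 for treating the estimate as "equal to 2" are all assumptions made for this example, not part of the method itself.

```r
# simulated data (hypothetical): standard deviation proportional to the mean
set.seed(2)
data <- data.frame(
  treatment = factor(rep(c('A', 'B', 'C', 'D'), each = 10)),
  response = rnorm(40, mean = rep(c(5, 10, 20, 40), each = 10),
                   sd = rep(c(0.5, 1, 2, 4), each = 10))
)

# estimate q as the slope of log variance on log mean
groupMeans <- tapply(data$response, data$treatment, mean)
groupVars <- tapply(data$response, data$treatment, var)
q <- unname(coef(lm(log(groupVars) ~ log(groupMeans)))[2])

# pick the transformation; the 0.25 tolerance is an arbitrary choice
if (abs(q - 2) < 0.25) {
  if (all(data$response != 0)) {
    yNew <- log(data$response)
  } else {
    yNew <- log(data$response + 1)
  }
} else {
  yNew <- data$response^(1 - q/2)
}

# refit on the transformed scale
newModel <- aov(yNew ~ treatment, data = data)
```

Because the estimate of $q$ is itself noisy, it is common to round it to a convenient value before transforming; the tolerance above is one simple way to do that.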

## Estimate Parameters
- least squares estimates that minimize the squared error

```
# split data based on treatment
splitDataB <- split.data.frame(data, data$block)
splitDataT <- split.data.frame(data, data$treatment)

# find means
ybar.. <- mean(data$response)
ybar1. <- mean(splitDataB$X$response)
ybar2. <- mean(splitDataB$Y$response)
ybar.1 <- mean(splitDataT$A$response)
ybar.2 <- mean(splitDataT$B$response)
ybar.3 <- mean(splitDataT$C$response)

# find estimates for beta
b1 <- ybar1. - ybar..
b2 <- ybar2. - ybar..

# find estimates for tau
t1 <- ybar.1 - ybar..
t2 <- ybar.2 - ybar..
t3 <- ybar.3 - ybar..

# output estimate for mu
ybar..

# output estimate for sigma^2 (MSE is the residual mean square from the ANOVA table)
MSE
```

Confidence Interval for Treatment Mean <br>
- $\mu_s = \mu + \tau_s$
- $\mu_s \in (\bar{Y}_{s\cdot} \pm t_{dfE,\;\alpha/2}\sqrt{\frac{MSE}{r_s}} )$

```
# find values used in confidence intervals
# correct parameters for correct error df and number of treatment observations
MSE <- modelTable$'Mean Sq'[2]
v <- numberOfTreatments
n <- length(data$response)
ri <- n/v  # observations per treatment, assuming equal replication
tval <- qt(alpha/2, n-v, lower.tail = FALSE)  # two sided, default alpha = 0.05
# half-width of confidence interval
width <- tval * sqrt(MSE/ri)

# CI for treatment A mean
treatmentA <- data[data$treatment == 'A',]
treatmentAMean <- mean(treatmentA$response)
treatmentACILower <- treatmentAMean - width
treatmentACIUpper <- treatmentAMean + width

# CI for treatment B mean
treatmentB <- data[data$treatment == 'B',]
treatmentBMean <- mean(treatmentB$response)
treatmentBCILower <- treatmentBMean - width
treatmentBCIUpper <- treatmentBMean + width

# CI for treatment C mean
treatmentC <- data[data$treatment == 'C',]
treatmentCMean <- mean(treatmentC$response)
treatmentCCILower <- treatmentCMean - width
treatmentCCIUpper <- treatmentCMean + width
```

Confidence Interval for Variance
- $P(\sigma^2 < U) = 1-\alpha$ where $U = \frac{SSE}{\chi_{1-\alpha,\;dfE}^2}$
    - default $\alpha$ is 0.05

```
# find values used in confidence interval
SSE <- modelTable$'Sum Sq'[2]
chisquare <- qchisq(1-alpha, dfE, lower.tail = FALSE)  # dfE is the error degrees of freedom
# find confidence interval upper limit
CIupper <- SSE/chisquare
```
### Boxplot for Each Treatment

```
# boxplot
boxplot(response ~ treatment, data=data, xlab='Treatment', ylab='Response',
        main='Boxplots for Each Treatment')
```

## Contrasts
- a linear combination of treatment effects $\tau_1,\;...,\;\tau_\nu$ of the form $\sum_{i=1}^\nu c_i\tau_i$ where $\sum_{i=1}^\nu c_i=0$

### General Parameters
- $dfE$ is error degrees of freedom
- $dfT$ is treatment degrees of freedom
- $r_i$ is the number of observations for a specific $\tau_i$

### Confidence Interval for Single Contrast
- $\sum c_i\tau_i \in [\sum c_i\bar{Y}_{i\cdot} \pm t_{dfE,\;\alpha/2}\sqrt{MSE\sum\frac{c_i^2}{r_i}}]$

### Hypothesis Test for Single Contrast
- $H_0: \sum_{i=1}^\nu c_i\tau_i = 0$ vs $H_1: \sum_{i=1}^\nu c_i\tau_i \neq 0$
- $T_{obs} = \frac{\sum c_i\bar{Y}_{i\cdot}}{\sqrt{MSE\sum\frac{c_i^2}{r_i}}}$ 
    - under $H_0$, $T \sim t_{dfE}$
- reject $H_0$ if $|T_{obs}| > t_{dfE,\;\alpha/2}$ or p value < $\alpha$
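
A minimal sketch of the interval and the two-sided test for one contrast is given below. The simulated data and the contrast coefficients `cvec` (treatment A against the average of B and C) are hypothetical choices for illustration.

```r
# simulated one-way data (hypothetical): 3 treatments, 6 observations each
set.seed(3)
data <- data.frame(
  treatment = factor(rep(c('A', 'B', 'C'), each = 6)),
  response = rnorm(18, mean = rep(c(10, 12, 15), each = 6), sd = 2)
)
model <- aov(response ~ treatment, data = data)
modelTable <- anova(model)
MSE <- modelTable$'Mean Sq'[2]
dfE <- modelTable$Df[2]
alpha <- 0.05

# treatment means and replication counts
ybar <- tapply(data$response, data$treatment, mean)
r <- tapply(data$response, data$treatment, length)

# contrast coefficients must sum to zero
cvec <- c(1, -0.5, -0.5)

# estimate, standard error, confidence interval
est <- sum(cvec * ybar)
se <- sqrt(MSE * sum(cvec^2 / r))
tval <- qt(alpha/2, dfE, lower.tail = FALSE)
CI <- c(est - tval*se, est + tval*se)

# two-sided test statistic and p value
Tobs <- est/se
pval <- 2 * pt(abs(Tobs), dfE, lower.tail = FALSE)
```

The interval excludes zero exactly when the two-sided test rejects at the same $\alpha$, so either can be used to reach the decision.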

### Multiple Comparisons
- error accumulation occurs
- use the correction method that produces the shortest interval length
- interval length = $2w\sqrt{MSE(\frac{1}{r_i}+\frac{1}{r_s})}$

```
# confidence interval width
width <- w * sqrt(MSE*(2/r))
# confidence interval for control - low dose
CLlower <- (meanControl - meanLowDose) - width
CLupper <- (meanControl - meanLowDose) + width
# confidence interval for control - high dose
CHlower <- (meanControl - meanHighDose) - width
CHupper <- (meanControl - meanHighDose) + width
# confidence interval for low dose - high dose
LHlower <- (meanLowDose - meanHighDose) - width
LHupper <- (meanLowDose - meanHighDose) + width
```

### Bonferroni Method for Multiple Comparisons
- $\sum c_i\tau_i \in [\sum c_i\bar{Y}_{i\cdot} \pm w_B\sqrt{MSE\sum\frac{c_i^2}{r_i}}]$
- $w_B = t_{dfE,\;\alpha/(2m)}$
- $m$ is the number of preplanned contrasts
- each contrast uses type I error rate $\frac{\alpha}{m}$, so the overall experimentwise error rate is at most $\alpha$

```
# Bonferroni value
wB <- qt(alpha/(2*m), dfE, lower.tail=FALSE)  # default alpha is 0.05
```

### Scheffe Method for Multiple Comparisons
- $\sum c_i\tau_i \in [\sum c_i\bar{Y}_{i\cdot} \pm w_S\sqrt{MSE\sum\frac{c_i^2}{r_i}}]$
- $w_S = \sqrt{(dfT)F_{dfT,\;dfE,\;\alpha}}$

```
# Scheffe value
Fval <- qf(alpha, dfT, dfE, lower.tail = FALSE)
wS <- sqrt((dfT)*Fval)
```

### Tukey Method for Multiple Comparisons
- for contrasts of the form $\tau_i-\tau_s$ where $i \neq s$
- $(\tau_i-\tau_s) \in [(\bar{Y}_{i\cdot}-\bar{Y}_{s\cdot}) \pm w_T\sqrt{MSE(\frac{1}{r_i}+\frac{1}{r_s})}]$
- $w_T = \frac{1}{\sqrt{2}}q_{dfT+1,\;dfE,\;\alpha}$

```
# Tukey value
wT <- qtukey(alpha, dfT+1, dfE, lower.tail = FALSE)/sqrt(2)
```
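
To pick the method with the shortest intervals, the three critical coefficients can be computed side by side. The sketch below assumes a hypothetical experiment with 3 treatments and 12 error degrees of freedom, and takes $m$ to be the number of all pairwise comparisons.

```r
# hypothetical setting: 3 treatments, 12 error degrees of freedom
alpha <- 0.05
v <- 3
dfT <- v - 1
dfE <- 12
m <- choose(v, 2)  # number of pairwise comparisons

# critical coefficients for each method
wB <- qt(alpha/(2*m), dfE, lower.tail = FALSE)
wS <- sqrt(dfT * qf(alpha, dfT, dfE, lower.tail = FALSE))
wT <- qtukey(alpha, dfT+1, dfE, lower.tail = FALSE)/sqrt(2)

# the smallest coefficient gives the shortest intervals
w <- c(Bonferroni = wB, Scheffe = wS, Tukey = wT)
w
names(which.min(w))
```

For all pairwise comparisons Tukey gives the smallest coefficient here; Bonferroni can win when only a few preplanned contrasts are examined, and Scheffe when many arbitrary contrasts are of interest.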

## One Way ANOVA/CRD Model
### Equation
$Y_{ij}=\mu+\tau_i+\epsilon_{ij}$ where $i=1,...,\nu$ and $j=1,...,r_i$
- $Y_{ij}$ = $j^{th}$ observation from $i^{th}$ group
- $\mu$ = grand mean
- $\tau_i$ = $i^{th}$ treatment effect, $\sum_{i=1}^\nu\tau_i=0$
- $\epsilon_{ij}$ = random error of $j^{th}$ observation from $i^{th}$ group, assume $\epsilon_{ij} \stackrel{iid}{\sim}N(0,\;\sigma^2)$

### ANOVA Table
|Sources of Variation|Degrees of Freedom|Sum of Squares|Mean Square|F Value Observed|
|:----:|:----:|:----:|:----:|:----:|
|Treatments|$\nu-1$|SStr|MStr|$F=\frac{MStr}{MSE}$|
|Errors|$n-\nu$|SSE|MSE|-|
|Total|$n-1$|SSto|-|-|

Test Treatment Effect
- $H_0: \tau_1=...=\tau_\nu$ vs $H_1:$ not all $\tau_i$ are equal
- $F_{obs} = \frac{MStr}{MSE}$
    - under $H_0$, $F \sim F_{\nu-1,\;n-\nu}$
- reject $H_0$ if $F_{obs} > F_{\alpha,\;\nu-1,\;n-\nu}$ or p value < $\alpha$
    - if $H_0$ is rejected, treatment effect is present

```
# create model
model <- aov(response ~ treatment, data=data)
modelTable <- anova(model)
```

### Estimate Parameters
- $\hat{\mu} = \bar{Y}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^\nu\sum_{j=1}^{r_i}Y_{ij}$
- $\hat{\tau}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot} = \frac{1}{r_i}\sum_{j=1}^{r_i}Y_{ij} - \frac{1}{n}\sum_{i=1}^\nu\sum_{j=1}^{r_i}Y_{ij}$
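
These estimates can be computed directly with `tapply`. The sketch below uses simulated data with equal replication; the column names and group means are hypothetical.

```r
# simulated one-way data (hypothetical): 3 treatments, 5 observations each
set.seed(4)
data <- data.frame(
  treatment = factor(rep(c('A', 'B', 'C'), each = 5)),
  response = rnorm(15, mean = rep(c(10, 12, 15), each = 5))
)

# grand mean and treatment effects
muHat <- mean(data$response)
tauHat <- tapply(data$response, data$treatment, mean) - muHat

# with equal replication the estimated effects sum to zero
sum(tauHat)

# fitted values from aov equal muHat plus the treatment effect
model <- aov(response ~ treatment, data = data)
byHand <- muHat + as.vector(tauHat[as.character(data$treatment)])
max(abs(unname(fitted(model)) - byHand))
```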

## Two Way ANOVA/RCBD Model
### Equation
$Y_{ij}=\mu+\beta_i+\tau_j+\epsilon_{ij}$ where $i=1,...,r$ and $j=1,...,t$
- $Y_{ij}$ = response of $j^{th}$ treatment in $i^{th}$ block
- $\beta_i$ = effect of $i^{th}$ block, $\sum_{i=1}^r\beta_i=0$
- $\tau_j$ = effect of $j^{th}$ treatment, $\sum_{j=1}^t\tau_j=0$
- $\epsilon_{ij}$ = random error, assume $\epsilon_{ij} \stackrel{iid}{\sim}N(0,\;\sigma^2)$

### ANOVA Table
|Sources of Variation|Degrees of Freedom|Sum of Squares|Mean Square|F Value Observed|
|:----:|:----:|:----:|:----:|:----:|
|Block|$r-1$|SSB|MSB|$F=\frac{MSB}{MSE}$|
|Treatments|$t-1$|SStr|MStr|$F=\frac{MStr}{MSE}$|
|Errors|$(r-1)(t-1)$|SSE|MSE|-|
|Total|$rt-1$|SSto|-|-|

Test Block Effect
- $H_{0B}: \beta_1=...=\beta_r$ vs $H_{1B}:$ not all $\beta_i$ are equal
- $F_{B} = \frac{MSB}{MSE}$
    - under $H_{0B}$, $F_B \sim F_{r-1,\;(r-1)(t-1)}$
- reject $H_{0B}$ if $F_{B} > F_{\alpha,\;r-1,\;(r-1)(t-1)}$ or p value < $\alpha$
    - if $H_{0B}$ is rejected, block effect is present <br>
    
Test Treatment Effect
- $H_{0T}: \tau_1=...=\tau_t$ vs $H_{1T}:$ not all $\tau_j$ are equal
- $F_{T} = \frac{MStr}{MSE}$
    - under $H_{0T}$, $F_T \sim F_{t-1,\;(r-1)(t-1)}$
- reject $H_{0T}$ if $F_{T} > F_{\alpha,\;t-1,\;(r-1)(t-1)}$ or p value < $\alpha$
    - if $H_{0T}$ is rejected, treatment effect is present

```
# create model
model <- aov(response ~ treatment + block, data=data)
modelTable <- anova(model)
```

### Estimate Parameters
- $\hat{\mu} = \bar{Y}_{\cdot\cdot} = \frac{1}{n}\sum_{i=1}^r\sum_{j=1}^{t}Y_{ij}$
- $\hat{\beta}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot} = \frac{1}{t}\sum_{j=1}^{t}Y_{ij} - \frac{1}{n}\sum_{i=1}^r\sum_{j=1}^{t}Y_{ij}$
- $\hat{\tau}_j = \bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot} = \frac{1}{r}\sum_{i=1}^{r}Y_{ij} - \frac{1}{n}\sum_{i=1}^r\sum_{j=1}^{t}Y_{ij}$
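
A minimal sketch of these estimates for a balanced RCBD follows. The simulated layout (4 blocks, 3 treatments) and all column names are hypothetical.

```r
# simulated RCBD (hypothetical): 4 blocks, 3 treatments, one observation per cell
set.seed(5)
data <- expand.grid(block = factor(1:4), treatment = factor(c('A', 'B', 'C')))
data$response <- 10 + as.numeric(data$block) + 2*as.numeric(data$treatment) +
  rnorm(nrow(data))

# grand mean, block effects, treatment effects
muHat <- mean(data$response)
betaHat <- tapply(data$response, data$block, mean) - muHat
tauHat <- tapply(data$response, data$treatment, mean) - muHat

# in a balanced design the additive fit is muHat + betaHat + tauHat
model <- aov(response ~ treatment + block, data = data)
byHand <- muHat + as.vector(betaHat[as.character(data$block)]) +
  as.vector(tauHat[as.character(data$treatment)])
max(abs(unname(fitted(model)) - byHand))
```

Each block mean averages over the treatments in that block, and each treatment mean averages over the blocks, which is why the two sets of effects separate cleanly.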

### Pairwise Comparisons
- used to see which pairs are responsible for creating factor effects

Block Pairwise Comparison
- $H_0(i, i'): \beta_i=\beta_{i'}$ vs $H_1(i, i'): \beta_i \neq \beta_{i'}$
- reject $H_0$ if $|\bar{Y}_{i\cdot} - \bar{Y}_{i'\cdot}| > t_{\alpha/2,\;(r-1)(t-1)}\sqrt{\frac{2MSE}{t}}$

Treatment Pairwise Comparison
- $H_0(j, j'): \tau_j=\tau_{j'}$ vs $H_1(j, j'): \tau_j \neq \tau_{j'}$
- reject $H_0$ if $|\bar{Y}_{\cdot j} - \bar{Y}_{\cdot j'}| > t_{\alpha/2,\;(r-1)(t-1)}\sqrt{\frac{2MSE}{r}}$

## Two Way Full Model with Interaction

### Equation
$Y_{ijk}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+\epsilon_{ijk}$ where $i=1,...,a$; $j=1,...,b$; and $k=1,...,n$
- $Y_{ijk}$ = $k^{th}$ response at the $i^{th}$ level of factor A and $j^{th}$ level of factor B
- $\alpha_i$ = effect of $i^{th}$ level of factor A, $\sum_{i=1}^a\alpha_i=0$
- $\beta_j$ = effect of $j^{th}$ level of factor B, $\sum_{j=1}^b\beta_j=0$
- $(\alpha\beta)_{ij}$ = interaction of the i,j factor combination, $\sum_{i=1}^a(\alpha\beta)_{ij}=0$, $\sum_{j=1}^b(\alpha\beta)_{ij}=0$
- $\epsilon_{ijk}$ = random error, assume $\epsilon_{ijk} \stackrel{iid}{\sim}N(0,\;\sigma^2)$

### ANOVA Table
|Sources of Variation|Degrees of Freedom|Sum of Squares|Mean Square|F Value Observed|
|:----:|:----:|:----:|:----:|:----:|
|Factor A|$a-1$|SSA|MSA|$F=\frac{MSA}{MSE}$|
|Factor B|$b-1$|SSB|MSB|$F=\frac{MSB}{MSE}$|
|Interaction AB|$(a-1)(b-1)$|SSAB|MSAB|$F=\frac{MSAB}{MSE}$|
|Errors|$ab(n-1)$|SSE|MSE|-|
|Total|$abn-1$|SSto|-|-|

Test Interaction Effect
- $H_{0AB}: (\alpha\beta)_{ij}=0$ for all $i,j$ vs $H_{1AB}:$ at least one $(\alpha\beta)_{ij}$ does not equal 0
- $F_{AB} = \frac{MSAB}{MSE}$
    - under $H_{0AB}$, $F_{AB} \sim F_{(a-1)(b-1),\;ab(n-1)}$
- reject $H_{0AB}$ if $F_{AB} > F_{\alpha,\;(a-1)(b-1),\;ab(n-1)}$ or p value < $\alpha$
    - if $H_{0AB}$ is rejected, interaction effect is present
    - if rejected, no need to test for main effects of A and B since must include these terms in the model
- drop interaction term if fail to reject $H_{0AB}$

Test Main Effect of A
- $H_{0A}: \alpha_1=...=\alpha_a=0$ vs $H_{1A}:$ at least one $\alpha_i$ does not equal 0
- $F_{A} = \frac{MSA}{MSE}$
    - under $H_{0A}$, $F_A \sim F_{a-1,\;ab(n-1)}$
- reject $H_{0A}$ if $F_{A} > F_{\alpha,\;a-1,\;ab(n-1)}$ or p value < $\alpha$
    - if $H_{0A}$ is rejected, main effect of A is present

Test Main Effect of B
- $H_{0B}: \beta_1=...=\beta_b=0$ vs $H_{1B}:$ at least one $\beta_j$ does not equal 0
- $F_{B} = \frac{MSB}{MSE}$
    - under $H_{0B}$, $F_B \sim F_{b-1,\;ab(n-1)}$
- reject $H_{0B}$ if $F_{B} > F_{\alpha,\;b-1,\;ab(n-1)}$ or p value < $\alpha$
    - if $H_{0B}$ is rejected, main effect of B is present

```
# create model
model <- aov(yVar ~ factorA + factorB + factorA*factorB)
modelTable <- anova(model)
```

### Estimate Parameters
- $\hat{\mu} = \bar{Y}_{\cdots} = \frac{1}{N}\sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^nY_{ijk}$
- $\hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdots} = \frac{1}{bn}\sum_{j=1}^b\sum_{k=1}^nY_{ijk} - \frac{1}{N}\sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^nY_{ijk}$
- $\hat{\beta}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdots} = \frac{1}{an}\sum_{i=1}^a\sum_{k=1}^nY_{ijk} - \frac{1}{N}\sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^nY_{ijk}$
- $(\hat{\alpha\beta})_{ij} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdots} = \frac{1}{n}\sum_{k=1}^nY_{ijk} - \frac{1}{bn}\sum_{j=1}^b\sum_{k=1}^nY_{ijk} - \frac{1}{an}\sum_{i=1}^a\sum_{k=1}^nY_{ijk} + \frac{1}{N}\sum_{i=1}^a\sum_{j=1}^b\sum_{k=1}^nY_{ijk}$
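
The four estimates can be obtained from marginal and cell means. The sketch below uses a simulated balanced layout (2 levels of A, 3 levels of B, 3 replicates); all names and values are hypothetical.

```r
# simulated balanced two-way data (hypothetical)
set.seed(6)
data <- expand.grid(rep = 1:3,
                    factorA = factor(c('a1', 'a2')),
                    factorB = factor(c('b1', 'b2', 'b3')))
data$yVar <- rnorm(nrow(data), mean = 10)

# grand mean and main effects
muHat <- mean(data$yVar)
alphaHat <- as.vector(tapply(data$yVar, data$factorA, mean)) - muHat
betaHat <- as.vector(tapply(data$yVar, data$factorB, mean)) - muHat

# a x b matrix of cell means
cellMeans <- tapply(data$yVar, list(data$factorA, data$factorB), mean)

# interaction estimates: cell mean minus both main effects minus grand mean
abHat <- cellMeans - outer(alphaHat, betaHat, '+') - muHat

# each row and each column of the interaction estimates sums to zero
rowSums(abHat)
colSums(abHat)
```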

### Interaction Plot
Interpreting Plot:
- parallel lines indicate no interaction
    - sample data rarely produces lines that are exactly parallel
    - therefore, almost parallel lines can be interpreted as no interaction

Parameters:
- `type='b'` draws both lines and points
- `fixed=TRUE` orders the legend by the levels of factor B

```
# interaction plot
interaction.plot(x.factor=factorA, trace.factor=factorB, response=yVar, fun=mean, type='b', 
                fixed=TRUE, xlab='Factor A', ylab='Mean of Y Corresponding to A and B')
```

### Contrasts
Confidence Interval for Single Contrast
- $\sum c_i\alpha_i \in [\sum c_i\bar{Y}_{i\cdot\cdot} \pm t_{\alpha/2,\;ab(n-1)}\sqrt{MSE\sum_{i=1}^a\frac{c_i^2}{bn}}]$
- $\sum c_j\beta_j \in [\sum c_j\bar{Y}_{\cdot j\cdot} \pm t_{\alpha/2,\;ab(n-1)}\sqrt{MSE\sum_{j=1}^b\frac{c_j^2}{an}}]$

## Latin Square Incomplete Block Design

|Ai\Bj |B1   |B2   |B3   |
|----- |-----|-----|-----|
|**A1**|Z=20 |Y=21 |X=18 |
|**A2**|X=9  |Z=5  |Y=6  |
|**A3**|Y=1  |X=2  |Z=4  |

### Latin Square Data Frame

```
# assign data to variables
# rows cycle a1, a2, a3 within each column, so the data are entered column by column
row <- rep(c('a1', 'a2', 'a3'), times=3)
column <- c(rep('b1',3), rep('b2',3), rep('b3',3))
treatment <- c('Z', 'X', 'Y', 'Y', 'Z', 'X', 'X', 'Y', 'Z')
freq <- c(20, 9, 1, 21, 5, 2, 18, 6, 4)
# create data frame
data <- data.frame(row, column, treatment, freq)
```
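
Before fitting the model it can help to confirm that the layout really is a Latin square, meaning each treatment appears exactly once in every row and every column. The check below re-enters the table above and is only a sketch.

```r
# layout from the table above, entered column by column
row <- rep(c('a1', 'a2', 'a3'), times = 3)
column <- rep(c('b1', 'b2', 'b3'), each = 3)
treatment <- c('Z', 'X', 'Y', 'Y', 'Z', 'X', 'X', 'Y', 'Z')

# cross-tabulate: every cell should equal 1 in a Latin square
rowCheck <- table(row, treatment)
colCheck <- table(column, treatment)
isLatinSquare <- all(rowCheck == 1) && all(colCheck == 1)
isLatinSquare
```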

### Equation
$Y_{ijk}=\mu+\alpha_i+\beta_j+\tau_k+\epsilon_{ijk}$ where $i=1,...,t$; $j=1,...,t$; and $k=1,...,t$
- $Y_{ijk}$ = response of $i^{th}$ row, $j^{th}$ column, and $k^{th}$ treatment
- $\alpha_i$ = effect of $i^{th}$ row, $\sum_{i=1}^t\alpha_i=0$
- $\beta_j$ = effect of $j^{th}$ column, $\sum_{j=1}^t\beta_j=0$
- $\tau_k$ = effect of $k^{th}$ treatment, $\sum_{k=1}^t\tau_k=0$
- $\epsilon_{ijk}$ = random error, assume $\epsilon_{ijk} \stackrel{iid}{\sim}N(0,\;\sigma^2)$

### ANOVA Table
|Sources of Variation|Degrees of Freedom|Sum of Squares|Mean Square|F Value Observed|
|:----:|:----:|:----:|:----:|:----:|
|Row|$t-1$|SSrow|MSrow|$F=\frac{MSrow}{MSE}$|
|Column|$t-1$|SScol|MScol|$F=\frac{MScol}{MSE}$|
|Treatment|$t-1$|SStr|MStr|$F=\frac{MStr}{MSE}$|
|Errors|$(t-1)(t-2)$|SSE|MSE|-|
|Total|$t^2-1$|SSto|-|-|

Test Row Block Factors
- $H_{0A}: \alpha_1=...=\alpha_t$ vs $H_{1A}:$ not all $\alpha_i$'s are equal
- $F_A = \frac{MSrow}{MSE}$
    - under $H_{0A}$, $F_A \sim F_{t-1,\;(t-1)(t-2)}$
- reject $H_{0A}$ if $F_A > F_{\alpha,\;t-1,\;(t-1)(t-2)}$
    - if $H_{0A}$ is rejected, row block effect is present

Test Column Block Factors
- $H_{0B}: \beta_1=...=\beta_t$ vs $H_{1B}:$ not all $\beta_j$'s are equal
- $F_B = \frac{MScol}{MSE}$
    - under $H_{0B}$, $F_B \sim F_{t-1,\;(t-1)(t-2)}$
- reject $H_{0B}$ if $F_B > F_{\alpha,\;t-1,\;(t-1)(t-2)}$
    - if $H_{0B}$ is rejected, column block effect is present

Test Treatment Factors
- $H_{0T}: \tau_1=...=\tau_t$ vs $H_{1T}:$ not all $\tau_k$'s are equal
- $F_T = \frac{MStr}{MSE}$
    - under $H_{0T}$, $F_T \sim F_{t-1,\;(t-1)(t-2)}$
- reject $H_{0T}$ if $F_T > F_{\alpha,\;t-1,\;(t-1)(t-2)}$
    - if $H_{0T}$ is rejected, treatment effect is present

```
# create model
model <- aov(freq ~ row + column + treatment, data=data)
modelTable <- anova(model)
```

### Estimate Parameters
- $\hat{\mu} = \bar{Y}_{\cdots} =\frac{1}{t^2}\sum_{i=1}^t\sum_{j=1}^t\sum_{k=1}^tY_{ijk}\lambda_{ijk}$ 
- $\hat{\alpha}_{i} = \bar{Y}_{i\cdot\cdot}- \bar{Y}_{\cdots} = \frac{1}{t}\sum_{j=1}^t\sum_{k=1}^tY_{ijk}\lambda_{ijk} - \frac{1}{t^2}\sum_{i=1}^t\sum_{j=1}^t\sum_{k=1}^tY_{ijk}\lambda_{ijk}$
- $\hat{\beta}_{j} = \bar{Y}_{\cdot j\cdot}- \bar{Y}_{\cdots} = \frac{1}{t}\sum_{i=1}^t\sum_{k=1}^tY_{ijk}\lambda_{ijk} - \frac{1}{t^2}\sum_{i=1}^t\sum_{j=1}^t\sum_{k=1}^tY_{ijk}\lambda_{ijk}$
- $\hat{\tau}_{k} = \bar{Y}_{\cdot\cdot k}- \bar{Y}_{\cdots} = \frac{1}{t}\sum_{i=1}^t\sum_{j=1}^tY_{ijk}\lambda_{ijk} - \frac{1}{t^2}\sum_{i=1}^t\sum_{j=1}^t\sum_{k=1}^tY_{ijk}\lambda_{ijk}$

```
# create matrix of the data
dataMat <- matrix(data$freq,3,3)
# find means for rows and columns
rowAve <- rowMeans(dataMat)
colAve <- colMeans(dataMat)
# split data based on treatment
splitData <- split.data.frame(data, data$treatment)

# find means
ybar... <- mean(data$freq)
ybar1.. <- rowAve[1]
ybar2.. <- rowAve[2]
ybar3.. <- rowAve[3]
ybar.1. <- colAve[1]
ybar.2. <- colAve[2]
ybar.3. <- colAve[3]
ybar..1 <- mean(splitData$X$freq)
ybar..2 <- mean(splitData$Y$freq)
ybar..3 <- mean(splitData$Z$freq)
```

## 2^k Factorial Design
- k treatment factors
    - two levels (low and high)
- $2^k$ treatment combinations
- $2^k-1$ treatment effects
- r blocks
    - each treatment appears once in each block (r total times in the experiment)

### 2^k Factorial Effects
- $[a]$ = total response of treatment a from all replicates
    - $[a]=\sum_{i=1}^ra_i$ or $[a]=r*(a)$
- $(a)$ = mean yield of treatment a from all replicates
    - $(a)=\frac{[a]}{r}$
- $[A]$ = total effect of factor A, estimated by sign table method
- $A$ = main effect of factor A
    - $A = \frac{[A]}{2^{k-1}r}$
- $B_i$ = block totals

Main and Interaction Effects
- $A = \frac{1}{2^{k-1}}(a-1)(b+1)\times ... \times(n+1)$
- $AB = \frac{1}{2^{k-1}}(a-1)(b-1)(c+1)\times ... \times(n+1)$
- $\vdots$
- $AB...N = \frac{1}{2^{k-1}}(a-1)\times ... \times(n-1)$

Total Effect from Sign Table
- sum together total responses $[a]$ using sign corresponding to sign table for total effect

Symbols for $2^2$ Factorial
- $(1) = a_0b_0: A^-,\;B^-$
- $a = a_1b_0: A^+,\;B^-$
- $b = a_0b_1: A^-,\;B^+$
- $ab = a_1b_1: A^+,\;B^+$

### Sign Table

|Treatment Combination|A      |B      |C      |D      |
|-----|-----|-----|-----|-----|
|(1)   |-   |-   |-   |-   |
|a     |+   |-   |-   |-   |
|b     |-   |+   |-   |-   |
|ab    |+   |+   |-   |-   |
|c     |-   |-   |+   |-   |
|ac    |+   |-   |+   |-   |
|bc    |-   |+   |+   |-   |
|abc   |+   |+   |+   |-   |
|d     |-   |-   |-   |+   |
|ad    |+   |-   |-   |+   |
|bd    |-   |+   |-   |+   |
|abd   |+   |+   |-   |+   |
|cd    |-   |-   |+   |+   |
|acd   |+   |-   |+   |+   |
|bcd   |-   |+   |+   |+   |
|abcd  |+   |+   |+   |+   |

```
# signs for main variables
A <- rep(c(-1,1), 8)
B <- rep(c(-1,-1,1,1), 4)
C <- rep(c(-1,-1,-1,-1,1,1,1,1), 2)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
# signs for interactions
AB <- A*B
AC <- A*C
AD <- A*D
BC <- B*C
BD <- B*D
CD <- C*D
ABC <- A*B*C
ABD <- A*B*D
ACD <- A*C*D
BCD <- B*C*D
ABCD <- A*B*C*D
# create sign table
signTable <- cbind(A,B,C,D,AB,AC,AD,BC,BD,CD,ABC,ABD,ACD,BCD,ABCD)
```
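
The sign table method can be sketched on a smaller $2^2$ design. The treatment totals and the number of replicates below are made-up numbers; the calculation follows the definitions of $[A]$ and $A = \frac{[A]}{2^{k-1}r}$ given earlier.

```r
# hypothetical 2^2 design with r = 3 replicates
r <- 3
k <- 2
# made-up treatment totals in standard order (1), a, b, ab
totals <- c('(1)' = 30, a = 42, b = 36, ab = 60)

# sign table for the 2^2 design
A <- rep(c(-1, 1), 2)
B <- rep(c(-1, 1), each = 2)
signTable <- cbind(A, B, AB = A*B)

# total effects [A], [B], [AB]: signed sums of the treatment totals
totalEffects <- drop(t(signTable) %*% totals)

# main and interaction effects
effects <- totalEffects / (2^(k-1) * r)
effects  # A = 6, B = 4, AB = 2
```

Here $[A] = -30+42-36+60 = 36$, so $A = 36/6 = 6$; the other two effects follow the same way.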

### Equation
$Y_{ij}=\mu+B_{i}+\tau_j+\epsilon_{ij}$
- $B_{i}$ = effect of ith block
- $\tau_j$ = effect of significant factors

### ANOVA Table
|Sources of Variation|Degrees of Freedom|Sum of Squares|Mean Square|F Value Observed|
|:----:|:----:|:----:|:----:|:----:|
|Blocks|$r-1$|SSBlock|MSBlock|$F=\frac{MSBlock}{MSE}$|
|A|$1$|SSA|MSA|$F=\frac{MSA}{MSE}$|
|B|$1$|SSB|MSB|$F=\frac{MSB}{MSE}$|
|$\vdots$|$\vdots$|$\vdots$|$\vdots$|$\vdots$|
|N|$1$|SSN|MSN|$F=\frac{MSN}{MSE}$|
|AB|$1$|SSAB|MSAB|$F=\frac{MSAB}{MSE}$|
|AC|$1$|SSAC|MSAC|$F=\frac{MSAC}{MSE}$|
|$\vdots$|$\vdots$|$\vdots$|$\vdots$|$\vdots$|
|ABC|$1$|SSABC|MSABC|$F=\frac{MSABC}{MSE}$|
|$\vdots$|$\vdots$|$\vdots$|$\vdots$|$\vdots$|
|ABC...N|$1$|SSABC...N|MSABC...N|$F=\frac{MSABC...N}{MSE}$|
|Errors|$(2^k-1)(r-1)$|SSE|MSE|-|
|Total|$2^kr-1$|SSto|-|-|

```
# factor values
# contains two copies of each factor because there are two blocks
Atot <- c(A,A)
Btot <- c(B,B)
Ctot <- c(C,C)
Dtot <- c(D,D)
# block values
block <- c(rep('1', 16), rep('2', 16))
# response values: block1 and block2 hold the 16 responses from each block, in sign table order
response <- c(block1, block2)
# create model
model <- aov(response ~ Atot*Btot*Ctot*Dtot + block)
modelTable <- anova(model)
```

### Overfitting
- if r=1, the error degrees of freedom equals 0 
    - this means we have a perfect fit
    - thus, the model cannot be applied to other data
- therefore, we must drop at least one effect (typically the highest-order interaction) from the model to gain an error degree of freedom
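
The loss of error degrees of freedom is easy to see on a single-replicate $2^2$ design. The responses below are simulated and purely illustrative.

```r
# single replicate (r = 1) of a 2^2 design with simulated responses
set.seed(7)
A <- rep(c(-1, 1), 2)
B <- rep(c(-1, 1), each = 2)
response <- rnorm(4)

# full model: 4 observations, 4 parameters, no error degrees of freedom
full <- aov(response ~ A*B)
df.residual(full)          # 0
max(abs(residuals(full)))  # essentially zero: a perfect fit

# dropping the interaction frees one degree of freedom for error
reduced <- aov(response ~ A + B)
df.residual(reduced)       # 1
```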

## (2^N, 2^k) Design
- $N$ treatment factors 
    - two levels (low and high)
- $2^N$ treatment combinations
- $2^k-1$ confounded effects
    - $k$ independent confounded effects
    - dependent confounded effects found by multiplying together independent confounded effects
        - $ABC$ and $BD$ are confounded independently
        - $ABC\times BD = AB^2CD = ACD$, so $ACD$ is the dependent confounded effect
- $2^k$ blocks
- $2^{N-k}$ observations in each block
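
The multiplication rule for dependent confounded effects (squared letters cancel) can be sketched as a small helper. The function names `multiplyEffects` and `dependentEffects` are hypothetical and written only for this example.

```r
# multiply two effects: letters appearing an even number of times cancel
multiplyEffects <- function(x, y) {
  allLetters <- c(strsplit(x, '')[[1]], strsplit(y, '')[[1]])
  counts <- table(allLetters)
  paste(names(counts)[counts %% 2 == 1], collapse = '')
}

# dependent effects: products of every subset of two or more independent effects
dependentEffects <- function(independent) {
  dependent <- character(0)
  for (size in seq_len(length(independent))[-1]) {
    for (idx in combn(length(independent), size, simplify = FALSE)) {
      dependent <- c(dependent, Reduce(multiplyEffects, independent[idx]))
    }
  }
  dependent
}

multiplyEffects('ABC', 'BD')        # "ACD"
dependentEffects(c('ABC', 'BD'))    # "ACD"
```

With $k$ independent effects the helper returns $2^k-1-k$ dependent ones, which together with the independent effects gives the $2^k-1$ confounded effects listed above.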

### Equation
$Y_{ijk}=\mu+R_i+B_{ij}+\tau_k+\epsilon_{ijk}$
- $R_i$ = effect due to replicate i
- $B_{ij}$ = effect of jth block within replicate i
- $\tau_k$ = effect of significant factors

### ANOVA Table

```
# replication and block factors, corresponds to the replication and block each main effect belongs to
replicate <- c(rep('1', 16), rep('2', 16))
block <- c('1', '2', '2', '1', '2', '1', '1', '2', '2', '1', '1', '2', '1', '2', '2', '1', 
            '3', '4', '4', '3','4', '3', '3', '4', '3', '4', '4', '3', '4', '3', '3', '4')
# create model
model <- aov(response ~ replicate + block + Atot*Btot*Ctot*Dtot)
modelTable <- anova(model)
```

### Confounded Effects
- confounding is a method of reducing the block size by making one or more treatment contrasts equal block contrasts
- choose treatment combinations with small total effects or sum of squares

Complete Confounding
- the same treatment effect is confounded in all replicates
- lose all information about confounded effect (must remove from the table), but gain more information about unconfounded effects
    - sum of squares due to confounded effect is included in the block sum of squares
    - other sum of squares calculated in usual manner

Partial Confounding
- different treatment effects are confounded in different replicates
- doesn't lose all information about confounded effects since can estimate them from other replicates where effect isn't confounded
- example: replicate 1 confounds $ABC$, replicate 2 confounds $AB$
    - use data from replicate 2 to find SSABC, use data from replicate 1 to find SSAB
        - when calculating a sum of squares from a subset of the replicates, divide by the number of observations in those replicates instead of the total number of observations
        
Degrees of Freedom
- $df_{effects}$ = 1
- $df_{replicates} = number\;of\;replicates-1$
- $df_{blocks} = number\;of\;blocks-1$
- $df_{blocks\;within\;replicates}=df_{blocks}-df_{replicates}$
- $df_{error}= df_{total} - df_{others}$
- $df_{total} = number\;of\;observations-1$
        
### Obtaining Layout
- generate a key block that satisfies each independent equation (contains (1))
    - that is, every treatment combination in the key block shares an even number of letters with each independently confounded effect
- other blocks constructed one-by-one by introducing a new treatment combination (not in previous blocks) and multiplying it with combinations of the key block

#### Example
- $(2^5,\;2^3)$ design independently confounds $AC,\;CDE,$ and $BD$ <br>

Dependent Confounded Effects
- $ADE,\;ABCD,\;BCE,\;ABE$ <br>

Key Block 
- key block = {$(1),\;ace,\;bde,\;abcd$}
    - elements inside the key block have 0(mod 2) common items with independent confounded effects <br>
    
Other Blocks
- block 1: (key * a) = {$a,\;ce,\;abde,\;bcd$}
- block 2: (key * b) = {$b,\;abce,\;de,\;acd$}
- block 3: (key * c) = {$c,\;ae,\;bcde,\;abd$}
- block 4: (key * d) = {$d,\;acde,\;be,\;abc$}
- block 5: (key * e) = {$e,\;ac,\;bd,\;abcde$}
- block 6: (key * ab) = {$ab,\;bce,\;ade,\;cd$}
- block 7: (key * bc) = {$bc,\;abe,\;cde,\;ad$}
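
The construction in this example can be reproduced with a short sketch. The helper names `lettersOf`, `multiplyWords`, and `commonLetters` are hypothetical, and the empty string stands in for the treatment combination $(1)$.

```r
# split a treatment combination into letters; '' represents (1)
lettersOf <- function(w) if (nchar(w) == 0) character(0) else strsplit(w, '')[[1]]

# multiply two treatment combinations: repeated letters cancel
multiplyWords <- function(x, y) {
  counts <- table(c(lettersOf(x), lettersOf(y)))
  paste(names(counts)[counts %% 2 == 1], collapse = '')
}

# number of letters shared by a treatment combination and an effect
commonLetters <- function(w, e) length(intersect(lettersOf(w), lettersOf(e)))

# key block and independently confounded effects from the example
key <- c('', 'ace', 'bde', 'abcd')
confounded <- c('ac', 'cde', 'bd')

# check the even-overlap rule for every element of the key block
evenOverlap <- sapply(key, function(w)
  all(sapply(confounded, function(e) commonLetters(w, e) %% 2 == 0)))
all(evenOverlap)

# build the other blocks by multiplying the key block by a new combination
multipliers <- c('a', 'b', 'c', 'd', 'e', 'ab', 'bc')
blocks <- lapply(multipliers, function(m)
  sapply(key, multiplyWords, y = m, USE.NAMES = FALSE))
names(blocks) <- paste('key *', multipliers)
blocks

# all 2^5 = 32 treatment combinations should appear exactly once
allCombos <- c(key, unlist(blocks, use.names = FALSE))
length(unique(allCombos))
```

Any combination not yet used can serve as the next multiplier; a different choice produces the same blocks, possibly in a different order.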