# Single-variable experiment with variable levels

An engineer is tasked with investigating the effect of RF power on the etch rate in the design of electronics.

Suppose the engineer has 4 levels of RF power and 5 wafers at each level. We can then carry out a single-variable experiment with parameters a=4 (the number of levels of the variable) and n=5 (the number of wafers at each level). This gives a × n = 20 runs, as illustrated below:

<img src='https://d2vlcm61l7u1fs.cloudfront.net/media%2F9a7%2F9a77a4ff-771a-4ab1-9fa3-acc38622d6fa%2FphpXhEn3w.png' width='600px' />

Using a box plot, it is easy to see that:
<ol>
<li>RF power setting affects the etch rate</li>
<li>higher power settings result in a higher etch rate.</li>
</ol>

<img src="https://i.ibb.co/Y45v2m5/Screenshot-2021-07-21-at-3-32-40-PM.png" width='400px'>

However, experimental results are not always this clear-cut, and it can be hard to discern whether the X variable affects the target variable Y. One way to decide this on a statistical basis is to apply the Analysis of Variance (ANOVA) procedure.

<br><br>
### ANOVA (Analysis of Variance)


Typical data for a single-factor experiment:
<img src='https://slideplayer.com/slide/5084620/16/images/4/3.2+The+Analysis+of+Variance.jpg' width='600px'/>

The response observed under each of the 'a' treatments is a random variable. An entry in the table (e.g. *y*<sub>ij</sub>) represents the j<sup>th</sup> observation taken under variable level or treatment i.

### The Linear Model (basic single-factor ANOVA model)
In ANOVA, we assume that the output variable y is generated according to the linear model:

<center>
<em>y<sub>ij</sub> = μ + τ<sub>i</sub> + ε<sub>ij</sub>, for i = 1, 2, ..., a; j = 1, 2, ..., n</em>
</center>

* y<sub>ij</sub>: The observed value of the target variable y for treatment i and run j.
* μ: Overall mean of y, common to all treatments and free of random error.
* τ<sub>i</sub>: Effect on y due to the i<sup>th</sup> treatment (aka the i<sup>th</sup> treatment effect).
* ε<sub>ij</sub>: Random error component that incorporates all other sources of variability in the experiment. It is convenient and common to assume ε<sub>ij</sub> ~ N(0, σ<sup>2</sup>).
* i: Index of the treatment applied; j: index of the run (replicate) within that treatment.
* a: Total number of treatment levels.
* n: Total number of replicates for each treatment.

The equation shows how the treatment effect and the random error combine to produce the outcome <em>y</em>.
If the 'magnitude' of the treatment effects is much bigger than the 'magnitude' of the random errors, we have good reason to believe that the treatments matter and make a difference to the outcome. Otherwise, the treatment effects are not significant: they are either irrelevant to the outcome or too small to detect in the presence of the random errors ε.
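
To make the model concrete, the sketch below generates data from y<sub>ij</sub> = μ + τ<sub>i</sub> + ε<sub>ij</sub>. The values of `mu`, `tau` and `sigma` are hypothetical and chosen only for illustration; they are not taken from the etch-rate experiment.

```python
import random
import statistics

# Hypothetical parameters, for illustration only (not from the experiment above).
mu = 600.0                        # overall mean
tau = [-50.0, -15.0, 10.0, 55.0]  # one treatment effect per level (they sum to zero)
sigma = 18.0                      # standard deviation of the random error
n = 5                             # replicates per treatment

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# y[i][j] = mu + tau[i] + eps_ij, with eps_ij drawn from N(0, sigma^2)
y = [[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in tau]

for i, row in enumerate(y, start=1):
    print(f"treatment {i}: sample mean = {statistics.mean(row):.1f}")
```

Making `sigma` large relative to the spread of `tau` should blur the differences between the treatment means, which is the situation in which a formal test becomes necessary.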

### Importance of Randomised Design
The linear model above is the one-way (single-variable) ANOVA model, since only one variable is investigated. When carrying out the experiment, it is important to perform the runs in random order, so that the environment in which the treatments are applied (aka the experimental units) is as uniform as possible across units and any remaining bias is randomised away.
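
A randomised run order for the a=4, n=5 example could be generated along these lines. This is a minimal sketch: the seed and the (treatment, replicate) labelling are assumptions made for illustration.

```python
import random

a, n = 4, 5  # treatment levels and replicates, as in the example above

# One (treatment, replicate) pair per run: a * n = 20 runs in total.
runs = [(i, j) for i in range(1, a + 1) for j in range(1, n + 1)]

rng = random.Random(42)  # seed fixed only so the order can be reproduced
rng.shuffle(runs)        # shuffle all runs together into a random order

for order, (i, j) in enumerate(runs, start=1):
    print(f"run {order:2d}: treatment {i}, replicate {j}")
```

Shuffling all 20 runs together, rather than shuffling within each treatment, avoids running every replicate of one power level back to back.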

For hypothesis testing, the model errors are commonly assumed to be normally and independently distributed random variables with mean 0 and constant variance σ<sup>2</sup> at all levels of the variable. This implies that the observations of y are mutually independent and follow the normal distribution:

<center>y<sub>ij</sub> ~ N(μ + τ<sub>i</sub> , σ<sup>2</sup>)</center>


Random Effects Model vs Fixed Effects Model:
### Random Effects Model
As an alternative to the fixed effects model, the 'a' treatments could be a random sample from a larger population of treatments. In that case, we would like to extend the conclusions to all treatments in the population, whether or not they were explicitly considered. Here, the τ<sub>i</sub> are random variables, and the particular ones investigated are informative but do not limit the conclusions. Instead, we test hypotheses about the variability of the τ<sub>i</sub> and try to estimate that variability. This is called the random effects model or components of variance model.
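
In symbols, the random effects setting is commonly written as follows. The notation $\sigma_\tau^2$ for the variance of the treatment effects is an assumption introduced here, and the τ<sub>i</sub> are assumed to be independent of the errors ε<sub>ij</sub>:

$$\tau_i \sim N(0, \sigma_\tau^2), \qquad \mathrm{Var}(y_{ij}) = \sigma_\tau^2 + \sigma^2$$

$$H_0: \sigma_\tau^2 = 0 \qquad H_1: \sigma_\tau^2 > 0$$

If $\sigma_\tau^2 = 0$, all treatments in the population are identical; if $\sigma_\tau^2 > 0$, there is variability between treatments.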


### Fixed Effects Model

<!-- When our conclusion on the hypothesis of the treatment means applies to the variable levels considered in the analysis, the conclusion cannot be extended to other treatment levels that were not considered. Thus, we wish to estimate the model parameters (µ, τ<sub>i</sub>, σ<sup>2</sup>). -->

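In the Fixed Effects Model, the 'a' treatments are specifically chosen by the experimenter, so the conclusions apply only to the variable levels considered in the analysis and cannot be extended to levels that were not considered.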

Using the Fixed Effects Model, we are interested in testing the equality of the 'a' treatment effects:
<center>H<sub>0</sub>: τ<sub>1</sub> = τ<sub>2</sub> = ... = τ<sub>a</sub> = 0</center>

<center>H<sub>1</sub>: τ<sub>i</sub> ≠ 0 <em>for at least one i</em></center>

<center>τ<sub>i</sub>: Effect on y due to the i<sup>th</sup> treatment</center>

### Total variability



<center>
    $SS_T = SS_{Treatments} + SS_E$
    <br/><br/>
    $$\begin{aligned}
    SS_T &= \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{..})^2 \\
    &= \sum_{i=1}^{a}\sum_{j=1}^{n}\left[(\bar{y}_{i.}-\bar{y}_{..})+(y_{ij}-\bar{y}_{i.})\right]^2 \\
    &= n\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})^2+\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2
    \end{aligned}$$
    <br/><br/>
    <img src="https://i.ibb.co/FgpdpYJ/su2ch1-table2-3.png" alt="su2ch1-table2-3" width='600px' border="0">
</center>
<br/>
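
The decomposition can be checked numerically. The sketch below uses a small hypothetical data set (a=4 treatments, n=5 replicates), not the etch-rate measurements, and computes each sum of squares directly from its definition.

```python
# Hypothetical observations: one row per treatment, one column per replicate.
y = [
    [12.0, 14.0, 11.0, 13.0, 15.0],
    [16.0, 18.0, 17.0, 15.0, 19.0],
    [20.0, 22.0, 21.0, 23.0, 19.0],
    [26.0, 24.0, 27.0, 25.0, 28.0],
]
a, n = len(y), len(y[0])

grand_mean = sum(sum(row) for row in y) / (a * n)
treatment_means = [sum(row) / n for row in y]

# Total, between-treatment and within-treatment (error) sums of squares.
ss_t = sum((obs - grand_mean) ** 2 for row in y for obs in row)
ss_treatments = n * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_e = sum((obs - m) ** 2 for row, m in zip(y, treatment_means) for obs in row)

# Mean squares: each sum of squares divided by its degrees of freedom.
ms_treatments = ss_treatments / (a - 1)
ms_e = ss_e / (a * (n - 1))

print(f"SS_T = {ss_t:.2f}, SS_Treatments = {ss_treatments:.2f}, SS_E = {ss_e:.2f}")
print(f"MS_Treatments = {ms_treatments:.2f}, MS_E = {ms_e:.2f}")
```

For this data the two components add up to the total sum of squares, as the identity requires.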

The reference distribution for $F_0$ is the $F_{a-1,\,a(n-1)}$ distribution, where \
$F_0 = \frac{MS_{Treatments}}{MS_E}$ \
We reject the null hypothesis if $F_0 > F_{\alpha,\,a-1,\,a(n-1)}$, or equivalently if the P-value is less than the significance level $\alpha$ (e.g. 5%).
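
One way to carry out the test in practice is with SciPy's `scipy.stats.f_oneway`, comparing the statistic against the critical value from `scipy.stats.f.ppf`. The sketch below again uses hypothetical data rather than the etch-rate measurements, and the choice of α = 0.05 is an assumption.

```python
from scipy import stats

# Hypothetical observations: one list per treatment (a = 4, n = 5).
y = [
    [12.0, 14.0, 11.0, 13.0, 15.0],
    [16.0, 18.0, 17.0, 15.0, 19.0],
    [20.0, 22.0, 21.0, 23.0, 19.0],
    [26.0, 24.0, 27.0, 25.0, 28.0],
]
a, n = len(y), len(y[0])
alpha = 0.05  # assumed significance level

# One-way ANOVA: returns F_0 = MS_Treatments / MS_E and its P-value.
f0, p_value = stats.f_oneway(*y)

# Upper-alpha critical value of the F distribution with (a-1, a(n-1)) d.o.f.
f_crit = stats.f.ppf(1 - alpha, a - 1, a * (n - 1))

reject = f0 > f_crit  # equivalent to p_value < alpha
print(f"F0 = {f0:.2f}, critical value = {f_crit:.2f}, P-value = {p_value:.3g}")
print("Reject H0" if reject else "Fail to reject H0")
```

The comparison against the critical value and the comparison of the P-value against α always lead to the same decision, so either can be reported.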


