# Analysis of variance (ANOVA) 3: Evaluation of the effect of multiple factors (From one-way to multi-way)

Let's start again with the fertilizer example. The farmer finally found a way to evaluate the effect of 3 fertilizers on plant growth. He calculated the sums of squares, converted them into mean squares (the variance estimates), and then computed the **F-statistic** to test the corresponding hypothesis from the following table:

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Mean and standard deviation of plant height for each group
m1, sd1 = 140, 2  # Control
m2, sd2 = 150, 5  # Fert1
m3, sd3 = 160, 5  # Fert2
m4, sd4 = 175, 7  # Fert3

# Simulate 10 saplings per group (no seed is set, so every run gives different values)
f1 = np.random.normal(m1, sd1, 10)
f2 = np.random.normal(m2, sd2, 10)
f3 = np.random.normal(m3, sd3, 10)
f4 = np.random.normal(m4, sd4, 10)

# One column per group, one row per sapling
plant_height = pd.DataFrame({'Control': f1, 'Fert1': f2, 'Fert2': f3, 'Fert3': f4})
plant_height.index = [f'sapling{i}' for i in range(1, 11)]


In [4]:
plant_height

| | Control | Fert1 | Fert2 | Fert3 |
|---|---|---|---|---|
| sapling1 | 139.353686 | 147.636171 | 154.060048 | 170.886932 |
| sapling2 | 138.185835 | 145.055383 | 155.844355 | 174.694368 |
| sapling3 | 140.058224 | 147.54245 | 163.003596 | 164.988508 |
| sapling4 | 138.256892 | 139.838432 | 153.675441 | 170.281293 |
| sapling5 | 139.540967 | 150.851112 | 155.046009 | 171.215795 |
| sapling6 | 137.93502 | 150.008522 | 153.752873 | 174.114742 |
| sapling7 | 141.756371 | 148.621995 | 149.788624 | 178.967506 |
| sapling8 | 136.256571 | 144.309818 | 158.744635 | 181.759655 |
| sapling9 | 143.46621 | 148.922125 | 166.94301 | 186.795235 |
| sapling10 | 142.264644 | 149.266003 | 166.723476 | 171.604064 |
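
The calculation the farmer performed on such a table can be sketched as follows. This is only a minimal illustration: the helper name `one_way_f` is hypothetical, and the heights are re-simulated with a fixed seed (an assumption added here for reproducibility), so the numbers will not match the table above.

```python
import numpy as np
import pandas as pd

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()

    # Sum of squares between groups: spread of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Sum of squares within groups: spread of each observation around its own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = len(groups) - 1
    df_within = all_values.size - len(groups)

    # Mean squares are the sums of squares divided by their degrees of freedom
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Simulated heights (assumed means and standard deviations, fixed seed)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Control': rng.normal(140, 2, 10),
    'Fert1': rng.normal(150, 5, 10),
    'Fert2': rng.normal(160, 5, 10),
    'Fert3': rng.normal(175, 7, 10),
})

f_stat, df_b, df_w = one_way_f([demo[col] for col in demo.columns])
print(f'F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom')
```

If SciPy is installed, `scipy.stats.f_oneway` applied to the same columns should give the same F-statistic, together with a p-value.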


## Growth also depends on plant species

Suppose the saplings were baby mango trees. The farmer also found that there are 3 varieties of mango (called species below) and that they grow at different rates. So he cultivated the 3 species in 3 separate plots. **After one month,** he collected 3 sets of saplings, one per species, each consisting of 10 saplings of the same species, and measured their growth in a natural environment. Here is the summary of his observations, together with the heights at the beginning:

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

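# Mean and standard deviation of plant height for each group
m1, sd1 = 140, 2  # height at the beginning
m2, sd2 = 152, 5  # Species1 after 30 days
m3, sd3 = 163, 5  # Species2 after 30 days
m4, sd4 = 180, 7  # Species3 after 30 days

# Simulate 10 saplings per group (no seed is set, so every run gives different values)
f1 = np.random.normal(m1, sd1, 10)
f2 = np.random.normal(m2, sd2, 10)
f3 = np.random.normal(m3, sd3, 10)
f4 = np.random.normal(m4, sd4, 10)

# Column names carry no leading spaces, so they can be selected reliably later
plant_height_2 = pd.DataFrame({'Height at beginning': f1, '30 days Species1': f2, '30 days Species2': f3, '30 days Species3': f4})
plant_height_2.index = [f'sapling{i}' for i in range(1, 11)]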


In [3]:
plant_height_2

| | Height at beginning | 30 days Species1 | 30 days Species2 | 30 days Species3 |
|---|---|---|---|---|
| sapling1 | 140.453156 | 155.342262 | 160.356823 | 182.50223 |
| sapling2 | 143.324193 | 146.278639 | 162.568835 | 178.887863 |
| sapling3 | 140.643492 | 153.660995 | 164.818756 | 180.68097 |
| sapling4 | 139.1112 | 151.232177 | 158.807055 | 177.530593 |
| sapling5 | 138.066542 | 148.082442 | 162.386062 | 187.956717 |
| sapling6 | 138.412326 | 149.928668 | 158.026598 | 180.597058 |
| sapling7 | 142.992556 | 157.809692 | 167.34731 | 189.886106 |
| sapling8 | 142.714508 | 150.651295 | 164.053552 | 184.733159 |
| sapling9 | 136.12887 | 143.739257 | 165.858181 | 192.254225 |
| sapling10 | 141.6533 | 153.534805 | 166.546497 | 172.160261 |


Again, the saplings in each group were selected randomly from 3 pools of saplings (one pool per plot), **so the 3 samples of size 10 consist of different saplings**.

Since the farmer is now familiar with the `Mean Square`, the `F-statistic` and `Hypothesis testing`, he easily verified whether the growth varies significantly across the 3 species.

So, basically, the farmer carried out 2 separate experiments to study the effects of 2 factors, fertilizer and species, one at a time.

## What if the resources are limited?

Often we do not have the luxury of spending resources this freely, or we may not have enough time to run a separate experiment for each effect. For example, suppose the farmer does not have 6 different plots: 3 for studying the fertilizers and 3 for studying the species. The option left to him is to run the 2 studies one after another and wait two months: 1 month for the fertilizers and 1 month for the species. But as time is also valuable, this option is not satisfactory either.

## Simultaneous experiments

What if these two experiments were done simultaneously? Consider 3 sets of 9 saplings, each set containing 3 saplings from each species. The first set is planted in one plot and a fertilizer, say `Fert1`, is applied to it. Likewise, `Fert2` and `Fert3` are applied to the 2 other plots, each containing one of the remaining sets, and the plant growth is measured after one month. You may ask how we would recognize which species a plant belongs to if the heights after one month are almost the same. Well, let's assume that the farmer is clever and tied small ribbons of 3 different colors to the saplings to mark the species. Wow! How nicely he saved both time and plots, right?
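
The layout of this combined experiment can be sketched as a table with one row per sapling. This is only an illustration: the baseline height, the fertilizer and species effects, and the noise level are made-up assumptions, not the farmer's measurements.

```python
import itertools
import numpy as np
import pandas as pd

fertilizers = ['Fert1', 'Fert2', 'Fert3']        # one fertilizer per plot
species = ['Species1', 'Species2', 'Species3']   # marked with colored ribbons
replicates = 3                                   # saplings per species in each plot

# Hypothetical effects on height, used only to simulate some data
fert_effect = {'Fert1': 0.0, 'Fert2': 8.0, 'Fert3': 20.0}
species_effect = {'Species1': 0.0, 'Species2': 5.0, 'Species3': 12.0}

rng = np.random.default_rng(1)
rows = []
for fert, sp in itertools.product(fertilizers, species):
    for rep in range(1, replicates + 1):
        height = 150 + fert_effect[fert] + species_effect[sp] + rng.normal(0, 3)
        rows.append({'Fertilizer': fert, 'Species': sp,
                     'Replicate': rep, 'Height': height})

design = pd.DataFrame(rows)

# Every fertilizer/species combination should hold exactly 3 saplings
counts = design.groupby(['Fertilizer', 'Species']).size().unstack()
print(counts)
```

A table of counts like this is a quick check that the design is balanced, that is, that every combination of fertilizer and species is observed equally often.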

## From 1-way to multi-way

So, from a single experiment, he calculated 2 different **F-statistics** to assess the effects of two different factors on plant growth. This is an example of a `2-way ANOVA`. The previous examples, each of which studied a single factor, were therefore `1-way ANOVA`.
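
The two F-statistics can be sketched with plain NumPy. This is a minimal illustration under stated assumptions: the design is balanced, the model is additive (any interaction between the factors is pooled into the error term), and the data are simulated with made-up effects. The function name `two_way_main_effect_f` is hypothetical.

```python
import numpy as np

def two_way_main_effect_f(y):
    """F-statistics for the two main effects of a balanced 2-way layout.

    y has shape (a, b, r): a levels of factor A, b levels of factor B and
    r replicates per cell. An additive (no-interaction) model is assumed.
    """
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))   # one mean per level of factor A
    mean_b = y.mean(axis=(0, 2))   # one mean per level of factor B

    ss_total = ((y - grand) ** 2).sum()
    ss_a = b * r * ((mean_a - grand) ** 2).sum()
    ss_b = a * r * ((mean_b - grand) ** 2).sum()
    ss_error = ss_total - ss_a - ss_b   # whatever the two factors do not explain

    df_a, df_b = a - 1, b - 1
    df_error = a * b * r - 1 - df_a - df_b
    ms_error = ss_error / df_error
    f_a = (ss_a / df_a) / ms_error
    f_b = (ss_b / df_b) / ms_error
    return f_a, f_b, (df_a, df_b, df_error)

# Simulated heights: axis 0 = fertilizer, axis 1 = species, axis 2 = replicate
rng = np.random.default_rng(2)
fert_effect = np.array([0.0, 8.0, 20.0])      # hypothetical
species_effect = np.array([0.0, 5.0, 12.0])   # hypothetical
heights = (150
           + fert_effect[:, None, None]
           + species_effect[None, :, None]
           + rng.normal(0, 3, size=(3, 3, 3)))

f_fert, f_species, dfs = two_way_main_effect_f(heights)
print(f'Fertilizer: F = {f_fert:.2f}, Species: F = {f_species:.2f}, df = {dfs}')
```

Each F-statistic compares the mean square of one factor with the same error mean square, which is why a single experiment can answer both questions.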

Now, say the farmer, being greedy (after learning ANOVA), also wants to study the effect of watering on sapling growth. So he varied the amount of water as well (within each plot, so that a watering level is not tied to a single fertilizer) and after one month calculated 3 different **F-statistics** to evaluate the effects of `1)Fertilizers`, `2)Species` and `3)Watering`, leading to a `3-way ANOVA`. In the same way, he can formulate an n-way ANOVA to evaluate the effects of n factors. Amazing, right?
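
The step from 2 factors to n factors can be sketched with a small helper that loops over the factor columns of a long-format table. Again this is only an illustration: `main_effect_f` is a hypothetical name, the design is assumed to be balanced and fully crossed, the model is additive (interactions are pooled into the error), and all effects are made up.

```python
import itertools
import numpy as np
import pandas as pd

def main_effect_f(df, factors, response):
    """One F-statistic per factor for a balanced, fully crossed design,
    assuming an additive model (interactions pooled into the error)."""
    y = df[response].to_numpy(dtype=float)
    grand = y.mean()
    ss_total = ((y - grand) ** 2).sum()

    ss, dof = {}, {}
    for factor in factors:
        grouped = df.groupby(factor)[response]
        level_means = grouped.mean()
        level_sizes = grouped.size()
        ss[factor] = (level_sizes * (level_means - grand) ** 2).sum()
        dof[factor] = level_means.size - 1

    ss_error = ss_total - sum(ss.values())
    df_error = y.size - 1 - sum(dof.values())
    ms_error = ss_error / df_error
    return {f: (ss[f] / dof[f]) / ms_error for f in factors}

# Hypothetical effects for the three factors
fert_effect = {'Fert1': 0.0, 'Fert2': 8.0, 'Fert3': 20.0}
species_effect = {'Species1': 0.0, 'Species2': 5.0, 'Species3': 12.0}
water_effect = {'Low': 0.0, 'Medium': 4.0, 'High': 9.0}

rng = np.random.default_rng(3)
rows = []
for fert, sp, water in itertools.product(fert_effect, species_effect, water_effect):
    for _ in range(2):  # 2 saplings per combination
        height = (150 + fert_effect[fert] + species_effect[sp]
                  + water_effect[water] + rng.normal(0, 3))
        rows.append({'Fertilizer': fert, 'Species': sp,
                     'Watering': water, 'Height': height})

data = pd.DataFrame(rows)
f_stats = main_effect_f(data, ['Fertilizer', 'Species', 'Watering'], 'Height')
for factor, f_value in f_stats.items():
    print(f'{factor}: F = {f_value:.2f}')
```

Adding a fourth factor would only require one more column in the table and one more name in the `factors` list.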

## Hold on..

The farmer was very excited and happy to have learned how to evaluate the effects of multiple factors simultaneously. But suddenly it struck him: what if

***1) The factors are dependent on each other:*** for example, the effect of a fertilizer may vary from species to species.<br/>
***2) The number of categories/levels is not the same for every factor:*** for example, the farmer wants to see the effect of 4 `Watering` methods and 3 `Fertilizers`.
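
To get a feeling for the first point, compare two tables of average heights (rows are species, columns are fertilizers). The numbers are made up for illustration, and `interaction_residuals` is a hypothetical helper: it returns what is left of each cell mean after the row effect and the column effect are removed.

```python
import numpy as np

def interaction_residuals(cell_means):
    """Cell mean minus row mean minus column mean plus grand mean."""
    m = np.asarray(cell_means, dtype=float)
    grand = m.mean()
    row = m.mean(axis=1, keepdims=True)
    col = m.mean(axis=0, keepdims=True)
    return m - row - col + grand

# Hypothetical average heights: each fertilizer adds the same amount for every species
additive = np.array([[150.0, 158.0, 170.0],
                     [155.0, 163.0, 175.0],
                     [162.0, 170.0, 182.0]])

# Same table, except that the third fertilizer works especially well on the third species
interacting = additive.copy()
interacting[2, 2] += 9.0

print(interaction_residuals(additive))     # all zeros: the factors act independently
print(interaction_residuals(interacting))  # non-zero entries: the factors interact
```

The same function accepts a table of any shape, for instance 4 `Watering` methods by 3 `Fertilizers`, which is the situation described in the second point.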

Confused? Welcome to the actual world of ANOVA. As I said earlier, ANOVA is a huge topic, and the above 2 points lead to 2 major sub-topics under ANOVA: **Interaction** and **Experimental design**.

But, again, the posts will be short, easy and modular. So this post ends here, and the 2 new terms introduced above will be discussed one by one in the upcoming articles of the ANOVA series. See you soon, bye :)