# Analysis of Variance: Example of Fertilizers 

We want to compare four fertilizers. To do this, 16 plots of land with the same characteristics were selected at random and divided into four groups of equal size, and a different type of fertilizer was applied to each group. The results are given in quintals/ha. We use a significance level of $5\%$.



In [6]:
# Read the Excel data
library("readxl")
fertilizer <- read_excel("Fertilizer.xls")
str(fertilizer)
# Convert the Type variable to a factor
fertilizer$Type.f <- as.factor(fertilizer$Type)

tibble [16 × 2] (S3: tbl_df/tbl/data.frame)
 $ Profit: num [1:16] 218 198 215 174 263 201 219 226 274 248 ...
 $ Type  : chr [1:16] "A" "A" "A" "A" ...


First, we compute some summary statistics of the profit for each type of fertilizer:

In [10]:
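# install.packages("RcmdrMisc")  # run once if the package is not installed
library("RcmdrMisc")  # provides numSummary(); also attaches car, which provides leveneTest()
numSummary(fertilizer[, "Profit"], groups = fertilizer$Type.f,
           statistics = c("mean", "sd", "quantiles"), quantiles = c(0, .25, .5, .75, 1))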

    mean       sd  0%    25%   50%    75% 100% Profit:n
A 201.25 20.18869 174 192.00 206.5 215.75  218        4
B 227.25 26.05603 201 214.50 222.5 235.25  263        4
C 272.50 37.54553 243 246.75 261.0 286.75  325        4
D 222.00 65.13064 180 182.25 195.0 234.75  318        4
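
The same per-group summary can be sketched in base R, without `RcmdrMisc`. This is only an illustration: the data frame `fertilizer_demo` is a hypothetical stand-in for the Excel file, with groups A and B taken from the `str()` output and groups C and D reconstructed from the summary table above (an assumption, since the file itself is not shown).

```r
# Stand-in data: A and B as printed by str(); C and D are reconstructed
# from the summary table (assumption, the original order is unknown).
fertilizer_demo <- data.frame(
  Profit = c(218, 198, 215, 174,   # A
             263, 201, 219, 226,   # B
             274, 248, 243, 325,   # C (reconstructed)
             180, 183, 207, 318),  # D (reconstructed)
  Type.f = factor(rep(c("A", "B", "C", "D"), each = 4))
)

# Mean, standard deviation, quantiles and group size for each fertilizer type
summary_by_type <- do.call(rbind, lapply(
  split(fertilizer_demo$Profit, fertilizer_demo$Type.f),
  function(x) c(mean = mean(x), sd = sd(x),
                quantile(x, c(0, .25, .5, .75, 1)), n = length(x))
))
print(round(summary_by_type, 5))
```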

### Check normality 

In [11]:
shapiro.test(fertilizer$Profit[fertilizer$Type == "A"])


	Shapiro-Wilk normality test

data:  fertilizer$Profit[fertilizer$Type == "A"]
W = 0.89425, p-value = 0.4031


When we test the normality of the profit for type-A fertilizer, the Shapiro-Wilk statistic is 0.89425 and the p-value is 0.4031. Therefore, at a significance level of $5\%$, we cannot reject the normality of the type-A fertilizer profit.

#### Note: 
* For the normality of type B, C and D, we proceed in the same way.
* When the sample size is larger than $30$, we use the Kolmogorov-Smirnov test (with the command `ks.test`).
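
As a rough sketch of how the test could be repeated for every type in one step, the snippet below splits the profit by fertilizer type and applies `shapiro.test` to each group. The data frame `fertilizer_demo` is a hypothetical stand-in for the Excel file: groups A and B come from the `str()` output, while C and D are reconstructed from the summary table above, so those values are an assumption.

```r
# Stand-in data: A and B as printed by str(); C and D are reconstructed
# from the summary table (assumption, the original order is unknown).
fertilizer_demo <- data.frame(
  Profit = c(218, 198, 215, 174,   # A
             263, 201, 219, 226,   # B
             274, 248, 243, 325,   # C (reconstructed)
             180, 183, 207, 318),  # D (reconstructed)
  Type.f = factor(rep(c("A", "B", "C", "D"), each = 4))
)

# Run the Shapiro-Wilk test on the Profit values of each fertilizer type
sw_by_type <- lapply(split(fertilizer_demo$Profit, fertilizer_demo$Type.f),
                     shapiro.test)

# Collect the W statistic and the p-value of each test in one table
sw_table <- t(sapply(sw_by_type, function(res) {
  c(W = unname(res$statistic), p.value = res$p.value)
}))
print(round(sw_table, 4))
```

Each row of `sw_table` is then read in the same way as the type-A result above: normality is rejected for a group only if its p-value falls below 0.05.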

### Homogeneity of Variances

In [12]:
leveneTest(fertilizer$Profit, fertilizer$Type.f, center = mean)

      Df  F value    Pr(>F)
group  3 1.804026 0.2000296
      12


In Levene's test, $H_0$ states that all the population variances are equal. The value of the statistic is 1.804026, with 3 and 12 degrees of freedom, and the p-value is 0.2000296. Therefore, at a significance level of $5\%$, we cannot reject the equality of variances.
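
To make the statistic less of a black box: Levene's test with `center=mean` amounts to a one-way ANOVA on the absolute deviations of each observation from its own group mean. The sketch below reproduces it in base R. The data frame `fertilizer_demo` is a hypothetical stand-in for the Excel file (groups A and B from the `str()` output, C and D reconstructed from the summary table, which is an assumption).

```r
# Stand-in data: A and B as printed by str(); C and D are reconstructed
# from the summary table (assumption, the original order is unknown).
fertilizer_demo <- data.frame(
  Profit = c(218, 198, 215, 174,   # A
             263, 201, 219, 226,   # B
             274, 248, 243, 325,   # C (reconstructed)
             180, 183, 207, 318),  # D (reconstructed)
  Type.f = factor(rep(c("A", "B", "C", "D"), each = 4))
)

# Absolute deviation of each observation from its own group mean
# (ave() uses mean as its default summary function)
group_mean <- ave(fertilizer_demo$Profit, fertilizer_demo$Type.f)
fertilizer_demo$abs_dev <- abs(fertilizer_demo$Profit - group_mean)

# One-way ANOVA on the absolute deviations gives the Levene statistic
levene_by_hand <- anova(lm(abs_dev ~ Type.f, data = fertilizer_demo))
levene_F <- levene_by_hand[["F value"]][1]
levene_p <- levene_by_hand[["Pr(>F)"]][1]
print(c(F = levene_F, p.value = levene_p))
```

With the stand-in data, `levene_F` and `levene_p` should agree with the `leveneTest` output above up to rounding.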

Since neither normality nor the equality of variances is rejected, the conditions for applying the Analysis of Variance test are met. The hypotheses in this case are the following:

$H_0:\ \mu_A=\mu_B=\mu_C=\mu_D$

$H_1:\ \exists\, i \neq j$ such that $\mu_i \neq \mu_j$



In [13]:
Anova <-  aov(Profit ~ Type.f, data=fertilizer)
summary(Anova)

            Df Sum Sq Mean Sq F value Pr(>F)
Type.f       3  10808    3603   2.139  0.149
Residuals   12  20214    1685               

Since the p-value is 0.1486 > 0.05, we cannot reject $H_0$. Therefore, at a significance level of $5\%$, there is no evidence of a difference among the means of the four populations. In other words, the data do not show that the type of fertilizer influences the profit of the land.
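
For readers who want to see where the numbers in the ANOVA table come from, the following sketch computes the sums of squares, the F statistic and the p-value by hand. As before, `fertilizer_demo` is a hypothetical stand-in for the Excel file: groups A and B come from the `str()` output, and C and D are reconstructed from the summary table, so they are an assumption.

```r
# Stand-in data: A and B as printed by str(); C and D are reconstructed
# from the summary table (assumption, the original order is unknown).
fertilizer_demo <- data.frame(
  Profit = c(218, 198, 215, 174,   # A
             263, 201, 219, 226,   # B
             274, 248, 243, 325,   # C (reconstructed)
             180, 183, 207, 318),  # D (reconstructed)
  Type.f = factor(rep(c("A", "B", "C", "D"), each = 4))
)
profit <- fertilizer_demo$Profit
type   <- fertilizer_demo$Type.f

# Between-group sum of squares: group means around the grand mean
group_means <- tapply(profit, type, mean)
group_sizes <- tapply(profit, type, length)
ss_between  <- sum(group_sizes * (group_means - mean(profit))^2)

# Within-group (residual) sum of squares: observations around their group mean
ss_within <- sum((profit - ave(profit, type))^2)

# Degrees of freedom, F statistic and upper-tail p-value
df_between <- nlevels(type) - 1
df_within  <- length(profit) - nlevels(type)
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
print(c(SS_between = ss_between, SS_within = ss_within, F = f_stat, p = p_value))
```

The two sums of squares correspond to the `Type.f` and `Residuals` rows of the `summary(Anova)` table, and dividing each by its degrees of freedom gives the `Mean Sq` column.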