# Statistical hypotheses in R. Example 10
## Multiple comparisons test. Numeric data
**Task.** Three groups of one-month-old mice took part in a diet study. For one month the mice were fed a standard food ration with one of three added dietary supplements: F1, F2 or F3. At the end of the study the body mass change (BMC) of each mouse was measured; the results are collected in the table below. Determine whether the body mass change depends on the dietary supplement.

F1|F2|F3
-|-|-
7.1|7.6|9.2
7.0|7.3|8.3
7.0|7.3|9.1
8.6|8.3|9.0
8.2|7.6|8.9
6.6|6.6|9.0
6.8|6.7|9.2
6.7|6.8|7.6
7.7|7.0|8.1

### 1. Define the type of the variable
Body mass change is measured on a continuous scale: it can take any fractional value within its range, so we are dealing with numeric data.

**Get data in R**

Before processing the data from the task, note that R expects the data for multiple comparisons in a specific ("long") format: each row of the input table holds the name of the group and one BMC value. This is because the multiple comparison methods and the corresponding R functions treat multi-group data as a combination of nominal and numeric data. In our case there is one nominal variable (a factor in R terms), *GRP*, with 3 levels (F1, F2, F3), and one numeric variable, *BMC*.

The original table therefore has to be transformed first. Any spreadsheet application of your choice will do (MS Excel, LibreOffice Calc, Google Sheets). The data in its ready-to-process form is shown in the table below:

GRP|BMC
-|-
F1|7.1
F1|7.0
F1|7.0
F1|8.6
F1|8.2
F1|6.6
F1|6.8
F1|6.7
F1|7.7
F2|7.6
F2|7.3
F2|7.3
F2|8.3
F2|7.6
F2|6.6
F2|6.7
F2|6.8
F2|7.0
F3|9.2
F3|8.3
F3|9.1
F3|9.0
F3|8.9
F3|9.0
F3|9.2
F3|7.6
F3|8.1
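
The reshaping does not have to happen in a spreadsheet. As a minimal sketch, assuming the original table is first typed in as a wide data frame (the names `Wide` and `Long` are illustrative and not part of the task), base R's `stack()` can produce the same group/value layout:

```r
# Wide layout: one column per supplement, as in the task table
Wide <- data.frame(
  F1 = c(7.1, 7.0, 7.0, 8.6, 8.2, 6.6, 6.8, 6.7, 7.7),
  F2 = c(7.6, 7.3, 7.3, 8.3, 7.6, 6.6, 6.7, 6.8, 7.0),
  F3 = c(9.2, 8.3, 9.1, 9.0, 8.9, 9.0, 9.2, 7.6, 8.1)
)

# stack() returns two columns: `values` (numeric) and `ind` (factor of column names)
Long <- stack(Wide)
names(Long) <- c("BMC", "GRP")

# Reorder the columns to match the GRP/BMC table above
Long <- Long[, c("GRP", "BMC")]
str(Long)
```

The resulting `Long` data frame has 27 rows with a factor `GRP` and a numeric `BMC`, which is the shape the functions used later expect.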


```r
Input <- (
    "GRP,BMC
F1,7.1
F1,7.0
F1,7.0
F1,8.6
F1,8.2
F1,6.6
F1,6.8
F1,6.7
F1,7.7
F2,7.6
F2,7.3
F2,7.3
F2,8.3
F2,7.6
F2,6.6
F2,6.7
F2,6.8
F2,7.0
F3,9.2
F3,8.3
F3,9.1
F3,9.0
F3,8.9
F3,9.0
F3,9.2
F3,7.6
F3,8.1"
)
Data <- read.csv(textConnection(Input), header = TRUE,
                 strip.white = TRUE,        # Remove the whitespace characters from the input strings
                 stringsAsFactors = TRUE)   # Convert the strings into nominal data
str(Data)
```

```
'data.frame':	27 obs. of  2 variables:
 $ GRP: Factor w/ 3 levels "F1","F2","F3": 1 1 1 1 1 1 1 1 1 2 ...
 $ BMC: num  7.1 7 7 8.6 8.2 6.6 6.8 6.7 7.7 7.6 ...
```


As one can see, the imported data frame `Data` holds one nominal variable `GRP` with 3 levels ("F1", "F2", "F3") and one numeric variable `BMC`.

### 2. Check if the samples follow normal distribution
To check each sample for normality we have to select only the values that belong to one group. For example, to get the values of the F1 group one types `Data$BMC[Data$GRP == "F1"]`. This notation tells R to take those values of the `BMC` variable whose corresponding `GRP` value equals "F1". The double equals sign `==` is a logical comparison operator: it compares the values on its left and right sides element by element and returns `TRUE` where they are equal and `FALSE` otherwise.
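
To make the selection mechanism concrete, here is a small sketch on a made-up four-row data frame (`Toy` and `IsF1` are hypothetical names used only for this illustration). It shows the logical vector that `==` produces and how that vector picks values out of `BMC`:

```r
# A tiny illustrative data frame, not the study data
Toy <- data.frame(
  GRP = factor(c("F1", "F2", "F1", "F3")),
  BMC = c(7.1, 7.6, 7.0, 9.2)
)

# Element-wise comparison: one TRUE/FALSE per row
IsF1 <- Toy$GRP == "F1"
print(IsF1)           # TRUE FALSE TRUE FALSE

# Logical indexing keeps only the positions where IsF1 is TRUE
print(Toy$BMC[IsF1])  # 7.1 7.0
```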

```r
# Normal distribution check for F1 group
shapiro.test(Data$BMC[Data$GRP == "F1"])
# Normal distribution check for F2 group
shapiro.test(Data$BMC[Data$GRP == "F2"])
# Normal distribution check for F3 group
shapiro.test(Data$BMC[Data$GRP == "F3"])
```

```
	Shapiro-Wilk normality test

data:  Data$BMC[Data$GRP == "F1"]
W = 0.86142, p-value = 0.09936


	Shapiro-Wilk normality test

data:  Data$BMC[Data$GRP == "F2"]
W = 0.93473, p-value = 0.5277


	Shapiro-Wilk normality test

data:  Data$BMC[Data$GRP == "F3"]
W = 0.82447, p-value = 0.0387
```
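
The three calls can also be written as one. The sketch below is an alternative formulation, not a different method: it assumes that only the p-values are needed, rebuilds the data frame so that it runs on its own, and uses `tapply()` to apply `shapiro.test()` to each group. The names `PValues` and `IsNormal` are illustrative.

```r
# Rebuild the study data in long format (same values as the table above)
Data <- data.frame(
  GRP = factor(rep(c("F1", "F2", "F3"), each = 9)),
  BMC = c(7.1, 7.0, 7.0, 8.6, 8.2, 6.6, 6.8, 6.7, 7.7,
          7.6, 7.3, 7.3, 8.3, 7.6, 6.6, 6.7, 6.8, 7.0,
          9.2, 8.3, 9.1, 9.0, 8.9, 9.0, 9.2, 7.6, 8.1)
)

# Split BMC by group and keep only the Shapiro-Wilk p-value of each group
PValues <- tapply(Data$BMC, Data$GRP, function(x) shapiro.test(x)$p.value)
print(round(PValues, 4))

# TRUE where normality is not rejected at the 0.05 level
IsNormal <- PValues > 0.05
print(IsNormal)
```

With the data above this should reproduce the p-values printed by the separate calls.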


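The Shapiro-Wilk test gives p-values above 0.05 for F1 (0.09936) and F2 (0.5277), so we have no grounds to reject normality for these two groups. The p-value for F3 (0.0387) is below 0.05, so F3 does not follow a normal distribution. Because at least one sample is not normally distributed, we have to use non-parametric methods.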

### 3. Formulate the statistical hypotheses

**Null hypothesis (H0):** Body mass changes are the same for all three dietary supplements

**Alternative hypothesis (H1):** Body mass change differs for at least one of the three dietary supplements

### 4. Testing the hypotheses
The non-parametric method for a multiple comparison test with one nominal and one numeric variable, as in our case, is the *Kruskal-Wallis rank sum test*. It can be thought of as a non-parametric analogue of one-way ANOVA and is implemented in R as the `kruskal.test` function. Its main argument is what R calls a formula, a short notation for the relation between the tested variables. The dependent variable is given on the left side of the `~` sign and the independent variable on the right side. In the Kruskal-Wallis rank sum test the independent variable is always nominal, so the formula has the form `NumericVar ~ NominalVar`.

```r
kruskal.test(Data$BMC ~ Data$GRP)
```

```
	Kruskal-Wallis rank sum test

data:  Data$BMC by Data$GRP
Kruskal-Wallis chi-squared = 13.592, df = 2, p-value = 0.001118
```
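
To see what the test works with, it can help to look at the ranks directly. The following sketch, which rebuilds the data so that it runs on its own, computes the mean rank of each group and then calls `kruskal.test` with the equivalent `BMC ~ GRP, data = Data` form of the formula. The names `MeanRanks` and `Result` are illustrative.

```r
# Rebuild the study data in long format (same values as the table above)
Data <- data.frame(
  GRP = factor(rep(c("F1", "F2", "F3"), each = 9)),
  BMC = c(7.1, 7.0, 7.0, 8.6, 8.2, 6.6, 6.8, 6.7, 7.7,
          7.6, 7.3, 7.3, 8.3, 7.6, 6.6, 6.7, 6.8, 7.0,
          9.2, 8.3, 9.1, 9.0, 8.9, 9.0, 9.2, 7.6, 8.1)
)

# Rank all 27 values together (ties get the average rank),
# then average the ranks within each group
MeanRanks <- tapply(rank(Data$BMC), Data$GRP, mean)
print(round(MeanRanks, 2))

# Same test as above, with the variables looked up in `Data`
Result <- kruskal.test(BMC ~ GRP, data = Data)

# The returned object stores the pieces shown in the printed output
print(Result$statistic)  # Kruskal-Wallis chi-squared
print(Result$parameter)  # degrees of freedom
print(Result$p.value)
```

A group whose mean rank lies far from the others, as F3 does here, is what drives the test statistic up.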


The test gives a p-value of 0.001118, which is below 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.

It means that at least one of the dietary supplements changes body mass differently from the others.

Next we'll try to figure out which of the three dietary supplements affects body mass differently. To do so we have to perform pairwise tests for all pairs of supplements. We'll use the `pairwise.wilcox.test` function, which can be thought of as a non-parametric version of `pairwise.t.test`.

Each additional comparison increases the chance of a false positive result, so we have to specify a p-value correction method along with the tested data. We'll use the Bonferroni correction, which multiplies each p-value by the number of comparisons (three here) and caps the result at 1 (for the details see this [book](http://www.biostathandbook.com/multiplecomparisons.html)).
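
The arithmetic of the Bonferroni correction can be checked with base R's `p.adjust` function. The raw p-values in this sketch are made up for illustration and are not taken from the study:

```r
# Hypothetical raw p-values from three pairwise comparisons
RawP <- c(0.010, 0.020, 0.500)

# Built-in Bonferroni adjustment
Adjusted <- p.adjust(RawP, method = "bonferroni")
print(Adjusted)  # 0.03 0.06 1.00

# The same by hand: multiply by the number of comparisons, cap at 1
Manual <- pmin(1, RawP * length(RawP))
print(all.equal(Adjusted, Manual))  # TRUE
```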

```r
pairwise.wilcox.test(Data$BMC, Data$GRP, p.adjust.method = "bonferroni")
```

```
"cannot compute exact p-value with ties"


	Pairwise comparisons using Wilcoxon rank sum test 

data:  Data$BMC and Data$GRP 

   F1     F2    
F2 1.0000 -     
F3 0.0079 0.0036

P value adjustment method: bonferroni 
```
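
The message about ties appears because several BMC values repeat, so R cannot compute exact p-values and uses a normal approximation instead. As a hedged sketch, the approximation can be requested explicitly by passing `exact = FALSE`, which `pairwise.wilcox.test` forwards to `wilcox.test`. This should leave the p-values unchanged and only avoid the message. The data frame is rebuilt so that the snippet runs on its own, and `Result` is an illustrative name.

```r
# Rebuild the study data in long format (same values as the table above)
Data <- data.frame(
  GRP = factor(rep(c("F1", "F2", "F3"), each = 9)),
  BMC = c(7.1, 7.0, 7.0, 8.6, 8.2, 6.6, 6.8, 6.7, 7.7,
          7.6, 7.3, 7.3, 8.3, 7.6, 6.6, 6.7, 6.8, 7.0,
          9.2, 8.3, 9.1, 9.0, 8.9, 9.0, 9.2, 7.6, 8.1)
)

# exact = FALSE is passed on to wilcox.test for every pair of groups
Result <- pairwise.wilcox.test(Data$BMC, Data$GRP,
                               p.adjust.method = "bonferroni",
                               exact = FALSE)

# Matrix of adjusted p-values: rows F2, F3 and columns F1, F2
print(round(Result$p.value, 4))
```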

The result of the function is a matrix of adjusted p-values, one for each pair of groups. As one can see, F1 and F2 do not differ significantly (p = 1.0), so these supplements affect body mass change in a similar way, while F3 differs significantly from both F1 (p = 0.0079) and F2 (p = 0.0036). It means that the source of the difference in body mass change found by the Kruskal-Wallis rank sum test is the addition of the F3 dietary supplement to the standard ration.
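
The p-value matrix says that F3 differs, but not in which direction. A minimal sketch, assuming that group medians are an acceptable descriptive summary here, shows this. The data frame is rebuilt so that the snippet runs on its own, and `Medians` is an illustrative name.

```r
# Rebuild the study data in long format (same values as the table above)
Data <- data.frame(
  GRP = factor(rep(c("F1", "F2", "F3"), each = 9)),
  BMC = c(7.1, 7.0, 7.0, 8.6, 8.2, 6.6, 6.8, 6.7, 7.7,
          7.6, 7.3, 7.3, 8.3, 7.6, 6.6, 6.7, 6.8, 7.0,
          9.2, 8.3, 9.1, 9.0, 8.9, 9.0, 9.2, 7.6, 8.1)
)

# Median body mass change per supplement
Medians <- tapply(Data$BMC, Data$GRP, median)
print(Medians)  # F1: 7.0, F2: 7.3, F3: 9.0
```

The median body mass change in the F3 group is the largest of the three, which agrees with the pairwise test results.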

### Conclusion
The dietary supplements do not all change body mass in the same way. Adding the F3 dietary supplement to the standard ration changes body mass differently than adding F1 or F2, while F1 and F2 do not differ significantly from each other.