**The Wilcoxon rank-sum test** is a nonparametric alternative to the two-sample t-test that is based solely on the order in which the observations from the two samples fall.

## a)


$ H_0: M_1 = M_2 $

$ H_a: M_1 \neq M_2 $

Here $M_1$ and $M_2$ denote the medians of Group1 and Group2.


In [9]:
tab <- matrix(nrow=2, ncol=8, byrow=TRUE)
rownames(tab) <- c( 'Group1', 'Group2')
tab[1,] = c(19.0, 14.4, 18.2, 15.6, 14.5, 11.2, 13.9, 11.6)
tab[2,] = c(12.1, 19.1, 11.6, 21.0, 16.7, 10.1, 18.3, 20.5)
tab

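|        | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    |
|--------|------|------|------|------|------|------|------|------|
| Group1 | 19.0 | 14.4 | 18.2 | 15.6 | 14.5 | 11.2 | 13.9 | 11.6 |
| Group2 | 12.1 | 19.1 | 11.6 | 21.0 | 16.7 | 10.1 | 18.3 | 20.5 |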


In [10]:
total = c(tab[1,], tab[2,])
total

In [11]:
total_sorted = sort(total, decreasing = FALSE)
total_sorted

In [12]:
for (i in 1:length(total_sorted)) {
    cat("\n",i, total_sorted[i])
}


 1 10.1
 2 11.2
 3 11.6
 4 11.6
 5 12.1
 6 13.9
 7 14.4
 8 14.5
 9 15.6
 10 16.7
 11 18.2
 12 18.3
 13 19
 14 19.1
 15 20.5
 16 21

In [16]:
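# Rows hold each group's raw values and their ranks in the pooled sample;
# the last column holds the rank sums
tab2 <- matrix(nrow=4, ncol=9, byrow=TRUE)
rownames(tab2) <- c('Group1', 'G1-Rank', 'Group2', 'G2-Rank')
colnames(tab2)[9] <- 'TOTAL'
tab2[1,] = c(tab[1,], NA)
# The two tied 11.6 values occupy positions 3 and 4, so each gets rank (3+4)/2 = 3.5
tab2[2,] = c(13, 7, 11, 9, 8, 2, 6, 3.5, 59.5)
tab2[3,] = c(tab[2,], NA)
tab2[4,] = c(5, 14, 3.5, 16, 10, 1, 12, 15, 76.5)
tab2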

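|         | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | TOTAL |
|---------|------|------|------|------|------|------|------|------|-------|
| Group1  | 19.0 | 14.4 | 18.2 | 15.6 | 14.5 | 11.2 | 13.9 | 11.6 |       |
| G1-Rank | 13.0 | 7.0  | 11.0 | 9.0  | 8.0  | 2.0  | 6.0  | 3.5  | 59.5  |
| Group2  | 12.1 | 19.1 | 11.6 | 21.0 | 16.7 | 10.1 | 18.3 | 20.5 |       |
| G2-Rank | 5.0  | 14.0 | 3.5  | 16.0 | 10.0 | 1.0  | 12.0 | 15.0 | 76.5  |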
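
The ranks in this table were typed in by hand. As a cross-check, here is a minimal sketch that recomputes them with base R's `rank()`, whose default `ties.method = "average"` gives tied values their mid-rank. The names `g1`, `g2`, `r1`, `r2`, `R1` and `R2` are illustrative and do not appear in the original notebook.

```r
g1 <- c(19.0, 14.4, 18.2, 15.6, 14.5, 11.2, 13.9, 11.6)
g2 <- c(12.1, 19.1, 11.6, 21.0, 16.7, 10.1, 18.3, 20.5)

# Rank the pooled sample; tied values share the average of the ranks they occupy
r  <- rank(c(g1, g2))
r1 <- r[seq_along(g1)]                 # ranks belonging to Group1
r2 <- r[length(g1) + seq_along(g2)]    # ranks belonging to Group2

# Rank sums per group
R1 <- sum(r1)
R2 <- sum(r2)

# Sanity check: the two rank sums must add up to N(N+1)/2
N <- length(r)
c(R1 = R1, R2 = R2, expected_total = N * (N + 1) / 2)
```

If the two rank sums do not add up to $N(N+1)/2$, a rank was mistyped somewhere in the manual table.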


The rule is that the group with the smaller sample size supplies the test statistic. In this question both groups have the same size ($n_1 = n_2 = 8$), so either one can be used. We choose Group1, which has the smaller rank sum ($59.5$).

![w2](img/w2.png)


<hr/>

As the figure below shows, we cannot reject the null hypothesis in this test.

![wilxcon2](img/wilxcon2.png)
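
The table lookup can also be approximated in code. The sketch below derives two-sided critical values for the rank sum from R's exact Wilcoxon distribution (`pwilcox`), assuming $\alpha = 0.05$ (the level is not stated in the text) and ignoring the single tie in the data. The names `u_crit`, `lower_w` and `upper_w` are illustrative.

```r
n1 <- 8
n2 <- 8
alpha <- 0.05   # assumed significance level, not given in the original

# Largest U whose lower-tail probability does not exceed alpha/2
u_grid <- 0:(n1 * n2)
u_crit <- max(u_grid[pwilcox(u_grid, n1, n2) <= alpha / 2])

# Convert from the U scale to the rank-sum scale; the null distribution of the
# rank sum is symmetric, so the upper bound mirrors the lower one
lower_w <- u_crit + n1 * (n1 + 1) / 2
upper_w <- n1 * (n1 + n2 + 1) - lower_w

# Reject only if the observed rank sum falls at or beyond a bound
R1 <- 59.5
reject <- (R1 <= lower_w) || (R1 >= upper_w)
list(lower = lower_w, upper = upper_w, reject = reject)
```

Because the exact distribution assumes no ties, this is only an approximation for these data, but the observed rank sum is far from either bound.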


**Notice:** The section below only checks the answer with R and is not part of the question.

In [18]:
res <- wilcox.test(tab[1, ], tab[2, ],
                   exact = FALSE)
res


	Wilcoxon rank sum test with continuity correction

data:  tab[1, ] and tab[2, ]
W = 23.5, p-value = 0.4005
alternative hypothesis: true location shift is not equal to 0


In [19]:
# Print the p-value only
res$p.value

Since the p-value ($0.4005$) is greater than $\alpha$ (for example $0.05$), we cannot reject the null hypothesis.
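
To see where `W = 23.5` and the p-value come from, the following sketch reproduces the normal approximation with tie and continuity corrections by hand. It is an illustration of the standard formulas, not a copy of R's internal code, and the variable names are hypothetical.

```r
g1 <- c(19.0, 14.4, 18.2, 15.6, 14.5, 11.2, 13.9, 11.6)
g2 <- c(12.1, 19.1, 11.6, 21.0, 16.7, 10.1, 18.3, 20.5)
n1 <- length(g1)
n2 <- length(g2)
N  <- n1 + n2

# W is the rank sum of the first sample minus its minimum possible value
r <- rank(c(g1, g2))
W <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# Standard deviation of W under H0, reduced by the tie correction term
ties  <- table(r)
sigma <- sqrt(n1 * n2 / 12 * ((N + 1) - sum(ties^3 - ties) / (N * (N - 1))))

# Continuity correction: move W half a unit towards its mean n1*n2/2
d <- W - n1 * n2 / 2
z <- (d - sign(d) * 0.5) / sigma

# Two-sided p-value from the standard normal distribution
p_manual <- 2 * pnorm(-abs(z))

# Compare with the built-in test
p_builtin <- wilcox.test(g1, g2, exact = FALSE)$p.value
c(W = W, p_manual = p_manual, p_builtin = p_builtin)
```

The value `W` reported by `wilcox.test` is therefore the Group1 rank sum $59.5$ shifted by $n_1(n_1+1)/2 = 36$, which is why it differs from the hand-computed rank sum.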

## b)

In statistics, the **Siegel–Tukey test**, named after Sidney Siegel and John Tukey, is a non-parametric test which may be applied to data measured at least on an ordinal scale. It tests for differences in scale between two groups.

The test is used to determine if one of two groups of data tends to have more widely dispersed values than the other. In other words, the test determines whether one of the two groups tends to move, sometimes to the right, sometimes to the left, but away from the center (of the ordinal scale).

The principle is based on the following idea:

Suppose there are two groups A and B with n observations for the first group and m observations for the second (so there are N = n + m total observations). If all N observations are arranged in ascending order, it can be expected that the values of the two groups will be mixed or sorted randomly, if there are no differences between the two groups (following the null hypothesis H0). This would mean that among the ranks of extreme (high and low) scores, there would be similar values from Group A and Group B.

- Hypothesis $H_0: \sigma_1^2 = \sigma_2^2,\ Me_1 = Me_2$ (where $\sigma^2$ and $Me$ are the variance and the median, respectively)
- Hypothesis $H_1: \sigma_1^2 > \sigma_2^2$


In [20]:
total_sorted

|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Group  | 2  | 1  | 1&2  | 1&2  | 2  | 1  |  1 | 1  |  1 | 2  | 1  | 2  |  1 | 2  | 2  |  2 |
| Value | 10.1  | 11.2  |  11.6 | 11.6  |  12.1 |  13.9 | 14.4  |  14.5 | 15.6  | 16.7  | 18.2  | 18.3  | 19  | 19.1  |  20.5 | 21  |
|  Alternative Rank |  1 |  4 | 5  | 8  | 9  | 12  |  13 | 16  | 15  | 14  | 11  | 10  |  7 |  6 | 3  |  2 |
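
The alternating ranks in the table follow a fixed pattern: rank 1 goes to the smallest value, ranks 2 and 3 to the two largest, ranks 4 and 5 to the next two smallest, and so on. A minimal sketch of that pattern is shown below; `siegel_tukey_ranks` is a hypothetical helper written for this illustration, not a base R function.

```r
# Siegel-Tukey ranks for n values already sorted in ascending order
siegel_tukey_ranks <- function(n) {
  ranks <- numeric(n)
  lo <- 1          # next unranked position at the low end
  hi <- n          # next unranked position at the high end
  r <- 1           # next rank to assign
  from_low <- TRUE
  take <- 1        # the first step ranks one value, later steps rank two
  while (lo <= hi) {
    for (k in seq_len(take)) {
      if (lo > hi) break
      if (from_low) {
        ranks[lo] <- r
        lo <- lo + 1
      } else {
        ranks[hi] <- r
        hi <- hi - 1
      }
      r <- r + 1
    }
    from_low <- !from_low
    take <- 2
  }
  ranks
}

g1 <- c(19.0, 14.4, 18.2, 15.6, 14.5, 11.2, 13.9, 11.6)
g2 <- c(12.1, 19.1, 11.6, 21.0, 16.7, 10.1, 18.3, 20.5)
values <- c(g1, g2)
group  <- rep(c(1, 2), times = c(length(g1), length(g2)))
ord    <- order(values)

st     <- siegel_tukey_ranks(length(values))
st_avg <- ave(st, values[ord])   # tied values share the mean of their ranks

# Rank sums per group on the Siegel-Tukey scale
W_1 <- sum(st_avg[group[ord] == 1])
W_2 <- sum(st_avg[group[ord] == 2])
c(W_1 = W_1, W_2 = W_2)
```

Averaging with `ave()` handles the tied 11.6 values, which is the same adjustment as the `(8+5)/2` term in the manual sums.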

In [21]:
# Group1 rank sum; the tied 11.6 values share the mean of ranks 5 and 8
W_1 = 4 + (8+5)/2 + 12 + 13 + 16 + 15 + 11 + 7
W_1

In [23]:
# Group2 rank sum; the tied 11.6 values share the mean of ranks 5 and 8
W_2 = 1 + (8+5)/2 + 9 + 14 + 10 + 6 + 3 + 2
W_2

In [24]:
U_1 = W_1 - 8*(8+1)/2
U_1

In [25]:
U_2 = W_2 - 8*(8+1)/2
U_2

In [26]:
U_ = min(U_1, U_2)
U_

Under $H_0$, the minimum of these two values follows a Wilcoxon rank-sum distribution with parameters given by the two group sizes:

$$ \min(U_1, U_2) \sim Wilcoxon(m, n) $$

This allows the p-value of the test to be calculated as:

$$ p = Pr[X \leq \min(U_1, U_2)] $$
$$ X \sim Wilcoxon(m, n) $$
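
The tail probability in this formula can be evaluated with R's `pwilcox`, which implements the exact Wilcoxon rank-sum distribution. The sketch below uses the minimum of $U_1$ and $U_2$, as the text describes. It assumes no ties, so with the one tied pair in these data the result is approximate, and the names `U_min`, `p_one_sided` and `p_two_sided` are illustrative.

```r
n1 <- 8
n2 <- 8
W_1 <- 84.5   # Siegel-Tukey rank sum of Group1, from the cells above
W_2 <- 51.5   # Siegel-Tukey rank sum of Group2, from the cells above

U_1 <- W_1 - n1 * (n1 + 1) / 2
U_2 <- W_2 - n2 * (n2 + 1) / 2
U_min <- min(U_1, U_2)

# The exact distribution lives on integers, so Pr[X <= 15.5] equals Pr[X <= 15]
p_one_sided <- pwilcox(floor(U_min), n1, n2)

# Doubling the smaller tail gives a two-sided p-value (capped at 1)
p_two_sided <- min(1, 2 * p_one_sided)
c(U_min = U_min, p_one_sided = p_one_sided, p_two_sided = p_two_sided)
```

Which of the two p-values applies depends on whether the alternative is directional. The lower tail of the minimum is the one-sided value in the direction the data point to.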


![w2](img/w2.png)

As the table of critical values for the Wilcoxon rank-sum test shows, our U lies between the lower and upper bounds, so we cannot reject the null hypothesis.

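For the p-value we have:

$p = Pr[X \leq 15.5] = Pr[X \leq 15] \approx 0.0415$

using the exact $Wilcoxon(8, 8)$ distribution and ignoring the tie. This is a one-sided tail probability. The corresponding two-sided p-value is about $0.083 > 0.05$, so at the 5% level there is not enough evidence to reject the null hypothesis that the dispersion of the two groups is the same.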


## e)