# Further Hypothesis Testing

## Assignment: ice cream

This assignment contains exercises to test your understanding of various hypothesis tests from the course. It is a good idea to revise the course material thoroughly before starting.

Use whatever computational resources you like to answer the questions (e.g. Python, R, any other programming language, a spreadsheet, a calculator, ...).

### The data

The file `Ice_cream.csv` contains data collected on 200 high school students. These are:

* Subject ID
* Wears glasses (0=No, 1=Yes)
* Favourite ice cream flavour (1=Vanilla, 2=Chocolate, 3=Strawberry)
* Score on a video game
* Score on a puzzle

### The tasks

Answer the following questions:

1. Are the video game and puzzle scores normally distributed?

2. Is the video game as hard as the puzzle?

3. Is wearing glasses associated with a higher score on the puzzle game?

4. Do glasses wearers have different ice cream preferences to non-wearers?

5. Is the video game score independent of favourite ice cream?

---

In [1]:
# Select this cell and type Ctrl-Enter to execute the code below.

library(tidyverse)

set_plot_dimensions <- function(width_choice, height_choice) {
    options(repr.plot.width = width_choice, repr.plot.height = height_choice)
}

cbPal <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#CC79A7", "#0072B2", "#D55E00")

set_plot_dimensions(5, 4)


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
data <- read_csv("../assets/Ice_cream.csv")


Rows: 200 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): id, glasses, ice_cream, video, puzzle

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


1. Are the video game and puzzle scores normally distributed?

In [8]:
# H0: The scores are normally distributed.
# H1: The scores are not normally distributed.

sample <- data %>%
    pull(video)

p_value <- shapiro.test(sample)$p.value
print(paste("Video: p =", p_value))


[1] "Video: p = 0.0347758959538878"


In [9]:
sample <- data %>%
    pull(puzzle)

p_value <- shapiro.test(sample)$p.value
print(paste("Puzzle: p =", p_value))


[1] "Puzzle: p = 2.34330158744853e-05"


The p-values for both the video and puzzle scores are below the significance level alpha = 0.05. However, because we ran two tests, a Bonferroni correction requires p < alpha/2 = 0.025, so we should report only the puzzle scores as significantly non-normal.
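The same Bonferroni comparison can be made by scaling the p-values rather than alpha. The following is a minimal sketch using base R's `p.adjust()`, with the two Shapiro-Wilk p-values copied from the outputs above; the names `p_values`, `p_adjusted` and `reject_normality` are illustrative and not part of the original notebook.

```r
# Shapiro-Wilk p-values copied from the two outputs above.
p_values <- c(video = 0.0347758959538878, puzzle = 2.34330158744853e-05)
alpha <- 0.05

# Bonferroni: multiply each p-value by the number of tests (capped at 1),
# then compare against the unadjusted alpha.
p_adjusted <- p.adjust(p_values, method = "bonferroni")
print(p_adjusted)

# TRUE where normality is still rejected after the correction.
reject_normality <- p_adjusted < alpha
print(reject_normality)
```

Comparing the adjusted p-values with alpha is equivalent to comparing the raw p-values with alpha/2.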

2. Is the video game as hard as the puzzle?

Although the puzzle scores are not normally distributed, the sample is large enough (n > 75) to use a paired-sample t-test on the mean of the score differences (video - puzzle). The test is paired because each student has both a video game score and a puzzle score.

In [15]:
# H0: mean(video - puzzle) = 0
# H1: mean(video - puzzle) != 0

type.video <- data %>%
    pull(video)

type.puzzle <- data %>%
    pull(puzzle)

print(paste("Video:", mean(type.video)))
print(paste("Puzzle:", mean(type.puzzle)))
print(paste("Difference:", mean(type.video) - mean(type.puzzle)))


[1] "Video: 51.85"
[1] "Puzzle: 52.405"
[1] "Difference: -0.555"


In [16]:
t.test(type.video, type.puzzle, paired = TRUE, alternative = "two.sided")



	Paired t-test

data:  type.video and type.puzzle
t = -0.7338, df = 199, p-value = 0.4639
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.046463  0.936463
sample estimates:
mean difference 
         -0.555 


p > alpha, so we fail to reject the null hypothesis: there is no significant evidence that the video game and the puzzle differ in difficulty.
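A paired t-test is the same calculation as a one-sample t-test on the per-student differences. The sketch below illustrates this with six made-up pairs of scores; `video_toy` and `puzzle_toy` are hypothetical values, not taken from `Ice_cream.csv`.

```r
# Hypothetical scores for six students (not taken from Ice_cream.csv).
video_toy  <- c(48, 55, 61, 50, 47, 58)
puzzle_toy <- c(52, 53, 66, 49, 51, 60)

# Paired test on the two score vectors.
paired_result <- t.test(video_toy, puzzle_toy, paired = TRUE)

# One-sample test on the differences, against a mean of zero.
one_sample_result <- t.test(video_toy - puzzle_toy, mu = 0)

# Both calls report the same t statistic and p-value.
print(paired_result$statistic)
print(one_sample_result$statistic)
```

This is why only the distribution of the differences matters for the test, not the distributions of the two scores separately.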

3. Is wearing glasses associated with a higher score on the puzzle game?

In [30]:
type.glasses <-
    data %>%
    filter(glasses == 1) %>%
    pull(puzzle)

type.no_glasses <-
    data %>%
    filter(glasses == 0) %>%
    pull(puzzle)

print(paste("Glasses: n =", length(type.glasses), "mean =", mean(type.glasses)))
print(paste("No Glasses: n =", length(type.no_glasses), "mean =", mean(type.no_glasses)))
print(paste("Difference:", mean(type.glasses) - mean(type.no_glasses)))

[1] "Glasses: n = 109 mean = 52.9174311926606"
[1] "No Glasses: n = 91 mean = 51.7912087912088"
[1] "Difference: 1.12622240145176"


Both groups are large enough (n > 75) that a t-test would give a reliable p-value even with non-normal data.

As an alternative, we will use a non-parametric test here. The two samples are independent, so we can use a Mann-Whitney U-test to compare their distributions.

Note that the question as posed is a one-tailed hypothesis test:

- H0: a random sample from puzzle_glasses is on average the same as a random sample from puzzle_no_glasses.
- H1: a random sample from puzzle_glasses is on average greater than a random sample from puzzle_no_glasses.

In [36]:
wilcox.test(type.glasses, type.no_glasses)


	Wilcoxon rank sum test with continuity correction

data:  type.glasses and type.no_glasses
W = 5184, p-value = 0.5795
alternative hypothesis: true location shift is not equal to 0


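The p-value above is two-tailed. The observed difference is in the direction asked about (glasses wearers have the higher mean), so halving it gives a one-tailed p of about 0.29. This is still greater than alpha, so we conclude that glasses wearers do not score significantly higher on the puzzle than non-wearers.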
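Rather than halving a two-tailed p-value by hand, the one-tailed test can be requested directly through the `alternative` argument of `wilcox.test()`. The sketch below uses two small made-up samples; `glasses_toy` and `no_glasses_toy` are hypothetical values, not taken from `Ice_cream.csv`.

```r
# Hypothetical puzzle scores (not taken from Ice_cream.csv); no tied values.
glasses_toy    <- c(55, 60, 62, 58, 70, 65)
no_glasses_toy <- c(50, 52, 49, 61, 47, 53)

# Default: two-sided alternative.
two_sided <- wilcox.test(glasses_toy, no_glasses_toy)

# H1: values in the first sample tend to be greater than in the second.
one_sided <- wilcox.test(glasses_toy, no_glasses_toy, alternative = "greater")

print(two_sided$p.value)
print(one_sided$p.value)
```

On the course data the corresponding call would be `wilcox.test(type.glasses, type.no_glasses, alternative = "greater")`. Halving is only valid when the observed difference lies in the direction of H1; asking for the one-sided test directly avoids having to check this.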

4. Do glasses wearers have different ice cream preferences to non-wearers?

To test whether two categorical variables are independent, we use a chi-squared test:

- H0: glasses and ice_cream are independent.
- H1: glasses and ice_cream are not independent.

In [41]:
obs <-
    data %>%
    select(glasses, ice_cream) %>%
    table()

print("OBSERVED:")

print(obs)

[1] "OBSERVED:"
       ice_cream
glasses  1  2  3
      0 47 15 29
      1 48 32 29


In [42]:
print("row totals:")
tot_rows <- margin.table(obs, 1)
show(tot_rows)

print("column totals:")
tot_cols <- margin.table(obs, 2)
print(tot_cols)

print("total observations")
tot_obs <- sum(tot_rows)
print(tot_obs)


[1] "row totals:"
glasses
  0   1 
 91 109 
[1] "column totals:"
ice_cream
 1  2  3 
95 47 58 
[1] "total observations"
[1] 200


In [43]:
print("EXPECTED:")
expected <- as.matrix(tot_rows) %*% t(as.matrix(tot_cols)) / tot_obs
expected


[1] "EXPECTED:"


       1      2     3
0 43.225 21.385 26.39
1 51.775 25.615 31.61


In [44]:
chisq.test(obs)



	Pearson's Chi-squared test

data:  obs
X-squared = 4.5765, df = 2, p-value = 0.1014


p > alpha, so we fail to reject the null hypothesis: there is no significant evidence that ice cream preference depends on wearing glasses.
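To make the link between the observed and expected counts and the test statistic explicit, the statistic can be rebuilt by hand. This is a minimal sketch with the observed counts typed in from the table above; the names `observed`, `expected_counts`, `chi_sq` and `deg_free` are illustrative.

```r
# Observed counts copied from the contingency table above.
observed <- matrix(
    c(47, 15, 29,
      48, 32, 29),
    nrow = 2, byrow = TRUE,
    dimnames = list(glasses = c("0", "1"), ice_cream = c("1", "2", "3"))
)

# Expected count under independence: row total * column total / grand total.
expected_counts <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
chi_sq <- sum((observed - expected_counts)^2 / expected_counts)

# Degrees of freedom: (rows - 1) * (columns - 1).
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

print(round(c(chi_sq = chi_sq, df = deg_free, p = p_value), 4))
```

The values should agree with those reported by `chisq.test(obs)`, since no continuity correction is applied to tables larger than 2 x 2.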

5. Is the video game score independent of favourite ice cream?

This question asks us to test whether the mean video game scores for the three ice cream groups are the same. This requires a one-way ANOVA test:

- H0: The means of the three groups are the same.
- H1: The means of the three groups are not the same.

In [53]:
type.video_1 <-
    data %>%
    filter(ice_cream == 1) %>%
    pull(video)

type.video_2 <-
    data %>%
    filter(ice_cream == 2) %>%
    pull(video)

type.video_3 <-
    data %>%
    filter(ice_cream == 3) %>%
    pull(video)

print(paste("Vanilla: n =", length(type.video_1), "mean =", mean(type.video_1)))
print(paste("Chocolate: n =", length(type.video_2), "mean =", mean(type.video_2)))
print(paste("Strawberry: n =", length(type.video_3), "mean =", mean(type.video_3)))

[1] "Vanilla: n = 95 mean = 51.7052631578947"
[1] "Chocolate: n = 47 mean = 47.7021276595745"
[1] "Strawberry: n = 58 mean = 55.448275862069"


In [63]:
data %>%
    aov(video ~ ice_cream, .) %>%
    summary()


             Df Sum Sq Mean Sq F value Pr(>F)  
ice_cream     1    339   338.6   3.497 0.0629 .
Residuals   198  19169    96.8                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

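p > alpha in this table, but note that `ice_cream` has Df = 1. Because the column is stored as a number (1, 2, 3), `aov()` fitted a linear trend across the flavour codes instead of comparing three group means; a one-way ANOVA with three groups has 2 degrees of freedom. To test the hypotheses stated above, the grouping variable must be a factor, e.g. `aov(video ~ factor(ice_cream), .)`. The table above therefore does not settle the question, and the conclusion should be based on the re-run test.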
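As a rough check on what a three-group comparison would show, the one-way ANOVA F statistic can be approximated from numbers already printed above: the group sizes and means, and the total sum of squares (the two `Sum Sq` entries of the table added together). This is a sketch under the assumption that those rounded figures are accurate enough; it is not a substitute for re-running the test on the data, and all variable names are illustrative.

```r
# Group sizes and means copied from the printed output above.
n <- c(vanilla = 95, chocolate = 47, strawberry = 58)
group_mean <- c(vanilla = 51.7052631578947,
                chocolate = 47.7021276595745,
                strawberry = 55.448275862069)

# Total sum of squares recovered from the aov table (338.6 + 19169).
# These are rounded values, so everything below is approximate.
ss_total <- 338.6 + 19169

# Between-group sum of squares from the group means.
grand_mean <- sum(n * group_mean) / sum(n)
ss_between <- sum(n * (group_mean - grand_mean)^2)
ss_within <- ss_total - ss_between

# Degrees of freedom for three groups and 200 observations.
df_between <- length(n) - 1
df_within <- sum(n) - length(n)

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(c(F = f_value, p = p_value))
```

If the approximate F statistic is large, as the clearly separated group means suggest it may be, the conclusion from the properly specified ANOVA could differ from the one drawn from the Df = 1 table.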