## Load libraries and data

In [None]:
library(tidyverse)

station_data <- read_csv('data/combined_stations.csv')

## Set up and run an AOV test to compare annual total rainfall at all four stations, using data from all available years. Are there significant differences between the stations? Use `TukeyHSD()` or `pairwise.t.test()` to investigate further.

First, we group by `station` and `year`, then `summarize()` using `sum()` to get the annual total rainfall, and assign this to an object.

In [None]:
# annual total rainfall per station; drop the grouping once summarized
total_rain <- station_data |>
    group_by(station, year) |>
    summarize(rain = sum(rain), .groups = 'drop')
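
One thing worth checking at this step is how `sum()` treats missing values: a single `NA` in a year makes that year's total `NA`. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical, not the station data), assuming missing months are coded as `NA`.

```r
library(dplyr)

# Hypothetical toy data: two stations, one year each, one missing value.
toy <- data.frame(
  station = c("a", "a", "a", "b", "b", "b"),
  year    = c(2000, 2000, 2000, 2000, 2000, 2000),
  rain    = c(10, 20, NA, 5, 5, 5)
)

toy_totals <- toy |>
  group_by(station, year) |>
  summarize(
    rain_strict  = sum(rain),                # NA if any value is missing
    rain_lenient = sum(rain, na.rm = TRUE),  # ignores missing values
    n_missing    = sum(is.na(rain)),         # how many values were missing
    .groups = "drop"
  )

toy_totals
```

Whether to drop incomplete years or use `na.rm = TRUE` is a judgment call: a lenient total for a year with missing months underestimates that year's rainfall.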

Next, look at a box plot to get an idea of the differences:

In [None]:
ggplot(data = total_rain, mapping = aes(x = station, y = rain)) + geom_boxplot()

Use `aov()` to do the analysis of variance, and print the summary:

In [None]:
rain_aov <- aov(rain~station, data=total_rain)
summary(rain_aov)

So there is a significant difference between at least one pair of the stations; using `TukeyHSD()`, we can check each pair individually:

In [None]:
TukeyHSD(rain_aov)

So, the only station pair with no significant difference is Southampton and Armagh (p = 0.175); for all other pairs, the difference is statistically significant.
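
Rather than reading the p-values off the printed output, they can be pulled out of the result objects. The following is a minimal sketch on simulated data (the `toy` data frame, its group names, and its means are made up for illustration), showing how to extract the overall ANOVA p-value and the pairs that `TukeyHSD()` flags at the 0.05 level.

```r
# Hypothetical toy data: three groups with different means (not the station data).
set.seed(42)
toy <- data.frame(
  station = factor(rep(c("a", "b", "c"), each = 20)),
  rain    = c(rnorm(20, mean = 600, sd = 50),
              rnorm(20, mean = 610, sd = 50),
              rnorm(20, mean = 900, sd = 50))
)

toy_aov <- aov(rain ~ station, data = toy)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(toy_aov)[[1]]
overall_p <- anova_table[["Pr(>F)"]][1]

# TukeyHSD() returns one matrix per factor, with one row per pair.
tukey <- as.data.frame(TukeyHSD(toy_aov)$station)
tukey$pair <- rownames(tukey)

# Keep only the pairs whose adjusted p-value is below 0.05.
significant_pairs <- tukey[tukey[["p adj"]] < 0.05, c("pair", "diff", "p adj")]
significant_pairs
```

The same pattern should apply to `rain_aov` above, since `TukeyHSD(rain_aov)$station` has the same layout.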

## Using only observations from Armagh, set up and run a test to see if there are significant differences in rainfall based on the season.

First, select only observations from Armagh:

In [None]:
armagh <- station_data |> filter(station == 'armagh')

Use `aov()` to test for differences between seasons:

In [None]:
armagh_season <- aov(rain~season, data=armagh)
summary(armagh_season)

Then use `pairwise.t.test()` to see which pairs of seasons differ significantly (by default, the p-values are adjusted for multiple comparisons using the Holm method):

In [None]:
pairwise.t.test(armagh$rain, armagh$season)
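
The choice of multiple-comparison adjustment can change which pairs come out as significant. As a rough illustration on simulated data (the `toy` data frame and its values are made up), the sketch below compares the default Holm adjustment with the more conservative Bonferroni adjustment via the `p.adjust.method` argument.

```r
# Hypothetical toy data: four seasons, with winter wetter than the rest.
set.seed(1)
toy <- data.frame(
  season = rep(c("spring", "summer", "autumn", "winter"), each = 30),
  rain   = c(rnorm(30, 50, 10), rnorm(30, 52, 10),
             rnorm(30, 55, 10), rnorm(30, 80, 10))
)

# Default: Holm adjustment for multiple comparisons.
holm <- pairwise.t.test(toy$rain, toy$season)

# Same comparisons with the more conservative Bonferroni adjustment.
bonf <- pairwise.t.test(toy$rain, toy$season, p.adjust.method = "bonferroni")

holm$p.adjust.method  # the adjustment that was applied
holm$p.value          # lower-triangular matrix of adjusted p-values
bonf$p.value
```

Bonferroni-adjusted p-values are never smaller than Holm-adjusted ones, so a pair that is significant under Bonferroni is also significant under Holm.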

## Using only observations from Oxford, is there a significant difference between the values of tmax in the spring and the autumn at the 99.9% confidence level?

First, select only observations from Oxford:

In [None]:
oxford <- station_data |> filter(station == 'oxford')

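Next, use `t.test()`. We're only testing for a difference, not a direction, so set `alternative = 'two.sided'`. Note that `conf.level = 0.999` only sets the level of the reported confidence interval; to decide at the 99.9% level, compare the p-value against 0.001: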

In [None]:
oxford.spring <- oxford |> filter(season == 'spring') |> pull(tmax)
oxford.autumn <- oxford |> filter(season == 'autumn') |> pull(tmax)

t.test(oxford.spring, oxford.autumn, alternative='two.sided', conf.level=0.999)
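
To make the role of `conf.level` concrete, here is a small sketch on simulated samples (`x`, `y`, and `alpha` are made-up names and values, not the Oxford data). It shows that changing `conf.level` leaves the p-value untouched and only widens the confidence interval, so the decision comes from comparing the p-value with `alpha`.

```r
# Hypothetical toy samples (not the Oxford data).
set.seed(7)
x <- rnorm(200, mean = 14, sd = 3)
y <- rnorm(200, mean = 15, sd = 3)

alpha <- 0.001  # corresponds to the 99.9% confidence level

res_95  <- t.test(x, y, alternative = "two.sided", conf.level = 0.95)
res_999 <- t.test(x, y, alternative = "two.sided", conf.level = 1 - alpha)

# The p-value is the same for both calls; only the interval width changes.
all.equal(res_95$p.value, res_999$p.value)
diff(res_999$conf.int) > diff(res_95$conf.int)

# Decision at the 99.9% level: reject the null hypothesis only if p < alpha.
reject <- res_999$p.value < alpha
reject
```

Equivalently, the null hypothesis is rejected at this level exactly when the 99.9% confidence interval for the difference in means excludes zero.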

## Using only observations from Stornoway Airport, is the value of tmin significantly lower in the winter than in the autumn?

Select only observations from Stornoway, then use `t.test()` with `alternative = 'less'`: with the winter values as the first sample, this tests whether the winter mean is lower than the autumn mean.

In [None]:
stornoway <- station_data |> filter(station == 'stornoway')

stornoway.winter <- stornoway |> filter(season == 'winter') |> pull(tmin)
stornoway.autumn <- stornoway |> filter(season == 'autumn') |> pull(tmin)

t.test(stornoway.winter, stornoway.autumn, alternative='less', conf.level=0.99)
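
With a one-sided test, the order of the two samples matters: `alternative = 'less'` refers to the first sample's mean being lower than the second's. The sketch below illustrates this on simulated samples (`winter` and `autumn` here are made-up vectors, not the Stornoway data).

```r
# Hypothetical toy samples (not the Stornoway data): "winter" is colder.
set.seed(3)
winter <- rnorm(100, mean = 2, sd = 2)
autumn <- rnorm(100, mean = 7, sd = 2)

# Alternative hypothesis: mean(winter) - mean(autumn) < 0
less <- t.test(winter, autumn, alternative = "less")

# Swapping the samples means flipping the alternative to test the same thing.
greater_swapped <- t.test(autumn, winter, alternative = "greater")

# Testing the opposite direction gives the complementary p-value.
greater <- t.test(winter, autumn, alternative = "greater")

less$p.value
all.equal(less$p.value, greater_swapped$p.value)
all.equal(less$p.value + greater$p.value, 1)
```

If the samples are passed in the wrong order without flipping `alternative`, the test answers the opposite question, so it is worth checking the sign of the reported difference in means.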