## load the libraries that you need

In [None]:
library(readr) # this loads the functions we'll use to load the data
library(ggplot2) # this loads the functions, etc. needed for us to plot
library(dplyr) # this loads the functions, etc. needed for us to work with the data

## load the saved data file (combined_stations.csv)

In [None]:
station_data <- read_csv(file.path('data', 'combined_stations.csv'))

## questions

### what station has the highest recorded rainfall in the past 20 years, and on what date?

For this, we first want to filter the dataset to only include the past 20 years. We could type the cutoff year manually, but we can also use `Sys.Date()` to get the current date, use `format()` to extract the year, and then subtract 20 from its numeric representation.

To select the row with the maximum value of `rain`, we use `slice_max()`:

In [None]:
station_data |> 
    filter(year > as.numeric(format(Sys.Date(), "%Y")) - 20) |>
    slice_max(rain, n=1)
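The cutoff year can be checked on its own before it goes into `filter()`. This is a minimal sketch; `current_year` and `cutoff` are names introduced here for illustration only:

```r
# Sys.Date() returns today's date; format(..., "%Y") extracts the year as a string
current_year <- as.numeric(format(Sys.Date(), "%Y"))

# the filter above keeps rows with year > cutoff
cutoff <- current_year - 20
cutoff
```

Because the comparison is `year > cutoff`, the filter keeps 20 calendar years, including the current (possibly incomplete) one.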

### what season has the lowest average rainfall for each station?

Here, we want to group by `station` and `season`, then use `summarise` to calculate the averages, then use `slice_min()` to select the lowest value of `rain` for each group:

In [None]:
station_data |> 
    group_by(station, season) |>
    summarise(avg_rain = mean(rain, na.rm = TRUE)) |>
    slice_min(avg_rain)
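The reason `slice_min()` returns one row per station here is that `summarise()` drops the last grouping level (`season`) by default, leaving the result grouped by `station`. A small sketch with made-up data (the `toy` values are invented for illustration and are not from `combined_stations.csv`):

```r
library(dplyr)

# toy data, invented for illustration
toy <- data.frame(
    station = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'),
    season  = c('winter', 'winter', 'summer', 'summer',
                'winter', 'winter', 'summer', 'summer'),
    rain    = c(80, 100, 40, 60, 30, 50, 70, 90)
)

seasonal <- toy |>
    group_by(station, season) |>
    summarise(avg_rain = mean(rain, na.rm = TRUE), .groups = 'drop_last')

# after summarise(), only the station grouping remains
group_vars(seasonal)

# so slice_min() picks the driest season within each station
driest <- seasonal |> slice_min(avg_rain, n = 1)
driest
```

Setting `.groups = 'drop_last'` explicitly gives the same result as the default, but silences the message that `summarise()` prints about regrouping.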

### what station has recorded the most months with tmin < 1°C? are all these observations from a single season?

First, we use `filter()` to keep only rows with `tmin < 1`. Then we group by `station` and `season`, use `count()` to count the number of rows in each group, and call `ungroup()` so that `slice_max()` selects a single row across all groups rather than one per group.

To check whether these observations are from a single season or not, we can omit the last line (`slice_max(n)`) and look at the counts for every combination of station and season.

In [None]:
station_data |> 
    filter(tmin < 1) |>
    group_by(station, season) |> 
    count() |> 
    ungroup() |> 
    slice_max(n)
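Note that the cell above finds the largest count for a station and season pair. If the question is read as the total number of cold months per station, regardless of season, one possible variant counts by `station` alone. A sketch with made-up data (the `toy` values and the names `cold`, `by_station`, `by_season`, and `coldest` are invented for illustration):

```r
library(dplyr)

# toy data, invented for illustration
toy <- data.frame(
    station = c('a', 'a', 'a', 'b', 'b'),
    season  = c('winter', 'winter', 'spring', 'winter', 'winter'),
    tmin    = c(0.2, -1.0, 0.8, 0.5, 3.0)
)

cold <- toy |> filter(tmin < 1)

# one row per station: n is the number of cold months
by_station <- cold |> count(station)

# one row per station/season pair, to see how the cold months are spread
by_season <- cold |> count(station, season)

# station with the most cold months overall
coldest <- by_station |> slice_max(n, n = 1)
coldest
```

Calling `count(station)` on an ungrouped data frame returns an ungrouped result, so no `ungroup()` is needed in this variant.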

### what is the median rainfall in months where tmax is greater than 20°C? make sure that your result is a number, not a tibble!

First, use `filter()` to select only rows where `tmax > 20`; then, summarize to calculate the median value, then use `pull()` to get the value of `rain`.

In [None]:
station_data |> 
    filter(tmax > 20) |>
    summarize(rain = median(rain, na.rm = TRUE)) |> 
    pull(rain)
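To see why `pull()` is needed, it can help to compare the result with and without it. A sketch with made-up data (the `toy` values and the names `as_table` and `as_number` are invented for illustration):

```r
library(dplyr)

# toy data, invented for illustration
toy <- data.frame(tmax = c(18, 21, 25, 30), rain = c(90, 10, 30, 20))

# summarize() always returns a data frame, even with one row and one column
as_table <- toy |>
    filter(tmax > 20) |>
    summarize(rain = median(rain, na.rm = TRUE))

# pull() extracts the column as a plain vector
as_number <- as_table |> pull(rain)

is.data.frame(as_table)
is.numeric(as_number)
```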

### what year saw the most total rainfall, using data from all four stations?

First, we have to group by `year`, then use `summarize()` to get the sum of `rain`. Finally, we use `slice_max()` to select the row with the highest value:

In [None]:
station_data |>
    group_by(year) |>
    summarize(total_rain = sum(rain)) |>
    slice_max(total_rain, n=1)
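Unlike the earlier cells, this one calls `sum(rain)` without `na.rm = TRUE`. If `rain` contains missing values, the total for any year with a missing month is `NA`. The sketch below shows the difference on made-up data (the `toy` values and the column names `total_strict` and `total_na_rm` are invented for illustration):

```r
library(dplyr)

# toy data, invented for illustration: one month of rain is missing in 2000
toy <- data.frame(
    year = c(2000, 2000, 2001, 2001),
    rain = c(50, NA, 30, 40)
)

totals <- toy |>
    group_by(year) |>
    summarize(
        total_strict = sum(rain),                # NA if any month is missing
        total_na_rm  = sum(rain, na.rm = TRUE)   # missing months contribute nothing
    )
totals
```

Which behaviour is preferable depends on the question: with `na.rm = TRUE`, a year with missing months can look drier than it really was.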

### what are the top 5 driest years, using only data from stations in Britain?

Like before, we use `group_by()`, `summarize()`, and `slice_min()`, but we first want to `filter()` to only select stations in Britain (Oxford, Southampton, and Stornoway):

In [None]:
station_data |>
    filter(station %in% c('oxford', 'southampton', 'stornoway')) |>
    group_by(year) |>
    summarize(total_rain = sum(rain)) |>
    slice_min(total_rain, n=5)

### what is the lowest annually-averaged monthly minimum temperature in the dataset, as measured by a single station?

Here, we first use `group_by()` to group by `station` and `year`, then use `summarize()` to get the annually-averaged value of `tmin`. If we use `slice_min()` directly on the output of this, we will actually get 4 rows, one for each value of `station`, because the result is still grouped by `station`. Since we want the single lowest value regardless of station, we can use `ungroup()` ([documentation](https://dplyr.tidyverse.org/reference/group_by.html)) to remove the groups and get only a single row:

In [None]:
station_data |> 
    group_by(station, year) |>
    summarize(tmin = mean(tmin, na.rm = TRUE)) |>
    ungroup() |>
    slice_min(tmin, n=1)
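As an alternative to a separate `ungroup()` step, `summarize()` accepts `.groups = 'drop'`, which returns an ungrouped result directly. A sketch with made-up data (the `toy` values and the names `annual` and `lowest` are invented for illustration):

```r
library(dplyr)

# toy data, invented for illustration
toy <- data.frame(
    station = c('a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'),
    year    = c(2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001),
    tmin    = c(4, 6, 3, 5, 1, 3, 6, 8)
)

annual <- toy |>
    group_by(station, year) |>
    summarize(tmin = mean(tmin, na.rm = TRUE), .groups = 'drop')

# no grouping variables remain
group_vars(annual)

# so slice_min() returns a single row for the whole table
lowest <- annual |> slice_min(tmin, n = 1)
lowest
```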

### what is the sunniest month, on average, in armagh? 

For this, we use `filter()` to select only rows where `station` is `'armagh'`, then group by month, `summarize` to get the mean value of `sun` for each month, then use `slice_max` to select the row with the maximum value:

In [None]:
station_data |>
    filter(station == 'armagh') |>
    group_by(month) |>
    summarize(sun = mean(sun, na.rm = TRUE)) |>
    slice_max(sun)

### bonus: write a line that will rename the months from the number to a 3-letter abbreviation

In [None]:
station_data |>
    filter(station == 'armagh') |>
    group_by(month) |>
    summarize(sun = mean(sun, na.rm = TRUE)) |>
    mutate(month = month.abb[month]) |>
    slice_max(sun)
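The `mutate()` line works because `month.abb` is a built-in character vector of the twelve English month abbreviations, so indexing it with a month number returns the matching name. This assumes `month` is stored as a number from 1 to 12. A quick check (`abbr` is a name introduced here for illustration):

```r
# month.abb is built into base R; indexing by month number gives the abbreviation
abbr <- month.abb[c(1, 6, 12)]
abbr
# → "Jan" "Jun" "Dec"
```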