# Seasonal Effects on Female Red-Billed Gull Size in New Zealand
**Prepared by Group 4: Arnab Das, Maggie Wang, Paul Zeng, Dylan Zhang**

## Introduction
The red-billed gull is the most common gull on the New Zealand coast. It is frequently seen in coastal towns, at garbage dumps, and at fish processing facilities (Red-billed gull: Tarāpunga, n.d.). We aim to estimate the differences in mean weight and mean length of female red-billed gulls between the two seasons recorded in our data, summer and winter. We believe that size variation within a year is driven primarily by seasonal effects that change the gulls' feeding and breeding habits.

Some seasonal effects include:
- An extremely long egg-laying period that can extend from mid-September to January.
- The main food at the largest colonies is the euphausiid *Nyctiphanes australis* (krill), which is most abundant in spring and early summer (Mills et al., 2008).
- The relative abundance of krill is positively correlated with the Southern Oscillation Index (SOI) (Mills et al., 2008). The SOI, which is calculated from the difference in air pressure between Tahiti and Darwin, typically peaks during the Southern Hemisphere spring (September to December) (Pacific Marine Environmental Laboratory, n.d.).
- At Kaikoura, adult gulls can sustain themselves during the breeding season on alternative foods such as earthworms, small fish, garbage, and kelp flies, but they depend on an abundant and regular supply of surface-swarming krill for successful breeding. Outside the breeding season, the diet is highly variable: some gulls still feed at sea, while others feed on small invertebrates along the shore or on human sources such as handouts in towns and cities, offal discarded from fishing boats, and garbage at rubbish dumps.

In this report, we use the GULLS.csv dataset from NZGRAPHER (Dataset, n.d.). The dataset records the weight and length of each gull, as well as its location (Maraetai, Muriwai, or Piha), coast (east or west), season (summer or winter), and sex (male or female).


## Methods and Results
We first loaded the libraries needed for this analysis.

In [1]:
# Loading all the required libraries
library(infer)
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


Because the data are hosted remotely, we used `read_csv` to import them directly from the web into our notebook. We used `as_factor` to convert the location, coast, season, and sex variables from characters to factors, which makes them easier to group and summarise, and we dropped any rows with a missing weight.

In [4]:
# Reading dataframe from the internet and storing it to a variable
gulls <- read_csv("https://raw.githubusercontent.com/maggie63/stat-201-group-4/main/gulls_data.csv") |>
    mutate(LOCATION = as_factor(LOCATION), COAST = as_factor(COAST), SEASON = as_factor(SEASON), SEX = as_factor(SEX)) |>
    filter((!is.na(WEIGHT)))

# Printing the first few rows of the dataframe
head(gulls)

Rows: 2487 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): LOCATION, COAST, SEASON, SEX
dbl (2): WEIGHT, LENGTH

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


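| WEIGHT | LENGTH | LOCATION | COAST | SEASON | SEX |
| --- | --- | --- | --- | --- | --- |
| `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` |
| 262 | 38.9 | MARAETAI | EAST | WINTER | MALE |
| 300 | 41.3 | MURIWAI | WEST | SUMMER | MALE |
| 250 | 36.6 | MURIWAI | WEST | WINTER | MALE |
| 242 | 36.0 | MARAETAI | EAST | WINTER | FEMALE |
| 261 | 37.1 | MURIWAI | WEST | WINTER | MALE |
| 262 | 38.2 | MURIWAI | WEST | WINTER | MALE |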


*Table 1.0: First few rows of "Gulls Data Set" with 2487 rows and 6 columns*
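
The import message above suggests specifying the column types. As a minimal sketch of that alternative, the column types (including the factors) can be declared directly in `read_csv` through `col_types`, which makes a separate `as_factor` step unnecessary. The inline `csv_text` below is a small stand-in built from three rows of Table 1.0, and `gulls_typed` is an illustrative name; neither is part of our analysis.

```r
library(readr)

# Small inline stand-in for the real file (three rows copied from Table 1.0)
csv_text <- I("WEIGHT,LENGTH,LOCATION,COAST,SEASON,SEX
262,38.9,MARAETAI,EAST,WINTER,MALE
300,41.3,MURIWAI,WEST,SUMMER,MALE
242,36.0,MARAETAI,EAST,WINTER,FEMALE
")

# Declare every column type up front; col_factor() infers the levels
# from the data in order of appearance
gulls_typed <- read_csv(
    csv_text,
    col_types = cols(
        WEIGHT   = col_double(),
        LENGTH   = col_double(),
        LOCATION = col_factor(),
        COAST    = col_factor(),
        SEASON   = col_factor(),
        SEX      = col_factor()
    )
)

str(gulls_typed)
```

Because the types are given explicitly, `read_csv` no longer prints the column specification message.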


We then checked that both seasons are similarly represented among the female gulls and that each season has more than 30 observations.

In [5]:
# Group data to check that each season is adequately represented
n_obs <- gulls |>
    select(-LENGTH) |>
    filter(SEX == "FEMALE" ) |>
    group_by(SEASON) |>
    summarise(n = n())
n_obs

| SEASON | n |
| --- | --- |
| `<fct>` | `<int>` |
| WINTER | 615 |
| SUMMER | 665 |


Since both seasons are well represented among the female gulls (615 winter and 665 summer observations, each far above 30), we were able to move on with the rest of the analysis.
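
The same check could also be expressed in code rather than read off the table. The sketch below rebuilds the season counts by hand from the output above; the names `min_n`, `checks`, `all_above_min`, and `balance_ratio` are illustrative and do not appear elsewhere in our analysis.

```r
library(dplyr)

# Season counts copied from the output above
n_obs <- tibble(
    SEASON = factor(c("WINTER", "SUMMER")),
    n      = c(615L, 665L)
)

# Hypothetical threshold: the common "more than 30 per group" rule of thumb
min_n <- 30

checks <- n_obs |>
    summarise(
        all_above_min = all(n > min_n),     # TRUE when every season passes
        balance_ratio = min(n) / max(n)     # 1 would mean perfectly balanced
    )

checks
```

A `balance_ratio` close to 1 indicates that the two seasons contribute a similar number of observations.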

Next, we filtered the data to keep only female gulls and removed any rows with missing values. We created two data frames, one for weight and one for length.

In [12]:
# Filtering rows and selecting columns
gull_weights <- gulls |> 
    filter(SEX == "FEMALE" & (!is.na(WEIGHT))) |>
    select(WEIGHT, LENGTH, SEASON)

gull_lengths <- gulls |> 
    filter(SEX == "FEMALE" & (!is.na(LENGTH))) |>
    select(WEIGHT, LENGTH, SEASON)

head(gull_weights)
head(gull_lengths)

| WEIGHT | LENGTH | SEASON |
| --- | --- | --- |
| `<dbl>` | `<dbl>` | `<fct>` |
| 242 | 36.0 | WINTER |
| 278 | 35.2 | SUMMER |
| 278 | 37.4 | SUMMER |
| 247 | 36.9 | WINTER |
| 268 | 36.6 | SUMMER |
| 274 | 35.4 | SUMMER |


| WEIGHT | LENGTH | SEASON |
| --- | --- | --- |
| `<dbl>` | `<dbl>` | `<fct>` |
| 242 | 36.0 | WINTER |
| 278 | 35.2 | SUMMER |
| 278 | 37.4 | SUMMER |
| 247 | 36.9 | WINTER |
| 268 | 36.6 | SUMMER |
| 274 | 35.4 | SUMMER |


Next, we drew a random sample of size 50 from each data frame, setting the seed first so that the random draws are reproducible. For each sample, we then calculated the per-season sample mean and standard error (the sample standard deviation divided by the square root of the group size) and stored them in separate data frames, along with the number of observations from each season.

In [14]:
# Setting the seed
set.seed(1)

# Drawing a sample of size 50
gull_weight_sample <- gull_weights |> sample_n(size = 50)
gull_length_sample <- gull_lengths |> sample_n(size = 50)

# Computing estimates
gull_weight_summary <- gull_weight_sample |>
    group_by(SEASON) |>
    summarise(n = n(),
              sample_mean = mean(WEIGHT), 
              sample_std_error = sd(WEIGHT) / sqrt(n))

gull_length_summary <- gull_length_sample |>
    group_by(SEASON) |>
    summarise(n = n(),
              sample_mean = mean(LENGTH), 
              sample_std_error = sd(LENGTH) / sqrt(n))

gull_weight_summary
gull_length_summary

| SEASON | n | sample_mean | sample_std_error |
| --- | --- | --- | --- |
| `<fct>` | `<int>` | `<dbl>` | `<dbl>` |
| WINTER | 27 | 253.2222 | 1.860776 |
| SUMMER | 23 | 268.5217 | 2.331151 |


| SEASON | n | sample_mean | sample_std_error |
| --- | --- | --- | --- |
| `<fct>` | `<int>` | `<dbl>` | `<dbl>` |
| WINTER | 19 | 35.92632 | 0.5650566 |
| SUMMER | 31 | 35.42581 | 0.3314669 |
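
Since our goal is the difference in means between seasons, the per-season estimates above can be combined into a single point estimate. The sketch below is one way to do this, not a step from our analysis. It copies the values from the two summary outputs and assumes that the summer and winter groups are independent, so the standard error of the difference is the square root of the sum of the squared per-season standard errors. The helper `diff_in_means` is a hypothetical name introduced only for this illustration.

```r
# Per-season estimates copied from the weight summary above
weight_mean <- c(WINTER = 253.2222, SUMMER = 268.5217)
weight_se   <- c(WINTER = 1.860776, SUMMER = 2.331151)

# Per-season estimates copied from the length summary above
length_mean <- c(WINTER = 35.92632, SUMMER = 35.42581)
length_se   <- c(WINTER = 0.5650566, SUMMER = 0.3314669)

# Hypothetical helper: summer-minus-winter difference in sample means and
# its standard error, assuming the two seasonal groups are independent
diff_in_means <- function(means, ses) {
    c(estimate  = unname(means["SUMMER"] - means["WINTER"]),
      std_error = unname(sqrt(ses["SUMMER"]^2 + ses["WINTER"]^2)))
}

weight_diff <- diff_in_means(weight_mean, weight_se)
length_diff <- diff_in_means(length_mean, length_se)

weight_diff
length_diff
```

With these inputs, the summer-minus-winter difference in sample mean weight is about 15.3 with a standard error of about 2.98, and the difference in sample mean length is about -0.50 with a standard error of about 0.66. These are point estimates from one sample of 50 per variable, not conclusions about the population.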


## Discussion

## References
1. Dataset. NZGRAPHER. (n.d.). https://grapher.jake4maths.com/?folder=sneddon&dataset=GULLS.csv
2. GULLS.CSV information. Inference. (n.d.). https://sites.google.com/view/inference/data-sets#h.p_IlT79LKK_MeP
3. Mills, J. A., Yarrall, J. W., Bradford-Grieve, J. M., Uddstrom, M. J., Renwick, J. A., & Merilä, J. (2008). The impact of climate fluctuation on food availability and reproductive performance of the planktivorous red-billed gull *Larus novaehollandiae scopulinus*. *Journal of Animal Ecology*, 77(6), 1129–1142. https://doi.org/10.1111/j.1365-2656.2008.01383.x
4. Pacific Marine Environmental Laboratory. (n.d.). La Niña FAQs. El Niño Theme Page. https://www.pmel.noaa.gov/elnino/lanina-faq
5. Red-billed gull: Tarāpunga: New Zealand Birds Online. New Zealand Birds Online - The digital encyclopaedia of New Zealand birds. (n.d.). https://nzbirdsonline.org.nz/species/red-billed-gull