# Describing communities using summary statistics

Here we take field data on invertebrates collected during the 2022 Kosciuszko National Park field trip and compute descriptive statistics to summarise the communities.

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.7
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

“package ‘tidyr’ was built under R version 4.0.5”

“package ‘readr’ was built under R version 4.0.5”

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

In [2]:
inverts <- read_csv("https://raw.githubusercontent.com/mikheyev/ecology-r-code/main/data/inverts.csv")

Rows: 97 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Group, Taxa, Kingdom, Phylum, Subphylum or Order, How the organism ...
dbl (9): Elevation, Mean flow rate (m/s), Mean temperature, Pack 1, Pack 2, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


## Define functions for diversity and evenness

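The [Menhinick richness index](https://search.r-project.org/CRAN/refmans/abdiv/html/menhinick.html) measures diversity as the number of species divided by the square root of the total number of individuals, without considering the relative abundance of each species. It is relatively crude, but simple to compute.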

In [3]:
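# Menhinick index: number of taxa present divided by the square root of the total count
D <- function(n) {
    if (sum(n) > 0)
        sum(n > 0)/sqrt(sum(n))
    else
        0   # an empty sample has no diversity
}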
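As a quick check of the definition, the sketch below applies `D` to the Pack 1 counts of the six taxa that appear in the data preview further down (15, 0, 2, 0, 0, 1). Those six rows are only the head of the table, so the value is illustrative and not a result for the whole pack. The function is repeated so that the snippet runs on its own; `pack1_head`, `richness` and `total` are names made up for this example.

```r
# Menhinick index, repeated from the cell above so this snippet is self-contained
D <- function(n) {
    if (sum(n) > 0)
        sum(n > 0)/sqrt(sum(n))
    else
        0
}

# Illustrative counts only: Pack 1 for the first six taxa in the data preview
pack1_head <- c(15, 0, 2, 0, 0, 1)

richness <- sum(pack1_head > 0)   # 3 taxa present
total    <- sum(pack1_head)       # 18 individuals

D(pack1_head)    # 3 / sqrt(18), about 0.707
D(c(0, 0, 0))    # an empty sample returns 0 by construction
```

Because the denominator grows with the square root of the total count, two packs with the same number of taxa can get different values of `D` if one of them simply contains more animals.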

[Shannon's index](https://en.wikipedia.org/wiki/Diversity_index#Shannon_index) incorporates the relative abundance of each species, so diversity depends both on how many species are present and on how individuals are distributed among them.

In [4]:
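H <- function(n) {
    total <- sum(n)   # total number of individuals, computed once
    partH <- 0
    for (i in n)
        # taxa with zero counts contribute nothing (log(0) is undefined)
        if (i > 0)
            partH <- partH - (i / total) * log(i / total)
    return(partH)
}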
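The loop above computes $H = -\sum_i p_i \ln p_i$, where $p_i$ is the proportion of individuals belonging to taxon $i$. A vectorised sketch of the same formula is shown below for comparison. `H_vec` is a hypothetical name that is not part of the original notebook, and the counts are the illustrative Pack 1 values from the data preview. Like the loop, it assumes the counts contain no missing values.

```r
# H_vec is a hypothetical, vectorised variant of the loop-based H
H_vec <- function(n) {
    p <- n[n > 0] / sum(n)   # proportions of the taxa that are present
    -sum(p * log(p))         # Shannon's index using natural logs
}

# Loop version from the cell above, repeated so the comparison runs on its own
H <- function(n) {
    partH <- 0
    for (i in n)
        if (i > 0)
            partH = partH - (i / sum(n)) * log(i / sum(n))
    return(partH)
}

counts <- c(15, 0, 2, 0, 0, 1)          # illustrative counts only
all.equal(H_vec(counts), H(counts))      # both versions agree
all.equal(H_vec(c(1, 1, 1, 1)), log(4))  # equal abundances give log(S)
```

The second check shows why the log of the species count is the natural yardstick for H: when every taxon is equally abundant, H equals `log(S)`.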

[Evenness](https://en.wikipedia.org/wiki/Species_evenness) measures how similar the abundances of the different species in a community are. It is obtained by dividing Shannon's index by the natural log of the species count, which is the maximum possible H for a given number of species.

In [5]:
# Evenness: Shannon's H divided by its maximum, log(S), where S is the number of taxa present
E <- function(n) {
    H(n)/log(sum(n > 0))
}
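The sketch below runs `E` on a few made-up count vectors to show its range and one edge case: with a single taxon present, both H and `log(1)` are zero, so the ratio is `NaN`. `E_safe` is a hypothetical wrapper, one possible way to flag that case explicitly; it is an assumption for illustration and not something the notebook itself uses.

```r
# H and E repeated from the cells above so this snippet is self-contained
H <- function(n) {
    partH <- 0
    for (i in n)
        if (i > 0)
            partH = partH - (i / sum(n)) * log(i / sum(n))
    return(partH)
}
E <- function(n) {
    H(n)/log(sum(n > 0))
}

# Hypothetical guard: evenness is undefined with fewer than two taxa present
E_safe <- function(n) {
    if (sum(n > 0) < 2)
        return(NA_real_)
    E(n)
}

E(c(5, 5, 5))        # perfectly even community: 1 (up to rounding)
E(c(9, 1, 0))        # one dominant taxon: well below 1
E(c(10, 0, 0))       # single taxon: 0/0 gives NaN
E_safe(c(10, 0, 0))  # the guard returns NA instead
```

Whether such packs should be reported as `NA`, dropped, or kept as `NaN` is a judgment call that depends on how the summary is used downstream.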

## Reshape data to make it easier to analyze

The following commands reshape the data from the 'wide' format, which is easy for humans to enter, to the 'long' format that R likes. The `starts_with()` helper selects the columns to reshape: those whose names start with "Pack". Each leaf pack is our unit of replication.

In [6]:
head(inverts)

inverts_long <- pivot_longer(inverts, starts_with("Pack"), names_to = "pack", values_to = "count")

head(inverts_long)

| Group | Elevation | Mean flow rate (m/s) | Mean temperature | Taxa | Kingdom | Phylum | Subphylum or Order | How the organism feeds | Pack 1 | Pack 2 | Pack 3 | Pack 4 | Pack 5 | Pack 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | 15 | 0 | 4 | 3 | NA | 2 |
| Rosie | 1610 | 0.2343445 | 10.632 | Dragonflies and Damselflies (Order Odonata) | NA | NA | NA | Mostly Predators | 0 | 0 | 4 | 3 | NA | 2 |
| Rosie | 1610 | 0.2343445 | 10.632 | Mayflies (Order Ephemeroptera) | NA | NA | NA | Mostly herbivores | 2 | 0 | 6 | 1 | NA | 0 |
| Rosie | 1610 | 0.2343445 | 10.632 | Water Beetles (Order Coleoptera) | NA | NA | NA | NA | 0 | 0 | 0 | 0 | NA | 0 |
| Rosie | 1610 | 0.2343445 | 10.632 | True Flies (Order Diptera) | NA | NA | NA | NA | 0 | 0 | 1 | 1 | NA | 0 |
| Rosie | 1610 | 0.2343445 | 10.632 | Isopod | NA | NA | NA | NA | 1 | 0 | 0 | 0 | NA | 0 |


| Group | Elevation | Mean flow rate (m/s) | Mean temperature | Taxa | Kingdom | Phylum | Subphylum or Order | How the organism feeds | pack | count |
|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | Pack 1 | 15.0 |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | Pack 2 | 0.0 |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | Pack 3 | 4.0 |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | Pack 4 | 3.0 |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | Pack 5 | NA |
| Rosie | 1610 | 0.2343445 | 10.632 | Stoneflies (Order Plecoptera) | Animalia | Arthropoda | Hexapoda | Mostly Predators | Pack 6 | 2.0 |
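To see the reshaping step in isolation, here is a minimal sketch on a made-up table with two taxa and two packs. The counts and the names `toy_wide` and `toy_long` are invented for illustration and are not from the field data.

```r
library(tidyr)
library(tibble)

# Made-up counts for two taxa in two leaf packs (not from the field data)
toy_wide <- tibble(
    Taxa     = c("Mayflies", "Stoneflies"),
    `Pack 1` = c(2, 15),
    `Pack 2` = c(0, NA)
)

# Every "Pack" column becomes a pair of values: its name goes into `pack`,
# its content into `count`; the remaining columns are repeated on each row
toy_long <- pivot_longer(toy_wide, starts_with("Pack"),
                         names_to = "pack", values_to = "count")
toy_long
```

The two-row wide table becomes four rows, one per taxon and pack combination, and the missing count is carried over as `NA` so that it can be filtered out later.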


The set of calculations below is a bit hairy, but it illustrates the power of data manipulation in R: we clean the data and compute the statistics to plot on the fly, all in a single pipeline.

In [7]:
# take `inverts_long` and assign the final result to `inverts_summary`
inverts_summary <- inverts_long %>%  
  # remove any counts with missing data
  filter(!is.na(count)) %>% 
  # conduct measurements at the level of group and pack (our units of replication)
  group_by(Group, pack) %>% 
  # variables to compute (note that we're using the newly defined functions)
  summarize(Elevation = first(Elevation), flow = first(`Mean flow rate (m/s)`), D = D(count), H = H(count), E = E(count) ) 

`summarise()` has grouped output by 'Group'. You can override using the `.groups` argument.


Take time to walk through the commands and make sure you understand what is going on in each line.
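One way to walk through the pipeline is to run the same steps on a tiny made-up table where the answer can be checked by hand. The sketch below is an illustration under stated assumptions: the data, `toy_long`, `toy_summary`, `taxa_present` and `individuals` are all invented, and two simple sums stand in for `D`, `H` and `E`. It also passes `.groups = "drop"`, the argument mentioned in the message above, so the result comes back ungrouped.

```r
library(dplyr)
library(tibble)

# Made-up long-format counts: two groups, with one missing value
toy_long <- tibble(
    Group = c("A", "A", "A", "A", "B", "B"),
    pack  = c("Pack 1", "Pack 1", "Pack 2", "Pack 2", "Pack 1", "Pack 1"),
    count = c(3, 0, NA, 5, 1, 1)
)

toy_summary <- toy_long %>%
    # drop rows with missing counts before computing anything
    filter(!is.na(count)) %>%
    # one summary row per group and pack
    group_by(Group, pack) %>%
    # any function of the counts in each group can be used here
    summarize(taxa_present = sum(count > 0),
              individuals  = sum(count),
              .groups = "drop")
toy_summary
```

Each row of `toy_summary` corresponds to one group and pack combination, which is the same shape that `inverts_summary` has for the real data.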