Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

In [None]:
library(testthat) # testing
library(digest) # hashing

# Nesting tree characteristics of the Northern Spotted Owl

The Northern Spotted Owl is under extreme pressure in British Columbia due to forestry activities and competition from the Barred Owl, which only recently migrated to the area. Only [six wild owls](https://www.cbc.ca/news/canada/british-columbia/spotted-owl-protections-bc-new-chick-breeding-program-1.5131548) have been confirmed to remain in BC, located exclusively within old-growth forest in the lower Fraser Basin. This is thought to represent a decline of around 99\% from their precolonial numbers.
The BC government is attempting to protect remaining owl habitat and increase owl numbers through a [captive breeding and release program](https://www2.gov.bc.ca/gov/content/environment/plants-animals-ecosystems/species-ecosystems-at-risk/implementation/conservation-projects-partnerships/northern-spotted-owl).

The image below shows two different spotted owl nests. The left panel is a "Top Cavity" nest, while the right panel is a "Platform" nest.
![Spotted Owl with juveniles in a top cavity nest (Credit Jared Hobbs)](nest.png)


Owl numbers are much higher in Washington and Oregon, where the owls are considered threatened (the population is low and decreasing) but not endangered. To identify potential owl habitat for protection from harvesting, it is necessary to characterize owl habitat.

## Explanation of the data:
The dataset below includes characteristics of nearly 2000 Spotted Owl nesting trees in Oregon and Washington. It contains values for:
1. **Site**: The location where the nest was observed. "Olympic" -- Olympic Peninsula, "Interior" -- within the rain shadow of the Cascade mountain range, "CoastN" -- Northern coast of Washington and Northern Oregon, and "Coast S" -- Southern coast of Southern Oregon and Northern California.

2. **Nest**: The type of nest. "TopCavity" -- a nest within the hollowed out cavity at the top of a broken tree, "SideCavity" -- a nest within a cavity on the side of a tree, and "Platform" -- a nest perched on the limbs of a tree.

3. **DBH**: The diameter at breast height of the nesting tree in *centimeters*

4. **Stage**: The life stage of a tree on a scale between 1 and 7. Values of 1 and 2 represent living trees, while values of 3-7 represent dead trees in progressive stages of decay. The image below illustrates the meaning of "Stage".

5. **Ht**: The height of the nesting tree in *meters*

6. **Tree**: The species of the nesting tree

![Decay stages of trees (Credit Plos ONE)](treedecay.png)


### In this lab, you will apply descriptive statistics, ANOVA, and the Tukey post-hoc test to this dataset to determine the types of trees Northern Spotted Owls prefer for nesting.

# Questions:

## 1. First we will load the data and conduct descriptive statistics to determine which species of trees owls prefer to nest in. 

### (a) Load the dataset `nestingTrees.csv` into a dataframe called `df`, then investigate the data using `head` and `tail`. 
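As a rough illustration of the loading pattern (not the lab answer), the sketch below writes a tiny made-up CSV to a temporary file and reads it back with `read.csv`. The file contents and the names `toy_path` and `toy` are hypothetical and have nothing to do with `nestingTrees.csv`.

```r
# Hypothetical toy file, created only for this illustration
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Tree,DBH", "Fir,120.5", "Cedar,98.0", "Fir,143.2"), toy_path)

# read.csv returns a dataframe; the first line of the file is the header
toy <- read.csv(toy_path)

head(toy, 2)  # first two rows
tail(toy, 1)  # last row
dim(toy)      # number of rows, then number of columns
```

Calling `head` and `tail` without a second argument shows up to six rows from each end of the dataframe.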

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("df should contain the data from nestingTrees.csv", {
    expect_equal(digest(dim(df)),'bd70088e790ba15dc7ecde85b2f213f7')
})

### (b) What tree species are contained in the dataset? Assign these to a variable called `species`.
You can find the unique tree species using the function `unique` after selecting the `Tree` column from `df`.
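A minimal sketch of how `unique` behaves, using a made-up vector (`toy_species` is a hypothetical name, not a column of the lab data):

```r
# Made-up values for illustration only
toy_species <- c("Fir", "Cedar", "Fir", "Hemlock", "Cedar")

# unique() drops repeats and keeps values in order of first appearance
toy_unique <- unique(toy_species)
print(toy_unique)
```

A dataframe column can be passed in the same way, for example `unique(someDataFrame$someColumn)`.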

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("All tree species should be named to a variable 'species'", {
    expect_equal(digest(species),'85ec3ec7c19a0654bdd8542184288d27')
})

### (c) What is the most common tree species for owls to nest in? Assign this species to a variable called `commonTree`. What percentage (between 0 and 100) of all nests are in this tree species? Assign this percentage to a variable called `percentTree`.
You can apply `table(x)` to count unique occurrences in a vector `x`. 
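One possible way to go from counts to a "most common value" and a percentage is sketched below on a made-up vector. The names `toy_labels`, `counts`, `top`, and `pct` are hypothetical, and other approaches (such as sorting the table) would work equally well.

```r
# Made-up labels for illustration only
toy_labels <- c("A", "B", "A", "C", "A", "B")

counts <- table(toy_labels)            # count of each distinct label
top <- names(which.max(counts))        # label with the largest count
pct <- 100 * max(counts) / sum(counts) # share of that label, in percent

print(counts)
print(top)
print(pct)
```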

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("The percentage of nests in the most common tree species should be assigned to `percentTree`", {
    expect_equal(digest(unname(percentTree)),'916925fd50b6c5aaeb0422b00c1d8b31')
})

### (c) Now select out the rows within `df` that correspond to the most common tree species for owl nesting and include these in a new dataframe `df1`. Using `df1`, calculate the mean diameter at breast height (`DBH`) and its standard deviation. Assign these values respectively to `meanDBH` and `sdDBH`. Similarly obtain the mean tree height (`Ht`) and its standard deviation and assign these values to `meanHt` and `sdHt`. 
Hint: you can select a subset of a dataframe `df` where a vector `mask` contains `TRUE` using `df1 = subset(df,mask)`. The functions for the mean and standard deviation are `mean` and `sd`.
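The hint can be sketched on a small made-up dataframe as follows. The values and the names `toy`, `mask`, and `toy_fir` are hypothetical and are not taken from the lab data.

```r
# Made-up data for illustration only
toy <- data.frame(
  Tree = c("Fir", "Cedar", "Fir", "Fir"),
  DBH  = c(100, 80, 120, 140)
)

# Logical vector: TRUE for the rows to keep
mask <- toy$Tree == "Fir"

# Keep only the rows where mask is TRUE
toy_fir <- subset(toy, mask)

toy_mean <- mean(toy_fir$DBH)  # sample mean
toy_sd   <- sd(toy_fir$DBH)    # sample standard deviation (n - 1 denominator)
print(c(toy_mean, toy_sd))
```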

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("Variables meanDBH, sdDBH, meanHt, and sdHt should be correctly computed.", {
    expect_equal(digest(meanDBH),'f5bbaa09f8a6b6b9d8b11894ebba78af')
    expect_equal(digest(sdDBH), '053e6145fe82c986c8006286fc10d968')
    expect_equal(digest(meanHt),'7b781b379b0cab3cc5868676258128e8')
    expect_equal(digest(sdHt),'dbefb6980e223f3b2785ee7c2e4bde7a')
})

### (d) Calculate the coefficient of variation for both tree height and diameter for this most common tree species. Assign these to variables `cvHt` and `cvDBH` respectively. Are tree diameters or heights more variable among the sampled nesting trees? Set `moreVary = 0` if diameters are more variable and `moreVary = 1` if heights are more variable.
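The coefficient of variation is commonly defined as the standard deviation divided by the mean, which makes variability comparable between quantities measured in different units. The sketch below applies that definition to made-up numbers. The helper `cv` and the vectors are hypothetical, and the result is left as a fraction here; whether this lab expects a fraction or a percentage is not stated in this section, so check your course notes.

```r
# Hypothetical helper: coefficient of variation as a fraction (sd / mean)
cv <- function(x) sd(x) / mean(x)

# Made-up measurements for illustration only
toy_widths  <- c(100, 120, 140)
toy_lengths <- c(30, 40, 50)

cv_widths  <- cv(toy_widths)
cv_lengths <- cv(toy_lengths)

# The quantity with the larger coefficient of variation is relatively more variable
print(c(cv_widths, cv_lengths))
print(cv_lengths > cv_widths)
```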

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("Variables cvDBH, cvHt, and moreVary should be correctly computed.", {
    expect_equal(digest(cvDBH),'537e9e1ba3d370c6be72bd14a92c6b9a')
    expect_equal(digest(cvHt), '0583809d757b2d3b1de966578765df4b')
    expect_equal(digest(moreVary),'908d1fd10b357ed0ceaaec823abf81bc')
})

### (e) What is the most common stage of decay (`Stage`, indicated in the figure above) among trees in which owls build nests? Assign this answer (as an integer between 1 and 7) to the variable `commonStage`.
You can again use `table` to count occurrences.

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("The most common tree stage for nesting trees should be named commonStage", {
    expect_equal(digest(commonStage),'db8e490a925a60e62212cefc7674ca02')
})

### (f) Finally, what is the most common nest type? Options are `"Platform"`, `"SideCavity"`, and `"TopCavity"`. Assign the most common nest type to a variable called `commonNest`. What percentage (between 0 and 100) of all nests in the dataset are this nest type? Assign this percentage to a variable called `percentNest`.

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that("The most common nest type is identified properly", {
    expect_equal(digest(commonNest),'65bf880f72932460d6dab31f857deacf')
})
test_that("The percent of all nests of this type is calculated correctly", {
    expect_equal(digest(unname(percentNest)),'b50efca909c244fe8644d1e93406ecdb')
})

### (g) Summarize the results of your descriptive statistics. What species of trees do owls prefer to nest in? What life stage are these trees in? What types of nests do owls most commonly build in these trees?

YOUR ANSWER HERE

## 2. Now we will investigate whether the diameter and height of nesting trees affect the types of nests owls make in them using analysis of variance (ANOVA).

### a) For the most common nesting tree species (`Tree`) and life stage (`Stage`), form a boxplot showing the tree diameter on the y axis versus the nest type on the x axis. Label your axes with units as appropriate. Add a plot title.
You can extract the appropriate data from `df` with the `subset` function and `mask = (df$Stage==commonStage) & (df$Tree==commonTree)`. Store the result in a dataframe called `df2`, which is used again in part f).
You can use `boxplot(y ~ x, data=yourDataFrame, main='a title for your plot', xlab='your x axis label', ylab='your y axis label')` with appropriate substitutions.
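The call pattern above can be tried on a small made-up dataframe first. Everything in this sketch (`toy`, `group`, `length_cm`, the labels) is hypothetical and only shows how the formula, title, and axis labels fit together.

```r
# Made-up data for illustration only: three groups of four measurements
toy <- data.frame(
  group     = factor(rep(c("A", "B", "C"), each = 4)),
  length_cm = c(10, 12, 11, 13, 15, 17, 16, 18, 11, 10, 12, 13)
)

# y ~ x: the numeric variable on the left, the grouping variable on the right
bp <- boxplot(length_cm ~ group, data = toy,
              main = "Toy example: length by group",
              xlab = "Group",
              ylab = "Length (cm)")
```

`boxplot` draws the figure and also returns the plotted statistics invisibly, which is why the result can be stored in `bp` if you want to inspect it.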

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

### b) Now we will investigate whether the mean diameters of nesting trees differ according to nest type using ANOVA. Do the data shown in your boxplot meet the requirements of ANOVA? Explain this, then write the null and alternate hypotheses for ANOVA, being careful to define all variables.
You can render equations in markdown by surrounding them with either single or double dollar signs, as in `$ F = m a $`. Double dollar signs give equations their own line. Single dollar signs make them inline with text. You can write a Greek $\mu$ ("mu") with a subscript (i.e., $\mu_1$) as `\mu_1`. This method of writing math is called [MathJax](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference). Use MathJax to render your null and alternate hypotheses.
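As a hedged illustration of the notation only, a generic one-way ANOVA with $k$ groups might be typed as shown below. The symbols $k$, $\mu_i$, $H_0$, and $H_A$ are generic placeholders assumed for this sketch; your answer still needs to state how many groups there are and what each mean represents for this dataset.

```latex
$$ H_0: \mu_1 = \mu_2 = \dots = \mu_k $$
$$ H_A: \text{at least one } \mu_i \text{ differs from the others} $$
```

Pasting these lines into a markdown cell renders each hypothesis on its own centered line.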


YOUR ANSWER HERE

### c) Now conduct ANOVA on the data in your boxplot to determine if you can reject the null hypothesis. Conduct the calculation and print the summary.
The function for ANOVA is `aov`. It operates as `A = aov(y~x,data=dataset)` with appropriate substitutions. As written here, the ANOVA results are stored in `A`. You can summarize the results with `summary(A)`.
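A minimal sketch of this call on made-up data is given below. The dataframe `toy` and its columns are hypothetical; the grouping column is stored as a factor, which is the form `aov` works with for a one-way design.

```r
# Made-up data for illustration only: three groups of four measurements
toy <- data.frame(
  group     = factor(rep(c("A", "B", "C"), each = 4)),
  length_cm = c(10, 12, 11, 13, 15, 17, 16, 18, 11, 10, 12, 13)
)

# One-way ANOVA: does mean length_cm differ between groups?
A_toy <- aov(length_cm ~ group, data = toy)

# The summary table lists Df, Sum Sq, Mean Sq, F value, and Pr(>F)
toy_summary <- summary(A_toy)
print(toy_summary)

# Individual entries can be pulled out of the first (and only) table
toy_F <- toy_summary[[1]][["F value"]][1]
toy_p <- toy_summary[[1]][["Pr(>F)"]][1]
```

The `Pr(>F)` entry is the p-value that is compared against the chosen significance level.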

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

### d) Summarize the results of the ANOVA in several sentences. Do you reject or fail to reject the null hypothesis at $\alpha=0.05$? What numerical values in the ANOVA summary lead you to this choice? What does this imply for the control of tree diameters on the nest types owls construct?

YOUR ANSWER HERE

### e) Now conduct a post-hoc Tukey test on the ANOVA output. Obtain the Tukey results in a code cell, then interpret and summarize these results in a markdown cell, making specific reference to the relevant numerical values in the Tukey output.
Use the `TukeyHSD` function on the earlier ANOVA output.
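The sketch below shows the shape of the `TukeyHSD` output on made-up data. All names (`toy`, `A_toy`, `tk`) are hypothetical. The grouping column is explicitly a factor here; if a grouping column was read in as text, converting it with `factor()` before calling `aov` may be necessary.

```r
# Made-up data for illustration only: three groups of four measurements
toy <- data.frame(
  group     = factor(rep(c("A", "B", "C"), each = 4)),
  length_cm = c(10, 12, 11, 13, 15, 17, 16, 18, 11, 10, 12, 13)
)

A_toy <- aov(length_cm ~ group, data = toy)

# Pairwise comparisons of group means with family-wise 95% confidence intervals
tk <- TukeyHSD(A_toy)
print(tk)

# One matrix per factor: rows are pairs such as "B-A",
# columns are diff, lwr, upr, and "p adj"
tk_table <- tk$group
```

For each pair, an adjusted p-value below the significance level (equivalently, an interval from `lwr` to `upr` that excludes zero) suggests that the two group means differ.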

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

YOUR ANSWER HERE

### f) Now conduct ANOVA to determine if nest type (`Nest`) is related to tree height (`Ht`) for the most common nesting tree species and life stage (the `df2` dataframe). Perform the calculation in one code cell and summarize the results in another markdown cell, making reference to the appropriate numerical values. Use $\alpha=0.05$.

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

YOUR ANSWER HERE

### g) In 2-3 sentences, summarize your findings with regard to (i) tree characteristics Northern Spotted Owls select for nesting, (ii) the types of nests these owls most commonly build in these trees, and (iii) how ANOVA informs the control of tree height and diameter on the types of nests owls construct.

YOUR ANSWER HERE