# Project 1: Hierarchical models

## Background

You will be working with the reedfrogs tadpole dataset introduced [here](https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/04-0535) and analyzed in [Statistical Rethinking 2nd Ed. Chapter 13](https://github.com/pymc-devs/resources/blob/master/Rethinking_2/Chp_13.ipynb).

The experiment studied how an ecological mechanism called predation-induced hatching affects the survivability of tadpoles until they morph into frogs. Predation-induced hatching is a phenomenon in which larvae hatch into tadpoles early to avoid being eaten by aquatic predators. The researchers wanted to know whether hatching early decreases survivability in the long run because the tadpoles are too small to escape predators.

In their experimental setup, the researchers prepared many water tanks in which they placed different numbers of tadpoles of different sizes, in the presence or absence of predators, and recorded the number of tadpoles that survived until they morphed into frogs.

## Project

You will take the dataset (stored in `"data/reedfrogs.csv"`) and attempt to answer the following questions:

1. Does size alone influence survivability (ignoring the presence or absence of predators)?
2. Does the presence of predators affect survivability?
3. In the presence of predators, does size play an important role in survivability? Attempt to build a model in which size and predation interact to explain survival probability.

To do this, we recommend that you work with a hierarchical model of varying intercepts:

$$
S_{tank} \sim Binomial(n_{tank}, p_{tank}) \\
p_{tank} = logistic( \alpha + \sigma_{\alpha} z_{tank} + \beta x_{tank}  )\\
z_{tank} \sim Normal(0, 1)\\
\sigma_{\alpha} \sim Exponential(1)\\
\alpha \sim Normal(0, 1.5)\\
\beta \sim Normal(0, 1.5)
$$

where $S_{tank}$ is the number of tadpoles that survived in a tank that began with $n_{tank}$ tadpoles, and $x_{tank}$ is the value of a fixed-effect covariate for that tank. We will call $z_{tank}$ the "random_effect" across tanks; $\sigma_{\alpha}$ denotes the variability across tanks, $\alpha$ is the group mean intercept, and each $\beta$ is the slope assigned to a fixed effect that you wish to model (e.g. `pred` or `size`).

Here is a quick question to ask yourself, by the way: do you recognize the type of parametrization used here? What is it usually called in statistical lingo?
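To get a feel for what the model implies before fitting anything, it can help to simulate from the priors. The following is a minimal sketch using only the Python standard library. The function name `simulate_prior_tanks` and the covariate and tank-size values are illustrative assumptions, not part of the project materials or the dataset.

```python
import math
import random


def logistic(x):
    """Inverse-logit: map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_prior_tanks(x, n, rng):
    """Draw one prior sample of the model and simulate survivors per tank.

    x: covariate value per tank (e.g. a 0/1 indicator)
    n: initial number of tadpoles per tank
    rng: a random.Random instance, so results are reproducible
    """
    alpha = rng.gauss(0.0, 1.5)         # alpha ~ Normal(0, 1.5)
    beta = rng.gauss(0.0, 1.5)          # beta ~ Normal(0, 1.5)
    sigma_alpha = rng.expovariate(1.0)  # sigma_alpha ~ Exponential(1)

    p, survivors = [], []
    for x_tank, n_tank in zip(x, n):
        z_tank = rng.gauss(0.0, 1.0)    # z_tank ~ Normal(0, 1)
        p_tank = logistic(alpha + sigma_alpha * z_tank + beta * x_tank)
        # Binomial draw written as a sum of n_tank Bernoulli(p_tank) trials
        s_tank = sum(rng.random() < p_tank for _ in range(n_tank))
        p.append(p_tank)
        survivors.append(s_tank)
    return p, survivors


rng = random.Random(0)
x = [0, 0, 1, 1]      # made-up covariate values, not taken from the dataset
n = [10, 25, 35, 10]  # made-up tank sizes, not taken from the dataset
p, survivors = simulate_prior_tanks(x, n, rng)
```

Calling the function repeatedly with the same `rng` gives a spread of plausible survival probabilities under the priors, which is one way to check that the prior scales are reasonable on the probability scale.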

From the workshop's point of view, it is also important that you try to answer the following questions:

4. How does $\sigma_{\alpha}$ change as you add fixed effects to the model (i.e., the `pred` and `size` variables)?
5. Predict the probability of survival of 100 small tadpoles in a new tank, in the presence of predators, given that 50 of them survived in our experiment.
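
For the last question, one possible starting point is to treat the new tank as exchangeable with the observed ones, so that it gets its own fresh $z_{tank}$ for every posterior draw. The sketch below assumes you have already extracted posterior draws of $\alpha$, $\sigma_{\alpha}$ and $\beta$ from your fitted model. The function name `new_tank_survival_probs` is hypothetical, and the draws shown are placeholders rather than real results.

```python
import math
import random


def new_tank_survival_probs(posterior_draws, x_new, rng):
    """Return one survival probability per posterior draw for an unobserved tank.

    posterior_draws: iterable of (alpha, sigma_alpha, beta) tuples
    x_new: covariate value of the new tank
    rng: a random.Random instance, so results are reproducible
    """
    probs = []
    for alpha, sigma_alpha, beta in posterior_draws:
        z_new = rng.gauss(0.0, 1.0)  # the new tank gets a fresh z ~ Normal(0, 1)
        eta = alpha + sigma_alpha * z_new + beta * x_new
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs


# Placeholder draws, NOT real results: replace with draws from your fitted model.
placeholder_draws = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
probs = new_tank_survival_probs(placeholder_draws, x_new=1, rng=random.Random(1))
```

The resulting list approximates the posterior predictive distribution of the survival probability in a new tank. How to condition on the observed count mentioned in the question is left for you to work out.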