# Butterfly egg populations **[20 points]**

In this notebook we will analyse butterfly egg populations to try to understand population dynamics and butterfly reproductive behaviour. The butterfly *Euphydryas gillettii* was introduced to Gunnison County, Colorado, USA. This population, located in the southern Rocky Mountains, is south of the butterfly's natural range, which extends from southern Alberta to northern Wyoming. This butterfly lays its eggs in clusters on leaves of *Lonicera involucrata* (see picture). Multiple egg clusters may be found on each leaf. We will use field data collected in Colorado over 29 years, provided to me by Prof. Carol Boggs of the SEOE.

<img src="./images/gillettii.jpg" width = 600  >

<img src="./images/CBoggs_ButterflySampleSite.jpg" width = 600>

We want to understand butterfly population dynamics. Imagine being in the field and finding a single leaf with 7 egg clusters on it! You might ask yourself: is this a particularly special leaf that the butterflies favour for depositing their eggs? Maybe the female butterflies know which leaves increase the chance of their caterpillars surviving and so seek them out, or maybe it's just random chance that they all chose this leaf. We're going to test the hypothesis that this is just random chance and not driven by butterfly behaviour.

<img src="./images/CBoggs_EggClusters.jpg" width = 600>

<img src="./images/CBoggs_Caterpillar.jpg" width = 600>

First off, let's import the packages. We're going to use: `pandas`, `matplotlib.pyplot`, `numpy`, and `math`.

<font color=goldenrod>**_Code for you to write_**</font> **[2 points]**

Import the packages. You can use aliases with `import package as alias`; the cells below assume `np` for `numpy` and `plt` for `matplotlib.pyplot`.

Now read in the egg cluster data from the file `NumberOfEggClusters_CarolBoggs.csv` which is in the `data` directory.

<font color=goldenrod>**_Code for you to write_**</font> **[1 point]**

Read in the CSV file and store it in a variable called `butterflies`, the name used in the cells below.
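
A minimal sketch of the reading step, assuming `pandas` is imported as `pd`. The in-memory text below is a hypothetical stand-in so that the example runs anywhere: its counts are made up and are not the field data, and the real file may contain other columns. Only the column name `Number of Egg Clusters on Leaf` and the variable name `butterflies` come from this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: made-up counts, NOT the field data.
example_csv = io.StringIO(
    "Number of Egg Clusters on Leaf\n"
    "0\n1\n0\n2\n0\n1\n"
)

# In the notebook, pass the file path instead of example_csv, for example
# pd.read_csv('./data/NumberOfEggClusters_CarolBoggs.csv')
butterflies = pd.read_csv(example_csv)

# Show the first rows and the column names
print(butterflies.head())
print(butterflies.columns.tolist())
```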

<font color=goldenrod>**_Code for you to write_**</font> **[1 point]**

Check what's inside the dataframe using `.head()` or `print`

In [None]:
print(butterflies['Number of Egg Clusters on Leaf'])

### Viewing the frequencies of egg clusters

Plot a histogram of the number of egg clusters per leaf.

<font color=goldenrod>**_Code for you to write_**</font> **[4 points]**

- Using `plt.subplots()`, plot a histogram of `Number of Egg Clusters on Leaf`
- Use the `bin_list` provided for the bins. Its edges sit at half-integers, so each histogram bar is centred on a whole number of egg clusters (this will help with plotting later on)
- Plot the histogram on a log scale using `log=True`
- Add x and y-labels with `set_xlabel` and `set_ylabel`
- Add a title
                        

In [None]:
bin_list = np.linspace(-0.5, 7.5, 9)
print(bin_list)
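
To see why these bin edges help, the sketch below counts a few made-up integer values with `np.histogram` using the same edges. Each bin spans half a unit either side of a whole number, so every count from 0 to 7 lands in its own bin and the bars end up centred on 0, 1, ..., 7. The `example_counts` array is hypothetical and is not the field data.

```python
import numpy as np

bin_list = np.linspace(-0.5, 7.5, 9)  # edges: -0.5, 0.5, ..., 7.5

# Hypothetical counts of egg clusters per leaf (made up for illustration)
example_counts = np.array([0, 0, 1, 0, 2, 1, 0, 7])

# Count how many values fall between each pair of neighbouring edges
frequencies, edges = np.histogram(example_counts, bins=bin_list)

# Bin centres are the midpoints of neighbouring edges
centres = (edges[:-1] + edges[1:]) / 2

print(centres)
print(frequencies)
```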



## Probability distribution functions

We're going to test whether the number of butterfly egg clusters on a leaf is systematic and thus controlled by some behaviour or external factor, such as butterflies avoiding leaves with pre-existing egg clusters or seeking out particularly tasty leaves. Our null (default) hypothesis is that the number of egg clusters on a leaf is due to chance alone. In this case, the number of egg clusters per leaf follows a Poisson distribution.

### Theoretical

The **Poisson distribution** gives the probability that an event occurs $k$ times in a fixed interval (of time or space; here, on a single leaf), where $\lambda$ is the expected number of occurrences in that interval. The Poisson distribution is the limit of the binomial distribution (which counts how often one of two possible outcomes occurs in $n$ trials, each with probability $p$) as $n \rightarrow \infty$ while the expected count $np = \lambda$ is held fixed:

$$P(k) = e^{-\lambda}\frac{\lambda^{k}}{k!}$$
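
As a sketch of where this formula comes from (a standard textbook argument, not something derived from the field data), write the binomial probability of $k$ successes in $n$ trials with success probability $p = \lambda / n$, so that the expected count $np = \lambda$ stays fixed:

$$\binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^{k}}{k!}\cdot\frac{n(n-1)\cdots(n-k+1)}{n^{k}}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\cdot\left(1-\frac{\lambda}{n}\right)^{-k}$$

As $n \rightarrow \infty$ with $k$ fixed, the second and fourth factors tend to 1 and the third tends to $e^{-\lambda}$, which leaves

$$\lim_{n \rightarrow \infty}\binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k} = e^{-\lambda}\frac{\lambda^{k}}{k!}$$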

Let's code up a function to calculate the Poisson probability for some `k` and some lambda. We're going to do lots of testing to ensure it works.

**DO THIS STEP BY STEP, TESTING EACH TIME. IT WILL HELP YOU TO BE SUCCESSFUL**

<font color=goldenrod>**_Code for you to write_**</font> **[5 points]**
1. Create a function called `poisson_probability` that takes `k` and `lam` as arguments **(`lambda` is a reserved word in Python and can't be used as a variable name)**. For now, have it simply return `k` and `lam`
2. Write a docstring: this function calculates the probability of observing `k` events when the expected rate is `lam`
3. **Test your function.** Pass k=1 and lam=2. **If your function returns those numbers then move on to the next step**
4. Code $prob = e^{-\lambda}$ using `np.exp(-1*lam)`
5. Change your function to return `prob`
6. **Test your function.** Pass k=1 and lam=2. The answer is $e^{-2} \approx 0.135$. **If your function returns this then move on to the next step.**
7. Now multiply this by $\frac{\lambda^{k}}{k!}$: multiply by $\lambda^{k}$, which is coded as `* (lam**k)`, and divide by $k!$, which is coded as `/ math.factorial(k)`. (Multiplication is done with the star `*` and division with the slash `/`.)
8. **Test your function.** Pass k=1 and lam=2. The correct answer is 0.2707... If you get this then your function is correct!
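
The steps above could come together roughly as follows. This is one possible sketch rather than the only valid answer; it assumes `numpy` is imported as `np` and uses the function name `poisson_probability` that appears later in this notebook.

```python
import math
import numpy as np

def poisson_probability(k, lam):
    """Return the Poisson probability of observing k events when the expected rate is lam."""
    # Step 4: the exponential term e^(-lambda)
    prob = np.exp(-1 * lam)
    # Step 7: multiply by lambda^k and divide by k!
    prob = prob * (lam**k) / math.factorial(k)
    return prob

# Step 8: check against the known answer for k=1, lam=2
print(poisson_probability(1, 2))
```

Building the function in stages like this means that if the final check fails, you already know the exponential term was correct and only need to look at the last line you added.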

Test your function here. If k=1 and lambda=2, then the correct Poisson probability is 0.2707...

Here's a box for testing your function.

Now we're going to use the function to calculate the probabilities of different `k` values, or numbers of egg clusters per leaf.

<font color=goldenrod>**_Code for you to write_**</font> **[4 points]**
- Use `np.arange` to create a variable called `number_clusters_per_leaf` holding the whole numbers from 0 to 7, inclusive. This is our range of `k` values
- Create an empty list called `probabilities` using `[]`
- Create a variable called `egg_rate_lambda` and set it equal to 1 for now
- Create a for loop: for each value of `k` in `number_clusters_per_leaf`, use your function `poisson_probability` with `egg_rate_lambda` to calculate the probability
- Append that probability to the `probabilities` list using `probabilities.append(value)`
- Print your `probabilities`
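
A sketch of how this loop might look, assuming a `poisson_probability` function like the one described in the previous section (repeated here in compact form so the example runs on its own). Note that `np.arange` excludes its stop value, so the stop has to be 8 to include 7.

```python
import math
import numpy as np

def poisson_probability(k, lam):
    """Return the Poisson probability of observing k events when the expected rate is lam."""
    return np.exp(-1 * lam) * (lam**k) / math.factorial(k)

# k values 0, 1, ..., 7 (the stop value 8 is not included)
number_clusters_per_leaf = np.arange(0, 8)
probabilities = []
egg_rate_lambda = 1

for k in number_clusters_per_leaf:
    # int(k) converts the NumPy integer to a plain Python int for math.factorial
    probabilities.append(poisson_probability(int(k), egg_rate_lambda))

print(probabilities)
```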

<font color=blue>**_Hint_**</font> 

The following are the Poisson probabilities for each value of `k` with `lam=1`. Check your answers against this.

| k | Poisson Probability |
| --- | --- |
| 0 | 0.368 | 
| 1 | 0.368 |
| 2 | 0.184 |
| 3 | 0.061 |
| 4 | 0.015 |
| 5 | 0.003 |
| 6 | 0.0005 |
| 7 | 0.00007 |

Now we're going to plot the data and the modelled probabilities.

<font color=goldenrod>**_Code for you to write_**</font> **[2 points]** 

- Copy your histogram from above down here
- Use `ax2 = ax1.twinx()` to create a second axes object that shares the x-axis of `ax1` but has its own y-axis on the right-hand side
- On `ax2`, plot `number_clusters_per_leaf` versus `probabilities` as a line
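
The sketch below shows one way these pieces might fit together. The `example_counts` array is hypothetical (made up for illustration, not the field data), the rate of 1 is an arbitrary starting value, and the axis labels and title are placeholders to adapt. The red and green colours follow the description used later in this notebook.

```python
import math
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical egg-cluster counts per leaf (made up, NOT the field data)
example_counts = np.array([0, 0, 0, 1, 0, 2, 1, 0, 0, 3, 0, 1])

# Poisson probabilities for k = 0..7 with an arbitrary example rate
egg_rate_lambda = 1
number_clusters_per_leaf = np.arange(0, 8)
probabilities = [
    math.exp(-egg_rate_lambda) * egg_rate_lambda**int(k) / math.factorial(int(k))
    for k in number_clusters_per_leaf
]

bin_list = np.linspace(-0.5, 7.5, 9)

fig, ax1 = plt.subplots()
# Empirical histogram (red) on the left-hand y-axis, log-scaled
ax1.hist(example_counts, bins=bin_list, log=True, color="red")
ax1.set_xlabel("Number of egg clusters on leaf")
ax1.set_ylabel("Number of leaves")
ax1.set_title("Egg clusters per leaf (example data)")

# Second y-axis sharing the same x-axis, for the modelled probabilities (green)
ax2 = ax1.twinx()
ax2.plot(number_clusters_per_leaf, probabilities, color="green")
ax2.set_ylabel("Poisson probability")
```

Because the histogram shows counts and the line shows probabilities, the two y-axes have different scales; the twin axis lets both be drawn over the same `k` values.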

<font color=blue>**_Hint_**</font>

The final figure should look something like this
<img src="./images/ExampleHistogram_ButterfliesPoisson.png" width = 600>

<font color=goldenrod>**_Code for you to write_**</font>
- Repeat the above steps and change your `egg_rate_lambda` to find a value that causes the modelled (green) values to match the empirical (red) data
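
If you would like a numerical check to go alongside the visual comparison, one option is to scan a range of candidate rates and measure how far the Poisson probabilities are from the observed fractions of leaves. The sketch below uses only the standard library (`math.exp` instead of `np.exp`). The `example_leaf_counts` list is hypothetical (made up, not the field data), and the sum-of-squares `mismatch` function is just one simple choice of measure, an assumption rather than a method prescribed by this notebook.

```python
import math

def poisson_probability(k, lam):
    """Return the Poisson probability of observing k events when the expected rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical number of leaves with 0, 1, ..., 7 egg clusters (made up, NOT the field data)
example_leaf_counts = [60, 30, 8, 2, 0, 0, 0, 0]
total_leaves = sum(example_leaf_counts)
observed_fractions = [count / total_leaves for count in example_leaf_counts]

def mismatch(lam):
    """Sum of squared differences between observed fractions and Poisson probabilities."""
    return sum(
        (observed - poisson_probability(k, lam)) ** 2
        for k, observed in enumerate(observed_fractions)
    )

# Candidate rates 0.1, 0.2, ..., 2.0
candidate_lambdas = [i / 10 for i in range(1, 21)]
best_lambda = min(candidate_lambdas, key=mismatch)

print(best_lambda)
```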

<font color=red>**_Question_**</font> **Write your answer in the box below**

What value of `egg_rate_lambda` best matches the data?

### Our conclusion about butterfly behaviour

**From this, we can interpret that the butterflies lay their eggs at random on leaves**, and there isn't anything special about a leaf with 7 egg clusters, it's just statistically rare.

### Turn in this notebook

Save your completed notebook, print the file to PDF, and upload the PDF on Blackboard.