# Ladybird Analysis

# Compare the mean sizes of low and high predated two-spot ladybird populations

<div class="alert alert-success">

# Part 1: Exploring your data
</div>

## Task 1.1: If you don't have your group's ladybird Excel spreadsheet on Noteable, upload it now

Follow these instructions to do this:
1. Go to Learn and click **Open Microsoft Teams classes** in the left-hand panel.
2. Log in to the **Biology 1A Variation (2024-2025) Team**.
3. Click on **Files** and locate your group's spreadsheet. For example, if your group is YAK E and your partnering group is YAK F then your spreadsheet is called `ladybird_sizes_YAK_E_F.xlsx`.
4. Hover your cursor over your spreadsheet name, click on the three dots and select **Download**.
5. Return to the **Variation1/Ladybird Analysis/** browser tab running **Noteable**.
6. Click on **Upload** on the right, find your spreadsheet on your laptop, then click on the **blue Upload** button.

<div class="alert alert-danger">

Make sure that your Excel spreadsheet is in the **Ladybird Analysis Notebooks** folder on Noteable, i.e., the same folder that this Jupyter notebook is in.
    
</div>

## Task 1.2: Read in and print the low and high predation samples to check the data are okay

Using pandas, read in your Excel spreadsheet.

1. To read in an Excel spreadsheet, use the command `pd.read_excel('filename.xlsx')`. Do this now, giving the DataFrame a sensible name such as `ladybirds`.

2. Print the data to make sure it is okay. You should see two columns headed `low` and `high`. You will probably see `NaN` repeated at the bottom of one of the columns. This isn't a problem; it's just because different numbers of ladybirds were measured in the two cemeteries.

In [None]:
# read and print your ladybird size dataset
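As a rough illustration of the `NaN` padding described above, the sketch below builds a small DataFrame from two columns of unequal length. The values are invented for illustration only; with your real data you would call `pd.read_excel` with your own filename, as shown in the comment.

```python
import pandas as pd

# Invented sizes for illustration only. With real data you would instead use, e.g.:
# ladybirds = pd.read_excel('ladybird_sizes_YAK_E_F.xlsx')
ladybirds = pd.DataFrame({
    'low': pd.Series([4.6, 5.1, 4.9, 5.3, 4.8]),
    'high': pd.Series([5.2, 5.5, 5.0]),
})

# The shorter column is padded with NaN at the bottom
print(ladybirds)

# Count the NaN entries in each column
n_missing = ladybirds.isna().sum()
print(n_missing)
```

Here the `high` column has two fewer measurements than `low`, so pandas fills the last two rows of `high` with `NaN`.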

## Task 1.3: Plot the samples in a histogram to see how they are distributed

Plot the distributions of the low and high predation samples as histograms in a single annotated graph. 

See [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#First-plot-the-data) for help.

In [None]:
# annotated histograms of samples of two-spot ladybird sizes from low and high predation cemeteries
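One possible way to draw two overlapping, annotated histograms is sketched below. It uses randomly generated stand-in data rather than real measurements, and the axis labels, title and number of bins are illustrative assumptions; the self-study notebook may do this differently.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data (NOT real ladybird measurements)
rng = np.random.default_rng(1)
ladybirds = pd.DataFrame({
    'low': pd.Series(rng.normal(5.0, 0.5, 40)),
    'high': pd.Series(rng.normal(5.2, 0.5, 30)),
})

fig, ax = plt.subplots()
# dropna() removes the NaN padding in the shorter column;
# alpha < 1 makes the bars see-through so overlaps stay visible
ax.hist(ladybirds['low'].dropna(), bins=10, alpha=0.5, label='low predation')
ax.hist(ladybirds['high'].dropna(), bins=10, alpha=0.5, label='high predation')
ax.set_xlabel('ladybird size')
ax.set_ylabel('frequency')
ax.set_title('Two-spot ladybird sizes (illustrative data)')
ax.legend()
```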

## Task 1.4: The distributions might be clearer in a boxplot

Your `low` and `high` histograms will probably overlap quite a lot. This makes it hard to see if the means of the two samples are different.

If that is the case, a boxplot is probably a better way to visualise your data, as it hides individual data points and instead uses a five-number summary to describe the distribution of each sample.
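For reference, a five-number summary consists of the minimum, lower quartile, median, upper quartile and maximum. A minimal sketch of computing it with NumPy on a small invented sample:

```python
import numpy as np

# Invented sample for illustration only
sizes = np.array([4.2, 4.6, 4.8, 5.0, 5.1, 5.3, 5.9])

# Percentiles 0, 25, 50, 75 and 100 give min, Q1, median, Q3 and max
minimum, q1, median, q3, maximum = np.percentile(sizes, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)
```

Note that different software packages use slightly different rules for interpolating quartiles, so values can differ a little between tools.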

Plot the distributions of the low and high predation samples in an annotated boxplot. 

See [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#First-plot-the-data) for help.

In [None]:
# a boxplot to visually compare ladybird sizes from low and high predation cemeteries 
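A side-by-side boxplot could be sketched as follows. Again the data are randomly generated stand-ins, and the labels and title are illustrative assumptions rather than the notebook's prescribed style.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data (NOT real ladybird measurements)
rng = np.random.default_rng(1)
ladybirds = pd.DataFrame({
    'low': pd.Series(rng.normal(5.0, 0.5, 40)),
    'high': pd.Series(rng.normal(5.2, 0.5, 30)),
})

fig, ax = plt.subplots()
# Each sample becomes one box; dropna() removes the NaN padding first
boxes = ax.boxplot([
    ladybirds['low'].dropna().to_numpy(),
    ladybirds['high'].dropna().to_numpy(),
])
# Boxes are drawn at x = 1 and x = 2 by default, so label those positions
ax.set_xticks([1, 2])
ax.set_xticklabels(['low predation', 'high predation'])
ax.set_ylabel('ladybird size')
ax.set_title('Two-spot ladybird sizes (illustrative data)')
```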

## Task 1.5: What do the box and the various lines in a boxplot represent?

If you don't know, try googling the answer. Write your answer in the following markdown cell.

> Write your answer here. 

## Task 1.6: Eye-ball estimates of the means and standard deviations

It is generally a good idea to estimate means and standard deviations by eye before calculating them on a computer. This lets you check your eye-ball estimates against the values output by Python. If they don't match, you know something is wrong with either your estimates or your code.

Using your histograms or boxplots, estimate the means and standard deviations of ladybird sizes from both cemeteries. Remember that a rough estimate of the standard deviation is given by this formula:

$$s \approx \frac{\mathrm{max\ value} - \mathrm{min\ value}}{4}$$
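To get a feel for how rough this rule of thumb is, the sketch below compares it with the sample standard deviation for a small invented sample. The numbers are illustrative only.

```python
import numpy as np

# Invented sample for illustration only
sizes = np.array([4.2, 4.6, 4.8, 5.0, 5.1, 5.3, 5.9])

# Rule of thumb: range divided by 4
rough_sd = (sizes.max() - sizes.min()) / 4

# Sample standard deviation (ddof=1 divides by n - 1)
exact_sd = sizes.std(ddof=1)

print(f"rough estimate: {rough_sd:.2f}, calculated: {exact_sd:.2f}")
```

The two values will not agree exactly; the rule of thumb is only meant to catch estimates or code that are badly wrong.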


> Write your estimates here

## Task 1.7: Calculate the sample sizes, means and standard deviations

Now, using Python code, calculate the sample sizes, means and standard deviations of the two samples and print them to an appropriate number of decimal places.

See Notebook [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#Sample-means-and-standard-deviations) for example code.

How do they compare to your eye-ball estimates?

In [None]:
# sample sizes, sample means and sample standard deviations of both samples
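A minimal sketch of these calculations with pandas is given below, using invented values. It assumes the DataFrame is called `ladybirds`; printing to two decimal places is an illustrative choice, not a rule. Note that `count()` ignores `NaN` entries, whereas `len()` would count the padded rows too.

```python
import pandas as pd

# Invented sizes for illustration only
ladybirds = pd.DataFrame({
    'low': pd.Series([4.6, 5.1, 4.9, 5.3, 4.8]),
    'high': pd.Series([5.2, 5.5, 5.0]),
})

n = ladybirds.count()    # number of non-missing values per column
means = ladybirds.mean() # NaN values are skipped by default
sds = ladybirds.std()    # pandas uses n - 1 in the denominator by default

for col in ['low', 'high']:
    print(f"{col}: n = {n[col]}, mean = {means[col]:.2f}, sd = {sds[col]:.2f}")
```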

## Task 1.8: Calculate the difference in the sample means

Using Python code and the sample means you just calculated, calculate the difference in sample means.

See Notebook [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#Sample-means-and-standard-deviations) for the code to do this.

In [None]:
# calculate your observed difference in sample means
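As a small sketch with invented values, the difference can be taken as high minus low, so that a positive value means ladybirds are larger on average in the high predation sample. This sign convention is an assumption made here for illustration; the self-study notebook may subtract the other way round.

```python
import pandas as pd

# Invented sizes for illustration only
ladybirds = pd.DataFrame({
    'low': pd.Series([4.6, 5.1, 4.9, 5.3, 4.8]),
    'high': pd.Series([5.2, 5.5, 5.0]),
})

# Difference in sample means: high predation minus low predation
diff = ladybirds['high'].mean() - ladybirds['low'].mean()
print(f"difference in sample means = {diff:.2f}")
```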

<div class="alert alert-success">

# Part 2: Two-sample *t*-test
</div>

Having looked at your data and calculated the difference in the sample means, you next need to work out how likely that difference would be if the null hypothesis were true.

If that difference is **likely** under the null hypothesis then you have insufficient evidence to reject the null hypothesis.

On the other hand, if that difference is **unlikely** under the null hypothesis then you have sufficient evidence to reject the null hypothesis. 

The probability of observing a difference in sample means at least as large as yours, assuming the null hypothesis is true, is called a *p*-value.

This is what you are going to calculate now.
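To build intuition for what a *p*-value measures, the sketch below uses a different technique from the *t*-test: a simple permutation (shuffling) approach on randomly generated stand-in data. If the null hypothesis were true, the `low` and `high` labels would be interchangeable, so shuffling them shows what differences arise by chance alone. This is an illustration only and is not the method you are asked to use in this notebook.

```python
import numpy as np

# Randomly generated stand-in data (NOT real ladybird measurements)
rng = np.random.default_rng(0)
low = rng.normal(5.0, 0.5, 40)
high = rng.normal(5.3, 0.5, 30)

observed = high.mean() - low.mean()

# Pool the samples, then repeatedly shuffle and re-split them
pooled = np.concatenate([low, high])
n_shuffles = 2000
count = 0
for _ in range(n_shuffles):
    shuffled = rng.permutation(pooled)
    diff = shuffled[:high.size].mean() - shuffled[high.size:].mean()
    # Count shuffles giving a difference at least as extreme as the observed one
    if abs(diff) >= abs(observed):
        count += 1

# Fraction of shuffles at least as extreme: an approximate two-sided p-value
p_approx = count / n_shuffles
print(f"observed difference = {observed:.2f}, approximate p = {p_approx:.3f}")
```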

## Task 2.1: Perform a two-sample *t*-test

Perform a two-sample *t*-test on your data using Python code. To do this, copy, paste and adapt the code from [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#The-two-sample-t-test).

In [None]:
# perform a two-sample t-test on your data
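One way to run a two-sample *t*-test in Python is with `scipy.stats.ttest_ind`, sketched below on randomly generated stand-in data. Whether the self-study notebook uses this exact function or these settings is an assumption; by default `ttest_ind` assumes equal variances in the two populations and performs a two-sided test.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Randomly generated stand-in data (NOT real ladybird measurements)
rng = np.random.default_rng(42)
ladybirds = pd.DataFrame({
    'low': pd.Series(rng.normal(5.0, 0.5, 40)),
    'high': pd.Series(rng.normal(5.3, 0.5, 30)),
})

# Remove the NaN padding before testing, otherwise the result would be NaN
low = ladybirds['low'].dropna()
high = ladybirds['high'].dropna()

# Two-sided, two-sample t-test (equal variances assumed by default)
result = stats.ttest_ind(high, low)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

With the samples passed in this order, a positive *t* statistic corresponds to a larger mean in the `high` sample.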

## Task 2.2: Reject or not reject your null hypothesis

Based on your *p*-value, do you reject or fail to reject your null hypothesis that mean ladybird sizes are the same in cemeteries with low and high predation rates? Write your answer below.

Also see [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#To-reject-or-not-reject-the-null-hypothesis) for more discussion about rejecting or not rejecting a null hypothesis.

> Do you reject or not reject the null hypothesis? Explain why.

## Task 2.3: Report the result of your test

There are three possible outcomes of your analysis.

1. You fail to reject the null hypothesis. This means you have no evidence that mean ladybird sizes differ between Edinburgh cemeteries.

2. You reject the null hypothesis, but mean ladybird sizes are smaller in the high predation cemetery than in the low predation cemetery. This means you have evidence that mean ladybird sizes differ between Edinburgh cemeteries. However, the direction of the difference is the opposite of what you would expect if Harlequin ladybirds preferred to eat small two-spot ladybirds, so something else must be causing it.

3. You reject the null hypothesis, and mean ladybird sizes are larger in the high predation cemetery than in the low predation cemetery. This means you have evidence that mean ladybird sizes differ between Edinburgh cemeteries, and that this difference is consistent with Harlequin ladybirds preferring to eat small two-spot ladybirds.
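The logic of these three outcomes can be sketched as a small function. The function name `classify_outcome`, the sign convention (high minus low) and the significance level of 0.05 are all assumptions made for illustration; use whichever threshold your course specifies.

```python
def classify_outcome(p_value, diff_high_minus_low, alpha=0.05):
    """Return 1, 2 or 3, matching the numbered outcomes listed above.

    alpha=0.05 is an assumed significance level, not taken from this notebook.
    """
    if p_value >= alpha:
        return 1  # fail to reject: no evidence of a difference in means
    if diff_high_minus_low < 0:
        return 2  # reject, but ladybirds are smaller under high predation
    return 3      # reject, and ladybirds are larger under high predation


# Example calls with invented numbers
print(classify_outcome(0.20, 0.3))
print(classify_outcome(0.01, -0.3))
print(classify_outcome(0.01, 0.3))
```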

Report the outcome of your test in words, as you might write in a report.

See [4.1 - Comparing two population means](../Self-study%20Notebooks/4.1%20-%20Comparing%20two%20population%20means.ipynb#Reporting-the-result-of-the-test) for an example. 

> Report the outcome of your test.