# Ladybird Analysis: Estimating the population mean size of your two-spot ladybirds

## Task 1: Read in and print your group's data

Using pandas, you are now going to read in the Excel spreadsheet you just created with your group mates.

<div class="alert alert-danger">

Make sure that your Excel spreadsheet is in the **Ladybird Analysis Notebooks** folder on Noteable, i.e., the same folder that this Jupyter notebook is in.
    
</div>

1. Read in your Excel spreadsheet using the command 
```python
pd.read_excel('ladybird_sizes_X_Y_Z.xlsx')
```
Replace X with your group name (GNU, YAK, FOX, or APE) and Y and Z with the letters of your group and your partnering group. For example, if your group is GNU D and your partnering group is GNU C, the filename is `ladybird_sizes_GNU_C_D.xlsx`.

2. Call the DataFrame something sensible, such as `ladybirds`.

3. Print the data to make sure it is okay.
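
The naming rule above can be sketched as follows. This is only an illustration: `group_name` and `group_letters` are hypothetical variables, and the alphabetical ordering of the two letters is an assumption based on the `GNU_C_D` example. The read is skipped if the file is not in the notebook's folder.

```python
from pathlib import Path

import pandas as pd

# Hypothetical values: substitute your own group name and letters.
group_name = "GNU"
group_letters = ["D", "C"]  # your group's letter and your partnering group's letter

# Assumption: the letters appear in alphabetical order, as in the GNU_C_D example.
first, second = sorted(group_letters)
filename = f"ladybird_sizes_{group_name}_{first}_{second}.xlsx"
print(filename)

# Only attempt the read if the spreadsheet is in the same folder as this notebook.
if Path(filename).exists():
    ladybirds = pd.read_excel(filename)
    print(ladybirds)
else:
    print(f"{filename} not found: check it is in the same folder as this notebook.")
```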

In [None]:
# read in and print your ladybird size dataset

## Task 2: Plot your group's data

Plot your two-spot ladybird sizes in an annotated histogram in the following code cell. See [Coding 3 - Working with data](../Coding%20Practicals%20Notebooks/Coding%203%20-%20Working%20with%20data.ipynb#Visualising-data) for help.

<div class="alert alert-danger">

**Only plot your group's data. We'll compare your group's data with your partnering group's data next week.**
    
For example, if you are the low predation group you might use the command
```python
g = sns.displot(ladybirds['low'])
```
    
</div>

<div class="alert alert-success">

Note: In Task 1 you imported pandas and read in your spreadsheet. Jupyter Notebooks remember that you did this, so you DO NOT need to import pandas or read in your spreadsheet again in any of the following code cells.
</div>

In [None]:
# annotated histogram of two-spot ladybird sizes.

## Task 3: Check for outliers

A histogram allows you to easily spot any outliers, that is, data that are **extremely** far from the average. Perhaps the wrong species was measured or you entered 45 instead of 4.5 into the spreadsheet.

If you are uncertain whether a value is an outlier, you should leave it in your dataset. For example, a size of 9 mm may seem large, but perhaps you've just found a particularly large two-spot ladybird. Only clearly erroneous values should be removed.
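
As a complement to the histogram, sorting the sizes is one quick way to see the extreme values as numbers. The sketch below uses invented data, with 45.0 standing in for a mistyped 4.5, and assumes the column is called `low` as in the Task 2 example. The data are rebuilt here only so the sketch runs on its own.

```python
import pandas as pd

# Hypothetical sizes in mm; 45.0 stands in for a mistyped 4.5.
ladybirds = pd.DataFrame({"low": [4.5, 5.0, 4.8, 45.0, 4.9, 5.5]})

# Sorting puts the smallest and largest values at the two ends of the output.
sorted_sizes = ladybirds["low"].sort_values()
print(sorted_sizes)

# The minimum and maximum are the first places to look for data-entry errors.
smallest = ladybirds["low"].min()
largest = ladybirds["low"].max()
print("smallest:", smallest, "largest:", largest)
```

This only shows you the extreme values; deciding whether one is clearly erroneous is still your judgement.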

![outliers.png](attachment:outliers.png)

If you think any of the data are outliers, you'll need to go back to your spreadsheet in Teams, update the Excel spreadsheet and re-upload it to Noteable.

## Task 4: Eye-ball estimates of the mean and standard deviation

It is generally a good idea to estimate means and standard deviations by eye before calculating them on a computer. This is so you can check your eye-ball estimates against the actual values output by Python. If they don't match, then you know something is wrong with either your estimates or the Python code.

Using your histogram, estimate the mean and standard deviation of ladybird sizes. Remember that a rough estimate of the standard deviation is given by this formula:

$$s \approx \frac{\mathrm{max\ value} - \mathrm{min\ value}}{4}$$
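
The arithmetic in this formula can be sketched as below. The largest and smallest sizes are invented values, standing in for what you might read off the horizontal axis of your own histogram.

```python
# Hypothetical values read by eye from a histogram (sizes in mm).
max_value = 5.5
min_value = 4.4

# Rough standard deviation: the range of the data divided by 4.
rough_sd = (max_value - min_value) / 4
print("rough standard deviation:", round(rough_sd, 3))
```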


> Write your estimates here

## Task 5: Calculate the sample size, mean and standard deviation

Now, using Python code, calculate the sample size, mean and standard deviation of your data in the following code cell to the appropriate number of decimal places. (See Notebook [3.3 - Normal distribution](../Self-study%20Notebooks/3.3%20-%20Normal%20distribution.ipynb#Find-the-sample-size,-mean-and-standard-deviation))

How do they compare to your eye-ball estimates?
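
A minimal sketch of these three calculations, using invented sizes and assuming the column is called `low` as in the Task 2 example. The data are rebuilt here only so the sketch runs on its own. Rounding to two decimal places is purely illustrative; follow Notebook 3.3 for the appropriate number of decimal places.

```python
import pandas as pd

# Hypothetical sizes in mm; use your own DataFrame from Task 1 instead.
ladybirds = pd.DataFrame(
    {"low": [4.5, 5.0, 4.8, 5.2, 4.9, 5.5, 4.4, 5.1, 4.7, 5.3]}
)
sizes = ladybirds["low"]

n = sizes.count()   # sample size (number of non-missing values)
mean = sizes.mean() # sample mean
sd = sizes.std()    # sample standard deviation (pandas divides by n - 1)

print("n =", n)
print("mean =", round(mean, 2))
print("standard deviation =", round(sd, 2))
```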

In [None]:
# sample size, sample mean and sample standard deviation

## Task 6: Check if your data obey the 68-95-99.7% rule

Now you should check to see if your data are roughly normally distributed.

1. Check if roughly 68% of your data lie within one standard deviation of the mean using Python code.
2. Do you think your data are normally distributed?

<div class="alert alert-info">

To do this task you will need to calculate, using Python code, the range from the mean minus one standard deviation to the mean plus one standard deviation. Then count how many ladybirds had sizes within this range. Is that roughly 68% of your data? 
    
See the example in Notebook [3.3 Normal distribution](../Self-study%20Notebooks/3.3%20-%20Normal%20distribution.ipynb) for how to answer this Task.
</div>
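
The steps described in the box above can be sketched as follows. The sizes are invented and the column name `low` is taken from the Task 2 example; the example in Notebook 3.3 may be written differently.

```python
import pandas as pd

# Hypothetical sizes in mm; use your own DataFrame from Task 1 instead.
ladybirds = pd.DataFrame(
    {"low": [4.5, 5.0, 4.8, 5.2, 4.9, 5.5, 4.4, 5.1, 4.7, 5.3]}
)
sizes = ladybirds["low"]

mean = sizes.mean()
sd = sizes.std()

# Range from one standard deviation below the mean to one above it.
lower = mean - sd
upper = mean + sd

# Keep only the sizes inside that range, then count them.
within = sizes[(sizes >= lower) & (sizes <= upper)]
n_within = within.count()

# Express the count as a percentage of the whole sample.
percentage = 100 * n_within / sizes.count()
print(n_within, "of", sizes.count(), "sizes are within one standard deviation")
print("percentage:", round(percentage, 1))
```

With a small sample, the percentage can differ noticeably from 68% even when the data are roughly normally distributed.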

In [None]:
# check if roughly 68% of your data are within one standard deviation of the mean

## Task 7: Calculate the precision of your estimate of the population mean

Calculate the standard error of the mean and the 95% confidence interval of the mean. (See Notebook [3.5 - Estimating a population mean](../Self-study%20Notebooks/3.5%20-%20Estimating%20a%20population%20mean.ipynb#How-to-calculate-the-standard-error-of-the-mean-(SEM)))
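
A sketch of one way to do this, using invented sizes and the column name `low` from the Task 2 example. The standard error is the standard deviation divided by the square root of the sample size. The multiplier of 1.96 for the 95% confidence interval is an assumption (the normal approximation); Notebook 3.5 may use a different multiplier, so follow its method for your answer.

```python
import math

import pandas as pd

# Hypothetical sizes in mm; use your own DataFrame from Task 1 instead.
ladybirds = pd.DataFrame(
    {"low": [4.5, 5.0, 4.8, 5.2, 4.9, 5.5, 4.4, 5.1, 4.7, 5.3]}
)
sizes = ladybirds["low"]

n = sizes.count()
mean = sizes.mean()
sd = sizes.std()

# Standard error of the mean: standard deviation divided by sqrt(n).
sem = sd / math.sqrt(n)

# Assumption: 95% confidence interval taken as mean +/- 1.96 standard errors.
ci_lower = mean - 1.96 * sem
ci_upper = mean + 1.96 * sem

print("SEM =", round(sem, 3))
print("95% CI =", round(ci_lower, 2), "to", round(ci_upper, 2))
```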

In [None]:
# standard error and 95% confidence interval

## Task 8: Report your estimate of the population mean

Write a short sentence below reporting the estimate and precision of the population mean. (See Notebook [3.6 - Reporting a population mean](../Self-study%20Notebooks/3.6%20-%20Reporting%20a%20population%20mean.ipynb#Reporting-the-estimate-of-the-population-mean-and-its-standard-error))

> Report your estimate and precision

## Task 9: Discuss in your group what you think of your data


- How would you improve data collection in the cemetery? What would you do differently? How would you organise your group better?
- What do you think of the quality of your data? Were there problems with measuring precisely and how could you improve that?
- What do you think of the quantity of data? Could you have collected more ladybirds?

<div class="alert alert-info">

These are the types of questions you will need to think and write about in your Group Report.
    
</div>