# *Pseudomonas fluorescens* analysis

# Comparison of diversity scores between control (`control`) and experimental (`experimental`) plates

<div class="alert alert-success">

# Part 1: Exploring your data
</div>

## Task 1.1: Move your team's Excel spreadsheet from Microsoft Teams to Noteable

Your data are in an Excel spreadsheet saved on Teams. You need to download it from Teams onto your laptop, then upload it to Noteable.

Follow these instructions to do this:
1. Go to Learn and click **Open Microsoft Teams classes** in the left-hand panel.
2. Log in to the **Biology 1A Variation (2022-2023) Team**.
3. Click on **Files** and locate your team's spreadsheet. For example, if you are in Lab 3, sitting at Bench 5 or Bench 6 then your spreadsheet is called `Lab 3 Benches 5 + 6.xlsx`.
4. Click on the **three dots** to the right of your spreadsheet then click on **Download** in the popup.
5. Return to the **Variation1/Pseudomonas Analysis/** browser tab running **Noteable**.
6. Click on **Upload** on the right, find your spreadsheet on your laptop, then click on the **blue Upload** button.
7. Make sure your spreadsheet is saved in the **Variation1/Pseudomonas Analysis** folder. That is, the same folder this Notebook is in.

## Task 1.2: Read in and print the diversity scores to check the data are okay

Using pandas, read in your Excel spreadsheet and store it in a DataFrame.

1. To read in Excel spreadsheets we use the command `pd.read_excel('filename.xlsx')`, which assumes pandas has been imported as `pd`. Do this now, calling the DataFrame something sensible.

2. Print the data to make sure it is okay. You should see two columns headed `control` and `experimental`.

<div class="alert alert-danger">

If you get **FileNotFoundError** make sure your spreadsheet is saved in the folder **Variation1/Pseudomonas Analysis**, i.e., the same one this notebook is saved in.
</div>
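
As a rough sketch of the read-and-check step, the cell below wraps `pd.read_excel` in a small helper that also reports whether the two expected columns are present. The helper names `missing_columns` and `load_scores` and the variable name `diversity` are hypothetical, chosen only for illustration. The filename in the commented-out call is the example from Task 1.1 and should be replaced with your own.

```python
import pandas as pd

# Column names the task says the spreadsheet should contain
EXPECTED_COLUMNS = ["control", "experimental"]


def missing_columns(scores):
    """Return the expected column names that are absent from the DataFrame."""
    return [name for name in EXPECTED_COLUMNS if name not in scores.columns]


def load_scores(filename):
    """Read the spreadsheet and print a warning if an expected column is missing."""
    scores = pd.read_excel(filename)
    missing = missing_columns(scores)
    if missing:
        print("Missing columns:", missing)
    return scores


# Example call (hypothetical variable name, example filename from Task 1.1):
# diversity = load_scores('Lab 3 Benches 5 + 6.xlsx')
# print(diversity)
```

If the warning appears, it is worth checking the column headers in the spreadsheet before moving on.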

In [17]:
# read and print your pseudomonas cultures' diversity scores

## Task 1.3: Plot your data in histograms to see how they are distributed

Plot the distributions of the control and experimental diversity scores as histograms in a single annotated graph. 

See [2.7 - Visualising data](../Self-study%20Notebooks/2.7%20-%20Visualising%20data.ipynb) for help.
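
One possible way to draw both histograms on a single annotated set of axes is sketched below. The scores in `example` are invented purely for illustration and are not real measurements. The bin count, transparency and label text are assumptions you may want to change for your own data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores for illustration only; use your own DataFrame instead
example = pd.DataFrame({
    "control": [1.2, 1.5, 1.1, 1.8, 1.4, 1.6],
    "experimental": [2.1, 2.6, 1.9, 2.4, 2.8, 2.2],
})

fig, ax = plt.subplots()
# Semi-transparent bars keep both distributions visible where they overlap
ax.hist(example["control"].dropna(), bins=5, alpha=0.5, label="control")
ax.hist(example["experimental"].dropna(), bins=5, alpha=0.5, label="experimental")
# Annotations: axis labels, title and legend
ax.set_xlabel("Diversity score")
ax.set_ylabel("Frequency")
ax.set_title("Diversity scores of control and experimental cultures")
ax.legend()
plt.show()
```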

In [18]:
# annotated histograms of pseudomonas diversity scores 

## Task 1.4: The distributions might be clearer in a boxplot

Because your data contain so few points, the histograms look quite sparse. A boxplot is probably a better way to visualise your data.

Plot the distributions of the control and experimental diversity scores in an annotated boxplot. 

See [4.5 - Visual comparison](../Self-study%20Notebooks/4.5%20-%20Visual%20comparison.ipynb) for help.
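
A minimal boxplot sketch along the same lines follows. As before, the values in `example` are invented for illustration and the label text is an assumption. The tick labels are set explicitly so that each box is named after its column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scores for illustration only; use your own DataFrame instead
example = pd.DataFrame({
    "control": [1.2, 1.5, 1.1, 1.8, 1.4, 1.6],
    "experimental": [2.1, 2.6, 1.9, 2.4, 2.8, 2.2],
})

fig, ax = plt.subplots()
# One box per sample; missing values are dropped first
ax.boxplot([example["control"].dropna(), example["experimental"].dropna()])
# Boxes are drawn at positions 1 and 2 by default, so label those ticks
ax.set_xticks([1, 2])
ax.set_xticklabels(["control", "experimental"])
ax.set_ylabel("Diversity score")
ax.set_title("Diversity scores of control and experimental cultures")
plt.show()
```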

In [19]:
# annotated boxplot of pseudomonas diversity scores 

## Task 1.5: Eye-ball estimates of the means and standard deviations

It is generally a good idea to estimate means and standard deviations by eye before calculating them on a computer. This is so you can check your eye-ball estimates with the actual values output by Python. If they don't match then you know something is wrong: either your estimates or your code.

Using your histograms or boxplots, estimate the means and standard deviations of diversity scores from both cultures. Remember that a rough estimate of the standard deviation is given by this formula:

$$s \approx \frac{\mathrm{max\ value} - \mathrm{min\ value}}{4}$$
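
To see how the formula works in practice, the short sketch below applies it to a list of invented values. The function name `rough_sd` and the numbers are hypothetical. When estimating by eye you would read the maximum and minimum off your plot instead.

```python
def rough_sd(values):
    """Rough standard deviation estimate: (max value - min value) / 4."""
    return (max(values) - min(values)) / 4


# Hypothetical scores for illustration only
control_example = [1.2, 1.5, 1.1, 1.8, 1.4, 1.6]

# Here max = 1.8 and min = 1.1, so the estimate is (1.8 - 1.1) / 4
print(round(rough_sd(control_example), 3))
```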


> Write your estimates here

## Task 1.6: Calculate the sample sizes, means and standard deviations

Now, using Python code, calculate the sample sizes, means and standard deviations of the two samples and print them to an appropriate number of decimal places.

See Notebook [4.2 - Comparing two population means](../Self-study%20Notebooks/4.2%20-%20Comparing%20two%20population%20means.ipynb#Sample-means-and-standard-deviations) for example code.

How do they compare to your eye-ball estimates?
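
The sketch below shows one way these three quantities could be calculated with pandas. It uses invented values, and printing to two decimal places is an assumption rather than a rule. The linked notebook may do this differently. Note that pandas' `.std()` divides by n - 1 by default, which gives the sample standard deviation.

```python
import pandas as pd

# Hypothetical scores for illustration only; use your own DataFrame instead
example = pd.DataFrame({
    "control": [1.2, 1.5, 1.1, 1.8, 1.4, 1.6],
    "experimental": [2.1, 2.6, 1.9, 2.4, 2.8, 2.2],
})

summary = {}
for column in ["control", "experimental"]:
    sample = example[column].dropna()   # ignore empty spreadsheet cells
    summary[column] = {
        "n": int(sample.count()),       # sample size
        "mean": sample.mean(),          # sample mean
        "sd": sample.std(),             # sample standard deviation (ddof=1)
    }
    stats = summary[column]
    print(f"{column}: n = {stats['n']}, "
          f"mean = {stats['mean']:.2f}, sd = {stats['sd']:.2f}")
```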

In [None]:
# sample sizes, sample means and sample standard deviations of both samples

## Task 1.7: Calculate the difference in sample means

Using Python code, calculate the difference between the two sample means you just obtained.

See Notebook [4.2 - Comparing two population means](../Self-study%20Notebooks/4.2%20-%20Comparing%20two%20population%20means.ipynb#The-test-statistic) for the code to do this.
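
As a small illustration, the difference can be found by subtracting one column mean from the other. The data are invented, and subtracting control from experimental is an assumed convention. Whichever order you choose, say so when you report the value.

```python
import pandas as pd

# Hypothetical scores for illustration only; use your own DataFrame instead
example = pd.DataFrame({
    "control": [1.2, 1.5, 1.1, 1.8, 1.4, 1.6],
    "experimental": [2.1, 2.6, 1.9, 2.4, 2.8, 2.2],
})

# Assumed direction: experimental mean minus control mean
mean_difference = example["experimental"].mean() - example["control"].mean()
print(f"Difference in sample means = {mean_difference:.2f}")
```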

In [3]:
# difference in sample means

<div class="alert alert-success">

# Part 2: Statistically test the null hypothesis that diversity scores are the same in the control and experimental cultures
</div>

## Task 2.1: Perform a Mann-Whitney *U*-test

You used a *t*-test to test if mean ladybird sizes were the same in low and high predation cemeteries. *t*-tests should only be used on data that are normally distributed. As ladybird sizes are normally distributed, using a *t*-test is acceptable.

Diversity scores, on the other hand, are usually not normally distributed, which means you shouldn't use a *t*-test to test whether diversity scores are the same in control and experimental cultures. Instead you should use the Mann-Whitney *U*-test.

Although the mathematical details are different, the Mann-Whitney *U*-test still calculates a *p*-value for your data. And, as for the *t*-test, we reject the null hypothesis if *p* < 0.05 and fail to reject it if *p* > 0.05.

To use a Mann-Whitney *U*-test in Python you first have to import it with the command

```python
from scipy.stats import mannwhitneyu
```

And then apply it to your data with the command

```python
U, p = mannwhitneyu(DataFrame['control'], DataFrame['experimental'])
```

<div class="alert alert-info">

You must replace the word `DataFrame` in the above command with whatever you called your dataset when you read it in during Task 1.2 above.
    
</div>

Note that the statistic is now called `U` rather than `t` (although, of course, it doesn't matter how we name our Python variables), and we have removed the `nan_policy='omit'` as that is not applicable to the Mann-Whitney *U*-test.
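
For a feel of what the two commands produce, here is a sketch run on invented data. Two details are assumptions rather than part of the instructions above: `.dropna()` removes any empty spreadsheet cells before testing, and `alternative='two-sided'` is written out explicitly because the default has differed between SciPy versions.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical scores for illustration only; use your own DataFrame instead
example = pd.DataFrame({
    "control": [1.2, 1.5, 1.1, 1.8, 1.4, 1.6],
    "experimental": [2.1, 2.6, 1.9, 2.4, 2.8, 2.2],
})

# Two-sided test of whether the two samples come from the same distribution
U, p = mannwhitneyu(
    example["control"].dropna(),
    example["experimental"].dropna(),
    alternative="two-sided",
)
print(f"U = {U}, p = {p:.4f}")
```

In this made-up example every control score is lower than every experimental score, which is the most extreme separation possible, so the *p*-value comes out small.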

Now perform a Mann-Whitney *U*-test on your data using Python code.

In [6]:
# perform a Mann-Whitney U-test on your data

## Task 2.2: Reject or fail to reject your null hypothesis

Based on your *p*-value and a significance level of $\alpha=$ 0.05, do you reject or fail to reject your null hypothesis that diversity scores are the same in the control and experimental cultures? Write your answer below.

Also see [4.4 - Two sample *t*-test in practice](../Self-study%20Notebooks/4.4%20-%20Two%20sample%20t-test%20in%20practice.ipynb#To-reject-or-not-reject-the-null-hypothesis) for more discussion about rejecting or not rejecting a null hypothesis.
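
The decision rule itself is a single comparison, sketched below with a hypothetical helper called `decision`. Treating a *p*-value exactly equal to $\alpha$ as "fail to reject" is an assumption, since the rule above only covers *p* < 0.05 and *p* > 0.05.

```python
def decision(p, alpha=0.05):
    """Apply the rule: reject the null hypothesis only if p < alpha."""
    if p < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical p-values for illustration only
print(decision(0.01))
print(decision(0.20))
```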

> Do you reject or fail to reject the null hypothesis? Explain why.

## Task 2.3: Report the result of your test

Report the outcome of your test in words, as you might write in a report.

See [4.4 - Two sample *t*-test in practice](../Self-study%20Notebooks/4.4%20-%20Two%20sample%20t-test%20in%20practice.ipynb#Reporting-the-result-of-the-test) for an example. 

> Report the outcome of your test.