# Sky Mermaids and Cloud Barnacles

In [1]:
from opyn.stats.observationalstudies import TwoByTwo
from opyn.pandasloader import PandasLoader

## Summary

The aim of this study is to analyse whether there is an association between the colour of a sky mermaid's tail and the density of cloud barnacle growth on their tails.

## Background

Sky mermaids are *fabulosa creaturae* that are often spotted in the sky.
They are known to enjoy spending their time perched on clouds, which can lead to the growth of cloud barnacles on their tails.
It has been speculated that red-tailed sky mermaids are more prone to developing a high density of cloud barnacle growth than the more common blue-tailed variety.

## Introduction

Data were obtained on the density of cloud barnacle growth of 99 sky mermaids.
The numbers with a high density of cloud barnacle growth among those with red tails and those with blue tails were counted.
The data were sourced from watching too much children's TV and needing an example to show how a self-developed `Python` module operates.

## Setting Up The Analysis

We can use the `PandasLoader` class in `opyn` to import the data, which is saved as a **csv** file.

In [2]:
data = PandasLoader().get("skymermaids_cloudbarnacles")
data

Unnamed: 0,tail colour,barnacle density,count
0,red,high,14
1,red,low,11
2,blue,high,19
3,blue,low,55


The data are stored in a format that is usable by software such as **SPSS**.
Columns **tail colour** and **barnacle density** represent the exposure and disease categories, respectively, and the **count** column shows the number of observations for each pairing.

We can use the class method `from_dataframe` to load the data into an initialised instance of `TwoByTwo`.
We first need to replace the labels with integers so that, if we were to crosstabulate the dataframe, it would appear like **Table 1**.

**Table 1.**

| Outcome         | Disease | No disease | Total |
|-----------------|---------|------------|-------|
| **Exposure**    |         |            |       |
| **Exposed**     | *a*     | *b*        | *a+b* |
| **Not exposed** | *c*     | *d*        | *c+d* |

We have deemed those sky mermaids with red tails as being *exposed*, and those with a high density of cloud barnacle growth as *disease*.
Therefore, we replace labels **red** and **high** with 1, and labels **blue** and **low** with 2.

In [3]:
new_labels = {"red": 1, "blue": 2, "high": 1, "low": 2}
data.replace(new_labels, inplace=True)
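
To make the mapping onto the cells of Table 1 concrete, the following is a minimal sketch in plain Python, independent of `opyn`, that crosstabulates the relabelled counts by hand. The `records` list simply restates the four rows of the dataframe shown above; the names `records` and `table` are illustrative and are not part of the module.

```python
# Long-format records as shown in the dataframe output, after relabelling
# (1 = red / high, 2 = blue / low).
records = [
    {"tail colour": 1, "barnacle density": 1, "count": 14},
    {"tail colour": 1, "barnacle density": 2, "count": 11},
    {"tail colour": 2, "barnacle density": 1, "count": 19},
    {"tail colour": 2, "barnacle density": 2, "count": 55},
]

# table[i][j] holds the count for exposure level i + 1 and outcome level j + 1,
# so the cells are laid out as [[a, b], [c, d]] in the notation of Table 1.
table = [[0, 0], [0, 0]]
for rec in records:
    table[rec["tail colour"] - 1][rec["barnacle density"] - 1] += rec["count"]

(a, b), (c, d) = table
print(table)         # [[14, 11], [19, 55]]
print(a + b, c + d)  # 25 74
```

Because the integer codes fix the row and column order, *a* is always the exposed-with-disease cell, whatever display labels are attached later.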

We can now use this relabelled dataframe to initialise the object.

In [4]:
skymermaids: TwoByTwo = TwoByTwo.from_dataframe(
    df=data,
    exposure="tail colour",
    outcome="barnacle density"
)
skymermaids.show_table(show_row_totals=True)

Outcome,Disease,No Disease,Total
Exposure,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Exposed,14,11,25
Not Exposed,19,55,74


We can reintroduce the more meaningful exposure and outcome labels by setting the `row_labels` and `col_labels` attributes.
*(This step is optional.)*

In [5]:
skymermaids.row_labels = ["red tail", "blue tail"]
skymermaids.col_labels = ["high density", "low density"]
skymermaids.show_table(show_row_totals=True)

Outcome,high density,low density,Total
Exposure,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
blue tail,19,55,74
red tail,14,11,25


Notice that the rows have switched position. This is because **blue tail** comes before **red tail** alphabetically, and **pandas** sorts its indices.
This will not affect the calculations, because the data were prepared by relabelling the variables prior to initialising the object.

## Results

### Test of No Association

A **chi-squared** test of no association was carried out.
The expected frequencies under the null hypothesis of no association are shown below.

In [6]:
skymermaids.expected_freq(show_row_totals=True)

Outcome,high density,low density,Total
Exposure,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
blue tail,24.666667,49.333333,74.0
red tail,8.333333,16.666667,25.0


The number of observed red-tailed sky mermaids with a high density of cloud barnacle growth (**14**) is greater than expected (**8.33**).
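
As a cross-check, the expected frequencies can be reproduced from the observed counts using the usual rule, expected = (row total × column total) / overall total. The sketch below uses only plain Python; the variable names are illustrative and it makes no claim about how `opyn` computes these values internally.

```python
# Observed counts from the table above: rows are red, blue; columns are high, low.
observed = [[14, 11], [19, 55]]

row_totals = [sum(row) for row in observed]        # [25, 74]
col_totals = [sum(col) for col in zip(*observed)]  # [33, 66]
n = sum(row_totals)                                # 99

# Expected count under no association: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["red tail", "blue tail"], expected):
    print(label, [round(e, 6) for e in row])

# Condition commonly required for the chi-squared approximation.
all_at_least_five = all(e >= 5 for row in expected for e in row)
print(all_at_least_five)
```

The smallest expected count is the one for red tails with high density, so that cell decides whether the approximation is adequate.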

The contribution of each cell to the $\chi^{2}$ test statistic is shown below.

In [7]:
skymermaids.chi2_contribs()

Outcome,high density,low density
Exposure,Unnamed: 1_level_1,Unnamed: 2_level_1
blue tail,1.301802,0.650901
red tail,3.853333,1.926667


The null distribution of the test statistic is approximately **chi-squared** with degrees of freedom ν = **1**.
Since all expected frequencies are at least **5**, the approximation is adequate.

The value of the test statistic is **7.73**, which corresponds to a **p**-value of **0.005**.

In [8]:
skymermaids.chi2_test()

Unnamed: 0,chisq,pval,df
ChiSqTest,7.732703,0.005423,1


Given **p** < **0.01**, there is strong evidence against the null hypothesis of no association.
The study provides strong evidence of an association between the colour of a sky mermaid's tail and the density of cloud barnacle growth.
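
The test statistic and **p**-value can also be reproduced by hand. The sketch below sums the cell contributions (observed − expected)² / expected and, because there is one degree of freedom, obtains the upper-tail probability from the identity P(X > x) = erfc(√(x / 2)). It assumes no continuity correction is applied, which is consistent with the reported value of **7.73**; it is an illustration rather than a description of the internals of `opyn`.

```python
import math

# Observed counts: rows are red, blue; columns are high, low.
observed = [[14, 11], [19, 55]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Contribution of each cell: (observed - expected)^2 / expected.
contribs = []
for row, r in zip(observed, row_totals):
    contribs.append(
        [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    )

# The test statistic is the sum of the four contributions.
chisq = sum(sum(row) for row in contribs)

# For a chi-squared distribution with 1 degree of freedom,
# P(X > x) = erfc(sqrt(x / 2)).
pval = math.erfc(math.sqrt(chisq / 2))

print(f"chisq = {chisq:.6f}, pval = {pval:.6f}, df = 1")
```

The largest contribution comes from red-tailed sky mermaids with a high density of growth, the cell where the observed count most exceeds its expectation.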

### Measures of Association

The estimated **relative risk** is **2.18**, with **95%** confidence interval **(1.30, 3.67)**.

In [9]:
skymermaids.relative_risk()

Unnamed: 0,estimate,ese,lcb,ucb
RelativeRisk,2.181053,0.265606,1.295931,3.670714
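
For reference, the figures above agree with the standard large-sample formulas for a relative risk, where the confidence interval is built on the log scale. The sketch below is plain Python using the cell notation of Table 1; that `opyn` uses exactly this construction is an assumption based on the matching output.

```python
import math
from statistics import NormalDist

# Cell counts in the notation of Table 1.
a, b, c, d = 14, 11, 19, 55

# Risk of high density among red tails divided by the risk among blue tails.
rr = (a / (a + b)) / (c / (c + d))

# Estimated standard error of log(RR).
ese = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

# 95% interval: log(RR) +/- z * ese, transformed back with exp.
z = NormalDist().inv_cdf(0.975)
lcb = math.exp(math.log(rr) - z * ese)
ucb = math.exp(math.log(rr) + z * ese)

print(f"RR = {rr:.6f}, ese = {ese:.6f}, 95% CI = ({lcb:.2f}, {ucb:.2f})")
```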


The estimated **odds ratio** is **3.68**, with **95%** confidence interval **(1.43, 9.49)**.

In [10]:
skymermaids.odds_ratio()

Unnamed: 0,estimate,ese,lcb,ucb
OddsRatio,3.684211,0.482857,1.429999,9.491901
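
The odds ratio can be checked in the same way. The sketch below uses the cross-product ratio *ad / bc* and the usual standard error of the log odds ratio, the square root of the sum of the reciprocal cell counts. As with the other hand calculations here, it is an illustration whose results agree with the output above, not a statement of how `opyn` is implemented.

```python
import math
from statistics import NormalDist

# Cell counts in the notation of Table 1.
a, b, c, d = 14, 11, 19, 55

# Odds of high density for red tails divided by the odds for blue tails.
odds_ratio = (a * d) / (b * c)

# Estimated standard error of log(OR).
ese = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# 95% interval on the log scale, transformed back with exp.
z = NormalDist().inv_cdf(0.975)
lcb = math.exp(math.log(odds_ratio) - z * ese)
ucb = math.exp(math.log(odds_ratio) + z * ese)

print(f"OR = {odds_ratio:.6f}, ese = {ese:.6f}, 95% CI = ({lcb:.2f}, {ucb:.2f})")
```

The odds ratio is further from **1** than the relative risk because a high density of growth is common in this sample (33 of 99), so the two measures do not approximate each other.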


Both the odds ratio and the relative risk are greater than **1**, with **95%** confidence intervals located entirely above **1**.
Given this, it is implausible that either measure of association is **1**.
The results show there is a positive association between a sky mermaid's tail colour and cloud barnacle growth.

## Discussion

A cohort study of the hypothesised association between the colour of a sky mermaid's tail and the density of cloud barnacle growth was undertaken.
A **chi-squared** test of no association returned a **p**-value of **0.005**, suggesting there is strong evidence of an association.
The association is positive, with an odds ratio of **3.68** and a relative risk of **2.18**.
Overall, we conclude that sky mermaids with red tails are approximately **118%** more likely to develop a high density of cloud barnacle growth, compared to those with blue tails.