# Intertidal Seaweed Grazers -- A Randomized Experiment

# Case Study Questions

1. What are the impacts of the three different grazers on regeneration rates of seaweed?  Which consumes the most seaweed?
2. Do the different grazers influence each other?
3. Are the grazing effects similar in all microhabitats?


# Setup

In [None]:
# 3rd party library imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import t as tdist
import seaborn as sns
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

sns.set()
plt.rcParams['figure.figsize'] = [8.0, 4.8]

# Data Preparation

In [None]:
df = pd.read_csv('case1301.csv')

# rank the blocks by their average response (1 = smallest average cover)
# so that the block profiles in the plots rise roughly from left to right
dfb = df.groupby('Block').mean(numeric_only=True).sort_values(by='Cover').reset_index()

# map each block label to its rank
block_rank = {block: rank for rank, block in enumerate(dfb['Block'], start=1)}
df['BlockOrd'] = df['Block'].map(block_rank)

# Analysis of the seaweed grazer data

## Initial assessment of additivity, outliers, and the need for transformation

In [None]:
fig, ax = plt.subplots()
data = df.groupby(['Block', 'Treat']).mean().reset_index().sort_values(by='BlockOrd')
g = sns.lineplot(data=data, x='Block', y='Cover', hue='Treat', sort=False)
ax.set_xlabel('Block Number (ordered from smallest to largest average response)')
ax.set_ylabel('Percentage Seaweed Regrowth')

title = (
    'Average Percentages of seaweed regeneration '
    'with different grazers allowed'
)
_ = ax.set_title(title)


In [None]:
formula = 'Cover ~ Block * Treat'
sat_model_percent = smf.ols(formula=formula, data=df).fit()

fig, ax = plt.subplots()
# plot the residuals against the fitted values, matching the x-axis label
g = sns.scatterplot(x=sat_model_percent.fittedvalues, y=sat_model_percent.resid, ax=ax)
ax.set_xlabel('Fitted Percent Regeneration')
ax.set_ylabel('Residuals')
_ = ax.set_title('Residual plot from the saturated model fit to the seaweed grazer data')

## Transformation

In [None]:
# regeneration ratio on the log scale: log of (percent regenerated / percent not regenerated)
df['rr'] = np.log(df['Cover'] / (100 - df['Cover']))
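
As a small illustration of this transformation, the sketch below applies the same log ratio to a few made-up cover percentages (illustrative values, not taken from `case1301.csv`) and checks that it can be inverted to recover the percentages. The helper names `to_rr` and `from_rr` are hypothetical and are not part of the notebook.

```python
import numpy as np

def to_rr(cover):
    """Log of the ratio of percent regenerated to percent not regenerated."""
    cover = np.asarray(cover, dtype=float)
    return np.log(cover / (100 - cover))

def from_rr(rr):
    """Invert the transformation to get back a percentage."""
    rr = np.asarray(rr, dtype=float)
    return 100 / (1 + np.exp(-rr))

cover = np.array([5.0, 50.0, 95.0])  # illustrative percentages only
rr = to_rr(cover)
recovered = from_rr(rr)
print(rr)
print(recovered)
```

A cover of 50% maps to 0, and covers equally far from 50% map to values of equal size and opposite sign. The transformation is undefined at exactly 0% and 100%.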

In [None]:
fig, ax = plt.subplots()
data = df.groupby(['Block', 'Treat']).mean().reset_index().sort_values(by='BlockOrd')
g = sns.lineplot(data=data, x='Block', y='rr', hue='Treat', sort=False)
ax.set_xlabel('Block Number (ordered from smallest to largest average response)')
_ = ax.set_ylabel('Log Regeneration Ratio')


### The analysis of variance table from the fit to the saturated model


In [None]:
formula = 'rr ~ Block * Treat'
sat_model = smf.ols(formula=formula, data=df).fit()
sat_table = anova_lm(sat_model)
print(sat_table)

In [None]:
add_model = smf.ols(formula='rr ~ Block + Treat', data=df).fit()
add_table = anova_lm(add_model)
print(add_table)

In [None]:
anova_lm(add_model, sat_model)

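We consider the interaction term in the saturated model. The null hypothesis is the additive model; the alternative adds the block-by-treatment interaction.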

$
H_0: \mu\{Y|BLOCK, TREAT\} = BLOCK + TREAT
$

$
H_a: \mu\{Y|BLOCK,TREAT\} = BLOCK + TREAT + (BLOCK \times TREAT)
$

<a id="interaction_effects"></a>
We conclude there is weak evidence for the interaction term 
($F_{35,48} = \frac{\frac{29.77 - 14.54}{83 - 48}}{14.54 / 48} = 1.4369$, $p = 0.1209$).
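
The arithmetic behind this extra-sum-of-squares F-test can be reproduced from the rounded values quoted above. This is a minimal sketch using `scipy.stats.f`; the variable names are chosen here for illustration.

```python
from scipy.stats import f as fdist

# residual sums of squares and degrees of freedom quoted in the text above
rss_additive, dof_additive = 29.77, 83
rss_saturated, dof_saturated = 14.54, 48

# extra sum of squares explained by the interaction terms
extra_ss = rss_additive - rss_saturated
extra_dof = dof_additive - dof_saturated

f_stat = (extra_ss / extra_dof) / (rss_saturated / dof_saturated)
p_value = fdist.sf(f_stat, extra_dof, dof_saturated)  # upper-tail probability
print(f"F = {f_stat:.3f} on ({extra_dof}, {dof_saturated}) d.f., p = {p_value:.3f}")
```

Because the inputs are rounded to two decimals, the result can differ in the last digit from the value reported by `anova_lm(add_model, sat_model)`.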

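We consider the treatment effect in the additive model. The null hypothesis is the reduced model with block effects only; the alternative adds the treatment effects.

$
H_0: \mu\{Y|BLOCK, TREAT\} = BLOCK
$

$
H_a: \mu\{Y|BLOCK, TREAT\} = BLOCK + TREAT
$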

In [None]:
formula = 'rr ~ Block'
block_model = smf.ols(formula=formula, data=df).fit()
block_table = anova_lm(block_model)
print(block_table)
anova_lm(block_model, add_model)

We conclude there is strong evidence for the treatment effect ($F_{5,83} = 54.09$, $p < 0.0001$).

## Answers to specific questions of interest using linear combinations

### Table of average log regeneration ratios for the different grazer combinations in eight blocks

In [None]:
pd.options.display.float_format = '{:.2f}'.format
dfp = (
    df.groupby(['Block', 'Treat'])['rr']
      .mean()
      .reset_index()
      .pivot_table(index='Block', columns='Treat', values='rr', margins=True, margins_name='average')
)

# subtract off the overall mean from the block/treat column/row to get the block/treat effects
dfp['block effect'] = dfp['average'] - df['rr'].mean()
dfp.loc['treat effect', :] = dfp.loc['average', :] - df['rr'].mean()
dfp

### Do large fish have an effect on the regeneration ratio?
The difference between means from the $fF$ and $f$ treatments measures this effect in the presence of small fish only; the difference between means from the $LfF$ and $Lf$ treatments measures the effect in the presence of both small fish and limpets.  The large fish effect is taken to be the average of those two effects:  $\gamma_1 = \frac{1}{2}(\mu_{fF} - \mu_{f}) + \frac{1}{2}(\mu_{LfF} - \mu_{Lf})$.  This effect averages over different limpet conditions, so it measures a meaningful effect only if there is no limpet-by-large-fish interaction.

In [None]:
gs1 = pd.Series(index=['fF', 'f', 'LfF', 'Lf'], data=np.array([1, -1, 1, -1])) * 1 / 2

### Do small fish have an effect on the regeneration ratio?
This is investigated through the average of the difference between the $f$ and the $C$ treatment means and the $Lf$ and $L$ treatment means:  $\gamma_2 = \frac{1}{2} (\mu_f - \mu_C) + \frac{1}{2}(\mu_{Lf} - \mu_{L})$.

In [None]:
gs2 = pd.Series(index=['f', 'CONTROL', 'Lf', 'L'], data=np.array([1, -1, 1, -1])) * 1 / 2

### Do limpets have an effect on the regeneration ratio?
This is investigated through the average of the three limpet differences:  $\gamma_3 = \frac{1}{3}(\mu_L - \mu_C) + \frac{1}{3}(\mu_{Lf} - \mu_{f}) + \frac{1}{3} (\mu_{LfF} - \mu_{fF})$.

In [None]:
gs3 = pd.Series(index=['L', 'CONTROL', 'Lf', 'f', 'LfF', 'fF'], data = [1, -1, 1, -1, 1, -1]) * 1 / 3

### Do limpets have a different effect when small fish are present than when small fish are not present?
When small fish are present, the limpet effect is given by $\frac{1}{2} (\mu_{Lf} - \mu_{f}) + \frac{1}{2} (\mu_{LfF} - \mu_{fF})$.  When small fish are not present, the limpet effect is $(\mu_{L} - \mu_{C})$.  The difference in effects is then $\gamma_4 = \frac{1}{2} (\mu_{Lf} - \mu_{f}) + \frac{1}{2} (\mu_{LfF} - \mu_{fF}) - (\mu_{L} - \mu_{C})$.

In [None]:
gs4 = pd.Series(index=['Lf', 'f', 'LfF', 'fF', 'L', 'CONTROL'], data=[1/2, -1/2, 1/2, -1/2, -1, 1])

### Do limpets have a different effect when large fish are present than when large fish are absent?
This is investigated through $\gamma_5 = (\mu_{LfF} - \mu_{fF}) - (\mu_{Lf} - \mu_f)$.

In [None]:
gs5 = pd.Series(index=['LfF', 'fF', 'Lf', 'f'], data=[1, -1, -1, 1])

dfC = pd.DataFrame([gs1, gs2, gs3, gs4, gs5]).fillna(0)
dfC
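
Each of the five linear combinations above is a contrast, so its coefficients should sum to zero. The sketch below rebuilds the coefficient table from plain dictionaries and checks that property; the row labels `g1` to `g5` and the name `coef` are hypothetical and only used here.

```python
import numpy as np
import pandas as pd

# coefficients of the five linear combinations, keyed by treatment label
contrasts = {
    'g1': {'fF': 1/2, 'f': -1/2, 'LfF': 1/2, 'Lf': -1/2},
    'g2': {'f': 1/2, 'CONTROL': -1/2, 'Lf': 1/2, 'L': -1/2},
    'g3': {'L': 1/3, 'CONTROL': -1/3, 'Lf': 1/3, 'f': -1/3, 'LfF': 1/3, 'fF': -1/3},
    'g4': {'Lf': 1/2, 'f': -1/2, 'LfF': 1/2, 'fF': -1/2, 'L': -1, 'CONTROL': 1},
    'g5': {'LfF': 1, 'fF': -1, 'Lf': -1, 'f': 1},
}

# treatments that do not appear in a combination get a coefficient of zero
coef = pd.DataFrame.from_dict(contrasts, orient='index').fillna(0)

row_sums = coef.sum(axis=1)
print(row_sums)
```

A nonzero row sum would indicate a mistyped coefficient or treatment label.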

$SE(g) = s_p \sqrt{\sum_{i = 1}^{I} \frac{C_{i}^2}{n_i}}$
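
As a worked example of this formula, the sketch below computes the standard error of the limpet contrast by hand. It assumes a residual sum of squares of 29.77 on 83 degrees of freedom from the additive model (the values quoted earlier) and a balanced design with 16 observations per treatment; the balanced layout is an assumption here, though it is consistent with the degrees of freedom reported above.

```python
import numpy as np

# Assumptions for this sketch: residual sum of squares 29.77 on 83 d.f. from the
# additive model, and 16 observations in each of the six treatment groups.
s_p = np.sqrt(29.77 / 83)
n_per_treatment = 16

# limpet contrast: +1/3 on L, Lf, LfF and -1/3 on CONTROL, f, fF
coefficients = np.array([1, -1, 1, -1, 1, -1]) / 3

se = s_p * np.sqrt(np.sum(coefficients ** 2 / n_per_treatment))

estimate = -1.8288  # limpet estimate quoted in the Statistical Conclusions
t_stat = estimate / se
print(f"SE = {se:.4f}, t = {t_stat:.2f}")
```

Under these assumptions the t-statistic agrees with the limpet value reported in the conclusions.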

In [None]:
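# pooled standard deviation from the residual mean square of the additive model
sp = np.sqrt(add_table.loc['Residual', 'mean_sq'])
# number of treatment groups and the degrees of freedom used for the t-tests
I = len(df['Treat'].unique())
dof_sp = len(df) - I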

dfn = 1 / df.groupby('Treat').size()
dfn

In [None]:
index = ['large fish', 'small fish', 'limpets', 'limpet diff with small fish', 'limpet diff with large fish']
se_g = pd.Series(sp * np.sqrt(dfC ** 2 @ dfn))
se_g.index = index
se_g

In [None]:
# estimate of each linear combination: coefficients times the treatment means
estimate = dfC @ df.groupby('Treat').mean(numeric_only=True)['rr']
estimate.index = index

# t-statistics and two-sided p-values
t = estimate / se_g
p = (1 - tdist.cdf(np.abs(t), dof_sp)) * 2
p_series = pd.Series(p, index=index)

intervals = np.array([
    tdist.interval(0.95, df=dof_sp, loc=estimate['large fish'], scale=se_g['large fish']),
    tdist.interval(0.95, df=dof_sp, loc=estimate['small fish'], scale=se_g['small fish']),
    tdist.interval(0.95, df=dof_sp, loc=estimate['limpets'], scale=se_g['limpets'])
])
lower = pd.Series(intervals[:, 0], index=index[:3])
upper = pd.Series(intervals[:, 1], index=index[:3])

data = {
    'estimate': estimate,
    't': t,
    'p-value': p_series,
    '95% CI - L': lower,
    '95% CI - U': upper,
}
tstat = pd.DataFrame(data)

pd.options.display.float_format = '{:.4f}'.format
tstat

<a id="treatment_interactions"></a>

There is only very weak evidence of a limpet-by-large-fish interaction ($t_{90} = -0.71$, $p = 0.48$), and there is even weaker evidence of a limpet-by-small-fish interaction ($t_{90} = 0.37$, $p = 0.71$).  The effects of the individual grazers can now be cleanly considered.

# Statistical Conclusions

There is [weak evidence](#interaction_effects) that treatment differences change across blocks ($F_{35,48} = 1.437$, $p = 0.12$).  Limpets cause the [largest reduction](#treatment_effects) in the regeneration ratio ($t_{90} = -14.96$, $p < 0.0001$), but there is also strong evidence for reductions caused by both large and small fish.  The median regeneration ratio in the presence of limpets was estimated to be only $e^{-1.8288} = 0.161$ times as large as the regeneration ratio when limpets were excluded, 95% confidence interval $(e^{-2.0717}, e^{-1.5860}) = (0.126, 0.205)$.  The median regeneration ratio in the presence of large fish was estimated to be $e^{-0.6140} = 0.541$ times as large as when large fish were excluded, 95% confidence interval $(e^{-0.9115}, e^{-0.3166}) = (0.40, 0.73)$.  The median regeneration ratio in the presence of small fish was estimated to be $e^{-0.3933} = 0.67$ times as large as when small fish were excluded, 95% confidence interval $(e^{-0.6907}, e^{-0.0958}) = (0.50, 0.91)$.
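
The multiplicative effects quoted here come from exponentiating the estimates and interval endpoints on the log scale. A minimal sketch of that back-transformation, using the log-scale values from the paragraph above, is shown below. The upper endpoint for small fish is entered as -0.0958, the value that reproduces the reported 0.91; the dictionary names are illustrative.

```python
import numpy as np

# (estimate, lower endpoint, upper endpoint) on the log scale
log_scale = {
    'limpets': (-1.8288, -2.0717, -1.5860),
    'large fish': (-0.6140, -0.9115, -0.3166),
    'small fish': (-0.3933, -0.6907, -0.0958),
}

# exponentiate to get multiplicative effects on the regeneration ratio
ratios = {name: tuple(np.exp(values)) for name, values in log_scale.items()}

for name, (est, lower, upper) in ratios.items():
    print(f"{name}: {est:.3f} ({lower:.3f}, {upper:.3f})")
```

Because the response is the log of a ratio, a difference of $\gamma$ on the log scale corresponds to multiplying the regeneration ratio by $e^{\gamma}$.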