# Island Area and Number of Species

If $S$ is the number of species and $A$ is the island area, the species-area relationship is modeled as $S = CA^{\gamma}$, where $C$ is a constant and $\gamma$ is a biologically meaningful parameter. What is the best estimate of $\gamma$?

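Taking logarithms of both sides of $S = CA^{\gamma}$ gives $\log(S) = \log(C) + \gamma \log(A)$, so $\gamma$ is the slope of a straight line on the log-log scale and $\log(C)$ is its intercept.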
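As a quick sanity check of the linearization, the sketch below builds noise-free data from a power law and fits a straight line on the log-log scale. The values `C_true = 7.0` and `gamma_true = 0.25` and the area grid are made up for illustration and are not taken from `case0801.csv`.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration
C_true, gamma_true = 7.0, 0.25

# Noise-free areas and species counts that follow S = C * A**gamma exactly
area = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
species = C_true * area ** gamma_true

# Straight-line fit on the log-log scale:
# the slope estimates gamma, the intercept estimates log(C)
slope, intercept = np.polyfit(np.log(area), np.log(species), deg=1)

print(f"slope = {slope:.4f}, exp(intercept) = {np.exp(intercept):.4f}")
```

With exact power-law data the slope recovers $\gamma$ and the exponentiated intercept recovers $C$, which is why an ordinary least-squares fit on the log-log scale is a natural way to estimate $\gamma$.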

In [None]:
# standard library imports

# 3rd party library imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import seaborn as sns
import statsmodels.formula.api as smf
import statsmodels.api as sm

sns.set()

df = pd.read_csv('case0801.csv')

# Inspection and Graphical Summary

In [None]:
df.head()

In [None]:
df.describe()

In [None]:
fig, axes = plt.subplots(ncols=2)
sns.scatterplot(data=df, x='Area', y='Species', ax=axes[0])
sns.scatterplot(data=df, x='Area', y='Species', ax=axes[1])
axes[1].set(xscale='log', yscale='log')
axes[0].set_box_aspect(1)
axes[1].set_box_aspect(1)
fig.set_figwidth(12)

There isn't a lot of data to work with here, but the relationship looks much closer to a straight line after a log-log transformation, which also matches the mathematical model.

# Model

In [None]:
model = smf.ols('np.log(Species) ~ np.log(Area)', data=df).fit()
model.summary()

$\mu\{\log(Species)|\log(Area)\} = 1.9365 + 0.2497 \log(Area)$
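Exponentiating both sides puts the fitted line back into the power-law form $S = CA^{\gamma}$. The worked step below uses the rounded coefficients shown above and assumes natural logarithms (which is what `np.log` computes). The back-transformed fit is read here as an estimate of the median rather than the mean.

$$
\widehat{\text{Median}}\{Species|Area\} = e^{1.9365} \cdot Area^{0.2497} \approx 6.93 \cdot Area^{0.2497}
$$

Under this reading, the estimate of $C$ is about 6.93 and the estimate of $\gamma$ is the slope, 0.2497.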

In [None]:
fig, ax = plt.subplots()
sns.scatterplot(data=df, x='Area', y='Species', color='black', ax=ax)
ax.set(xscale='log', yscale='log')

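# seaborn cannot draw this log-log regression for us, so we build the line ourselves.
# The formula applies np.log to Area when predicting, but the predictions come
# back on the log(Species) scale, so they are back-transformed with np.exp below.
sf = model.get_prediction(df).summary_frame()
sf['Area'] = df['Area']
# sort by Area so that fill_between draws a clean band
sf = sf.sort_values('Area')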

p = sns.color_palette()
sns.lineplot(data=sf, x='Area', y=np.exp(sf['mean']), ax=ax)

x = sf['Area']
y1 = np.exp(sf['mean_ci_lower'])
y2 = np.exp(sf['mean_ci_upper'])
_ = ax.fill_between(x, y1, y2, alpha=0.2, color=p[0])

We estimate the value of $\gamma$ to be 0.2497.  We are 95% confident the true value lies between 0.219 and 0.281.  $\gamma$ may be interpreted in terms of doubling area.  For each doubling of island area, the median number of species increases by a factor of $2^{0.2497} = 1.189$ or approximately 19%.
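The doubling interpretation can be reproduced with a few lines of arithmetic. The sketch below uses the estimate and confidence limits quoted above; because $2^{\gamma}$ increases with $\gamma$, applying the same transformation to the interval endpoints gives a corresponding interval for the doubling factor. The helper name `doubling_factor` is introduced here only for illustration.

```python
# Slope estimate and 95% confidence limits as quoted in the text above
gamma_hat = 0.2497
ci_lower, ci_upper = 0.219, 0.281


def doubling_factor(gamma):
    """Multiplicative change in the median species count when area doubles."""
    return 2 ** gamma


for label, g in [("estimate", gamma_hat), ("lower", ci_lower), ("upper", ci_upper)]:
    factor = doubling_factor(g)
    print(f"{label}: factor = {factor:.3f}, change = {100 * (factor - 1):.1f}%")
```

Replacing the base 2 with another number, such as 10, gives the analogous statement for a tenfold increase in area.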

## Robustness of Assumptions
### Normality

In [None]:
sm.graphics.qqplot(np.log(df['Species']), line='45', fit=True)
plt.show()

Again, not a lot of data, but normality looks ok on the log scale.
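The normality assumption concerns the scatter around the regression line, so a complementary check could look at the residuals from the log-log fit rather than at the response alone. The sketch below shows that idea on synthetic stand-in data (the areas, the coefficients, and the noise level are all assumptions, not the `case0801.csv` values), using `np.polyfit` for the fit and `scipy.stats.probplot` for the normal quantiles.

```python
import numpy as np
import scipy.stats

# Synthetic stand-in data (NOT the case0801 values): power law with lognormal noise
rng = np.random.default_rng(0)
area = np.array([1.0, 5.0, 30.0, 300.0, 3000.0, 30000.0, 45000.0])
species = 7.0 * area ** 0.25 * np.exp(rng.normal(scale=0.1, size=area.size))

# Least-squares fit on the log-log scale, then residuals from that fit
log_area, log_species = np.log(area), np.log(species)
slope, intercept = np.polyfit(log_area, log_species, deg=1)
resid = log_species - (intercept + slope * log_area)

# Theoretical normal quantiles paired with the ordered residuals;
# r is the correlation of the Q-Q points with their best-fit line
(theoretical_q, ordered_resid), (qq_slope, qq_intercept, r) = scipy.stats.probplot(
    resid, dist="norm"
)
print(f"Q-Q correlation: {r:.3f}")
```

In this notebook the fitted `model` already exposes its residuals as `model.resid`, which could be passed to `sm.graphics.qqplot` in place of `np.log(df['Species'])`. With only a handful of islands, any such plot is a rough guide at best.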