# The Blood-Brain Barrier - A Controlled Experiment

The human brain is protected from bacteria and toxins, which course through the bloodstream, by a single layer of cells called the *blood-brain barrier*.  This barrier normally allows only a few substances, including some medications, to reach the brain.  Because chemicals used to treat brain cancer have such large molecular size, they cannot pass through the barrier to attack tumor cells.  At the Oregon Health Sciences University, Dr. E. A. Neuwelt developed a method of disrupting the barrier by infusing a solution of concentrated sugars.

As a test of the disruption mechanism, researchers conducted a study on rats, which possess a similar barrier.  (Data from P. Barnett et al., "Differential Permeability and Quantitative MR Imaging of a Human Lung Carcinoma Brain Xenograft in the Nude Rat," *American Journal of Pathology* 146(2) (1995): 436-49.)  The rats were inoculated with human lung cancer cells to induce brain tumors.  After 9 to 11 days they were infused with either the barrier disruption (BD) solution or, as a control, a normal saline (NS) solution.  Fifteen minutes later, the rats received a standard dose of the therapeutic antibody $\mathrm{L6}\text{-}\mathrm{F(ab')_2}$.  After a set time (the sacrifice time) they were sacrificed, and the amounts of antibody in the brain tumor and in normal tissue were measured.

Since the amount of the antibody in normal tissue indicates how much of it the rat actually received, a key measure of the effectiveness of transmission across the blood-brain barrier is the ratio of the antibody concentration in the brain tumor to the antibody concentration in normal tissue outside the brain.  The brain tumor concentration divided by the liver concentration is a measure of the amount of the antibody that reached the brain relative to the amount that reached other parts of the body.  This is the response variable.  The explanatory variables fall into two categories:  *design variables* are those that describe manipulation by the researcher; *covariates* are those measuring characteristics of the subjects that were not controllable by the researcher.  Here the design variables are the sacrifice time and the infusion treatment; the covariates are days after inoculation, sex, weight, weight loss, and tumor weight.

Was the antibody concentration in the brain tumor increased by the use of the blood-brain barrier disruption infusion?  If so, by how much?  Do the answers to these two questions depend on the length of time after the infusion (from 1/2 to 72 hours)?  What is the effect of treatment on antibody concentration after weight loss, total tumor weight, and the other covariates are accounted for?

```python
# third-party library imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

sns.set_theme()
pd.options.display.float_format = "{:.3f}".format
pd.options.display.max_columns = 12
```

```python
# Load the case study data.
df = pd.read_csv('case1102.csv')
df.head()
```

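```python
# Response: antibody concentration in the brain tumor relative to the liver.
df['Concentration'] = df['Brain'] / df['Liver']
# Short names used in the models: SAC = sacrifice time (hours), TRTMNT = treatment.
df = df.rename(columns={'Time': 'SAC', 'Treatment': 'TRTMNT'})
```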
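Because sacrifice time and treatment are the two design variables, a cross-tabulation shows how many rats fall in each time-by-treatment cell. The sketch below uses a hypothetical helper, `design_counts`, and a small made-up frame with the same column names as after the renaming step. It is not the case1102 data. On the real data one would call `design_counts(df)`.

```python
import pandas as pd


def design_counts(frame: pd.DataFrame) -> pd.DataFrame:
    """Count subjects in each sacrifice-time by treatment cell, with row/column totals."""
    return pd.crosstab(frame["SAC"], frame["TRTMNT"], margins=True)


# Hypothetical toy frame with two rats per cell (NOT the case1102 data).
toy = pd.DataFrame({
    "SAC": [0.5, 0.5, 3, 3, 24, 24, 72, 72] * 2,
    "TRTMNT": ["BD"] * 8 + ["NS"] * 8,
})
counts = design_counts(toy)
print(counts)
```

An unbalanced table would not invalidate the regression below. It would, however, mean that some time-by-treatment comparisons rest on fewer rats than others.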

```python
# The plot suggests a treatment effect.
fig, ax = plt.subplots()
_ = sns.stripplot(
    data=df, x='SAC', y='Concentration', hue='TRTMNT',
    hue_order=['BD', 'NS'],  # fix the order so the legend labels below line up
    log_scale=True, ax=ax,
)
_ = ax.set_xlabel('Sacrifice Time (Hours)')
_ = ax.set_ylabel('Tumor-to-Liver Concentration Ratio')
handles, _labels = ax.get_legend_handles_labels()
_ = ax.legend(handles, ['Barrier Disruption', 'Saline Control'])
```

```python
# A pair plot of the covariates suggests, at the very least, that Concentration
# should be analyzed on the log scale.
cols = ['Days', 'Weight', 'Loss', 'Tumor', 'Concentration']
g = sns.pairplot(df[cols])
```

```python
df['logconc'] = np.log(df['Concentration'])
cols = ['Days', 'Weight', 'Loss', 'Tumor', 'logconc']
g = sns.pairplot(df[cols])
```

**Days** clearly appears to be related to the log concentration ratio and **Weight** (total weight) may be, but the relationships for **Loss** and **Tumor** (weight loss and tumor weight) are unclear.
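A quick numerical summary can back up these visual impressions. The sketch below ranks covariates by their Pearson correlation with `logconc`. It uses a hypothetical helper, `correlations_with_response`, and synthetic data in which only `Days` drives the log ratio. Correlation captures only linear, one-at-a-time association, so it complements the pair plot and the regression that follows and does not replace them.

```python
import numpy as np
import pandas as pd


def correlations_with_response(frame, covariates, response="logconc"):
    """Pearson correlation of each covariate with the response, largest |r| first."""
    r = frame[covariates].corrwith(frame[response])
    return r.reindex(r.abs().sort_values(ascending=False).index)


# Hypothetical synthetic data: logconc depends on Days only (NOT the case1102 data).
rng = np.random.default_rng(0)
n = 200
synthetic = pd.DataFrame({
    "Days": rng.integers(9, 12, size=n),
    "Weight": rng.normal(400, 30, size=n),
    "Loss": rng.normal(0, 5, size=n),
    "Tumor": rng.normal(200, 50, size=n),
})
synthetic["logconc"] = -2 + 0.8 * synthetic["Days"] + rng.normal(0, 0.3, size=n)

result = correlations_with_response(synthetic, ["Days", "Weight", "Loss", "Tumor"])
print(result)
```

On the real data the call would be `correlations_with_response(df, ['Days', 'Weight', 'Loss', 'Tumor'])`.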

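We begin with a rich model that includes all of the explanatory variables:

$\mu\{\log(\mathrm{Conc}) \mid \mathrm{SAC}, \mathrm{TREAT}, \mathrm{Days}, \mathrm{FEM}, \mathrm{weight}, \mathrm{loss}, \mathrm{tumor}\} = \mathrm{SAC} + \mathrm{TREAT} + (\mathrm{SAC} \times \mathrm{TREAT}) + \mathrm{Days} + \mathrm{FEM} + \mathrm{weight} + \mathrm{loss} + \mathrm{tumor}$

Here SAC and TREAT stand for sets of indicator variables (for the sacrifice times and for barrier disruption versus saline), and FEM is an indicator for female rats.  The code uses `Female` as the reference level, so it fits a male indicator instead; the two parametrizations give the same fitted model.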
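The model formula in the next cell lists the SAC and TRTMNT main effects and also uses `*`, which patsy expands to `a + b + a:b` while dropping duplicate terms. The sketch below builds both right-hand sides on a hypothetical frame that covers every SAC × TRTMNT × Sex combination and compares the resulting design matrices. It assumes `patsy` is installed (it is a statsmodels dependency). The counts follow from the model: intercept (1), sacrifice-time indicators (3), treatment (1), sex (1), interaction (3), and four numeric covariates, for 13 columns.

```python
import itertools

import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical frame covering every SAC x TRTMNT x Sex combination (NOT the case1102 data).
rows = list(itertools.product([0.5, 3.0, 24.0, 72.0], ["BD", "NS"], ["Female", "Male"]))
frame = pd.DataFrame(rows, columns=["SAC", "TRTMNT", "Sex"])
rng = np.random.default_rng(1)
for col in ["Days", "Weight", "Loss", "Tumor"]:
    frame[col] = rng.normal(size=len(frame))

sac = "C(SAC, Treatment(reference=0.5))"
trt = 'C(TRTMNT, Treatment(reference="NS"))'
sex = 'C(Sex, Treatment(reference="Female"))'

# Main effects listed explicitly and repeated inside the `*` term...
rhs_explicit = f"{sac} + {trt} + {sac} * {trt} + {sex} + Days + Weight + Loss + Tumor"
# ...versus the shorter form that relies on `*` alone.
rhs_short = f"{sac} * {trt} + {sex} + Days + Weight + Loss + Tumor"

X_explicit = dmatrix(rhs_explicit, frame)
X_short = dmatrix(rhs_short, frame)
names = X_explicit.design_info.column_names
print(len(names))
print([name for name in names if ":" in name])
```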

```python
formula = (
    'np.log(Concentration) '
    '~ C(SAC, Treatment(reference=0.5)) '
    '+ C(TRTMNT, Treatment(reference="NS")) '
    '+ C(SAC, Treatment(reference=0.5)) * C(TRTMNT, Treatment(reference="NS")) '
    '+ C(Sex, Treatment(reference="Female")) '
    '+ Days + Weight + Loss + Tumor'
)
model = smf.ols(formula, data=df)
lm1 = model.fit()
lm1.summary()
```

# Scatterplot of residuals vs fitted values from the fit of logged response on a rich model for explanatory variables

```python
ax = sns.scatterplot(x=lm1.fittedvalues, y=lm1.resid)
_ = ax.set_xlabel('Fitted Values')
_ = ax.set_ylabel('Residuals')
```

```python
# Two residuals stand out (absolute value greater than 1).
df[lm1.resid.abs() > 1]
```

```python
# Flag cases by three rules of thumb: Cook's distance > 0.5,
# |externally studentized residual| > 2, or leverage > 2p/n.
infl = lm1.get_influence().summary_frame()[['cooks_d', 'student_resid', 'hat_diag']]
idxc = infl['cooks_d'] > 0.5
idxs = infl['student_resid'].abs() > 2
idxl = infl['hat_diag'] > 2 * len(lm1.params) / lm1.nobs
infl.loc[idxc | idxs | idxl, :]
```

Observations 30 and 33 stand out, mainly because of their large studentized residuals, and could be influential.

```python
# An influence plot shows the same two cases.
fig, ax = plt.subplots()
_ = sm.graphics.influence_plot(lm1, ax=ax)
_ = ax.set_title('')
```

Drop observations 30 and 33.

```python
# Keep the original row labels in an "observation" column, then drop rows 30 and 33.
df = df.reset_index().rename(columns={'index': 'observation'})
df = df[~df['observation'].isin([30, 33])]
```

## Refine the model

Can the covariates be dropped?
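Questions like this one are answered with an extra-sum-of-squares F-test. That is what `anova_lm` computes when given a reduced model and a full model: $F = \dfrac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}})/(\mathrm{df}_{\text{reduced}} - \mathrm{df}_{\text{full}})}{\mathrm{RSS}_{\text{full}}/\mathrm{df}_{\text{full}}}$. The sketch below spells out that arithmetic with a hypothetical helper, `extra_ss_f_test`, on synthetic data. It refuses to compare fits with different numbers of observations, because the test assumes both models were fit to the same rows.

```python
import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm


def extra_ss_f_test(reduced, full):
    """Extra-sum-of-squares F-test for two nested OLS fits on the same rows."""
    if reduced.nobs != full.nobs:
        raise ValueError("both models must be fit to the same observations")
    df_num = reduced.df_resid - full.df_resid
    df_den = full.df_resid
    f_stat = ((reduced.ssr - full.ssr) / df_num) / (full.ssr / df_den)
    p_value = scipy.stats.f.sf(f_stat, df_num, df_den)
    return f_stat, p_value


# Hypothetical synthetic data: y depends on x1 only (NOT the case1102 data).
rng = np.random.default_rng(3)
data = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
data["y"] = 1 + 2 * data["x1"] + rng.normal(size=50)

small = smf.ols("y ~ x1", data=data).fit()
big = smf.ols("y ~ x1 + x2", data=data).fit()
f_stat, p_value = extra_ss_f_test(small, big)
print(f_stat, p_value)
print(anova_lm(small, big))
```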

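```python
# Refit the full model on the reduced data: the F-test in anova_lm is only
# valid for nested models fit to the same observations.
lm1 = smf.ols(formula, data=df).fit()

formula2 = (
    'np.log(Concentration) '
    '~ C(SAC, Treatment(reference=0.5)) '
    '+ C(TRTMNT, Treatment(reference="NS")) '
    '+ C(SAC, Treatment(reference=0.5)) * C(TRTMNT, Treatment(reference="NS")) '
)
lm2 = smf.ols(formula=formula2, data=df).fit()
anova_lm(lm2, lm1)
```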

There is little evidence of loss of fit from dropping those terms.  What about the interaction term?

```python
formula3 = (
    'np.log(Concentration) '
    '~ C(SAC, Treatment(reference=0.5)) '
    '+ C(TRTMNT, Treatment(reference="NS"))'
)
lm3 = smf.ols(formula=formula3, data=df).fit()
anova_lm(lm3, lm2)
```

Again, there is little evidence of loss of fit.
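Because the response is log(Concentration), a coefficient $\beta$ on the barrier-disruption indicator translates into a multiplicative effect. If the log ratio is roughly symmetric within groups, the median tumor-to-liver ratio under BD is estimated to be $e^{\beta}$ times the median under saline, and exponentiating the confidence limits gives an interval for that multiplier. The sketch below uses a hypothetical helper, `multiplicative_effect`, demonstrated on synthetic data. On the real data one would pass `lm3` and the label of its BD indicator.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def multiplicative_effect(results, term, alpha=0.05):
    """Back-transform a log-scale coefficient and its confidence interval."""
    estimate = np.exp(results.params[term])
    lower, upper = np.exp(results.conf_int(alpha=alpha).loc[term])
    return estimate, lower, upper


# Hypothetical synthetic data: BD raises the log ratio by 0.7 (NOT the case1102 data).
rng = np.random.default_rng(4)
toy = pd.DataFrame({"TRTMNT": ["BD", "NS"] * 20})
toy["logratio"] = -3 + 0.7 * (toy["TRTMNT"] == "BD") + rng.normal(0, 0.05, size=40)

fit = smf.ols('logratio ~ C(TRTMNT, Treatment(reference="NS"))', data=toy).fit()
# Find the treatment indicator by its level suffix rather than hard-coding the label.
bd_term = next(name for name in fit.params.index if name.endswith("[T.BD]"))
print(multiplicative_effect(fit, bd_term))
```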