# Alcohol Metabolism in Men and Women - An Observational Study
Women exhibit a lower tolerance for alcohol and develop alcohol-related liver disease more readily than men.  When men and women of the same size and drinking history consume equal amounts of alcohol, the women on average carry a higher concentration of alcohol in their bloodstream.  According to a team of Italian researchers, this occurs because alcohol-degrading enzymes in the stomach (where alcohol is partially metabolized before it enters the bloodstream and is eventually metabolized by the liver) are more active in men than in women.  The researchers studied the extent to which the activity of the enzyme explained the first-pass alcohol metabolism and the extent to which it explained the differences in first-pass metabolism between women and men.  (Data from M. Frezza et al., "High Blood Alcohol Levels in Women," *New England Journal of Medicine* 322 (1990): 95-99.)

The subjects were 18 men and 14 women, all living in Trieste.  Three of the women and five of the men were categorized as alcoholic.  All subjects received ethanol, at a dose of 0.3 grams per kilogram of body weight, orally one day and intravenously another, in randomly determined order.  Since the intravenous administration bypasses the stomach, the difference in blood alcohol concentration - the concentration after intravenous administration minus the concentration after oral administration - provides a measure of "first-pass metabolism" in the stomach.  In addition, gastric alcohol dehydrogenase (AD) activity (activity of the key enzyme) was measured in mucus samples taken from the stomach linings.

Several questions arise.  Do levels of first-pass metabolism differ between men and women?  Can the differences be explained by postulating that men have more dehydrogenase activity in their stomachs?  Are the answers to these questions complicated by an alcoholism effect?

In [None]:
# standard library imports

# 3rd party library imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats
import seaborn as sns
import statsmodels.formula.api as smf
import statsmodels.api as sm
from statsmodels.stats.anova import anova_lm

plt.rcParams['text.usetex'] = True
pd.options.display.float_format = "{:.2f}".format
sns.set()

In [None]:
df = pd.read_csv('case1101.csv')
df.head()

In [None]:
df.groupby(['Sex', 'Alcohol'])['Metabol'].std()   

These group standard deviations offer only weak support for the hypothesis of equal variance.
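A more formal check of equal spread across the four Sex by Alcohol groups could use Levene's test.  The sketch below is a suggestion, not something the original analysis did, and the groups are synthetic stand-ins (the group sizes, means, and spreads are assumptions chosen for illustration, not the study data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for the four Sex x Alcohol groups (NOT the study data)
groups = {
    ('Female', 'Alcoholic'): rng.normal(1.5, 1.0, size=3),
    ('Female', 'Non-alcoholic'): rng.normal(1.5, 1.0, size=11),
    ('Male', 'Alcoholic'): rng.normal(3.0, 1.0, size=5),
    ('Male', 'Non-alcoholic'): rng.normal(3.0, 1.0, size=13),
}

# Levene's test (median-centered variant) for equality of variances
stat, pvalue = stats.levene(*groups.values(), center='median')
print(f'Levene W = {stat:.3f}, p-value = {pvalue:.3f}')
```

With the notebook's data frame, the same call could be fed the `Metabol` values of each group from `df.groupby(['Sex', 'Alcohol'])`.  With groups as small as three subjects, any such test has little power, so the result is only a rough guide.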

In [None]:
g = sns.relplot(data=df, x='Gastric', y='Metabol', col='Sex', hue='Alcohol', kind='scatter')
_ = g.set_xlabels(r'Gastric AD Activity ($\mu$mol/min/g of tissue)')
_ = g.set_ylabels(r'First-Pass Metabolism (mmol/liter-hour)')

## Tentative Model

In [None]:
formula = (
    'Metabol '
    '~ Gastric * C(Sex, Treatment(reference="Male")) * C(Alcohol, Treatment(reference="Non-alcoholic"))'
)
model = smf.ols(formula, data=df)
lm1 = model.fit()
lm1.summary()

## Model Diagnostics

In [None]:
ax = sns.scatterplot(x=lm1.fittedvalues, y=lm1.resid)
ax.set_xlabel('Fitted Values')
ax.set_ylabel('Residuals')
_ = ax.set_title('Fitted vs. Residual plot from the regression of first pass metabolism')

In [None]:
lm1.fittedvalues[lm1.fittedvalues > 6]

Subjects 31 and 32 stand out as possibly influential, likely because of high studentized residuals and notably high gastric AD activity.  Refit the model without those points.

In [None]:
results1 = smf.ols(formula=formula, data=df).fit()
results2 = smf.ols(formula=formula, data=df.query('Subject < 31')).fit()
df1 = pd.DataFrame({'estimate': results1.params, 'SE': results1.bse, 'p': results1.pvalues})
df2 = pd.DataFrame({'estimate': results2.params, 'SE': results2.bse, 'p': results2.pvalues})

cidx1 = pd.MultiIndex.from_product([('All 32',), df1.columns])
cidx2 = pd.MultiIndex.from_product([('Cases 31 and 32 removed',), df2.columns])
df1.columns = cidx1
df2.columns = cidx2
df1.join(df2)

The p-value for the Gastric / Sex interaction changes from indicating strong evidence to indicating weak evidence.  The two excluded points have extreme gastric AD explanatory values, so the book argues for excluding them.  I would be curious whether there is a physiological explanation for these two subjects.

In [None]:
infl = lm1.get_influence().summary_frame()[['cooks_d', 'student_resid', 'hat_diag']]
idxc = infl['cooks_d'] > 0.5
idxs = infl['student_resid'].abs() > 2
idxl = infl['hat_diag'] > 2 * len(lm1.params) / lm1.nobs
infl.loc[idxc | idxs | idxl, :]

Only observation 31 is extreme in all three of Cook's Distance, studentized residuals, and leverage.  Of all the others, observation 32 has a very high studentized residual, so those two observations will be removed.
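To make the three diagnostics less of a black box, here is a minimal sketch that computes leverage, studentized residuals, and Cook's distance directly from the hat matrix.  It uses synthetic straight-line data with one planted outlier, not the study data.  Note that it computes internally studentized residuals, whereas the `student_resid` column from statsmodels is the externally studentized version, so the values differ slightly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic straight-line data with one planted high-leverage outlier (NOT the study data)
n = 20
x = rng.uniform(1.0, 3.0, size=n)
y = 1.5 * x + rng.normal(0.0, 0.3, size=n)
x[-1], y[-1] = 6.0, 2.0

X = np.column_stack([np.ones(n), x])    # design matrix with intercept
p = X.shape[1]

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

resid = y - H @ y
s2 = resid @ resid / (n - p)            # residual variance estimate

# Internally studentized residuals and Cook's distance
studentized = resid / np.sqrt(s2 * (1.0 - leverage))
cooks_d = studentized**2 * leverage / (p * (1.0 - leverage))

# Same rule-of-thumb cutoffs as in the cell above
flagged = np.flatnonzero(
    (cooks_d > 0.5) | (np.abs(studentized) > 2) | (leverage > 2 * p / n)
)
print(flagged)
```

The leverages always sum to the number of fitted parameters, which is why `2 * p / n` (twice the average leverage) is a common cutoff.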

In [None]:
df = df.query('Subject < 31')

## Model Refinement

Alcoholism is not a primary concern, so perform an extra-sum-of-squares F-test to see if terms involving alcoholism can be dropped.

In [None]:
model = smf.ols(formula, data=df)
lm1 = model.fit()
formula = 'Metabol ~ Gastric * C(Sex, Treatment(reference="Male"))'
model_sans_alcohol = smf.ols(formula, data=df)
lm2 = model_sans_alcohol.fit()
anova_lm(lm2, lm1)

There is little evidence of a lack of fit to the reduced model, so the alcohol terms are dropped.  A zero intercept makes logical sense, so force $\beta_0$ to be zero and drop the female indicator term.

In [None]:
formula = 'Metabol ~ Gastric + Gastric : C(Sex, Treatment(reference="Male")) - 1'
model = smf.ols(formula, data=df)
lm3 = model.fit()
anova_lm(lm3, lm2)

Again, the $F$-test shows that this is justified, so here we have our final model.  
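Both model comparisons above rest on the extra-sum-of-squares $F$-statistic, $F = \frac{(SSR_{reduced} - SSR_{full}) / (df_{reduced} - df_{full})}{SSR_{full} / df_{full}}$.  The following sketch computes it by hand on synthetic data; the helper name `extra_ss_f_test` and the simulated variables are illustrative assumptions, and the design matrices are assumed to have full column rank.

```python
import numpy as np
from scipy import stats

def extra_ss_f_test(y, X_reduced, X_full):
    """Return (F, p-value) for a reduced OLS model nested inside a full one."""
    def ssr_and_df(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid, len(y) - X.shape[1]

    ssr_r, df_r = ssr_and_df(X_reduced)
    ssr_f, df_f = ssr_and_df(X_full)
    f_stat = ((ssr_r - ssr_f) / (df_r - df_f)) / (ssr_f / df_f)
    return f_stat, stats.f.sf(f_stat, df_r - df_f, df_f)

# Synthetic data (NOT the study data): zero intercept, sex-specific slopes
rng = np.random.default_rng(2)
n = 30
gastric = rng.uniform(0.8, 3.0, size=n)
female = rng.integers(0, 2, size=n)
irrelevant = rng.integers(0, 2, size=n)    # a factor with no real effect
y = 2.0 * gastric - 1.0 * gastric * female + rng.normal(0.0, 0.5, size=n)

X_small = gastric.reshape(-1, 1)
X_reduced = np.column_stack([gastric, gastric * female])
X_full = np.column_stack([X_reduced, irrelevant, irrelevant * gastric])

# Dropping the irrelevant terms: tests whether the extra columns explain anything
f_drop, p_drop = extra_ss_f_test(y, X_reduced, X_full)
# Dropping the real slope difference: should be strongly rejected
f_keep, p_keep = extra_ss_f_test(y, X_small, X_reduced)
print(f'drop irrelevant terms: F = {f_drop:.2f}, p = {p_drop:.3f}')
print(f'drop slope difference: F = {f_keep:.2f}, p = {p_keep:.2g}')
```

A large p-value means the extra terms explain little additional variation, which is the justification used here for moving to the smaller model.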

In [None]:
lm3.summary()

$\mu\{\text{metabolism} \mid \text{gast}, \text{fem}\} = \beta_1 \, \text{gast} + \beta_2 \, \text{gast} \times \text{fem}$

Note that for any level of gastric AD activity, the mean first-pass metabolism for men exceeds that for women by a factor of $\frac{\beta_1}{\beta_1 + \beta_2} = 2.203$, since the slope is $\beta_1$ for men and $\beta_1 + \beta_2$ for women.
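An approximate standard error for a ratio like this can be obtained with the delta method.  The sketch below is one way to do it and is not necessarily how the interval quoted in the conclusion was derived.  The helper name `ratio_with_delta_se` and the numeric inputs are placeholders for illustration, not the fitted values.

```python
import numpy as np

def ratio_with_delta_se(b1, b2, cov):
    """Estimate b1 / (b1 + b2) and its delta-method standard error.

    cov is the 2x2 covariance matrix of the estimates (b1, b2).
    """
    total = b1 + b2
    ratio = b1 / total
    # Gradient of b1 / (b1 + b2) with respect to (b1, b2)
    grad = np.array([b2 / total**2, -b1 / total**2])
    se = float(np.sqrt(grad @ np.asarray(cov) @ grad))
    return ratio, se

# Placeholder inputs for illustration only (NOT the fitted coefficients)
b1, b2 = 2.0, -1.0
cov = np.array([[0.01, 0.0],
                [0.0, 0.01]])

ratio, se = ratio_with_delta_se(b1, b2, cov)
# Normal-approximation 95% interval; a t quantile would be slightly wider
lower, upper = ratio - 1.96 * se, ratio + 1.96 * se
print(f'ratio = {ratio:.3f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})')
```

With the fitted model, `lm3.params` and `lm3.cov_params()` would supply the two coefficients and their covariance matrix.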

## Statistical Conclusion
The following inferences pertain only to individuals with gastric AD activity levels between 0.8 and 3.0 $\mu$mol/min/g.  No reliable model could be determined for values greater than 3.0.  There was no evidence from the data that alcoholism was related to first-pass metabolism in any way ($p$-value = 0.93, from an $F$-test for significance of alcoholism and its interaction with gastric activity and sex).  Convincing evidence exists that first-pass metabolism was larger for males than for females overall (two-sided $p$-value = 0.0002, from a rank-sum test) and that gastric AD activity was larger for males than for females (two-sided $p$-value = 0.07 from a rank-sum test).  Males had higher first-pass metabolism than females even accounting for differences in gastric AD activity (two-sided $p$-value = 0.0003 from a $t$-test for equality of male and female slopes when both intercepts are zero).  For a given level of gastric dehydrogenase activity, the mean first-pass alcohol metabolism for men is estimated to be 2.20 times as large as the mean first-pass alcohol metabolism for women (approximate 95% confidence interval from 1.37 to 3.04).
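The rank-sum tests quoted above are not computed in the cells shown here.  A minimal sketch of such a test with `scipy.stats.ranksums` follows; the two samples are synthetic (sizes, means, and spreads are assumptions), so the printed p-value has no connection to the values reported in the conclusion.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic first-pass metabolism values for two groups (NOT the study data)
male = rng.normal(3.0, 1.0, size=16)
female = rng.normal(1.5, 1.0, size=14)

# Wilcoxon rank-sum test; the reported p-value is two-sided by default
stat, pvalue = stats.ranksums(male, female)
print(f'z = {stat:.2f}, two-sided p-value = {pvalue:.4f}')
```

With the notebook's data frame, the two samples would be the `Metabol` (or `Gastric`) values selected by `Sex`, assuming the two levels are labeled `Male` and `Female`.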