# Effects of Agent Orange on Troops in Vietnam

In [None]:
# standard library imports
import pathlib
import functools
import warnings

warnings.simplefilter(action='ignore', category=FutureWarning)

# 3rd party library imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pingouin as pg
import scipy.stats
import seaborn as sns

pd.options.display.width = 128
sns.set()

We begin by reading the data and summarizing the variables.

In [None]:
df = pd.read_csv('case0302.csv')
df.groupby('Veteran').describe()                                    

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=[12.8, 4.8])
_ = sns.boxplot(data=df, x='Veteran', y='Dioxin', ax=axes[0]) 
_ = sns.kdeplot(data=df, x='Dioxin', hue='Veteran', ax=axes[1])
axes[0].set_ylabel('Dioxin (parts per trillion)')
fig.tight_layout()

The data are skewed right, but a logarithmic transformation is problematic because both groups contain zeros, and the logarithm of zero is undefined. The density plot, moreover, does not cry out for a transformation.
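A quick way to confirm the zero problem is to count the exact zeros in each group and to see what a plain log does to them. This is a minimal sketch: `zero_counts` is a hypothetical helper, the column names `Veteran` and `Dioxin` are taken from the notebook, and the small `toy` frame is made-up illustration data, not the case0302 data.

```python
import numpy as np
import pandas as pd


def zero_counts(df, group_col="Veteran", value_col="Dioxin"):
    """Return the number of exact zeros in `value_col` for each level of `group_col`."""
    is_zero = df[value_col] == 0
    return is_zero.groupby(df[group_col]).sum()


# Hypothetical toy frame for illustration only (not the case0302 data)
toy = pd.DataFrame({"Veteran": ["Vietnam", "Vietnam", "Other"],
                    "Dioxin": [0.0, 3.0, 0.0]})
print(zero_counts(toy))

# A plain log maps 0 to -inf (NumPy's divide-by-zero warning is silenced here),
# so a single zero in either group breaks the log-scale analysis.
with np.errstate(divide="ignore"):
    logged = np.log(np.array([0.0, 1.0, 10.0]))
print(logged)
```

Calling `zero_counts(df)` on the frame read above gives the per-group counts for the actual data.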

$$
\begin{align*}
& H_0: \mu_{veteran} = \mu_{other} \\
& H_a: \mu_{veteran} \gt \mu_{other} \\
\end{align*}
$$
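For reference, the Welch (unequal-variance) form of the two-sample statistic divides the difference in sample means by its estimated standard error and compares the result to a $t$ distribution whose degrees of freedom $\nu$ come from the Welch-Satterthwaite approximation. Here $\bar{y}$, $s$, and $n$ denote each group's sample mean, standard deviation, and size (notation introduced for this sketch); large positive values of $t$ favor $H_a$.

$$
\begin{align*}
& t = \frac{\bar{y}_{veteran} - \bar{y}_{other}}{SE}, \quad SE = \sqrt{\frac{s_{veteran}^2}{n_{veteran}} + \frac{s_{other}^2}{n_{other}}} \\
& \nu = \frac{SE^4}{\frac{\left(s_{veteran}^2 / n_{veteran}\right)^2}{n_{veteran} - 1} + \frac{\left(s_{other}^2 / n_{other}\right)^2}{n_{other} - 1}} \\
& p\text{-value} = P(T_\nu \ge t) \\
\end{align*}
$$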

In [None]:
x = df.loc[df.Veteran == 'Vietnam', 'Dioxin']                            
y = df.loc[df.Veteran == 'Other', 'Dioxin']                          
stats = pg.ttest(x, y, alternative='greater') 
print(stats)

The data provide no evidence that the mean dioxin level in surviving Vietnam combat troops is greater than that in non-Vietnam veterans ($t = 0.291$ on 136.96 degrees of freedom, one-sided $p$-value = 0.386). The one-sided 95% confidence interval for $\mu_{veteran} - \mu_{other}$ has a lower bound of $-0.39$ parts per trillion; that is, we are 95% confident that the mean dioxin level in Vietnam veterans is at most 0.39 parts per trillion below that of non-Vietnam veterans.
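The one-sided lower bound can also be computed by hand. This is a minimal sketch, assuming the Welch standard error and degrees of freedom that `pg.ttest` applies when the sample sizes differ; `welch_lower_bound` is a hypothetical helper, not part of pingouin.

```python
import numpy as np
import scipy.stats


def welch_lower_bound(x, y, confidence=0.95):
    """One-sided lower confidence bound for mean(x) - mean(y) using Welch's method.

    Returns (lower_bound, degrees_of_freedom).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Estimated variance of each sample mean
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite approximate degrees of freedom
    dof = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    # A one-sided bound uses the `confidence` quantile (0.95), not 0.975
    t_crit = scipy.stats.t.ppf(confidence, dof)
    return x.mean() - y.mean() - t_crit * se, dof
```

Calling `welch_lower_bound(x, y)` with the `x` and `y` defined above should give a bound close to the lower end of the `CI95%` column printed by `pg.ttest`, assuming the installed pingouin version reports one-sided intervals for `alternative='greater'`.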

These results differ slightly from the book's because *pg.ttest* applies Welch's unequal-variance t-test by default (`correction='auto'`) when the sample sizes differ, whereas the book uses the equal-variance (pooled) two-sample t-test.
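To see how much the choice of test matters, a minimal sketch can run the pooled and Welch versions side by side with `scipy.stats.ttest_ind` (the `alternative` argument requires SciPy 1.6 or later). `compare_t_tests` is a hypothetical helper introduced for illustration.

```python
import scipy.stats


def compare_t_tests(x, y):
    """Pooled vs. Welch two-sample t-tests of H_a: mean(x) > mean(y).

    Returns a dict mapping "pooled" and "welch" to (t statistic, one-sided p-value).
    """
    results = {}
    for label, equal_var in [("pooled", True), ("welch", False)]:
        # equal_var=True pools the two sample variances; False applies Welch's correction
        res = scipy.stats.ttest_ind(x, y, equal_var=equal_var, alternative="greater")
        results[label] = (res.statistic, res.pvalue)
    return results
```

Running `compare_t_tests(x, y)` on the series above shows both results together. Within pingouin itself, passing `correction=False` to `pg.ttest` should likewise force the pooled test.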