In [None]:
# standard library imports
import pathlib
import warnings

warnings.simplefilter('ignore', category=FutureWarning)

# 3rd party library imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pingouin as pg
from scipy.stats import f
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

sns.set()
pd.options.display.float_format = "{:.1f}".format

We begin by reading the data and summarizing the variables.

In [None]:
df = pd.read_csv('case0502.csv')
print(df.head())
df.groupby('Judge').describe()

<ol type="a">
    <li>Is there evidence that women are underrepresented on the Spock judge’s venires compared to the venires of the other judges?</li>
    <li>Is there any evidence that there are differences in women’s representation in the venires of the other six judges?</li>
</ol>

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=[12, 6.4])
_ = sns.boxplot(data=df, x='Judge', y='Percent', ax=axes[0])
_ = sns.kdeplot(data=df, x='Percent', hue='Judge', ax=axes[1])

In [None]:
sm.qqplot(data=df['Percent'], line='45', fit=True)

There is weak evidence against independence.  There appears to be substantial evidence against equal variance.  Outliers do not seem to be an issue, and only the percentages for Judge E appear somewhat skewed.  There is weak evidence against normality.
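The visual impression of unequal spread can be backed up with a formal test. The following is a minimal sketch, assuming the Brown-Forsythe variant of Levene's test (`scipy.stats.levene` with `center='median'`) is an acceptable check here. The three groups are made-up numbers, not the venire percentages.

```python
import numpy as np
from scipy.stats import levene

# Hypothetical data: three groups of made-up values, NOT the case0502 percentages.
rng = np.random.default_rng(0)
groups = [
    rng.normal(loc=30, scale=3, size=8),
    rng.normal(loc=30, scale=3, size=8),
    rng.normal(loc=15, scale=6, size=8),
]

# center='median' gives the Brown-Forsythe variant, which is less sensitive to skewness.
stat, pvalue = levene(*groups, center='median')
print(f"W = {stat:.3f}, p-value = {pvalue:.4f}")
```

On the notebook's data frame, the same call would take one array of `Percent` values per judge, for example built from `df.groupby('Judge')`.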

## Is any judge different?

$
\begin{align}
H_0: \: &\mu_i = \mu_j \: \text{for all} \: i, j \in \{A, B, C, D, E, F, Spock\} \: \text{(reduced model)} \\
H_a: \: &\mu_i \ne \mu_j \: \text{for at least one pair} \: i, j \in \{A, B, C, D, E, F, Spock\} \: \text{(full model)} \\
\end{align}
$

In [None]:
pd.options.display.float_format = "{:.4f}".format
model = smf.ols('Percent ~ Judge', data=df).fit()
adf = sm.stats.anova_lm(model)
print(adf)

There is strong evidence that at least one judge's percentages are different ($F_{6,39}$ = 6.7184, $p$-value = 0.0001).

We can manually run the test as well.

In [None]:
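# The rows of the ANOVA table are ordered (Judge, Residual).
ssr1, sse1 = adf['sum_sq']
dof_ssr1, dof_sse1 = adf['df']
dof_sst1, sst1 = adf['df'].sum(), adf['sum_sq'].sum()

# F-statistic = between-group mean square / within-group mean square
fstat = (ssr1 / dof_ssr1) / (sse1 / dof_sse1)
# f.sf is the upper-tail probability, equivalent to 1 - f.cdf
print(fstat, f.sf(fstat, dof_ssr1, dof_sse1))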
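
As a sanity check on the manual calculation, the sums of squares can also be built from scratch and compared with `scipy.stats.f_oneway`. This is a sketch on hypothetical data (three made-up groups), not a rerun of the analysis above.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical data: three made-up groups with unequal sizes.
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=5, size=n) for m, n in [(30, 5), (33, 6), (15, 9)]]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group and within-group sums of squares and their degrees of freedom
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# Manual F-statistic and upper-tail p-value
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = f.sf(f_manual, df_between, df_within)

# Library result for comparison
f_scipy, p_scipy = f_oneway(*groups)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```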

## Are judges A-F different?


<table>
    <tr>
        <th>Group</th><td>Spock</td><td>A</td><td>B</td><td>C</td><td>D</td><td>E</td><td>F</td>
    </tr>
    <tr>
      <th>Full Model</th><td>$\mu_1$</td><td>$\mu_2$</td><td>$\mu_3$</td><td>$\mu_4$</td><td>$\mu_5$</td><td>$\mu_6$</td><td>$\mu_7$</td></tr>
        <tr>
      <th>Reduced Model ($H_0$)</th><td>$\mu_1$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td></tr>
</table>

The quantities needed for this hypothesis test are not produced directly by a single call to **smf.ols** and **sm.stats.anova_lm**.  The sum of squares error (SSE) for the full model is the residual sum of squares in the table above.  The residual sum of squares for the reduced model can be obtained from a second one-way analysis with just two groups, **Spock** and **Other**.  It plays the role of a total sum of squares for judges A-F around their common mean, which is why it is stored as `sst2` below, but in the second table it appears in the Residual row.

In [None]:
df['Judge2'] = df['Judge']
df.loc[df['Judge'] != "Spock's", 'Judge2'] = 'Other'
model = smf.ols('Percent ~ C(Judge2, Treatment)', data=df).fit()
adf2 = sm.stats.anova_lm(model)
print(adf2)

In [None]:
sst2 = adf2.loc['Residual', 'sum_sq']
dof_sst2 = adf2.loc['Residual', 'df']

The extra-sum-of-squares test can now be performed for the question at hand: are judges A-F different?

In [None]:
extra_ss = (sst2 - sse1)
extra_dof = (dof_sst2 - dof_sse1)
fstat =  (extra_ss / extra_dof) / (sse1 / dof_sse1)
print(fstat, extra_dof, dof_sse1, (1 - f.cdf(fstat, extra_dof, dof_sse1)))

There is little evidence that judges A-F differ from each other ($F_{5,39}$ = 1.37, $p$-value = 0.2582).
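The extra-sum-of-squares calculation can be wrapped in a small helper that takes the grouping labels of the full and reduced models. This is a sketch only: the function names `sse` and `extra_ss_f_test` and the nine data values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np
from scipy.stats import f

def sse(values, labels):
    """Residual sum of squares when each distinct label gets its own mean."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    return sum(
        ((values[labels == lab] - values[labels == lab].mean()) ** 2).sum()
        for lab in np.unique(labels)
    )

def extra_ss_f_test(values, full_labels, reduced_labels):
    """Extra-sum-of-squares F-test of a reduced grouping against a full grouping."""
    n = len(values)
    sse_full = sse(values, full_labels)
    sse_reduced = sse(values, reduced_labels)
    dof_full = n - len(np.unique(full_labels))
    dof_reduced = n - len(np.unique(reduced_labels))
    extra_dof = dof_reduced - dof_full
    fstat = ((sse_reduced - sse_full) / extra_dof) / (sse_full / dof_full)
    return fstat, extra_dof, dof_full, f.sf(fstat, extra_dof, dof_full)

# Hypothetical data: one "S" group and two other groups, A and B.
values = np.array([10., 12., 11., 20., 22., 21., 19., 23., 24.])
full = np.array(['S', 'S', 'S', 'A', 'A', 'A', 'B', 'B', 'B'])
# Reduced model: A and B share a single mean.
reduced = np.where(full == 'S', 'S', 'Other')

fstat, extra_dof, dof_full, pvalue = extra_ss_f_test(values, full, reduced)
print(fstat, extra_dof, dof_full, pvalue)
```

Here the full model has SSE 18 on 6 degrees of freedom and the reduced model has SSE 19.5 on 7, so the statistic is (1.5 / 1) / (18 / 6) = 0.5.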

## Is Spock's judge different from the others?

<table>
    <tr>
        <th>Group</th><td>Spock</td><td>A</td><td>B</td><td>C</td><td>D</td><td>E</td><td>F</td>
    </tr>
    <tr>
      <th>Full Model</th><td>$\mu_1$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td><td>$\mu_0$</td></tr>
        <tr>
      <th>Reduced Model ($H_0$)</th><td>$\mu$</td><td>$\mu$</td><td>$\mu$</td><td>$\mu$</td><td>$\mu$</td><td>$\mu$</td><td>$\mu$</td></tr>
</table>

This is answered directly by the two-group (Spock vs. Other) ANOVA table above: there is strong evidence that Spock's judge differs from the other judges ($F_{1,44}$ = 32.1454, $p$-value < 0.0001).
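With only two groups, the one-way ANOVA F-test is equivalent to the pooled-variance two-sample t-test, with $F = t^2$ and identical two-sided $p$-values. The sketch below illustrates this on hypothetical data; the group sizes and means are made up and are not the venire percentages.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Hypothetical data: a small group and a larger comparison group.
rng = np.random.default_rng(2)
spock = rng.normal(loc=26, scale=5, size=9)
other = rng.normal(loc=30, scale=5, size=37)

# Pooled-variance (equal_var=True) two-sample t-test
t_stat, t_p = ttest_ind(spock, other, equal_var=True)

# One-way ANOVA on the same two groups
f_stat, f_p = f_oneway(spock, other)

print(t_stat ** 2, f_stat)  # the squared t-statistic matches the F-statistic
print(t_p, f_p)             # the two-sided p-values agree
```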

# Non-parametric ANOVA

There is no technical need for a non-parametric analysis, since none of the guidelines are violated.  If it were appropriate, however, we might do the following.

## Is any judge different?

$
\begin{align}
H_0: \: &\mu_i = \mu_j \: \text{for all} \: i, j \in \{A, B, C, D, E, F, Spock\} \: \text{(reduced model)} \\
H_a: \: &\mu_i \ne \mu_j \: \text{for at least one pair} \: i, j \in \{A, B, C, D, E, F, Spock\} \: \text{(full model)} \\
\end{align}
$

In [None]:
pg.kruskal(data=df, dv='Percent', between='Judge')

There is strong evidence that at least one judge has percentages that differ from the others ($H_6$ = 21.96, $p$-value = 0.0012).
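The subscript on $H$ is the degrees of freedom of the chi-square approximation, which is the number of groups minus one (7 - 1 = 6 here). As a cross-check, the sketch below runs the same kind of test with `scipy.stats.kruskal` and recovers the $p$-value from the chi-square distribution. The four groups are hypothetical, made-up data.

```python
import numpy as np
from scipy.stats import chi2, kruskal

# Hypothetical data: four made-up groups of seven observations each.
rng = np.random.default_rng(3)
groups = [rng.normal(loc=m, scale=5, size=7) for m in (30, 32, 28, 15)]

# Kruskal-Wallis H-statistic and p-value
h_stat, p_value = kruskal(*groups)

# The p-value is the upper tail of a chi-square distribution with k - 1 degrees of freedom.
dof = len(groups) - 1
p_from_chi2 = chi2.sf(h_stat, dof)

print(h_stat, dof, p_value, p_from_chi2)
```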