An experiment compared the performance of 5 AI agents over 60 trials each. Due to an oversight, most of the data was lost; only the mean and variance of each sample survive. From this limited data, I attempted to carry out a one-way ANOVA by manual calculation. There are at least two conceivable ways to do so. The first, shown in the code below, is to reconstruct a surrogate data-set that retains the summary statistics of the original data and run the ANOVA on that. The second is to carry out the ANOVA through the grand mean, for which the napkin maths is listed at the bottom of this notebook.

In [1]:
import numpy as np
import math, csv
import pandas as pd 
from scipy import stats

In [2]:
ss_mean_table = [75.17, 99.5, 96.75, 91.08, 72.58]
ss_variance_table = [6884.72, 5015.85, 3610.87, 5373.81, 2992.79]
surrogate_data = []
n = 60
k = 5
N = n * k

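The idea here is to reconstruct a surrogate data-set from the limited statistics available. The group means and variances, together with the group sizes, are sufficient statistics for a one-way ANOVA, so any data-set that reproduces them yields the same F statistic. For a group with mean $\overline{x}$, standard deviation $s$ and size $n$, this is accomplished through the following: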

$$y_i = \overline{x} + \frac{s}{\sqrt{n}}, \quad i = 1, 2, \ldots, n - 1$$

and 

$$y_n = n\overline{x} - (n - 1)y_1$$
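
As a quick sanity check, the construction above can be verified on a single group. The sketch below is illustrative and not part of the original analysis: the helper name `surrogate_sample` is an assumption, and it relies only on the standard library's `statistics` module to confirm that the $n$ points reproduce the requested mean and sample variance.

```python
import math
import statistics

def surrogate_sample(mean, variance, n):
    """Build n points whose mean and sample variance match the inputs.

    Hypothetical helper: n - 1 identical points at mean + s/sqrt(n),
    plus one balancing point y_n = n*mean - (n - 1)*y_1.
    """
    y_1 = mean + math.sqrt(variance) / math.sqrt(n)
    y_n = n * mean - (n - 1) * y_1
    return [y_1] * (n - 1) + [y_n]

# First group from the summary tables, with n = 60
sample = surrogate_sample(75.17, 6884.72, 60)
print(len(sample))
print(statistics.mean(sample))
print(statistics.variance(sample))  # sample variance (n - 1 denominator)
```

The check only works out when the sample holds exactly $n - 1$ copies of $y_1$ plus the single balancing point; with fewer copies, neither the mean nor the variance is reproduced.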

In [3]:
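count = 0
for mean, variance in zip(ss_mean_table, ss_variance_table):
    count += 1  # group label, 1..k
    reconstructed_sample = []
    # NOTE: range(0, 58) appends 58 copies of y_1, whereas the formula above
    # calls for n - 1 = 59. The outputs recorded below come from this loop as written.
    for i in range(0, 58):
        y_i = mean + math.sqrt(variance) / math.sqrt(n)
        reconstructed_sample.append([y_i, count])
    # Balancing point y_n from the second formula above
    y_n = n*mean - (n - 1)*reconstructed_sample[0][0]
    reconstructed_sample.append([y_n, count])
    surrogate_data.append(reconstructed_sample)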

We can now reorganise this data into a dataframe and begin our analysis in Python.

In [4]:
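# Write the [score, sample] rows to disk, then read them back as a dataframe.
# newline='' stops the csv module from emitting blank rows on some platforms.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for i in surrogate_data:
        writer.writerows(i)
data = pd.read_csv('output.csv', header=None)
data.columns = ['score', 'sample']
print(data)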

          score  sample
0     85.881925       1
1     85.881925       1
2     85.881925       1
3     85.881925       1
4     85.881925       1
5     85.881925       1
6     85.881925       1
7     85.881925       1
8     85.881925       1
9     85.881925       1
10    85.881925       1
11    85.881925       1
12    85.881925       1
13    85.881925       1
14    85.881925       1
15    85.881925       1
16    85.881925       1
17    85.881925       1
18    85.881925       1
19    85.881925       1
20    85.881925       1
21    85.881925       1
22    85.881925       1
23    85.881925       1
24    85.881925       1
25    85.881925       1
26    85.881925       1
27    85.881925       1
28    85.881925       1
29    85.881925       1
..          ...     ...
265   79.642566       5
266   79.642566       5
267   79.642566       5
268   79.642566       5
269   79.642566       5
270   79.642566       5
271   79.642566       5
272   79.642566       5
273   79.642566       5
274   79.642566 

In [5]:
sum_n = data['score'].sum()
sum_n_sq = (sum_n)**2
corection_factor = sum_n_sq / N
print('correction factor = ' + str(corection_factor))

correction factor = 2188901.320023551


In [6]:
sum_list = []
for i in data['score']:
    sum_list.append(i**2)
ss_total = np.sum(sum_list) - corection_factor
print('ss total = ' + str(ss_total))

ss total = 1481836.0060398462


In [7]:
groupvar_toadd = []
summed_groupby = data.groupby(['sample']).sum()
for index, row in summed_groupby.iterrows():
    grouped_sq = row.values[0]**2
    grouped_div = grouped_sq/n
    groupvar_toadd.append(grouped_div)
ss_group = np.sum(groupvar_toadd) - corection_factor
print('ss group = ' + str(ss_group))

ss group = 35730.730605458375


In [8]:
ss_error = ss_total - ss_group
print('ss error = ' + str(ss_error))

ss error = 1446105.2754343878


In [9]:
ms_group = ss_group / (k - 1)
print('ms group = ' + str(ms_group))

ms group = 8932.682651364594


In [10]:
ms_error = ss_error / (N - k)
print('ms error = ' + str(ms_error))

ms error = 4902.0517811335185


In [11]:
variance_ratio = ms_group / ms_error
print('variance ratio = ' + str(variance_ratio))

variance ratio = 1.8222334341191024


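This variance ratio is our F statistic, but I am fairly sure it is incorrect. A faithful reconstruction would not merely approximate the original F statistic: a surrogate data-set that reproduces every group mean and variance gives exactly the same F. The likely culprit is therefore an error in my manual calculation. One candidate is the loop in In [3], where `range(0, 58)` appends 58 copies of $y_1$ rather than the $n - 1 = 59$ the formula calls for, leaving each group with 59 points while the sums of squares still use $n = 60$ and $N = 300$.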
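
As an independent cross-check, the F statistic can also be computed directly from the group means and variances, with no surrogate data at all. The sketch below assumes equal group sizes (n = 60 for each of the k = 5 groups, as above); the function name `anova_from_summary` is illustrative and does not come from the original notebook.

```python
def anova_from_summary(means, variances, n):
    """One-way ANOVA F statistic from per-group means and sample variances.

    Assumes every group has the same size n (a balanced design).
    """
    k = len(means)
    grand_mean = sum(means) / k  # valid because the groups are equal-sized
    # Between-group sum of squares: n times the squared deviations of the means
    ss_group = n * sum((m - grand_mean) ** 2 for m in means)
    # Within-group sum of squares: each sample variance times (n - 1)
    ss_error = (n - 1) * sum(variances)
    ms_group = ss_group / (k - 1)
    ms_error = ss_error / (k * (n - 1))  # k*(n - 1) equals N - k
    return ms_group / ms_error

means = [75.17, 99.5, 96.75, 91.08, 72.58]
variances = [6884.72, 5015.85, 3610.87, 5373.81, 2992.79]
f_stat = anova_from_summary(means, variances, 60)
print('F = ' + str(f_stat))
```

With the summary statistics above this gives an F of roughly 1.93 on (4, 295) degrees of freedom, which a correctly built surrogate data-set should reproduce.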