# MC Stat Expected
Matt Kretchmar  
September 2022

This notebook analyzes historical Ohio XC results from the 2017 through 2021 seasons. The primary objective is to tabulate the **EXPECTED** cell counts for each category in the table below.

We organize the data into the following table, separately for each division:

| | Smallest 25% | Middle 50% | Biggest 25% |
|-|-|-|-|
| Districts | a | b | c |
| Regionals | d | e | f | 
| States | g | h | i | 
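
The column grouping in the table can be sketched as a split of the rows sorted by population. This is a minimal sketch that mirrors the cut points used in `readCombData` later in this notebook: the lowest quarter, the highest quarter, and whatever remains in between. The helper name `split_by_population` and the toy data are hypothetical, for illustration only.

```python
import pandas as pd

def split_by_population(df, pop_col='POP'):
    """Return (small, middle, big) groups after sorting by population.

    Hypothetical helper: same cut points as readCombData.
    """
    df = df.sort_values(by=pop_col)
    n = len(df)
    cut = n // 4                      # size of each outer quarter
    small = df.iloc[:cut]             # smallest quarter
    middle = df.iloc[cut:n - cut]     # everything between the two quarters
    big = df.iloc[n - cut:]           # biggest quarter
    return small, middle, big

# Toy data (made up), eight rows so each outer group has two rows
toy = pd.DataFrame({
    'ID': list('ABCDEFGH'),
    'POP': [80, 10, 50, 30, 70, 20, 60, 40],
})
small, middle, big = split_by_population(toy)
print(len(small), len(middle), len(big))
```

With eight rows the outer groups hold two rows each and the middle group holds the remaining four.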


We draw from 30 datasets produced by the notebook "MC_FREQ_ANALYSIS". That notebook generates one data file for each combination of:
- Two genders (boys and girls)
- Three divisions (I, II, and III)
- Five years (2017-2021)
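
As a quick check on the count of 30, the combinations above can be enumerated directly. This sketch assumes the filename pattern `D{division}_{gender}_{year}.txt` that `readCombData` uses later in this notebook; the variable names are illustrative.

```python
from itertools import product

divisions = [1, 2, 3]
genders = ['B', 'G']
years = [2017, 2018, 2019, 2020, 2021]

# One file per (division, gender, year) combination: 3 * 2 * 5 = 30
filenames = [
    'D{0}_{1}_{2}.txt'.format(d, g, y)
    for d, g, y in product(divisions, genders, years)
]

print(len(filenames))
print(filenames[0], filenames[-1])
```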


The data are statistically similar enough from year to year that we combine and average them across the five years. This yields **SIX** datasets, one for each combination of gender and division (the genders and divisions themselves are not similar enough to combine).
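
The year-to-year averaging can be sketched as follows: each year's 3x3 grid of counts is first normalized to proportions, and the proportions are then averaged cell by cell. This is a simplified illustration using NumPy, not the notebook's own loop; the function name `average_proportions` and the two toy grids are hypothetical.

```python
import numpy as np

def average_proportions(yearly_counts):
    """Normalize each yearly 3x3 count grid to proportions, then average them."""
    grids = [np.asarray(c, dtype=float) for c in yearly_counts]
    props = [g / g.sum() for g in grids]   # each grid now sums to 1
    return np.mean(props, axis=0)          # cell-by-cell mean across years

# Toy count grids (made up): rows are rounds, columns are population groups
year_a = [[20, 30, 10], [5, 10, 5], [2, 10, 8]]    # total 100
year_b = [[40, 80, 20], [10, 20, 10], [4, 8, 8]]   # total 200
avg = average_proportions([year_a, year_b])
print(avg.round(3))
```

Normalizing before averaging gives every year equal weight, regardless of how many results that year contains.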

This notebook produces the EXPECTED cell counts for these six different experiments/datasets.  
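
Converting averaged proportions into expected counts is a single scaling step. The sketch below assumes the scale factor used in `readCombData`, namely the printed `NUM` value times the five years. The input proportions are the rounded values printed for Division 1 boys, so the results will differ slightly from the printed counts, which are computed from unrounded proportions. The helper name `expected_counts` is hypothetical.

```python
def expected_counts(proportions, total):
    """Scale a grid of proportions by a total to get expected cell counts."""
    return [[p * total for p in row] for row in proportions]

# Rounded proportions from the Division 1 boys output; NUM is 194
proportions = [
    [0.197, 0.354, 0.141],
    [0.040, 0.105, 0.060],
    [0.010, 0.046, 0.047],
]
counts = expected_counts(proportions, 194 * 5)
for row in counts:
    print('{0:5.1f}  {1:5.1f}  {2:5.1f}'.format(*row))
```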




In [2]:
import numpy as np
import random
import matplotlib.pyplot as plt
import pandas as pd

In [70]:
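def readCombData():
    '''
    Reads the 30 per-year txt data files and, for each division/gender
    pair, prints the 3x3 grid (rounds by population group) averaged over
    the five years, first as proportions and then as counts.
    '''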
    for division in [1,2,3]:
        for gender in ['B','G']:
            grid = [[0, 0, 0] for i in range(3)]
            num = 0
            for year in [2017,2018,2019,2020,2021]:
                filename = 'D{0}_{1}_{2}.txt'.format(division,gender,year)
                
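                # Column names: ID, POP, then '1' through '22'
                head = ['ID','POP'] + [str(i) for i in range(1,23)]

                # Raw string avoids the invalid escape sequence warning for '\s'
                df = pd.read_csv(filename,sep=r'\s+',index_col=False,header=0,names=head)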
                df = df.sort_values(by=['POP'])
                #print(df)
                
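                # Cut points on the POP-sorted rows: smallest quarter,
                # biggest quarter, and the remaining middle half
                n = len(df)
                cut1 = n // 4
                cut2 = n - cut1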

                df_small = df.iloc[:cut1]
                df_big = df.iloc[cut2:]
                df_middle = df.iloc[cut1:cut2]
                
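                # Positional columns: 2-21 (headers '1'-'20') are summed for States,
                # 22 (header '21') is Regionals, 23 (header '22') is Districts
                small_states = df_small.iloc[:,2:22].sum().sum()
                small_regionals = df_small.iloc[:,22].sum()
                small_districts = df_small.iloc[:,23].sum()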
                
                middle_states = df_middle.iloc[:,2:22].sum().sum()
                middle_regionals = df_middle.iloc[:,22].sum()
                middle_districts = df_middle.iloc[:,23].sum()
                
                big_states = df_big.iloc[:,2:22].sum().sum()
                big_regionals = df_big.iloc[:,22].sum()
                big_districts = df_big.iloc[:,23].sum()
                
                total = small_states + small_regionals + small_districts
                total += middle_states + middle_regionals + middle_districts
                total += big_states + big_regionals + big_districts
                
                # Accumulate this year's proportions (rows: rounds, columns: size groups)
                grid[0][0] += small_districts/total
                grid[0][1] += middle_districts/total
                grid[0][2] += big_districts/total
                grid[1][0] += small_regionals/total
                grid[1][1] += middle_regionals/total
                grid[1][2] += big_regionals/total
                grid[2][0] += small_states/total
                grid[2][1] += middle_states/total
                grid[2][2] += big_states/total
                
                num += n

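            # Average number of rows per year (integer division), then
            # average each accumulated proportion over the five years
            num = num // 5
            for i in range(3):
                for j in range(3):
                    grid[i][j] = grid[i][j] / 5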
            print('\n\n********* DIV {0} GENDER {1}  NUM {2}\n'.format(division,gender,num))
            print('Data as Proportion')
            print('{0:5.3f}  {1:5.3f}  {2:5.3f}'.format(grid[0][0],grid[0][1],grid[0][2]))
            print('{0:5.3f}  {1:5.3f}  {2:5.3f}'.format(grid[1][0],grid[1][1],grid[1][2]))
            print('{0:5.3f}  {1:5.3f}  {2:5.3f}'.format(grid[2][0],grid[2][1],grid[2][2]))

            # Scale the averaged proportions to counts using num * 5
            for i in range(3):
                for j in range(3):
                    grid[i][j] = grid[i][j] * num * 5
            print('\nData as Counts')
            print('{0:5.1f}  {1:5.1f}  {2:5.1f}'.format(grid[0][0],grid[0][1],grid[0][2]))
            print('{0:5.1f}  {1:5.1f}  {2:5.1f}'.format(grid[1][0],grid[1][1],grid[1][2]))
            print('{0:5.1f}  {1:5.1f}  {2:5.1f}'.format(grid[2][0],grid[2][1],grid[2][2]))
            
                


In [71]:
readCombData()



********* DIV 1 GENDER B  NUM 194

Data as Proportion
0.197  0.354  0.141
0.040  0.105  0.060
0.010  0.046  0.047

Data as Counts
191.3  343.5  136.5
 38.9  101.6   58.6
 10.0   44.5   45.2


********* DIV 1 GENDER G  NUM 165

Data as Proportion
0.188  0.324  0.126
0.048  0.125  0.069
0.012  0.055  0.053

Data as Counts
155.0  267.6  103.6
 39.2  102.8   57.0
 10.2   45.7   43.9


********* DIV 2 GENDER B  NUM 193

Data as Proportion
0.190  0.349  0.152
0.045  0.104  0.057
0.015  0.049  0.040

Data as Counts
182.9  337.0  146.2
 43.0  100.7   55.5
 14.1   47.3   38.3


********* DIV 2 GENDER G  NUM 169

Data as Proportion
0.182  0.325  0.139
0.051  0.120  0.065
0.017  0.057  0.045

Data as Counts
153.5  275.0  117.2
 42.9  101.2   55.2
 14.0   47.8   38.1


********* DIV 3 GENDER B  NUM 190

Data as Proportion
0.198  0.343  0.144
0.040  0.110  0.060
0.010  0.051  0.044

Data as Counts
188.1  325.6  136.7
 38.1  104.1   57.4
  9.6   48.7   41.7


********* DIV 3 GENDER G  NUM 162

Dat