# MC Frequency SQL
This notebook is Part 1 of the "expected distribution" calculations used in the chi-square test procedure.  The remaining calculations are in Part 2, *MC_FREQ_ANALYSIS*.  

The goal of this notebook is to query the database for schools that participated in postseason competition and to store the results in 30 separate CSV files.  These files are read by Part 2.  

This notebook runs the SQL queries and loads each result into a pandas DataFrame.  Each combination of year, gender, and division is queried separately and written to its own file.  

There should be data for:
- Years: 2017, 2018, 2019, 2020, 2021
- Genders: "B" and "G"
- Divisions: 1, 2 and 3

for a total of 30 different queries and filenames.  
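
As a quick sanity check on that count, the combinations and their file names can be enumerated with `itertools.product`. This is only a minimal sketch: the `D{division}_{gender}_{year}.csv` pattern mirrors the one used in the query loop later in this notebook, and the helper name `expected_filenames` is hypothetical.

```python
from itertools import product

YEARS = [2017, 2018, 2019, 2020, 2021]
GENDERS = ["B", "G"]
DIVISIONS = [1, 2, 3]

def expected_filenames():
    """Return one CSV file name per (year, gender, division) combination."""
    return [
        "D{0}_{1}_{2}.csv".format(div, gender, year)
        for year, gender, div in product(YEARS, GENDERS, DIVISIONS)
    ]

names = expected_filenames()
print(len(names))   # 30
print(names[0])     # D1_B_2017.csv
```

A list like this could also be used in Part 2 to confirm that every expected file is present before reading it.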


In [1]:
import pandas as pd
import sqlalchemy as sa
import numpy as np
import math
import random

### Create Database Connection
Here we connect to our database engine.  To reproduce this portion of the work, download the database and install it on your own MySQL server, then change the server hostname in the cell below to point to your own engine.

Put your login credentials (user ID and password, separated by whitespace) in the `creds.txt` file.  
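
One thing to watch for: a password containing characters such as `@` or `/` can break a hand-formatted connection URL unless it is percent-encoded. The following is a minimal sketch of parsing and escaping the credentials with the standard library only. The function names `parse_creds` and `build_url`, the `localhost` server, and the sample credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote_plus

def parse_creds(text):
    """Split the contents of creds.txt into (userid, password)."""
    parts = text.split()
    if len(parts) != 2:
        raise ValueError("creds.txt should contain exactly: <userid> <password>")
    return parts[0], parts[1]

def build_url(userid, password, server, database, protocol="mysql+mysqlconnector"):
    """Assemble a connection URL, percent-encoding the user ID and password."""
    return "{}://{}:{}@{}/{}".format(
        protocol, quote_plus(userid), quote_plus(password), server, database
    )

# Made-up credentials, for illustration only
userid, password = parse_creds("alice p@ss/word\n")
url = build_url(userid, password, "localhost", "OH_XC")
print(url)  # mysql+mysqlconnector://alice:p%40ss%2Fword@localhost/OH_XC
```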

In [2]:
# read credentials (user ID and password, whitespace-separated)
with open('creds.txt') as fileHD:
    creds = fileHD.read()
userid, password = creds.split()

# make connection
pattern = "{}://{}:{}@{}/{}"
protocol = 'mysql+mysqlconnector'
server = "jakku.cs.denison.edu"
database = 'OH_XC'
cstring = pattern.format(protocol,userid,password,server,database)
#print(cstring)

# connect to DB engine
engine = sa.create_engine(cstring)
connection = engine.connect()

### Select Statement
Get the required data from the database.  
We run 30 separate queries, one for each combination of
- years: 2017, 2018, 2019, 2020, and 2021
- genders: boys and girls
- divisions: I, II and III

Each of these queries returns results for DISTRICT, REGIONAL and STATE meets, along with team names, team IDs and school populations.
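
To make the shape of each query concrete, here is a minimal, self-contained sketch that runs the same join against an in-memory SQLite database using the standard-library `sqlite3` module. The two toy tables and their rows are invented purely for illustration (they are not data from `OH_XC`), and the query uses `?` placeholders instead of string formatting, which is an alternative to the approach taken in this notebook's query cell.

```python
import sqlite3

# Toy schema and rows, made up for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE School (BLDG_IRN INTEGER PRIMARY KEY, NAME TEXT, POPULATION INTEGER);
    CREATE TABLE XC_Results (BLDG_IRN INTEGER, YEAR INTEGER, DIVISION INTEGER,
                             GENDER TEXT, TYPE TEXT, PLACE INTEGER);
    INSERT INTO School VALUES (1, 'School A', 400), (2, 'School B', 250);
    INSERT INTO XC_Results VALUES
        (1, 2017, 1, 'B', 'DISTRICT', 3),
        (1, 2017, 1, 'B', 'REGIONAL', 7),
        (2, 2017, 1, 'B', 'DISTRICT', 5),
        (2, 2018, 1, 'B', 'DISTRICT', 2);
""")

query = (
    "SELECT S.BLDG_IRN, S.NAME, XC.YEAR, XC.DIVISION, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION "
    "FROM School AS S "
    "JOIN XC_Results AS XC USING(BLDG_IRN) "
    "WHERE XC.YEAR=? AND XC.GENDER=? AND XC.DIVISION=? "
    "ORDER BY S.POPULATION, S.NAME, XC.TYPE;"
)

# One (year, gender, division) combination; values are bound, not formatted in
rows = conn.execute(query, (2017, "B", 1)).fetchall()
for row in rows:
    print(row)
conn.close()
```

Rows come back sorted by school population first, so the smaller toy school is listed before the larger one, and the 2018 row is filtered out.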



In [3]:
for year in ['2017','2018','2019','2020','2021']:
    for gender in ['B','G']:
        for div in [1,2,3]:

            # SELECT S.NAME, XC.YEAR, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION FROM XC_Results AS XC
            # INNER JOIN School AS S ON XC.BLDG_IRN=S.BLDG_IRN
            # WHERE XC.YEAR=2017 AND XC.DIVISION=1 AND XC.GENDER='B' ORDER BY S.POPULATION, S.NAME, XC.TYPE;


            q1 = "SELECT S.BLDG_IRN, S.NAME, XC.YEAR, XC.DIVISION, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION " 
            q2 = "FROM School AS S " 
            q3 = "JOIN XC_Results AS XC USING(BLDG_IRN) " 
            q4 = "WHERE XC.YEAR={0} and XC.GENDER='{1}' and XC.DIVISION={2} ".format(year,gender,div)
            q5 = "ORDER BY S.POPULATION, S.NAME, XC.TYPE; "

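            query = q1+q2+q3+q4+q5
            print(query)
            # wrap the raw SQL in sa.text(); newer SQLAlchemy versions no longer accept a plain string here
            resultproxy = connection.execute(sa.text(query))
            df = pd.DataFrame(resultproxy.fetchall(),columns=resultproxy.keys())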
            #print(df)
            
            filename = 'D{0}_{1}_{2}.csv'.format(div,gender,year)
            df.to_csv(filename)            


SELECT S.BLDG_IRN, S.NAME, XC.YEAR, XC.DIVISION, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION FROM School AS S JOIN XC_Results AS XC USING(BLDG_IRN) WHERE XC.YEAR=2017 and XC.GENDER='B' and XC.DIVISION=1 ORDER BY S.POPULATION, S.NAME, XC.TYPE; 
SELECT S.BLDG_IRN, S.NAME, XC.YEAR, XC.DIVISION, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION FROM School AS S JOIN XC_Results AS XC USING(BLDG_IRN) WHERE XC.YEAR=2017 and XC.GENDER='B' and XC.DIVISION=2 ORDER BY S.POPULATION, S.NAME, XC.TYPE; 
SELECT S.BLDG_IRN, S.NAME, XC.YEAR, XC.DIVISION, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION FROM School AS S JOIN XC_Results AS XC USING(BLDG_IRN) WHERE XC.YEAR=2017 and XC.GENDER='B' and XC.DIVISION=3 ORDER BY S.POPULATION, S.NAME, XC.TYPE; 
SELECT S.BLDG_IRN, S.NAME, XC.YEAR, XC.DIVISION, XC.GENDER, XC.TYPE, XC.PLACE, S.POPULATION FROM School AS S JOIN XC_Results AS XC USING(BLDG_IRN) WHERE XC.YEAR=2017 and XC.GENDER='G' and XC.DIVISION=1 ORDER BY S.POPULATION, S.NAME, XC.TYPE; 
SELECT S.BLDG_IRN, S.NAME, X

In [4]:
connection.close()
engine.dispose()  # release any pooled connections held by the engine
