**Overview of Models**

The general model for Double Poisson supposes that, for a game between team A and team B, the number of runs scored by team A and team B, denoted $A$ and $B$ respectively, can be modeled by the Poisson random variables $A \sim Poisson(\mu_A)$ and $B \sim Poisson(\mu_B)$.

The probability of team A beating team B in a game where ties are allowed is $p(A > B)$.

Because softball games generally cannot end in a tie, we assume that each team's relative probability of winning in extra innings is the same as in regulation. The probability that team A wins is then $\frac{p(A > B)}{p(A > B) + p(B > A)}$.
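
As a minimal sketch of the calculation above, the following computes $p(A > B)$, $p(B > A)$, and the tie probability by summing over a truncated grid of scores, then renormalizes to remove ties. The function names and the `max_runs` truncation are illustrative assumptions; this is not the `DoublePoisson` module used later in this notebook.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed in log space for stability."""
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def double_poisson_probs(mu_a, mu_b, max_runs=60):
    """Return (p_win, p_loss, p_tie) for team A, truncating scores at max_runs."""
    pmf_a = [poisson_pmf(k, mu_a) for k in range(max_runs + 1)]
    pmf_b = [poisson_pmf(k, mu_b) for k in range(max_runs + 1)]
    p_win = p_loss = p_tie = 0.0
    for a, pa in enumerate(pmf_a):
        for b, pb in enumerate(pmf_b):
            if a > b:
                p_win += pa * pb
            elif a < b:
                p_loss += pa * pb
            else:
                p_tie += pa * pb
    return p_win, p_loss, p_tie

def win_prob_no_ties(mu_a, mu_b, max_runs=60):
    """P(A wins) when ties are resolved in proportion to the regulation odds.

    Undefined when both means are 0 (every game would be a 0-0 tie).
    """
    p_win, p_loss, _ = double_poisson_probs(mu_a, mu_b, max_runs)
    return p_win / (p_win + p_loss)
```

For example, `win_prob_no_ties(6.0, 3.0)` gives team A's tie-free win probability when it is expected to score 6 runs to its opponent's 3.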

The prior implementation defines $\mu_{A} = O_aV_b$ and $\mu_B = O_bV_a$, where, for a team $X$:
- $O_X$ is the "offensive attacking strength" of team X, defined to be the mean number of runs per game scored by team X.
- $V_X$ is the "defensive vulnerability" of team X, defined to be the mean number of runs allowed per game by team X, divided by the maximum mean number of runs allowed per game across all teams in the input dataset, so as to fix the maximum $V_X$ value at 1.
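
The two definitions can be sketched in plain Python as follows, reading $V_X$ as based on runs allowed. The input layout (one `(team, runs_for, runs_against)` tuple per team per game) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def base_strengths(games):
    """Compute O_X and max-scaled V_X from per-team game records.

    games: iterable of (team, runs_for, runs_against), one entry per team per game.
    Returns (offense, vulnerability) as two dicts keyed by team.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # runs for, runs against, games
    for team, runs_for, runs_against in games:
        entry = totals[team]
        entry[0] += runs_for
        entry[1] += runs_against
        entry[2] += 1

    offense = {team: rf / n for team, (rf, ra, n) in totals.items()}
    allowed = {team: ra / n for team, (rf, ra, n) in totals.items()}

    # Scale so that the most vulnerable defense has V_X = 1.
    max_allowed = max(allowed.values())
    vulnerability = {team: ra / max_allowed for team, ra in allowed.items()}
    return offense, vulnerability
```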

From game-by-game data, we propose four new factors affecting $\mu_A$ and $\mu_B$:

$O_{X_S}$ is the **schedule-adjusted offensive strength (SAOS)** of team X, defined to be the mean over all games played by team X in the given season of the value $\frac{R_{X_i}}{V_{o_i}}$, where:

- $R_{X_i}$ is the number of runs scored by team $X$ in game $i$
- $V_{o_i}$ is the defensive vulnerability of the opponent in game $i$
    
$V_{X_S}$ is the **schedule-adjusted defensive vulnerability (SADV)** of team X, defined to be the mean over all games played by team X in a given season of the value $\frac{R_{o_i}}{O_{o_i}}$, where:

- $R_{o_i}$ is the number of runs scored by the opponent of team $X$ in game $i$
- $O_{o_i}$ is the offensive strength of the opponent in game $i$
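
A minimal sketch of the two schedule-adjusted averages, assuming the opponents' offensive strength and defensive vulnerability are already available as dictionaries. The tuple layout and the function name are hypothetical. Note that the scale of the result depends on whether the vulnerability passed in is max-scaled or left in runs per game.

```python
def schedule_adjusted(games, offense, vulnerability):
    """Return {team: (SAOS, SADV)}.

    games: list of (team, opponent, runs_for, runs_against), one entry per team per game.
    offense / vulnerability: dicts keyed by team, giving O and V for each opponent.
    """
    per_game = {}
    for team, opp, runs_for, runs_against in games:
        saos_i = runs_for / vulnerability[opp]   # R_{X_i} / V_{o_i}
        sadv_i = runs_against / offense[opp]     # R_{o_i} / O_{o_i}
        per_game.setdefault(team, []).append((saos_i, sadv_i))

    result = {}
    for team, values in per_game.items():
        n = len(values)
        result[team] = (sum(v[0] for v in values) / n,
                        sum(v[1] for v in values) / n)
    return result
```

The conference-adjusted values follow the same per-game ratios, averaged over a conference's non-conference games instead of over one team's schedule.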
    
$O_{X_C}$ is the **conference-adjusted offensive strength (CAOS)** of team X, defined to be the mean over all non-conference games played by teams in team X's conference against Division I opponents of the value $\frac{R_{X_i}}{V_{o_i}}$, with the numerator and denominator defined as above.

$V_{X_C}$ is the **conference-adjusted defensive vulnerability (CADV)** of team X, defined to be the mean over all non-conference games played by teams in team X's conference against Division I opponents of the value $\frac{R_{o_i}}{O_{o_i}}$, with the numerator and denominator defined as above.

We suppose that a "truly average" team would score as many runs as its opponent usually gives up and allow as many runs as its opponent would usually score, so the values of the four variables above for this theoretical average team would be 1.

The three modified models using the four new factors above are:

1. Schedule-Adjusted Double Poisson, in which $\mu_A = (O_a)(V_b)(O_{a_S})(V_{b_S})$ and $\mu_B = (O_b)(V_a)(O_{b_S})(V_{a_S})$
    
2. Conference-Adjusted Double Poisson, in which $\mu_A = (O_a)(V_b)(O_{a_C})(V_{b_C})$ and $\mu_B = (O_b)(V_a)(O_{b_C})(V_{a_C})$
3. Conference-and-Schedule-Adjusted Double Poisson, in which $\mu_A = (O_a)(V_b)(O_{a_S})(V_{b_S})(O_{a_C})(V_{b_C})$ and $\mu_B = (O_b)(V_a)(O_{b_S})(V_{a_S})(O_{b_C})(V_{a_C})$
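
The three adjusted models differ from the base model only in which extra factor pairs are multiplied into each mean. A small sketch, with hypothetical dictionary keys (`"O"`, `"V"`, `"O_S"`, `"V_S"`, `"O_C"`, `"V_C"`) standing in for the six quantities defined above:

```python
# Which adjustment suffixes each model multiplies in (labels are illustrative).
MODEL_FACTORS = {
    "BASE": [],
    "SADP": ["S"],
    "CADP": ["C"],
    "CSADP": ["S", "C"],
}

def expected_runs(team, opp, model):
    """Return (mu_team, mu_opp) for the given model.

    team / opp: dicts with keys "O", "V", "O_S", "V_S", "O_C", "V_C".
    """
    mu_team = team["O"] * opp["V"]
    mu_opp = opp["O"] * team["V"]
    for suffix in MODEL_FACTORS[model]:
        mu_team *= team["O_" + suffix] * opp["V_" + suffix]
        mu_opp *= opp["O_" + suffix] * team["V_" + suffix]
    return mu_team, mu_opp
```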

**Overview of Use Case**

Each spring, 64 teams are chosen to advance to the NCAA Division I Softball Championship. 32 of these teams qualify by winning their respective conference's tournament, while 32 are chosen at-large. 

Each team is placed in one of sixteen four-team regionals, each played in double-elimination format at a host site (the host is almost always one of the four competing teams). The initial point of interest is predicting the winner of each of the sixteen regionals, as well as each team's finish (regional winner, runner-up, third-place team, and winless team). Each regional will be simulated 10,000 times under each Double Poisson model.
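
One simulated regional could look like the sketch below. The bracket layout (seed 1 vs. seed 4 and seed 2 vs. seed 3 in the opening round, with a rematch if the unbeaten team loses the first final) is an assumption about the format, and the function is an illustrative stand-in rather than the `SimulateFourTeamDE` routine called later in this notebook. `p_win[a][b]` is assumed to hold the probability that team `a` beats team `b`.

```python
import random

def play(p_win, team, opp, rng):
    """Return (winner, loser) of one game; p_win[team][opp] is P(team beats opp)."""
    if rng.random() < p_win[team][opp]:
        return team, opp
    return opp, team

def simulate_four_team_de(p_win, s1, s2, s3, s4, rng=random):
    """Simulate one four-team double-elimination bracket.

    Returns [winner, runner_up, one_win_team, no_win_team].
    """
    w1, l1 = play(p_win, s1, s4, rng)            # Game 1
    w2, l2 = play(p_win, s2, s3, rng)            # Game 2
    w3, l3 = play(p_win, w1, w2, rng)            # Game 3: winners' bracket
    w4, no_win = play(p_win, l1, l2, rng)        # Game 4: loser is out at 0-2
    w5, one_win = play(p_win, l3, w4, rng)       # Game 5: loser is out at 1-2
    champ, runner_up = play(p_win, w3, w5, rng)  # Game 6: first final
    if champ != w3:
        # The previously unbeaten team now has one loss, so a deciding game is played.
        champ, runner_up = play(p_win, w3, w5, rng)
    return [champ, runner_up, one_win, no_win]
```

Repeating this 10,000 times per regional and tallying the four returned positions gives the simulated distribution of finishes for one model.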

**Overview of Inputs**

The regional round of the 2021-2024 editions of the NCAA Division I Softball Championship will be investigated. The 64 regional sets are stored in the file `regionals.csv` in the current directory.

Game-by-game records are stored in `SB_2024_DPGameByGame.csv` ... `SB_2021_DPGameByGame.csv` in a separate directory.

To identify teams by conference, the TCN (Team-Conference-Number) dataframes used in web scraping will be used. These are stored in `24_TCN.csv` ... `21_TCN.csv` in a separate directory.

**Overview of Outputs**

The analysis should retain both the raw data created by the simulations *and* a high-level output to use in evaluating the models.

The raw data `.csv` output should include, for each simulated regional (640k lines per year), the following fields:

|Identifying Data|Model Results|
|---|---|
|Year|Predicted Regional Winner|
|Regional Host|Predicted Regional Runner-Up|
|Regional Host Seed|Predicted Regional 1-2 Team|
|Model Type ("BASE", "SADP", "CADP", or "CSADP")|Predicted Regional 0-2 Team|

The high-level `.csv` output should include one line per regional (32 lines), with the following fields:

|Identifying Data|Model Evaluation Results (done *for each model*)|
|---|---|
|Year|Proportion of times *Winner* was correctly predicted|
|Regional Host|Proportion of times *Runner-Up* was correctly predicted|
|Regional Host Seed|Proportion of times *1-2 Team* was correctly predicted|
|Regional Winner|Proportion of times *0-2 Team* was correctly predicted|
|Regional Runner-Up|Proportion of times *all teams' finishes* were correctly predicted|
|Regional 1-2 Team| |
|Regional 0-2 Team| |
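
The evaluation columns can be derived from the raw simulation results by comparing each simulated finish with the actual one. A minimal sketch, assuming each simulated result is a four-item list in the order winner, runner-up, 1-2 team, 0-2 team (the function name and output keys are hypothetical):

```python
def finish_accuracy(simulated, actual):
    """Proportion of simulations matching the actual finish, per position and overall.

    simulated: list of [winner, runner_up, one_win_team, no_win_team] lists.
    actual: the real finish, in the same four-item order.
    """
    n = len(simulated)
    labels = ["Winner", "RunnerUp", "OneWinTeam", "NoWinTeam"]
    out = {}
    for pos, label in enumerate(labels):
        out[label] = sum(sim[pos] == actual[pos] for sim in simulated) / n
    # A simulation counts toward "AllTeams" only if all four positions match.
    out["AllTeams"] = sum(list(sim) == list(actual) for sim in simulated) / n
    return out
```

Dividing by the number of simulations is what turns raw match counts into the proportions listed in the table.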

**Code Outline**

0. Set Input and Output Filepaths
*Loop over Years (2022, 2023, 2024)*
1. Import Data
    - Yearly game-by-game file -> `gbgDF`
    - Yearly team-conference-number file -> `tcnDF`
    - Drop `gbgDF` down to only required values. 
    - Get rid of rows in `gbgDF` that are against non-Division I teams.
    - Use `tcnDF` to add team's conference and opponent conference columns to `gbgDF`.
2. Calculate Required Values (for each team, each year)
    - Offensive Strength/Defensive Vulnerability
        - Construct new DF by grouping by team. Call it `strengthsDF` and rename columns appropriately.
    - Schedule-Adjusted Offensive Strength/Defensive Vulnerability
        - Merge `gbgDF` with `strengthsDF` on `gbgDF.OppUID = strengthsDF.UID`
        - Calculate SAOS and SADV values for each game as new columns.
        - Group by UID to generate new DF of UID, SAOS, SADV values -> `SAOsDvDF`
        - Merge `SAOsDvDF` with `strengthsDF`.
    - Conference-Adjusted Offensive Strength/Defensive Vulnerability
        - Group `gbgDF` by team conference to generate DF of average runs scored/against -> `confStrengthsDF`
        - Scale `confStrengthsDF` to make sure maximum value of DV is 1.
        - Merge `confStrengthsDF` with `gbgDF` on same as above.
        - Calculate CAOS and CADV values for each game as new columns.
        - Group `tcnDF` by conference to create new DF with CAOS, CADV values -> `CAOsDvDF`
        - Merge `CAOsDvDF` with `strengthsDF`.
    - Output `strengthsDF` as CSV/XLSX.
3. Make Predictions
    - *Loop over regionals in given year:*
        - Build `regStrengthDF` with all teams in regionals' strengths.
        - Traditional Double-Poisson:
            - Build `tradProbDF` with probability of winning/losing under traditional two-input Double Poisson model.
            - Simulate tournament 10,000 times. Within each, add 

**Code**

In [101]:
display(gbgDF)
display(tcnDF)
display(regionalDF)
display(strengthsDF)

Unnamed: 0,Year,Date,TeamUID,Team,Opponent,Conference,OffRuns,OppOffRuns,TeamConference,OppConference,OppUID,OppRunsScoredPerGame,OppRunsAllowedPerGame,SAOS,SADV
0,2023,2023-02-09,2023.SB.Oklahoma,Oklahoma,Liberty,0,1.0,0.0,Big 12,ASUN,2023.SB.Liberty,5.109091,0.354670,2.819522,0.000000
1,2023,2023-02-09,2023.SB.Oklahoma,Oklahoma,Duke,0,4.0,0.0,Big 12,ACC,2023.SB.Duke,6.127273,0.308842,12.951613,0.000000
2,2023,2023-02-10,2023.SB.Oklahoma,Oklahoma,Stanford,0,10.0,1.0,Big 12,Pac-12,2023.SB.Stanford,4.471698,0.188162,53.145604,0.223629
3,2023,2023-02-11,2023.SB.Oklahoma,Oklahoma,Washington,0,5.0,4.0,Big 12,Pac-12,2023.SB.Washington,5.693878,0.313112,15.968750,0.702509
4,2023,2023-02-12,2023.SB.Oklahoma,Oklahoma,San Jose St.,0,9.0,0.0,Big 12,Mountain West,2023.SB.SanJoseSt,3.673077,0.554268,16.237643,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14359,2023,2023-04-22,2023.SB.MississippiVal,Mississippi Val.,Jackson St.,1,0.0,3.0,SWAC,SWAC,2023.SB.JacksonSt,4.767442,0.598917,0.000000,0.629268
14360,2023,2023-04-23,2023.SB.MississippiVal,Mississippi Val.,Jackson St.,1,1.0,11.0,SWAC,SWAC,2023.SB.JacksonSt,4.767442,0.598917,1.669681,2.307317
14361,2023,2023-04-28,2023.SB.MississippiVal,Mississippi Val.,Alabama St.,1,0.0,8.0,SWAC,SWAC,2023.SB.AlabamaSt,4.777778,0.523592,0.000000,1.674419
14362,2023,2023-04-29,2023.SB.MississippiVal,Mississippi Val.,Alabama St.,1,3.0,5.0,SWAC,SWAC,2023.SB.AlabamaSt,4.777778,0.523592,5.729651,1.046512


Unnamed: 0,Team,TeamConference
0,Oklahoma,Big 12
1,UCLA,Pac-12
2,Florida St.,ACC
3,Boston U.,Patriot
4,Tennessee,SEC
...,...,...
290,Southern Utah,WAC
291,CSU Bakersfield,Big West
292,Tennessee Tech,OVC
293,Detroit Mercy,Horizon


Unnamed: 0,RegionalYear,RegionalSite,HostSeed,Seed1UID,Seed2UID,Seed3UID,Seed4UID,WinnerUID,RunnerUpUID,OneWinUID,NoWinUID
0,2023,Norman,1,2023.SB.Oklahoma,2023.SB.California,2023.SB.Missouri,2023.SB.Hofstra,2023.SB.Oklahoma,2023.SB.California,2023.SB.Missouri,2023.SB.Hofstra
1,2023,Los Angeles,2,2023.SB.UCLA,2023.SB.Liberty,2023.SB.SanDiegoSt,2023.SB.GrandCanyon,2023.SB.SanDiegoSt,2023.SB.Liberty,2023.SB.GrandCanyon,2023.SB.UCLA
2,2023,Tallahassee,3,2023.SB.FloridaSt,2023.SB.SouthCarolina,2023.SB.UCF,2023.SB.Marist,2023.SB.FloridaSt,2023.SB.SouthCarolina,2023.SB.UCF,2023.SB.Marist
3,2023,Knoxville,4,2023.SB.Tennessee,2023.SB.Indiana,2023.SB.Louisville,2023.SB.NorthernKy,2023.SB.Tennessee,2023.SB.Indiana,2023.SB.Louisville,2023.SB.NorthernKy
4,2023,Tuscaloosa,5,2023.SB.Alabama,2023.SB.CentralArk,2023.SB.MiddleTenn,2023.SB.LIU,2023.SB.Alabama,2023.SB.MiddleTenn,2023.SB.CentralArk,2023.SB.LIU
5,2023,Stillwater,6,2023.SB.OklahomaSt,2023.SB.WichitaSt,2023.SB.Nebraska,2023.SB.UMBC,2023.SB.OklahomaSt,2023.SB.Nebraska,2023.SB.WichitaSt,2023.SB.UMBC
6,2023,Seattle,7,2023.SB.Washington,2023.SB.Minnesota,2023.SB.McNeese,2023.SB.NorthernColo,2023.SB.Washington,2023.SB.McNeese,2023.SB.Minnesota,2023.SB.NorthernColo
7,2023,Durham,8,2023.SB.Duke,2023.SB.Charlotte,2023.SB.Campbell,2023.SB.GeorgeMason,2023.SB.Duke,2023.SB.Charlotte,2023.SB.Campbell,2023.SB.GeorgeMason
8,2023,Stanford,9,2023.SB.Stanford,2023.SB.Florida,2023.SB.LMU-CA,2023.SB.LongBeachSt,2023.SB.Stanford,2023.SB.Florida,2023.SB.LMU-CA,2023.SB.LongBeachSt
9,2023,Baton Rouge,10,2023.SB.LSU,2023.SB.Louisiana,2023.SB.Omaha,2023.SB.PrairieView,2023.SB.Louisiana,2023.SB.LSU,2023.SB.Omaha,2023.SB.PrairieView


TeamUID,TeamConference,OffStrength,DefVulnerability,SAOS,SADV,CAOS,CADV
2023.SB.A&M-CorpusChristi,Southland,3.266667,0.414003,6.855081,0.853768,8.290895,0.876554
2023.SB.AbileneChristian,WAC,4.311111,0.771994,8.626102,1.646356,8.587848,1.083580
2023.SB.Akron,MAC,4.530612,0.404808,8.998650,0.862807,7.885488,1.149676
2023.SB.Alabama,SEC,4.775862,0.266415,13.488565,0.496350,15.228775,0.474981
2023.SB.AlabamaA&M,SWAC,5.314286,0.704501,8.520124,1.714515,4.027777,1.776338
...,...,...,...,...,...,...,...
2023.SB.WichitaSt,AAC,6.711538,0.278188,15.342877,0.598579,9.958578,0.977140
2023.SB.Winthrop,Big South,2.938776,0.355605,6.603068,0.816616,8.013354,0.989106
2023.SB.Wisconsin,Big Ten,4.266667,0.338508,9.824049,0.697409,11.328819,0.662576
2023.SB.Yale,Ivy League,4.071429,0.524462,8.488251,1.209634,6.931182,1.258177


In [36]:
## STEP 0: SETTING IMPORT AND EXPORT PATHS

import pandas as pd
import numpy as np
import DoublePoisson as dp
import os
from datetime import datetime
import matplotlib.pyplot as plt
pd.options.mode.copy_on_write = True

# Set global date variables
MAX_MONTH = 5
MAX_DAY = 16
SIMULATIONS = 10000

# Set data import path
gbgPath = os.getcwd() + "\\Stat-Files"
tcnPath = os.getcwd() + "\\TCN-Files"

# Set data export path
outPath = os.getcwd() + "\\Analysis-Outputs"
probDFOutPath = os.getcwd() + "\\Analysis-Outputs\\ProbDFs"

# import overall regionals DF
regionalsDF = pd.read_csv(os.getcwd() + "\\Analysis-Inputs\\regionals.csv")

for year in [2024, 2023, 2022]:
    ## STEP 1a: IMPORT DATA
    shortYear = str(year)[2:]
    gbgFileName = gbgPath + f"\\SB_{year}_DPGameByGame.csv"
    tcnFileName = tcnPath + f"\\{year}_TCN.csv"
    gbgDF = pd.read_csv(gbgFileName)
    tcnDF = pd.read_csv(tcnFileName)
    regionalDF = regionalsDF[regionalsDF["RegionalYear"] == year]
    regionalDF.reset_index(inplace=True, drop=True)
    
    ## STEP 1b: CLEAN DATA
    # Basic Cleaning
    tcnDF = tcnDF[["Team", "Conference"]]
    gbgDF = gbgDF[["Year", "Date", "Conference", "Team", "TeamUID", "TeamConference", "Opponent", "OppUID", "OppConference", "RunsFor", "RunsAgainst"]]
#     tcnDF.rename(columns={"Conference" : "TeamConference"}, inplace=True)
#     gbgDF = gbgDF.merge(tcnDF, on="Team")
    gbgDF["Date"] = pd.to_datetime(gbgDF["Date"])
#     gbgDF.rename(columns={"UID" : "TeamUID"}, inplace=True)
#     tcnDF.rename(columns={"TeamConference" : "OppConference", "Team" : "Opponent"}, inplace=True)
#     gbgDF = gbgDF.merge(tcnDF, on="Opponent", how="inner")  # This eliminates non-D1 opponent rows!!
#     tcnDF.rename(columns={"OppConference" : "TeamConference", "Opponent" : "Team"}, inplace=True)
    
    # Keep only games played before the cutoff date (drops games on or after MAX_MONTH/MAX_DAY)
    gbgDF = gbgDF[gbgDF["Date"] < datetime(year, MAX_MONTH, MAX_DAY)]
    gbgDF.reset_index(drop=True, inplace=True)
    
    # Forming OppUID
#     oppUIDList = []
#     for team in gbgDF["Opponent"]:
#         oppUID = team.replace(" ", "").replace(" (", "-").replace(")", "").replace(".", "")
#         oppUID = str(year) + ".SB." + oppUID
#         oppUIDList.append(oppUID)
#     gbgDF["OppUID"] = oppUIDList
    
    ## STEP 2: CALCULATING REQUIRED VALUES
    # 2a: Basic Values
    strengthsDF = gbgDF.loc[:, ["TeamUID", "TeamConference","RunsFor", "RunsAgainst"]].groupby(["TeamUID", "TeamConference"]).mean()
    strengthsDF.rename(columns={"RunsFor" : "OffStrength", "RunsAgainst" : "DefVulnerability"}, inplace=True)
    
    # 2b: Schedule-Adjusted Values
    gbgDF = gbgDF.merge(strengthsDF.reset_index().rename(columns={"OffStrength" : "OppRunsScoredPerGame",
                                                               "DefVulnerability" : "OppRunsAllowedPerGame",
                                                               "TeamUID" : "OppUID",
                                                                "TeamConference" : "OppConference"}), 
                                                                on=["OppUID", "OppConference"])
    gbgDF["SAOS"] = gbgDF["RunsFor"] / gbgDF["OppRunsAllowedPerGame"]
    gbgDF["SADV"] = gbgDF["RunsAgainst"] / gbgDF["OppRunsScoredPerGame"]
    SAOsDvDF = gbgDF[["TeamUID","SAOS", "SADV"]].groupby("TeamUID").mean()
    strengthsDF = strengthsDF.merge(SAOsDvDF, left_index=True, right_index=True)
    
    # 2c: Conference-Adjusted Values
    gbgNonConOnlyDF = gbgDF[gbgDF["Conference"] == 0]
    confStrengthsDF = gbgNonConOnlyDF[["TeamConference", "SAOS", "SADV"]].groupby("TeamConference").mean()
    confStrengthsDF.rename(columns={"SAOS" : "CAOS", "SADV" : "CADV"}, inplace=True)
    strengthsDF.reset_index(inplace=True)
    strengthsDF = strengthsDF.merge(confStrengthsDF, on="TeamConference")
    # Scale defensive vulnerability by its mean across teams, so an average defense has value 1.
    strengthsDF["DefVulnerability"] = strengthsDF["DefVulnerability"] / strengthsDF["DefVulnerability"].mean()
    strengthsDF.set_index("TeamUID", inplace=True)
    
    # 2d: Export strengthsDF
    strengthsDF.to_csv(outPath + f"\\{year}_DP_strengths.csv")
    
    # STEP 3: ACTUALLY SIMULATE SOME FREAKIN' REGIONALS!!
    # Create empty arrays to be filled: one row per regional, one column per model.
    winnerCorrect = np.zeros(((len(regionalDF)), 4))
    runnerUpCorrect = np.zeros(((len(regionalDF)), 4))
    oneWinTeamCorrect = np.zeros(((len(regionalDF)), 4))
    noWinTeamCorrect = np.zeros(((len(regionalDF)), 4))
    allTeamsCorrect = np.zeros(((len(regionalDF)), 4))
    
    for regNum in range(len(regionalDF)): # should always be 16
        regUIDs = [regionalDF.loc[regNum, "Seed1UID"], regionalDF.loc[regNum, "Seed2UID"], 
                   regionalDF.loc[regNum, "Seed3UID"], regionalDF.loc[regNum, "Seed4UID"]]
        actualWinner = regionalDF.loc[regNum, "WinnerUID"]
        actualFinishers = [regionalDF.loc[regNum, "WinnerUID"], regionalDF.loc[regNum, "RunnerUpUID"], 
                       regionalDF.loc[regNum, "OneWinUID"], regionalDF.loc[regNum, "NoWinUID"]]
        hostSite = regionalDF.loc[regNum, "RegionalSite"]
        hostSeed = regionalDF.loc[regNum, "HostSeed"]
        
        # 3a: Implementing Base Double Poisson (where D_v is relative to mean runs allowed)
        # Calculating probabilities
        probArray = np.zeros((len(regUIDs), len(regUIDs)))
        for i in range(len(regUIDs)):
            uid = regUIDs[i]
            teamOa = strengthsDF.loc[uid, "OffStrength"]
            teamDv = strengthsDF.loc[uid, "DefVulnerability"]
            for j in range(i+1, len(regUIDs)):
                oppUID = regUIDs[j]
                oppOa = strengthsDF.loc[oppUID, "OffStrength"]
                oppDv = strengthsDF.loc[oppUID, "DefVulnerability"]
                teamMu = teamOa*oppDv
                oppMu = oppOa*teamDv
                (pWin, pLoss, pTie) = dp.DoublePoissonSimpleStrengths(teamMu, oppMu, allowTies=False)
                probArray[i, j] = pWin
                probArray[j, i] = pLoss

        probDF = pd.DataFrame(probArray, index=regUIDs, columns=regUIDs)
        probDF.to_csv(probDFOutPath + f"\\{year}_{hostSeed}_{hostSite}_Base_ProbDF.csv")
        
        winnerList = []; runnerUpList = []; oneWinList = []; noWinList = [];
        for i in range(SIMULATIONS):
            resultList = dp.SimulateFourTeamDE(probDF, regUIDs[0], regUIDs[1], regUIDs[2], regUIDs[3])
            winnerList.append(resultList[0])
            runnerUpList.append(resultList[1])
            oneWinList.append(resultList[2])
            noWinList.append(resultList[3])
        resultDF = pd.DataFrame({"RegionalYear" : [year] * SIMULATIONS, "RegionalHost" : [hostSite] * SIMULATIONS,
                                     "HostSeed" : [hostSeed] * SIMULATIONS, "ModelType" : ["BASE"] * SIMULATIONS,
                                     "Predicted Winner" : winnerList, "Predicted Runner-Up" :runnerUpList,
                                     "Predicted 1-2 Team" : oneWinList, "Predicted 0-2 Team" : noWinList})
        winnerCorrect[regNum, 0] = sum(resultDF["Predicted Winner"] == actualWinner)
        runnerUpCorrect[regNum, 0] = sum(resultDF["Predicted Runner-Up"] == actualFinishers[1])
        oneWinTeamCorrect[regNum, 0] = sum(resultDF["Predicted 1-2 Team"] == actualFinishers[2])
        noWinTeamCorrect[regNum, 0] = sum(resultDF["Predicted 0-2 Team"] == actualFinishers[3])
        allTeamsCorrect[regNum, 0] = sum((resultDF["Predicted Winner"] == actualWinner) & 
                                         (resultDF["Predicted Runner-Up"]==actualFinishers[1]) & 
                                         (resultDF["Predicted 1-2 Team"]==actualFinishers[2]) & 
                                         (resultDF["Predicted 0-2 Team"]==actualFinishers[3]))
        if regNum == 0: ovrResultDF = resultDF;
        else: ovrResultDF = pd.concat([ovrResultDF, resultDF]);
            
        # 3b: Implementing Schedule-Adjusted Double Poisson
        probArray = np.zeros((len(regUIDs), len(regUIDs)))
        for i in range(len(regUIDs)):
            uid = regUIDs[i]
            teamOa = strengthsDF.loc[uid, "OffStrength"]
            teamDv = strengthsDF.loc[uid, "DefVulnerability"]
            teamSAOS = strengthsDF.loc[uid, "SAOS"]
            teamSADV = strengthsDF.loc[uid, "SADV"]
            for j in range(i+1, len(regUIDs)):
                oppUID = regUIDs[j]
                oppOa = strengthsDF.loc[oppUID, "OffStrength"]
                oppDv = strengthsDF.loc[oppUID, "DefVulnerability"]
                oppSAOS = strengthsDF.loc[oppUID, "SAOS"]
                oppSADV = strengthsDF.loc[oppUID, "SADV"]
                teamMu = teamOa*oppDv*teamSAOS*oppSADV
                oppMu = oppOa*teamDv*oppSAOS*teamSADV
                (pWin, pLoss, pTie) = dp.DoublePoissonSimpleStrengths(teamMu, oppMu, allowTies=False)
                probArray[i, j] = pWin
                probArray[j, i] = pLoss
        probDF = pd.DataFrame(probArray, index=regUIDs, columns=regUIDs)
        probDF.to_csv(probDFOutPath + f"\\{year}_{hostSeed}_{hostSite}_SADP_ProbDF.csv")
        
        winnerList = []; runnerUpList = []; oneWinList = []; noWinList = [];
        for i in range(SIMULATIONS):
            resultList = dp.SimulateFourTeamDE(probDF, regUIDs[0], regUIDs[1], regUIDs[2], regUIDs[3])
            winnerList.append(resultList[0])
            runnerUpList.append(resultList[1])
            oneWinList.append(resultList[2])
            noWinList.append(resultList[3])
        resultDF = pd.DataFrame({"RegionalYear" : [year] * SIMULATIONS, "RegionalHost" : [hostSite] * SIMULATIONS,
                                     "HostSeed" : [hostSeed] * SIMULATIONS, "ModelType" : ["SADP"] * SIMULATIONS,
                                     "Predicted Winner" : winnerList, "Predicted Runner-Up" :runnerUpList,
                                     "Predicted 1-2 Team" : oneWinList, "Predicted 0-2 Team" : noWinList})
        # second column of correct array structure is for schedule-adjusted double poisson
        winnerCorrect[regNum, 1] = sum(resultDF["Predicted Winner"] == actualWinner)
        runnerUpCorrect[regNum, 1] = sum(resultDF["Predicted Runner-Up"] == actualFinishers[1])
        oneWinTeamCorrect[regNum, 1] = sum(resultDF["Predicted 1-2 Team"] == actualFinishers[2])
        noWinTeamCorrect[regNum, 1] = sum(resultDF["Predicted 0-2 Team"] == actualFinishers[3])
        allTeamsCorrect[regNum, 1] = sum((resultDF["Predicted Winner"] == actualWinner) & 
                                         (resultDF["Predicted Runner-Up"]==actualFinishers[1]) & 
                                         (resultDF["Predicted 1-2 Team"]==actualFinishers[2]) & 
                                         (resultDF["Predicted 0-2 Team"]==actualFinishers[3]))
        ovrResultDF = pd.concat([ovrResultDF, resultDF]);
        
        # 3c: Implementing Conference-Adjusted Double Poisson
        probArray = np.zeros((len(regUIDs), len(regUIDs)))
        for i in range(len(regUIDs)):
            uid = regUIDs[i]
            teamOa = strengthsDF.loc[uid, "OffStrength"]
            teamDv = strengthsDF.loc[uid, "DefVulnerability"]
            teamCAOS = strengthsDF.loc[uid, "CAOS"]
            teamCADV = strengthsDF.loc[uid, "CADV"]
            for j in range(i+1, len(regUIDs)):
                oppUID = regUIDs[j]
                oppOa = strengthsDF.loc[oppUID, "OffStrength"]
                oppDv = strengthsDF.loc[oppUID, "DefVulnerability"]
                oppCAOS = strengthsDF.loc[oppUID, "CAOS"]
                oppCADV = strengthsDF.loc[oppUID, "CADV"]
                teamMu = teamOa*oppDv*teamCAOS*oppCADV
                oppMu = oppOa*teamDv*oppCAOS*teamCADV
                (pWin, pLoss, pTie) = dp.DoublePoissonSimpleStrengths(teamMu, oppMu, allowTies=False)
                probArray[i, j] = pWin
                probArray[j, i] = pLoss
        probDF = pd.DataFrame(probArray, index=regUIDs, columns=regUIDs)
        probDF.to_csv(probDFOutPath + f"\\{year}_{hostSeed}_{hostSite}_CADP_ProbDF.csv")
        
        winnerList = []; runnerUpList = []; oneWinList = []; noWinList = [];
        for i in range(SIMULATIONS):
            resultList = dp.SimulateFourTeamDE(probDF, regUIDs[0], regUIDs[1], regUIDs[2], regUIDs[3])
            winnerList.append(resultList[0])
            runnerUpList.append(resultList[1])
            oneWinList.append(resultList[2])
            noWinList.append(resultList[3])
        resultDF = pd.DataFrame({"RegionalYear" : [year] * SIMULATIONS, "RegionalHost" : [hostSite] * SIMULATIONS,
                                     "HostSeed" : [hostSeed] * SIMULATIONS, "ModelType" : ["CADP"] * SIMULATIONS,
                                     "Predicted Winner" : winnerList, "Predicted Runner-Up" :runnerUpList,
                                     "Predicted 1-2 Team" : oneWinList, "Predicted 0-2 Team" : noWinList})
        # third column of correct array structure is for conference-adjusted double poisson
        winnerCorrect[regNum, 2] = sum(resultDF["Predicted Winner"] == actualWinner)
        runnerUpCorrect[regNum, 2] = sum(resultDF["Predicted Runner-Up"] == actualFinishers[1])
        oneWinTeamCorrect[regNum, 2] = sum(resultDF["Predicted 1-2 Team"] == actualFinishers[2])
        noWinTeamCorrect[regNum, 2] = sum(resultDF["Predicted 0-2 Team"] == actualFinishers[3])
        allTeamsCorrect[regNum, 2] = sum((resultDF["Predicted Winner"] == actualWinner) & 
                                         (resultDF["Predicted Runner-Up"]==actualFinishers[1]) & 
                                         (resultDF["Predicted 1-2 Team"]==actualFinishers[2]) & 
                                         (resultDF["Predicted 0-2 Team"]==actualFinishers[3]))
        ovrResultDF = pd.concat([ovrResultDF, resultDF]);

        # 3d: Implementing Conference-and-Schedule-Adjusted Double Poisson
        probArray = np.zeros((len(regUIDs), len(regUIDs)))
        for i in range(len(regUIDs)):
            uid = regUIDs[i]
            teamOa = strengthsDF.loc[uid, "OffStrength"]
            teamDv = strengthsDF.loc[uid, "DefVulnerability"]
            teamSAOS = strengthsDF.loc[uid, "SAOS"]
            teamSADV = strengthsDF.loc[uid, "SADV"]
            teamCAOS = strengthsDF.loc[uid, "CAOS"]
            teamCADV = strengthsDF.loc[uid, "CADV"]
            for j in range(i+1, len(regUIDs)):
                oppUID = regUIDs[j]
                oppOa = strengthsDF.loc[oppUID, "OffStrength"]
                oppDv = strengthsDF.loc[oppUID, "DefVulnerability"]
                oppCAOS = strengthsDF.loc[oppUID, "CAOS"]
                oppCADV = strengthsDF.loc[oppUID, "CADV"]
                oppSAOS = strengthsDF.loc[oppUID, "SAOS"]
                oppSADV = strengthsDF.loc[oppUID, "SADV"]
                teamMu = teamOa*oppDv*teamCAOS*oppCADV*teamSAOS*oppSADV
                oppMu = oppOa*teamDv*oppCAOS*teamCADV*oppSAOS*teamSADV
                (pWin, pLoss, pTie) = dp.DoublePoissonSimpleStrengths(teamMu, oppMu, allowTies=False)
                probArray[i, j] = pWin
                probArray[j, i] = pLoss
        probDF = pd.DataFrame(probArray, index=regUIDs, columns=regUIDs)
        probDF.to_csv(probDFOutPath + f"\\{year}_{hostSeed}_{hostSite}_CSADP_ProbDF.csv")
   
        winnerList = []; runnerUpList = []; oneWinList = []; noWinList = [];
        for i in range(SIMULATIONS):
            resultList = dp.SimulateFourTeamDE(probDF, regUIDs[0], regUIDs[1], regUIDs[2], regUIDs[3])
            winnerList.append(resultList[0])
            runnerUpList.append(resultList[1])
            oneWinList.append(resultList[2])
            noWinList.append(resultList[3])
        resultDF = pd.DataFrame({"RegionalYear" : [year] * SIMULATIONS, "RegionalHost" : [hostSite] * SIMULATIONS,
                                     "HostSeed" : [hostSeed] * SIMULATIONS, "ModelType" : ["CSADP"] * SIMULATIONS,
                                     "Predicted Winner" : winnerList, "Predicted Runner-Up" :runnerUpList,
                                     "Predicted 1-2 Team" : oneWinList, "Predicted 0-2 Team" : noWinList})
        # fourth column of correct array structure is for schedule-and-conference-adjusted double poisson
        winnerCorrect[regNum, 3] = sum(resultDF["Predicted Winner"] == actualWinner)
        runnerUpCorrect[regNum, 3] = sum(resultDF["Predicted Runner-Up"] == actualFinishers[1])
        oneWinTeamCorrect[regNum, 3] = sum(resultDF["Predicted 1-2 Team"] == actualFinishers[2])
        noWinTeamCorrect[regNum, 3] = sum(resultDF["Predicted 0-2 Team"] == actualFinishers[3])
        allTeamsCorrect[regNum, 3] = sum((resultDF["Predicted Winner"] == actualWinner) & 
                                         (resultDF["Predicted Runner-Up"]==actualFinishers[1]) & 
                                         (resultDF["Predicted 1-2 Team"]==actualFinishers[2]) & 
                                         (resultDF["Predicted 0-2 Team"]==actualFinishers[3]))
        ovrResultDF = pd.concat([ovrResultDF, resultDF]);
        
    # Output and analyze results.
    ovrResultDF.to_csv(outPath + f"\\SB_{year}_DP_RawResults.csv")

    highLevelDF = regionalDF.copy()  # copy so the added result columns do not alter regionalDF
    
    for (modelType, i) in [("Base", 0), ("SADP", 1), ("CADP", 2), ("CSADP", 3)]:
        highLevelDF.loc[:, modelType + "WinnerCorrect"] = winnerCorrect[:, i]
        highLevelDF.loc[:, modelType + "RunnerUpCorrect"] = runnerUpCorrect[:, i]
        highLevelDF.loc[:, modelType + "OneWinTeamCorrect"] = oneWinTeamCorrect[:, i]
        highLevelDF.loc[:, modelType + "NoWinTeamCorrect"] = noWinTeamCorrect[:, i]
        highLevelDF.loc[:, modelType + "AllTeamsCorrect"] = allTeamsCorrect[:, i]
    
    highLevelDF.to_csv(outPath + f"\\SB_{year}_DP_HighLevelResults.csv")

highLevelDFs = []
for year in [2024, 2023, 2022]:
    highLevelDFs.append(pd.read_csv(outPath + f"\\SB_{year}_DP_HighLevelResults.csv"))
highLevelDF = pd.concat(highLevelDFs)
highLevelDF.to_csv(outPath + "\\SB_2022to2024_HighLevelResults.csv")
print("Analysis done. High-level results are ready to view!")

Analysis done. High-level results are ready to view!
