Here we're going to explore a few different methods for extracting the standard error from the component OPR model to see which ones hold up.

This is _essential_ to creating any kind of model that we'll use for match score predictions, since we need to know what the match score confidence interval looks like in order to say by what margin a team will win.

(I'd assume that in the future for that model we'll end up using some kind of "sliding confidence interval" to see at what point the intervals stop overlapping to determine a match victor)

(A lot of this is transcribed from some research/brainstorming I've been doing.)

The main leads that I have for determining the error are:
- Using some kind of bootstrapping to simulate matches and find the standard error of the _overall_ OPR model
- Using the fact that $SE = \frac{1}{n}\sqrt{\sum_{i=1}^{n} (O_i - AP_i)^2}$ to find the standard error for a _single team_ -- ($O_i$ is the observed score for match $i$, $AP_i$ is the model's prediction for that match (incidence matrix × OPR), and $n$ is the number of matches)
- Taking the above method and repeating it for all the teams, tuning the model as it goes, to eventually get a better-tuned model (and hope that it converges!) -- this seems a bit like an ML technique
- Looking at the mean of the standard error for the OPR fit/residuals when doing the matrix math in order to get a general uncertainty for each team
- A poor man's approach to estimating OPR uncertainty (as outlined by [wgardner](https://www.chiefdelphi.com/t/standard-error-of-opr-values/144363/20)), which basically entails recomputing OPR with one match omitted at a time, then finding the variance and standard deviation of each team's OPR across those runs. This seems like less of a "standard error" metric, but it's still one that can be used to estimate match score variance
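
To make the residual-based formula in the list above concrete, here's a minimal sketch. It assumes the prediction is taken per match, and the function name and the numbers are made up purely for illustration (they are not from the event data).

```python
import numpy as np

def residual_standard_error(observed, predicted):
    """SE = (1/n) * sqrt(sum((O_i - AP_i)^2)) over n matches."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n = observed.size
    return np.sqrt(np.sum((observed - predicted) ** 2)) / n

# Made-up observed scores and model predictions for four matches
observed = [10.0, 14.0, 9.0, 11.0]
predicted = [11.0, 12.0, 10.0, 11.0]
se = residual_standard_error(observed, predicted)
print(se)
```

Here the residuals are -1, 2, -1 and 0, so the sum of squares is 6 and the result is $\sqrt{6}/4$.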

The first method that I'll be looking at is **wgardner's method**, as that seems like it'll be the simplest to implement.
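
As a rough sketch of the idea (not wgardner's exact procedure), the leave-one-match-out loop might look something like the code below. Everything in it is an assumption for illustration: the function name, the toy 4-team schedule with 2-team alliances, the made-up "true" OPRs and noise level, and the use of `np.linalg.lstsq` for the solve. It also assumes rows `2k` and `2k+1` of the incidence matrix are the two alliances of match `k`.

```python
import numpy as np

def leave_one_match_out(incidence, scores):
    """Recompute OPR with each match omitted in turn; return per-team variance and std."""
    n_matches = incidence.shape[0] // 2  # assumes rows 2k and 2k+1 belong to match k
    estimates = []
    for m in range(n_matches):
        keep = np.ones(incidence.shape[0], dtype=bool)
        keep[2 * m:2 * m + 2] = False  # drop both alliance rows of match m
        opr, *_ = np.linalg.lstsq(incidence[keep], scores[keep], rcond=None)
        estimates.append(opr)
    estimates = np.array(estimates)  # shape: (n_matches, n_teams)
    return estimates.var(axis=0, ddof=1), estimates.std(axis=0, ddof=1)

# Hypothetical toy schedule: 4 teams, 2-team alliances, 6 matches (made up, not event data)
pairs = [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2)] * 2
incidence = np.zeros((len(pairs), 4))
for row, members in enumerate(pairs):
    incidence[row, list(members)] = 1

rng = np.random.default_rng(0)
true_opr = np.array([10.0, 20.0, 30.0, 40.0])  # made-up contributions
scores = incidence @ true_opr + rng.normal(0, 3, size=len(pairs))

opr_var, opr_std = leave_one_match_out(incidence, scores)
print(opr_std)
```

The schedule repeats each pairing twice so that the system stays full rank when any single match is dropped; with a real event schedule that would need checking.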

In [7]:
from dataLoader import load_data_event

import numpy as np
import pandas as pd
import json

In [9]:
# load in the data -- for this, we'll only be looking at data from one qualifier event to see if the method works
with open('data/week1/2020ncwak.json', 'r') as f:
    json_data = json.load(f)

team_keys = json_data['teams']

qualification_matches, _, _, _ = load_data_event('week1/2020ncwak')
qualification_matches

Unnamed: 0,match_key,match_type,match_number,blue_1_key,blue_2_key,blue_3_key,blue_keys,blue_endgame_level,blue_foul_count,blue_points_scored,...,red_3_init_line,red_1_endgame,red_2_endgame,red_3_endgame,red_cells_bottom_auto,red_cells_bottom_teleop,red_cells_outer_auto,red_cells_outer_teleop,red_cells_inner_auto,red_cells_inner_teleop
11,2020ncwak_qm1,qm,1,frc435,frc6565,frc5511,"[frc435, frc6565, frc5511]",True,1,81,...,5,0,5,5,0,0,0,0,0,0
12,2020ncwak_qm10,qm,10,frc5919,frc7265,frc4291,"[frc5919, frc7265, frc4291]",True,0,10,...,5,0,0,25,0,0,3,0,0,0
13,2020ncwak_qm11,qm,11,frc435,frc6240,frc6502,"[frc435, frc6240, frc6502]",False,0,73,...,5,5,5,5,0,9,0,0,0,0
14,2020ncwak_qm12,qm,12,frc3459,frc4828,frc5190,"[frc3459, frc4828, frc5190]",False,0,107,...,5,5,5,0,0,0,0,0,0,0
15,2020ncwak_qm13,qm,13,frc7890,frc5518,frc5511,"[frc7890, frc5518, frc5511]",True,0,92,...,5,5,5,0,0,4,0,1,0,0
16,2020ncwak_qm14,qm,14,frc5607,frc5160,frc2642,"[frc5607, frc5160, frc2642]",True,0,27,...,0,5,0,5,0,0,2,1,1,1
17,2020ncwak_qm15,qm,15,frc6500,frc3459,frc6496,"[frc6500, frc3459, frc6496]",False,1,59,...,5,5,5,5,0,0,3,3,0,0
18,2020ncwak_qm16,qm,16,frc2059,frc5919,frc5762,"[frc2059, frc5919, frc5762]",False,4,37,...,5,5,5,5,0,5,1,5,2,0
19,2020ncwak_qm17,qm,17,frc8090,frc7890,frc3229,"[frc8090, frc7890, frc3229]",True,0,50,...,5,5,5,5,0,0,0,6,0,0
20,2020ncwak_qm18,qm,18,frc7671,frc5160,frc7463,"[frc7671, frc5160, frc7463]",False,0,54,...,0,5,5,0,1,1,0,0,0,0


In [10]:
# common function for calculating a team's OPR given a point objective to get the OPR for and matches to look at
# still, thanks to [this](https://blog.thebluealliance.com/2017/10/05/the-math-behind-opr-an-introduction/)
def calculate_oprs(matches, team_keys, point_objective):
    # one row per alliance (red, then blue, for each match), one column per team
    match_matrix = np.zeros((len(matches)*2, len(team_keys)))
    score_matrix = np.zeros((len(matches)*2, 1))

    # number rows by position rather than match_number, so a subset of matches still fits the matrices
    for i, (_, match) in enumerate(matches.iterrows()):

        for red_team in match['red_keys']:
            match_matrix[i*2][team_keys.index(red_team)] = 1
        score_matrix[i*2][0] = match[f'red_{point_objective}']

        for blue_team in match['blue_keys']:
            match_matrix[i*2+1][team_keys.index(blue_team)] = 1
        score_matrix[i*2+1][0] = match[f'blue_{point_objective}']

    l_matrix = match_matrix.T.dot(match_matrix)
    r_matrix = match_matrix.T.dot(score_matrix)
    opr_matrix = np.linalg.pinv(l_matrix).dot(r_matrix)

    return opr_matrix
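
As a quick sanity check of the matrix math used in `calculate_oprs`, here's a small self-contained sketch of the same normal-equations solve on a hand-built schedule. The schedule and the "true" contributions are made up for illustration; since the scores are exact sums of those contributions, the solve should hand them back.

```python
import numpy as np

# Hypothetical setup: 4 teams, 2-team alliances, made-up per-team contributions
true_contrib = np.array([5.0, 10.0, 15.0, 20.0])
alliances = [(0, 1), (2, 3), (0, 2), (1, 3), (0, 3), (1, 2)]

# Incidence matrix A: one row per alliance, 1 where a team played on it
A = np.zeros((len(alliances), 4))
for row, members in enumerate(alliances):
    A[row, list(members)] = 1

# Alliance scores b are exact sums of the members' contributions (no noise)
b = A @ true_contrib

# Normal equations: (A^T A) x = A^T b, solved with the pseudo-inverse
oprs = np.linalg.pinv(A.T @ A) @ (A.T @ b)
print(oprs)
```

With real match data the scores are noisy, so the recovered values are only a least-squares fit rather than an exact match.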