# Tournament Model
#### Using [previously defined models](drops.ipynb), we can build a tournament model that combines their rankings through a weighted average to produce a new ranking of the teams.

In [1]:
from src.tournament import Tournament
from src import Iqr, StdDeviation, SuperScoreModel, Mean

models = [(0.4, Iqr), (0.1, StdDeviation), (0.3, SuperScoreModel), (0.2, Mean)]

t_model = Tournament(
    path="../data/2023-02-18_penn_invitational_c.yaml", models_to_use=models
)

Each tournament is assigned a `tourney_weight`: the number of teams present that also attended the national tournament the previous year, multiplied by a recency multiplier.
#### ⚠️ This is a major area for improvement: the current method of calculating `tourney_weight` is neither robust nor necessarily accurate.
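
A minimal sketch of the weighting described above, assuming the weight is simply the count of returning nationals teams times a recency multiplier. The function name `sketch_tourney_weight`, the exponential decay, and the `half_life_days` parameter are assumptions made for illustration. The notebook does not say how recency is computed, and this is not the `Tournament` class's actual code.

```python
from datetime import date


def sketch_tourney_weight(
    teams_present: set[str],
    prior_nationals_teams: set[str],
    tournament_date: date,
    reference_date: date,
    half_life_days: float = 90.0,  # hypothetical: weight halves every 90 days
) -> float:
    """Count of returning nationals teams, scaled by a recency multiplier."""
    # Teams at this tournament that also attended nationals the previous year.
    overlap = len(teams_present & prior_nationals_teams)
    # Assumed recency multiplier: exponential decay with the tournament's age.
    age_days = max((reference_date - tournament_date).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return overlap * recency


# Placeholder team names; two of the three teams present attended nationals.
weight = sketch_tourney_weight(
    teams_present={"Team A", "Team B", "Team C"},
    prior_nationals_teams={"Team A", "Team C", "Team D"},
    tournament_date=date(2023, 2, 18),
    reference_date=date(2023, 2, 18),
)
print(weight)
```

Because the count only looks at last year's nationals attendees, a field full of strong teams that missed nationals would still receive a low weight under this sketch, which is the weakness flagged above.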

##### Each team's `rel_score` is $\sum_m (w_m \cdot r_{t,m})$, summed over the models, where $w_m$ is the weight of model $m$ and $r_{t,m}$ is team $t$'s ranking in that model.

This logic is implemented in the `aggregate` method of the `Tournament` class.
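
As a rough illustration of the weighted sum (not the actual `aggregate` implementation), the sketch below assumes each model yields an ordered list of team names, best first. The function name `sketch_rel_scores`, the input shape, and the placeholder teams and weights are all hypothetical.

```python
def sketch_rel_scores(
    weighted_rankings: list[tuple[float, list[str]]],
) -> dict[str, float]:
    """Sum weight * rank over every model; lower totals are better."""
    scores: dict[str, float] = {}
    for weight, ranking in weighted_rankings:
        # Ranks are 1-based: the first team in a ranking has rank 1.
        for rank, team in enumerate(ranking, start=1):
            scores[team] = scores.get(team, 0.0) + weight * rank
    return scores


# Two hypothetical models with weights 0.75 and 0.25.
example = sketch_rel_scores(
    [
        (0.75, ["Team A", "Team B", "Team C"]),
        (0.25, ["Team B", "Team A", "Team C"]),
    ]
)
for team, score in sorted(example.items(), key=lambda item: item[1]):
    print(team, score)
```

Here Team A ends up ahead of Team B because the more heavily weighted model ranks it first.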

In [2]:
t_model.aggregate()

# Lower rel_score is better, so sort ascending and number from 1.
for i, (name, rel_score) in enumerate(
    sorted(t_model.prelim.items(), key=lambda x: x[1]), start=1
):
    print(f"{i}: {name} - {rel_score}")

1: Montgomery High School - 0.3
2: Adlai E. Stevenson High School - 0.6
3: West Windsor-Plainsboro High School North - 0.8999999999999999
4: West Windsor-Plainsboro High School South - 1.2
5: Ward Melville High School - 1.5
6: Harriton High School - 1.7999999999999998
7: William G. Enloe High School - 2.1
8: Great Neck South High School - 2.4
9: Syosset High School - 2.6999999999999997
10: Lower Merion High School - 3.0
11: Cumberland Valley High School - 3.3
12: Centennial High School - 3.5999999999999996
13: South Brunswick High School - 3.9
14: East Brunswick High School - 4.2
15: John Jay High School - 4.5
16: Chaminade High School - 4.8
17: Northridge High School - 5.1
18: Penncrest High School - 5.3999999999999995
19: Fairfax High School - 5.7
20: North Pocono High School - 6.0
21: Columbia High School - 6.3
22: Ed W. Clark High School - 6.6
23: Bayard Rustin High School - 6.8999999999999995
24: Garnet Valley High School - 7.199999999999999
25: Charter School of Wilmington - 7.5


### Major Drawbacks

This model fails to account for very competitive teams that did not attend the national tournament the previous year. This is especially common in competitive states such as California, which is limited to 2 bids to the national tournament. Manually assigning `tourney_weight` values to tournaments can work around the issue, but that is not a robust solution either.

##### TODO: The actual tournament results should also be weighed in the final ranking. This is not currently implemented and is a major area for improvement.

# Implementation of This Model
This model represents a single tournament. To rank teams overall, build one model per tournament and aggregate the results.

##### Example:

In [3]:
tournament_paths = [
    "../data/2022-11-19_palatine_invitational_c.yaml",
    "../data/2022-12-03_northview_invitational_c.yaml",
    "../data/2023-02-18_penn_invitational_c.yaml",
    "../data/2023-02-04_solon_invitational_c.yaml",
    "../data/2023-01-21_mit_invitational_c.yaml",
    "../data/2023-02-11_golden_gate_invitational_c.yaml",
    "../data/2022-12-03_boyceville_satellite_invitational_c.yaml"
]
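# Maps team name -> list of (placement score, tourney_weight), one per tournament attended.
team_d: dict[str, list[tuple[float, float]]] = {}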
for path in tournament_paths:
    t_model = Tournament(path=path, models_to_use=models)
    t_model.aggregate()
    print(f"Results for {path}")
    teams = len(t_model.ranks)
    for i, name in enumerate(t_model.ranks):
        # Placement score in [0, 1): the first team in `ranks` scores highest.
        placement = 1 - (i + 1) / teams
        team_d.setdefault(name, []).append((placement, t_model.tourney_weight))

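final_rank = {}
for name, dlist in team_d.items():
    items = len(dlist)
    # Sum of placement score * tourney_weight over the tournaments attended.
    final_rank[name] = sum(
        rel_score * tourney_weight for rel_score, tourney_weight in dlist
    )
    # NOTE: this expression's result is not assigned, so the scores printed
    # below are raw weighted sums; use `*=` to scale by 100 / appearances.
    final_rank[name] * 100/items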
            
for i, (name, rel_score) in enumerate(sorted(final_rank.items(), key=lambda x: x[1], reverse=True)):
    print(f"{i+1}: {name} - {rel_score}")

Results for ../data/2022-11-19_palatine_invitational_c.yaml
Results for ../data/2022-12-03_northview_invitational_c.yaml
Results for ../data/2023-02-18_penn_invitational_c.yaml
Results for ../data/2023-02-04_solon_invitational_c.yaml
Results for ../data/2023-01-21_mit_invitational_c.yaml
Results for ../data/2023-02-11_golden_gate_invitational_c.yaml
Results for ../data/2022-12-03_boyceville_satellite_invitational_c.yaml
1: Montgomery High School - 0.2485542272991666
2: West Windsor-Plainsboro High School North - 0.2480253664464191
3: Ward Melville High School - 0.24749717818441236
4: Adlai E. Stevenson High School - 0.23565245918343483
5: Harriton High School - 0.23393767725751533
6: New Trier High School - 0.22493379356152188
7: Troy High School - 0.22105240464145712
8: Mountain View High School - 0.21851754287078695
9: Syosset High School - 0.20987918963627467
10: Mason High School - 0.20676137869019615
11: William G. Enloe High School - 0.20043725205668528
12: Lower Merion High Scho

This model's major flaw is that it favors teams that attended the sampled tournaments. The weighting could instead be fit with a non-linear regression, and expanding the sample of tournaments would also help.

Possible ideas to compensate:
- When a team does not attend a tournament, assign it a default score
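
One way the default-score idea might look, as a sketch under stated assumptions: every tournament contributes to every team's score, and a team that was absent receives `default_score` in place of a placement score. The function name `sketch_final_scores`, the weighted-mean normalization, and the default value of 0.25 are hypothetical choices, not part of the current model.

```python
def sketch_final_scores(
    results: list[tuple[float, dict[str, float]]],
    default_score: float = 0.25,  # hypothetical placeholder value
) -> dict[str, float]:
    """Weighted mean of placement scores, filling absences with a default."""
    total_weight = sum(weight for weight, _ in results)
    if total_weight == 0:
        raise ValueError("at least one tournament needs a non-zero weight")
    all_teams = {team for _, scores in results for team in scores}
    final: dict[str, float] = {}
    for team in all_teams:
        # An absent team contributes default_score for that tournament.
        weighted_sum = sum(
            weight * scores.get(team, default_score) for weight, scores in results
        )
        final[team] = weighted_sum / total_weight
    return final


# Each entry is (tourney_weight, {team: placement score}); Team B skipped
# the second tournament. All names and numbers are placeholders.
final = sketch_final_scores(
    [
        (3.0, {"Team A": 0.5, "Team B": 0.0}),
        (1.0, {"Team A": 0.5}),
    ]
)
for team, score in sorted(final.items(), key=lambda item: item[1], reverse=True):
    print(team, score)
```

Dividing by the total weight keeps scores comparable between teams regardless of how many sampled tournaments they attended. The choice of default matters: a value that is too high rewards skipping tournaments, and one that is too low reproduces the attendance bias described above.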
