# Modelling the bracket

* Now that we have a clean dataset, we can model the bracket!

* The true results are stored in the data folder, so if you want to cheat, go look there first!

* Our goal is to assign a "score" to each cartoon based on human intuition, although we could take a Bayesian approach if we wanted.

* We then use that score to "simulate" the bracket: in each face-off, the show with the higher score wins.



## Imports and loading data

In [None]:
import pandas as pd
import numpy as np
import time
import sys
import json

In [None]:
#df = pd.read_csv('data/my_cartoons_filled.csv')
df = pd.read_csv('data/original/cartoons_filled.csv')

## Helper function for testing results

## Parameters for model
We define a piecewise linear model for the "score" of the form:

$S = R + f(N) + w_y I(y < y_0) + \frac{w_V V}{s_V}$

More plainly:

Score = Rating + f(Number of Seasons) + (Over-the-hill weight) * (Indicator that the show aired before a specified year) + (Vote weight * Votes) / Vote scale

where $f$ is a piecewise constant function that assigns a bonus to each range of season counts (the bonus increases with the number of seasons). In the code below, $y_0$ is `oth`, $w_y$ is `w_oth`, $w_V$ is `w_votes`, and $s_V$ is `scale_votes`.
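As a quick sanity check of the formula, here is a minimal standalone sketch that scores a single made-up show. The names `score_show` and `seasons_bonus` and the example numbers are hypothetical and only for illustration; the constants mirror the parameter values used in this notebook, and missing season counts are not handled here.

```python
# Hypothetical standalone version of the score formula (illustration only).
OTH_YEAR = 1979         # y_0: shows that aired before this year are "over the hill"
W_OTH = -0.5            # w_y: over-the-hill weight
W_VOTES = 0.25          # w_V: vote weight
SCALE_VOTES = 400000.0  # s_V: vote scale


def seasons_bonus(num_seasons):
    """Piecewise constant f(N): a larger bonus for longer-running shows."""
    if num_seasons < 5:
        return 0.5
    if num_seasons < 10:
        return 1.0
    return 1.5


def score_show(rating, seasons, year, votes):
    """S = R + f(N) + w_y * I(y < y_0) + (w_V * V) / s_V"""
    over_the_hill = 1 if year < OTH_YEAR else 0
    return (rating
            + seasons_bonus(seasons)
            + W_OTH * over_the_hill
            + W_VOTES * votes / SCALE_VOTES)


# Made-up show: rating 8.0, 7 seasons, aired in 1975, 200,000 votes.
example = score_show(rating=8.0, seasons=7, year=1975, votes=200000)
print(example)  # 8.0 + 1.0 - 0.5 + 0.125 = 8.625
```

Note how small the vote term is compared with the rating: with these weights, even 400,000 votes add only 0.25 to the score.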



In [None]:
oth = 1979

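def weight_seasons(num_seasons):
    # Piecewise constant bonus f(N) for the number of seasons.
    if num_seasons < 5:
        return 0.5
    elif num_seasons < 10:
        return 1.0
    elif num_seasons >= 10:
        return 1.5
    else:
        # Only reached when num_seasons is NaN (every comparison above is False).
        return 0.10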
    

w_oth = -0.5

w_votes = 0.25

scale_votes = 400000.0
    
    

In [None]:
df['over_the_hill'] = df['year'] < oth

df['seasons_weight'] = df['seasons'].apply(weight_seasons)


df['score'] =(df['rating'] 
              + df['seasons_weight'] 
              + w_oth*df['over_the_hill'] 
              + w_votes*(df['votes']/scale_votes)
             )

In [None]:
titles = df['title'].tolist()
scores = df['score'].tolist()

## Simulating the bracket

* We stack every show in the order of their face-offs (imagine taking the bracket and stacking the "east" and "west" sides).
* Taking the top two shows on this stack, we compute the result of that face-off and put the winner on the bottom.
* We repeat this process until only one winner is left, storing the results every time the length of the stack halves.
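The steps above can be sketched on a toy four-show bracket. This is an illustrative variant rather than the notebook's own function: it uses `collections.deque` instead of `list.pop(0)`, prints nothing, and the name `simulate` and the toy titles and scores are made up.

```python
from collections import deque


def simulate(entries):
    """entries: list of (title, score) in face-off order. Returns the list of rounds."""
    queue = deque(entries)
    rounds = [list(queue)]      # round 0 is the initial seeding
    round_size = len(queue)

    while len(queue) > 1:
        a = queue.popleft()     # top two shows on the stack
        b = queue.popleft()
        # The higher score wins; on a tie the second show advances.
        queue.append(a if a[1] > b[1] else b)

        # Once the stack has halved, the current round is complete.
        if len(queue) == round_size // 2:
            rounds.append(list(queue))
            round_size = len(queue)

    return rounds


toy = [("A", 8.1), ("B", 7.4), ("C", 6.9), ("D", 9.0)]
toy_rounds = simulate(toy)
for rnd in toy_rounds:
    print([title for title, _ in rnd])
```

Here A beats B and D beats C in the first round, then D beats A in the final, so the recorded rounds have 4, 2, and 1 entries.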

In [None]:
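def create_bracket(titles,scores):
    # Simulate the bracket. Returns the list of rounds and the final
    # one-element list holding the overall winner.
    assert len(titles) == len(scores), 'score and title length mismatch!'
    assert len(titles)%2 == 0, 'uneven bracket!'
    
    tuples = list(zip(titles,scores))
    
    # N tracks the size of the current round.
    N = len(tuples)
    
    # Round 0 is the initial seeding.
    rounds = [tuples.copy()]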
    
    while len(tuples) > 1:
        
        toon1 = tuples.pop(0)
        toon2 = tuples.pop(0)
        
        if toon1[1] > toon2[1]:
            tuples.append(toon1)
            print(f'{toon1[0]} over {toon2[0]}')
            
        else:
            tuples.append(toon2)
            print(f'{toon2[0]} over {toon1[0]}')
            
        if len(tuples) == N//2:
            print('Round Over')
            rounds.append(tuples.copy())
            N = len(tuples)
            
        
    
    return rounds, tuples
    
    

## View Results

In [None]:
rounds, tuples = create_bracket(titles,scores)

In [None]:
rounds

## Visualize Results
* Using a graph visualization library, we can draw the bracket as a DAG (directed acyclic graph).
* Play with the layout function to see what other shapes you can get (I don't have the skills to replicate the bracket structure here).

In [None]:
import networkx as nx
import io
import pygraphviz
from networkx.drawing.nx_agraph import write_dot, graphviz_layout
from PIL import Image
G = nx.DiGraph()


left_graphs = []
right_graphs = []

# for i in range(len(rounds)-2):
#     left = [x[0]+f'(R{i})' for x in rounds[i][len(rounds[i])//2:]]
#     right = [x[0]+f'(R{i})' for x in rounds[i][:len(rounds[i])//2]]
    
#     left_graphs.append(left)
#     right_graphs.append(right)
    
# center = [[rounds[len(rounds)-1][0][0]+f'(R{len(rounds)-1})']]

# subgraphs = left_graphs + center + list(reversed(right_graphs))

# for k,sg in enumerate(subgraphs):
#     for n in sg:
#         G.add_node(n,y=0)

for i in range(len(rounds)-1,-1,-1):
    M = len(rounds[i])
    for j in range(M):
        G.add_node(rounds[i][j][0]+f'(R{i})')
        
        if i < len(rounds)-1:
            parent = j // 2  # slots 2k and 2k+1 feed slot k of the next round
            G.add_edge(rounds[i][j][0]+f'(R{i})',rounds[i+1][parent][0]+f'(R{i+1})')

            

        

A = nx.nx_agraph.to_agraph(G)
A.graph_attr.update(landscape='false',ranksep='3',strict='false')

# for i,sg in enumerate(subgraphs):
#     A.add_subgraph(sg,rank=i)

# Possible layouts: ['neato'|'dot'|'twopi'|'circo'|'fdp'|'nop']. WARNING: nop and fdp might crash the container.
A.layout('twopi', args='-Nfontsize=8 -Nwidth=".2" -Nheight=".2" -Nmargin=0 -Gfontsize=6 -Goverlap=True')
A.draw('bracket.png')
im = Image.open('bracket.png')
display(im)

## Checking our accuracy
* We calculate the accuracy by checking each bracket slot to see whether our prediction was right, then dividing the number of correct slots by the total number of bracket slots.
    * Can you find a closed-form solution for the number of slots in an arbitrary bracket of size N s.t. mod(N,2) == 0?
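To test a candidate closed form against a brute-force count, a small helper can sum the round sizes directly. This is a sketch that assumes each round holds half as many entries (rounded down) as the round before it, which is how `create_bracket` records rounds; `count_slots` is a hypothetical name, not part of the notebook.

```python
def count_slots(n):
    """Total number of bracket slots for n starting entries, counting every round."""
    total = n           # round 0: the initial seeding
    while n > 1:
        n //= 2         # each recorded round is half the size of the previous one
        total += n
    return total


# Brute-force counts for a few bracket sizes, to compare against your formula.
for size in (2, 4, 8, 16):
    print(size, count_slots(size))
```

For a bracket simulated with `create_bracket`, the same number should equal `sum(len(r) for r in rounds)`.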

In [None]:
with open('data/original/ground_truth.json') as f:
    true_rounds = json.load(f)['rounds']

def get_accuracy(predict,truth):
    assert len(predict) == len(truth), 'truth and predict length mismatch!'
    N = len(predict)
    
    total = 0
    correct = 0
    
    for i in range(N):
        true_rnd = [x[0] for x in truth[i]]
        predict_rnd = [x[0] for x in predict[i]]
        
        for t,p in zip(true_rnd,predict_rnd):
            if t == p:
                correct += 1
            total += 1
    
    return correct/float(total)
        

In [None]:
print(f'Bracket Accuracy: {get_accuracy(rounds,true_rounds)}')