# CARTOON MARCH MADNESS

* Sonic 102.9 (an Edmonton radio station) ran a March Madness-style bracket for cartoons in, you guessed it, March of 2019.

* People were surveyed and voted for each cartoon, and those votes determined how the bracket played out.

* For fun, we set out to model the bracket and predict the outcome, but before we could do that we needed to gather the data!

* To do this, we took advantage of an unofficial client library for IMDb: https://github.com/alberanid/imdbpy

* The hypothesis is that some combination of factors (votes, tenure, ratings) would be predictive of the voting outcome for Sonic's bracket!

## Load libraries and fetch the bracket list

In [None]:
# IMDb has no official API; instead we use an unofficial client (which actually scrapes the website!)

import pandas as pd
import time
import sys
from imdb import IMDb
from IPython.display import clear_output

In [None]:
from imdb.Movie import Movie

In [None]:
cartoons = pd.read_csv('data/original/cartoons.csv')

In [None]:
imdb_client = IMDb()
# movie.items() to list all attributes of a movie

## Manually curating the dataset with the help of the API
We search for each cartoon by title, but since there can be many shows
with the same name (reboots, remakes, reimaginings), we need to manually select the correct result.

This is a very common task, and if it had to be repeated often the ideal solution would be to
build an application specifically for annotation. However, Jupyter makes it possible to put together a barebones
annotation tool that is quick to create and simple to use.
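
The prompt step of such a barebones tool can be sketched as follows. This is only an illustrative sketch, not part of the original notebook: `parse_choice` and `ask_for_index` are hypothetical helper names, and the idea is simply to re-prompt on a typo or an out-of-range index instead of letting `int(input(...))` raise and end the annotation run.

```python
def parse_choice(raw, n_options):
    """Return a valid 0-based index, or None if the input is unusable."""
    try:
        idx = int(raw.strip())
    except ValueError:
        # Not a whole number (e.g. an accidental letter or empty input).
        return None
    return idx if 0 <= idx < n_options else None


def ask_for_index(n_options, read=input):
    """Keep prompting until the user enters an index in range.

    `read` defaults to the built-in input(); it is a parameter only so the
    helper can be exercised without a keyboard.
    """
    while True:
        raw = read(f'Which index is the correct title? [0-{n_options - 1}] ')
        choice = parse_choice(raw, n_options)
        if choice is not None:
            return choice
        print('Not a valid index, please try again.')
```

In an annotation loop, a call such as `ask_for_index(len(series_only))` could stand in for a bare `int(input(...))`.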

In [None]:
# Because of the possibility of reboots/remakes we need to manually verify the correct show.

titles = cartoons.title.tolist()
data = {'title':[],
        'id':[],
        'year':[],
        'rating':[],
        'votes':[],
        'seasons':[]}

named_attributes = ['year','rating','votes','seasons']
        

#LOOP THROUGH SHOW TITLES AND VALIDATE MATCHING IMDB ENTRY
for title in titles:
    print(title)
    
    # include both the raw title and a search with animated appended (heuristic for false matches)
    raw_search = imdb_client.search_movie(title)
    animated_search = imdb_client.search_movie(title + ' animated')
    
    # Ideally we would deduplicate here, but the library has an issue with its
    # comparison method, so set() does not work.
    full_search = raw_search + animated_search
    
    # restrict to series only
    series_only = [item for item in full_search if item['kind'] == 'tv series']

    # LOOP THROUGH ALL SEARCH RESULTS THAT ARE SERIES AND PRINT RELEVANT INFO
    for idx,item in enumerate(series_only):
        imdb_client.update(item)
        try:
            print(idx,item['long imdb canonical title'], item['series years'], item['votes'], item.movieID)
        except KeyError as e:
            # Some results lack a field; still show the index so it can be skipped knowingly.
            print(idx, f'Missing key: {e}')
                
    # GRAB CORRECT MATCH FROM USER
    user_input = int(input('Which index is the correct title? '))
    
    selected = series_only[user_input]
    
    data['title'].append(title)
    data['id'].append(selected.movieID)
    
    for key in named_attributes:
        # .get records None for a missing attribute instead of aborting the run
        data[key].append(selected.get(key))
    
    clear_output(wait=True)
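
Because the raw and "animated" searches are simply concatenated in the loop above, the same show can be listed twice. One possible workaround, sketched here under the assumption that two results refer to the same title exactly when their `movieID` values match, is to deduplicate on that attribute rather than relying on `set`. `dedupe_by_id` is a hypothetical helper, and the `SimpleNamespace` objects with made-up IDs are stand-ins for search results so the example runs without network access.

```python
from types import SimpleNamespace


def dedupe_by_id(results):
    """Keep the first result for each movieID, preserving the original order."""
    seen = set()
    unique = []
    for item in results:
        if item.movieID not in seen:
            seen.add(item.movieID)
            unique.append(item)
    return unique


# Stand-in search results with invented IDs (not real IMDb entries).
raw_results = [SimpleNamespace(movieID='001'), SimpleNamespace(movieID='002')]
animated_results = [SimpleNamespace(movieID='002'), SimpleNamespace(movieID='003')]

deduped = dedupe_by_id(raw_results + animated_results)
print([item.movieID for item in deduped])
```

Preserving order keeps the raw-title matches ahead of the "animated" ones, so the printed indices stay predictable for the person annotating.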
    

## Final Dataset
View (and save, if desired) the final dataset.

In [None]:
cartoons_filled = pd.DataFrame(data=data)
with pd.option_context('display.max_rows',100):
    display(cartoons_filled)

In [None]:
# Save to file (Uncomment if you'd like to save)
# cartoons_filled.to_csv('data/my_cartoons_filled.csv',index=False)