# A Two-Stage Sabermetrics Bayesian Model of Baseball Predictions

## AM207 Final Project
## Author: Thomas Leu

In [1]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set_style("white")

import time
import timeit

import scipy.stats 
import pandas as pd
import pymc as pm

import re
import numpy as np

import string
import json

## Introduction

An [article](http://www.jds-online.com/v2-1) published in the Journal of Data Science in January 2001 attempts to 
build a Bayesian model for predicting baseball winners by comparing the winning percentage, batting average, and ERA 
(earned run average) of the opposing teams. These statistical measurements were widely used to compare teams and 
players until [Sabermetrics](http://sabr.org/sabermetrics/the-basics), an analysis of baseball statistics championed
by Bill James and popularized in the book [Moneyball](https://en.wikipedia.org/wiki/Moneyball), began to replace these
traditional ranking systems. 

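The following notebook attempts to recreate the statistical models built in Yang and Swartz's paper and then to rebuild 
them with sabermetric measures. The code below replaces ERA with FIP (fielding independent pitching) and batting average 
with OPS (on-base plus slugging, which adds OBP, on-base percentage, to slugging percentage), while keeping winning 
percentage unchanged.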
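
For readers unfamiliar with the sabermetric inputs, the standard definitions of OPS and FIP can be sketched as follows. Both statistics are read from the JSON file in the data-loading code below, so nothing here is needed to run the notebook. The function names, argument names, and the default FIP constant are illustrative assumptions and not part of the original notebook; the constant is season-specific and is usually close to 3.1.

```python
def ops(obp, slg):
    """On-base plus slugging: on-base percentage plus slugging percentage."""
    return obp + slg


def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding independent pitching from a pitcher's counting stats.

    hr, bb, hbp, k are home runs, walks, hit batters, and strikeouts allowed;
    ip is innings pitched. fip_constant is a league- and season-specific
    scaling term; 3.10 is an assumed placeholder, not a 2014 value.
    """
    return (13.0 * hr + 3.0 * (bb + hbp) - 2.0 * k) / ip + fip_constant
```

FIP uses only outcomes the pitcher controls directly (home runs, walks, hit batters, strikeouts), which is why it is a natural sabermetric stand-in for ERA.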

In [2]:
# load 2014 postseason data
with open('2014_postseason.json', 'r') as infile:
    postseason_2014_str = infile.read()
    postseason_2014 = json.loads(postseason_2014_str)
    
num_games = len(postseason_2014['games'])
original_data = np.empty((num_games,4))
sabr_data = np.empty((num_games,4))

pitchers = postseason_2014['pitchers']
teams = postseason_2014['teams']

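def original_game_data(game):
    """Row of traditional features: [ERA ratio, batting-average ratio, win-pct ratio, x]."""
    home, away = game['home'], game['away']
    # Lower ERA is better, so away/home makes a ratio above 1 favor the home team.
    era_ratio = pitchers[away['starter']]['era'] / pitchers[home['starter']]['era']
    # Higher is better for the team statistics, so these are home/away.
    batavg_ratio = teams[home['team']]['BAT_AVG'] / teams[away['team']]['BAT_AVG']
    win_ratio = teams[home['team']]['WIN_PCT'] / teams[away['team']]['WIN_PCT']
    # game['x'] is copied through unchanged as the fourth column.
    return np.array([era_ratio, batavg_ratio, win_ratio, game['x']])


def sabr_game_data(game):
    """Row of sabermetric features: [FIP ratio, OPS ratio, win-pct ratio, x]."""
    home, away = game['home'], game['away']
    # Lower FIP is better, so the pitching ratio is again away/home.
    fip_ratio = pitchers[away['starter']]['fip'] / pitchers[home['starter']]['fip']
    ops_ratio = teams[home['team']]['OPS'] / teams[away['team']]['OPS']
    win_ratio = teams[home['team']]['WIN_PCT'] / teams[away['team']]['WIN_PCT']
    return np.array([fip_ratio, ops_ratio, win_ratio, game['x']])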

# build numpy data objects, one row per game
# (enumerate replaces xrange, which does not exist in Python 3)
for i, game in enumerate(postseason_2014['games']):
    original_data[i] = original_game_data(game)
    sabr_data[i] = sabr_game_data(game)
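
To make the row layout concrete, here is a minimal sketch that applies the same ratio construction to a single made-up game. Every name and number in the toy dictionaries is hypothetical and only mimics the structure that the code above appears to expect from the JSON file; none of it comes from `2014_postseason.json`. The helper `toy_original_row` is likewise an illustrative name that takes the lookup tables as arguments instead of reading the notebook's globals.

```python
import numpy as np

# Hypothetical stand-ins for the 'pitchers' and 'teams' tables (made-up values).
toy_pitchers = {'home_sp': {'era': 2.50, 'fip': 3.00},
                'away_sp': {'era': 3.00, 'fip': 3.30}}
toy_teams = {'HOME': {'BAT_AVG': 0.260, 'OPS': 0.720, 'WIN_PCT': 0.550},
             'AWAY': {'BAT_AVG': 0.250, 'OPS': 0.700, 'WIN_PCT': 0.500}}
toy_game = {'home': {'team': 'HOME', 'starter': 'home_sp'},
            'away': {'team': 'AWAY', 'starter': 'away_sp'},
            'x': 1}


def toy_original_row(game, pitchers, teams):
    """Same ratios as original_game_data, with the tables passed in explicitly."""
    home, away = game['home'], game['away']
    # Lower ERA is better, so away/home > 1 favors the home team.
    era_ratio = pitchers[away['starter']]['era'] / pitchers[home['starter']]['era']
    # Higher is better for batting average and win percentage, so home/away.
    batavg_ratio = teams[home['team']]['BAT_AVG'] / teams[away['team']]['BAT_AVG']
    win_ratio = teams[home['team']]['WIN_PCT'] / teams[away['team']]['WIN_PCT']
    return np.array([era_ratio, batavg_ratio, win_ratio, game['x']])


row = toy_original_row(toy_game, toy_pitchers, toy_teams)
print(row)
```

Each of the three ratios is oriented so that a value above 1 favors the home team, which lets a single set of weights act on all of them in the same direction.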

In [None]:
def likelihood(data, weights, home_adv):
    # Placeholder: the likelihood has not been implemented yet.
    print("Under Construction")
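
Since the likelihood above is still a stub, the following is only a sketch of one shape it could take, given the `(data, weights, home_adv)` signature. The model form is an assumption made for illustration and is not taken from Yang and Swartz's paper or from a finished version of this notebook: the home team's odds are assumed to be `home_adv` times the product of the three ratios, each raised to its weight, and the fourth column is assumed to be 1 when the home team won and 0 otherwise. The name `sketch_log_likelihood` is hypothetical and chosen so that it does not shadow the stub.

```python
import numpy as np


def sketch_log_likelihood(data, weights, home_adv):
    """Bernoulli log-likelihood under an ASSUMED odds model (illustration only).

    data:     array of shape (n_games, 4); columns 0-2 are home-favoring ratios,
              column 3 is assumed to be 1 if the home team won and 0 otherwise.
    weights:  length-3 sequence of exponents, one per ratio.
    home_adv: positive multiplier applied to the home team's odds.
    """
    data = np.asarray(data, dtype=float)
    ratios, outcomes = data[:, :3], data[:, 3]
    # Assumed relative strength: product of ratios raised to their weights,
    # computed on the log scale for numerical stability.
    log_strength = np.dot(np.log(ratios), np.asarray(weights, dtype=float))
    odds = home_adv * np.exp(log_strength)
    p_home = odds / (1.0 + odds)
    # Sum of per-game Bernoulli log-probabilities.
    return np.sum(outcomes * np.log(p_home)
                  + (1.0 - outcomes) * np.log(1.0 - p_home))
```

Under this assumed form, setting every weight to 0 and `home_adv` to 1 gives each game a home-win probability of 0.5, which is a convenient sanity check before fitting anything.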