# Top Batters in 2017

In [1]:
import os
import sys
import numpy as np
import pandas as pd
import pickle
from sqlalchemy import create_engine

engine = create_engine('postgresql://baseball:baseball@localhost:5432/baseball')

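# Read the SQL file, closing the file handle once its contents are loaded
with open('sql_queries/batter_scores_daily.sql') as f:
    bsd_query = f.read()

# Run the SQL; the result set itself is not used, so close it right away
res = engine.execute(bsd_query)
res.close()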

In [2]:
batter_scores = pd.read_sql("SELECT * FROM batter_scores_daily WHERE \"Date\" > '2017-01-01'", engine)

Ok, so "dejop001" (Paul DeJong) looks pretty fantastic, with an average "run contribution" of just over 1 run per at-bat! Unfortunately, it's not quite that simple. That score comes from a single game. Admittedly, it was a really solid game: he came in as a pinch hitter and, on his one plate appearance, knocked it out of the park for a solo home run. But one game isn't enough to make a meaningful estimate.

In [3]:
batter_scores.sort_values('run_contribution_overall', ascending=False).head()

Unnamed: 0,Batter,Date,win_contribution_overall,run_contribution_overall,other_run_contribution_overall,num_game_ever,win_contribution_30ab,run_contribution_30ab,other_run_contribution_30ab,num_game
11646,dejop001,2017-05-28,0.00748,1.012213,0.024751,1.0,0.00748,1.012213,0.024751,1.0
45417,stepj002,2017-07-01,0.102774,0.767552,0.084392,2.0,0.102774,0.767552,0.084392,2.0
48358,urenr001,2017-09-01,0.144174,0.703478,-0.34461,1.0,0.144174,0.703478,-0.34461,1.0
50310,winkj002,2017-04-15,0.086944,0.673817,0.04255,2.0,0.086944,0.673817,0.04255,2.0
14159,ervip001,2017-08-17,0.04194,0.627629,0.067479,10.0,0.04194,0.627629,0.067479,10.0
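
To make the small-sample problem concrete, here is a minimal sketch that re-types the five rows printed above into a small frame (copied by hand for illustration, not pulled from the database) and checks how much history sits behind each leader. The names `top5` and `small_sample` are made up for this example.

```python
import pandas as pd

# The five rows shown in the output above, re-typed by hand for illustration
top5 = pd.DataFrame({
    "Batter": ["dejop001", "stepj002", "urenr001", "winkj002", "ervip001"],
    "run_contribution_overall": [1.012213, 0.767552, 0.703478, 0.673817, 0.627629],
    "num_game_ever": [1.0, 2.0, 1.0, 2.0, 10.0],
})

# Leaders whose score rests on two or fewer entries in num_game_ever
small_sample = top5[top5["num_game_ever"] <= 2]
print(small_sample["Batter"].tolist())

# The most history behind any of the five leaders
print(top5["num_game_ever"].max())
```

Four of the five leaders have a `num_game_ever` of 1 or 2, which is why the raw ranking mostly reflects lucky first appearances.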


Let's try that again - this time, we only look at batters who have already played in more than 30 games (`num_game_ever > 30`). That should smooth out some of that volatility.

Top here is "manct001", or Trey Mancini. This is a pretty sensible pick - Mancini placed third in the Rookie of the Year voting for the 2017 season, so he was definitely playing well. He also has the nickname "Ice Trey", because he has an unusual ability to make big plays when they count. Remember, our scores reward batters for making big plays that shape the outcome of the game.

In [4]:
batter_scores[batter_scores['num_game_ever'] > 30].sort_values('run_contribution_overall', ascending=False).head()

Unnamed: 0,Batter,Date,win_contribution_overall,run_contribution_overall,other_run_contribution_overall,num_game_ever,win_contribution_30ab,run_contribution_30ab,other_run_contribution_30ab,num_game
28392,manct001,2017-04-16,0.013554,0.224504,0.01255,39.0,0.013554,0.224504,0.01255,39.0
3711,bellc002,2017-05-06,0.007725,0.2161,0.016462,47.0,0.007725,0.2161,0.016462,47.0
28393,manct001,2017-04-18,0.013082,0.214672,0.011406,40.0,0.013082,0.214672,0.011406,40.0
14167,ervip001,2017-09-07,0.018572,0.208902,0.001572,31.0,0.018572,0.208902,0.001572,31.0
23178,hoskr001,2017-08-26,0.00875,0.200136,0.032528,72.0,0.00875,0.200136,0.032528,72.0
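
The output above has one row per batter per date, so "manct001" shows up twice. One possible way to list each batter only once is to keep their highest-scoring row. The sketch below re-types the five displayed rows by hand (it does not touch the database); `rows`, `best_idx`, and `best_per_batter` are names invented for this example.

```python
import pandas as pd

# The five rows shown in the output above, re-typed by hand for illustration
rows = pd.DataFrame({
    "Batter": ["manct001", "bellc002", "manct001", "ervip001", "hoskr001"],
    "Date": ["2017-04-16", "2017-05-06", "2017-04-18", "2017-09-07", "2017-08-26"],
    "run_contribution_overall": [0.224504, 0.2161, 0.214672, 0.208902, 0.200136],
    "num_game_ever": [39.0, 47.0, 40.0, 31.0, 72.0],
})

# idxmax gives, for each batter, the row label of their highest score
best_idx = rows.groupby("Batter")["run_contribution_overall"].idxmax()

# Keep only those rows, then rank batters by that best score
best_per_batter = rows.loc[best_idx].sort_values(
    "run_contribution_overall", ascending=False
)
print(best_per_batter[["Batter", "Date", "run_contribution_overall"]])
```

Whether the best row, the latest row, or a mean is the right summary depends on the question being asked; the best row is used here only because it keeps the ranking closest to the output above.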


Now let's zoom out even more and require the player to have been in more than 300 games previously (`num_game_ever > 300`), averaging each batter's score over all of their qualifying 2017 rows. Here, we see Cody Bellinger (2017 All-Star), Joey Votto (perpetual All-Star), Mike Trout (perpetual All-Star), Miguel Cabrera (perpetual All-Star)...

The score seems to be doing a pretty good job of picking up on batters who shape the outcome of the game.

In [5]:
(batter_scores[batter_scores['num_game_ever'] > 300]
    .groupby('Batter')
    .agg({'run_contribution_overall': 'mean'})
    .sort_values('run_contribution_overall', ascending=False)
    .head())

Batter,run_contribution_overall
cabrm001,0.086183
vottj001,0.083697
bellc002,0.079422
troum001,0.079063
goldp001,0.077479
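
The filter-then-average pattern used above could be wrapped in a small helper so that the history cutoff is easy to vary. This is only a sketch: `top_batters` is a hypothetical function, not part of the original notebook, and the rows in `toy` are made up purely to exercise it. It also reports how many rows back each mean, since a mean over very few rows is less trustworthy.

```python
import pandas as pd

def top_batters(scores, min_games, n=5):
    """Hypothetical helper: mean run contribution per batter past a history cutoff."""
    # Keep only rows where the batter already has enough history
    eligible = scores[scores["num_game_ever"] > min_games]
    # Average per batter and record how many rows went into each mean
    summary = eligible.groupby("Batter").agg(
        mean_run=("run_contribution_overall", "mean"),
        n_rows=("run_contribution_overall", "size"),
    )
    return summary.sort_values("mean_run", ascending=False).head(n)

# Made-up rows purely to exercise the helper; these are not real scores
toy = pd.DataFrame({
    "Batter": ["aaaa001", "aaaa001", "bbbb001", "bbbb001", "cccc001"],
    "num_game_ever": [301.0, 302.0, 350.0, 351.0, 12.0],
    "run_contribution_overall": [0.10, 0.20, 0.05, 0.07, 0.90],
})

result = top_batters(toy, min_games=300)
print(result)
```

On the real data, the equivalent call would be `top_batters(batter_scores, min_games=300)`, assuming `batter_scores` has been loaded as in the cells above.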
