# Top Pitchers in 2017

In [1]:
import os
import sys
import numpy as np
import pandas as pd
import pickle
from sqlalchemy import create_engine
from credentials import BASEBALL_DB_NAME, BASEBALL_DB_PWD

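# Connect to the local Postgres database that holds the baseball tables.
engine = create_engine('postgresql://baseball:{}@localhost:5432/{}'.format(BASEBALL_DB_PWD, BASEBALL_DB_NAME))

# Run the pitcher_scores_daily SQL script; no rows are read back here.
with open('sql_queries/pitcher_scores_daily.sql') as f:
    psd_query = f.read()
res = engine.execute(psd_query)
res.close()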

In [2]:
pitcher_scores = pd.read_sql("SELECT * FROM pitcher_scores_daily WHERE \"Date\" > '2017-01-01'", engine)

# Best debut

First a note of caution - for batters, we consider them successful if they have a large "run contribution" (i.e. they're scoring runs for their team). For pitchers, though, it's the opposite. They should have a very low "other team run contribution" (i.e. they're keeping the other team from scoring).
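As a minimal sketch of that sign convention, the toy example below ranks pitchers by sorting ascending, so the most negative value comes first. The pitcher IDs and values are made up for illustration; only the column name mirrors the real table.

```python
import pandas as pd

# Toy data with made-up IDs and values; the column name mirrors pitcher_scores_daily.
toy = pd.DataFrame({
    'Pitcher': ['aaaa001', 'bbbb001', 'cccc001'],
    'other_run_contribution_overall': [-2.5, 0.8, -0.3],
})

# For pitchers, lower (more negative) is better, so sort ascending.
best_first = toy.sort_values('other_run_contribution_overall', ascending=True)
best_pitcher = best_first.iloc[0]['Pitcher']
print(best_first)
```

For a batter metric such as "run contribution", the same call would use `ascending=False` instead.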

The best debut belongs to Dillon Peters - he pitched through the end of the 7th inning and gave up no runs during that time. Thus, he brought down the Phillies' "expected score" by just over 4 runs.

In [3]:
pitcher_scores[pitcher_scores['num_game_ever'] == 1].sort_values('other_run_contribution_overall', ascending = True).head()

Unnamed: 0,Pitcher,Date,win_contribution_overall,run_contribution_overall,other_run_contribution_overall,num_game_ever,win_contribution_15game,run_contribution_15game,other_run_contribution_15game,num_game
14473,peted001,2017-09-01,0.552802,-0.039622,-4.148448,1,0.552802,-0.039622,-4.148448,1
20135,woodb005,2017-08-04,0.581038,0.153054,-4.074209,1,0.581038,0.153054,-4.074209,1
6374,garra001,2017-04-07,0.478885,0.419595,-3.311178,1,0.478885,0.419595,-3.311178,1
17385,skoge001,2017-05-30,0.474837,0.137314,-3.190977,1,0.474837,0.137314,-3.190977,1
16912,senza001,2017-04-06,0.379094,0.121467,-3.125499,1,0.379094,0.121467,-3.125499,1


Now let's zoom out, and require the player to have appeared in more than 50 games. Here we see some familiar names - Clayton Kershaw, Jacob deGrom, Johnny Cueto, Noah Syndergaard and Kyle Hendricks.

In [4]:
pitcher_scores[pitcher_scores['num_game_ever'] > 50].groupby('Pitcher').agg({'other_run_contribution_overall': 'mean'}).sort_values('other_run_contribution_overall', ascending = True).head()

Pitcher,other_run_contribution_overall
kersc001,-1.859172
degrj001,-1.31675
cuetj001,-1.194329
syndn001,-1.177949
hendk001,-1.123473


What's odd about this is that none of these pitchers won the Cy Young award in 2017! Part of the reason is that these numbers span seasons - we don't "start over" each year, the way Cy Young voting does.
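To illustrate why spanning seasons matters, here is a small sketch comparing a career-wide average with a single-season average. It assumes a hypothetical per-game column named `other_run_contribution` and made-up values; the real table may store its metrics differently.

```python
import pandas as pd

# Hypothetical per-game values for two made-up pitchers across two seasons.
toy = pd.DataFrame({
    'Pitcher': ['aaaa001'] * 4 + ['bbbb001'] * 4,
    'Date': pd.to_datetime(['2016-06-01', '2016-09-01', '2017-05-01', '2017-08-01'] * 2),
    'other_run_contribution': [-4.0, -4.0, 0.0, 0.0, -1.0, -1.0, -2.0, -2.0],
})

def season_means(df, season):
    """Average the metric per pitcher using only games from one calendar year."""
    in_season = df[df['Date'].dt.year == season]
    return in_season.groupby('Pitcher')['other_run_contribution'].mean().sort_values()

# Career-wide average: every game counts, regardless of season.
career = toy.groupby('Pitcher')['other_run_contribution'].mean().sort_values()

# Season-only average: "start over" in 2017.
only_2017 = season_means(toy, 2017)

print(career)
print(only_2017)
```

In this toy data the first pitcher leads the career ranking on the strength of 2016, while the second leads once only 2017 games are counted.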

Here, we look just at each pitcher's last 15 game appearances, using dates after July 1, 2017. Kershaw still leads, but Max Scherzer is right behind him (he won the NL award), and Corey Kluber is third (he won the AL award).
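This notebook does not show how the "15game" columns are built (that happens in the SQL script). One plausible approach, sketched here purely as an assumption, is a rolling mean over each pitcher's most recent appearances. The helper name `add_rolling_mean` and the per-game column `other_run_contribution` are hypothetical, and the toy call uses a window of 2 to keep the data short.

```python
import pandas as pd

def add_rolling_mean(df, value_col, window=15):
    """Add a per-pitcher rolling mean over the last `window` appearances."""
    df = df.sort_values(['Pitcher', 'Date']).copy()
    df[f'{value_col}_{window}game'] = (
        df.groupby('Pitcher')[value_col]
          # min_periods=1 lets early appearances average over what exists so far.
          .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return df

# Made-up appearances for a single made-up pitcher.
toy = pd.DataFrame({
    'Pitcher': ['aaaa001'] * 4,
    'Date': pd.to_datetime(['2017-04-01', '2017-04-06', '2017-04-11', '2017-04-16']),
    'other_run_contribution': [-1.0, -3.0, 1.0, -1.0],
})

rolled = add_rolling_mean(toy, 'other_run_contribution', window=2)
print(rolled)
```

With `min_periods=1`, a pitcher's debut has identical overall and windowed values, which matches what the debut table above shows.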

In [5]:
pitcher_scores[(pitcher_scores['Date'] > '2017-07-01') & (pitcher_scores['num_game_ever'] > 50)].groupby('Pitcher').agg({'other_run_contribution_15game': 'mean'}).sort_values('other_run_contribution_15game', ascending = True).head()

Pitcher,other_run_contribution_15game
kersc001,-1.912058
schem001,-1.887505
klubc001,-1.80773
gonzg003,-1.649005
wooda002,-1.58944


But if we measure which pitchers did the most to help their team win, Clayton Kershaw was robbed! In games where he appeared, the likelihood of his team winning increased by about 27% on average (a mean win_contribution_15game of 0.27). Some of that is due to good production from the offense, but it's still a remarkable statistic.

In [6]:
pitcher_scores[(pitcher_scores['Date'] > '2017-07-01') & (pitcher_scores['num_game_ever'] > 50)].groupby('Pitcher').agg({'win_contribution_15game': 'mean'}).sort_values('win_contribution_15game', ascending = False).head()

Pitcher,win_contribution_15game
kersc001,0.270098
schem001,0.223489
gonzg003,0.205667
klubc001,0.204494
wooda002,0.183465
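
Cells 4 through 6 repeat the same filter, group, and sort pattern. A small helper could capture it, as sketched below; the function name `top_pitchers` and its parameters are hypothetical, and the toy DataFrame uses made-up values with the same column names as `pitcher_scores`.

```python
import pandas as pd

def top_pitchers(scores, metric, lower_is_better, min_games=50, since=None, n=5):
    """Rank pitchers by the mean of `metric`, keeping rows with more than `min_games` games."""
    subset = scores[scores['num_game_ever'] > min_games]
    if since is not None:
        subset = subset[subset['Date'] > since]
    means = subset.groupby('Pitcher')[metric].mean()
    # Run-prevention metrics sort ascending; win contribution sorts descending.
    return means.sort_values(ascending=lower_is_better).head(n)

# Made-up rows; 'cccc001' has too few games to qualify.
toy = pd.DataFrame({
    'Pitcher': ['aaaa001', 'aaaa001', 'bbbb001', 'bbbb001', 'cccc001'],
    'Date': pd.to_datetime(['2017-06-15', '2017-08-01', '2017-07-10', '2017-08-05', '2017-08-10']),
    'num_game_ever': [60, 61, 80, 81, 3],
    'win_contribution_15game': [0.30, 0.10, 0.20, 0.22, 0.50],
})

result = top_pitchers(toy, 'win_contribution_15game', lower_is_better=False,
                      since=pd.Timestamp('2017-07-01'))
print(result)
```

Passing `lower_is_better=True` with one of the "other run contribution" columns would reproduce the shape of cells 4 and 5.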
