# Batting Metrics
### Author: Ryan Berns
### Modified by Matthew Chin (2021)

**Data**
https://github.com/chadwickbureau/baseballdatabank

In [1]:
# plotly standard imports
import plotly.graph_objs as go
import chart_studio.plotly as py

# Cufflinks wrapper on plotly
import cufflinks

# Data science imports
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 999

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# from __future__ import print_function, division
import matplotlib as mpl
import matplotlib.pyplot as plt
# %matplotlib inline

from plotly.offline import iplot
cufflinks.go_offline()

# Set global theme
cufflinks.set_config_file(world_readable=True, theme='pearl')

---
## I. Batting Average & OBP

In [2]:
batting = pd.read_csv("https://raw.githubusercontent.com/chadwickbureau/baseballdatabank/master/core/Batting.csv",sep=',')

In [3]:
players = pd.read_csv("https://raw.githubusercontent.com/chadwickbureau/baseballdatabank/master/core/People.csv",sep=',')

In [4]:
battingPlayers = pd.merge(batting,players,left_on="playerID",right_on="playerID",how='inner')

In [5]:
battingPlayers = battingPlayers.assign(age = (battingPlayers.yearID - battingPlayers.birthYear))
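The merge and age cells join each season row to the player's biographical row and derive a rough age (season year minus birth year). Below is a minimal sketch of the same two steps on made-up frames. The `validate="many_to_one"` argument is an addition that is not in the original cell. It makes pandas raise a `MergeError` if a `playerID` appears more than once in the people table.

```python
import pandas as pd

# Hypothetical season rows (one row per player-season-stint)
batting = pd.DataFrame({
    "playerID": ["p1", "p1", "p2"],
    "yearID": [1990, 1991, 1990],
    "H": [150, 160, 90],
})
# Hypothetical biographical rows (one row per player)
players = pd.DataFrame({
    "playerID": ["p1", "p2"],
    "birthYear": [1965, 1970],
})

# Inner join on the shared key; validate raises MergeError if players has duplicate IDs
merged = pd.merge(batting, players, on="playerID", how="inner", validate="many_to_one")

# Approximate age: season year minus birth year (ignores the actual birthday)
merged = merged.assign(age=merged.yearID - merged.birthYear)
print(merged[["playerID", "yearID", "age"]])
```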

---
### Batting Average

Batting average is the most common baseball metric, and one that even casual fans understand. As a refresher, batting average is simply the proportion of a player's at bats that result in a hit.

$\text{Batting Average} = \frac{\text{Hits}}{\text{At Bats}}$
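As a quick check of the formula, here is a small plain-Python sketch using Bob Addy's 1871 line from the data (32 hits in 118 at bats). The function name is hypothetical. Returning `None` when there are no at bats is an assumption of this sketch. In the pandas cell, a 0/0 division produces `NaN` instead.

```python
from typing import Optional

def batting_average(hits: int, at_bats: int) -> Optional[float]:
    """Hits divided by at bats, rounded to three places (None if there are no at bats)."""
    if at_bats == 0:
        return None
    return round(hits / at_bats, 3)

print(batting_average(32, 118))  # 0.271
print(batting_average(0, 0))     # None
```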

In [6]:
# Calculate Batting average
battingPlayers = battingPlayers.assign(BA = round((battingPlayers.H/battingPlayers.AB),3))

In [7]:
battingPlayers.head()

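|  | playerID | yearID | stint | teamID | lgID | G | AB | R | H | 2B | 3B | HR | RBI | SB | CS | BB | SO | IBB | HBP | SH | SF | GIDP | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | deathCountry | deathState | deathCity | nameFirst | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID | age | BA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | abercda01 | 1871 | 1 | TRO |  | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0 | 0.0 |  |  |  |  | 0.0 | 1850.0 | 1.0 | 2.0 | USA | OK | Fort Towson | 1939.0 | 11.0 | 11.0 | USA | PA | Philadelphia | Frank | Abercrombie | Francis Patterson |  |  |  |  | 1871-10-21 | 1871-10-21 | aberd101 | abercda01 | 21.0 | 0.0 |
| 1 | addybo01 | 1871 | 1 | RC1 |  | 25 | 118 | 30 | 32 | 6 | 0 | 0 | 13.0 | 8.0 | 1.0 | 4 | 0.0 |  |  |  |  | 0.0 | 1842.0 | 2.0 |  | CAN | ON | Port Hope | 1910.0 | 4.0 | 9.0 | USA | ID | Pocatello | Bob | Addy | Robert Edward | 160.0 | 68.0 | L | L | 1871-05-06 | 1877-10-06 | addyb101 | addybo01 | 29.0 | 0.271 |
| 2 | addybo01 | 1873 | 1 | PH2 |  | 10 | 51 | 12 | 16 | 1 | 0 | 0 | 10.0 | 1.0 | 1.0 | 2 | 0.0 |  |  |  |  | 0.0 | 1842.0 | 2.0 |  | CAN | ON | Port Hope | 1910.0 | 4.0 | 9.0 | USA | ID | Pocatello | Bob | Addy | Robert Edward | 160.0 | 68.0 | L | L | 1871-05-06 | 1877-10-06 | addyb101 | addybo01 | 31.0 | 0.314 |
| 3 | addybo01 | 1873 | 2 | BS1 |  | 31 | 152 | 37 | 54 | 6 | 3 | 1 | 32.0 | 6.0 | 5.0 | 2 | 1.0 |  |  |  |  | 0.0 | 1842.0 | 2.0 |  | CAN | ON | Port Hope | 1910.0 | 4.0 | 9.0 | USA | ID | Pocatello | Bob | Addy | Robert Edward | 160.0 | 68.0 | L | L | 1871-05-06 | 1877-10-06 | addyb101 | addybo01 | 31.0 | 0.355 |
| 4 | addybo01 | 1874 | 1 | HR1 |  | 50 | 213 | 25 | 51 | 9 | 2 | 0 | 22.0 | 4.0 | 2.0 | 1 | 1.0 |  |  |  |  | 0.0 | 1842.0 | 2.0 |  | CAN | ON | Port Hope | 1910.0 | 4.0 | 9.0 | USA | ID | Pocatello | Bob | Addy | Robert Edward | 160.0 | 68.0 | L | L | 1871-05-06 | 1877-10-06 | addyb101 | addybo01 | 32.0 | 0.239 |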


Let's take a look at the top batting averages for players with more than 100 at bats.

In [8]:
battingPlayers.loc[(battingPlayers.AB > 100),['nameFirst','nameLast','bats','yearID','teamID','age','BA']]\
                .sort_values('BA', ascending=False).head()

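|  | nameFirst | nameLast | bats | yearID | teamID | age | BA |
|---|---|---|---|---|---|---|---|
| 390 | Levi | Meyerle | R | 1871 | PH1 | 22.0 | 0.492 |
| 5476 | Hugh | Duffy | R | 1894 | BSN | 28.0 | 0.44 |
| 3185 | Tip | O'Neill | R | 1887 | SL4 | 27.0 | 0.435 |
| 57 | Ross | Barnes | R | 1873 | BS1 | 23.0 | 0.431 |
| 381 | Cal | McVey | R | 1871 | BS1 | 22.0 | 0.431 |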
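The filter, sort, and `head` pattern in cell 8 can also be written with `DataFrame.nlargest`, which returns the rows with the largest values in a column. Here is a minimal sketch on made-up player-seasons (the IDs and numbers are hypothetical):

```python
import pandas as pd

# Hypothetical player-seasons
df = pd.DataFrame({
    "playerID": ["a", "b", "c", "d"],
    "AB": [450, 90, 300, 520],
    "BA": [0.301, 0.410, 0.355, 0.288],
})

# Same selection as .loc[df.AB > 100].sort_values('BA', ascending=False).head(2)
leaders = df.loc[df.AB > 100].nlargest(2, "BA")
print(leaders)
```

Player `b` has the highest average but is excluded by the at-bat threshold.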


All five of the top single-season batting averages (more than 100 at bats) came in the 19th century. Although still impressive, this may say more about the style of play of the era than about legendary skill at the plate.
> <h4> Who is Levi Meyerle?</h4> Levi was an infielder for the Philadelphia Athletics of the National Association (NA). The NA was the predecessor of MLB's National League; it was founded in 1871 and lasted only five seasons, through 1875. Levi's 0.492 batting average is not considered a "record" by MLB, which does not recognize statistics from the NA. After bouncing around a bit following the 1871 season, Levi returned to the Philadelphia Athletics for their inaugural National League season in 1876. The team was expelled from the league after that season, and Levi went on to play for the Reds. He finished his career with a .356 batting average, 10 home runs, and 276 RBIs.
*Source: https://en.wikipedia.org/wiki/Levi_Meyerle*

Let's see how batting average has trended over the years by taking, for each season, the mean batting average of players with more than 100 at bats.

In [9]:
# Mean batting average per season among players with more than 100 at bats
BA_by_yr = battingPlayers.loc[(battingPlayers.AB > 100),['yearID','BA']].groupby('yearID', as_index=False)['BA'].mean()

In [10]:
# Index the season averages by year for plotting
BAnewYr = BA_by_yr.set_index("yearID")
# Plot mean batting average as a time series
BAnewYr[['BA']].iplot(y='BA', mode='lines+markers',xTitle='Season', yTitle='Batting Average',\
                  text='BA', title='Batting Average by Season')
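Cell 9 gives every qualifying player-season equal weight, so a 101-at-bat season counts as much as a 600-at-bat one. An alternative is a pooled batting average (total hits divided by total at bats per season), which weights each player by at bats. The sketch below compares the two on made-up numbers. Note that the pooled figure is a different statistic, not what cell 10 plots.

```python
import pandas as pd

# Hypothetical player-seasons for a single year
df = pd.DataFrame({
    "yearID": [2020, 2020],
    "H": [45, 100],
    "AB": [150, 500],
})
df = df.assign(BA=df.H / df.AB)

# Unweighted: mean of individual batting averages (the approach in cell 9)
unweighted = df.groupby("yearID")["BA"].mean()

# Pooled: sum of hits over sum of at bats, i.e. weighted by at bats
totals = df.groupby("yearID")[["H", "AB"]].sum()
pooled = totals.H / totals.AB

print(unweighted.loc[2020], pooled.loc[2020])  # 0.25 and about 0.2231
```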

In [12]:
# Label each season with its decade as a string, e.g. 1987 -> "1980"
battingPlayers = battingPlayers.assign(decade = battingPlayers['yearID'].map(lambda x: str(x)[:3] + "0"))
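The decade label is built by slicing the year's string form. An arithmetic version (integer division by 10) gives the same label for four-digit years and avoids string slicing. In pandas it would be `(battingPlayers.yearID // 10 * 10).astype(str)`. This alternative is a suggestion, not the notebook's code, and both function names below are hypothetical.

```python
def decade_by_slice(year: int) -> str:
    # Notebook approach: first three digits of the year plus "0"
    return str(year)[:3] + "0"

def decade_by_division(year: int) -> str:
    # Alternative: floor the year to a multiple of 10, then convert to text
    return str(year // 10 * 10)

for year in (1871, 1987, 2020):
    print(year, decade_by_slice(year), decade_by_division(year))
```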

In [13]:
import seaborn as sns

cm = sns.light_palette("green", as_cmap=True)
battingPlayers.loc[(battingPlayers.AB >= 200),['decade','nameFirst','nameLast','yearID','teamID','age','BA']]\
                .sort_values('BA',ascending=False)\
                .groupby(battingPlayers['decade'],as_index=True).head(5).sort_values(['decade','BA'],ascending=False)\
                .style.background_gradient(subset=['BA','age'],cmap=cm)

|  | decade | nameFirst | nameLast | yearID | teamID | age | BA |
|---|---|---|---|---|---|---|---|
| 99191 | 2020 | Freddie | Freeman | 2020 | ATL | 31.0 | 0.341 |
| 103253 | 2020 | Marcell | Ozuna | 2020 | ATL | 30.0 | 0.338 |
| 105638 | 2020 | Trea | Turner | 2020 | WAS | 27.0 | 0.335 |
| 104883 | 2020 | Michael | Conforto | 2020 | NYN | 27.0 | 0.322 |
| 105747 | 2020 | Tim | Anderson | 2020 | CHA | 27.0 | 0.322 |
| 89578 | 2010 | Marco | Scutaro | 2012 | SFN | 37.0 | 0.362 |
| 95459 | 2010 | Josh | Hamilton | 2010 | TEX | 29.0 | 0.359 |
| 100760 | 2010 | DJ | LeMahieu | 2016 | COL | 28.0 | 0.348 |
| 89898 | 2010 | Miguel | Cabrera | 2013 | DET | 30.0 | 0.348 |
| 97042 | 2010 | Daniel | Murphy | 2016 | WAS | 31.0 | 0.347 |
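Cell 13 finds the top five per decade by sorting on BA first and then calling `groupby(...).head(5)`, which keeps the first rows of each group in their current order. Here is a minimal sketch of that pattern on made-up data, keeping the top two per decade:

```python
import pandas as pd

# Hypothetical player-seasons across two decades
df = pd.DataFrame({
    "decade": ["1990", "1990", "1990", "2000", "2000"],
    "playerID": ["a", "b", "c", "d", "e"],
    "BA": [0.310, 0.345, 0.299, 0.330, 0.360],
})

# Sort so the best averages come first, then keep the first 2 rows within each decade
top = (df.sort_values("BA", ascending=False)
         .groupby("decade")
         .head(2)
         .sort_values(["decade", "BA"], ascending=False))
print(top)
```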


In [14]:
hof = pd.read_csv("https://raw.githubusercontent.com/chadwickbureau/baseballdatabank/master/core/HallOfFame.csv",sep=',')
hof = hof.loc[(hof.inducted=="Y"),:]

In [15]:
# Merge only the columns we need from the Hall of Fame table; drop_duplicates
# guards against a player having more than one inducted row
battingPlayersHof = pd.merge(battingPlayers,
                             hof[['playerID','inducted']].drop_duplicates('playerID'),
                             on="playerID", how='left')
battingPlayersHof['inducted'] = battingPlayersHof['inducted'].fillna("N")

In [19]:
# Filter rows with .loc: .reindex() would treat the boolean mask as row labels,
# producing a non-unique index that Styler rejects
temp1 = battingPlayersHof.loc[(battingPlayersHof.AB >= 200),['decade','nameFirst','nameLast','playerID','yearID','teamID','age','inducted','BA']]\
                .sort_values('BA',ascending=False)\
                .groupby('decade').head(5).sort_values(['decade','BA'],ascending=False)

# Highlight Hall of Fame inductees in yellow
color = (temp1.inducted == 'Y').map({True: 'background-color: yellow', False: ''})

temp1.style.apply(lambda s: color)
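Filtering with `.loc` and calling `.reindex` with a boolean mask behave very differently. `.loc` keeps only the rows where the mask is `True`. `.reindex(index=mask)` instead treats the mask's values as new row labels, so the result has repeated `True`/`False` labels. pandas' Styler refuses to style a frame with a non-unique index. Here is a small sketch on made-up data:

```python
import pandas as pd

# Hypothetical player-seasons
df = pd.DataFrame({"AB": [250, 150, 300], "BA": [0.301, 0.280, 0.322]})
mask = df.AB >= 200  # [True, False, True]

# .loc treats the mask as a row filter: rows 0 and 2 survive, index stays unique
filtered = df.loc[mask]
print(filtered.index.tolist())        # [0, 2]

# .reindex treats the mask's values as the new row labels
relabeled = df.reindex(index=mask)
print(relabeled.index.tolist())       # [True, False, True]
print(relabeled.index.is_unique)      # False
```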

---
### On-Base Percentage

On-base percentage (OBP) measures how often a batter reaches base. Unlike batting average, it also credits walks (BB) and times hit by a pitch (HBP), and its denominator adds walks, hit-by-pitches, and sacrifice flies (SF) to at bats.

$OBP = \frac{BB + H + HBP}{AB + BB + HBP + SF}$
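Plugging a made-up stat line into the formula (150 H, 60 BB, 5 HBP, 500 AB, 5 SF) shows how the pieces combine. The function name is hypothetical. In this sketch, a missing component is passed as `None` and makes the result `None`, mirroring how a missing value turns the pandas OBP column into `NaN`.

```python
from typing import Optional

def on_base_percentage(h: int, bb: int, hbp: Optional[int], ab: int,
                       sf: Optional[int]) -> Optional[float]:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF), rounded to three places.

    Returns None if HBP or SF is missing (None) or the denominator is zero.
    """
    if hbp is None or sf is None:
        return None
    denominator = ab + bb + hbp + sf
    if denominator == 0:
        return None
    return round((h + bb + hbp) / denominator, 3)

print(on_base_percentage(150, 60, 5, 500, 5))  # 215 / 570 -> 0.377
```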

In [27]:
battingPlayers = battingPlayers\
                    .assign(OBP = round(((battingPlayers.H + battingPlayers.BB + battingPlayers.HBP)/\
                                         (battingPlayers.AB + battingPlayers.BB + battingPlayers.HBP + battingPlayers.SF)),3))

In [28]:
battingPlayers.loc[(battingPlayers.AB > 100),['playerID','nameFirst','nameLast','bats','yearID','teamID','age','BA','OBP']]\
                .sort_values('OBP', ascending=False).head(10)

|  | playerID | nameFirst | nameLast | bats | yearID | teamID | age | BA | OBP |
|---|---|---|---|---|---|---|---|---|---|
| 68499 | bondsba01 | Barry | Bonds | L | 2004 | SFN | 40.0 | 0.362 | 0.609 |
| 68497 | bondsba01 | Barry | Bonds | L | 2002 | SFN | 38.0 | 0.37 | 0.582 |
| 68498 | bondsba01 | Barry | Bonds | L | 2003 | SFN | 39.0 | 0.341 | 0.529 |
| 31457 | willite01 | Ted | Williams | L | 1957 | BOS | 39.0 | 0.388 | 0.526 |
| 68496 | bondsba01 | Barry | Bonds | L | 2001 | SFN | 37.0 | 0.328 | 0.515 |
| 31454 | willite01 | Ted | Williams | L | 1954 | BOS | 36.0 | 0.345 | 0.513 |
| 38324 | mantlmi01 | Mickey | Mantle | B | 1957 | NYA | 26.0 | 0.365 | 0.512 |
| 31455 | willite01 | Ted | Williams | L | 1955 | BOS | 37.0 | 0.356 | 0.496 |
| 107972 | sotoju01 | Juan | Soto | L | 2020 | WAS | 22.0 | 0.351 | 0.49 |
| 77909 | ramirma02 | Manny | Ramirez | R | 2008 | LAN | 36.0 | 0.396 | 0.489 |


The list of top single-season on-base percentages is a story of two players: **Barry Bonds and Ted Williams**. Together they hold 7 of the top 10 spots.

#### Barry Bonds

In [29]:
bonds = battingPlayers.loc[(battingPlayers.playerID == 'bondsba01'),['yearID','HR','OBP']]

In [30]:
bonds[['yearID', 'HR', 'OBP']].iplot(
    x='yearID',
    y='OBP',
    mode='lines+markers',
    secondary_y = 'HR',
    secondary_y_title='Home Runs',
    opacity=0.8,
    size=8,
    symbol=1,
    xTitle='Year',
    yTitle='On-Base Percentage',
    title='Barry Bonds OBP and Home Runs by Season')

#### Ted Williams

In [31]:
will = battingPlayers.loc[(battingPlayers.playerID == 'willite01'),['yearID','HR','OBP']]

In [32]:
will[['yearID', 'HR', 'OBP']].iplot(
    x='yearID',
    y='OBP',
    mode='lines+markers',
    secondary_y = 'HR',
    secondary_y_title='Home Runs',
    opacity=0.8,
    size=8,
    symbol=1,
    xTitle='Year',
    yTitle='On-Base Percentage',
    title='Ted Williams OBP and Home Runs by Season')

#### Why doesn't Ted Williams have an OBP prior to the 1954 season?

In [33]:
battingPlayers.loc[(battingPlayers.playerID == 'willite01'),['yearID','AB','BB','H','SF','HBP']].sort_values("yearID")

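|  | yearID | AB | BB | H | SF | HBP |
|---|---|---|---|---|---|---|
| 31442 | 1939 | 565 | 107 | 185 |  | 2.0 |
| 31443 | 1940 | 561 | 96 | 193 |  | 3.0 |
| 31444 | 1941 | 456 | 147 | 185 |  | 3.0 |
| 31445 | 1942 | 522 | 145 | 186 |  | 4.0 |
| 31446 | 1946 | 514 | 156 | 176 |  | 2.0 |
| 31447 | 1947 | 528 | 162 | 181 |  | 2.0 |
| 31448 | 1948 | 509 | 126 | 188 |  | 3.0 |
| 31449 | 1949 | 566 | 162 | 194 |  | 2.0 |
| 31450 | 1950 | 334 | 82 | 106 |  | 0.0 |
| 31451 | 1951 | 531 | 144 | 169 |  | 0.0 |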


Batters have not been charged with an at bat for a sacrifice hit since 1893, but baseball has changed the sacrifice fly rule multiple times. The sacrifice fly as a statistical category was instituted in 1908, only to be discontinued in 1931. The rule was adopted again in 1939, eliminated again in 1940, and adopted for the last time in 1954. Because `SF` is missing (`NaN`) for Williams's earlier seasons, the denominator of the OBP formula is also `NaN`, so pandas returns `NaN` for his OBP in those years.<br>
https://en.wikipedia.org/wiki/Sacrifice_fly
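One workaround, which this notebook does not apply, is to treat a missing `SF` as zero for seasons before the statistic was tracked. This is an assumption about how to handle the gap, not an official adjustment. The sketch below uses Ted Williams's 1939 and 1941 rows from the output of cell 33. The `obp` helper is hypothetical.

```python
import pandas as pd

# Ted Williams, 1939 and 1941 (AB, BB, H, HBP from cell 33; SF missing)
will = pd.DataFrame({
    "yearID": [1939, 1941],
    "AB": [565, 456],
    "BB": [107, 147],
    "H": [185, 185],
    "HBP": [2.0, 3.0],
    "SF": [float("nan"), float("nan")],
})

def obp(df: pd.DataFrame, sf: pd.Series) -> pd.Series:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF), rounded to three places
    return ((df.H + df.BB + df.HBP) / (df.AB + df.BB + df.HBP + sf)).round(3)

as_recorded = obp(will, will.SF)           # NaN: any NaN term makes the sum NaN
sf_as_zero = obp(will, will.SF.fillna(0))  # assumption: missing SF counted as 0
print(pd.DataFrame({"yearID": will.yearID, "as_recorded": as_recorded, "sf_as_zero": sf_as_zero}))
```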
