# Win Expectancy and Leverage #

Name: Michael Dresser

### Win Expectancy (WE)###
<a href="http://m.mlb.com/glossary/advanced-stats/win-expectancy" target="_blank">Win Expectancy Explained</a>
Win expectancy is the probability that a team will win a game from a particular game state.

### Win Probability Added (WPA) ###
WPA captures how players impact their teams probability of a win. Basically, we sum up the change in WE for a player over the course of the season.
<a href="http://m.mlb.com/glossary/advanced-stats/win-probability-added" target="_blank">Win Probability Added Explained</a>

### Leverage Index (LI) ###
Measures the importance of the event by quantifying how much the WE could change with the event.
<a href="http://m.mlb.com/glossary/advanced-stats/leverage-index" target="_blank">Leverage Index Explained</a>

### Tom Tango ###
He's another baseball guy who has written some books and done some cool stuff with math. I don't think Tom Tango is his real name, and there's no picture of him anywhere.

### Understanding these terms ###
Here are the links to some pre-calculated LI and WE values. They change a bit year-to-year and environment-to-environment, but the basic idea stays the same.

#### Leverage ####
<a href="http://www.insidethebook.com/li.shtml#18" target="_blank">Leverage Index values</a>

#### Win Expectancy ####
<a href="http://www.tangotiger.net/welist.html" target="_blank">Win Expectancy values</a>

## Short-answer Questions ##
(2 pts)
1. What is the relationship between WE and Leverage? Are they positively or negatively correlated, or neither? Provide a few examples to describe your answer.
2. Why is -2 with bases loaded in the 9th a higher leverage situation than -3 and bases loaded in the 9th?
3. List 3 high-leverage situations in the 1st inning and 3 very high-leverage situations in the 7th inning.
4. In the WE tables, what state changes have the largest change in WE? What baseball events could cause those state changes?
5. When do a walk and a HR have the same impact on WE?

## Programming Questions ##
(4 pts)
6. How often did Nolan Arenado come up to bat last year in the 9th inning with the Rockies losing?
7. How often did Gerardo Parra come up to bat last year in the 9th inning with the Rockies losing?
8. Find two players on different teams with similar BA, OBP, SLG. One player should be on a high-scoring team and the other should be on a low-scoring team. Use the 2016 season so that you have Lahman and statcast data. You can use runs scored that season for determining high scoring and low scoring teams.
9. For those two players, how many times did they come up to bat with runners on base? How many times did they come up to bat with bases empty?

1. WE and Leverage do not have a linear relationship. When WE is very low, Leverage will not be very high because any action by a single player won't have an opportunity to change WE very much. When WE is middling, Leverage will be very high because any action by a single player can increase WE substantially. When WE is very high, Leverage will again not be very high because any action by a single player will not be able to change the WE very much.

2. Because in the -2 situation the batter only needs to score two of the runners in order to tie the game and continue it (increasing odds) or 3 of the runners to win outright (a home run is not required). In the -3 situation the batter needs *at least* a triple to tie the game and a home run to win it outright. In other words, in the -2 situation there are more plays that a batter can make to continue or win the game than in the -3 situation.

3. 1st (top): Runners on 1st and 2nd, no outs; bases loaded, no outs; bases loaded, 1 out. 7th (top): Bases loaded, down 2, no outs; bases loaded, down 2, 1 out; bases loaded, down 1, 1 out.

4. Bottom of the 9th, runner on 3rd, tie game, no outs -> bottom of the 9th, bases empty, tie game, 1 out. -.337. Event: runner on 3rd picked off. Bottom of the 9th, bases empty, down 1, 2 outs -> bottom of the 9th, bases empty, tie game, 2 outs. +.49. Event: HR. 

5. Bottom of the 9th, bases loaded, tie game.

In [1]:
from pybaseball import statcast

In [6]:
d2018 = statcast('2018-03-29', '2018-10-01')

This is a large query, it may take a moment to complete
Completed sub-query from 2018-03-29 to 2018-04-03
Completed sub-query from 2018-04-04 to 2018-04-09
Completed sub-query from 2018-04-10 to 2018-04-15
Completed sub-query from 2018-04-16 to 2018-04-21
Completed sub-query from 2018-04-22 to 2018-04-27
Completed sub-query from 2018-04-28 to 2018-05-03
Completed sub-query from 2018-05-04 to 2018-05-09
Completed sub-query from 2018-05-10 to 2018-05-15
Completed sub-query from 2018-05-16 to 2018-05-21
Completed sub-query from 2018-05-22 to 2018-05-27
Completed sub-query from 2018-05-28 to 2018-06-02
Completed sub-query from 2018-06-03 to 2018-06-08
Completed sub-query from 2018-06-09 to 2018-06-14
Completed sub-query from 2018-06-15 to 2018-06-20
Completed sub-query from 2018-06-21 to 2018-06-26
Completed sub-query from 2018-06-27 to 2018-07-02
Completed sub-query from 2018-07-03 to 2018-07-08
Completed sub-query from 2018-07-09 to 2018-07-14
Completed sub-query from 2018-07-15 to 2018-

In [8]:
from pybaseball import playerid_lookup

In [9]:
playerid_lookup("Arenado")

Gathering player lookup table. This may take a moment.


Unnamed: 0,name_last,name_first,key_mlbam,key_retro,key_bbref,key_fangraphs,mlb_played_first,mlb_played_last
0,arenado,jonah,643198,,,-1,,
1,arenado,nolan,571448,arenn001,arenano01,9777,2013.0,2019.0


In [20]:
d2018.loc[(d2018["batter"] == 571448.0) & (d2018["inning"] == 9) &
          (
              ((d2018["inning_topbot"] == "Top") & (d2018["home_score"] > d2018["away_score"])) | 
              ((d2018["inning_topbot"] == "Bot") & (d2018["away_score"] > d2018["home_score"]))
          )
         ]["game_date"].nunique()

28

## 6. 28 times in 2018

In [21]:
playerid_lookup("Parra", "gerardo")

Gathering player lookup table. This may take a moment.


Unnamed: 0,name_last,name_first,key_mlbam,key_retro,key_bbref,key_fangraphs,mlb_played_first,mlb_played_last
0,parra,gerardo,467827,parrg001,parrage01,8553,2009.0,2019.0


In [22]:
d2018.loc[(d2018["batter"] == 467827.0) & (d2018["inning"] == 9) &
          (
              ((d2018["inning_topbot"] == "Top") & (d2018["home_score"] > d2018["away_score"])) | 
              ((d2018["inning_topbot"] == "Bot") & (d2018["away_score"] > d2018["home_score"]))
          )
         ]["game_date"].nunique()

26

## 7. 26 times in 2018

In [23]:
d2016 = statcast('2016-03-29', '2016-10-01')

This is a large query, it may take a moment to complete
Completed sub-query from 2016-03-29 to 2016-04-03
Completed sub-query from 2016-04-04 to 2016-04-09
Completed sub-query from 2016-04-10 to 2016-04-15
Completed sub-query from 2016-04-16 to 2016-04-21
Completed sub-query from 2016-04-22 to 2016-04-26
Completed sub-query from 2016-04-27 to 2016-04-27
Completed sub-query from 2016-04-28 to 2016-05-03
Completed sub-query from 2016-05-04 to 2016-05-09
Completed sub-query from 2016-05-10 to 2016-05-15
Completed sub-query from 2016-05-16 to 2016-05-21
Completed sub-query from 2016-05-22 to 2016-05-27
Completed sub-query from 2016-05-28 to 2016-06-02
Completed sub-query from 2016-06-03 to 2016-06-08
Completed sub-query from 2016-06-09 to 2016-06-14
Completed sub-query from 2016-06-15 to 2016-06-20
Completed sub-query from 2016-06-21 to 2016-06-26
Completed sub-query from 2016-06-27 to 2016-07-02
Completed sub-query from 2016-07-03 to 2016-07-07
Completed sub-query from 2016-07-08 to 2016-

In [24]:
from pybaseball.lahman import batting

b = batting()

In [25]:
def get_obp_series(df):
    return (df["H"] + df["BB"] + df["HBP"]) / (df["AB"] + df["BB"] + df["HBP"] + df["SF"])

def get_slg_series(df):
    singles = df["H"] - df["HR"] - df["2B"] - df["3B"]
    return (singles + 2*df["2B"] + 3*df["3B"] + 4*df["HR"]) / df["AB"]

def get_avg_series(df):
    return (df["H"]) / (df["AB"])

In [26]:
b["OBP"] = get_obp_series(b)
b["SLG"] = get_slg_series(b)
b["BA"] = get_avg_series(b)

In [31]:
b = b.loc[(b["yearID"] == 2016) & (b["AB"] > 100)].copy()

In [35]:
tor = b.loc[(b["teamID"] == "TOR")] # high scoring team: blue jays

In [38]:
oak = b.loc[(b["teamID"] == "OAK")] # low scoring team: a's

In [51]:
matching_ids = []
threshold = 0.05
for _, trow in tor.iterrows():
    for _, orow in oak.iterrows():
        if abs(trow["OBP"] - orow["OBP"]) < threshold and abs(trow["SLG"] - orow["SLG"]) < threshold and abs(trow["BA"] - orow["BA"]) - threshold:
            matching_ids.append((trow["playerID"], orow["playerID"]))

In [52]:
matching_ids

[('barneda01', 'alonsyo01'),
 ('barneda01', 'butlebi03'),
 ('barneda01', 'crispco01'),
 ('barneda01', 'smolija01'),
 ('barneda01', 'vogtst01'),
 ('bautijo02', 'butlebi03'),
 ('bautijo02', 'reddijo01'),
 ('bautijo02', 'valenda01'),
 ('carreez01', 'alonsyo01'),
 ('carreez01', 'butlebi03'),
 ('carreez01', 'crispco01'),
 ('carreez01', 'lowrije01'),
 ('carreez01', 'smolija01'),
 ('encared01', 'healyry01'),
 ('goinsry01', 'burnsbi02'),
 ('goinsry01', 'coghlch01'),
 ('goinsry01', 'eibnebr01'),
 ('martiru01', 'alonsyo01'),
 ('martiru01', 'butlebi03'),
 ('martiru01', 'crispco01'),
 ('martiru01', 'semiema01'),
 ('martiru01', 'valenda01'),
 ('martiru01', 'vogtst01'),
 ('pillake01', 'alonsyo01'),
 ('pillake01', 'butlebi03'),
 ('pillake01', 'crispco01'),
 ('pillake01', 'smolija01'),
 ('pillake01', 'vogtst01'),
 ('saundmi01', 'daviskh01'),
 ('saundmi01', 'healyry01'),
 ('saundmi01', 'reddijo01'),
 ('saundmi01', 'semiema01'),
 ('saundmi01', 'valenda01'),
 ('smoakju01', 'alonsyo01'),
 ('smoakju01', 'b

## 8. Yonder Alonso and Darwin Barney

In [53]:
tor.loc[tor["playerID"] == "barneda01"]

Unnamed: 0,playerID,yearID,stint,teamID,lgID,G,AB,R,H,2B,...,BB,SO,IBB,HBP,SH,SF,GIDP,OBP,SLG,BA
101407,barneda01,2016,1,TOR,AL,104,279,35,75,13,...,22,48.0,1.0,1.0,2.0,2.0,8.0,0.322368,0.37276,0.268817


In [55]:
oak.loc[oak["playerID"] == "alonsyo01"]

Unnamed: 0,playerID,yearID,stint,teamID,lgID,G,AB,R,H,2B,...,BB,SO,IBB,HBP,SH,SF,GIDP,OBP,SLG,BA
101356,alonsyo01,2016,1,OAK,AL,156,482,52,122,34,...,45,74.0,1.0,1.0,0.0,4.0,15.0,0.315789,0.36722,0.253112


In [59]:
from pybaseball import playerid_reverse_lookup

playerid_reverse_lookup(["barneda01", "alonsyo01"], key_type="bbref")

Gathering player lookup table. This may take a moment.


Unnamed: 0,name_last,name_first,key_mlbam,key_retro,key_bbref,key_fangraphs,mlb_played_first,mlb_played_last
0,alonso,yonder,475174,alony001,alonsyo01,2530,2010.0,2019.0
1,barney,darwin,446381,barnd001,barneda01,2430,2010.0,2017.0


In [66]:
def with_runners_on(statcast_df, batter_id):
    df = statcast_df
    return len(df.loc[(df["batter"] == batter_id) &
                      (
                          (df["on_1b"].notna() | df["on_2b"].notna() | df["on_3b"].notna())
                      )
                     ].groupby(["game_date", "inning"]).size())

def without_runners_on(statcast_df, batter_id):
    df = statcast_df
    return len(df.loc[(df["batter"] == batter_id) &
                     (df["on_1b"].isna()) &
                     (df["on_2b"].isna()) &
                     (df["on_3b"].isna())
                    ].groupby(["game_date", "inning"]).size())

## 9. See below

In [68]:
alonso = 475174
barney = 446381

print("Yonder Alonso runners on / bases empty")
print(with_runners_on(d2016, alonso))
print(without_runners_on(d2016, alonso))

print("Darwin Barney runners on / bases empty")
print(with_runners_on(d2016, barney))
print(without_runners_on(d2016, barney))

Yonder Alonso runners on / bases empty
237
293
Darwin Barney runners on / bases empty
118
189
