## Which rotations have gotten the most starts from their top-5 starters?

A Reddit thread discussed teams that kept their five-man rotations healthy and got a high
fraction of their starts from those five starters. Many people chimed in with anecdotal
examples.

Cool, but let's generate a leaderboard: the teams since integration (1947) that have gotten the
greatest fraction of their starts from their top 5 starters. Better yet, let's include the names
and GS (games started) for those pitchers.

*(Next, we'll generalize this from 5 pitchers to any number, and from GS to any stat.)*

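```python
import pandas as pd

# Find all player-seasons since integration (1947) with GS > 0, and rank each one
# within its team-season by GS (1 = most starts). Ties in GS are broken in an
# unspecified order, because the default sort is not stable.
ps = pd.read_parquet("../data/pitching.parquet")[['year_id', 'team_id', 'player_id', 'gs']]
ps = ps[(ps['gs'] > 0) & (ps['year_id'] >= 1947)].copy()
# cumcount runs on the sorted frame; the assignment realigns the result by index
ps['rank_on_team'] = ps.sort_values(['gs'], ascending=False).groupby(['year_id', 'team_id']).cumcount() + 1
ps
```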

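|       | year_id | team_id | player_id | gs | rank_on_team |
|-------|---------|---------|-----------|----|--------------|
| 12617 | 1947 | NY1 | ayersbi01 | 4 | 11 |
| 12618 | 1947 | PIT | bagbyji02 | 6 | 9 |
| 12619 | 1947 | PIT | bahred01 | 11 | 7 |
| 12621 | 1947 | BRO | bantaja01 | 1 | 10 |
| 12622 | 1947 | BRO | barnere02 | 9 | 7 |
| ... | ... | ... | ... | ... | ... |
| 47617 | 2019 | TBA | yarbrry01 | 14 | 5 |
| 47620 | 2019 | BAL | ynoaga01 | 13 | 6 |
| 47622 | 2019 | ARI | youngal01 | 15 | 5 |
| 47624 | 2019 | TOR | zeuchtj01 | 3 | 13 |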
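
One subtlety in the ranking: `cumcount` over a sorted frame breaks ties in GS in an unspecified order, so when starters tie (the 2012 CIN row further down has three pitchers at 33 GS), the order of their names, and which pitcher lands at rank 5 when a tie straddles fifth place, is not guaranteed. The top-5 GS total is unaffected, since tied pitchers contribute the same GS either way. Here is a small sketch on hypothetical toy data (made-up player IDs) contrasting two tie-handling methods of `groupby(...).rank`:

```python
import pandas as pd

# Hypothetical toy team-season: three starters tied at 33 GS, one at 32.
ps = pd.DataFrame({
    "year_id": [2012] * 4,
    "team_id": ["AAA"] * 4,
    "player_id": ["p1", "p2", "p3", "p4"],
    "gs": [33, 33, 33, 32],
})

by_team = ps.groupby(["year_id", "team_id"])["gs"]

# method="first": tied values get distinct ranks, assigned in order of appearance.
ps["rank_first"] = by_team.rank(method="first", ascending=False).astype(int)

# method="min": tied values share the lowest rank (1, 1, 1), so a "<= 5" filter
# could keep more than five pitchers when a tie straddles fifth place.
ps["rank_min"] = by_team.rank(method="min", ascending=False).astype(int)

print(ps[["player_id", "gs", "rank_first", "rank_min"]])
```

`method="first"` gives a deterministic order, which keeps the name lists reproducible; `method="min"` is the more honest choice if you would rather see every pitcher tied for fifth.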


```python
# Aggregate the GS for each team-season, in total and for its top 5 starters
total_gs = ps.groupby(['year_id', 'team_id'])['gs'].sum()
top5_gs = ps[ps['rank_on_team'] <= 5].groupby(['year_id', 'team_id'])['gs'].sum()
teams = pd.merge(total_gs, top5_gs, on=['team_id', 'year_id'])
teams = teams.rename(columns={"gs_x": "gs_total", "gs_y": "gs_top5"})

# Compute the fraction of starts made by the top 5, keep teams above 98%, and sort
teams['top5pct'] = teams['gs_top5'] / teams['gs_total']
teams = teams[teams['top5pct'] > .98].sort_values(by='top5pct', ascending=False)
teams
```

| team_id | year_id | gs_total | gs_top5 | top5pct |
|---------|---------|----------|---------|---------|
| LAN | 1966 | 162 | 162 | 1.0 |
| SEA | 2003 | 162 | 162 | 1.0 |
| CIN | 2012 | 162 | 161 | 0.993827 |
| LAN | 1994 | 114 | 113 | 0.991228 |
| SLN | 2005 | 162 | 160 | 0.987654 |
| LAN | 1993 | 162 | 160 | 0.987654 |
| SFN | 2012 | 162 | 160 | 0.987654 |
| ATL | 1980 | 161 | 159 | 0.987578 |
| CHA | 1972 | 154 | 152 | 0.987013 |
| BAL | 1972 | 154 | 152 | 0.987013 |
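
The `top5pct` column is simply `gs_top5 / gs_total`: the 1994 LAN row works out to 113 / 114 ≈ 0.9912, and a full 162-start season with 160 starts from the top five gives 160 / 162 ≈ 0.9877. The same share can also be computed in one pass with `Series.nlargest`, which skips the separate ranking and merge steps. This is an alternative sketch, not the notebook's method; the toy numbers and the team ID `AAA` are hypothetical, chosen only to reproduce the 113-of-114 ratio:

```python
import pandas as pd

# Hypothetical toy team-season: six starters, 114 starts in total.
ps = pd.DataFrame({
    "year_id": [1994] * 6,
    "team_id": ["AAA"] * 6,
    "gs": [24, 23, 23, 22, 21, 1],
})

def top_n_share(gs: pd.Series, n: int = 5) -> float:
    """Fraction of a team-season's starts made by its n busiest starters."""
    # Ties at the cutoff don't change the sum: tied pitchers add the same GS.
    return gs.nlargest(n).sum() / gs.sum()

shares = ps.groupby(["year_id", "team_id"])["gs"].apply(top_n_share)
print(shares[shares > 0.98])
```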


```python
# Now get the players' last names from the people table
people = pd.read_parquet("../data/people.parquet")[['player_id', 'name_last']]
last_names = people.set_index('player_id')['name_last']

def lookup_player_name(player_id):
    # Index lookup instead of scanning the whole people table on every call
    return last_names[player_id]

lookup_player_name('peavyja01')
```

'Peavy'

```python
def get_players_desc(year_id, team_id):
    # Top-5 starters for one team-season, formatted as "Name (GS)" in rank order
    pitchers = ps[(ps['year_id'] == year_id) & (ps['team_id'] == team_id) & (ps['rank_on_team'] <= 5)].copy()
    pitchers['name'] = pitchers['player_id'].apply(lookup_player_name)
    pitchers_strings = pitchers.sort_values('rank_on_team').apply(lambda row: f"{row['name']} ({row['gs']})", axis=1)
    return ", ".join(pitchers_strings)

get_players_desc(2019, 'SDN')
```

'Lucchesi (30), Lauer (29), Paddack (26), Quantrill (18), Strahm (16)'
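
`get_players_desc` filters `ps` once per team-season, which is fine for a ten-row leaderboard. To describe every team-season at once, one option is to map IDs to names a single time and build all the strings in one `groupby`. A sketch on hypothetical toy inputs shaped like `ps` and `people` (the IDs, names, and team `AAA` are made up):

```python
import pandas as pd

# Hypothetical toy inputs with the same columns as `ps` and `people`.
ps = pd.DataFrame({
    "year_id": [2019, 2019, 2019],
    "team_id": ["AAA", "AAA", "AAA"],
    "player_id": ["p1", "p2", "p3"],
    "gs": [30, 29, 26],
    "rank_on_team": [1, 2, 3],
})
people = pd.DataFrame({"player_id": ["p1", "p2", "p3"],
                       "name_last": ["Able", "Baker", "Cole"]})

# Build an id -> last-name lookup once.
names = people.set_index("player_id")["name_last"]

# Keep the top 5 per team-season, in rank order, and label each as "Name (GS)".
top = ps[ps["rank_on_team"] <= 5].sort_values(["year_id", "team_id", "rank_on_team"])
labels = top["player_id"].map(names) + " (" + top["gs"].astype(str) + ")"

# Join each team-season's labels into one comma-separated string.
desc = labels.groupby([top["year_id"], top["team_id"]]).agg(", ".join)
print(desc)
```

`desc` is indexed by `(year_id, team_id)`; `desc.reorder_levels(['team_id', 'year_id'])` would line it up with a table indexed the other way round, like `teams` above.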

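```python
# Add the top-5 starters' names and GS to each team-season. Reading the index
# levels by name avoids depending on their order (team_id, year_id here).
index_cols = teams.index.to_frame(index=False)
teams['top5_names'] = [get_players_desc(year_id, team_id)
                       for year_id, team_id in zip(index_cols['year_id'], index_cols['team_id'])]
teams
```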

| team_id | year_id | gs_total | gs_top5 | top5pct | top5_names |
|---------|---------|----------|---------|---------|------------|
| LAN | 1966 | 162 | 162 | 1.0 | Koufax (41), Drysdale (40), Osteen (38), Sutto... |
| SEA | 2003 | 162 | 162 | 1.0 | Garcia (33), Moyer (33), Meche (32), Franklin ... |
| CIN | 2012 | 162 | 161 | 0.993827 | Bailey (33), Cueto (33), Latos (33), Arroyo (3... |
| LAN | 1994 | 114 | 113 | 0.991228 | Martinez (24), Gross (23), Astacio (23), Candi... |
| SLN | 2005 | 162 | 160 | 0.987654 | Carpenter (33), Marquis (32), Suppan (32), Mul... |
| LAN | 1993 | 162 | 160 | 0.987654 | Hershiser (33), Candiotti (32), Gross (32), Ma... |
| SFN | 2012 | 162 | 160 | 0.987654 | Lincecum (33), Bumgarner (32), Cain (32), Zito... |
| ATL | 1980 | 161 | 159 | 0.987578 | Niekro (38), Alexander (35), Matula (30), McWi... |
| CHA | 1972 | 154 | 152 | 0.987013 | Wood (49), Bahnsen (41), Bradley (40), Lemonds... |
| BAL | 1972 | 154 | 152 | 0.987013 | McNally (36), Palmer (36), Dobson (36), Cuella... |
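
The note at the top mentions generalizing from 5 pitchers to any number, and from GS to any stat. One way that could look, as a hedged sketch: the function name `top_n_leaderboard`, its parameters, and the `so` (strikeouts) column in the toy data are assumptions for illustration, not the notebook's own follow-up code.

```python
import pandas as pd

def top_n_leaderboard(df: pd.DataFrame, stat: str = "gs", n: int = 5,
                      keys=("year_id", "team_id")) -> pd.DataFrame:
    """Share of each group's `stat` total contributed by its top-n players.

    Hypothetical sketch; `keys` and `stat` are assumed column names.
    """
    grouped = df.groupby(list(keys))[stat]
    out = pd.DataFrame({
        f"{stat}_total": grouped.sum(),
        # Sum of the n largest values per group; ties don't change this sum.
        f"{stat}_top{n}": grouped.apply(lambda s: s.nlargest(n).sum()),
    })
    out[f"top{n}pct"] = out[f"{stat}_top{n}"] / out[f"{stat}_total"]
    return out.sort_values(f"top{n}pct", ascending=False)

# Hypothetical toy data: two team-seasons, four pitchers each.
toy = pd.DataFrame({
    "year_id": [2000] * 4 + [2001] * 4,
    "team_id": ["AAA"] * 8,
    "so": [200, 180, 150, 20, 210, 100, 90, 80],
})
board = top_n_leaderboard(toy, stat="so", n=3)
print(board)
```

Passing `stat="gs", n=5` on the `ps` table would give the same `gs_total`, `gs_top5`, and `top5pct` columns as the merge-based cell above, since both sum the five largest GS values per team-season.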
