## Which rotations have gotten the most starts from their top-5 starters?

There was a Reddit thread about teams that kept their five-man rotations healthy and got
a high fraction of their starts from those top five starters.  Many people chimed in
with anecdotal examples.

Cool, but let's generate a leaderboard: teams since integration (1947) that got the greatest
fraction of their starts from five pitchers.  Better yet, let's include the names and GS for those
pitchers.

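*(Next, we'll generalize this away from 5 to any number, and away from GS to any stat.  The cells below already take a first step: `category` is set to `'w'`, so the outputs shown rank pitchers by wins rather than GS.)*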

In [1]:
import pandas as pd

category = 'w'         # stat used to rank pitchers and to sum for the top 5
team_threshold = 0.88  # minimum top-5 share, used only by the disabled filter in the next cell

# Find all player-seasons (since integration, with category > 0), ranked among their team-season by category
ps = pd.read_parquet("file:../data/pitching.parquet")[['year_id', 'team_id', 'player_id', 'gs', category]]
ps = ps[(ps[category] > 0) & (ps['year_id'] >= 1947)]
ps['rank_on_team'] = ps.sort_values([category], ascending=False).groupby(['year_id', 'team_id']).cumcount() + 1
ps

,year_id,team_id,player_id,gs,w,rank_on_team
12618,1947,PIT,bagbyji02,6,5,5
12619,1947,PIT,bahred01,11,3,8
12622,1947,BRO,barnere02,9,5,7
12623,1947,BSN,barrere01,30,11,3
12625,1947,BSN,beazljo01,2,2,9
...,...,...,...,...,...,...
47617,2019,TBA,yarbrry01,14,11,2
47620,2019,BAL,ynoaga01,13,1,13
47622,2019,ARI,youngal01,15,7,4
47624,2019,TOR,zeuchtj01,3,1,19
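
The ranking step works because `cumcount()` returns a Series keyed by the original row index, so assigning it back lines up with the unsorted frame. Here is a minimal sketch of that idiom on made-up data (the `toy` frame and its values are hypothetical, not taken from the pitching table), alongside an alternative based on `rank(method="first")`:

```python
import pandas as pd

# Toy data: hypothetical values, not from the real pitching table
toy = pd.DataFrame({
    "year_id": [2000, 2000, 2000, 2001, 2001],
    "team_id": ["AAA", "AAA", "AAA", "AAA", "AAA"],
    "player_id": ["p1", "p2", "p3", "p1", "p2"],
    "gs": [10, 30, 20, 5, 25],
})

# Sort by the stat, then number the rows within each team-season.
# The result keeps the original index, so the assignment aligns row by row.
toy["rank_on_team"] = (
    toy.sort_values("gs", ascending=False)
       .groupby(["year_id", "team_id"])
       .cumcount() + 1
)

# Alternative: rank within each group; method="first" breaks ties by row order
toy["rank_alt"] = (
    toy.groupby(["year_id", "team_id"])["gs"]
       .rank(method="first", ascending=False)
       .astype(int)
)

print(toy)
```

The two columns agree here because the toy data has no ties. With tied values the two approaches may order the tied players differently, which matters only if a tie straddles the top-5 cutoff.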


In [2]:
# Aggregate each team-season: total GS (among pitchers with category > 0) and the category total of the top 5
total_gs = ps.groupby(['year_id', 'team_id'])['gs'].sum()
top5_category = ps[ps['rank_on_team'] <= 5].groupby(['year_id', 'team_id'])[category].sum()
teams = pd.merge(total_gs, top5_category, on=['team_id', 'year_id'])

# Compute the top 5's category total as a fraction of team GS
teams['top5pct'] = teams[category] / teams['gs']
# Disabled for now: filter to shares above team_threshold and sort
# teams = teams[teams['top5pct'] > team_threshold].sort_values(by='top5pct', ascending=False)
teams

team_id,year_id,gs,w,top5pct
BOS,1947,154,65,0.422078
BRO,1947,152,70,0.460526
BSN,1947,153,67,0.437908
CHA,1947,155,53,0.341935
CHN,1947,151,47,0.311258
...,...,...,...,...
SLN,2019,160,61,0.381250
TBA,2019,135,48,0.355556
TEX,2019,147,48,0.326531
TOR,2019,155,27,0.174194


In [3]:
# Now to get the names of the players, from the people table

people = pd.read_parquet("file:../data/people.parquet")[['player_id', 'name_last']]

def lookup_player_name(player_id):
    return people[people['player_id']==player_id]['name_last'].values[0]

lookup_player_name('peavyja01')

'Peavy'
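
The lookup above scans the whole `people` table on every call and raises an `IndexError` for an unknown id. A possible alternative, sketched here on a small stand-in table, builds an indexed Series once and reuses it. The name `name_by_id` and the fallback-to-id behavior are assumptions for illustration, and every row except `peavyja01` is made up:

```python
import pandas as pd

# Stand-in for the people table; only peavyja01 comes from the notebook,
# the other rows are hypothetical
people = pd.DataFrame({
    "player_id": ["peavyja01", "aaaaaa01", "bbbbbb01"],
    "name_last": ["Peavy", "Alpha", "Bravo"],
})

# Build the lookup once: player_id -> name_last
name_by_id = people.set_index("player_id")["name_last"]

def lookup_player_name(player_id):
    # Fall back to the raw id when the player is not in the table
    return name_by_id.get(player_id, player_id)

print(lookup_player_name("peavyja01"))
print(lookup_player_name("zzzzzz99"))

# The same Series can translate a whole column at once
ids = pd.Series(["aaaaaa01", "bbbbbb01"])
print(ids.map(name_by_id).tolist())
```

This assumes `player_id` is unique in the people table; with duplicates, `get` would return a Series rather than a single name.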

In [4]:
def get_players_desc(year_id, team_id):
    # Top-5 pitchers for one team-season; copy so adding a column doesn't touch ps
    pitchers = ps[(ps['year_id'] == year_id) & (ps['team_id'] == team_id) & (ps['rank_on_team'] <= 5)].copy()
    pitchers['name'] = pitchers['player_id'].apply(lookup_player_name)
    # Format each pitcher as "Name (total)", best first
    pitchers_strings = pitchers.sort_values('rank_on_team').apply(lambda row: f"{row['name']} ({row[category]})", axis=1)
    return ", ".join(pitchers_strings)

get_players_desc(2019, 'SDN')

'Lucchesi (10), Paddack (9), Stammen (8), Lauer (8), Quantrill (6)'

In [5]:
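# Keep the 25 team-seasons with the largest top-5 total in the category
teams = teams.sort_values(by=category, ascending=False).head(25)

# Add the players' names and totals
def get_players_desc_from_row(row):
    # Index entries are (team_id, year_id) tuples
    team_id, year_id = row
    return get_players_desc(year_id, team_id)
teams['top5_names'] = teams.index.map(get_players_desc_from_row)
teams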

team_id,year_id,gs,w,top5pct,top5_names
CLE,1954,155,93,0.6,"Wynn (23), Lemon (23), Garcia (19), Houtteman ..."
OAK,1990,162,88,0.54321,"Welch (27), Stewart (22), Sanderson (17), Moor..."
ATL,1998,162,88,0.54321,"Glavine (20), Maddux (18), Smoltz (17), Millwo..."
BAL,1971,158,87,0.550633,"McNally (21), Palmer (20), Dobson (20), Cuella..."
CLE,1951,152,86,0.565789,"Feller (22), Wynn (20), Garcia (20), Lemon (17..."
BRO,1951,156,85,0.544872,"Roe (22), Newcombe (20), Erskine (16), King (1..."
BAL,1970,162,85,0.524691,"Cuellar (24), McNally (24), Palmer (20), Hall ..."
LAN,1963,163,84,0.515337,"Koufax (25), Drysdale (19), Perranoski (16), P..."
BAL,1980,161,84,0.521739,"Stone (25), McGregor (20), Flanagan (16), Palm..."
SFN,1962,165,84,0.509091,"Sanford (24), O'Dell (19), Marichal (18), Pier..."
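
One way the generalization to any number of players and any stat might look is to wrap the rank, aggregate, and divide steps in a single function. This is a sketch under stated assumptions, not the notebook's final version: `top_n_share` and its `denom` parameter are hypothetical names, and the demo frame is made-up data. Passing `stat='w', denom='gs'` mirrors the ratio computed in the cells above, while leaving `denom` unset divides the top-n total by the team total of the same stat.

```python
import pandas as pd

def top_n_share(ps, stat, n=5, denom=None):
    """Top-n total of `stat` per team-season, as a share of the team's `denom` total."""
    denom = denom or stat  # default: a true share of the same stat
    keys = ["year_id", "team_id"]

    # Keep player-seasons with a positive value, then rank within team-season
    ranked = ps[ps[stat] > 0].copy()
    ranked["rank_on_team"] = (
        ranked.sort_values(stat, ascending=False).groupby(keys).cumcount() + 1
    )

    total = ranked.groupby(keys)[denom].sum().rename("total")
    top = (
        ranked[ranked["rank_on_team"] <= n]
        .groupby(keys)[stat].sum().rename("top_n")
    )

    out = pd.concat([total, top], axis=1)
    out["share"] = out["top_n"] / out["total"]
    return out.sort_values("share", ascending=False)

# Hypothetical demo data: two teams, six starters each, 100 GS per team
demo = pd.DataFrame({
    "year_id": [2000] * 12,
    "team_id": ["AAA"] * 6 + ["BBB"] * 6,
    "player_id": [f"p{i}" for i in range(12)],
    "gs": [30, 30, 20, 10, 5, 5, 20, 20, 20, 20, 10, 10],
})

shares = top_n_share(demo, stat="gs", n=5)
print(shares)
```

Keeping the denominator as a separate argument makes it explicit which ratio is being reported, since a top-5 wins total divided by team GS is not a share of anything the top 5 could fully account for.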
