## Moneyball Rosters

Using the 'moneyball_stats' data (specifically the 'On Base Percentage' to salary ratio), create
a roster for each year in which the required stats are available.

#### The Roster

A roster shall consist of the following players:
* Pitcher
* Catcher
* First Baseman
* Second Baseman
* Third Baseman
* Shortstop
* Left Fielder
* Center Fielder
* Right Fielder
* Designated Hitter

*NOTE: Pitchers are probably not selected this way. As a matter of fact, pitchers don't even hit in the American League. In the National League, a pitcher's actual pitching stats must weigh heavily in the selection process, and I did not venture into that for this project. For these reasons I have included a DH along with the 9 defensive positions.*

#### The Rules
1. One roster for each year.
2. One player per position.
3. A player who plays multiple positions in a season cannot occupy multiple spots on a roster.
4. A player must have at least 20 appearances at a position to be considered for a roster spot.
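
As a rough illustration of rules 2 through 4, here is a minimal sketch on hypothetical toy data. The dataframe `toy`, the helper `pick_roster`, and the player IDs are made up for this example; it is not the notebook's implementation, which ranks only the top ten players at each position.

```python
import pandas as pd

# Hypothetical toy data: three players, two positions.
toy = pd.DataFrame({
    'playerID': ['a', 'b', 'c'],
    'G_1b': [100, 25, 10],
    'G_lf': [30, 0, 120],
    'OBP_to_salary': [0.50, 0.40, 0.30],
})


def pick_roster(df, positions, min_games=20):
    """Greedy pick: best remaining eligible player at each position, in order."""
    roster = {}
    for pos in positions:
        # Rule 4: enough appearances. Rule 3: not already on the roster.
        taken = df['playerID'].isin(list(roster.values()))
        eligible = df[(df[pos] >= min_games) & ~taken]
        if eligible.empty:
            continue  # no eligible player left for this position
        best = eligible.sort_values('OBP_to_salary', ascending=False).iloc[0]
        roster[pos] = best['playerID']
    return roster


print(pick_roster(toy, ['G_1b', 'G_lf']))
```

Player `a` has the best ratio at both positions but fills only first base, so left field goes to `c`. Player `c` is never considered at first base because 10 appearances is below the threshold. The result depends on the order in which positions are filled.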

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('../MB-Data/moneyball_stats.csv')
df.head()

Unnamed: 0,yearID,playerID,G_p,G_c,G_1b,G_2b,G_3b,G_ss,G_lf,G_cf,...,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
0,1985,aguaylu01,0,0,0,17,7,60,0,0,...,Luis,PHI,165,46,22,6.0,3.0,0.377551,237000,0.159304
1,1985,almonbi01,0,0,7,0,7,43,26,6,...,William Francis,PIT,244,66,22,1.0,3.0,0.32963,255000,0.129267
2,1985,andujjo01,38,0,0,0,0,0,0,0,...,Joaquin,SLN,94,10,5,0.0,0.0,0.151515,1030000,0.01471
3,1985,armasto01,0,0,0,0,0,0,16,69,...,Antonio Rafael,BOS,385,102,18,2.0,5.0,0.297561,915000,0.03252
4,1985,ashbyal01,0,60,0,0,0,0,0,0,...,Alan Dean,HOU,189,53,24,1.0,1.0,0.362791,416667,0.08707
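
The `OBP` and `OBP_to_salary` columns in the sample rows above appear consistent with the standard on-base percentage formula and with a ratio expressed as OBP per $100,000 of salary. The sketch below reproduces the first row under those assumptions. The function names and the `per_dollars` parameter are hypothetical, and the scaling factor is inferred from the displayed values rather than stated in the notebook.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """Standard OBP: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def obp_to_salary(obp, salary, per_dollars=100_000):
    """OBP per `per_dollars` of salary (scaling inferred from the sample rows)."""
    return obp / salary * per_dollars


# First row of the sample output (aguaylu01, 1985).
obp = on_base_percentage(h=46, bb=22, hbp=6, ab=165, sf=3)
ratio = obp_to_salary(obp, salary=237_000)
print(round(obp, 6), round(ratio, 6))
```

A higher ratio means more on-base production per dollar, which is why players at or near the league-minimum salary dominate the rosters.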


In [3]:
pd.set_option('display.max_columns', 30)


In [4]:
df.shape

(19514, 24)

In [5]:
mb_df = df.copy()

In [6]:
mb_df.columns

Index(['yearID', 'playerID', 'G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss',
       'G_lf', 'G_cf', 'G_rf', 'G_dh', 'nameFirst', 'nameLast', 'nameGiven',
       'teamID', 'AB', 'H', 'BB', 'HBP', 'SF', 'OBP', 'salary',
       'OBP_to_salary'],
      dtype='object')

In [7]:
positions = ['G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss','G_lf', 'G_cf',
             'G_rf', 'G_dh']

In [8]:
# Create a list for a specific position and year ordered by obp to salary.
def top_ten_position(position, year):
    '''
    Returns a dataframe(orig index, playerID) consisting of the top ten
    players at a position in a given year ordered by OBP/salary ratio.
    '''
    df = mb_df.copy()
    yr_df = df.loc[(df.yearID == year) & (df[position] >= 20)]
    top_ten_df = yr_df.sort_values(by='OBP_to_salary', ascending=False)
    # Reset the index of the dataframe and keep the original index for use later.
    return top_ten_df.iloc[:10].reset_index()[['index','playerID']]
    

In [9]:
top_ten_position('G_rf', 2007)

Unnamed: 0,index,playerID
0,14026,willire03
1,13518,bucktr01
2,13747,kempma01
3,13695,hermije01
4,13690,hawpebr01
5,13974,swishni01
6,13938,schumsk01
7,13939,scottlu01
8,13812,markani01
9,13620,ethiean01


In [10]:
def get_indexes(year):
    '''
    Return a list of indexes (into the master moneyball_stats
    dataframe) for the player at each position with the best
    moneyball value (OBP/salary). RULE: a player cannot be
    selected for multiple positions.
    '''
    selected = []
    for p in positions:
        best_list = top_ten_position(p, year)['index'].to_list()
        # Take the highest-ranked candidate not already on the roster.
        for idx in best_list:
            if idx not in selected:
                selected.append(idx)
                break
        else:
            raise ValueError(f'No unselected candidate for {p} in {year}')
    return selected


In [32]:
# Enter a year to obtain a roster for that year.
# NOTE: Only years 1985-2016 will work. (No salary data past 2016.)

def mb_roster_df(year):
    '''Return a roster based on OBP/salary ratio for a given year'''
    df = mb_df.copy()
    index_list = get_indexes(year)
    roster_df = df.loc[index_list][['nameFirst', 'nameLast', 'nameGiven',
       'teamID', 'AB', 'H', 'BB', 'HBP', 'SF', 'OBP', 'salary',
       'OBP_to_salary']]
    roster_df.insert(0, 'position', ['Pitcher', 'Catcher', '1st Base', '2nd Base',
                                     '3rd Base', 'Shortstop', 'Left Field',
                                     'Center Field', 'Right Field', 'Designated Hitter'])
    return roster_df
    

In [36]:
mb_roster_df(1985)

Unnamed: 0,position,nameFirst,nameLast,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
46,Pitcher,Tom,Browning,Thomas Leo,CIN,88,17,4,0.0,0.0,0.228261,60000,0.380435
346,Catcher,Mark,Salas,Mark Bruce,MIN,360,108,18,1.0,3.0,0.332461,60000,0.554101
84,1st Base,Glenn,Davis,Glenn Earle,HOU,350,95,27,7.0,4.0,0.332474,60000,0.554124
385,2nd Base,Tim,Teufel,Timothy Shawn,MIN,434,113,48,3.0,4.0,0.335378,110000,0.304889
45,3rd Base,Chris,Brown,John Christopher,SFN,432,117,38,11.0,0.0,0.345114,60000,0.575191
109,Shortstop,Mariano,Duncan,Mariano,LAN,562,137,38,3.0,4.0,0.293245,60000,0.488742
299,Left Field,Joe,Orsulak,Joseph Michael,PIT,397,119,26,1.0,3.0,0.34192,60000,0.569867
37,Center Field,Phil,Bradley,Philip Poole,SEA,641,192,55,12.0,2.0,0.364789,125000,0.291831
439,Right Field,Mike,Young,Michael Darren,BAL,450,123,48,4.0,1.0,0.347913,121000,0.287531
361,Designated Hitter,Larry,Sheets,Larry Kent,BAL,328,86,28,2.0,1.0,0.32312,60000,0.538533


In [37]:
mb_roster_df(1991)

Unnamed: 0,position,nameFirst,nameLast,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
3375,Pitcher,Chris,Hammond,Christopher Andrew,CIN,34,12,2,0.0,0.0,0.388889,100000,0.388889
3274,Catcher,Rick,Cerone,Richard Aldo,NYN,227,62,30,1.0,0.0,0.360465,100000,0.360465
3629,1st Base,Frank,Thomas,Frank Edward,CHA,559,178,138,1.0,2.0,0.452857,120000,0.377381
3359,2nd Base,Craig,Grebeck,Craig Allen,CHA,224,63,38,1.0,1.0,0.386364,125000,0.309091
3607,3rd Base,Ed,Sprague,Edward Nelson,TOR,160,44,19,3.0,1.0,0.360656,100000,0.360656
3399,Shortstop,David,Howard,David Wayne,KCA,236,51,16,1.0,2.0,0.266667,100000,0.266667
3199,Left Field,Beau,Allred,Dale LeBeau,CLE,125,29,25,1.0,2.0,0.359477,102500,0.350709
3409,Center Field,Mike,Huff,Michael Kale,CLE,146,35,25,4.0,1.0,0.363636,100000,0.363636
3510,Right Field,Warren,Newson,Warren Dale,CHA,132,39,28,0.0,0.0,0.41875,100000,0.41875
3398,Designated Hitter,Sam,Horn,Samuel Lee,BAL,317,74,41,3.0,1.0,0.325967,175000,0.186267


In [38]:
mb_roster_df(2003)

Unnamed: 0,position,nameFirst,nameLast,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
11536,Pitcher,Dontrelle,Willis,Dontrelle Wayne,FLO,58,14,3,0.0,0.0,0.278689,234426,0.118881
11386,Catcher,Jason,Phillips,Jason Lloyd,NYN,403,120,39,10.0,1.0,0.373068,300000,0.124356
11534,1st Base,Brad,Wilkerson,Stephen Bradley,MON,504,135,89,4.0,3.0,0.38,315000,0.120635
11137,2nd Base,Marcus,Giles,Marcus William,ATL,551,174,59,11.0,4.0,0.3904,316500,0.123349
11029,3rd Base,Miguel,Cabrera,Jose Miguel,FLO,314,84,25,2.0,1.0,0.324561,165574,0.196022
10996,Shortstop,Angel,Berroa,Angel Maria,KCA,567,163,29,18.0,8.0,0.337621,302000,0.111795
10973,Left Field,Brian,Banks,Brian Glen,FLO,149,35,25,2.0,2.0,0.348315,300000,0.116105
11011,Center Field,Milton,Bradley,Milton Obelle,CLE,377,121,64,5.0,5.0,0.421286,314300,0.134039
11244,Right Field,Bobby,Kielty,Robert Michael,MIN,238,60,42,3.0,1.0,0.369718,325000,0.113759
11227,Designated Hitter,Nick,Johnson,Nicholas Robert,NYA,324,92,70,8.0,1.0,0.421836,364100,0.115857


In [39]:
mb_roster_df(2015)

Unnamed: 0,position,nameFirst,nameLast,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
18915,Pitcher,Alex,Wood,Robert Alexander,ATL,33,5,3,0.0,0.0,0.222222,520000,0.042735
18729,Catcher,Roberto,Perez,Roberto Andres,CLE,184,42,33,2.0,2.0,0.348416,508600,0.068505
18778,1st Base,Jason,Rogers,Jason Douglas,MIL,152,45,15,2.0,0.0,0.366864,507500,0.072288
18706,2nd Base,Joe,Panik,Joseph Matthew,SFN,382,119,38,5.0,4.0,0.377622,522500,0.072272
18458,3rd Base,Matt,Duffy,Matthew Michael,SFN,573,169,30,5.0,2.0,0.334426,509000,0.065703
18408,Shortstop,Christian,Colon,Christian Anthony,KCA,107,31,11,0.0,0.0,0.355932,509525,0.069856
18530,Left Field,Brandon,Guyer,Brandon Eric,TBA,332,88,25,24.0,1.0,0.358639,515800,0.069531
18741,Center Field,AJ,Pollock,Allen Lorenz,ARI,609,192,53,2.0,9.0,0.367013,519500,0.070647
18834,Right Field,George,Springer,George Chelston,HOU,388,107,50,8.0,3.0,0.367483,512900,0.071648
18707,Designated Hitter,Jimmy,Paredes,Jimmy Santiago,BAL,363,100,19,0.0,2.0,0.309896,515000,0.060174
