## Moneyball Rosters

Using the 'moneyball_stats' data (specifically the ratio of on-base percentage to salary), create
a roster for each year in which the required stats are available.

#### The Roster

A roster shall consist of the following players:
* Pitcher
* Catcher
* First Baseman
* Second Baseman
* Third Baseman
* Shortstop
* Left Fielder
* Center Fielder
* Right Fielder
* Designated Hitter

*NOTE: Pitchers are probably not selected this way. In fact, pitchers don't even hit in the American League. In the National League, a pitcher's actual pitching stats must weigh heavily in the selection process, and I did not venture into that for this project. For these reasons I have included a DH along with the 9 defensive positions.*

#### The Rules
1. One roster for each year.
2. One player per position.
3. A player who plays multiple positions in a season cannot occupy multiple spots on a roster.
4. A player must have at least 20 appearances at a position to be considered for a roster spot.
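
Taken together, the rules describe a greedy pass over the positions: rank the eligible players at each position by value and take the best one who is not already on the roster. Below is a minimal sketch of that idea in plain Python. The player records are made up for illustration, and `pick_roster` is a hypothetical helper, not a function from this notebook.

```python
MIN_APPEARANCES = 20  # rule 4

# Hypothetical records: (player, {position: appearances}, OBP-to-salary value)
players = [
    ("adams", {"1b": 80, "lf": 30}, 0.90),
    ("baker", {"1b": 50}, 0.70),
    ("clark", {"lf": 25}, 0.40),
    ("davis", {"lf": 10}, 0.95),  # too few appearances in left field
]

def pick_roster(players, positions):
    """Greedy pick: best-value eligible player per position, no repeats."""
    roster = {}
    taken = set()
    for pos in positions:
        eligible = [
            (value, name)
            for name, games, value in players
            if games.get(pos, 0) >= MIN_APPEARANCES and name not in taken
        ]
        if not eligible:
            continue  # leave the spot empty if nobody qualifies
        value, name = max(eligible)
        roster[pos] = name
        taken.add(name)
    return roster

roster = pick_roster(players, ["1b", "lf"])
print(roster)
```

Because the pass is greedy, the order of the positions matters: a multi-position player is assigned to the first position where he ranks best, even if he would add more value elsewhere.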

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('../MB-Data/moneyball_stats.csv')
df.head()

Unnamed: 0,yearID,playerID,G_p,G_c,G_1b,G_2b,G_3b,G_ss,G_lf,G_cf,...,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
0,1985,aguaylu01,0,0,0,17,7,60,0,0,...,Luis,PHI,165,46,22,6.0,3.0,0.377551,237000,0.159304
1,1985,almonbi01,0,0,7,0,7,43,26,6,...,William Francis,PIT,244,66,22,1.0,3.0,0.32963,255000,0.129267
2,1985,andujjo01,38,0,0,0,0,0,0,0,...,Joaquin,SLN,94,10,5,0.0,0.0,0.151515,1030000,0.01471
3,1985,armasto01,0,0,0,0,0,0,16,69,...,Antonio Rafael,BOS,385,102,18,2.0,5.0,0.297561,915000,0.03252
4,1985,ashbyal01,0,60,0,0,0,0,0,0,...,Alan Dean,HOU,189,53,24,1.0,1.0,0.362791,416667,0.08707
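
The `OBP` and `OBP_to_salary` columns can be reproduced from the counting stats in the same row. The sketch below uses the standard on-base percentage formula, (H + BB + HBP) / (AB + BB + HBP + SF), and checks it against the first row of the output above. The scale factor of 100,000 in the ratio is inferred from the displayed values rather than stated anywhere in the notebook, so treat it as an assumption; the function names are also illustrative only.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage from hits, walks, hit-by-pitch, at-bats, sac flies."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def obp_to_salary(obp_value, salary, scale=100_000):
    """OBP per dollar of salary; `scale` is an assumed display factor."""
    return obp_value / salary * scale

# First row of df.head(): aguaylu01, 1985
row_obp = obp(h=46, bb=22, hbp=6, ab=165, sf=3)
row_ratio = obp_to_salary(row_obp, salary=237_000)
print(round(row_obp, 6), round(row_ratio, 6))
```

The rounded results agree with the `OBP` (0.377551) and `OBP_to_salary` (0.159304) values shown for that row.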


In [3]:
pd.set_option('display.max_columns', 30)


In [4]:
df.shape

(19514, 24)

In [5]:
mb_df = df.copy()

In [6]:
mb_df.columns

Index(['yearID', 'playerID', 'G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss',
       'G_lf', 'G_cf', 'G_rf', 'G_dh', 'nameFirst', 'nameLast', 'nameGiven',
       'teamID', 'AB', 'H', 'BB', 'HBP', 'SF', 'OBP', 'salary',
       'OBP_to_salary'],
      dtype='object')

In [7]:
positions = ['G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss','G_lf', 'G_cf',
             'G_rf', 'G_dh']

In [8]:
# Create a list for a specific position and year ordered by obp to salary.
def top_ten_position(position, year):
    df = mb_df.copy()
    yr_df = df.loc[(df.yearID == year) & (df[position] >= 20)]
    top_ten_df = yr_df.sort_values(by='OBP_to_salary', ascending=False)
    # Reset the index of the dataframe and keep the original index for use later.
    return top_ten_df.iloc[:10].reset_index()[['index','playerID']]
    

In [9]:
top_ten_position('G_rf', 2007)

Unnamed: 0,index,playerID
0,14026,willire03
1,13518,bucktr01
2,13747,kempma01
3,13695,hermije01
4,13690,hawpebr01
5,13974,swishni01
6,13938,schumsk01
7,13939,scottlu01
8,13812,markani01
9,13620,ethiean01


In [10]:
def get_indexes(year):
    '''Return the row index of the best available player at each position.'''
    selected = []
    for p in positions:
        ten_best_df = top_ten_position(p, year)
        # Walk the ranked candidates and take the first one not already
        # on the roster, so that no player fills two spots.
        for idx in ten_best_df['index'].to_list():
            if idx not in selected:
                selected.append(idx)
                break
    return selected


In [14]:
def mb_roster_df(year):
    '''Return a roster based on OBP/salary ratio for a given year'''
    df = mb_df.copy()
    index_list = get_indexes(year)
    roster_df = df.loc[index_list]
    #print('\nMoneyball Roster for ' + str(year) + ':\n', roster_df) 
    return roster_df
    

In [18]:
# Enter a year to obtain a roster for that year.
# NOTE: Only years 1985-2016 will work. (No salary data after 2016.)
yr = 2000
mb = mb_roster_df(yr)[['nameFirst', 'nameLast', 'nameGiven',
       'teamID', 'AB', 'H', 'BB', 'HBP', 'SF', 'OBP', 'salary',
       'OBP_to_salary']]
print(f'The {yr} moneyball roster providing the best value is:')
full_position = ['Pitcher', 'Catcher', '1st Base', '2nd Base', '3rd Base', 'Shortstop',
                 'Left Field', 'Center Field', 'Right Field', 'Designated Hitter']
#for i, p in enumerate(positions):
    


The 2000 moneyball roster providing the best value is:
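
The last cell stops before the roster rows are paired with their position names. One possible way to finish that step is sketched below on a small made-up roster; the names and values are hypothetical. It assumes the roster has exactly one row per position, in the same order as the `positions` list.

```python
import pandas as pd

full_position = ['Pitcher', 'Catcher', '1st Base']

# Hypothetical three-player roster, rows already in position order.
roster = pd.DataFrame(
    {'nameLast': ['Alpha', 'Bravo', 'Charlie'], 'OBP': [0.210, 0.340, 0.390]},
    index=[42, 7, 19],
)

# Add the position names as the first column. A plain list is assigned
# by row order, so the original (non-sequential) index does not matter.
labeled = roster.copy()
labeled.insert(0, 'position', full_position)
print(labeled.to_string(index=False))
```

If a position has no eligible player in some year, the label list and the roster will differ in length and `insert` will raise a `ValueError`, so that case would need handling first.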
