## Moneyball Rosters

Using `moneyball_stats` (specifically the ratio of on-base percentage to salary), create
a roster for each year in which the required stats are available.
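As a rough illustration of the ratio, the sample rows printed by `df.head()` in this notebook are consistent with the standard on-base percentage formula, (H + BB + HBP) / (AB + BB + HBP + SF), and with `OBP_to_salary` being OBP divided by salary expressed in units of $100,000. The scaling factor is inferred from those rows, not stated by the author, and the helper names `obp` and `obp_to_salary` below are hypothetical.

```python
def obp(h, bb, hbp, ab, sf):
    """Standard on-base percentage: times on base over plate appearances counted."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def obp_to_salary(obp_value, salary, unit=100_000):
    # ASSUMPTION: the per-$100,000 scaling is inferred from the sample rows,
    # not documented in the source data.
    return obp_value / salary * unit


# Values taken from the 1985 row for playerID 'aguaylu01' in df.head().
aguayo_obp = obp(h=46, bb=22, hbp=6, ab=165, sf=3)
aguayo_ratio = obp_to_salary(aguayo_obp, salary=237000)
print(round(aguayo_obp, 6), round(aguayo_ratio, 6))
```

A higher ratio means more on-base production per salary dollar, which is why the rosters rank players by this column in descending order.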

#### The Roster

A roster shall consist of the following players:
* Pitcher
* Catcher
* First Baseman
* Second Baseman
* Third Baseman
* Shortstop
* Left Fielder
* Center Fielder
* Right Fielder
* Designated Hitter

*NOTE: Pitchers are probably not selected this way. In fact, pitchers don't even hit in the American League. In the National League, a pitcher's actual pitching stats must weigh heavily in the selection process, and I did not venture into that for this project. For these reasons I have included a DH along with the 9 defensive positions.*

#### The Rules
1. One roster for each year.
2. One player per position.
3. A player who plays multiple positions in a season cannot occupy multiple spots on a roster.
4. A player must have at least 20 appearances at a position to be considered for a roster spot.
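
One way the rules above could be applied is a simple greedy pass: for a given year, walk the positions in order and give each spot to the best remaining player by `OBP_to_salary`. This is only a sketch under that assumption, not necessarily the selection method used in this project. The function name `build_roster` and the toy data are hypothetical; the column names match `moneyball_stats.csv`.

```python
import pandas as pd

POSITIONS = ['G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss',
             'G_lf', 'G_cf', 'G_rf', 'G_dh']


def build_roster(stats, year, positions=POSITIONS, min_games=20):
    """Greedy sketch: fill each position, in order, with the best unused player."""
    season = stats[stats['yearID'] == year]
    taken = set()   # playerIDs already on the roster (rule 3)
    roster = {}     # position column -> playerID (rule 2)
    for pos in positions:
        # Rule 4: at least `min_games` appearances at this position.
        eligible = season[(season[pos] >= min_games)
                          & ~season['playerID'].isin(taken)]
        if eligible.empty:
            continue  # no qualifying player; leave the spot unfilled
        best = eligible.sort_values('OBP_to_salary', ascending=False).iloc[0]
        roster[pos] = best['playerID']
        taken.add(best['playerID'])
    return roster


# Toy data, made up for illustration (not rows from moneyball_stats.csv).
toy = pd.DataFrame({
    'yearID': [1999, 1999, 1999],
    'playerID': ['a01', 'b01', 'c01'],
    'G_2b': [30, 25, 0],
    'G_ss': [40, 0, 22],
    'OBP_to_salary': [0.19, 0.15, 0.12],
})
roster = build_roster(toy, 1999, positions=['G_2b', 'G_ss'])
print(roster)
```

In the toy data, player `a01` qualifies at both second base and shortstop but is used only once, so shortstop goes to `c01`. Because the pass is greedy, the result depends on the order of `positions`; a different order can produce a different roster. Rule 1 would then amount to calling `build_roster` once per value of `yearID`.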

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv('MB-Data/moneyball_stats.csv')
df.head()

Unnamed: 0,yearID,playerID,G_p,G_c,G_1b,G_2b,G_3b,G_ss,G_lf,G_cf,...,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
0,1985,aguaylu01,0,0,0,17,7,60,0,0,...,Luis,PHI,165,46,22,6.0,3.0,0.377551,237000,0.159304
1,1985,almonbi01,0,0,7,0,7,43,26,6,...,William Francis,PIT,244,66,22,1.0,3.0,0.32963,255000,0.129267
2,1985,andujjo01,38,0,0,0,0,0,0,0,...,Joaquin,SLN,94,10,5,0.0,0.0,0.151515,1030000,0.01471
3,1985,armasto01,0,0,0,0,0,0,16,69,...,Antonio Rafael,BOS,385,102,18,2.0,5.0,0.297561,915000,0.03252
4,1985,ashbyal01,0,60,0,0,0,0,0,0,...,Alan Dean,HOU,189,53,24,1.0,1.0,0.362791,416667,0.08707


In [3]:
df.shape

(19514, 24)

In [4]:
mb_df = df.copy()

In [5]:
mb_df.columns

Index(['yearID', 'playerID', 'G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss',
       'G_lf', 'G_cf', 'G_rf', 'G_dh', 'nameFirst', 'nameLast', 'nameGiven',
       'teamID', 'AB', 'H', 'BB', 'HBP', 'SF', 'OBP', 'salary',
       'OBP_to_salary'],
      dtype='object')

In [6]:
positions = ['G_p', 'G_c', 'G_1b', 'G_2b', 'G_3b', 'G_ss',
             'G_lf', 'G_cf', 'G_rf', 'G_dh']

In [9]:
# Return the top ten players at a position for a given year,
# ranked by OBP-to-salary ratio (highest first).
def top_ten_position(position, year):
    # Keep only players with at least 20 appearances at the position that year.
    yr_df = mb_df.loc[(mb_df.yearID == year) & (mb_df[position] >= 20)]
    top_ten_df = yr_df.sort_values(by='OBP_to_salary', ascending=False)

    return top_ten_df.iloc[:10]

In [14]:
top_ten_position('G_ss', 1999)

Unnamed: 0,yearID,playerID,G_p,G_c,G_1b,G_2b,G_3b,G_ss,G_lf,G_cf,...,nameGiven,teamID,AB,H,BB,HBP,SF,OBP,salary,OBP_to_salary
8484,1999,friasha01,0,0,0,8,0,53,0,0,...,Hanley,ARI,150,41,29,0.0,0.0,0.391061,209000,0.187111
8855,1999,shavejo01,0,0,9,1,6,24,0,0,...,Jonathan Taylor,TEX,73,21,5,2.0,0.0,0.35,200000,0.175
8351,1999,blumge01,0,0,0,2,0,42,0,0,...,Geoffrey Edward,MON,133,32,17,0.0,0.0,0.326667,200000,0.163333
8599,1999,jacksda04,0,0,0,21,0,100,2,0,...,Damian Jacques,SDN,388,87,53,3.0,3.0,0.319911,203000,0.157591
8571,1999,holbera01,0,0,0,11,1,22,0,0,...,Ray Arthur,KCA,100,28,8,0.0,1.0,0.330275,215000,0.153616
8508,1999,gonzaal02,0,0,0,0,0,135,0,0,...,Alexander,FLO,560,155,15,12.0,3.0,0.308475,201000,0.15347
8720,1999,merlolo01,0,0,1,8,9,24,1,0,...,Louis William,BOS,126,32,8,2.0,1.0,0.306569,200000,0.153285
8408,1999,collilo01,0,0,0,4,7,31,9,0,...,Louis Keith,MIL,135,35,14,0.0,2.0,0.324503,215000,0.150932
8341,1999,bergda01,0,0,0,29,19,37,3,0,...,David Scott,FLO,304,87,27,2.0,0.0,0.348348,235000,0.148233
8951,1999,wilsocr02,0,0,1,7,72,22,0,0,...,Craig Franklin,CHA,252,60,23,0.0,1.0,0.300725,205000,0.146695
