The purpose of this notebook is to create a function that takes the imported player list and:
- cleans it down to only the needed information
- separates pitchers and batters
- merges each group on player name to get projections from the model
- merges the pitcher and batter data back together
- outputs a df for now

Next steps:
- build lineup based on salary
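
A rough sketch of what that function could look like once the steps below are wrapped up. The name `build_player_pool` and the constants `KEEP` and `OUT` are hypothetical, and it assumes the projection files have at least `Name` and `Proj_FPPG` columns:

```python
import pandas as pd

KEEP = ['Id', 'Position', 'Nickname', 'Salary', 'Game', 'Team',
        'Opponent', 'Injury Indicator', 'Probable Pitcher']
OUT = ['Id', 'Position', 'Name', 'Salary', 'Team', 'Opponent', 'Proj_FPPG']


def build_player_pool(fd, pitcher_proj, batter_proj):
    """Hypothetical helper: clean the player list and attach projections."""
    df = fd[KEEP].rename(columns={'Nickname': 'Name'})
    df = df.fillna({'Probable Pitcher': 'No', 'Injury Indicator': 'Healthy'})
    df = df[df['Injury Indicator'] == 'Healthy']

    # probable starters on one side, every non-pitcher on the other
    is_pitcher = df['Position'] == 'P'
    pitchers = df[is_pitcher & (df['Probable Pitcher'] == 'Yes')]
    batters = df[~is_pitcher]

    # only Name and Proj_FPPG are taken from the projection files here
    # (an assumption; the real files may carry more columns)
    parts = []
    for group, proj in ((pitchers, pitcher_proj), (batters, batter_proj)):
        merged = group.merge(proj[['Name', 'Proj_FPPG']], how='left', on='Name')
        parts.append(merged.dropna(subset=['Proj_FPPG'])[OUT])
    return pd.concat(parts, ignore_index=True)
```

Taking only `Name` and `Proj_FPPG` from the projections avoids the `Team_x` / `Team_y` suffixes that show up when both frames share a `Team` column.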

In [1]:
# imports
import pandas as pd
import numpy as np

In [2]:
# will need fanduel import
fd = pd.read_csv('../CapStone_Data/FanDuel-MLB-2021 ET-05 ET-04 ET-58318-players-list.csv')

In [3]:
# print first 5 rows of fd
fd.head()

Unnamed: 0,Id,Position,First Name,Nickname,Last Name,FPPG,Played,Salary,Game,Team,Opponent,Injury Indicator,Injury Details,Tier,Probable Pitcher,Batting Order,Roster Position
0,58318-52859,P,Jacob,Jacob deGrom,deGrom,61.6,5.0,12500,NYM@STL,NYM,STL,,,,Yes,,P
1,58318-16956,P,Gerrit,Gerrit Cole,Cole,54.166667,6.0,12200,HOU@NYY,NYY,HOU,,,,,0.0,P
2,58318-82554,P,Shane,Shane Bieber,Bieber,55.666667,6.0,12000,CLE@KC,CLE,KC,,,,,,P
3,58318-5481,P,Max,Max Scherzer,Scherzer,42.166667,6.0,12000,ATL@WSH,WSH,ATL,,,,,0.0,P
4,58318-82604,P,Corbin,Corbin Burnes,Burnes,49.6,5.0,11100,MIL@PHI,MIL,PHI,IL,Undisclosed,,,0.0,P


Only columns needed:
- Id - will need this later for template
- Position
- Nickname - renamed to Name
- Salary
- Game
- Team
- Opponent
- Injury Indicator
- Probable Pitcher
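
An alternative to dropping columns after the fact is to read only the ones listed above with the `usecols` argument of `pd.read_csv`. A small sketch on a made-up in-memory CSV (the rows and the names `sample_csv` and `fd_small` are placeholders, not real FanDuel data):

```python
import io
import pandas as pd

# columns listed above, using the header names from the FanDuel export
needed = ['Id', 'Position', 'Nickname', 'Salary', 'Game', 'Team',
          'Opponent', 'Injury Indicator', 'Probable Pitcher']

# tiny stand-in for the real file (made-up rows)
sample_csv = io.StringIO(
    "Id,Position,First Name,Nickname,Last Name,FPPG,Salary,Game,Team,"
    "Opponent,Injury Indicator,Probable Pitcher\n"
    "1-1,P,Ann,Ann Example,Example,40.0,9000,AAA@BBB,AAA,BBB,,Yes\n"
    "1-2,C,Bob,Bob Sample,Sample,10.0,3000,AAA@BBB,BBB,AAA,IL,\n"
)

# read only the needed columns and rename Nickname in the same step
fd_small = pd.read_csv(sample_csv, usecols=needed).rename(columns={'Nickname': 'Name'})
```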

In [4]:
fd.columns

Index(['Id', 'Position', 'First Name', 'Nickname', 'Last Name', 'FPPG',
       'Played', 'Salary', 'Game', 'Team', 'Opponent', 'Injury Indicator',
       'Injury Details', 'Tier', 'Probable Pitcher', 'Batting Order',
       'Roster Position'],
      dtype='object')

In [5]:
# clean up fd to match column list above
fd.drop(columns=['First Name', 'Last Name', 'FPPG', 'Played',
                 'Injury Details', 'Tier', 'Batting Order', 'Roster Position'], inplace=True)

Next step is to fill nulls in Probable Pitcher and Injury Indicator.

In [6]:
# filling nulls for probable pitcher
fd['Probable Pitcher'] = fd['Probable Pitcher'].fillna('No')

In [7]:
# fill nulls for injury indicator
fd['Injury Indicator'] = fd['Injury Indicator'].fillna('Healthy')
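
Both fills can also be done in one call by passing `fillna` a dict of column defaults. A minimal sketch on a made-up frame (`demo` and its rows are placeholders):

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'Name': ['Ann Example', 'Bob Sample'],
    'Injury Indicator': [np.nan, 'IL'],
    'Probable Pitcher': ['Yes', np.nan],
})

# one call, one default per column; returns a new frame
demo = demo.fillna({'Injury Indicator': 'Healthy', 'Probable Pitcher': 'No'})
```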

In [8]:
# review new cleaned df
fd.head()

Unnamed: 0,Id,Position,Nickname,Salary,Game,Team,Opponent,Injury Indicator,Probable Pitcher
0,58318-52859,P,Jacob deGrom,12500,NYM@STL,NYM,STL,Healthy,Yes
1,58318-16956,P,Gerrit Cole,12200,HOU@NYY,NYY,HOU,Healthy,No
2,58318-82554,P,Shane Bieber,12000,CLE@KC,CLE,KC,Healthy,No
3,58318-5481,P,Max Scherzer,12000,ATL@WSH,WSH,ATL,Healthy,No
4,58318-82604,P,Corbin Burnes,11100,MIL@PHI,MIL,PHI,IL,No


In [9]:
fd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 831 entries, 0 to 830
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                831 non-null    object
 1   Position          831 non-null    object
 2   Nickname          831 non-null    object
 3   Salary            831 non-null    int64 
 4   Game              831 non-null    object
 5   Team              831 non-null    object
 6   Opponent          831 non-null    object
 7   Injury Indicator  831 non-null    object
 8   Probable Pitcher  831 non-null    object
dtypes: int64(1), object(8)
memory usage: 58.6+ KB


Next step is to rename Nickname to Name.

In [10]:
# renaming nickname column
fd.rename(columns={'Nickname': 'Name'}, inplace=True)

Next step is to filter to only healthy players.

In [11]:
# filter to only healthy players
fd = fd.loc[fd['Injury Indicator'] == 'Healthy']

In [12]:
# review dataframe
fd.head()

Unnamed: 0,Id,Position,Name,Salary,Game,Team,Opponent,Injury Indicator,Probable Pitcher
0,58318-52859,P,Jacob deGrom,12500,NYM@STL,NYM,STL,Healthy,Yes
1,58318-16956,P,Gerrit Cole,12200,HOU@NYY,NYY,HOU,Healthy,No
2,58318-82554,P,Shane Bieber,12000,CLE@KC,CLE,KC,Healthy,No
3,58318-5481,P,Max Scherzer,12000,ATL@WSH,WSH,ATL,Healthy,No
5,58318-16931,P,Yu Darvish,11000,PIT@SD,SD,PIT,Healthy,No


In [13]:
fd.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 681 entries, 0 to 830
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                681 non-null    object
 1   Position          681 non-null    object
 2   Name              681 non-null    object
 3   Salary            681 non-null    int64 
 4   Game              681 non-null    object
 5   Team              681 non-null    object
 6   Opponent          681 non-null    object
 7   Injury Indicator  681 non-null    object
 8   Probable Pitcher  681 non-null    object
dtypes: int64(1), object(8)
memory usage: 53.2+ KB


Next step is to split into pitchers and batters.

In [14]:
# split using .loc by position and make new dataframe for pitchers
pitchers = fd.loc[fd['Position']=='P']

In [15]:
# split using .loc by position and make new dataframe for batters
batters = fd.loc[fd['Position']!='P']

Now we have cleaned dataframes for each group. Let's focus on pitchers first and get projections added to the dataframe.

First step for pitchers is to keep only the probable pitchers, since we only care about the ones that will start.

In [16]:
# overwrite pitchers df with only the probable (starting) pitchers
pitchers = pitchers.loc[pitchers['Probable Pitcher']=='Yes']

In [17]:
# review changes
pitchers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 0 to 398
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                21 non-null     object
 1   Position          21 non-null     object
 2   Name              21 non-null     object
 3   Salary            21 non-null     int64 
 4   Game              21 non-null     object
 5   Team              21 non-null     object
 6   Opponent          21 non-null     object
 7   Injury Indicator  21 non-null     object
 8   Probable Pitcher  21 non-null     object
dtypes: int64(1), object(8)
memory usage: 1.6+ KB


Next step is to combine projections with 2021 stats.
- import testing data with projections from model
- merge the two data frames

In [18]:
# read in pitcher projections
pitcher_proj = pd.read_csv('../Projections/pitcher_projections_2021.csv')

Next step is the merge.

In [19]:
# merge attempt
pitcher_projections = pitchers.merge(pitcher_proj, how='left', on='Name')

Next step is to drop nulls from this list. The missing projections will need to be handled earlier in the process for the final product.

In [20]:
pitcher_projections.dropna(inplace=True)
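
Before dropping, it can help to see which names failed to match. A small sketch using the `indicator` argument of `merge` on made-up frames (`pitchers_demo`, `proj_demo` and their rows are placeholders). Note that a bare `dropna()` removes a row if any column is null, so the sketch drops on the projection column only:

```python
import pandas as pd

pitchers_demo = pd.DataFrame({'Name': ['Ann Example', 'Bob Sample', 'Cy Placeholder'],
                              'Salary': [9000, 8000, 7000]})
proj_demo = pd.DataFrame({'Name': ['Ann Example', 'Cy Placeholder'],
                          'Proj_FPPG': [30.0, 22.5]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
checked = pitchers_demo.merge(proj_demo, how='left', on='Name', indicator=True)

# names on the slate with no matching projection row
missing = checked.loc[checked['_merge'] == 'left_only', 'Name'].tolist()

# keep matched rows only, dropping on the projection column specifically
matched = checked.dropna(subset=['Proj_FPPG']).drop(columns='_merge')
```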

Next step is to clean up and keep only what is needed from pitchers so that it can be merged with batters.
Columns needed:
- ID
- Position
- Name
- Salary
- Team
- Opponent
- Proj_FPPG

In [21]:
# overwrite df with only the columns needed
pitcher_projections = pitcher_projections[['Id', 'Position', 'Name', 'Salary', 'Team_x', 'Opponent', 'Proj_FPPG']].copy()

In [22]:
# rename team column
pitcher_projections = pitcher_projections.rename(columns={'Team_x' : 'Team'})

Now we have a cleaned and organized df of our pitchers that are starting. Next is the batters.

---

First step is to merge on projections.

In [239]:
# read in projections file
batter_21 = pd.read_csv('../Projections/batter_projections_2021.csv')

In [240]:
# merge projections with batter df, creating new df
batters_projections = batters.merge(batter_21, how='left', on='Name')

Will have to drop nulls for now, but need to figure out why there is no data for those batters.

In [241]:
# dropping batters with no projections
batters_projections.dropna(inplace=True)
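
One possible cause of the missing batters (an assumption, not something confirmed here) is that names are spelled differently in the two files, for example accents, periods, or a "Jr." suffix. A sketch of merging on a normalized key instead of the raw name; `normalize_name` and the sample rows are hypothetical:

```python
import unicodedata
import pandas as pd


def normalize_name(name):
    """Hypothetical helper: lowercase, strip accents, periods, and common suffixes."""
    # decompose accented characters and drop the combining marks
    text = unicodedata.normalize('NFKD', name)
    text = ''.join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace('.', '').strip()
    parts = text.split()
    if parts and parts[-1] in {'jr', 'sr', 'ii', 'iii'}:
        parts = parts[:-1]
    return ' '.join(parts)


slate = pd.DataFrame({'Name': ['José Example Jr.', 'A.J. Sample']})
proj = pd.DataFrame({'Name': ['Jose Example', 'AJ Sample'], 'Proj_FPPG': [8.0, 7.5]})

# merge on the normalized key rather than the raw name
slate['key'] = slate['Name'].map(normalize_name)
proj['key'] = proj['Name'].map(normalize_name)
merged = slate.merge(proj[['key', 'Proj_FPPG']], how='left', on='key')
```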

In [242]:
# drop unneeded columns for merge with pitcher, overwrite current df
batters_projections = batters_projections[['Id', 'Position', 'Name', 'Salary', 'Team', 'Opponent','Proj_FPPG']].copy()

Now there are two clean dataframes with the same columns, so the next task is to determine the optimal lineup.

Steps to get the optimal lineup:
1. set salary cap
2. sort by fppg

    a. choose pitcher based on projected fppg; the final product needs to incorporate stats with strong coefficient values
    b. choose batters based on remaining salary and highest fppg; the final product needs to incorporate stats with strong coefficient values
    
3. create list to append choices to
4. save to one df for output
5. take df and save to template for export

In [243]:
# set cap for fanduel
salary_cap = 35_000

In [244]:
# sort pitcher by fppg projections
pitcher_projections.sort_values(by='Proj_FPPG', ascending=False, inplace=True, ignore_index=True)

In [245]:
# create a lineup list and add the top projected pitcher to it
lineup = []
lineup.append(pitcher_projections.values[0])

In [246]:
# need to update remaining salary
salary_cap -= pitcher_projections['Salary'].iloc[0]

In [247]:
# with updated salary fill remaining roster based on position and highest fppg
# create position list for remaining roster spots
position_list = ['C', '1B', '2B', '3B', 'SS', 'OF', 'OF', 'OF']
# sort batters by FPPG
batters_projections.sort_values(by='Proj_FPPG', ascending=False, inplace=True, ignore_index=True)

In [248]:
# create count based on remaining positions
sal_count = 8

# create average salary variable for remaining players
avg_sal = salary_cap/sal_count

# create for loop for each position in list to take highest fppg
for pos in position_list:
    # counter tracks how many batters at this position have been skipped
    # this is inside the for loop because it needs to reset per position
    counter = 0
    # walk this position's batters in Proj_FPPG order
    for salary in batters_projections.loc[batters_projections['Position'] == pos]['Salary']:
        # if salary is greater than average, move to next player
        if salary > avg_sal:
            counter += 1
        else:
            # if at or below average add player to list
            lineup.append(batters_projections.loc[batters_projections['Position'] == pos].values[counter])
            # drop player so no duplicates are added
            batters_projections.drop(batters_projections.loc[batters_projections['Position'] == pos].index.values[counter], inplace=True)
            # decrease sal_count
            sal_count -= 1
            # decrease salary cap; the pick was dropped above, so this reads the
            # salary of the next batter at this position, not the pick's own
            salary_cap -= batters_projections.loc[batters_projections['Position'] == pos]['Salary'].values[counter]
            # create new average salary, skipping the division once every spot is filled
            if sal_count > 0:
                avg_sal = salary_cap/sal_count
            break
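
In the loop above the salary is read after the row has been dropped, so the running cap is reduced by the next batter's salary and not the pick's. A sketch of the same greedy idea that records the pick before dropping it; `pick_batters` is a hypothetical name and the result will not necessarily match the lineup produced in this notebook:

```python
import pandas as pd


def pick_batters(batters, positions, budget):
    """Hypothetical greedy fill: best projection at or under the per-slot average."""
    pool = batters.sort_values('Proj_FPPG', ascending=False)
    picks = []
    slots_left = len(positions)
    for pos in positions:
        # average salary available for each slot still open
        avg_sal = budget / slots_left
        candidates = pool[(pool['Position'] == pos) & (pool['Salary'] <= avg_sal)]
        if candidates.empty:
            continue  # no affordable player; the slot stays empty in this sketch
        choice = candidates.iloc[0]
        picks.append(choice)
        budget -= choice['Salary']           # subtract the chosen player's own salary
        pool = pool.drop(index=choice.name)  # prevent duplicates
        slots_left -= 1
    return pd.DataFrame(picks), budget
```

Because `slots_left` is only read at the top of each pass, the average is never computed with zero slots remaining.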


In [249]:
df = pd.DataFrame(lineup, columns=['Id', 'Position', 'Name', 'Salary', 'Team', 'Opponent','Proj_FPPG'])
df

Unnamed: 0,Id,Position,Name,Salary,Team,Opponent,Proj_FPPG
0,58318-79951,P,Nick Pivetta,8300,BOS,DET,27.943867
1,58318-79174,C,Jacob Nottingham,2500,MIL,PHI,23.842438
2,58318-104483,1B,Jared Walsh,3500,LAA,TB,9.265179
3,58318-68587,2B,Ozzie Albies,3300,ATL,WSH,7.646085
4,58318-5104,3B,Pablo Sandoval,2000,ATL,WSH,8.927933
5,58318-37982,SS,Xander Bogaerts,3500,BOS,DET,8.415627
6,58318-5763,OF,Nelson Cruz,3900,MIN,TEX,9.506211
7,58318-60643,OF,Aaron Judge,3600,NYY,HOU,8.827579
8,58318-39086,OF,Byron Buxton,4300,MIN,TEX,11.413326


In [250]:
df['Salary'].sum()

34900
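
A quick sanity check for the finished lineup, using the Ids, positions, and salaries from the output above. `check_lineup` is a hypothetical helper, and `ROSTER` is just the pitcher plus the `position_list` used in this notebook, which is an assumption about the contest format:

```python
from collections import Counter

import pandas as pd

SALARY_CAP = 35_000
ROSTER = Counter(['P', 'C', '1B', '2B', '3B', 'SS', 'OF', 'OF', 'OF'])


def check_lineup(lineup_df, cap=SALARY_CAP, roster=ROSTER):
    """Hypothetical check: True when the lineup fits the cap and the slots used above."""
    under_cap = lineup_df['Salary'].sum() <= cap
    right_slots = Counter(lineup_df['Position']) == roster
    no_repeats = lineup_df['Id'].is_unique
    return bool(under_cap and right_slots and no_repeats)


# Ids, positions and salaries copied from the lineup output above
lineup_df = pd.DataFrame({
    'Id': ['58318-79951', '58318-79174', '58318-104483', '58318-68587', '58318-5104',
           '58318-37982', '58318-5763', '58318-60643', '58318-39086'],
    'Position': ['P', 'C', '1B', '2B', '3B', 'SS', 'OF', 'OF', 'OF'],
    'Salary': [8300, 2500, 3500, 3300, 2000, 3500, 3900, 3600, 4300],
})
is_valid = check_lineup(lineup_df)
```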