The purpose of this notebook is to create a function that takes the imported player list:
- clean to only needed information
- seperate by pither and batter
- merge each on player name to get projections from model
- merge back pitcher and batter data 
- output df for now

Next steps:
- build line up based on salary

In [164]:
# imports
import pandas as pd
import numpy as np

In [165]:
# will need fanduel import
fd = pd.read_csv('../../../Downloads/FanDuel-MLB-2021 ET-05 ET-14 ET-58874-players-list.csv')

In [166]:
# print first 5 rows of fd
fd.head()

Unnamed: 0,Id,Position,First Name,Nickname,Last Name,FPPG,Played,Salary,Game,Team,Opponent,Injury Indicator,Injury Details,Tier,Probable Pitcher,Batting Order,Roster Position
0,58874-5481,P,Max,Max Scherzer,Scherzer,45.428571,7.0,12500,WSH@ARI,WSH,ARI,,,,Yes,,P
1,58874-52859,P,Jacob,Jacob deGrom,deGrom,57.333333,6.0,12500,NYM@TB,NYM,TB,IL,Side,,,0.0,P
2,58874-82554,P,Shane,Shane Bieber,Bieber,52.75,8.0,12200,CLE@SEA,CLE,SEA,,,,,,P
3,58874-16956,P,Gerrit,Gerrit Cole,Cole,53.25,8.0,12000,NYY@BAL,NYY,BAL,,,,,,P
4,58874-16959,P,Trevor,Trevor Bauer,Bauer,44.0,8.0,11000,MIA@LAD,LAD,MIA,,,,,,P


only columns needed:
- id - will need this later for template
- Position
- nickname - renamed to Name
- salary
- game
- team
- opponent
- injury indicator
- probable pitcher

In [167]:
fd.columns

Index(['Id', 'Position', 'First Name', 'Nickname', 'Last Name', 'FPPG',
       'Played', 'Salary', 'Game', 'Team', 'Opponent', 'Injury Indicator',
       'Injury Details', 'Tier', 'Probable Pitcher', 'Batting Order',
       'Roster Position'],
      dtype='object')

In [168]:
# clean up fd to match column list above
fd.drop(columns=['First Name', 'Last Name', 'FPPG', 'Played',
                 'Injury Details', 'Tier', 'Batting Order', 'Roster Position'], inplace=True)

Next step is fill nulls in probable pitcher and injury indicator

In [169]:
# filling nulls for probable pitcher
fd['Probable Pitcher'].fillna('No', inplace=True)

In [170]:
# fill nulls for injury indicator
fd['Injury Indicator'].fillna('Healthy', inplace=True)

In [171]:
# review new cleaned df
fd.head()

Unnamed: 0,Id,Position,Nickname,Salary,Game,Team,Opponent,Injury Indicator,Probable Pitcher
0,58874-5481,P,Max Scherzer,12500,WSH@ARI,WSH,ARI,Healthy,Yes
1,58874-52859,P,Jacob deGrom,12500,NYM@TB,NYM,TB,IL,No
2,58874-82554,P,Shane Bieber,12200,CLE@SEA,CLE,SEA,Healthy,No
3,58874-16956,P,Gerrit Cole,12000,NYY@BAL,NYY,BAL,Healthy,No
4,58874-16959,P,Trevor Bauer,11000,MIA@LAD,LAD,MIA,Healthy,No


In [172]:
fd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1022 entries, 0 to 1021
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                1022 non-null   object
 1   Position          1022 non-null   object
 2   Nickname          1022 non-null   object
 3   Salary            1022 non-null   int64 
 4   Game              1022 non-null   object
 5   Team              1022 non-null   object
 6   Opponent          1022 non-null   object
 7   Injury Indicator  1022 non-null   object
 8   Probable Pitcher  1022 non-null   object
dtypes: int64(1), object(8)
memory usage: 72.0+ KB


Next step rename Nickname to name

In [173]:
# renaming nickname column
fd.rename(columns={'Nickname': 'Name'}, inplace=True)

Next step filter to only healthy players

In [174]:
# fitler to only healthy players
fd = fd.loc[fd['Injury Indicator'] == 'Healthy']

In [175]:
# review dataframe
fd.head()

Unnamed: 0,Id,Position,Name,Salary,Game,Team,Opponent,Injury Indicator,Probable Pitcher
0,58874-5481,P,Max Scherzer,12500,WSH@ARI,WSH,ARI,Healthy,Yes
2,58874-82554,P,Shane Bieber,12200,CLE@SEA,CLE,SEA,Healthy,No
3,58874-16956,P,Gerrit Cole,12000,NYY@BAL,NYY,BAL,Healthy,No
4,58874-16959,P,Trevor Bauer,11000,MIA@LAD,LAD,MIA,Healthy,No
5,58874-97300,P,John Means,11000,NYY@BAL,BAL,NYY,Healthy,No


In [176]:
fd.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 809 entries, 0 to 1021
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                809 non-null    object
 1   Position          809 non-null    object
 2   Name              809 non-null    object
 3   Salary            809 non-null    int64 
 4   Game              809 non-null    object
 5   Team              809 non-null    object
 6   Opponent          809 non-null    object
 7   Injury Indicator  809 non-null    object
 8   Probable Pitcher  809 non-null    object
dtypes: int64(1), object(8)
memory usage: 63.2+ KB


next step is to split into pithcers and batters

In [177]:
# split using .loc by position and make new dataframe for pitchers
pitchers = fd.loc[fd['Position']=='P']

In [178]:
# split using .loc by position and make new dataframe for batters
batters = fd.loc[fd['Position']!='P']

Now we have cleaned dataframes for each group. Now lets focus on pitchers to get projections added to dataframe.

First step for pitchers is to get the probable pitchers first, we only care about the ones that will start.

In [179]:
# save pitchers df to only starting pitchers
pitchers = pitchers.loc[pitchers['Probable Pitcher']=='Yes']

In [180]:
# review changes
pitchers.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 26 entries, 0 to 226
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Id                26 non-null     object
 1   Position          26 non-null     object
 2   Name              26 non-null     object
 3   Salary            26 non-null     int64 
 4   Game              26 non-null     object
 5   Team              26 non-null     object
 6   Opponent          26 non-null     object
 7   Injury Indicator  26 non-null     object
 8   Probable Pitcher  26 non-null     object
dtypes: int64(1), object(8)
memory usage: 2.0+ KB


Next step is to combine projections with 2021 stats.
- import testing data with projections from model
- merge the two data frames

In [181]:
# read in pitcher projections
pitcher_proj = pd.read_csv('../Projections/pitcher_projections_2021.csv')

Next step merge.

In [182]:
# merge attempt
pitcher_projections = pitchers.merge(pitcher_proj, how='left', on='Name')

Next step is to drop nulls from this list.  Will need to handle this in previous processes for final product.

In [183]:
pitcher_projections.dropna(inplace=True)

Next step is to clean up and only take what is needed from pitchers so that it can be merged with batters.
Columns needed:
- ID
- Position
- Name
- Salary
- Team
- Opponent
- Proj_FPPG

In [184]:
# overwrite df with only the columns needed
pitcher_projections = pitcher_projections[['Id', 'Position', 'Name', 'Salary', 'Team_x', 'Opponent', 'AVG', 'Projected_FPPG']]

In [185]:
# rename team column
pitcher_projections.rename(columns={'Team_x' : 'Team'}, inplace=True)

Now we have a cleaned and organized df of our pitchers that are starting. Next is the batters.

---

First step is to merge on projections.

In [186]:
# read in projections file
batter_21 = pd.read_csv('../Projections/batter_projections_2021.csv')

In [187]:
# merge projections with batter df, creating new df
batters_projections = batters.merge(batter_21, how='left', on='Name')

Will have to drop nulls for now, but need to figure out why there is no data for those batters.

In [188]:
# dropping batters with no projections
batters_projections.dropna(inplace=True)

In [189]:
batters_projections

Unnamed: 0,Id,Position,Name,Salary,Game,Team_x,Opponent,Injury Indicator,Probable Pitcher,Team_y,...,CS,TB,AVG,OBP,SLG,OPS,ISO,PTS,FPPG,Projected_FPPG
0,58874-38872,OF,Jesse Winker,4600,CIN@COL,CIN,COL,Healthy,No,CIN,...,0.0,65.0,0.359,0.421,0.631,1.052,0.272,361.1,13.888462,10.010649
1,58874-13342,OF,Nick Castellanos,4500,CIN@COL,CIN,COL,Healthy,No,CIN,...,0.0,74.0,0.314,0.357,0.627,0.984,0.313,405.8,13.993103,9.805777
2,58874-12933,OF,Mike Trout,4400,LAA@BOS,LAA,BOS,Healthy,No,LAA,...,0.0,72.0,0.365,0.484,0.692,1.176,0.327,433.4,13.980645,11.217342
3,58874-52158,SS,Trevor Story,4300,CIN@COL,COL,CIN,Healthy,No,COL,...,2.0,61.0,0.285,0.357,0.496,0.853,0.211,389.2,11.447059,8.355613
4,58874-82575,OF,Yordan Alvarez,4200,TEX@HOU,HOU,TEX,Healthy,No,HOU,...,0.0,67.0,0.345,0.378,0.609,0.987,0.264,362.7,12.953571,9.453620
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
364,58874-81869,C,Chad Wallach,2000,MIA@LAD,MIA,LAD,Healthy,No,MIA,...,0.0,16.0,0.255,0.275,0.340,0.615,0.085,77.9,4.582353,5.035409
365,58874-12237,C,Jonathan Lucroy,2000,WSH@ARI,WSH,ARI,Healthy,No,WSH,...,0.0,6.0,0.357,0.357,0.429,0.786,0.072,25.0,5.000000,6.212545
366,58874-102358,SS,Jose Devers,2000,MIA@LAD,MIA,LAD,Healthy,No,MIA,...,0.0,2.0,0.167,0.154,0.167,0.321,0.000,15.9,1.987500,2.197133
367,58874-102363,OF,Luke Raley,2000,MIA@LAD,LAD,MIA,Healthy,No,LAD,...,0.0,11.0,0.206,0.308,0.324,0.632,0.118,61.4,4.385714,5.508057


In [190]:
# drop unneeded columns for merge with pitcher, overwrite current df
batters_projections = batters_projections[['Id', 'Position', 'Name', 'Salary', 'Team_x', 'Opponent','Projected_FPPG']]

In [191]:
# rename team column
batters_projections.rename(columns={'Team_x' : 'Team'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


Now there are two clean dataframes with the same columns, now we will determine optimal line up.

Steps to get optimal line up:
1. set salary cap
2. sort by fppg

    a. choose pitcher - based on projected fppg; in final product need to incoporate stats with strong coefficient values
    b. choose batters based on remaining salary and highest fppg; in final product need to incoporate stats with strong coefficient values
    
3. create list to append choices to
4. save to one df for output
5. take df and save to templet for export

In [192]:
# set cap for fanduel
salary_cap = 35_000

In [193]:
# sort by avg
pitcher_projections.sort_values(by='AVG', ascending=False, inplace=True, ignore_index=True)

In [194]:
# create list of teams to filter batters
team_list = []
for x in range(0,4):
    team_list.append(pitcher_projections['Opponent'][x])

In [195]:
# drop avg
pitcher_projections.drop(columns='AVG', inplace=True)

In [196]:
# create team filter 
team_filter = (batters_projections['Team'] == team_list[0]) | (batters_projections['Team'] == team_list[1]) |(batters_projections['Team'] == team_list[2]) |(batters_projections['Team'] == team_list[3])
# new batter dataframe with team filter
batters_projections = batters_projections[team_filter]
# reset index
batters_projections.reset_index(drop=True, inplace=True)

In [197]:
# clean position
pos_list = [pos[:2] for pos in batters_projections['Position']]
batters_projections['Position'] = pos_list

In [198]:
# sort pitcher by fppg projections
pitcher_projections.sort_values(by='Projected_FPPG', ascending=False, inplace=True, ignore_index=True)

In [199]:
# create a player list to 
lineup = []
lineup.append(pitcher_projections.values[0])

In [200]:
# need to update remaining salary
salary_cap -= pitcher_projections['Salary'][0]

In [201]:
# with updated salary fill remaining roster based on position and highest fppg
# create position list for remaining roster spots
position_list = ['C', '1B', '2B', '3B', 'SS', 'OF', 'OF', 'OF']
# sort batters by FPPG
batters_projections.sort_values(by='Projected_FPPG', ascending=False, inplace=True, ignore_index=True)

In [202]:
# redo 
# create count based on remaining positions
sal_count = 8

# create average salary variable for remaining players
avg_sal = salary_cap/sal_count

# create for loop for each position in list to take highest fppg
for pos in position_list:
    # setting counter to increase if player is already in list
    # this is inside the for loop beacuse it needs to be per position
    counter = 0
    # if salary greater than average move to next player
    for salary in batters_projections.loc[batters_projections['Position'] == pos]['Salary']:
        # test if salary is greater than average if it is increase counter
        if salary > avg_sal:
            counter += 1
        else:
            # if less than average add player to list
            lineup.append(batters_projections.loc[batters_projections['Position'] == pos].values[counter])
            # drop player so no duplicates are added
            batters_projections.drop(batters_projections.loc[batters_projections['Position'] == pos].index.values[counter], inplace=True)
            # decrease sal_count
            sal_count -= 1
            # decrease salary cap
            salary_cap -= batters_projections.loc[batters_projections['Position'] == pos]['Salary'].values[counter]
            # create new average salary
            avg_sal = salary_cap/sal_count
            break


  avg_sal = salary_cap/sal_count


In [203]:
df = pd.DataFrame(lineup, columns=['Id', 'Position', 'Name', 'Salary', 'Team', 'Opponent','Proj_FPPG'])
df

Unnamed: 0,Id,Position,Name,Salary,Team,Opponent,Proj_FPPG
0,58874-79951,P,Nick Pivetta,8300,BOS,LAA,46.814891
1,58874-79918,C,Kyle Higashioka,2700,NYY,BAL,9.400274
2,58874-52175,1B,Matt Olson,3200,OAK,MIN,8.560223
3,58874-13771,2B,Josh Harrison,3100,WSH,ARI,7.543716
4,58874-68544,3B,Matt Chapman,2900,OAK,MIN,7.072461
5,58874-65881,SS,Trea Turner,4000,WSH,ARI,8.335203
6,58874-84133,OF,Franmil Reyes,3200,CLE,SEA,9.208146
7,58874-12302,OF,Giancarlo Stanton,3600,NYY,BAL,9.088541
8,58874-60643,OF,Aaron Judge,3300,NYY,BAL,8.133931
