# Evaluating different projections of MLB statistics for the 2018 season
##### The purpose of this project is to evaluate how well different projection systems predicted players' statistics for the 2018 MLB season. Six projection systems are evaluated (ATC, Depth Charts, Fangraphs, Steamer, The Bat, and ZiPS), along with an average of those six. The raw data for each system was downloaded from fangraphs.com and combined into one standardized spreadsheet. The number of projections varies from player to player because some systems did not project every player.
##### This project is not yet complete because the 2018 MLB season has not finished. Currently, this workbook cleans the data, calculates an average projection for each player, and adds that average to the table of projections. Once the season ends, this workbook will be updated with each player's actual statistics, along with analysis and visualizations showing how well each projection system predicted players' stats.

In [361]:
import pandas as pd
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 25)

### Load file

In [362]:
projections = pd.read_csv("2018-MLB-Projections-Hitters.csv")

In [363]:
projections.head(10)

Unnamed: 0,Name,Team,Type,G,PA,AB,H,2B,3B,HR,R,RBI,BB,SO,HBP,SB,CS,AVG,OBP,SLG,OPS,wOBA,Fld,BsR,WAR,ADP,playerid
0,A.J. Ellis,Padres,ATC,32,114,99,22,4,0,3,9,10,12,20,,0,0,0.218,0.312,0.349,0.661,0.292,,,,999.0,5677
1,A.J. Ellis,Padres,The Bat,40,158,138,33,6,1,4,16,16,16,30,,1,0,0.237,0.32,0.372,0.692,0.301,,,,999.0,5677
2,A.J. Ellis,Padres,Depth Charts,38,160,140,31,6,0,4,15,17,15,30,3.0,1,0,0.223,0.309,0.361,0.67,0.293,1.1,-0.7,0.5,999.0,5677
3,A.J. Ellis,Padres,Steamer,33,129,112,26,5,0,4,13,13,14,25,2.0,1,1,0.234,0.323,0.379,0.702,0.307,-0.8,1.2,0.6,999.0,5677
4,A.J. Ellis,Padres,ZiPS,50,155,137,29,6,0,4,13,16,12,28,4.0,0,0,0.212,0.294,0.343,0.637,0.279,1.0,-0.1,0.3,999.0,5677
5,A.J. Pollock,Diamondbacks,ATC,141,586,533,150,36,7,16,88,60,45,86,,28,7,0.281,0.341,0.463,0.803,0.342,,,,68.2,9256
6,A.J. Pollock,Diamondbacks,The Bat,143,619,557,155,36,5,18,83,76,53,92,,22,7,0.278,0.344,0.456,0.801,0.342,,,,68.2,9256
7,A.J. Pollock,Diamondbacks,Depth Charts,147,616,557,160,38,6,19,92,68,48,92,5.0,26,9,0.287,0.347,0.48,0.827,0.351,4.4,1.9,3.6,68.2,9256
8,A.J. Pollock,Diamondbacks,Fangraphs,130,591,534,152,35,5,16,96,64,49,89,4.0,26,5,0.285,0.347,0.459,0.806,0.344,5.0,2.7,3.2,68.2,9256
9,A.J. Pollock,Diamondbacks,Steamer,124,559,500,141,33,5,17,80,61,47,85,4.0,22,9,0.282,0.346,0.469,0.815,0.347,1.2,2.3,2.9,68.2,9256


In [364]:
projections.columns

Index(['Name', 'Team', 'Type', 'G', 'PA', 'AB', 'H', '2B', '3B', 'HR', 'R',
       'RBI', 'BB', 'SO', 'HBP', 'SB', 'CS', 'AVG', 'OBP', 'SLG', 'OPS',
       'wOBA', 'Fld', 'BsR', 'WAR', 'ADP', 'playerid'],
      dtype='object')

### Find columns with NaNs and remove them

In [365]:
projections.columns[projections.isna().any()].tolist()

['HBP', 'Fld', 'BsR', 'WAR']

In [366]:
projections = projections.drop(["HBP", "Fld", "BsR", "WAR"], axis=1)

In [367]:
projections.columns[projections.isna().any()].tolist()

[]
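
##### As an aside, finding and dropping the NaN columns can be collapsed into a single `dropna` call. This is a minimal sketch on a made-up two-row table (the names `toy`, `nanCols`, and `cleaned` and all values are illustrative, not the real projections):

```python
import pandas as pd
import numpy as np

# toy stand-in for the projections table; HBP has one missing value
toy = pd.DataFrame({
    "Name": ["Player A", "Player A"],
    "HR": [3, 4],
    "HBP": [np.nan, 3.0],
})

# columns that contain at least one NaN (same check as in the cells above)
nanCols = toy.columns[toy.isna().any()].tolist()

# axis="columns" drops every column that has any NaN, without naming them
cleaned = toy.dropna(axis="columns")
print(nanCols, cleaned.columns.tolist())
```

##### Listing the columns first, as done above, is still useful because it shows exactly which stats are being discarded.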

In [368]:
projections.columns

Index(['Name', 'Team', 'Type', 'G', 'PA', 'AB', 'H', '2B', '3B', 'HR', 'R',
       'RBI', 'BB', 'SO', 'SB', 'CS', 'AVG', 'OBP', 'SLG', 'OPS', 'wOBA',
       'ADP', 'playerid'],
      dtype='object')

In [369]:
projections.head()

Unnamed: 0,Name,Team,Type,G,PA,AB,H,2B,3B,HR,R,RBI,BB,SO,SB,CS,AVG,OBP,SLG,OPS,wOBA,ADP,playerid
0,A.J. Ellis,Padres,ATC,32,114,99,22,4,0,3,9,10,12,20,0,0,0.218,0.312,0.349,0.661,0.292,999.0,5677
1,A.J. Ellis,Padres,The Bat,40,158,138,33,6,1,4,16,16,16,30,1,0,0.237,0.32,0.372,0.692,0.301,999.0,5677
2,A.J. Ellis,Padres,Depth Charts,38,160,140,31,6,0,4,15,17,15,30,1,0,0.223,0.309,0.361,0.67,0.293,999.0,5677
3,A.J. Ellis,Padres,Steamer,33,129,112,26,5,0,4,13,13,14,25,1,1,0.234,0.323,0.379,0.702,0.307,999.0,5677
4,A.J. Ellis,Padres,ZiPS,50,155,137,29,6,0,4,13,16,12,28,0,0,0.212,0.294,0.343,0.637,0.279,999.0,5677


### Create list of all player names that have projections, plus dictionary storing the number of different projections for each name
##### To make the average projections more worthwhile, I will only use players that had at least 3 projections

In [370]:
names = []
nameCounter = {}
for index, row in projections.iterrows():
    name = row["Name"]
    if name not in names:
        names.append(name)
        nameCounter[name] = 1
    else:
        nameCounter[name] += 1
print(len(names))

712
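
##### The same list and counter could probably be built without an explicit loop. A small sketch on made-up rows, assuming only that the table has a `Name` column (`toy`, `toyCounter`, and `toyNames` are hypothetical names):

```python
import pandas as pd

# toy stand-in: Player A has three projections, Player B has one
toy = pd.DataFrame({
    "Name": ["Player A", "Player A", "Player A", "Player B"],
    "Type": ["ATC", "Steamer", "ZiPS", "ATC"],
})

# number of projection rows per player, as a plain dict
toyCounter = toy["Name"].value_counts().to_dict()

# player names in order of first appearance, like the list built by the loop
toyNames = toy["Name"].unique().tolist()
print(len(toyNames), toyCounter)
```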


In [371]:
nameCounter

{'A.J. Ellis': 5,
 'A.J. Pollock': 6,
 'A.J. Reed': 5,
 'Aaron Altherr': 6,
 'Aaron Hicks': 6,
 'Aaron Judge': 6,
 'Abiatal Avelino': 1,
 'Adalberto Mondesi': 6,
 'Adam Duvall': 6,
 'Adam Eaton': 6,
 'Adam Engel': 5,
 'Adam Frazier': 6,
 'Adam Jones': 6,
 'Adam Moore': 1,
 'Adam Rosales': 2,
 'Addison Russell': 6,
 'Adeiny Hechavarria': 5,
 'Adolis Garcia': 1,
 'Adrian Beltre': 6,
 'Adrian Sanchez': 2,
 'Albert Almora Jr.': 6,
 'Albert Pujols': 6,
 'Alberto Rosario': 2,
 'Alcides Escobar': 6,
 'Aledmys Diaz': 6,
 'Alen Hanson': 2,
 'Alex Avila': 6,
 'Alex Blandino': 1,
 'Alex Bregman': 6,
 'Alex Dickerson': 2,
 'Alex Gordon': 6,
 'Alex Mejia': 2,
 'Alex Verdugo': 5,
 'Alfredo Gonzalez': 1,
 'Ali Solis': 1,
 'Allen Cordoba': 5,
 'Allen Craig': 1,
 'Amed Rosario': 6,
 'Andrelton Simmons': 6,
 'Andres Blanco': 2,
 'Andrew Benintendi': 6,
 'Andrew Knapp': 5,
 'Andrew McCutchen': 6,
 'Andrew Romine': 5,
 'Andrew Stevenson': 5,
 'Andrew Susac': 5,
 'Andrew Toles': 5,
 'Andy Wilkins': 1,
 'An

### Select names with 3+ projections

In [372]:
# create a list to store the names of players without 3+ projections. Add those players to that list, and remove those players from the original list of names
notEnoughEntries = []
for name in nameCounter:
    if nameCounter[name] < 3:
        notEnoughEntries.append(name)
        names.remove(name)
notEnoughEntries

['Abiatal Avelino',
 'Adam Moore',
 'Adam Rosales',
 'Adolis Garcia',
 'Adrian Sanchez',
 'Alberto Rosario',
 'Alen Hanson',
 'Alex Blandino',
 'Alex Dickerson',
 'Alex Mejia',
 'Alfredo Gonzalez',
 'Ali Solis',
 'Allen Craig',
 'Andres Blanco',
 'Andy Wilkins',
 'Anthony Recker',
 'Aramis Garcia',
 'Arismendy Alcantara',
 'Aristides Aquino',
 'Audry Perez',
 'Austin Dean',
 'Austin Slater',
 'Austin Wynns',
 'Beau Taylor',
 'Billy Burns',
 'Billy McKinney',
 'Blake Trahan',
 'Bobby Wilson',
 'Brandon Barnes',
 'Brandon Lowe',
 'Brandon Snyder',
 'Brett Eibner',
 'Brett Nicholas',
 'Brett Phillips',
 'Breyvic Valera',
 'Brock Stassi',
 'Bryan Holaday',
 'Bryce Brentz',
 'Cam Perkins',
 'Cameron Rupp',
 'Carlos Moncrief',
 'Carlos Perez',
 'Carlos Rivero',
 'Cedric Mullins II',
 'Cesar Puello',
 'Chad Huffman',
 'Charlie Tilson',
 "Chase d'Arnaud",
 'Chris Coghlan',
 'Chris Dominguez',
 'Chris Herrmann',
 'Chris Marrero',
 'Chris Shaw',
 'Chris Stewart',
 'Christian Colon',
 'Christian 

In [373]:
names

['A.J. Ellis',
 'A.J. Pollock',
 'A.J. Reed',
 'Aaron Altherr',
 'Aaron Hicks',
 'Aaron Judge',
 'Adalberto Mondesi',
 'Adam Duvall',
 'Adam Eaton',
 'Adam Engel',
 'Adam Frazier',
 'Adam Jones',
 'Addison Russell',
 'Adeiny Hechavarria',
 'Adrian Beltre',
 'Albert Almora Jr.',
 'Albert Pujols',
 'Alcides Escobar',
 'Aledmys Diaz',
 'Alex Avila',
 'Alex Bregman',
 'Alex Gordon',
 'Alex Verdugo',
 'Allen Cordoba',
 'Amed Rosario',
 'Andrelton Simmons',
 'Andrew Benintendi',
 'Andrew Knapp',
 'Andrew McCutchen',
 'Andrew Romine',
 'Andrew Stevenson',
 'Andrew Susac',
 'Andrew Toles',
 'Anthony Alford',
 'Anthony Rendon',
 'Anthony Rizzo',
 'Anthony Santander',
 'Asdrubal Cabrera',
 'Austin Barnes',
 'Austin Hays',
 'Austin Hedges',
 'Austin Jackson',
 'Austin Meadows',
 'Austin Romine',
 'Avisail Garcia',
 'Ben Gamel',
 'Ben Zobrist',
 'Billy Hamilton',
 'Blake Swihart',
 'Boog Powell',
 'Bradley Zimmer',
 'Brandon Belt',
 'Brandon Crawford',
 'Brandon Dixon',
 'Brandon Drury',
 'Brandon

### Remove projections of players that have fewer than 3 different projections

In [374]:
# keep only the rows for players that have at least 3 projections
projections = projections[~projections["Name"].isin(notEnoughEntries)]

In [375]:
projections

Unnamed: 0,Name,Team,Type,G,PA,AB,H,2B,3B,HR,R,RBI,BB,SO,SB,CS,AVG,OBP,SLG,OPS,wOBA,ADP,playerid
0,A.J. Ellis,Padres,ATC,32,114,99,22,4,0,3,9,10,12,20,0,0,0.218,0.312,0.349,0.661,0.292,999.0,5677
1,A.J. Ellis,Padres,The Bat,40,158,138,33,6,1,4,16,16,16,30,1,0,0.237,0.320,0.372,0.692,0.301,999.0,5677
2,A.J. Ellis,Padres,Depth Charts,38,160,140,31,6,0,4,15,17,15,30,1,0,0.223,0.309,0.361,0.670,0.293,999.0,5677
3,A.J. Ellis,Padres,Steamer,33,129,112,26,5,0,4,13,13,14,25,1,1,0.234,0.323,0.379,0.702,0.307,999.0,5677
4,A.J. Ellis,Padres,ZiPS,50,155,137,29,6,0,4,13,16,12,28,0,0,0.212,0.294,0.343,0.637,0.279,999.0,5677
5,A.J. Pollock,Diamondbacks,ATC,141,586,533,150,36,7,16,88,60,45,86,28,7,0.281,0.341,0.463,0.803,0.342,68.2,9256
6,A.J. Pollock,Diamondbacks,The Bat,143,619,557,155,36,5,18,83,76,53,92,22,7,0.278,0.344,0.456,0.801,0.342,68.2,9256
7,A.J. Pollock,Diamondbacks,Depth Charts,147,616,557,160,38,6,19,92,68,48,92,26,9,0.287,0.347,0.480,0.827,0.351,68.2,9256
8,A.J. Pollock,Diamondbacks,Fangraphs,130,591,534,152,35,5,16,96,64,49,89,26,5,0.285,0.347,0.459,0.806,0.344,68.2,9256
9,A.J. Pollock,Diamondbacks,Steamer,124,559,500,141,33,5,17,80,61,47,85,22,9,0.282,0.346,0.469,0.815,0.347,68.2,9256


### Create an average of all projections for each player and add to table

In [376]:
cols = projections.columns.values.tolist()

# counting stats are averaged and rounded to whole numbers; rate stats are rounded to 3 decimals
countingStats = ["G", "PA", "AB", "H", "2B", "3B", "HR", "R", "RBI", "BB", "SO", "SB", "CS"]
rateStats = ["AVG", "OBP", "SLG", "OPS", "wOBA"]

def averageOf(values):
    # plain arithmetic mean of a list of numbers
    return sum(values) / len(values)

averageRows = []
for name in names:
    currentPlayer = projections.loc[projections['Name'] == name]
    # name, team, average draft position, and player ID come from the player's first projection
    newEntry = {
        "Name": currentPlayer["Name"].values.tolist()[0],
        "Team": currentPlayer["Team"].values.tolist()[0],
        # the name 'A-Average' is solely for sorting purposes- want the average projections to appear first later on
        "Type": "A-Average",
        "ADP": currentPlayer["ADP"].values.tolist()[0],
        "playerid": currentPlayer["playerid"].values.tolist()[0],
    }
    for stat in countingStats:
        newEntry[stat] = int(round(averageOf(currentPlayer[stat].values.tolist())))
    for stat in rateStats:
        newEntry[stat] = round(averageOf(currentPlayer[stat].values.tolist()), 3)
    # create a one-row dataframe of the new entry with average projections
    averageRows.append(pd.DataFrame([newEntry], columns=cols))

# add all average rows in a single concat (DataFrame.append is not available in pandas 2.0+)
projections = pd.concat([projections] + averageRows)
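
##### For comparison, the per-player averages could also be computed with a `groupby` instead of a loop over names. This is only a sketch on made-up numbers with two stats; the frame `toy` and the names `averages` and `combined` are hypothetical, and rounding at exact .5 values may not match the loop above in every case:

```python
import pandas as pd

# toy stand-in with one counting stat (HR) and one rate stat (AVG)
toy = pd.DataFrame({
    "Name": ["Player A"] * 3 + ["Player B"] * 3,
    "Team": ["Padres"] * 3 + ["Diamondbacks"] * 3,
    "Type": ["ATC", "Steamer", "ZiPS"] * 2,
    "HR": [3, 4, 4, 16, 17, 19],
    "AVG": [0.218, 0.234, 0.212, 0.281, 0.282, 0.287],
})

# one row per player holding the mean of each stat
averages = (
    toy.groupby(["Name", "Team"], sort=False)
       .agg(HR=("HR", "mean"), AVG=("AVG", "mean"))
       .reset_index()
)
averages["HR"] = averages["HR"].round().astype(int)   # whole numbers for counting stats
averages["AVG"] = averages["AVG"].round(3)            # 3 decimals for rate stats
averages["Type"] = "A-Average"

# put the columns in the original order and add the average rows to the table
combined = pd.concat([toy, averages[toy.columns]], ignore_index=True)
print(averages)
```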

### Sort the table by name, team, and projection type for easier comparisons

In [377]:
projections = projections.sort_values(by=["Name", "Team", "Type"])
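
##### The 'A-Average' label sorts first only because '-' comes before letters in string order. An alternative I may try is an ordered categorical, which lets the label be a plain "Average". A sketch on made-up rows; the order in `typeOrder` is my own assumption about the preferred display order:

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["Player A"] * 3,
    "Type": ["ATC", "Average", "ZiPS"],
})

# hypothetical display order: the average first, then the individual systems
typeOrder = ["Average", "ATC", "Depth Charts", "Fangraphs", "Steamer", "The Bat", "ZiPS"]
toy["Type"] = pd.Categorical(toy["Type"], categories=typeOrder, ordered=True)

# sort_values follows the category order rather than alphabetical order
ordered = toy.sort_values(by=["Name", "Type"])
print(ordered["Type"].tolist())
```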

In [378]:
projections

Unnamed: 0,Name,Team,Type,G,PA,AB,H,2B,3B,HR,R,RBI,BB,SO,SB,CS,AVG,OBP,SLG,OPS,wOBA,ADP,playerid
0,A.J. Ellis,Padres,A-Average,39,143,125,28,5,0,4,13,14,14,27,1,0,0.225,0.312,0.361,0.672,0.294,999.0,5677
0,A.J. Ellis,Padres,ATC,32,114,99,22,4,0,3,9,10,12,20,0,0,0.218,0.312,0.349,0.661,0.292,999.0,5677
2,A.J. Ellis,Padres,Depth Charts,38,160,140,31,6,0,4,15,17,15,30,1,0,0.223,0.309,0.361,0.670,0.293,999.0,5677
3,A.J. Ellis,Padres,Steamer,33,129,112,26,5,0,4,13,13,14,25,1,1,0.234,0.323,0.379,0.702,0.307,999.0,5677
1,A.J. Ellis,Padres,The Bat,40,158,138,33,6,1,4,16,16,16,30,1,0,0.237,0.320,0.372,0.692,0.301,999.0,5677
4,A.J. Ellis,Padres,ZiPS,50,155,137,29,6,0,4,13,16,12,28,0,0,0.212,0.294,0.343,0.637,0.279,999.0,5677
0,A.J. Pollock,Diamondbacks,A-Average,135,580,524,149,35,6,17,86,64,46,86,24,7,0.284,0.345,0.470,0.815,0.347,68.2,9256
5,A.J. Pollock,Diamondbacks,ATC,141,586,533,150,36,7,16,88,60,45,86,28,7,0.281,0.341,0.463,0.803,0.342,68.2,9256
7,A.J. Pollock,Diamondbacks,Depth Charts,147,616,557,160,38,6,19,92,68,48,92,26,9,0.287,0.347,0.480,0.827,0.351,68.2,9256
8,A.J. Pollock,Diamondbacks,Fangraphs,130,591,534,152,35,5,16,96,64,49,89,26,5,0.285,0.347,0.459,0.806,0.344,68.2,9256


### Once the 2018 MLB season has completed, this workbook will be updated to include actual player stats from the season, along with additional computations and visualizations of how accurate each projection model was

##### Notes to self-
##### Compare each projection to the player's actual stat line, see which projections tended to be most accurate
##### Consider reducing the scope of players to those that were going to play a full season- Results could be skewed from someone like Byron Buxton who people reasonably expected to play a full season but ended up not having much time in the MLB. Limit comparisons to players that played at least 100(?) games this season and reasonably were expected to play that many?
##### Average how far off each projection model was across all players, i.e. if Buxton's ZiPS projected 5 steals and he actually had 10, and Brian Dozier's ZiPS projected 3 steals and he actually had 7, ZiPS was off by 9 steals overall (4.5 per player). Think more about whether this would be worthwhile to look at league-wide, or only for players with 100+ games played
##### Consider repeating this project for 2017, 2016, 2015, etc data to get an idea of historical accuracy for each predictive model
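
##### A rough sketch of how the accuracy comparison might look once actual stats are available. Everything here is made up for illustration: the `actual` table, the `playerid` values, and the steal counts (ZiPS misses by 5 and 4 steals, so its mean absolute error is 4.5):

```python
import pandas as pd

# made-up projections for two players from two systems
proj = pd.DataFrame({
    "playerid": [1, 1, 2, 2],
    "Type": ["ZiPS", "Steamer", "ZiPS", "Steamer"],
    "SB": [5, 8, 3, 6],
})
# made-up end-of-season totals for the same players
actual = pd.DataFrame({"playerid": [1, 2], "SB": [10, 7]})

# line up each projection with the player's actual total
merged = proj.merge(actual, on="playerid", suffixes=("_proj", "_actual"))
merged["absError"] = (merged["SB_proj"] - merged["SB_actual"]).abs()

# mean absolute error per projection system (lower is better)
maeByModel = merged.groupby("Type")["absError"].mean()
print(maeByModel)
```

##### The same merge would work for any stat column, and filtering `merged` to players with 100+ games would cover the reduced-scope idea in the notes.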