# Problem

While updating the data at the end of February, I noticed that some players aren't making it into the final feature set (e.g. Yu Darvish).

My goal is to:

1. Identify where Darvish is getting lost
2. Fix the problem

I'll load the necessary packages and helper modules:

In [6]:
from dataScraping import * 
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
import psycopg2
import math
from scipy.stats import zscore
from buildFeatureMatrix import *

I'm going to replicate the steps that build the features, one at a time, and see where Darvish disappears.

First, I'll grab him using free agent data:

In [2]:
# Grab all free agents since 2006
all_years = list(range(2006,2018))
all_fa_data = getAllFAData(all_years)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._update_inplace(new_data)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  all_fa_contracts_real['Length'] = pd.to_numeric(all_fa_contracts_real['Length'])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  all_fa_contracts_real['Dollars'] = pd.to_numeric(all_fa_contracts_real['Dollars'].str.strip('$').str.replace(',',''))


Now I'll check that he's there for 2017:

In [4]:
all_fa_data[all_fa_data.nameLast == 'Darvish']

Unnamed: 0,Age,Full_Name,WAR_3,nameFirst,nameLast,Year,Dollars,Length,Name,Position
2070,31,Yu Darvish,6.5,Yu,Darvish,2017,126000000.0,6,Yu Darvish,P


Looks good. Now I'll fast forward to loading from the database, since that's the next place I'd expect him to disappear:

In [10]:
engine = db_connect()
pitching_df = createPitchingTable(engine)
pitching_fa = addFilterFreeAgents(pitching_df, engine)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]


In [11]:
pitching_fa[pitching_fa.nameLast == 'Darvish']

Unnamed: 0,Age,WAR_3,nameFirst,nameLast,Year,Dollars,Length,Position,playerID,yearID,ERA,WHIP,K_9,HR_9,IPouts,W,SV
994,31,6.5,Yu,Darvish,2017,126000000.0,6,P,darviyu01,2017,-0.244162,-0.45139,0.656312,-0.109456,2.42751,1.714552,-0.271589


He's still there, so I haven't lost him yet. I'll do the next couple steps, though they shouldn't affect much:

In [13]:
pitching_war = allPositionWAR(pitching_fa, engine)
pitching_adjusted = addInflation(pitching_war, engine)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  inflation.loc[2017] = inflation.loc[2016] * (1 + per_year)


In [14]:
pitching_adjusted[pitching_adjusted.nameLast == 'Darvish']

Unnamed: 0,Age,WAR_3,nameFirst,nameLast,Year,Dollars,Length,Position,playerID,yearID,...,K_9,HR_9,IPouts,W,SV,Med_WAR,Min_WAR,Inflation_Factor,Total,Dollars_2006


He's gone from pitching_adjusted, so he's dropped somewhere in these two steps. I'll check whether he was already missing after the pitching_war step:

In [15]:
pitching_war[pitching_war.nameLast == 'Darvish']

Unnamed: 0,Age,WAR_3,nameFirst,nameLast,Year,Dollars,Length,Position,playerID,yearID,ERA,WHIP,K_9,HR_9,IPouts,W,SV,Med_WAR,Min_WAR


Okay, so allPositionWAR is the problem step: Darvish is already missing from its output. I'll now step through that method:

In [17]:
# Pull the data but drop the index
position_only_war = pullFullTable('position_team_war', engine).drop(['index'], axis = 1)
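# Use a separate name so the pitching_war DataFrame from the earlier cell isn't overwritten
pitcher_team_war = pullFullTable('pitcher_team_war', engine).drop(['index'], axis = 1)

# Put them together
position_war = pd.concat([position_only_war, pitcher_team_war])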
    
# Change the Year to "yearID"
position_war['yearID'] = position_war.Year
position_war = position_war.drop(['Year'], axis = 1)
    
# Create a dictionary for converting these to abbreviations
team_dict = {'Angels' : 'LAA', 'Astros' : 'HOU', 'Athletics' : 'OAK', 'Blue Jays' : 'TOR', 
                 'Braves' : 'ATL', 'Brewers': 'MIL', 'Cardinals' : 'STL', 'Cubs' : 'CHN',
                 'Diamondbacks' : 'ARI', 'Dodgers' : 'LAN', 'Giants' : 'SFN', 'Indians' : 'CLE',
                 'Mariners' : 'SEA', 'Marlins' : 'MIA', 'Mets' : 'NYN', 'Nationals' : 'WAS',
                 'Orioles' : 'BAL', 'Padres' : 'SDN', 'Phillies' : 'PHI', 'Pirates' : 'PIT', 
                 'Rangers' : 'TEX', 'Rays' : 'TBR', 'Red Sox' : 'BOS', 'Reds' : 'CIN', 
                 'Rockies' : 'COL', 'Royals' : 'KCR', 'Tigers' : 'DET', 'Twins' : 'MIN', 
                 'White Sox' : 'CHA', 'Yankees' : 'NYA'}
    
# Append "_WAR" to each abbreviation, then rename the data frame's columns
team_dict = {key : value + "_WAR" for key, value in team_dict.items()}
position_war = position_war.rename(columns = team_dict)
    
# Compute row-wise median and minimum WAR over every column except yearID and Position
position_war['Med_WAR'] = position_war.drop(['yearID', 'Position'], axis = 1).median(axis = 1)
position_war['Min_WAR'] = position_war.drop(['yearID', 'Position'], axis = 1).min(axis = 1)

# Shrink to only the Year/Position/Median/Min WAR stats
position_war_small = position_war[['yearID', 'Position', 'Med_WAR', 'Min_WAR']]
    

This gets merged with the pitching_fa data on Position and yearID.

--> The problem here is the position: Darvish is listed as "P", not "SP", so his row finds no match in the merge. This is really annoying.
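To see how a mismatched Position label makes a row vanish, here is a minimal sketch on toy data. It assumes the merge inside allPositionWAR is an inner join (the pandas default); the frames `fa` and `war`, the player "Example", and the WAR numbers are made-up stand-ins, not real data.

```python
import pandas as pd

# Toy stand-ins for pitching_fa and position_war_small (values are placeholders)
fa = pd.DataFrame({
    'nameLast': ['Darvish', 'Example'],
    'Position': ['P', 'SP'],
    'yearID': [2017, 2017],
})
war = pd.DataFrame({
    'Position': ['SP', 'RP'],
    'yearID': [2017, 2017],
    'Med_WAR': [1.0, 0.5],
    'Min_WAR': [0.2, 0.1],
})

# An inner merge silently drops rows whose key pair has no match on the right
inner = fa.merge(war, on=['Position', 'yearID'], how='inner')

# A left merge with indicator=True keeps those rows and flags them 'left_only'
checked = fa.merge(war, on=['Position', 'yearID'], how='left', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']

print(inner['nameLast'].tolist())
print(unmatched[['nameLast', 'Position', 'yearID']])
```

Running the left merge with `indicator=True` on the real frames would list every free agent the WAR table fails to match, not just Darvish.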

I'd wager this is also the case for Arrieta:

In [18]:
pitching_fa[pitching_fa.nameLast == 'Arrieta']

Unnamed: 0,Age,WAR_3,nameFirst,nameLast,Year,Dollars,Length,Position,playerID,yearID,ERA,WHIP,K_9,HR_9,IPouts,W,SV
1052,32,15.1,Jake,Arrieta,2017,,0,P,arrieja01,2017,-0.288174,-0.389471,0.224369,-0.154607,2.083104,2.727146,-0.271589
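Rather than checking players one at a time, the same question can be asked of the whole table. The sketch below uses a hypothetical helper, `unmatched_positions`, and toy frames; on the real data the arguments would be something like pitching_fa and position_war_small, assuming both carry a Position column.

```python
import pandas as pd

def unmatched_positions(fa_df, war_df, key='Position'):
    """Return the labels in fa_df[key] that never appear in war_df[key]."""
    return sorted(set(fa_df[key].dropna()) - set(war_df[key].dropna()))

# Toy stand-ins; "Example" is a made-up player
fa = pd.DataFrame({
    'nameLast': ['Darvish', 'Arrieta', 'Example'],
    'Position': ['P', 'P', 'SP'],
})
war = pd.DataFrame({'Position': ['SP', 'RP', 'C']})

missing = unmatched_positions(fa, war)

# Every free agent whose position label has no counterpart in the WAR table
affected = fa[fa['Position'].isin(missing)]

print(missing)
print(affected)
```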


Alright, so when did THIS crap happen? Apparently ESPN now lists "P" as the position for some pitchers. I think I will...

Just convert all "P" to "SP" for now? I think that's a quick fix. Unfortunate, but that's what I'll have to do.
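A minimal sketch of that quick fix, assuming the label lives in a Position column and that every "P" should be treated as a starter. The helper name `normalize_pitcher_position` is hypothetical, and where it belongs in the pipeline (for instance before the merge in allPositionWAR) is left open.

```python
import pandas as pd

def normalize_pitcher_position(df, column='Position'):
    """Return a copy of df with the generic 'P' label mapped to 'SP'."""
    out = df.copy()  # work on a copy so the caller's frame is untouched
    out[column] = out[column].replace({'P': 'SP'})
    return out

# Toy example; the names other than Darvish are made up
fa = pd.DataFrame({
    'nameLast': ['Darvish', 'Starter', 'Reliever'],
    'Position': ['P', 'SP', 'RP'],
})
fixed = normalize_pitcher_position(fa)

print(fixed)
```

This lumps any reliever tagged "P" in with the starters, so it's a stopgap rather than a real fix.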