# Sabermetrics: Computing Park Factors<br>Accounting for Road Schedule


🔴 First Draft

This notebook will continue where the previous notebook left off by adjusting the Park Factor for each park to account for each team's road schedule.

## Case Example: Fenway Park 2019
Park factors will be calculated for every stadium and season in the data used (currently 2015 onward), with particular attention paid to one example: Fenway Park in Boston in 2019.

## Park Factor Refinements
Two refinements to the basic Park Factor are considered here.

**Home Team not playing in Home Park**  
The Park Factor was created to measure the effect of each park on baseball statistics.  The numerator in the PF formula is not for games played as the home team, but for games played at the home park.  Usually these are the same, but not always, as was the case for Boston in London in 2019.

This was discussed in the previous notebook and will be applied here.

**Road Games are not at Parks with PF = 1.0**  
Each team's opponent schedule is not uniform.  Teams in the same division play each other more than teams in other divisions.  Only a few interleague games are played.  The basic PF formula assumes that the average PF is 1.0 for all road games, but this is not the case.

One way to account for this is to use the basic PF formula to get a PF per park, and then adjust the runs scored on the road by using the PF appropriate for each road game.  The adjusted run total on the road can then be used to compute a new PF per park.  This process can be repeated.  The result is that each team's road schedule is taken into account when computing their home park factor.

This will be the approach taken in this notebook.
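
In symbols, the approach can be sketched as follows. The notation is introduced here for illustration only and is an assumption, not taken from the previous notebook: $t$ is a team (and its home park), $p$ runs over the road parks that team visited, $G$ counts games, and $k$ is the pass number.

```latex
\mathrm{PF}_t^{(k+1)} \;=\;
\frac{\left(RS_t^{\text{home}} + RA_t^{\text{home}}\right) / G_t^{\text{home}}}
     {\left[\sum_{p} \left(RS_{t,p} + RA_{t,p}\right) / \mathrm{PF}_p^{(k)}\right] \Big/ \sum_{p} G_{t,p}},
\qquad \mathrm{PF}_p^{(0)} = 1
```

With every $\mathrm{PF}_p^{(0)} = 1$, the first pass reduces to the basic PF: average total runs per game at the home park divided by average total runs per game on the road.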

# Strategy to Find Park Factor
The strategy to compute the Park Factor, accounting for each team's road schedule, is:
* Compute a **home_parks** dataframe:
  * one row per team per year
  * has: home_park_id, RS, RA, G, and initial PF=1.0
* Compute a **road_parks** dataframe:
  * one row per team per year per road park
  * has: road_park_id, RS, RA, G and PF
* Compute a **road_totals** dataframe:
  * sum **road_parks** over all parks to get the road total per team per year
  * the sum will use the PF adjusted runs per road game
* Compute the PF per park using the home_parks and road_totals dataframes
  * update the home_parks PF
* Repeat
  * update road_parks dataframe with the new PF
  * recompute the sum of the PF adjusted runs per road game
  * recompute the PF and update home_parks
* Scrape the Park Factors from FanGraphs and compare
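
The compute-and-repeat steps above can be tried on a toy league before applying them to real data. The following is a minimal sketch in plain Python, assuming three hypothetical teams A, B and C whose runs were generated from "true" park factors of 1.2, 1.0 and 0.8 with unbalanced road schedules. None of these names or numbers come from the Retrosheet data.

```python
# Toy league: hypothetical teams A, B, C, each with one home park.
# Runs are generated from "true" park factors 1.2, 1.0, 0.8 and a
# neutral-park level of 10 total runs per game.
home_r_avg = {"A": 12.0, "B": 10.0, "C": 8.0}

# road[team] = list of (park owner, total runs in those games, games played)
road = {
    "A": [("B", 60.0, 6), ("C", 16.0, 2)],
    "B": [("A", 48.0, 4), ("C", 32.0, 4)],
    "C": [("A", 24.0, 2), ("B", 60.0, 6)],
}


def refine(pf):
    """One pass: divide road runs by each road park's current PF."""
    new_pf = {}
    for team, stops in road.items():
        adj_runs = sum(runs / pf[park] for park, runs, _ in stops)
        n_games = sum(games for _, _, games in stops)
        new_pf[team] = home_r_avg[team] / (adj_runs / n_games)
    return new_pf


pf = {team: 1.0 for team in home_r_avg}   # initial PF = 1.0
basic_pf = refine(pf)                      # first pass is the basic PF
for _ in range(50):                        # repeat the refinement
    pf = refine(pf)

print({t: round(v, 3) for t, v in basic_pf.items()})
# {'A': 1.263, 'B': 1.0, 'C': 0.762}
print({t: round(pf[t] / pf["B"], 3) for t in pf})
# {'A': 1.2, 'B': 1.0, 'C': 0.8}
```

In this toy league the basic PF overstates park A (1.263 instead of 1.2) because team A plays its road games in lower-scoring parks, and repeating the adjustment recovers the ratios used to generate the data. The result is printed relative to park B because this sketch applies no normalization to the overall level of the factors.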

In [1]:
import os
import pandas as pd
import numpy as np
from pathlib import Path
import re
from scipy.stats import linregress

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

In [3]:
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 100 # increase dpi, will make figures larger and clearer

In [4]:
import sys

# import data_helper.py from download_scripts directory
sys.path.append('../download_scripts')
import data_helper as dh

In [5]:
pd.set_option("display.max_columns", 100)

In [6]:
data_dir = Path('../data')
lahman_data = data_dir.joinpath('lahman/wrangled').resolve()
retrosheet_data = data_dir.joinpath('retrosheet/wrangled').resolve()

In [7]:
# select a subset of the available fields for team_game
cols = ['game_id', 'year', 'at_home', 'team_id', 'opponent_team_id', 'r']
team_game = dh.from_csv_with_types(retrosheet_data / 'team_game.csv.gz', usecols=cols)

In [8]:
cols = ['game_id', 'park_id']
game = dh.from_csv_with_types(retrosheet_data / 'game.csv.gz', usecols=cols)

In [9]:
cols = ['park_id', 'name', 'city', 'state', 'start', 'end', 'league']
parks = dh.from_csv_with_types(retrosheet_data / 'parks.csv', usecols=cols)

In [10]:
cols = ['team_id', 'year', 'name']
teams = dh.from_csv_with_types(retrosheet_data / 'teams.csv', usecols=cols)

# Create home_parks DataFrame

In [11]:
# TEMPORARILY use from 2015 on
# Focus on just the fields needed
tg = team_game.query('year >= 2015')

# bring in the park_id field from game
tg_park = tg.merge(game)
tg_park.head(2)

Unnamed: 0,game_id,at_home,team_id,opponent_team_id,r,year,park_id
0,ANA201504100,True,ANA,KCA,2,2015,ANA01
1,ANA201504100,False,KCA,ANA,4,2015,ANA01


In [12]:
# compute RS per team per year per park
rs = tg_park.groupby(['team_id', 'year', 'park_id']).agg(
    rs=('r', 'sum'), games=('r', 'count')).reset_index()
rs.head(3)

Unnamed: 0,team_id,year,park_id,rs,games
0,ANA,2015,ANA01,320.0,81
1,ANA,2015,ARL02,69.0,10
2,ANA,2015,BAL12,9.0,3


In [13]:
# Compute RA per team per year per park
ra = tg_park.groupby(['opponent_team_id', 'year', 'park_id']).agg(
    ra=('r', 'sum'), games=('r', 'count')).reset_index()
ra.head(3)

Unnamed: 0,opponent_team_id,year,park_id,ra,games
0,ANA,2015,ANA01,298.0,81
1,ANA,2015,ARL02,46.0,10
2,ANA,2015,BAL12,5.0,3
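
The two groupings above rely on the fact that each game contributes one row per team, so grouping the same `r` column by `opponent_team_id` yields runs allowed. A minimal sketch on the two rows shown for game `ANA201504100` (the `toy`, `rs_toy` and `ra_toy` names are illustrative):

```python
import pandas as pd

# The two team_game rows for one game (values from the sample output above).
toy = pd.DataFrame({
    "game_id": ["ANA201504100", "ANA201504100"],
    "team_id": ["ANA", "KCA"],
    "opponent_team_id": ["KCA", "ANA"],
    "r": [2, 4],
})

# Runs scored: group each row under the team that scored the runs.
rs_toy = toy.groupby("team_id")["r"].sum()

# Runs allowed: group each row under the opponent of the team that scored.
ra_toy = toy.groupby("opponent_team_id")["r"].sum()

print(rs_toy["ANA"], ra_toy["ANA"])  # 2 4
```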


RS and RA have now been computed for each team for each park.  Merge the two dataframes.

In [14]:
# rename the column so both dataframes share the same join keys
ra = ra.rename(columns={'opponent_team_id':'team_id'})

# join to have RS and RA in the same dataframe
rt = rs.merge(ra, 
              left_on=['team_id', 'year', 'park_id'], 
              right_on=['team_id', 'year', 'park_id'],
              suffixes=('_rs', '_ra'))
rt.head()

Unnamed: 0,team_id,year,park_id,rs,games_rs,ra,games_ra
0,ANA,2015,ANA01,320.0,81,298.0,81
1,ANA,2015,ARL02,69.0,10,46.0,10
2,ANA,2015,BAL12,9.0,3,5.0,3
3,ANA,2015,BOS07,16.0,3,19.0,3
4,ANA,2015,CHI12,4.0,3,14.0,3


In [15]:
# Add an rt (total runs) column and keep a single games column instead of games_rs and games_ra
rt['rt'] = rt['rs'] + rt['ra']
rt['games'] = rt['games_rs']
rt = rt.drop(columns=['games_rs', 'games_ra'])
rt.head(3)

Unnamed: 0,team_id,year,park_id,rs,ra,rt,games
0,ANA,2015,ANA01,320.0,298.0,618.0,81
1,ANA,2015,ARL02,69.0,46.0,115.0,10
2,ANA,2015,BAL12,9.0,5.0,14.0,3


For each team and year, rank the parks by the number of games played there.

The home park has rank == 1.  
The road parks have rank > 1.
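
A minimal sketch of this ranking on three rows taken from the output above. It assumes, as the notebook does, that a team plays more games at its home park than anywhere else. With `method='first'`, ties are broken by order of appearance, so exactly one park per team and year receives rank 1.

```python
import pandas as pd

# Games per park for one team-season (values from the output above).
toy = pd.DataFrame({
    "team_id": ["ANA", "ANA", "ANA"],
    "year": [2015, 2015, 2015],
    "park_id": ["ANA01", "ARL02", "BAL12"],
    "games": [81, 10, 3],
})

# Rank parks within each team-season, most games first.
toy["rank"] = toy.groupby(["team_id", "year"])["games"].rank(
    method="first", ascending=False)

home = toy.query("rank == 1")
print(home["park_id"].tolist())  # ['ANA01']
```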

In [16]:
# per team_id per year, rank by the number of games played at each park
rt['rank'] = rt.groupby(['team_id', 'year'])['games'].rank(method='first', ascending=False)
rt.head(3)

Unnamed: 0,team_id,year,park_id,rs,ra,rt,games,rank
0,ANA,2015,ANA01,320.0,298.0,618.0,81,1.0
1,ANA,2015,ARL02,69.0,46.0,115.0,10,2.0
2,ANA,2015,BAL12,9.0,5.0,14.0,3,9.0


In [17]:
# rank == 1 identifies each team's home park
home_parks = rt.query('rank == 1').copy()

# compute the average total runs at home per game
home_parks['r_avg'] = home_parks['rt'] / home_parks['games']

# assign an initial PF of 1.0
home_parks['pf'] = 1.0

# rank no longer needed
home_parks = home_parks.drop(columns=['rank'])
home_parks = home_parks.set_index(['team_id', 'year'])
home_parks.head()

team_id,year,park_id,rs,ra,rt,games,r_avg,pf
ANA,2015,ANA01,320.0,298.0,618.0,81,7.62963,1.0
ANA,2016,ANA01,337.0,351.0,688.0,81,8.493827,1.0
ANA,2017,ANA01,356.0,335.0,691.0,81,8.530864,1.0
ANA,2018,ANA01,355.0,355.0,710.0,81,8.765432,1.0
ANA,2019,ANA01,385.0,411.0,796.0,79,10.075949,1.0


In [18]:
# example home parks for Boston
home_parks.query('team_id == "BOS" and year==2019')

team_id,year,park_id,rs,ra,rt,games,r_avg,pf
BOS,2019,BOS07,431.0,410.0,841.0,79,10.64557,1.0


We see that 79 games were played at Fenway and 841 runs in total were scored at Fenway.
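
As a quick check of the arithmetic, the `rt` and `r_avg` values in that row can be reproduced by hand. The game count is 79 rather than 81, which is consistent with the London games noted earlier not being played at Fenway.

```python
rs, ra, games = 431, 410, 79   # Boston at BOS07 in 2019, from the row above

rt = rs + ra                   # total runs scored at Fenway by both teams
r_avg = rt / games             # average total runs per game at Fenway

print(rt, round(r_avg, 3))     # 841 10.646
```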

# Create road_parks DataFrame

In [19]:
# rank not needed after query
road_parks = rt.query('rank != 1').copy()
road_parks = road_parks.drop(columns=['rank'])
road_parks.head(3)

Unnamed: 0,team_id,year,park_id,rs,ra,rt,games
1,ANA,2015,ARL02,69.0,46.0,115.0,10
2,ANA,2015,BAL12,9.0,5.0,14.0,3
3,ANA,2015,BOS07,16.0,19.0,35.0,3


In [20]:
# for each road park, get the pf from the home_parks dataframe
home_parks = home_parks.reset_index()
cols = ['park_id', 'year', 'pf']
road_parks = road_parks.merge(home_parks[cols], 
                 how='left',  # left join
                 left_on=['park_id', 'year'], 
                 right_on=['park_id', 'year'])
home_parks = home_parks.set_index(['team_id', 'year'])
road_parks.head(3)

Unnamed: 0,team_id,year,park_id,rs,ra,rt,games,pf
0,ANA,2015,ARL02,69.0,46.0,115.0,10,1.0
1,ANA,2015,BAL12,9.0,5.0,14.0,3,1.0
2,ANA,2015,BOS07,16.0,19.0,35.0,3,1.0


In [21]:
# there are some road parks that are not anyone's home park
missing_parks = list(set(road_parks['park_id'].unique()) -set(home_parks['park_id'].unique()))
missing_parks

['WIL02', 'FTB01', 'LON01', 'OMA01', 'SJU01', 'MNT01']

In [22]:
# join with parks to get more info about these
missing_parks = pd.Series(missing_parks, name='park_id')
parks.merge(missing_parks)

Unnamed: 0,park_id,name,city,state,start,end,league
0,FTB01,Fort Bragg Field,Fort Bragg,NC,2016-07-03,2016-07-03,NL
1,LON01,London Stadium,London,UK,2019-06-29,2019-06-30,AL
2,MNT01,Estadio Monterrey,Monterrey,MX,1996-08-16,1999-04-04,NL
3,OMA01,TD Ameritrade Park,Omaha,NE,2019-06-13,2019-06-13,KC1
4,SJU01,Estadio Hiram Bithorn,San Juan,PR,2001-04-01,2010-06-30,NL
5,WIL02,BB&T Ballpark at Bowman Field,Williamsport,PA,2017-08-20,2017-08-20,NL


## Remove Games Where Home Team is Not Playing at Home

In [23]:
# the left join above created records for which there was no PF
road_parks.isna().sum()

team_id     0
year        0
park_id     0
rs          0
ra          0
rt          0
games       0
pf         18
dtype: int64

In [24]:
road_parks = road_parks.dropna()
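
The effect of the left join followed by `dropna()` can be seen on a toy frame. This is a sketch with hypothetical game counts; only the park ids come from the data above.

```python
import pandas as pd

# Hypothetical road rows; LON01 is not any team's home park.
road = pd.DataFrame({
    "team_id": ["NYA", "NYA"],
    "year": [2019, 2019],
    "park_id": ["BOS07", "LON01"],
    "games": [7, 2],               # made-up counts for illustration
})
home_pf = pd.DataFrame({"park_id": ["BOS07"], "year": [2019], "pf": [1.0]})

# The left join keeps every road row; parks without a PF get NaN.
merged = road.merge(home_pf, how="left", on=["park_id", "year"])
print(merged["pf"].isna().sum())       # 1

# Drop only the rows that are missing a PF.
kept = merged.dropna(subset=["pf"])
print(kept["park_id"].tolist())        # ['BOS07']
```

Passing `subset=["pf"]` is a slightly narrower alternative to a bare `dropna()`: it would not discard rows that happen to have missing values in other columns.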

In [25]:
# create adjusted road runs based on each park's pf
road_parks['rs_adj'] = road_parks['rs'] / road_parks['pf']
road_parks['ra_adj'] = road_parks['ra'] / road_parks['pf']
road_parks['pf_games'] = road_parks['pf'] * road_parks['games']  # to compute weighted avg of road pf
road_parks.head(3)

Unnamed: 0,team_id,year,park_id,rs,ra,rt,games,pf,rs_adj,ra_adj,pf_games
0,ANA,2015,ARL02,69.0,46.0,115.0,10,1.0,69.0,46.0,10.0
1,ANA,2015,BAL12,9.0,5.0,14.0,3,1.0,9.0,5.0,3.0
2,ANA,2015,BOS07,16.0,19.0,35.0,3,1.0,16.0,19.0,3.0


## Compute Road Totals

In [26]:
road_totals = road_parks.groupby(['team_id', 'year']).agg(
    rs_adj=('rs_adj', 'sum'), games=('games', 'sum'),
    ra_adj=('ra_adj', 'sum'), pf_adj_sum=('pf_games', 'sum'))
road_totals['rt_adj'] = road_totals['rs_adj'] + road_totals['ra_adj']
road_totals['r_avg_adj'] = road_totals['rt_adj'] / road_totals['games']
road_totals['pf_avg_road'] = road_totals['pf_adj_sum'] / road_totals['games']  # weighted avg of road pf

road_totals.head()

team_id,year,rs_adj,games,ra_adj,pf_adj_sum,rt_adj,r_avg_adj,pf_avg_road
ANA,2015,341.0,81,377.0,81.0,718.0,8.864198,1.0
ANA,2016,380.0,81,376.0,81.0,756.0,9.333333,1.0
ANA,2017,354.0,81,374.0,81.0,728.0,8.987654,1.0
ANA,2018,366.0,81,367.0,81.0,733.0,9.049383,1.0
ANA,2019,378.0,81,433.0,81.0,811.0,10.012346,1.0
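
The `pf_avg_road` column is a games-weighted average of the road park factors. It is 1.0 for every team at this point only because every PF was initialized to 1.0. A minimal sketch with hypothetical numbers shows the same calculation and checks it against `np.average`:

```python
import numpy as np

# Hypothetical road schedule: 6 games in a park with PF 1.1,
# 2 games in a park with PF 0.9.
pf = np.array([1.1, 0.9])
games = np.array([6, 2])

# Same form as pf_adj_sum / games in the cell above.
pf_avg_road = (pf * games).sum() / games.sum()

# np.average with weights gives the same games-weighted mean.
assert np.isclose(pf_avg_road, np.average(pf, weights=games))
print(round(pf_avg_road, 3))  # 1.05
```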


## Compute Park Factor

In [27]:
# home_parks = home_parks.set_index(['team_id', 'year'])
home_parks['pf'] = home_parks['r_avg'] / road_totals['r_avg_adj']
home_parks['pf_avg_road'] = road_totals['pf_avg_road']
# home_parks = home_parks.reset_index()

In [28]:
home_parks.head()

team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
ANA,2015,ANA01,320.0,298.0,618.0,81,7.62963,0.860724,1.0
ANA,2016,ANA01,337.0,351.0,688.0,81,8.493827,0.910053,1.0
ANA,2017,ANA01,356.0,335.0,691.0,81,8.530864,0.949176,1.0
ANA,2018,ANA01,355.0,355.0,710.0,81,8.765432,0.968622,1.0
ANA,2019,ANA01,385.0,411.0,796.0,79,10.075949,1.006353,1.0


In [29]:
home_parks.query('team_id == "BOS" and year==2019').round(3)

team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.029,1.0


## Repeat the Process to Better Adjust for Road Schedule

In [30]:
# repeat 10 more times
for i in range(10):
    road_parks = road_parks.drop(columns=['pf'])

    home_parks = home_parks.reset_index()
    cols = ['park_id', 'year', 'pf']
    road_parks = road_parks.merge(home_parks[cols], 
                     how='left',  # left join
                     left_on=['park_id', 'year'], 
                     right_on=['park_id', 'year'])
    home_parks = home_parks.set_index(['team_id', 'year'])

    # create adjusted road runs based on each park's pf
    road_parks['rs_adj'] = road_parks['rs'] / road_parks['pf']
    road_parks['ra_adj'] = road_parks['ra'] / road_parks['pf']
    road_parks['pf_games'] = road_parks['pf'] * road_parks['games']  # to compute weighted avg of road pf

    road_totals = road_parks.groupby(['team_id', 'year']).agg(
        rs_adj=('rs_adj', 'sum'), games=('games', 'sum'),
        ra_adj=('ra_adj', 'sum'), pf_adj_sum=('pf_games', 'sum'))
    road_totals['rt_adj'] = road_totals['rs_adj'] + road_totals['ra_adj']
    road_totals['r_avg_adj'] = road_totals['rt_adj'] / road_totals['games']
    road_totals['pf_avg_road'] = road_totals['pf_adj_sum'] / road_totals['games']  # weighted avg of road pf

    home_parks['pf'] = home_parks['r_avg'] / road_totals['r_avg_adj']
    home_parks['pf_avg_road'] = road_totals['pf_avg_road']

    display(home_parks.query('team_id == "BOS" and year==2019').round(3))

team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.025,0.996


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.026,0.996


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.029,0.999


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.031,1.001


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.033,1.003


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.034,1.005


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.035,1.006


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.036,1.006


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.037,1.007


team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
BOS,2019,BOS07,431.0,410.0,841.0,79,10.646,1.037,1.008


# Webscrape FanGraphs for PF

In [31]:
import requests
from io import StringIO
from bs4 import BeautifulSoup

In [32]:
# read the park factor table on the FanGraphs website
data = []
for year in range(2015, 2020):
    url = f'https://www.fangraphs.com/guts.aspx?type=pf&season={year}&teamid=0'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    
    table = soup.find('table', class_='rgMasterTable')
    
    header = table.find('thead')
    cols = [col.text for col in header.find_all('th')]
    
    body = table.find('tbody')
    for row in body.find_all('tr'):
        data.append([col.text for col in row.find_all('td')])

In [33]:
fg = pd.DataFrame(data, columns = cols)

# change datatypes from string to int
for col in fg.columns:
    if col != 'Team':
        fg[col] = fg[col].astype('int16')

fg.head()

Unnamed: 0,Season,Team,Basic (5yr),3yr,1yr,1B,2B,3B,HR,SO,BB,GB,FB,LD,IFFB,FIP
0,2015,Angels,97,95,93,100,96,88,98,102,97,101,100,98,100,98
1,2015,Orioles,101,101,108,101,96,87,106,98,100,101,102,100,100,103
2,2015,Red Sox,104,107,109,103,112,103,95,99,99,102,97,103,101,98
3,2015,White Sox,99,98,95,98,95,93,105,102,103,98,101,98,105,102
4,2015,Indians,102,106,112,101,106,83,102,100,100,101,97,101,92,100


In [34]:
# add the team name to the pf dataframe to compare with Fangraphs
hp = home_parks.reset_index()[['team_id', 'year', 'pf']]
pf = hp.merge(teams, left_on=['team_id', 'year'], right_on=['team_id', 'year'])
pf.head()

Unnamed: 0,team_id,year,pf,name
0,ANA,2015,0.855181,Angels
1,ANA,2016,0.902584,Angels
2,ANA,2017,0.954383,Angels
3,ANA,2018,0.966873,Angels
4,ANA,2019,1.034173,Angels


In [35]:
# add the 1yr PF from Fangraphs
pf_fg = pf.merge(fg[['Season', 'Team', '1yr']],
         left_on=['year', 'name'],
         right_on=['Season', 'Team'],
         validate='one_to_one')
pf_fg.head()

Unnamed: 0,team_id,year,pf,name,Season,Team,1yr
0,ANA,2015,0.855181,Angels,2015,Angels,93
1,ANA,2016,0.902584,Angels,2016,Angels,96
2,ANA,2017,0.954383,Angels,2017,Angels,98
3,ANA,2018,0.966873,Angels,2018,Angels,99
4,ANA,2019,1.034173,Angels,2019,Angels,101
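
The `validate='one_to_one'` argument in the merge above guards against duplicated keys on either side. A minimal sketch of what it catches, assuming a hypothetical scrape result that accidentally contains the same team-season twice:

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"year": [2015], "name": ["Angels"], "pf": [0.855]})

# Hypothetical scrape result with an accidental duplicate row.
right = pd.DataFrame({
    "Season": [2015, 2015],
    "Team": ["Angels", "Angels"],
    "1yr": [93, 93],
})

try:
    left.merge(right, left_on=["year", "name"],
               right_on=["Season", "Team"], validate="one_to_one")
    duplicate_detected = False
except MergeError:
    # Raised because the merge keys are not unique in the right frame.
    duplicate_detected = True

print(duplicate_detected)  # True
```

Without `validate`, the same merge would silently return two rows for the 2015 Angels.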


In [36]:
# compute the maximum relative difference
pf_fg['pf_half'] = 100*(1+pf_fg['pf'])/2  # to be on the same scale as Fangraphs
rel_diff = np.abs(1.0 - pf_fg['pf_half'] / pf_fg['1yr'])
rel_diff.max()

0.038054824055911096

In [37]:
pf_fg.loc[rel_diff.idxmax()].to_frame().T

Unnamed: 0,team_id,year,pf,name,Season,Team,1yr,pf_half
86,NYA,2016,1.11763,Yankees,2016,Yankees,102,105.882


The 2016 New York Yankees had an unusually high average PF on the road.  Dividing by it reduced their adjusted road run total, which pushed the home park factor upward.  Since FanGraphs likely does not perform this adjustment, it is not surprising that they report a lower park factor.
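
The comparison for this row can be reproduced by hand. The sketch below uses the rounded PF shown in the table, so the relative difference agrees with the value above only to about six decimal places.

```python
pf = 1.11763      # 2016 Yankees PF from the row above (rounded)
fg_1yr = 102      # FanGraphs 1yr value from the same row

# Halve the deviation from 1.0 and scale to 100, as in the pf_half column.
pf_half = 100 * (1 + pf) / 2

# Relative difference between the two values.
rel_diff = abs(1.0 - pf_half / fg_1yr)

print(round(pf_half, 2), round(rel_diff, 4))  # 105.88 0.0381
```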

In [38]:
# distribution of the average road Park Factor
home_parks['pf_avg_road'].describe()

count    150.000000
mean       1.006601
std        0.028504
min        0.930702
25%        0.988693
50%        1.006905
75%        1.025313
max        1.083488
Name: pf_avg_road, dtype: float64

# Summary
The Park Factor was adjusted for each team's road schedule.  In addition, the games-weighted average of the road Park Factor was computed.  In a few cases it differed noticeably from the assumed road PF of 1.0: across the 150 team-seasons it ranged from about 0.93 to 1.08.  This new metric, the average Park Factor on the road, may be useful.

In [39]:
pd.set_option("display.max_rows", 150)
home_parks

team_id,year,park_id,rs,ra,rt,games,r_avg,pf,pf_avg_road
ANA,2015,ANA01,320.0,298.0,618.0,81,7.62963,0.855181,0.999678
ANA,2016,ANA01,337.0,351.0,688.0,81,8.493827,0.902584,0.997431
ANA,2017,ANA01,356.0,335.0,691.0,81,8.530864,0.954383,1.014054
ANA,2018,ANA01,355.0,355.0,710.0,81,8.765432,0.966873,1.023024
ANA,2019,ANA01,385.0,411.0,796.0,79,10.075949,1.034173,1.03064
ARI,2015,PHO01,366.0,372.0,738.0,81,9.111111,1.056283,0.990859
ARI,2016,PHO01,411.0,493.0,904.0,81,11.160494,1.227067,0.998799
ARI,2017,PHO01,457.0,346.0,803.0,81,9.91358,1.202288,1.001328
ARI,2018,PHO01,359.0,328.0,687.0,81,8.481481,1.060498,1.002834
ARI,2019,PHO01,399.0,370.0,769.0,81,9.493827,0.955717,0.980647
