# Data Exploration

In this notebook, we take an in-depth look at the data and aim to provide insights that can help improve feature engineering. Please feel free to make edits or add your findings to this notebook.

__Highlights__

- Many columns have missing values.
  - Missing values in the `season` column can be filled from the `date` or `home_team_season` columns.
  - A missing team stat can be inferred from the team's earlier performance in the same season, e.g., from the team's `home_batting_RBI_10RA` in previous games.

- Task 2 data contains categorical values not present in the training data, e.g., Task 2 has different `home_pitcher` values.

- Some categorical columns may not be useful:
  - `home_team_season` and `away_team_season` are simply "team name + season year" (e.g., 'TBR_2019'), which is redundant since we already have `season`, `home_team_abbr`, and `away_team_abbr`.
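
The last two highlights can be checked with a few lines of pandas. The following is a minimal sketch on made-up toy frames: `train`, `task2`, the pitcher ID `newpitch01`, and the helper `unseen_categories` are illustrative assumptions, not part of the dataset or the notebook.

```python
import pandas as pd

# Toy stand-ins for the training and Task 2 frames (values are made up).
train = pd.DataFrame({
    "home_pitcher": ["juradar01", "ramirer02", "jarvibr01"],
    "home_team_season": ["TBR_2021", "LAA_2019", "LAA_2019"],
    "season": [2021.0, 2019.0, 2019.0],
})
task2 = pd.DataFrame({"home_pitcher": ["juradar01", "newpitch01"]})


def unseen_categories(train_col: pd.Series, test_col: pd.Series) -> set:
    """Return values that occur in test_col but never in train_col (NaN ignored)."""
    return set(test_col.dropna().unique()) - set(train_col.dropna().unique())


unseen = unseen_categories(train["home_pitcher"], task2["home_pitcher"])
print("unseen home_pitcher values:", unseen)

# Redundancy check: does the year suffix of home_team_season match season
# wherever both are known?
suffix_year = train["home_team_season"].str[-4:].astype(float)
known = train["season"].notna() & suffix_year.notna()
season_matches = (suffix_year[known] == train.loc[known, "season"]).all()
print("suffix always equals season:", season_matches)
```

Running the same checks on the real frames would show how many Task 2 pitchers are unseen and whether the season suffix ever disagrees with `season`.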

In [29]:
import pandas as pd
import tqdm

In [2]:
pd.set_option('display.max_columns', None)
df = pd.read_csv('data/task1/train_data.csv')
df.head()

Unnamed: 0,id,home_team_abbr,away_team_abbr,date,is_night_game,home_team_win,home_pitcher,away_pitcher,home_team_rest,away_team_rest,home_pitcher_rest,away_pitcher_rest,season,home_batting_batting_avg_10RA,home_batting_onbase_perc_10RA,home_batting_onbase_plus_slugging_10RA,home_batting_leverage_index_avg_10RA,home_batting_RBI_10RA,away_batting_batting_avg_10RA,away_batting_onbase_perc_10RA,away_batting_onbase_plus_slugging_10RA,away_batting_leverage_index_avg_10RA,away_batting_RBI_10RA,home_pitching_earned_run_avg_10RA,home_pitching_SO_batters_faced_10RA,home_pitching_H_batters_faced_10RA,home_pitching_BB_batters_faced_10RA,away_pitching_earned_run_avg_10RA,away_pitching_SO_batters_faced_10RA,away_pitching_H_batters_faced_10RA,away_pitching_BB_batters_faced_10RA,home_pitcher_earned_run_avg_10RA,home_pitcher_SO_batters_faced_10RA,home_pitcher_H_batters_faced_10RA,home_pitcher_BB_batters_faced_10RA,away_pitcher_earned_run_avg_10RA,away_pitcher_SO_batters_faced_10RA,away_pitcher_H_batters_faced_10RA,away_pitcher_BB_batters_faced_10RA,home_team_season,away_team_season,home_team_errors_mean,home_team_errors_std,home_team_errors_skew,away_team_errors_mean,away_team_errors_std,away_team_errors_skew,home_team_spread_mean,home_team_spread_std,home_team_spread_skew,away_team_spread_mean,away_team_spread_std,away_team_spread_skew,home_team_wins_mean,home_team_wins_std,home_team_wins_skew,away_team_wins_mean,away_team_wins_std,away_team_wins_skew,home_batting_batting_avg_mean,home_batting_batting_avg_std,home_batting_batting_avg_skew,home_batting_onbase_perc_mean,home_batting_onbase_perc_std,home_batting_onbase_perc_skew,home_batting_onbase_plus_slugging_mean,home_batting_onbase_plus_slugging_std,home_batting_onbase_plus_slugging_skew,home_batting_leverage_index_avg_mean,home_batting_leverage_index_avg_std,home_batting_leverage_index_avg_skew,home_batting_wpa_bat_mean,home_batting_wpa_bat_std,home_batting_wpa_bat_skew,home_batting_RBI_mean,home_batting_RBI_std,home_batting_RBI
_skew,away_batting_batting_avg_mean,away_batting_batting_avg_std,away_batting_batting_avg_skew,away_batting_onbase_perc_mean,away_batting_onbase_perc_std,away_batting_onbase_perc_skew,away_batting_onbase_plus_slugging_mean,away_batting_onbase_plus_slugging_std,away_batting_onbase_plus_slugging_skew,away_batting_leverage_index_avg_mean,away_batting_leverage_index_avg_std,away_batting_leverage_index_avg_skew,away_batting_wpa_bat_mean,away_batting_wpa_bat_std,away_batting_wpa_bat_skew,away_batting_RBI_mean,away_batting_RBI_std,away_batting_RBI_skew,home_pitching_earned_run_avg_mean,home_pitching_earned_run_avg_std,home_pitching_earned_run_avg_skew,home_pitching_SO_batters_faced_mean,home_pitching_SO_batters_faced_std,home_pitching_SO_batters_faced_skew,home_pitching_H_batters_faced_mean,home_pitching_H_batters_faced_std,home_pitching_H_batters_faced_skew,home_pitching_BB_batters_faced_mean,home_pitching_BB_batters_faced_std,home_pitching_BB_batters_faced_skew,home_pitching_leverage_index_avg_mean,home_pitching_leverage_index_avg_std,home_pitching_leverage_index_avg_skew,home_pitching_wpa_def_mean,home_pitching_wpa_def_std,home_pitching_wpa_def_skew,away_pitching_earned_run_avg_mean,away_pitching_earned_run_avg_std,away_pitching_earned_run_avg_skew,away_pitching_SO_batters_faced_mean,away_pitching_SO_batters_faced_std,away_pitching_SO_batters_faced_skew,away_pitching_H_batters_faced_mean,away_pitching_H_batters_faced_std,away_pitching_H_batters_faced_skew,away_pitching_BB_batters_faced_mean,away_pitching_BB_batters_faced_std,away_pitching_BB_batters_faced_skew,away_pitching_leverage_index_avg_mean,away_pitching_leverage_index_avg_std,away_pitching_leverage_index_avg_skew,away_pitching_wpa_def_mean,away_pitching_wpa_def_std,away_pitching_wpa_def_skew,home_pitcher_earned_run_avg_mean,home_pitcher_earned_run_avg_std,home_pitcher_earned_run_avg_skew,home_pitcher_SO_batters_faced_mean,home_pitcher_SO_batters_faced_std,home_pitcher_SO_batters_faced_skew,home_pitcher_H_batters
_faced_mean,home_pitcher_H_batters_faced_std,home_pitcher_H_batters_faced_skew,home_pitcher_BB_batters_faced_mean,home_pitcher_BB_batters_faced_std,home_pitcher_BB_batters_faced_skew,home_pitcher_leverage_index_avg_mean,home_pitcher_leverage_index_avg_std,home_pitcher_leverage_index_avg_skew,home_pitcher_wpa_def_mean,home_pitcher_wpa_def_std,home_pitcher_wpa_def_skew,away_pitcher_earned_run_avg_mean,away_pitcher_earned_run_avg_std,away_pitcher_earned_run_avg_skew,away_pitcher_SO_batters_faced_mean,away_pitcher_SO_batters_faced_std,away_pitcher_SO_batters_faced_skew,away_pitcher_H_batters_faced_mean,away_pitcher_H_batters_faced_std,away_pitcher_H_batters_faced_skew,away_pitcher_BB_batters_faced_mean,away_pitcher_BB_batters_faced_std,away_pitcher_BB_batters_faced_skew,away_pitcher_leverage_index_avg_mean,away_pitcher_leverage_index_avg_std,away_pitcher_leverage_index_avg_skew,away_pitcher_wpa_def_mean,away_pitcher_wpa_def_std,away_pitcher_wpa_def_skew
0,0,KFH,KJP,2021-05-16,False,True,juradar01,carraca01,1.0,1.0,15.0,5.0,2021.0,-1.225891,-1.043317,,1.274711,-0.117454,-0.767782,0.204823,-0.91224,0.736905,-0.293145,-1.545052,1.326298,-1.137733,-0.517119,-0.981683,-0.208237,-0.187787,-0.220801,-0.660756,-1.371297,-0.764567,0.138698,,-0.510128,0.421279,-0.903349,TBR_2021,NYM_2021,-1.080484,-1.141178,-0.250033,0.863292,1.534448,0.525851,0.301436,-0.428877,-0.483396,-0.088848,-0.541332,-2.119907,0.311179,0.269419,-0.326138,0.49849,0.234694,-0.52278,-1.207565,-0.212762,1.536088,-0.68296,,0.314234,-0.81382,-0.448673,0.907542,,-0.416869,-0.455878,-0.045797,1.225269,-0.070702,-0.326582,0.291835,0.681472,-0.420972,0.375311,-0.790093,0.517824,-0.378227,-1.512319,-0.659638,-1.061603,-2.032393,2.197353,1.231229,-0.783565,-0.023913,0.748943,-0.879243,,-1.973356,-1.843924,-0.732155,-0.432554,0.434091,1.083299,1.351392,0.418088,-0.876897,0.175316,1.75322,-0.713761,-1.240541,0.393495,0.838672,0.710098,1.021324,0.516786,0.857423,-1.806628,-1.160585,0.513019,1.070935,1.951345,4.221352,1.497494,-1.164861,0.088444,0.356905,-0.588851,-0.042785,-0.392711,0.71211,0.308497,2.162372,0.75767,-0.505291,-0.87842,-0.686719,-0.67986,-1.073587,-1.436283,-1.396256,,-0.774925,-0.757804,0.548638,0.111248,0.200472,0.279251,-0.494965,-0.002013,1.643851,0.266738,-0.655777,1.437349,-0.925936,,0.727554,-0.53841,0.438758,0.577858,0.487583,1.065537,1.13335,-0.974559,-0.900633,-1.093425,0.896974,-0.611051,-0.398111,0.949021,1.007072,0.340438
1,1,VJV,HXK,2019-05-04,True,False,ramirer02,rodrich01,2.0,,9.0,6.0,2019.0,0.218717,0.595775,0.505906,-0.078454,0.858849,-0.147988,-0.006021,-0.117896,0.345131,-0.115893,-0.183885,0.799712,-0.877649,0.786116,-0.743728,0.922561,-1.586282,-0.884691,0.37565,-0.512275,0.197788,-0.122357,-0.489211,-1.070462,,-0.811256,LAA_2019,HOU_2019,-0.014566,-0.319278,-1.006251,-0.802266,-0.968589,-0.525281,0.044845,-1.246412,-0.404061,0.783281,-0.443467,1.143124,-0.282099,0.275837,0.290322,0.530375,0.226817,-0.556458,-0.141871,-0.328477,-0.949102,0.324222,0.723552,-1.295577,0.221863,0.36493,,0.293537,-1.092796,0.901655,-0.579349,-0.312958,-0.807395,0.398161,-0.635111,-0.991865,1.149014,0.34011,0.014218,1.20438,0.153496,-0.003143,1.367754,0.969078,-0.260429,0.723969,-0.004598,-1.249028,-0.435079,0.857667,-0.085423,0.365423,0.328887,,0.615213,0.024161,-0.187974,-0.085267,-0.778768,-0.126916,-0.192589,-1.126251,1.741693,0.917702,0.141553,-0.398967,-0.53723,-1.415182,0.61245,0.159409,,-0.401745,-0.690679,-0.201829,-0.270541,1.519616,1.789219,-0.968409,-1.794213,-0.870656,-0.413673,-1.268734,-0.823327,-1.562924,-1.485809,-1.911265,-1.203808,1.181008,-0.837373,0.212075,0.403485,-0.36079,-1.806381,-0.541507,-0.043339,0.353053,0.2493,-1.214801,-2.042464,-0.159877,0.076928,0.196831,0.395448,-0.518694,-1.139803,-0.713193,-0.311223,1.119829,-0.514006,-0.593109,-1.419052,-1.124279,-0.17594,1.303968,0.07307,-0.574972,-0.044641,-0.878649,-1.079528,-1.719608,0.050448,-0.851738,-0.202878,0.489511,-0.876286,1.416154
2,2,VJV,JEM,2019-06-10,True,True,jarvibr01,tropeni01,1.0,1.0,6.0,6.0,2019.0,,0.533984,0.775941,-0.35461,1.036358,0.533786,0.126973,0.455796,-0.362811,-0.204519,1.092984,1.279097,0.273296,-0.005443,-2.435672,0.834443,-2.727696,-1.439133,-0.352874,0.818033,-0.797421,-0.476123,-0.879722,0.100481,-0.607391,-1.471836,LAA_2019,LAD_2019,-0.120043,-0.206085,-0.284206,-0.508426,0.187192,0.591495,-0.082325,0.4622,,1.486352,-0.203195,0.249802,-0.273817,0.27687,0.281687,1.565176,-0.304426,,0.64047,0.295217,-0.0427,0.975731,0.330245,-0.471824,,0.463071,0.270626,-0.323181,-0.895546,-0.340902,0.087012,0.349644,0.465093,0.75887,0.050908,-0.393459,0.980636,-0.233423,0.201466,1.476933,-0.661004,0.558863,1.526021,0.567774,0.574643,-0.817971,-0.586116,0.467818,1.284789,0.021506,-0.484293,1.207376,0.80219,-0.09825,1.192944,0.504885,0.116468,0.22731,-0.499931,-0.206989,0.041358,-1.082917,1.16329,0.511586,0.052185,-0.399662,-1.088069,-0.882866,0.131625,,-1.176843,0.119959,-1.134487,-0.878522,-0.829399,0.580312,0.303866,0.10514,-0.85843,-0.390671,-0.422695,-1.915657,-0.867586,-0.059842,-0.049751,-0.77919,-0.650227,1.087092,-0.364284,-1.026884,-0.362856,-0.500571,-1.238497,0.844173,-1.190557,-0.260358,-0.809891,-1.230517,-0.242445,-0.527291,-0.201109,1.061643,0.015339,-0.468506,-0.564841,0.675709,-0.027191,0.363149,-0.93183,-0.468359,1.089636,0.251259,0.099176,-0.316298,,-0.227289,-0.10018,-1.702937,-0.867762,1.992552,-0.404961,-0.132717,-0.106344,2.48102,-0.20011,-0.026083
3,3,BPH,FBW,2018-06-26,True,True,diazyi01,johnsji04,1.0,1.0,5.0,6.0,2018.0,-0.415669,0.039655,0.03013,0.17699,-0.383718,0.816137,1.612614,0.993488,-2.699708,1.213491,-1.045875,0.479182,-1.356579,1.111326,-1.241793,0.665862,-0.724074,-1.460551,-0.265237,-0.227176,-0.615632,1.813716,-0.850446,0.760295,-0.385638,-1.864283,STL_2018,CLE_2018,1.567589,1.470115,-0.395447,0.109479,0.480085,-0.286322,0.142649,-0.46723,-0.049698,0.767113,0.588013,0.104073,0.275165,0.274461,-0.288535,0.495177,0.235484,-0.519286,-0.140873,-0.177531,-0.462785,-0.14708,-0.354646,,-0.180802,-0.376529,-0.366319,-0.065323,0.820969,1.73468,0.686106,0.355758,0.411222,-0.322113,-0.818498,0.215215,0.471715,0.666661,0.430022,0.497679,-0.04623,0.824164,0.766258,0.582897,0.829928,-0.955816,0.190307,0.337728,0.203258,0.515799,-1.882902,0.737844,0.36085,-0.432547,-0.702054,-1.184412,-0.702221,-0.216474,-0.3273,,-0.566519,-0.667772,-0.557483,,1.036101,1.712111,0.56872,,-0.265712,-0.279376,-0.173647,-0.437659,-0.416684,0.332175,-0.211662,0.799526,,0.662857,-0.122517,0.03635,-1.018886,-1.599265,-1.33569,0.112849,-0.656822,,0.330273,0.533841,0.56615,-0.748506,-0.269294,-0.213531,-0.407492,-0.093126,-0.149793,,-0.791955,-0.673003,0.774792,,1.217417,-0.270472,1.589648,0.034188,-0.731097,-0.007993,1.533576,0.273441,-0.896115,-0.660942,-0.467018,0.85816,0.166424,0.933574,-0.773771,-0.776762,-1.385079,-1.549095,-1.00847,0.11608,-1.236753,-0.119898,0.005985,1.646317,-0.764309,
4,4,RLJ,DPS,2016-07-05,True,False,willibr02,armstsh01,1.0,1.0,6.0,5.0,2016.0,2.22186,2.140554,1.794125,-1.880372,,0.888447,0.691387,0.627672,-0.541515,1.124865,1.907866,-0.953459,3.450165,-0.010166,1.378177,,1.497108,1.225075,-0.063858,0.302281,0.538247,-1.003755,-0.456499,0.473479,-0.962037,0.66998,BOS_2016,TEX_2016,0.051465,0.581598,0.842816,-0.340517,-0.047144,0.630565,0.63795,0.719801,-1.350335,0.379077,-0.028968,-1.510729,0.417835,0.250931,-0.43793,1.020791,0.043517,-1.090214,2.521679,0.656241,-1.401304,2.126328,,-1.358041,1.982311,0.581218,-1.083025,-0.062983,0.190879,0.493656,,0.377162,-0.577038,1.531512,0.533172,-1.006008,1.166961,0.528832,-0.182092,0.378808,-0.184347,0.975325,0.588621,-0.040999,-0.461084,-0.419796,-0.402919,-0.247948,1.220764,0.171242,0.304705,0.677608,-0.039062,-0.411931,0.428697,0.321169,0.496481,-0.147109,,-0.186139,,0.469915,0.665759,0.105434,0.98436,0.043892,-0.638427,-1.135716,0.019305,-0.451571,0.229473,-1.997511,,0.488944,0.835004,-1.98263,0.278477,-0.338649,0.791731,0.259911,0.129851,0.046172,-0.570667,-1.007998,0.947001,0.451493,-0.128817,0.353667,0.683641,-0.911888,0.306232,0.186748,0.524814,0.633551,0.516905,-0.38559,0.612087,0.717104,3.219464,,-0.601457,-0.324979,0.259883,-0.342976,-0.838219,-0.311751,0.884279,-0.322573,-0.479145,-0.484865,0.589492,0.49002,,-0.487029,-0.986882,-1.093997,0.356122,0.663967,-0.123547,0.361822,-0.035276,-0.285671,-2.563819,0.527432,-0.911987,-1.109533


In [3]:
df.describe()

Unnamed: 0,id,home_team_rest,away_team_rest,home_pitcher_rest,away_pitcher_rest,season,home_batting_batting_avg_10RA,home_batting_onbase_perc_10RA,home_batting_onbase_plus_slugging_10RA,home_batting_leverage_index_avg_10RA,home_batting_RBI_10RA,away_batting_batting_avg_10RA,away_batting_onbase_perc_10RA,away_batting_onbase_plus_slugging_10RA,away_batting_leverage_index_avg_10RA,away_batting_RBI_10RA,home_pitching_earned_run_avg_10RA,home_pitching_SO_batters_faced_10RA,home_pitching_H_batters_faced_10RA,home_pitching_BB_batters_faced_10RA,away_pitching_earned_run_avg_10RA,away_pitching_SO_batters_faced_10RA,away_pitching_H_batters_faced_10RA,away_pitching_BB_batters_faced_10RA,home_pitcher_earned_run_avg_10RA,home_pitcher_SO_batters_faced_10RA,home_pitcher_H_batters_faced_10RA,home_pitcher_BB_batters_faced_10RA,away_pitcher_earned_run_avg_10RA,away_pitcher_SO_batters_faced_10RA,away_pitcher_H_batters_faced_10RA,away_pitcher_BB_batters_faced_10RA,home_team_errors_mean,home_team_errors_std,home_team_errors_skew,away_team_errors_mean,away_team_errors_std,away_team_errors_skew,home_team_spread_mean,home_team_spread_std,home_team_spread_skew,away_team_spread_mean,away_team_spread_std,away_team_spread_skew,home_team_wins_mean,home_team_wins_std,home_team_wins_skew,away_team_wins_mean,away_team_wins_std,away_team_wins_skew,home_batting_batting_avg_mean,home_batting_batting_avg_std,home_batting_batting_avg_skew,home_batting_onbase_perc_mean,home_batting_onbase_perc_std,home_batting_onbase_perc_skew,home_batting_onbase_plus_slugging_mean,home_batting_onbase_plus_slugging_std,home_batting_onbase_plus_slugging_skew,home_batting_leverage_index_avg_mean,home_batting_leverage_index_avg_std,home_batting_leverage_index_avg_skew,home_batting_wpa_bat_mean,home_batting_wpa_bat_std,home_batting_wpa_bat_skew,home_batting_RBI_mean,home_batting_RBI_std,home_batting_RBI_skew,away_batting_batting_avg_mean,away_batting_batting_avg_std,away_batting_batting_avg_skew,away_batting_onbase_perc_mea
n,away_batting_onbase_perc_std,away_batting_onbase_perc_skew,away_batting_onbase_plus_slugging_mean,away_batting_onbase_plus_slugging_std,away_batting_onbase_plus_slugging_skew,away_batting_leverage_index_avg_mean,away_batting_leverage_index_avg_std,away_batting_leverage_index_avg_skew,away_batting_wpa_bat_mean,away_batting_wpa_bat_std,away_batting_wpa_bat_skew,away_batting_RBI_mean,away_batting_RBI_std,away_batting_RBI_skew,home_pitching_earned_run_avg_mean,home_pitching_earned_run_avg_std,home_pitching_earned_run_avg_skew,home_pitching_SO_batters_faced_mean,home_pitching_SO_batters_faced_std,home_pitching_SO_batters_faced_skew,home_pitching_H_batters_faced_mean,home_pitching_H_batters_faced_std,home_pitching_H_batters_faced_skew,home_pitching_BB_batters_faced_mean,home_pitching_BB_batters_faced_std,home_pitching_BB_batters_faced_skew,home_pitching_leverage_index_avg_mean,home_pitching_leverage_index_avg_std,home_pitching_leverage_index_avg_skew,home_pitching_wpa_def_mean,home_pitching_wpa_def_std,home_pitching_wpa_def_skew,away_pitching_earned_run_avg_mean,away_pitching_earned_run_avg_std,away_pitching_earned_run_avg_skew,away_pitching_SO_batters_faced_mean,away_pitching_SO_batters_faced_std,away_pitching_SO_batters_faced_skew,away_pitching_H_batters_faced_mean,away_pitching_H_batters_faced_std,away_pitching_H_batters_faced_skew,away_pitching_BB_batters_faced_mean,away_pitching_BB_batters_faced_std,away_pitching_BB_batters_faced_skew,away_pitching_leverage_index_avg_mean,away_pitching_leverage_index_avg_std,away_pitching_leverage_index_avg_skew,away_pitching_wpa_def_mean,away_pitching_wpa_def_std,away_pitching_wpa_def_skew,home_pitcher_earned_run_avg_mean,home_pitcher_earned_run_avg_std,home_pitcher_earned_run_avg_skew,home_pitcher_SO_batters_faced_mean,home_pitcher_SO_batters_faced_std,home_pitcher_SO_batters_faced_skew,home_pitcher_H_batters_faced_mean,home_pitcher_H_batters_faced_std,home_pitcher_H_batters_faced_skew,home_pitcher_BB_batters_faced_mean,home_pitc
her_BB_batters_faced_std,home_pitcher_BB_batters_faced_skew,home_pitcher_leverage_index_avg_mean,home_pitcher_leverage_index_avg_std,home_pitcher_leverage_index_avg_skew,home_pitcher_wpa_def_mean,home_pitcher_wpa_def_std,home_pitcher_wpa_def_skew,away_pitcher_earned_run_avg_mean,away_pitcher_earned_run_avg_std,away_pitcher_earned_run_avg_skew,away_pitcher_SO_batters_faced_mean,away_pitcher_SO_batters_faced_std,away_pitcher_SO_batters_faced_skew,away_pitcher_H_batters_faced_mean,away_pitcher_H_batters_faced_std,away_pitcher_H_batters_faced_skew,away_pitcher_BB_batters_faced_mean,away_pitcher_BB_batters_faced_std,away_pitcher_BB_batters_faced_skew,away_pitcher_leverage_index_avg_mean,away_pitcher_leverage_index_avg_std,away_pitcher_leverage_index_avg_skew,away_pitcher_wpa_def_mean,away_pitcher_wpa_def_std,away_pitcher_wpa_def_skew
count,11067.0,10420.0,10412.0,9533.0,9509.0,10514.0,10416.0,10413.0,10413.0,10419.0,10413.0,10418.0,10412.0,10415.0,10417.0,10417.0,10418.0,10414.0,10413.0,10415.0,10412.0,10411.0,10417.0,10414.0,9793.0,9796.0,9783.0,9800.0,9774.0,9790.0,9788.0,9776.0,10413.0,10415.0,10199.0,10412.0,10413.0,10214.0,10413.0,10415.0,10307.0,10417.0,10416.0,10307.0,10417.0,10417.0,10223.0,10409.0,10412.0,10215.0,10415.0,10319.0,10210.0,10418.0,10315.0,10215.0,10413.0,10313.0,10214.0,10416.0,10319.0,10208.0,10417.0,10322.0,10215.0,10417.0,10318.0,10213.0,10412.0,10315.0,10213.0,10414.0,10310.0,10212.0,10414.0,10314.0,10213.0,10412.0,10317.0,10212.0,10415.0,10308.0,10212.0,10414.0,10311.0,10223.0,10415.0,10314.0,10218.0,10415.0,10313.0,10222.0,10414.0,10316.0,10219.0,10415.0,10323.0,10207.0,10416.0,10313.0,10215.0,10419.0,10324.0,10211.0,10415.0,10308.0,10219.0,10414.0,10311.0,10210.0,10412.0,10308.0,10218.0,10412.0,10308.0,10210.0,10416.0,10313.0,10216.0,10410.0,10312.0,10214.0,9793.0,9069.0,8375.0,9782.0,9087.0,8395.0,9796.0,9069.0,8387.0,9791.0,9090.0,8382.0,9790.0,9086.0,8391.0,9796.0,9083.0,8394.0,9783.0,9075.0,8422.0,9782.0,9088.0,8395.0,9782.0,9095.0,8428.0,9777.0,9097.0,8407.0,9782.0,9093.0,8421.0,9771.0,9107.0,8420.0
mean,5533.0,1.153743,1.155109,6.88566,6.84583,2019.423245,0.010912,0.025081,0.016323,0.026122,0.002036,-0.006905,0.01084,-0.004009,0.014304,0.003425,0.00604,-0.015088,0.000998,0.038992,0.017155,-0.008743,0.009016,0.066083,0.002727,-0.005292,-0.023649,0.052891,0.002635,0.011784,-0.042941,0.070324,0.032892,-0.040942,-0.131937,0.050454,-0.031415,-0.128668,-0.012322,-0.128611,0.010342,-0.016652,-0.121197,0.009922,-0.011157,-0.086392,0.020411,-0.013115,-0.086262,0.005546,-0.063497,-0.048551,-0.037258,-0.005033,-0.026874,0.003145,-0.03521,-0.031837,-0.04698,0.043072,-0.081257,-0.074472,-0.043794,0.013722,0.013647,-0.040329,-0.042918,-0.097628,-0.058739,-0.037265,-0.043453,-0.020595,-0.014391,-0.007146,-0.037299,-0.014335,-0.043283,0.031266,-0.079135,-0.071491,-0.050242,0.004585,0.022721,-0.041138,-0.031899,-0.088225,-0.01859,-0.034513,-0.112267,0.028166,-0.063145,-0.031167,-0.060677,-0.036443,-0.056713,0.096317,0.057579,-0.05768,0.028754,-0.072643,-0.053836,0.02796,0.007665,0.001568,-0.01041,-0.042469,-0.125289,0.028851,-0.066169,-0.029438,-0.069836,-0.051525,-0.056477,0.117727,0.050669,-0.049897,0.034591,-0.068397,-0.054458,0.03567,0.01903,-0.001196,0.014203,-0.026449,-0.189432,-0.006365,-0.053584,-0.063646,-0.008963,-0.05703,-0.058337,0.050364,-0.019376,-0.10225,0.037763,-0.035671,-0.025526,0.007071,-0.025817,0.044572,0.002659,-0.03465,-0.199722,0.011841,-0.055787,-0.065012,-0.02571,-0.082243,-0.068823,0.057418,-0.015139,-0.10231,0.034829,-0.049211,-0.038866,0.018352,-0.018128,0.036265
std,3194.912049,0.541909,0.542719,6.179876,6.047317,2.434817,1.034945,1.021776,1.023586,1.039679,1.020796,1.037677,1.019306,1.024512,1.017408,1.034966,1.01467,1.008551,1.020714,1.033371,1.027994,1.015383,1.018235,1.039304,1.107938,1.076474,1.078805,1.099259,1.096922,1.067089,1.076358,1.100458,1.186715,1.200467,1.116658,1.186112,1.175057,1.120002,1.085325,1.193288,1.120699,1.0972,1.185295,1.119353,1.109265,1.220756,1.098254,1.113405,1.209033,1.103307,1.140989,1.207423,1.210022,1.130805,1.227708,1.20593,1.121379,1.185066,1.177544,1.204415,1.190121,1.185459,1.158732,1.214148,1.130836,1.122549,1.165303,1.125483,1.139296,1.221457,1.196001,1.136002,1.232799,1.207132,1.122608,1.19825,1.172953,1.17735,1.180067,1.189652,1.164198,1.229358,1.139763,1.132739,1.180003,1.124851,1.115903,1.168328,1.109978,1.107025,1.153476,1.205524,1.109305,1.219033,1.191662,1.127846,1.196802,1.174477,1.160282,1.195307,1.200823,1.144316,1.197826,1.127988,1.133931,1.168246,1.114499,1.100296,1.162028,1.207409,1.11648,1.206315,1.191506,1.13983,1.200785,1.18717,1.188112,1.199863,1.203742,1.144891,1.215282,1.128212,1.203631,1.103162,1.025386,1.094879,1.092105,1.170146,1.119145,1.086523,1.143319,1.126243,1.111355,1.171629,1.116309,1.094781,1.083283,1.158467,1.178055,1.17822,1.133447,1.090801,1.017971,1.090512,1.098787,1.161039,1.114709,1.066019,1.143725,1.126946,1.103011,1.144492,1.097477,1.063507,1.07628,1.157832,1.158408,1.186227
min,0.0,0.0,0.0,1.0,1.0,2016.0,-6.214786,-7.141128,-6.552415,-4.317449,-3.845154,-6.005047,-6.869815,-5.968634,-4.582972,-3.838169,-3.576472,-5.01449,-5.70475,-4.475072,-3.498266,-4.871477,-7.035783,-4.491474,-1.387443,-3.44546,-4.612408,-2.328612,-1.490195,-3.47293,-4.614743,-2.257296,-3.495307,-4.946636,-7.514719,-3.517354,-5.014324,-5.5446,-13.8097,-6.174301,-4.981698,-9.968546,-6.204927,-5.249526,-4.381711,-8.552905,-6.827499,-4.347973,-8.468543,-5.964675,-11.122012,-7.660903,-7.209905,-11.305601,-7.661785,-6.207851,-10.086737,-7.324875,-6.639276,-7.568671,-7.677342,-8.040776,-11.418996,-7.764436,-4.64503,-6.038498,-5.973308,-6.279134,-9.497159,-7.712332,-6.536074,-10.907866,-7.90639,-6.398787,-9.230092,-7.513474,-6.703602,-8.394566,-7.798574,-7.756338,-9.599455,-7.473672,-5.650569,-6.101256,-5.916501,-6.580977,-5.641107,-5.878001,-6.537953,-6.816082,-6.818534,-6.536532,-8.890944,-7.895099,-7.62663,-6.72912,-7.325969,-6.959636,-8.381359,-7.872869,-7.117855,-17.754636,-7.480284,-5.081586,-5.356763,-5.786842,-6.507555,-6.67127,-6.631717,-6.789188,-10.848901,-7.809501,-8.110389,-6.663651,-7.227118,-7.302627,-7.604412,-7.898189,-8.127481,-11.219854,-7.687676,-5.008455,-1.451127,-1.059506,-3.405328,-3.596779,-2.44737,-3.878907,-4.870144,-2.620207,-3.998247,-2.451246,-2.172083,-4.069026,-5.847916,-1.490929,-3.013673,-7.821186,-3.432169,-4.748946,-1.580755,-1.057233,-3.335998,-3.636229,-2.424336,-3.877774,-4.880269,-2.613306,-3.772333,-2.384611,-2.17895,-4.131457,-6.046679,-1.488259,-3.155396,-6.768077,-3.423059,-4.693353
25%,2766.5,1.0,1.0,5.0,5.0,2017.0,-0.66046,-0.630292,-0.630346,-0.672189,-0.649982,-0.674813,-0.651529,-0.66604,-0.665233,-0.736273,-0.683889,-0.709759,-0.692588,-0.672837,-0.666803,-0.711421,-0.657766,-0.637168,-0.520416,-0.712908,-0.651372,-0.644194,-0.544444,-0.691827,-0.661666,-0.603079,-0.564156,-0.541727,-0.7325,-0.581222,-0.570086,-0.742348,-0.647308,-0.53215,-0.690335,-0.648076,-0.525107,-0.697153,-0.614272,0.012233,-0.627322,-0.606844,0.011006,-0.638653,-0.671869,-0.552458,-0.612834,-0.614689,-0.530678,-0.621859,-0.626018,-0.555416,-0.657728,-0.537391,-0.618839,-0.635413,-0.596456,-0.498237,-0.66504,-0.64436,-0.661657,-0.727466,-0.680432,-0.567219,-0.600633,-0.645414,-0.531977,-0.590302,-0.636364,-0.563822,-0.654615,-0.538102,-0.592501,-0.623017,-0.620101,-0.490426,-0.649715,-0.644585,-0.655541,-0.732791,-0.672309,-0.70015,-0.721441,-0.756993,-0.670606,-0.656272,-0.679865,-0.539713,-0.653537,-0.563073,-0.56144,-0.613458,-0.564303,-0.584604,-0.661991,-0.606798,-0.51768,-0.643043,-0.646832,-0.705755,-0.71781,-0.746345,-0.648858,-0.649736,-0.661209,-0.569687,-0.654596,-0.561019,-0.56191,-0.617404,-0.535549,-0.606141,-0.656581,-0.552378,-0.551571,-0.660879,-0.512536,-0.519748,-0.823005,-0.70818,-0.676716,-0.781944,-0.643527,-0.709907,-0.767124,-0.653009,-0.663156,-0.792774,-0.379735,-0.503193,-0.702962,-0.627782,-0.677767,-0.634129,-0.547658,-0.522485,-0.837349,-0.69575,-0.665502,-0.778593,-0.642831,-0.704107,-0.782568,-0.635552,-0.649177,-0.783596,-0.388888,-0.508079,-0.720445,-0.617508,-0.658194,-0.644719
50%,5533.0,1.0,1.0,6.0,6.0,2019.0,0.004956,0.020142,-0.013123,-0.002511,-0.028699,-0.006812,0.019929,-0.014539,-0.012276,-0.115893,-0.064049,-0.054154,0.004274,-0.004009,-0.052016,-0.05156,0.017666,0.001647,-0.163655,-0.079942,-0.026246,-0.061488,-0.176442,-0.064786,-0.036001,-0.05078,0.014296,0.041164,-0.136867,0.0115,0.024719,-0.152414,-0.012643,-0.040174,0.011392,-0.008958,-0.035032,0.016952,-0.008791,0.2024,0.006446,-0.011664,0.205666,0.007897,-0.077373,0.019406,-0.050755,0.020985,0.013507,0.02595,-0.033496,0.040121,-0.060758,-0.031043,-0.059353,-0.070053,0.0087,0.012826,-0.007037,-0.069837,-0.024936,-0.153412,-0.077661,0.031502,-0.045895,0.024469,0.025493,0.029073,-0.024193,0.050534,-0.048285,-0.019652,-0.043005,-0.065111,0.011245,0.021695,-0.021114,-0.058927,-0.027204,-0.141398,-0.052194,-0.026801,-0.175749,0.021145,-0.104543,-0.10215,-0.017393,0.027352,-0.032686,0.022968,0.003378,-0.043878,0.015363,-0.021458,-0.065755,0.025985,-0.014015,-0.062618,-0.04861,-0.046813,-0.175256,0.007011,-0.114044,-0.101938,-0.022545,0.019269,-0.062482,0.017107,0.001745,-0.05025,0.009991,-0.027264,-0.091806,0.042275,-0.009559,-0.062576,-0.156497,-0.292682,-0.190684,-0.086576,-0.214567,-0.073667,-0.003997,-0.189477,-0.070268,-0.063057,-0.211398,-0.105154,0.094077,-0.258118,-0.108416,-0.012039,0.054271,0.051529,-0.164594,-0.299724,-0.217193,-0.065993,-0.213422,-0.062184,-0.021206,-0.211944,-0.064849,-0.064892,-0.220152,-0.124167,0.09331,-0.256483,-0.119779,-0.003268,0.065758,0.032673
75%,8299.5,1.0,1.0,6.0,6.0,2022.0,0.687611,0.699844,0.667225,0.667167,0.592584,0.667214,0.677601,0.66948,0.654426,0.681737,0.607753,0.616341,0.691397,0.660025,0.624925,0.634138,0.697865,0.665545,0.276875,0.614612,0.583499,0.598946,0.281611,0.622507,0.562681,0.584874,0.571083,0.600072,0.454141,0.583452,0.599693,0.455927,0.597115,0.489883,0.710379,0.5769,0.491121,0.683189,0.615912,0.275092,0.634585,0.607809,0.277684,0.610535,0.628402,0.571432,0.536259,0.615773,0.594476,0.607804,0.594153,0.61363,0.587078,0.512119,0.52744,0.551045,0.514126,0.484857,0.70063,0.556053,0.594597,0.547809,0.623697,0.578759,0.546291,0.594735,0.591592,0.598118,0.594511,0.617768,0.587112,0.546069,0.522445,0.55114,0.523864,0.483258,0.700981,0.558681,0.596569,0.54427,0.629812,0.612332,0.490375,0.707373,0.546916,0.663983,0.591891,0.567987,0.57152,0.657196,0.649558,0.550389,0.573096,0.490648,0.522045,0.656153,0.551839,0.648622,0.595041,0.610742,0.495397,0.6943,0.519588,0.636282,0.613865,0.551673,0.573007,0.657095,0.653758,0.545946,0.538443,0.492644,0.518626,0.630087,0.577876,0.662627,0.267918,0.064122,0.43959,0.624001,0.327402,0.663375,0.597616,0.399067,0.664612,0.593107,0.336413,0.643975,0.545097,0.073149,0.595403,0.650643,0.674063,0.715464,0.276262,0.058373,0.421859,0.644365,0.307221,0.648735,0.589319,0.36676,0.642893,0.581683,0.360809,0.619044,0.53979,0.056652,0.572464,0.679005,0.667003,0.705656
max,11066.0,7.0,6.0,110.0,95.0,2023.0,5.266224,6.550491,8.632676,7.764378,7.692966,6.838475,6.818847,7.537535,5.933064,8.569417,7.993861,6.06687,6.418611,8.263272,9.42517,6.721682,4.725067,7.000326,35.339311,9.644054,8.509757,12.100068,30.402325,8.346588,8.685722,11.803726,15.068646,8.192934,4.901095,15.100391,11.505798,5.525369,10.105199,6.970079,5.370676,13.572298,8.377894,5.52001,4.364128,0.293131,6.050614,4.324645,0.295556,5.980468,8.437699,11.637757,6.037501,10.385799,10.631795,5.876199,13.491336,13.754914,5.729268,13.447255,10.752525,6.533753,11.581937,13.635544,4.753634,12.301571,12.286702,4.634366,11.13823,16.060241,6.574505,10.976502,13.3067,5.890626,11.991832,10.076495,5.830524,10.823837,10.466475,6.773587,18.202078,23.885837,5.060695,13.878553,12.863922,4.744188,12.992264,12.044274,5.266463,9.804233,10.634668,5.19327,10.377193,17.301877,5.418539,12.105157,14.977234,6.558421,10.156463,10.573207,6.138286,9.365319,23.758604,4.956604,14.53923,14.834189,5.546291,9.187991,14.992438,5.421094,7.466225,12.539642,5.634909,9.998395,15.91514,5.870046,13.533266,11.051596,6.02483,11.038306,13.646339,5.473356,37.182035,17.426941,3.252697,10.037558,10.200384,4.025752,9.095647,7.882805,4.196784,12.534064,11.138146,4.394077,24.667049,28.039288,5.321372,4.87241,6.456921,5.149247,32.407121,17.453276,3.666459,8.721911,10.421005,4.10021,9.296581,10.479871,4.01818,12.259079,10.351797,4.794735,17.398866,21.891759,5.174453,5.577139,7.395014,4.936033


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11067 entries, 0 to 11066
Columns: 167 entries, id to away_pitcher_wpa_def_skew
dtypes: bool(1), float64(157), int64(1), object(8)
memory usage: 14.0+ MB


Checking for missing values: it appears that the majority of columns have at least some.

In [5]:
missing_df = pd.DataFrame({
    'type': df.dtypes,
    'missing': df.isnull().any(),
    'missing count': df.isnull().sum(),
    'missing percentage': (df.isnull().sum() / len(df)) * 100
})
missing_df.sort_values(by='missing percentage', ascending=False, inplace=True)
pd.set_option('display.max_rows', None)
missing_df

Unnamed: 0,type,missing,missing count,missing percentage
home_pitcher_earned_run_avg_skew,float64,True,2692,24.324569
home_pitcher_BB_batters_faced_skew,float64,True,2685,24.261317
home_pitcher_H_batters_faced_skew,float64,True,2680,24.216138
home_pitcher_leverage_index_avg_skew,float64,True,2676,24.179995
home_pitcher_wpa_def_skew,float64,True,2673,24.152887
away_pitcher_SO_batters_faced_skew,float64,True,2672,24.143851
home_pitcher_SO_batters_faced_skew,float64,True,2672,24.143851
away_pitcher_BB_batters_faced_skew,float64,True,2660,24.035421
away_pitcher_wpa_def_skew,float64,True,2647,23.917954
away_pitcher_leverage_index_avg_skew,float64,True,2646,23.908918
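
The columns at the top of this table all end in `_skew`, which suggests grouping the missing percentages by column-name suffix. A minimal sketch, assuming the suffix after the last underscore identifies the statistic type; the helper `missing_by_suffix` and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy frame; column names follow the dataset's naming pattern, values are made up.
toy = pd.DataFrame({
    "home_pitcher_wpa_def_mean": [0.1, 0.2, np.nan, 0.4],
    "home_pitcher_wpa_def_std": [0.1, np.nan, np.nan, 0.4],
    "home_pitcher_wpa_def_skew": [np.nan, np.nan, np.nan, 0.4],
    "home_batting_RBI_10RA": [1.0, 2.0, 3.0, 4.0],
})


def missing_by_suffix(frame: pd.DataFrame) -> pd.Series:
    """Average missing percentage per column-name suffix (text after the last '_')."""
    pct = frame.isna().mean() * 100  # per-column missing percentage
    suffix = pct.index.str.rsplit("_", n=1).str[-1]
    return pct.groupby(suffix).mean().sort_values(ascending=False)


summary = missing_by_suffix(toy)
print(summary)
```

Applied to `df`, this would give one row per suffix (`mean`, `std`, `skew`, `10RA`, and so on), making it easier to see which statistic families are most affected.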


## Dealing with Missing Values

### Season
We can infer a missing `season` value from the `date`, `home_team_season`, or `away_team_season` column.

In [7]:
missing = df[df['season'].isna()]
missing[['id', 'date', 'season', 'home_team_abbr', 'home_team_season', 'away_team_season']].head(5)

Unnamed: 0,id,date,season,home_team_abbr,home_team_season,away_team_season
18,18,2017-04-20,,RAV,MIL_2017,STL_2017
36,36,2021-04-30,,RKN,SEA_2021,LAA_2021
62,62,2019-07-26,,SAJ,TOR_2019,TBR_2019
108,108,2021-06-18,,JBM,KCR_2021,BOS_2021
130,130,2018-05-08,,UPV,NYY_2018,BOS_2018


In [12]:
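from datetime import datetime
import warnings


def fill_season(df):
    """Fill missing `season` values in place, using other columns of the same row."""
    def fill(row):
        # Keep the existing value when it is present.
        if not pd.isna(row['season']):
            return row['season']

        # First choice: the year of the game date.
        if isinstance(row['date'], str):
            d = datetime.strptime(row['date'], '%Y-%m-%d')
            return d.year

        # Fallback: the year suffix of a value such as 'TBR_2019'.
        for col in ('home_team_season', 'away_team_season'):
            if isinstance(row[col], str):
                return int(row[col][-4:])

        warnings.warn("Unable to infer season; using placeholder season 1 instead.")
        return 1

    df['season'] = df.apply(fill, axis=1)
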

x = df.copy()
print("row with missing season:", len(x[x['season'].isna()]))
fill_season(x)
x[['id', 'date', 'season', 'home_team_abbr', 'home_team_season', 'away_team_season']].head(5)
print("row with missing season after fill:", len(x[x['season'].isna()]))


row with missing season: 553
row with missing season after fill: 0
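The same fallback order (existing `season`, then the year of `date`, then the year suffix of `home_team_season`) could also be expressed without a row-wise `apply`. This is only a sketch: the name `fill_season_vectorized` is hypothetical, it returns a new frame instead of modifying in place, it leaves the value missing rather than using a placeholder, and it assumes a `date` column exists.

```python
import pandas as pd

def fill_season_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing 'season' values inferred column-wise."""
    out = df.copy()
    # Year parsed from 'date'; missing or unparseable dates become NaN.
    year_from_date = pd.to_datetime(out['date'], errors='coerce').dt.year
    # Year taken from the trailing four digits of e.g. 'TBR_2019'.
    year_from_team = pd.to_numeric(
        out['home_team_season'].str.extract(r'(\d{4})$', expand=False),
        errors='coerce',
    )
    out['season'] = out['season'].fillna(year_from_date).fillna(year_from_team)
    return out

# Toy rows: season missing twice, once with a date and once without.
toy = pd.DataFrame({
    'date': ['2017-04-20', None, '2019-07-26'],
    'season': [None, None, 2019.0],
    'home_team_season': ['MIL_2017', 'SEA_2021', None],
})
filled = fill_season_vectorized(toy)
print(filled['season'].tolist())
```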


### Team statistics

In [9]:
missing = df[df['home_batting_RBI_mean'].isna()]
missing[['id', 'date', 'home_batting_RBI_mean']].head(5)

Unnamed: 0,id,date,home_batting_RBI_mean
15,15,2017-04-06,
51,51,2017-07-06,
65,65,2018-03-29,
110,110,2017-04-03,
115,115,2018-06-24,


In [None]:
# Fill in a team's numeric statistic based on the team's performance in the same season.
# - Apply this AFTER filling in the missing season values.
# - Do NOT apply this to pitcher stats (e.g. 'home_pitching_H_batters_faced_mean'),
#   as they should be inferred differently.
# - Here we assume a team's home and away stats are similar.
def fill_stat(df, key):
    def fill(row):
        if not pd.isna(row[key]):
            return row[key]

        team_col = 'away_team_abbr' if 'away_' in key else 'home_team_abbr'
        team_abbr = row[team_col]
        season = row['season']

        # Infer the team's stat from its other games in the same season.
        # The same stat appears under both a "home_" and an "away_" prefix,
        # e.g. home_pitching_H_batters_faced_mean and away_pitching_H_batters_faced_mean.
        # We pool the home and away versions of the stat and take the average.
        stat_key = 'tmp_key'

        away_stat_key = key.replace('home_', 'away_')
        away_stat = df[(df['away_team_abbr'] == team_abbr) & 
                       (df[away_stat_key].notna())]
        away_stat = away_stat.rename(columns={away_stat_key: stat_key})
        
        home_stat_key = key.replace('away_', 'home_')
        home_stat = df[(df['home_team_abbr'] == team_abbr) & 
                       (df[home_stat_key].notna())]
        home_stat = home_stat.rename(columns={home_stat_key: stat_key})

        team_stat = pd.concat([away_stat, home_stat], axis=0)
        season_stat = team_stat[team_stat['season'] == season]

        if len(season_stat) > 0:
            return season_stat[stat_key].mean()

        # TODO: not sure whether cross-season stats make sense.
        # if len(team_stat) > 0:
        #     return team_stat[stat_key].mean()
        return row[key]

    df[key] = df.apply(fill, axis=1)
    missing_count = len(df[df[key].isna()])
    if missing_count > 0:
        warnings.warn(f"fill_stat: {key} column still has {missing_count} missing values")


x = df.copy()
fill_season(x)

key = "home_batting_RBI_mean"
print(f"row with missing {key}:", len(x[x[key].isna()]))
fill_stat(x, key)
print(f"row with missing {key} after:", len(x[x[key].isna()]))

row with missing home_batting_RBI_mean: 650
row with missing home_batting_RBI_mean after: 0


Here is the list of team performance statistics that `fill_stat()` can be applied to:

In [32]:
float_col = list(df.select_dtypes(include='float').columns)
exclude = ['season', 'home_team_rest', 'away_team_rest']
team_stat_col = [col for col in float_col if
                 'pitcher' not in col and
                 'pitching' not in col and
                 col not in exclude]
team_stat_col

['home_batting_batting_avg_10RA',
 'home_batting_onbase_perc_10RA',
 'home_batting_onbase_plus_slugging_10RA',
 'home_batting_leverage_index_avg_10RA',
 'home_batting_RBI_10RA',
 'away_batting_batting_avg_10RA',
 'away_batting_onbase_perc_10RA',
 'away_batting_onbase_plus_slugging_10RA',
 'away_batting_leverage_index_avg_10RA',
 'away_batting_RBI_10RA',
 'home_team_errors_mean',
 'home_team_errors_std',
 'home_team_errors_skew',
 'away_team_errors_mean',
 'away_team_errors_std',
 'away_team_errors_skew',
 'home_team_spread_mean',
 'home_team_spread_std',
 'home_team_spread_skew',
 'away_team_spread_mean',
 'away_team_spread_std',
 'away_team_spread_skew',
 'home_team_wins_mean',
 'home_team_wins_std',
 'home_team_wins_skew',
 'away_team_wins_mean',
 'away_team_wins_std',
 'away_team_wins_skew',
 'home_batting_batting_avg_mean',
 'home_batting_batting_avg_std',
 'home_batting_batting_avg_skew',
 'home_batting_onbase_perc_mean',
 'home_batting_onbase_perc_std',
 'home_batting_onbase_perc

In [31]:
x = df.copy()
fill_season(x)
for col in tqdm.tqdm(team_stat_col):
    fill_stat(x, col)


100%|██████████| 67/67 [04:51<00:00,  4.34s/it]
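The loop above takes close to five minutes for 67 columns because `fill_stat` re-filters the whole frame for every missing row. One possible alternative is to compute the per-(team, season) mean once per statistic with a `groupby` and look it up with a merge. The sketch below is an assumption-laden illustration, not a drop-in replacement: `fill_stat_vectorized` is a hypothetical name, it returns a new Series instead of modifying `df`, and it assumes every key starts with `home_` or `away_`.

```python
import pandas as pd

def fill_stat_vectorized(df: pd.DataFrame, key: str) -> pd.Series:
    """Fill missing values of `key` with the team's same-season mean,
    pooling the home_ and away_ versions of the stat."""
    side = 'away' if key.startswith('away_') else 'home'
    base = key[len(side) + 1:]  # e.g. 'batting_RBI_mean'

    # Stack the home and away versions into one long (team, season, value) table.
    long = pd.concat(
        [pd.DataFrame({'team': df[f'{s}_team_abbr'].to_numpy(),
                       'season': df['season'].to_numpy(),
                       'value': df[f'{s}_{base}'].to_numpy()})
         for s in ('home', 'away')],
        ignore_index=True,
    )
    # Mean per team and season; missing values are skipped by mean().
    means = long.groupby(['team', 'season'], as_index=False)['value'].mean()

    # Look up the mean for each row's own team and season (row order is kept).
    keys = pd.DataFrame({'team': df[f'{side}_team_abbr'].to_numpy(),
                         'season': df['season'].to_numpy()})
    fallback = keys.merge(means, on=['team', 'season'], how='left')['value'].to_numpy()
    return df[key].fillna(pd.Series(fallback, index=df.index))

# Toy data with made-up team codes and values.
toy = pd.DataFrame({
    'home_team_abbr': ['AAA', 'BBB', 'AAA', 'BBB'],
    'away_team_abbr': ['BBB', 'AAA', 'BBB', 'AAA'],
    'season': [2020.0, 2020.0, 2020.0, 2021.0],
    'home_batting_RBI_mean': [1.0, 5.0, None, None],
    'away_batting_RBI_mean': [7.0, 3.0, 9.0, 2.0],
})
result = fill_stat_vectorized(toy, 'home_batting_RBI_mean')
print(result.tolist())
```

In the toy data, row 2 is filled with the mean of team AAA's two observed 2020 values, while row 3 stays missing because team BBB has no observed value in 2021. This mirrors the case in which `fill_stat` emits its "still has missing values" warning.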


## Task 2 dataset analysis

In [None]:
task2_df = pd.read_csv('data/task2/2024_test_data.csv')
task2_df

Unnamed: 0,id,home_team_abbr,away_team_abbr,is_night_game,home_pitcher,away_pitcher,home_team_rest,away_team_rest,home_pitcher_rest,away_pitcher_rest,season,home_batting_batting_avg_10RA,home_batting_onbase_perc_10RA,home_batting_onbase_plus_slugging_10RA,home_batting_leverage_index_avg_10RA,home_batting_RBI_10RA,away_batting_batting_avg_10RA,away_batting_onbase_perc_10RA,away_batting_onbase_plus_slugging_10RA,away_batting_leverage_index_avg_10RA,away_batting_RBI_10RA,home_pitching_earned_run_avg_10RA,home_pitching_SO_batters_faced_10RA,home_pitching_H_batters_faced_10RA,home_pitching_BB_batters_faced_10RA,away_pitching_earned_run_avg_10RA,away_pitching_SO_batters_faced_10RA,away_pitching_H_batters_faced_10RA,away_pitching_BB_batters_faced_10RA,home_pitcher_earned_run_avg_10RA,home_pitcher_SO_batters_faced_10RA,home_pitcher_H_batters_faced_10RA,home_pitcher_BB_batters_faced_10RA,away_pitcher_earned_run_avg_10RA,away_pitcher_SO_batters_faced_10RA,away_pitcher_H_batters_faced_10RA,away_pitcher_BB_batters_faced_10RA,home_team_season,away_team_season,home_team_errors_mean,home_team_errors_std,home_team_errors_skew,away_team_errors_mean,away_team_errors_std,away_team_errors_skew,home_team_spread_mean,home_team_spread_std,home_team_spread_skew,away_team_spread_mean,away_team_spread_std,away_team_spread_skew,home_team_wins_mean,home_team_wins_std,home_team_wins_skew,away_team_wins_mean,away_team_wins_std,away_team_wins_skew,home_batting_batting_avg_mean,home_batting_batting_avg_std,home_batting_batting_avg_skew,home_batting_onbase_perc_mean,home_batting_onbase_perc_std,home_batting_onbase_perc_skew,home_batting_onbase_plus_slugging_mean,home_batting_onbase_plus_slugging_std,home_batting_onbase_plus_slugging_skew,home_batting_leverage_index_avg_mean,home_batting_leverage_index_avg_std,home_batting_leverage_index_avg_skew,home_batting_wpa_bat_mean,home_batting_wpa_bat_std,home_batting_wpa_bat_skew,home_batting_RBI_mean,home_batting_RBI_std,home_batting_RBI_skew,away_batting_batting_avg_mean,away_batting_batting_avg_std,away_batting_batting_avg_skew,away_batting_onbase_perc_mean,away_batting_onbase_perc_std,away_batting_onbase_perc_skew,away_batting_onbase_plus_slugging_mean,away_batting_onbase_plus_slugging_std,away_batting_onbase_plus_slugging_skew,away_batting_leverage_index_avg_mean,away_batting_leverage_index_avg_std,away_batting_leverage_index_avg_skew,away_batting_wpa_bat_mean,away_batting_wpa_bat_std,away_batting_wpa_bat_skew,away_batting_RBI_mean,away_batting_RBI_std,away_batting_RBI_skew,home_pitching_earned_run_avg_mean,home_pitching_earned_run_avg_std,home_pitching_earned_run_avg_skew,home_pitching_SO_batters_faced_mean,home_pitching_SO_batters_faced_std,home_pitching_SO_batters_faced_skew,home_pitching_H_batters_faced_mean,home_pitching_H_batters_faced_std,home_pitching_H_batters_faced_skew,home_pitching_BB_batters_faced_mean,home_pitching_BB_batters_faced_std,home_pitching_BB_batters_faced_skew,home_pitching_leverage_index_avg_mean,home_pitching_leverage_index_avg_std,home_pitching_leverage_index_avg_skew,home_pitching_wpa_def_mean,home_pitching_wpa_def_std,home_pitching_wpa_def_skew,away_pitching_earned_run_avg_mean,away_pitching_earned_run_avg_std,away_pitching_earned_run_avg_skew,away_pitching_SO_batters_faced_mean,away_pitching_SO_batters_faced_std,away_pitching_SO_batters_faced_skew,away_pitching_H_batters_faced_mean,away_pitching_H_batters_faced_std,away_pitching_H_batters_faced_skew,away_pitching_BB_batters_faced_mean,away_pitching_BB_batters_faced_std,away_pitching_BB_batters_faced_skew,away_pitching_leverage_index_avg_mean,away_pitching_leverage_index_avg_std,away_pitching_leverage_index_avg_skew,away_pitching_wpa_def_mean,away_pitching_wpa_def_std,away_pitching_wpa_def_skew,home_pitcher_earned_run_avg_mean,home_pitcher_earned_run_avg_std,home_pitcher_earned_run_avg_skew,home_pitcher_SO_batters_faced_mean,home_pitcher_SO_batters_faced_std,home_pitcher_SO_batters_faced_skew,home_pitcher_H_batters_faced_mean,home_pitcher_H_batters_faced_std,home_pitcher_H_batters_faced_skew,home_pitcher_BB_batters_faced_mean,home_pitcher_BB_batters_faced_std,home_pitcher_BB_batters_faced_skew,home_pitcher_leverage_index_avg_mean,home_pitcher_leverage_index_avg_std,home_pitcher_leverage_index_avg_skew,home_pitcher_wpa_def_mean,home_pitcher_wpa_def_std,home_pitcher_wpa_def_skew,away_pitcher_earned_run_avg_mean,away_pitcher_earned_run_avg_std,away_pitcher_earned_run_avg_skew,away_pitcher_SO_batters_faced_mean,away_pitcher_SO_batters_faced_std,away_pitcher_SO_batters_faced_skew,away_pitcher_H_batters_faced_mean,away_pitcher_H_batters_faced_std,away_pitcher_H_batters_faced_skew,away_pitcher_BB_batters_faced_mean,away_pitcher_BB_batters_faced_std,away_pitcher_BB_batters_faced_skew,away_pitcher_leverage_index_avg_mean,away_pitcher_leverage_index_avg_std,away_pitcher_leverage_index_avg_skew,away_pitcher_wpa_def_mean,away_pitcher_wpa_def_std,away_pitcher_wpa_def_skew
0,0,DPS,SAJ,False,blairaa01,dunnju01,1.0,1.0,7.0,6.0,,-1.122458,-1.118116,-0.915578,-0.699805,,0.354735,0.259967,-0.303708,1.094313,-0.115893,1.332655,0.255916,0.638031,-0.396648,-0.759318,-1.104203,-1.410118,0.226804,-0.775331,3.022300,-0.973488,1.066371,-0.447844,-0.514754,-0.528817,0.389921,TEX_2024,TOR_2024,-1.052681,-0.673092,1.838264,0.075544,0.125877,0.211757,-0.351518,0.366117,,-0.336576,0.184996,-0.314981,-0.181406,,,-0.239891,0.283409,0.243983,-0.528007,0.400638,0.102232,,0.225211,0.169606,-0.794530,,0.013757,,-0.139931,0.836251,-0.247897,-0.260054,0.092159,-0.386119,-0.320083,0.534643,-0.246247,0.616449,-0.474843,-0.190175,0.108939,-0.464252,-0.330200,0.183616,0.019473,-0.614154,0.216435,0.582872,-0.115108,-0.466228,-0.259409,-0.345794,-0.505125,-0.135654,0.311502,0.586301,-0.030364,0.041309,0.760473,-0.306970,-0.244061,0.523701,0.838670,0.050384,0.924640,-0.019869,-0.024671,0.640099,,-0.025114,0.218565,1.446401,0.145135,0.136853,0.156962,-0.299678,0.058575,-0.530993,-0.191289,0.457603,-0.046686,-0.254139,-0.085242,0.556786,-0.594808,0.284266,1.098377,-0.212914,0.397355,-0.023137,-0.807241,,,3.140188,,,-0.997278,,,1.074709,,,,,,0.557217,,,0.144568,0.313271,1.367196,-0.061208,0.393631,0.221379,0.216385,,1.137587,-0.202847,-0.169020,-0.365553,-0.227047,-0.295909,0.686374,-0.203924,0.245775,-0.510284
1,1,JEM,GKO,True,mejiaje02,hatchto01,1.0,2.0,13.0,6.0,2024.0,0.856551,1.008800,1.526428,-1.210694,2.545189,-0.382132,-0.829936,-0.394291,0.008343,-0.027268,0.989677,-1.204719,0.502841,1.506506,0.491999,-0.643639,0.598117,0.730304,0.571317,-1.482033,1.058242,-1.847656,,-0.306880,0.607068,-0.885383,LAD_2024,COL_2024,-0.097982,0.088865,1.019290,-0.353554,0.029165,0.725645,,0.327669,0.270570,-1.328619,0.151831,0.482004,0.820063,0.132775,-0.868637,-1.003630,0.063161,1.060515,0.497257,-0.398442,0.797998,0.715206,-0.368947,0.918691,0.906208,0.063133,0.520191,-0.314547,0.290048,0.215041,0.970108,-0.520818,-0.581545,1.015375,,0.198411,-0.273807,0.322573,0.343204,-0.711362,-0.056071,-0.089238,-0.441442,0.236889,0.497038,-0.327582,-0.469287,0.992460,-0.367146,0.261266,0.986554,-0.327445,0.273577,1.060614,-0.316504,0.007320,0.114559,0.060014,0.937630,0.192129,,-0.027363,0.724446,-0.259323,1.034531,1.847946,-0.229924,-0.521095,-0.282503,0.256996,0.082995,,1.601860,1.101922,0.004018,-1.864758,-1.055770,-0.156593,1.977936,0.235234,0.520883,0.248370,0.569751,0.794789,,1.083414,1.083207,-1.123999,0.774567,0.909697,-0.014766,1.283109,1.822854,-0.376978,2.692938,1.039441,-0.720053,2.435389,0.370278,-1.490649,0.590453,2.190617,-0.082393,1.225809,,-0.125638,-1.754527,-3.017312,0.510350,0.702923,1.424974,-0.676274,-0.152144,0.966688,1.053577,0.384860,0.977567,-0.714865,-0.838365,-0.428443,-0.124948,-0.281637,1.065657,-0.282993,,0.191600
2,2,MZG,HAN,True,battepe01,iveyty01,1.0,1.0,6.0,,2024.0,0.970327,0.520975,0.720999,-0.230340,0.503830,0.537230,0.188604,0.051657,0.255779,0.149983,0.855792,-0.357729,,-1.313539,,-0.956135,1.296707,0.432862,-0.323396,0.398236,0.491088,,0.713173,-0.197103,1.157990,-0.561028,PHI_2024,WSN_2024,0.084519,0.103744,0.536988,0.501230,1.112946,0.598686,0.739924,0.643366,-1.005724,-0.506196,0.273083,-0.610710,0.750145,0.158887,-0.792418,-0.438186,0.253058,0.450643,0.697439,-0.643797,0.681727,0.587815,-0.357284,0.657647,0.553765,-0.372789,0.769045,-0.216323,0.739661,1.155456,0.488044,-0.501058,-1.096395,,-0.495816,-0.704631,-0.172553,0.344566,-0.199197,-0.383952,0.349166,0.380759,-0.735998,0.023606,0.254737,-0.175317,0.221491,0.602068,0.022081,-0.243664,0.644656,-0.334250,0.043106,-0.176783,-0.568188,0.248278,1.024477,0.540002,-0.176868,-0.762598,-0.265824,0.181802,-0.037613,-1.024143,-0.573942,-0.718447,-0.072050,0.405092,0.767556,0.642357,-0.274235,-0.586719,,0.215789,0.118056,,0.927612,0.766934,0.969461,0.326435,0.078836,-0.662829,0.245726,1.141985,-0.483935,-0.507466,,-0.661222,0.260568,-0.121066,-0.254825,-0.229131,1.069182,0.134214,-0.585050,-0.084149,-0.000867,0.329774,0.860986,-0.506950,-0.036807,0.188160,-0.054996,-0.544842,-1.091275,0.326671,0.659816,-0.457241,0.477057,0.097315,1.435626,-0.867326,-0.712695,-0.350107,1.285855,0.255066,0.787161,-0.220152,-0.530733,-1.110442,-0.549621,-0.021826,0.494930,-1.128590,-0.184698,0.851088
3,3,GKO,JEM,True,hellije01,sadleca02,1.0,1.0,6.0,6.0,2024.0,0.539358,0.010385,0.072213,-0.969057,0.060056,0.096487,0.765993,,0.248906,0.947614,1.484722,-1.145308,2.140446,-1.377145,-0.811012,0.295958,,-0.156639,-0.622773,-0.453615,-0.692836,0.514359,0.397412,-1.404006,-0.004266,1.109634,COL_2024,LAD_2024,-0.532047,-0.419173,0.154709,-0.711940,-0.093136,1.765351,-1.450646,0.563212,0.270581,1.243742,-0.023249,0.909474,-1.302190,-0.102661,1.410029,0.879359,,-0.932603,0.102236,0.804268,-0.368847,-0.463892,0.382646,-0.939606,-0.551357,0.627930,0.219928,-0.437634,-0.454563,1.478818,-0.401521,0.757664,0.950292,-0.455010,,0.221317,0.471547,0.056653,0.984445,0.851270,0.182997,0.658008,0.831818,0.505364,0.589240,-0.188589,0.559259,-0.027527,0.414222,-0.589195,-1.326480,0.623846,0.600468,-0.274643,1.812241,1.818456,,-2.236974,-0.388676,0.189621,2.083423,0.873187,1.499261,0.519136,0.428750,1.349211,-0.366767,1.014365,1.113404,,1.070568,0.571114,-1.171813,-0.733269,0.923470,,1.628657,0.167930,-1.366271,,,,1.829273,2.834492,-0.979342,-0.907174,0.576519,0.895148,,-0.122107,-0.386003,,-0.569329,,0.223143,0.090387,-0.198340,,0.830978,0.383825,0.200055,-0.354953,0.010850,-0.478474,1.348079,,0.568829,-0.405793,0.174273,0.684602,1.459065,-1.285392,-0.829102,-0.599678,-0.214839,,1.845729,1.178937,1.585955,0.066516,0.521931,-0.393085,0.712029,0.546615,-0.240162,-0.625477
4,4,UPV,SAJ,False,carraca01,bolsimi01,,,6.0,,2024.0,,0.264519,-0.401893,0.939378,-0.421756,-2.606014,-1.485175,-1.917281,-1.382012,-1.179401,-1.538677,-1.072715,-0.041415,0.109353,,-1.045221,1.409500,0.152017,-1.387443,-0.932273,-1.313120,-0.020023,0.349758,-0.607592,1.027879,1.577528,NYY_2024,TOR_2024,2.692678,1.135803,-0.490708,1.802002,1.558897,-0.291532,0.907161,,-2.699399,-1.819793,2.955421,0.677613,3.114723,-2.361969,-4.620156,-0.631137,0.205666,0.654447,-0.343097,-1.290051,-4.262657,0.427039,0.411147,-0.659542,-0.536749,-0.885461,-3.998644,1.575401,-1.296706,-2.633310,2.008011,-3.081502,-5.696891,-0.596939,-2.451749,-6.748238,-4.036012,2.942868,-2.828148,-2.299323,3.097025,-1.373503,-2.864297,2.785303,-1.511531,-2.544707,-0.460121,1.262706,-0.310299,-0.913545,1.157615,-1.819868,0.997428,0.258731,-2.359352,-2.303755,-0.981580,-1.496122,-4.322742,2.316898,0.110018,2.319334,-0.362328,0.049182,-1.312745,1.850079,4.913624,3.546516,-1.420525,2.691108,-3.678381,-5.669104,1.636160,1.609187,-1.697833,-1.437011,-1.449360,,,1.884261,-3.720390,0.068981,-3.741190,-0.676781,-3.998065,-1.715105,1.192414,-0.634581,-1.540666,1.536095,-1.451127,,,-0.978986,,,-1.358745,,,-0.053596,,,0.060234,,,-0.975674,,,0.380084,,,-0.640316,,,1.134152,,,1.609123,,,1.025560,,,-0.600823,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2423,2423,PDF,KFH,False,hudsoda01,neideni01,1.0,1.0,7.0,5.0,2024.0,-0.908698,-1.283977,-0.909733,-0.948345,-0.827492,-1.418567,-1.316500,-1.591613,0.853750,,0.317772,,-0.009712,0.017251,0.675799,0.165427,0.899957,-1.275748,3.286871,-2.572826,-0.030064,4.084134,,-0.722899,0.937912,-2.063351,ATL_2024,TBR_2024,-0.765314,-0.673697,-0.264258,-0.370693,-0.435640,-0.667207,0.514892,0.174860,-1.147182,-0.939880,-0.928879,-0.269424,0.505670,0.231700,-0.530592,-0.317038,0.273797,0.324135,0.048653,-0.489417,-0.029020,-0.259869,-0.159182,-0.201666,-0.111994,-0.611434,,-0.606925,-0.318543,-0.018825,,-0.564658,-0.058356,-0.063453,-0.801103,-0.792618,-0.661881,-0.989269,0.124642,-0.522980,-0.659080,-0.704085,-1.133825,-1.195853,-0.789585,0.971376,1.160882,0.834593,-0.124319,-0.168269,0.142942,-1.056053,-1.177161,-0.594657,-0.685257,0.189148,0.178201,0.553060,-0.514015,,-0.274314,0.604761,0.441087,-0.540266,,0.680417,-0.565781,,1.331727,0.636233,-0.215977,-0.719655,0.323714,-0.383180,-1.157775,-0.060093,-0.543109,-0.467432,0.593607,-0.749869,,-1.010769,-0.272751,,0.639708,0.421789,-0.093532,-0.350476,0.604783,0.276019,3.465821,,,-2.687823,,,0.006799,,,4.208892,,,1.065876,,,-3.852469,,,-0.241241,-0.437724,-1.125303,-0.690852,-0.932437,-0.087502,0.860442,-0.788773,-0.355111,-2.021044,-1.344937,1.476650,-0.512116,-0.686262,,0.191498,,0.187264
2424,2424,SAJ,VJV,True,jeffrje01,valdefr01,1.0,1.0,6.0,5.0,2024.0,0.091150,0.264054,1.273928,-1.417811,0.592584,-1.814546,-1.335963,-1.383737,-1.084500,,-0.735129,,-1.426933,-0.437689,0.727492,-0.896690,0.534093,1.159068,0.307234,0.023613,0.699569,0.062555,0.953261,,2.016528,-0.093571,TOR_2024,LAA_2024,0.102359,0.271989,0.339174,-0.268945,-0.176161,-0.111951,-0.533152,0.176525,-0.982533,-0.815346,-0.024865,-0.458891,-0.246081,0.280098,0.252793,-0.689212,0.187911,0.716511,-0.362096,0.320436,-1.048667,-0.251610,-0.177461,-0.812282,,-0.154265,-0.321570,-0.897569,0.160498,0.943010,-0.041397,-0.635060,-0.263025,-0.417272,-0.842465,-0.661022,-0.615263,0.141466,-0.192637,-0.564893,0.037259,0.257769,-0.780897,-0.580631,-0.290968,0.115544,0.448347,-0.279248,-0.695443,0.225807,-0.337084,-0.816418,-1.066623,-0.899748,0.432072,0.310980,0.005290,-0.377493,-0.126119,-0.753304,0.136288,,0.108607,-0.121198,-0.187957,0.743629,-0.850490,0.073489,1.265737,-0.295244,0.301375,0.144787,0.484527,0.420451,,-0.800640,0.388710,1.093259,-0.302300,-0.712368,0.326888,0.846476,0.662872,0.547592,-1.078154,-0.008312,1.051864,-0.355465,-0.542871,0.773330,0.027846,-0.021375,0.719938,0.011034,-0.794806,-0.570595,0.501385,-0.149337,1.117403,0.226989,-0.548653,-0.573483,,-0.037136,-0.192885,-0.344039,0.109630,0.214558,,0.232465,-1.070422,-2.227434,-0.269483,1.404788,2.187945,0.159867,-0.328963,-0.131224,0.386066,-0.677908,-0.919306,-0.354241,0.252490,-1.200417,1.247802,0.616256
2425,2425,RKN,DPS,True,corbipa01,hanifbr01,1.0,1.0,7.0,46.0,2024.0,0.701402,1.278730,0.736195,-0.251051,1.125113,-0.251287,0.081560,-0.314160,-0.383431,0.149983,-0.808683,1.263641,-0.907381,-1.299968,0.860419,0.721277,,0.211492,-0.011970,1.663469,-0.663065,-1.414912,-0.006744,0.321331,-0.059918,-0.708126,SEA_2024,TEX_2024,-0.401315,-0.146717,-0.231139,-1.001443,-0.658086,1.746474,0.267027,0.042721,0.219791,-0.345431,0.367860,-0.011266,0.050302,0.292323,-0.054818,-0.187460,0.288351,0.189645,-1.412298,-0.572812,0.261528,-0.483574,-0.555198,0.063938,-0.845210,-0.335779,,0.123644,0.808607,0.510688,-0.183727,-0.203117,0.078569,-0.566981,-0.360075,0.402098,-0.474690,0.295898,0.241957,,-0.089163,0.504943,-0.753233,-0.356936,0.281473,-0.050805,-0.088014,0.844395,-0.304180,-0.325949,-0.094105,-0.392739,-0.400442,0.553089,-0.844453,-0.618712,0.449212,0.706331,0.352667,-0.267205,-0.947214,-0.068876,0.255797,-1.702905,-0.917487,0.048213,-0.640058,-0.515768,-0.585841,0.261015,-0.937597,-0.127150,0.264047,0.588273,0.081419,0.077622,0.777468,-0.352597,-0.306137,0.491546,0.930451,,0.882799,0.025191,-0.060138,0.616271,0.111159,0.009203,0.199354,1.439309,-0.304253,-0.007174,,0.820918,-0.330437,,-0.746999,-0.073651,1.477145,-1.101970,-0.573467,0.660382,-0.537950,-0.435091,-0.002225,0.356645,0.542762,0.151129,0.000159,0.032209,0.797971,0.330939,0.422235,-0.135096,-0.025321,-0.146109,-0.910894,-0.771246,-0.523058,0.146930,0.157603,-0.764696,-0.709722,-0.438879,-0.397830,1.075658
2426,2426,FBW,VQC,True,diazjh01,sparkgl01,1.0,1.0,6.0,6.0,2024.0,-0.488072,-0.197753,-0.533320,-0.292475,,0.643972,0.253479,0.170111,-0.534642,0.858988,1.196290,-0.746386,0.483747,-0.130922,1.776957,0.439910,0.953727,0.551365,-0.639306,-0.403766,-1.444233,-0.288193,-0.391284,-0.292223,-0.522543,-0.555486,CLE_2024,PIT_2024,-0.286722,-0.651409,-0.927943,-0.136520,0.027644,-0.046985,0.464292,-0.385852,,-0.333287,-0.023001,-0.139165,,0.205082,-0.637964,-0.335269,0.271118,0.343117,-0.378134,-0.047019,-0.136400,-0.463908,0.457786,-0.463170,-0.342909,0.042950,-0.663943,-0.227997,-0.386314,0.942534,0.335392,0.153511,0.402891,0.033058,-0.122367,-0.666671,-0.495651,-0.349441,,-0.722664,-0.547369,0.664570,-0.804481,-0.542613,1.509146,0.805933,0.664442,0.349249,-0.490551,0.164233,-0.653470,-0.264936,-0.247168,-0.397134,-0.442397,-0.890814,-1.064981,0.308452,0.184860,-0.583575,-0.501571,-0.560052,0.246542,-0.086073,0.051070,0.417292,-0.144182,-0.730359,,0.577905,-0.259009,0.089634,-0.018924,0.543438,,-0.177085,0.074027,0.404068,0.330156,0.262546,-0.535928,-0.452106,0.167621,-0.192632,0.352189,,0.181633,-0.031882,,,-0.664156,-0.671246,0.388417,,0.417673,0.269847,-1.498286,-2.197351,-1.991809,-0.332111,0.861919,0.809192,0.772564,-0.808618,-0.626966,1.166174,-0.686083,-2.146693,-0.407866,-0.066175,0.148799,-0.591781,0.837182,0.247885,-0.776436,0.592520,-0.508469,0.009715,1.124424,0.742245,0.806562,2.978556,1.911847,0.486277,-0.044440,1.138141


The Task 2 dataset contains games from 2024, while the training dataset contains games from 2016-2023. The new season may therefore contain teams or players that do not appear in previous seasons. Let's check each categorical column to see whether the two datasets share the same values.

In [None]:
# Compare only the categorical columns present in the Task 2 data (it has no 'date' column).
cat_cols = [col for col in task2_df.columns if task2_df[col].dtype == 'object']
train_cols_val = {col: set(df[col].unique()) for col in cat_cols}
task2_cols_val = {col: set(task2_df[col].unique()) for col in cat_cols}

In [31]:
train_cols_val.keys()

dict_keys(['home_team_abbr', 'away_team_abbr', 'is_night_game', 'home_pitcher', 'away_pitcher', 'home_team_season', 'away_team_season'])
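When comparing the values of one of these columns across the two datasets, it can help to know which side each unmatched value comes from: a value that occurs only in Task 2 is unseen at training time, while a value that occurs only in the training data is harmless. The sketch below shows one way to separate the two directions on toy data. The helper name `unseen_values` and the pitcher ids are made up for illustration, and dropping missing values first is an assumption made so that NaN is not reported as a category.

```python
import pandas as pd

def unseen_values(train: pd.DataFrame, test: pd.DataFrame, col: str) -> dict:
    """Split the values of `col` into test-only and train-only sets."""
    train_vals = set(train[col].dropna().unique())
    test_vals = set(test[col].dropna().unique())
    return {
        'only_in_test': test_vals - train_vals,    # unseen during training
        'only_in_train': train_vals - test_vals,   # absent from the test set
    }

# Toy frames with made-up pitcher ids.
toy_train = pd.DataFrame({'home_pitcher': ['pitcher_a', 'pitcher_b', None]})
toy_test = pd.DataFrame({'home_pitcher': ['pitcher_b', 'pitcher_c']})
diff = unseen_values(toy_train, toy_test, 'home_pitcher')
print(diff)
```

On the real data this would be called as `unseen_values(df, task2_df, 'home_pitcher')`.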

In [None]:
for col in train_cols_val.keys():
    print(col, train_cols_val[col] ^ task2_cols_val[col])

home_team_abbr set()
away_team_abbr set()
is_night_game set()
home_pitcher {'mcderch01', 'musgrjo01', 'boscawi01', 'cobbal01', 'imanash01', 'hollojo01', 'schulja02', 'fulmeca01', 'wielajo01', 'yarbrry01', 'felizmi01', 'geedi01', 'shielja02', 'latosma01', 'espinpa01', 'bernabr01', 'carasma01', 'garream01', 'gallayo01', 'hollade01', 'harriky01', 'reedco01', 'irvinja01', 'scottta02', 'scottch01', 'nanceto01', 'greenco01', 'oberhbr01', 'trivilo01', 'volstch01', 'hahnje01', 'gsellro01', 'mccauda01', 'junisja01', 'soriajo01', 'andersh01', 'armstsh01', 'owenshe02', 'tetreja01', 'bowmama01', 'koehlto01', 'parsowe01', 'kennebr02', 'ginnjt01', 'alberma01', 'roberni01', 'cavalca01', 'aquinja01', 'pivetni01', 'hernada02', 'sprinje01', 'myersto01', 'alzolad01', 'webblo01', 'wrighmi01', 'ynoahu01', 'rockeku01', 'iwakuhi01', 'oakstr01', 'liriafr01', 'gibsoky01', 'gonzagi01', 'martico02', 'allenlo01', 'tarnofr01', 'mchugco01', 'phelpda01', 'warread01', 'kleinph01', 'ponceco01', 'woodfja01', 'gilbelo01