## Baseball Prediction: 1 - Data Wrangling
In this notebook we will wrangle data downloaded from www.retrosheet.org into a dataframe suitable for model building.  Specifically, for each game, we will calculate statistics for each team over its previous 162 games.

At the end, we save our dataframe to a file.  This file will be the starting point for the next notebook, in which we build our first model.

To use this notebook, you must first download the game logs here: https://www.retrosheet.org/gamelogs/index.html
Towards the bottom of that page there are links to ZIP files that each contain multiple seasons.  Download the five ZIP files covering 1980-1989, 1990-1999, ... , 2020-2022, decompress them, and move all of the single-season files into one directory.  The path to that directory is what you will need when setting the variable `fname` below.
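
Before loading anything, it can help to confirm that every season file is in place. The sketch below assumes the files keep the `glYYYY.txt` naming that `gl2022.txt` follows; `missing_gamelogs` and `data_dir` are hypothetical names used only for illustration.

```python
from pathlib import Path


def missing_gamelogs(data_dir, first_year=1980, last_year=2022):
    """Return the expected game-log file names that are absent from data_dir.

    Hypothetical helper: assumes one file per season named glYYYY.txt.
    """
    data_dir = Path(data_dir)
    expected = [f"gl{year}.txt" for year in range(first_year, last_year + 1)]
    return [name for name in expected if not (data_dir / name).is_file()]
```

Calling `missing_gamelogs('/path/to/your/game_data')` should return an empty list once all of the season files from 1980 through 2022 are in the directory.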

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

pd.set_option('display.max_columns',1000)
pd.set_option('display.max_rows',1000)

In [2]:
# Fill in the path to your data here...
fname = '/Users/brianlucena/Desktop/Work/baseball/data/game_data/'+'gl2022.txt'
df = pd.read_csv(fname, header=None)

In [3]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160
0,20220407,0,Thu,SDN,NL,1,ARI,NL,1,2,4,51,N,,,,PHO01,35508.0,198,1100000,000000004,29,4,1,0,0,2,0,0,0,7,0,5,1,0,2,0,7,5,4,4,2,0,24,8,0,0,1,0,26,3,0,0,1,3,0,0,1,7,0,6,0,0,1,0,6,6,2,2,0,0,27,9,0,0,2,0,vanol901,Larry Vanover,belld901,Dan Bellino,barbs901,Sean Barber,valej901,Junior Valentine,,(none),,(none),melvb001,Bob Melvin,lovut001,Tony Lovullo,mantj002,Joe Mantiply,suarr002,Robert Suarez,,(none),beers001,Seth Beer,darvy001,Yu Darvish,bumgm001,Madison Bumgarner,nolaa002,Austin Nola,2,machm001,Manny Machado,5,cronj001,Jake Cronenworth,4,voitl001,Luke Voit,10,myerw001,Wil Myers,9,hosme001,Eric Hosmer,3,profj001,Jurickson Profar,7,kim-h002,Ha-Seong Kim,6,grist001,Trent Grisham,8,varsd001,Daulton Varsho,8,martk001,Ketel Marte,4,perad001,David Peralta,7,walkc002,Christian Walker,3,smitp002,Pavin Smith,9,kellc002,Carson Kelly,2,beers001,Seth Beer,10,ellid002,Drew Ellis,5,perdg001,Gerardo Perdomo,6,,Y
1,20220407,0,Thu,CIN,NL,1,ATL,NL,1,6,3,54,N,,,,ATL03,40545.0,181,12003000,001000020,35,10,0,0,1,6,0,1,1,1,0,13,0,0,1,0,5,5,2,2,0,0,27,7,1,0,0,0,31,4,0,0,1,2,0,0,0,5,0,13,0,0,0,0,6,4,6,6,0,0,27,7,0,0,1,0,laynj901,Jerry Layne,wendh902,Hunter Wendelstedt,whitc901,Chad Whitson,hamaa901,Adam Hamari,,(none),,(none),belld002,David Bell,snitb801,Brian Snitker,mahlt001,Tyler Mahle,friem001,Max Fried,santt001,Tony Santillan,farmk001,Kyle Farmer,mahlt001,Tyler Mahle,friem001,Max Fried,indij001,Jonathan India,4,aquia001,Aristides Aquino,9,phamt001,Tommy Pham,7,vottj001,Joey Votto,3,stept001,Tyler Stephenson,2,senzn001,Nick Senzel,8,mousm001,Mike Moustakas,10,farmk001,Kyle Farmer,6,drurb001,Brandon Drury,5,rosae001,Eddie Rosario,9,olsom001,Matt Olson,3,rilea001,Austin Riley,5,ozunm001,Marcell Ozuna,7,albio001,Ozzie Albies,4,duvaa001,Adam Duvall,8,darnt001,Travis d'Arnaud,2,dicka001,Alex Dickerson,10,swand001,Dansby Swanson,6,,Y
2,20220407,0,Thu,MIL,NL,1,CHN,NL,1,4,5,51,D,,,,CHI11,35112.0,198,100210,00003020x,33,10,4,0,0,4,0,2,1,4,0,9,1,0,2,0,9,3,5,5,0,0,24,10,0,0,1,0,29,8,3,0,1,5,0,1,1,4,0,7,0,1,1,0,6,6,4,4,0,0,27,9,0,0,2,0,barrt901,Ted Barrett,barkl901,Lance Barksdale,lentn901,Nic Lentz,cejan901,Nestor Ceja,,(none),,(none),counc001,Craig Counsell,rossd001,David Ross,givem001,Mychal Givens,ashba003,Aaron Ashby,robed002,David Robertson,happi001,Ian Happ,burnc002,Corbin Burnes,hendk001,Kyle Hendricks,wongk001,Kolten Wong,4,adamw002,Willy Adames,6,yelic001,Christian Yelich,7,mccua001,Andrew McCutchen,10,tellr001,Rowdy Tellez,3,renfh001,Hunter Renfroe,9,narvo001,Omar Narvaez,2,cainl001,Lorenzo Cain,8,petej002,Jace Peterson,5,orter001,Rafael Ortega,10,madrn001,Nick Madrigal,4,contw001,Willson Contreras,2,happi001,Ian Happ,7,schwf001,Frank Schwindel,3,suzus001,Seiya Suzuki,9,heywj001,Jason Heyward,8,wisdp001,Patrick Wisdom,5,hoern001,Nico Hoerner,6,,Y
3,20220407,0,Thu,PIT,NL,1,SLN,NL,1,0,9,51,D,,,,STL10,46256.0,188,0,13000104x,30,6,0,0,0,0,0,0,0,2,0,9,0,1,2,0,5,6,9,9,0,0,24,7,2,0,1,0,31,8,2,0,3,9,0,2,1,7,0,5,1,0,1,0,8,4,0,0,1,0,27,11,0,0,3,0,reynj901,Jim Reynolds,ticht901,Todd Tichenor,muchm901,Mike Muchlinski,merzd901,Dan Merzel,,(none),,(none),sheld801,Derek Shelton,marmo801,Oliver Marmol,waina001,Adam Wainwright,brubj001,JT Brubaker,,(none),oneit001,Tyler O'Neill,brubj001,JT Brubaker,waina001,Adam Wainwright,voged001,Daniel Vogelbach,10,reynb001,Bryan Reynolds,8,hayek001,Ke'Bryan Hayes,5,tsuty001,Yoshi Tsutsugo,3,newmk001,Kevin Newman,6,tuckc001,Cole Tucker,9,gameb001,Ben Gamel,7,perer003,Roberto Perez,2,parkh001,Hoy Jun Park,4,carld002,Dylan Carlson,9,goldp001,Paul Goldschmidt,3,oneit001,Tyler O'Neill,7,arenn001,Nolan Arenado,5,pujoa001,Albert Pujols,10,dejop001,Paul DeJong,6,moliy001,Yadier Molina,2,badeh001,Harrison Bader,8,edmat001,Tommy Edman,4,,Y
4,20220407,0,Thu,NYN,NL,1,WAS,NL,1,5,1,54,N,,,,WAS11,35052.0,211,22100,000001000,35,12,2,0,0,5,0,0,3,4,0,9,0,1,2,0,10,5,1,1,0,0,27,10,1,0,1,0,32,6,1,0,1,1,1,0,0,2,0,10,0,0,1,0,7,6,5,5,1,0,27,10,0,0,2,0,carlm901,Mark Carlson,guccc901,Chris Guccione,bakej902,Jordan Baker,addir901,Ryan Additon,,(none),,(none),showb801,Buck Showalter,martd002,Dave Martinez,megit002,Tylor Megill,corbp001,Patrick Corbin,,(none),mccaj001,James McCann,megit002,Tylor Megill,corbp001,Patrick Corbin,marts002,Starling Marte,9,davij006,J.D. Davis,10,lindf001,Francisco Lindor,6,alonp001,Pete Alonso,3,escoe001,Eduardo Escobar,5,canor001,Robinson Cano,4,canhm001,Mark Canha,8,mcnej002,Jeff McNeil,7,mccaj001,James McCann,2,hernc005,Cesar Hernandez,4,sotoj001,Juan Soto,9,cruzn002,Nelson Cruz,10,bellj005,Josh Bell,3,ruizk001,Keibert Ruiz,2,thoml002,Lane Thomas,7,franm004,Maikel Franco,5,escoa003,Alcides Escobar,6,roblv001,Victor Robles,8,,Y


In [4]:
# Get colnames

In [5]:
colnames = ['date','dblheader_code','day_of_week','team_v','league_v','game_no_v',
           'team_h','league_h','game_no_h', 'runs_v', 'runs_h','outs_total','day_night',
            'completion_info','forfeit_info','protest_info','ballpark_id','attendance','game_minutes',
            'linescore_v','linescore_h',
           'AB_v','H_v','2B_v','3B_v','HR_v','RBI_v','SH_v','SF_v','HBP_v','BB_v','IBB_v','SO_v',
            'SB_v', 'CS_v','GIDP_v','CI_v','LOB_v',
            'P_num_v','ERind_v','ERteam_v','WP_v','balk_v',
            'PO_v','ASST_v','ERR_v','PB_v','DP_v','TP_v',
           'AB_h', 'H_h', '2B_h', '3B_h', 'HR_h', 'RBI_h', 'SH_h', 'SF_h', 'HBP_h', 'BB_h', 'IBB_h','SO_h',
            'SB_h', 'CS_h', 'GIDP_h', 'CI_h', 'LOB_h',
            'P_num_h', 'ERind_h', 'ERteam_h', 'WP_h', 'balk_h',
            'PO_h', 'ASST_h', 'ERR_h', 'PB_h', 'DP_h', 'TP_h',
            'ump_HB_id', 'ump_HB_name','ump_1B_id', 'ump_1B_name','ump_2B_id', 'ump_2B_name',
            'ump_3B_id', 'ump_3B_name','ump_LF_id', 'ump_LF_name','ump_RF_id', 'ump_RF_name',
            'mgr_id_v', 'mgr_name_v', 'mgr_id_h', 'mgr_name_h',
            'pitcher_id_w','pitcher_name_w','pitcher_id_l','pitcher_name_l','pitcher_id_s','pitcher_name_s',
            'GWRBI_id','GWRBI_name','pitcher_start_id_v','pitcher_start_name_v','pitcher_start_id_h','pitcher_start_name_h',
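            # NOTE: the raw files list each lineup slot as (player ID, name, position), so the
            # columns labeled 'batterN_name_*' below hold IDs and 'batterN_id_*' hold names.
            'batter1_name_v', 'batter1_id_v', 'batter1_pos_v', 'batter2_name_v', 'batter2_id_v', 'batter2_pos_v',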
            'batter3_name_v', 'batter3_id_v', 'batter3_pos_v', 'batter4_name_v', 'batter4_id_v', 'batter4_pos_v',
            'batter5_name_v', 'batter5_id_v', 'batter5_pos_v', 'batter6_name_v', 'batter6_id_v', 'batter6_pos_v',
            'batter7_name_v', 'batter7_id_v', 'batter7_pos_v', 'batter8_name_v', 'batter8_id_v', 'batter8_pos_v',
            'batter9_name_v', 'batter9_id_v', 'batter9_pos_v', 'batter1_name_h', 'batter1_id_h', 'batter1_pos_h',
            'batter2_name_h', 'batter2_id_h', 'batter2_pos_h', 'batter3_name_h', 'batter3_id_h', 'batter3_pos_h',
            'batter4_name_h', 'batter4_id_h', 'batter4_pos_h', 'batter5_name_h', 'batter5_id_h', 'batter5_pos_h',
            'batter6_name_h', 'batter6_id_h', 'batter6_pos_h', 'batter7_name_h', 'batter7_id_h', 'batter7_pos_h',
            'batter8_name_h', 'batter8_id_h', 'batter8_pos_h', 'batter9_name_h', 'batter9_id_h', 'batter9_pos_h',           
           'misc_info','acqui_info'
           ]
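
A long hand-typed list like this is easy to get subtly wrong, so a quick sanity check before assigning it can be worthwhile. This is a minimal sketch; `check_column_names` is a hypothetical helper, not something used elsewhere in this notebook.

```python
from collections import Counter


def check_column_names(names, n_fields):
    """Raise ValueError if names has the wrong length or contains duplicates.

    Hypothetical helper: n_fields is the number of columns in the raw file.
    """
    if len(names) != n_fields:
        raise ValueError(f"expected {n_fields} names, got {len(names)}")
    dupes = sorted(name for name, count in Counter(names).items() if count > 1)
    if dupes:
        raise ValueError(f"duplicate names: {dupes}")
    return True
```

For example, `check_column_names(colnames, df.shape[1])` could be run just before `df.columns = colnames`.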


In [6]:
df.columns = colnames

In [7]:
df.sample(10)

Unnamed: 0,date,dblheader_code,day_of_week,team_v,league_v,game_no_v,team_h,league_h,game_no_h,runs_v,runs_h,outs_total,day_night,completion_info,forfeit_info,protest_info,ballpark_id,attendance,game_minutes,linescore_v,linescore_h,AB_v,H_v,2B_v,3B_v,HR_v,RBI_v,SH_v,SF_v,HBP_v,BB_v,IBB_v,SO_v,SB_v,CS_v,GIDP_v,CI_v,LOB_v,P_num_v,ERind_v,ERteam_v,WP_v,balk_v,PO_v,ASST_v,ERR_v,PB_v,DP_v,TP_v,AB_h,H_h,2B_h,3B_h,HR_h,RBI_h,SH_h,SF_h,HBP_h,BB_h,IBB_h,SO_h,SB_h,CS_h,GIDP_h,CI_h,LOB_h,P_num_h,ERind_h,ERteam_h,WP_h,balk_h,PO_h,ASST_h,ERR_h,PB_h,DP_h,TP_h,ump_HB_id,ump_HB_name,ump_1B_id,ump_1B_name,ump_2B_id,ump_2B_name,ump_3B_id,ump_3B_name,ump_LF_id,ump_LF_name,ump_RF_id,ump_RF_name,mgr_id_v,mgr_name_v,mgr_id_h,mgr_name_h,pitcher_id_w,pitcher_name_w,pitcher_id_l,pitcher_name_l,pitcher_id_s,pitcher_name_s,GWRBI_id,GWRBI_name,pitcher_start_id_v,pitcher_start_name_v,pitcher_start_id_h,pitcher_start_name_h,batter1_name_v,batter1_id_v,batter1_pos_v,batter2_name_v,batter2_id_v,batter2_pos_v,batter3_name_v,batter3_id_v,batter3_pos_v,batter4_name_v,batter4_id_v,batter4_pos_v,batter5_name_v,batter5_id_v,batter5_pos_v,batter6_name_v,batter6_id_v,batter6_pos_v,batter7_name_v,batter7_id_v,batter7_pos_v,batter8_name_v,batter8_id_v,batter8_pos_v,batter9_name_v,batter9_id_v,batter9_pos_v,batter1_name_h,batter1_id_h,batter1_pos_h,batter2_name_h,batter2_id_h,batter2_pos_h,batter3_name_h,batter3_id_h,batter3_pos_h,batter4_name_h,batter4_id_h,batter4_pos_h,batter5_name_h,batter5_id_h,batter5_pos_h,batter6_name_h,batter6_id_h,batter6_pos_h,batter7_name_h,batter7_id_h,batter7_pos_h,batter8_name_h,batter8_id_h,batter8_pos_h,batter9_name_h,batter9_id_h,batter9_pos_h,misc_info,acqui_info
1619,20220807,0,Sun,ATL,NL,110,NYN,NL,109,2,5,51,D,,,,NYC20,37717.0,178,2000,00401000x,29,2,0,0,1,2,0,0,0,1,0,19,0,0,0,0,1,5,5,5,1,0,24,7,0,0,0,0,34,10,4,0,0,4,0,0,0,3,1,8,0,0,0,0,8,3,2,2,0,0,27,2,0,0,0,0,buckc901,CB Bucknor,sches901,Stu Scheurwater,nelsj901,Jeff Nelson,gonzm901,Manny Gonzalez,,(none),,(none),snitb801,Brian Snitker,showb801,Buck Showalter,degrj001,Jacob deGrom,stris002,Spencer Strider,diaze006,Edwin Diaz,alonp001,Pete Alonso,stris002,Spencer Strider,degrj001,Jacob deGrom,swand001,Dansby Swanson,6,olsom001,Matt Olson,3,rilea001,Austin Riley,5,rosae001,Eddie Rosario,7,contw002,William Contreras,2,grosr001,Robbie Grossman,9,ozunm001,Marcell Ozuna,10,harrm004,Michael Harris,8,adrie001,Ehire Adrianza,4,nimmb001,Brandon Nimmo,8,marts002,Starling Marte,9,lindf001,Francisco Lindor,6,alonp001,Pete Alonso,3,voged001,Daniel Vogelbach,10,mcnej002,Jeff McNeil,4,canhm001,Mark Canha,7,guill001,Luis Guillorme,5,nidot001,Tomas Nido,2,,Y
1911,20220828,0,Sun,DET,AL,128,TEX,AL,127,9,8,54,D,,,,ARL03,24938.0,215,31203000,000002033,39,13,5,0,1,9,0,0,1,4,0,8,0,0,1,0,8,5,8,8,3,0,27,15,3,1,3,0,37,10,2,0,4,8,0,0,0,4,0,6,1,0,3,0,6,6,9,9,0,0,27,9,0,0,1,0,ticht901,Todd Tichenor,bacoj901,John Bacon,littw901,Will Little,drakr901,Rob Drake,,(none),,(none),hinca001,A.J. Hinch,beast801,Tony Beasley,hutcd001,Drew Hutchison,arihk001,Kohei Arihara,jimej003,Joe Jimenez,carpk001,Kerry Carpenter,hutcd001,Drew Hutchison,arihk001,Kohei Arihara,greer003,Riley Greene,8,reyev001,Victor Reyes,9,baezj001,Javier Baez,6,casth001,Harold Castro,3,haase001,Eric Haase,2,carpk001,Kerry Carpenter,10,candj002,Jeimer Candelario,5,clemk001,Kody Clemens,4,badda001,Akil Baddoo,7,semim001,Marcus Semien,4,seagc001,Corey Seager,6,lowen001,Nate Lowe,3,garca005,Adolis Garcia,9,calhk001,Kole Calhoun,7,tavel001,Leody Taveras,8,millb002,Brad Miller,10,vilom001,Meibrys Viloria,2,durae002,Ezequiel Duran,5,,Y
464,20220512,0,Thu,PHI,NL,32,LAN,NL,30,9,7,54,N,,,,LOS03,46539.0,209,130111002,001002040,37,12,2,2,2,8,0,2,0,1,0,10,2,0,0,0,4,6,7,7,0,0,27,5,0,0,0,0,39,12,3,0,1,7,0,0,0,5,0,10,0,0,0,0,10,3,9,9,1,0,27,5,1,0,0,0,hobep901,Pat Hoberg,emmep901,Paul Emmel,johna901,Adrian Johnson,mosce901,Edwin Moscoso,,(none),,(none),giraj001,Joe Girardi,robed001,Dave Roberts,bella001,Andrew Bellatti,hudsd001,Daniel Hudson,knebc001,Corey Knebel,,(none),wheez001,Zack Wheeler,andet002,Tyler Anderson,hoskr001,Rhys Hoskins,3,bohma001,Alec Bohm,5,harpb003,Bryce Harper,10,castn001,Nick Castellanos,9,seguj002,Jean Segura,4,realj001,J.T. Realmuto,2,schwk001,Kyle Schwarber,7,camaj001,Johan Camargo,6,quinr003,Roman Quinn,8,bettm001,Mookie Betts,9,freef001,Freddie Freeman,3,turnt001,Trea Turner,6,muncm001,Max Muncy,5,smitw003,Will Smith,2,riose001,Edwin Rios,10,bellc002,Cody Bellinger,8,taylc001,Chris Taylor,7,lux-g001,Gavin Lux,4,,Y
800,20220605,0,Sun,SDN,NL,54,MIL,NL,56,6,4,60,D,,,,MIL06,32285.0,210,300003,1000000201,40,9,2,0,1,6,0,0,0,2,0,11,0,0,0,0,7,5,3,3,0,0,30,10,0,0,1,0,37,8,0,0,2,4,0,0,0,4,0,10,0,0,1,0,8,5,3,3,0,1,30,7,1,0,0,0,rehaj901,Jeremie Rehak,porta901,Alan Porter,mahrn901,Nick Mahrley,littw901,Will Little,,(none),,(none),melvb001,Bob Melvin,counc001,Craig Counsell,hillt002,Tim Hill,gottt001,Trevor Gott,roget001,Taylor Rogers,cronj001,Jake Cronenworth,clevm001,Mike Clevinger,lauee001,Eric Lauer,profj001,Jurickson Profar,7,cronj001,Jake Cronenworth,4,machm001,Manny Machado,5,voitl001,Luke Voit,10,hosme001,Eric Hosmer,3,kim-h002,Ha-Seong Kim,6,nolaa002,Austin Nola,2,grist001,Trent Grisham,8,azocj001,Jose Azocar,9,wongk001,Kolten Wong,4,taylt002,Tyrone Taylor,8,yelic001,Christian Yelich,7,mccua001,Andrew McCutchen,9,tellr001,Rowdy Tellez,3,hiurk001,Keston Hiura,10,petej002,Jace Peterson,5,carav001,Victor Caratini,2,reyep001,Pablo Reyes,6,,Y
765,20220603,0,Fri,WAS,NL,54,CIN,NL,51,8,5,54,N,,,,CIN09,19032.0,189,14010200,200000030,40,12,2,0,5,8,0,0,0,2,0,9,0,0,0,0,7,5,4,4,2,0,27,4,1,0,1,0,33,6,0,0,2,5,0,0,0,5,0,11,0,0,0,0,6,6,7,7,0,0,27,6,2,0,0,0,ramoc901,Charlie Ramos,morag901,Gabe Morales,lives901,Shane Livensparger,fleta901,Andy Fletcher,,(none),,(none),martd002,Dave Martinez,belld002,David Bell,grayj004,Josiah Gray,minom001,Mike Minor,raint003,Tanner Rainey,thoml002,Lane Thomas,grayj004,Josiah Gray,minom001,Mike Minor,hernc005,Cesar Hernandez,4,thoml002,Lane Thomas,7,sotoj001,Juan Soto,9,cruzn002,Nelson Cruz,10,bellj005,Josh Bell,3,franm004,Maikel Franco,5,adamr004,Riley Adams,2,garcl006,Luis Garcia,6,roblv001,Victor Robles,8,senzn001,Nick Senzel,8,drurb001,Brandon Drury,5,phamt001,Tommy Pham,7,vottj001,Joey Votto,3,stept001,Tyler Stephenson,2,mousm001,Mike Moustakas,10,almoa002,Albert Almora,9,lopea005,Alejo Lopez,4,reynm003,Matt Reynolds,6,,Y
1918,20220829,0,Mon,NYA,AL,129,ANA,AL,129,3,4,51,N,,,,ANA01,44537.0,180,1100010,01012000x,32,6,1,0,2,3,1,0,0,3,2,9,0,0,0,0,6,3,4,4,1,0,24,11,0,0,1,0,32,9,0,0,3,4,0,0,1,0,0,8,0,0,1,0,5,4,3,3,0,0,27,6,0,0,0,0,wolcq901,Quinn Wolcott,porta901,Alan Porter,addir901,Ryan Additon,riggj901,Jeremy Riggs,,(none),,(none),boona001,Aaron Boone,nevip001,Phil Nevin,suarj001,Jose Suarez,montf001,Frankie Montas,hergj001,Jimmy Herget,ohtas001,Shohei Ohtani,montf001,Frankie Montas,suarj001,Jose Suarez,lemad001,DJ LeMahieu,5,judga001,Aaron Judge,8,benia002,Andrew Benintendi,7,stanm004,Giancarlo Stanton,10,torrg001,Gleyber Torres,4,rizza001,Anthony Rizzo,3,trevj001,Jose Trevino,2,kinei001,Isiah Kiner-Falefa,6,cabro002,Oswaldo Cabrera,9,fletd002,David Fletcher,4,troum001,Mike Trout,8,ohtas001,Shohei Ohtani,10,rengl001,Luis Rengifo,5,wardt002,Taylor Ward,9,fordm002,Mike Ford,3,adelj001,Jo Adell,7,thaim001,Matt Thaiss,2,velaa001,Andrew Velazquez,6,,Y
1250,20220708,0,Fri,PHI,NL,84,SLN,NL,86,2,0,54,N,,,,STL10,41100.0,143,1010,000000000,31,5,0,0,2,2,0,0,0,0,0,3,0,0,1,0,2,3,0,0,0,1,27,9,0,0,0,0,33,6,0,0,0,0,0,0,0,1,0,5,0,0,0,0,7,1,2,2,0,0,27,13,0,0,1,0,drakr901,Rob Drake,tumpj901,John Tumpane,ticht901,Todd Tichenor,bacoj901,John Bacon,,(none),,(none),thomr003,Robby Thompson,marmo801,Oliver Marmol,wheez001,Zack Wheeler,waina001,Adam Wainwright,handb001,Brad Hand,bohma001,Alec Bohm,wheez001,Zack Wheeler,waina001,Adam Wainwright,schwk001,Kyle Schwarber,7,hoskr001,Rhys Hoskins,3,castn001,Nick Castellanos,9,halld003,Darick Hall,10,realj001,J.T. Realmuto,2,gregd001,Didi Gregorius,6,stotb001,Bryson Stott,4,bohma001,Alec Bohm,5,herro001,Odubel Herrera,8,donob001,Brendan Donovan,7,yepej001,Juan Yepez,3,goldp001,Paul Goldschmidt,10,arenn001,Nolan Arenado,5,gormn001,Nolan Gorman,4,carld002,Dylan Carlson,8,edmat001,Tommy Edman,6,capec001,Conner Capel,9,kniza001,Andrew Knizner,2,,Y
1935,20220830,0,Tue,CHN,NL,130,TOR,AL,128,3,5,51,N,,,,TOR02,33759.0,183,101100,00001310x,32,7,1,0,2,3,0,0,1,3,0,12,0,1,1,0,6,4,5,5,0,0,24,13,0,1,0,0,30,6,1,0,2,5,0,0,2,4,0,3,0,0,0,0,7,5,3,3,0,0,27,7,0,0,2,0,fairc901,Chad Fairchild,mahrn901,Nick Mahrley,gibsh902,Tripp Gibson,onorb901,Brian O'Nora,,(none),,(none),rossd001,David Ross,schnj801,John Schneider,gausk001,Kevin Gausman,littb002,Brendon Little,romaj004,Jordan Romano,hernt002,Teoscar Hernandez,strom001,Marcus Stroman,gausk001,Kevin Gausman,madrn001,Nick Madrigal,4,contw001,Willson Contreras,2,happi001,Ian Happ,7,suzus001,Seiya Suzuki,9,reyef001,Franmil Reyes,10,hoern001,Nico Hoerner,6,velan001,Nelson Velazquez,8,higgp001,P.J. Higgins,3,morec001,Christopher Morel,5,sprig001,George Springer,10,guerv002,Vladimir Guerrero,3,gurrl001,Lourdes Gurriel,7,bichb001,Bo Bichette,6,chapm001,Matt Chapman,5,hernt002,Teoscar Hernandez,9,espis001,Santiago Espinal,4,jansd001,Danny Jansen,2,bradj001,Jackie Bradley,8,,Y
2106,20220912,0,Mon,HOU,AL,141,DET,AL,141,7,0,54,N,,,,DET05,13054.0,163,201020002,000000000,39,13,2,0,0,6,0,0,1,2,0,5,2,0,1,0,8,1,0,0,1,0,27,11,1,0,1,0,31,6,1,0,0,0,0,0,0,1,0,8,0,0,1,0,5,4,7,7,1,0,27,14,0,0,1,0,blasc901,Cory Blaser,rippm901,Mark Ripperger,cuzzp901,Phil Cuzzi,cejan901,Nestor Ceja,,(none),,(none),baked002,Dusty Baker,hinca001,A.J. Hinch,valdf001,Framber Valdez,rodre004,Eduardo Rodriguez,,(none),penaj004,Jeremy Pena,valdf001,Framber Valdez,rodre004,Eduardo Rodriguez,altuj001,Jose Altuve,4,penaj004,Jeremy Pena,6,alvay001,Yordan Alvarez,7,brega001,Alex Bregman,5,tuckk001,Kyle Tucker,9,gurry001,Yulieski Gurriel,3,manct001,Trey Mancini,10,mccoc001,Chas McCormick,8,maldm001,Martin Maldonado,2,greer003,Riley Greene,8,castw003,Willi Castro,9,baezj001,Javier Baez,6,haase001,Eric Haase,2,torks001,Spencer Torkelson,3,candj002,Jeimer Candelario,10,kreir001,Ryan Kreidler,5,schoj001,Jonathan Schoop,4,reyev001,Victor Reyes,7,,Y
937,20220615,0,Wed,OAK,AL,64,BOS,AL,63,1,10,51,N,,,,BOS07,31877.0,188,1000,12120202x,33,7,1,0,1,1,0,0,0,1,0,6,0,0,1,0,6,5,9,9,0,0,24,7,1,0,0,0,36,13,3,0,2,9,0,1,0,7,0,7,0,0,0,0,10,5,1,1,0,0,27,16,0,0,1,0,barkl901,Lance Barksdale,drecb901,Bruce Dreckman,cejan901,Nestor Ceja,barrt901,Ted Barrett,,(none),,(none),kotsm001,Mark Kotsay,coraa001,Alex Cora,wincj001,Josh Winckowski,kaprj001,James Kaprielian,,(none),verda001,Alex Verdugo,kaprj001,James Kaprielian,wincj001,Josh Winckowski,kempt001,Tony Kemp,4,laurr001,Ramon Laureano,8,brows003,Seth Brown,7,bethc001,Christian Bethancourt,3,vogts001,Stephen Vogt,10,andre001,Elvis Andrus,6,murps001,Sean Murphy,2,barrl001,Luis Barrera,9,bridj001,Jonah Bride,5,duraj001,Jarren Duran,8,dever001,Rafael Devers,5,martj006,J.D. Martinez,10,bogax001,Xander Bogaerts,6,verda001,Alex Verdugo,7,stort001,Trevor Story,4,cordf003,Franchy Cordero,3,plawk001,Kevin Plawecki,2,bradj001,Jackie Bradley,9,,Y


## Create a Table with every game since 1980

In [8]:
# Read one game-log file per season, then stack them into a single dataframe
season_dfs = []
for year in range(1980, 2023):
    fname = '/Users/brianlucena/Desktop/Work/baseball/data/game_data/gl' + str(year) + '.txt'
    df_temp = pd.read_csv(fname, header=None)
    df_temp.columns = colnames
    df_temp['season'] = year
    season_dfs.append(df_temp)
df = pd.concat(season_dfs)


In [9]:
df.shape

(96276, 162)

In [10]:
df.info(max_cols=200)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 96276 entries, 0 to 2429
Data columns (total 162 columns):
 #    Column                Non-Null Count  Dtype  
---   ------                --------------  -----  
 0    date                  96276 non-null  int64  
 1    dblheader_code        96276 non-null  int64  
 2    day_of_week           96276 non-null  object 
 3    team_v                96276 non-null  object 
 4    league_v              96276 non-null  object 
 5    game_no_v             96276 non-null  int64  
 6    team_h                96276 non-null  object 
 7    league_h              96276 non-null  object 
 8    game_no_h             96276 non-null  int64  
 9    runs_v                96276 non-null  int64  
 10   runs_h                96276 non-null  int64  
 11   outs_total            96276 non-null  int64  
 12   day_night             96276 non-null  object 
 13   completion_info       83 non-null     object 
 14   forfeit_info          1 non-null      object 
 ...

In [11]:
# Create columns for outcomes

In [12]:
## Calculate a few useful columns
df['run_diff'] = df['runs_h']-df['runs_v']
df['home_victory'] = (df['run_diff']>0).astype(int)
df['run_total'] = df['runs_h'] + df['runs_v']
df['date_dblhead'] = (df['date'].astype(str) + df['dblheader_code'].astype(str)).astype(int)
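
To see what these derived columns look like, here is a small sketch on made-up rows: one single game followed by both games of a doubleheader. The scores and dates are invented for illustration only.

```python
import pandas as pd

# Toy rows (made-up scores): a single game, then both games of a doubleheader
toy = pd.DataFrame({
    'date': [20220501, 20220502, 20220502],
    'dblheader_code': [0, 1, 2],
    'runs_v': [3, 5, 2],
    'runs_h': [4, 1, 2],
})

toy['run_diff'] = toy['runs_h'] - toy['runs_v']
toy['home_victory'] = (toy['run_diff'] > 0).astype(int)
toy['run_total'] = toy['runs_h'] + toy['runs_v']
# Appending the doubleheader code to the date gives an integer key
# that sorts games chronologically, even within a doubleheader
toy['date_dblhead'] = (toy['date'].astype(str) + toy['dblheader_code'].astype(str)).astype(int)
```

Note that with this definition a tied game (the third toy row) counts as `home_victory = 0`.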


In [13]:
# Do some basic exploration

In [14]:
df.home_victory.mean()

0.538368856205077

### Big picture: we have an outcome, now we need features
- Start simple: base the features on each team's results over its past *n* games
- This requires a bit of "data wrangling"
- First, we need each team's games in chronological order
- Then we can use pandas `rolling` functionality to aggregate over windows of past games
- Immediate goal: batting average, on-base percentage, and slugging percentage
- These must be "running values" based only on the past, not including the current game
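
The rolling-window idea in the last bullets can be sketched on toy data. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the numbers are made up, the column names `H_prev`, `AB_prev`, and `BATAVG_prev` are hypothetical, and a window of 3 stands in for the 162 games used in this notebook.

```python
import pandas as pd

# Toy per-game totals for one team, already in chronological order (made-up numbers)
games = pd.DataFrame({
    'H':  [8, 5, 10, 7, 9],
    'AB': [32, 30, 36, 33, 35],
})

window = 3  # small window for illustration only

# shift(1) removes the current game from its own window, so each row
# only aggregates games that were played before it
for col in ['H', 'AB']:
    games[f'{col}_prev'] = games[col].shift(1).rolling(window, min_periods=1).sum()

# Batting average over the previous `window` games (NaN for the first game,
# which has no history)
games['BATAVG_prev'] = games['H_prev'] / games['AB_prev']
```

Shifting before rolling is one simple way to keep the current game out of its own features. With `min_periods=1`, early-season rows use however many past games are available rather than being dropped.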

In [15]:
# Let's look at a single team's games

In [16]:
df_mets = df.loc[((df.team_v=='NYN') | (df.team_h=='NYN')), :]

In [17]:
df_mets.shape

(6736, 166)

In [18]:
df_mets.head(10)

Unnamed: 0,date,dblheader_code,day_of_week,team_v,league_v,game_no_v,team_h,league_h,game_no_h,runs_v,runs_h,outs_total,day_night,completion_info,forfeit_info,protest_info,ballpark_id,attendance,game_minutes,linescore_v,linescore_h,AB_v,H_v,2B_v,3B_v,HR_v,RBI_v,SH_v,SF_v,HBP_v,BB_v,IBB_v,SO_v,SB_v,CS_v,GIDP_v,CI_v,LOB_v,P_num_v,ERind_v,ERteam_v,WP_v,balk_v,PO_v,ASST_v,ERR_v,PB_v,DP_v,TP_v,AB_h,H_h,2B_h,3B_h,HR_h,RBI_h,SH_h,SF_h,HBP_h,BB_h,IBB_h,SO_h,SB_h,CS_h,GIDP_h,CI_h,LOB_h,P_num_h,ERind_h,ERteam_h,WP_h,balk_h,PO_h,ASST_h,ERR_h,PB_h,DP_h,TP_h,ump_HB_id,ump_HB_name,ump_1B_id,ump_1B_name,ump_2B_id,ump_2B_name,ump_3B_id,ump_3B_name,ump_LF_id,ump_LF_name,ump_RF_id,ump_RF_name,mgr_id_v,mgr_name_v,mgr_id_h,mgr_name_h,pitcher_id_w,pitcher_name_w,pitcher_id_l,pitcher_name_l,pitcher_id_s,pitcher_name_s,GWRBI_id,GWRBI_name,pitcher_start_id_v,pitcher_start_name_v,pitcher_start_id_h,pitcher_start_name_h,batter1_name_v,batter1_id_v,batter1_pos_v,batter2_name_v,batter2_id_v,batter2_pos_v,batter3_name_v,batter3_id_v,batter3_pos_v,batter4_name_v,batter4_id_v,batter4_pos_v,batter5_name_v,batter5_id_v,batter5_pos_v,batter6_name_v,batter6_id_v,batter6_pos_v,batter7_name_v,batter7_id_v,batter7_pos_v,batter8_name_v,batter8_id_v,batter8_pos_v,batter9_name_v,batter9_id_v,batter9_pos_v,batter1_name_h,batter1_id_h,batter1_pos_h,batter2_name_h,batter2_id_h,batter2_pos_h,batter3_name_h,batter3_id_h,batter3_pos_h,batter4_name_h,batter4_id_h,batter4_pos_h,batter5_name_h,batter5_id_h,batter5_pos_h,batter6_name_h,batter6_id_h,batter6_pos_h,batter7_name_h,batter7_id_h,batter7_pos_h,batter8_name_h,batter8_id_h,batter8_pos_h,batter9_name_h,batter9_id_h,batter9_pos_h,misc_info,acqui_info,season,run_diff,home_victory,run_total,date_dblhead
8,19800410,0,Thu,CHN,NL,1,NYN,NL,1,2,5,51,D,,,,NYC17,12219.0,143,1100,01000400x,34,7,3,0,0,2,0,0,0,1,0,3,0,0,1,0,6,3,4,4,0,0,24,12,1,0,1,0,29,8,1,0,0,4,0,1,0,6,2,5,0,2,1,0,7,2,2,2,0,0,27,9,1,0,1,0,kiblj901,John Kibler,froeb901,Bruce Froemming,tatat901,Terry Tata,rennd901,Dutch Rennert,,(none),,(none),gomep101,Preston Gomez,torrj101,Joe Torre,swanc001,Craig Swan,reusr001,Rick Reuschel,allen001,Neil Allen,,(none),reusr001,Rick Reuschel,swanc001,Craig Swan,randl101,Lenny Randle,4,dejei001,Ivan De Jesus,6,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,hendk101,Ken Henderson,9,ontis101,Steve Ontiveros,5,lezcc101,Carlos Lezcano,8,blact101,Tim Blackwell,2,reusr001,Rick Reuschel,1,tavef101,Frank Taveras,6,madde101,Elliott Maddox,5,mazzl001,Lee Mazzilli,3,hends001,Steve Henderson,7,jorgm001,Mike Jorgensen,9,steaj001,John Stearns,2,moraj101,Jerry Morales,8,flynd001,Doug Flynn,4,swanc001,Craig Swan,1,,Y,1980,3,1,7,198004100
19,19800411,0,Fri,CHN,NL,2,NYN,NL,2,7,5,54,D,,,,NYC17,4460.0,168,23100100,012000101,37,11,1,0,5,7,0,0,0,7,1,5,1,0,1,0,10,3,5,5,1,0,27,17,0,0,1,0,38,13,3,0,0,5,1,0,0,3,0,8,1,0,1,0,10,5,7,7,0,0,27,12,0,0,1,0,froeb901,Bruce Froemming,tatat901,Terry Tata,rennd901,Dutch Rennert,brocf901,Fred Brocklander,,(none),,(none),gomep101,Preston Gomez,torrj101,Joe Torre,lampd001,Dennis Lamp,burrr001,Ray Burris,suttb001,Bruce Sutter,kingd001,Dave Kingman,lampd001,Dennis Lamp,burrr001,Ray Burris,randl101,Lenny Randle,4,dejei001,Ivan De Jesus,6,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,hendk101,Ken Henderson,9,martj001,Jerry Martin,8,ontis101,Steve Ontiveros,5,footb101,Barry Foote,2,lampd001,Dennis Lamp,1,tavef101,Frank Taveras,6,madde101,Elliott Maddox,5,mazzl001,Lee Mazzilli,3,hends001,Steve Henderson,7,jorgm001,Mike Jorgensen,9,steaj001,John Stearns,2,moraj101,Jerry Morales,8,flynd001,Doug Flynn,4,burrr001,Ray Burris,1,,Y,1980,-2,0,12,198004110
31,19800412,0,Sat,CHN,NL,3,NYN,NL,3,6,3,54,D,,,,NYC17,10781.0,149,420,010001100,37,14,1,0,2,6,0,1,0,3,1,2,1,1,3,0,8,2,2,2,0,0,27,11,2,0,1,0,29,6,0,0,2,3,0,1,1,5,1,6,1,1,0,0,6,4,5,5,2,0,27,16,2,0,4,0,tatat901,Terry Tata,rennd901,Dutch Rennert,brocf901,Fred Brocklander,kiblj901,John Kibler,,(none),,(none),gomep101,Preston Gomez,torrj101,Joe Torre,krukm001,Mike Krukow,allen001,Neil Allen,suttb001,Bruce Sutter,footb101,Barry Foote,krukm001,Mike Krukow,haust101,Tom Hausman,randl101,Lenny Randle,4,dejei001,Ivan De Jesus,6,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,hendk101,Ken Henderson,9,martj001,Jerry Martin,8,ontis101,Steve Ontiveros,5,footb101,Barry Foote,2,krukm001,Mike Krukow,1,tavef101,Frank Taveras,6,madde101,Elliott Maddox,5,mazzl001,Lee Mazzilli,3,hends001,Steve Henderson,7,younj001,Joel Youngblood,9,steaj001,John Stearns,2,moraj101,Jerry Morales,8,flynd001,Doug Flynn,4,haust101,Tom Hausman,1,,Y,1980,-3,0,9,198004120
45,19800413,0,Sun,CHN,NL,4,NYN,NL,4,0,5,51,D,,,,NYC17,11273.0,153,0,01101200x,30,5,0,0,0,0,0,0,0,3,0,6,0,0,2,0,6,3,4,4,0,0,24,14,1,2,0,0,33,11,4,0,0,4,1,1,0,4,0,4,0,1,0,0,10,2,0,0,0,0,27,12,0,0,2,0,brocf901,Fred Brocklander,rennd901,Dutch Rennert,kiblj901,John Kibler,froeb901,Bruce Froemming,,(none),,(none),gomep101,Preston Gomez,torrj101,Joe Torre,falcp001,Pete Falcone,hernw001,Guillermo Hernandez,allen001,Neil Allen,,(none),hernw001,Guillermo Hernandez,falcp001,Pete Falcone,randl101,Lenny Randle,5,dejei001,Ivan De Jesus,6,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,vailm001,Mike Vail,9,martj001,Jerry Martin,8,footb101,Barry Foote,2,tysom101,Mike Tyson,4,hernw001,Guillermo Hernandez,1,hends001,Steve Henderson,7,tavef101,Frank Taveras,6,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,9,steaj001,John Stearns,2,moraj101,Jerry Morales,8,madde101,Elliott Maddox,5,flynd001,Doug Flynn,4,falcp001,Pete Falcone,1,,Y,1980,5,1,5,198004130
64,19800415,0,Tue,MON,NL,4,NYN,NL,5,7,3,54,D,,,,NYC17,3207.0,150,510001000,010020000,37,12,3,0,2,6,2,1,0,2,0,8,2,0,1,0,8,2,3,3,0,0,27,14,1,0,0,0,33,8,4,0,1,2,0,0,0,3,0,4,0,0,0,0,6,3,3,3,0,0,27,12,6,0,1,0,quicj901,Jim Quick,engeb901,Bob Engel,dalej901,Jerry Dale,brocf901,Fred Brocklander,,(none),,(none),willd104,Dick Williams,torrj101,Joe Torre,roges001,Steve Rogers,swanc001,Craig Swan,normf101,Fred Norman,dawsa001,Andre Dawson,roges001,Steve Rogers,swanc001,Craig Swan,leflr101,Ron LeFlore,7,scotr101,Rodney Scott,4,dawsa001,Andre Dawson,8,valee001,Ellis Valentine,9,parrl002,Larry Parrish,5,cartg001,Gary Carter,2,cromw101,Warren Cromartie,3,speic001,Chris Speier,6,roges001,Steve Rogers,1,tavef101,Frank Taveras,6,mankp101,Phil Mankowski,5,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,9,moraj101,Jerry Morales,8,steaj001,John Stearns,2,hends001,Steve Henderson,7,flynd001,Doug Flynn,4,swanc001,Craig Swan,1,,Y,1980,-4,0,10,198004150
75,19800416,0,Wed,MON,NL,5,NYN,NL,6,2,3,51,D,,,,NYC17,2052.0,151,100001000,00300000x,34,9,4,0,0,2,0,1,0,4,0,9,2,1,1,0,10,2,3,3,0,0,24,9,1,0,1,0,35,11,1,0,0,3,0,0,0,1,0,2,0,0,0,0,9,3,2,2,0,0,27,10,2,0,2,0,engeb901,Bob Engel,dalej901,Jerry Dale,brocf901,Fred Brocklander,rungp901,Paul Runge,,(none),,(none),willd104,Dick Williams,torrj101,Joe Torre,burrr001,Ray Burris,lee-b101,Bill Lee,allen001,Neil Allen,moraj101,Jerry Morales,lee-b101,Bill Lee,burrr001,Ray Burris,leflr101,Ron LeFlore,7,scotr101,Rodney Scott,4,dawsa001,Andre Dawson,8,valee001,Ellis Valentine,9,parrl002,Larry Parrish,5,cartg001,Gary Carter,2,cromw101,Warren Cromartie,3,almob001,Bill Almon,6,lee-b101,Bill Lee,1,hends001,Steve Henderson,7,tavef101,Frank Taveras,6,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,9,steaj001,John Stearns,2,moraj101,Jerry Morales,8,madde101,Elliott Maddox,5,flynd001,Doug Flynn,4,burrr001,Ray Burris,1,,Y,1980,1,1,5,198004160
84,19800417,0,Thu,NYN,NL,7,CHN,NL,6,1,4,51,D,,,,CHI11,33313.0,118,100000,00002110x,33,8,0,0,0,1,1,0,0,0,0,4,1,0,2,0,6,2,4,4,0,0,24,14,0,0,0,0,31,8,3,0,2,4,1,0,0,1,1,0,0,0,0,0,5,2,1,1,1,0,27,17,1,0,2,0,steld901,Dick Stello,palld901,Dave Pallone,grege901,Eric Gregg,varge901,Ed Vargo,,(none),,(none),torrj101,Joe Torre,gomep101,Preston Gomez,lampd001,Dennis Lamp,haust101,Tom Hausman,suttb001,Bruce Sutter,lezcc101,Carlos Lezcano,haust101,Tom Hausman,lampd001,Dennis Lamp,hends001,Steve Henderson,7,tavef101,Frank Taveras,6,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,9,steaj001,John Stearns,2,moraj101,Jerry Morales,8,madde101,Elliott Maddox,5,flynd001,Doug Flynn,4,haust101,Tom Hausman,1,randl101,Lenny Randle,5,dejei001,Ivan De Jesus,6,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,martj001,Jerry Martin,9,footb101,Barry Foote,2,lezcc101,Carlos Lezcano,8,tysom101,Mike Tyson,4,lampd001,Dennis Lamp,1,,Y,1980,3,1,5,198004170
108,19800419,0,Sat,NYN,NL,8,CHN,NL,7,9,12,51,D,,,,CHI11,20328.0,175,400302000,00010407x,33,9,4,0,0,5,1,2,0,6,0,4,0,0,2,0,6,5,12,12,0,0,24,8,1,0,2,0,38,15,0,0,5,12,0,0,0,4,1,8,0,0,1,0,6,5,6,6,1,0,27,12,1,0,2,0,davis901,Satch Davidson,grege901,Eric Gregg,varge901,Ed Vargo,steld901,Dick Stello,,(none),,(none),torrj101,Joe Torre,gomep101,Preston Gomez,tidrd001,Dick Tidrow,allen001,Neil Allen,suttb001,Bruce Sutter,kingd001,Dave Kingman,falcp001,Pete Falcone,krukm001,Mike Krukow,tavef101,Frank Taveras,6,steaj001,John Stearns,2,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,9,hends001,Steve Henderson,7,moraj101,Jerry Morales,8,madde101,Elliott Maddox,5,flynd001,Doug Flynn,4,falcp001,Pete Falcone,1,randl101,Lenny Randle,5,dejei001,Ivan De Jesus,6,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,martj001,Jerry Martin,9,footb101,Barry Foote,2,lezcc101,Carlos Lezcano,8,tysom101,Mike Tyson,4,krukm001,Mike Krukow,1,,Y,1980,3,1,21,198004190
122,19800420,0,Sun,NYN,NL,9,CHN,NL,8,3,6,51,D,,,,CHI11,23554.0,162,100110000,20001030x,35,9,3,0,0,2,0,0,0,7,0,3,1,1,1,0,12,3,4,4,0,0,24,9,2,0,1,0,32,11,1,0,1,5,1,0,0,3,1,6,2,1,0,0,6,2,1,1,0,0,27,8,4,1,1,0,palld901,Dave Pallone,grege901,Eric Gregg,varge901,Ed Vargo,davis901,Satch Davidson,,(none),,(none),torrj101,Joe Torre,gomep101,Preston Gomez,reusr001,Rick Reuschel,kobek101,Kevin Kobel,tidrd001,Dick Tidrow,dejei001,Ivan De Jesus,swanc001,Craig Swan,reusr001,Rick Reuschel,tavef101,Frank Taveras,6,steaj001,John Stearns,2,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,8,hends001,Steve Henderson,7,jorgm001,Mike Jorgensen,9,madde101,Elliott Maddox,5,flynd001,Doug Flynn,4,swanc001,Craig Swan,1,dejei001,Ivan De Jesus,6,ontis101,Steve Ontiveros,5,buckb001,Bill Buckner,3,kingd001,Dave Kingman,7,biitl101,Larry Biittner,9,martj001,Jerry Martin,8,footb101,Barry Foote,2,tysom101,Mike Tyson,4,reusr001,Rick Reuschel,1,,Y,1980,3,1,9,198004200
136,19800421,0,Mon,NYN,NL,10,PHI,NL,9,3,0,54,N,,,,PHI12,23856.0,151,110010,000000000,31,6,0,0,0,3,1,0,1,4,1,6,3,0,0,0,7,2,0,0,1,0,27,8,1,0,0,0,32,5,0,1,0,0,0,0,0,5,0,4,1,0,0,0,10,2,3,3,0,0,27,11,1,0,0,0,willc901,Charlie Williams,pryop901,Paul Pryor,fiels901,Steve Fields,westj901,Joe West,,(none),,(none),torrj101,Joe Torre,greed101,Dallas Green,burrr001,Ray Burris,carls001,Steve Carlton,allen001,Neil Allen,younj001,Joel Youngblood,burrr001,Ray Burris,carls001,Steve Carlton,tavef101,Frank Taveras,6,steaj001,John Stearns,2,mazzl001,Lee Mazzilli,3,younj001,Joel Youngblood,9,hends001,Steve Henderson,7,moraj101,Jerry Morales,8,madde101,Elliott Maddox,5,flynd001,Doug Flynn,4,burrr001,Ray Burris,1,rosep001,Pete Rose,3,mcbrb101,Bake McBride,9,maddg001,Garry Maddox,8,schmm001,Mike Schmidt,5,luzig001,Greg Luzinski,7,boonb001,Bob Boone,2,bowal001,Larry Bowa,6,agual001,Luis Aguayo,4,carls001,Steve Carlton,1,,Y,1980,-3,0,3,198004210


In [19]:
# Write a function to create a team-specific data frame, given the team

In [20]:
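def strip_suffix(x, suff):
    """Return x without the trailing suffix suff; return x unchanged if it lacks it."""
    # The `suff and` guard avoids x[:-0], which would return an empty string
    if suff and x.endswith(suff):
        return x[:-len(suff)]
    return x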

visit_cols = [col for col in df.columns if not col.endswith('_h')]
visit_cols_stripped = [strip_suffix(col,'_v') for col in visit_cols]
home_cols = [col for col in df.columns if not col.endswith('_v')]
home_cols_stripped = [strip_suffix(col,'_h') for col in home_cols]

## Subset the game-level df by team so that team statistics are easy to aggregate.
## The rolling sums use closed='left', so each rollsum value covers the games
## before, but not including, the game in question.

def create_team_df(team):
    # Games in which the team was the visitor; .copy() so the edits below
    # modify an independent frame rather than a slice of df
    df_team_v = df[(df.team_v==team)]
    opponent = df_team_v['team_h']
    df_team_v = df_team_v[visit_cols].copy()
    df_team_v.columns = visit_cols_stripped
    df_team_v['home_game'] = 0
    df_team_v['opponent'] = opponent

    # Games in which the team was at home
    df_team_h = df[(df.team_h==team)]
    opponent = df_team_h['team_v']
    df_team_h = df_team_h[home_cols].copy()
    df_team_h.columns = home_cols_stripped
    df_team_h['home_game'] = 1
    df_team_h['opponent'] = opponent

    # Combine both sets and put the team's games in chronological order
    df_team = pd.concat((df_team_h, df_team_v))
    df_team.sort_values(['date', 'game_no'],inplace=True)
    
    for winsize in [162,30]:
        suff = str(winsize)
        for raw_col in ['AB','H','2B','3B','HR','BB','runs','SB','CS','ERR']:
            new_col = 'rollsum_'+raw_col+'_'+suff
            df_team[new_col] = df_team[raw_col].rolling(winsize, closed='left').sum()

        # Batting average: hits per at-bat
        df_team['rollsum_BATAVG_'+suff] = df_team['rollsum_H_'+suff] / df_team['rollsum_AB_'+suff]
        # On-base percentage (simplified: hit-by-pitch and sacrifice flies are ignored)
        df_team['rollsum_OBP_'+suff] = (df_team['rollsum_H_'+suff] + df_team['rollsum_BB_'+suff]) / (
                                    df_team['rollsum_AB_'+suff] + df_team['rollsum_BB_'+suff])
        # Slugging: total bases per at-bat (H already counts one base for every hit)
        df_team['rollsum_SLG_'+suff] = (df_team['rollsum_H_'+suff] + df_team['rollsum_2B_'+suff]
                                        + 2*df_team['rollsum_3B_'+suff]
                                        + 3*df_team['rollsum_HR_'+suff]) / df_team['rollsum_AB_'+suff]
        # 'OBS' is on-base plus slugging (commonly abbreviated OPS)
        df_team['rollsum_OBS_'+suff] = df_team['rollsum_OBP_'+suff] + df_team['rollsum_SLG_'+suff]
    
    df_team['season_game'] = df_team['season']*1000 + df_team['game_no']
    df_team.set_index('season_game', inplace=True)
    return(df_team)
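
The "up to, but not including" behaviour of the rolling sums is easy to check on a toy series. The sketch below uses made-up hit totals and a window of 3 games (both are assumptions for illustration, not values from the game logs) and compares `closed='left'` with the equivalent "roll, then shift by one" formulation.

```python
import pandas as pd

# Toy per-game hit totals for one team (hypothetical values).
hits = pd.Series([5, 8, 3, 10, 7], name="H")

# Window of 3 games; closed='left' drops the current row from the window,
# so row i sums rows i-3 .. i-1.
prior_sum = hits.rolling(3, closed="left").sum()

# Equivalent formulation: ordinary trailing window, then shift down one game.
shifted_sum = hits.rolling(3).sum().shift(1)

print(pd.DataFrame({"H": hits, "prior_sum": prior_sum, "shifted_sum": shifted_sum}))
```

The first three rows are NaN because fewer than 3 earlier games exist. In the same way, a team's first 162 games in the data have no 162-game features.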

In [21]:
df_mets = create_team_df('NYN')

In [22]:
df_mets.sample(10)

season_game,date,dblheader_code,day_of_week,team,league,game_no,runs,outs_total,day_night,completion_info,forfeit_info,protest_info,ballpark_id,attendance,game_minutes,linescore,AB,H,2B,3B,HR,RBI,SH,SF,HBP,BB,IBB,SO,SB,CS,GIDP,CI,LOB,P_num,ERind,ERteam,WP,balk,PO,ASST,ERR,PB,DP,TP,ump_HB_id,ump_HB_name,ump_1B_id,ump_1B_name,ump_2B_id,ump_2B_name,ump_3B_id,ump_3B_name,ump_LF_id,ump_LF_name,ump_RF_id,ump_RF_name,mgr_id,mgr_name,pitcher_id_w,pitcher_name_w,pitcher_id_l,pitcher_name_l,pitcher_id_s,pitcher_name_s,GWRBI_id,GWRBI_name,pitcher_start_id,pitcher_start_name,batter1_name,batter1_id,batter1_pos,batter2_name,batter2_id,batter2_pos,batter3_name,batter3_id,batter3_pos,batter4_name,batter4_id,batter4_pos,batter5_name,batter5_id,batter5_pos,batter6_name,batter6_id,batter6_pos,batter7_name,batter7_id,batter7_pos,batter8_name,batter8_id,batter8_pos,batter9_name,batter9_id,batter9_pos,misc_info,acqui_info,season,run_diff,home_victory,run_total,date_dblhead,home_game,opponent,rollsum_AB_162,rollsum_H_162,rollsum_2B_162,rollsum_3B_162,rollsum_HR_162,rollsum_BB_162,rollsum_runs_162,rollsum_SB_162,rollsum_CS_162,rollsum_ERR_162,rollsum_BATAVG_162,rollsum_OBP_162,rollsum_SLG_162,rollsum_OBS_162,rollsum_AB_30,rollsum_H_30,rollsum_2B_30,rollsum_3B_30,rollsum_HR_30,rollsum_BB_30,rollsum_runs_30,rollsum_SB_30,rollsum_CS_30,rollsum_ERR_30,rollsum_BATAVG_30,rollsum_OBP_30,rollsum_SLG_30,rollsum_OBS_30
1995130,19950916,0,Sat,NYN,NL,130,10,51,D,,,,NYC17,18351.0,187,05103001x,32,11,3,0,1,8,0,0,0,10,0,9,1,1,1,0,8,5,7,7,0,1,27,6,1,1,0,0,poncl901,Larry Poncino,ripps901,Steve Rippley,quicj901,Jim Quick,davib902,Bob Davidson,,(none),,(none),greed101,Dallas Green,telgd001,Dave Telgheder,greet001,Tommy Greene,franj001,John Franco,telgd001,Dave Telgheder,telgd001,Dave Telgheder,bufod001,Damon Buford,7,vizcj001,Jose Vizcaino,6,everc001,Carl Everett,9,kentj001,Jeff Kent,4,brogr001,Rico Brogna,3,hundt001,Todd Hundley,2,thomr004,Ryan Thompson,8,bogat001,Tim Bogar,5,telgd001,Dave Telgheder,1,,Y,1995,2,1,18,199509160,1,PHI,5639.0,1483.0,243.0,37.0,147.0,478.0,721.0,57.0,46.0,129.0,0.26299,0.320582,0.397411,0.717993,990.0,242.0,34.0,3.0,27.0,94.0,133.0,5.0,11.0,24.0,0.244444,0.309963,0.366667,0.67663
1990154,19900925,0,Tue,NYN,NL,154,3,54,N,,,,MON02,11959.0,188,001000200,36,9,0,1,0,2,0,0,0,3,0,11,1,0,0,0,9,3,1,1,0,0,27,5,1,0,0,0,demud901,Dana DeMuth,grege901,Eric Gregg,crawj901,Jerry Crawford,harvd901,Doug Harvey,,(none),,(none),harrb101,Bud Harrelson,penaa001,Alejandro Pena,nabhc001,Chris Nabholz,,(none),magad001,Dave Magadan,ferns001,Sid Fernandez,millk001,Keith Miller,8,herrt001,Tom Herr,4,teuft001,Tim Teufel,3,mcrek001,Kevin McReynolds,7,tablp001,Pat Tabler,9,jeffg001,Gregg Jefferies,5,johnh001,Howard Johnson,6,obric001,Charlie O'Brien,2,ferns001,Sid Fernandez,1,,Y,1990,-2,0,4,199009250,0,MON,5518.0,1416.0,277.0,20.0,175.0,536.0,786.0,113.0,34.0,138.0,0.256615,0.322431,0.409206,0.731638,982.0,228.0,46.0,2.0,29.0,101.0,134.0,13.0,5.0,21.0,0.232179,0.303786,0.37169,0.675476
1989002,19890405,0,Wed,NYN,NL,2,1,54,D,,,,NYC17,17873.0,149,010000000,31,5,2,0,0,1,0,0,0,2,0,3,0,0,0,0,5,3,2,2,0,0,27,14,3,0,1,0,monte901,Ed Montague,marsr901,Randy Marsh,darlg901,Gary Darling,wendh901,Harry Wendelstedt,,(none),,(none),johnd105,Davey Johnson,delej001,Jose DeLeon,ojedb001,Bob Ojeda,worrt001,Todd Worrell,guerp001,Pedro Guerrero,ojedb001,Bob Ojeda,dyksl001,Lenny Dykstra,8,jeffg001,Gregg Jefferies,4,hernk001,Keith Hernandez,3,strad001,Darryl Strawberry,9,mcrek001,Kevin McReynolds,7,cartg001,Gary Carter,2,johnh001,Howard Johnson,5,elstk001,Kevin Elster,6,ojedb001,Bob Ojeda,1,,Y,1989,-2,0,4,198904050,1,SLN,5474.0,1411.0,257.0,25.0,155.0,552.0,722.0,144.0,55.0,116.0,0.257764,0.325755,0.398794,0.724549,1005.0,279.0,61.0,4.0,39.0,103.0,163.0,27.0,14.0,24.0,0.277612,0.344765,0.462687,0.807452
2011004,20110405,0,Tue,NYN,NL,4,7,54,N,,,,PHI13,45365.0,176,006001000,39,13,1,0,0,6,0,0,1,4,0,7,3,0,1,0,10,4,1,1,0,0,27,6,0,0,1,0,guccc901,Chris Guccione,wintm901,Mike Winters,everm901,Mike Everitt,estam901,Mike Estabrook,,(none),,(none),collt801,Terry Collins,younc003,Chris Young,hamec001,Cole Hamels,,(none),wrigd002,David Wright,younc003,Chris Young,reyej001,Jose Reyes,6,pagaa001,Angel Pagan,8,wrigd002,David Wright,5,beltc001,Carlos Beltran,9,hairs001,Scott Hairston,7,davii001,Ike Davis,3,emaub001,Brad Emaus,4,nickm001,Mike Nickeas,2,younc003,Chris Young,1,,Y,2011,-6,0,8,201104050,0,PHI,5473.0,1364.0,269.0,38.0,130.0,501.0,659.0,129.0,44.0,87.0,0.249223,0.312186,0.383519,0.695705,1024.0,262.0,55.0,3.0,31.0,92.0,134.0,16.0,8.0,18.0,0.255859,0.317204,0.40625,0.723454
2001073,20010621,0,Thu,NYN,NL,73,3,54,N,,,,NYC17,32668.0,185,300000000,35,7,1,0,1,3,0,0,0,3,0,8,0,0,0,0,8,4,8,8,1,0,27,14,2,0,2,0,guccc901,Chris Guccione,eddid901,Doug Eddings,coope901,Eric Cooper,gormb901,Brian Gorman,,(none),,(none),valeb102,Bobby Valentine,blanm001,Matt Blank,appik001,Kevin Appier,,(none),guerv001,Vladimir Guerrero,appik001,Kevin Appier,mcewj001,Joe McEwing,9,zeilt001,Todd Zeile,3,piazm001,Mike Piazza,2,agbab001,Benny Agbayani,7,ventr001,Robin Ventura,5,escoa002,Alex Escobar,8,relad001,Desi Relaford,4,velaj001,Jorge Velandia,6,appik001,Kevin Appier,1,,Y,2001,-7,0,13,200106210,1,MON,5440.0,1391.0,252.0,19.0,170.0,608.0,723.0,58.0,44.0,118.0,0.255699,0.330522,0.402757,0.73328,1027.0,272.0,51.0,2.0,29.0,123.0,144.0,12.0,10.0,12.0,0.264849,0.343478,0.403116,0.746594
1992130,19920901,0,Tue,NYN,NL,130,1,54,N,,,,NYC17,21539.0,191,100000000,33,8,3,0,0,1,1,0,0,4,0,5,1,0,1,0,10,3,4,4,0,0,27,5,0,0,0,0,darlg901,Gary Darling,wendh901,Harry Wendelstedt,marsr901,Randy Marsh,demud901,Dana DeMuth,,(none),,(none),torbj101,Jeff Torborg,niedd001,David Nied,whitw001,Wally Whitehurst,rearj001,Jeff Reardon,justd001,David Justice,whitw001,Wally Whitehurst,colev001,Vince Coleman,8,schod001,Dick Schofield,6,bassk001,Kevin Bass,7,murre001,Eddie Murray,3,bonib001,Bobby Bonilla,9,kentj001,Jeff Kent,4,donnc001,Chris Donnels,5,hundt001,Todd Hundley,2,whitw001,Wally Whitehurst,1,,Y,1992,-3,0,5,199209010,1,ATL,5389.0,1282.0,252.0,16.0,103.0,593.0,608.0,143.0,56.0,131.0,0.237892,0.31344,0.347931,0.661371,1025.0,245.0,40.0,5.0,24.0,101.0,111.0,22.0,13.0,24.0,0.239024,0.307282,0.358049,0.665331
1985114,19850817,0,Sat,NYN,NL,114,4,54,N,,,,PIT07,10200.0,150,000000400,36,10,0,2,1,4,0,0,0,1,0,4,0,1,0,0,6,2,3,3,0,0,27,6,0,1,1,0,ripps901,Steve Rippley,rennd901,Dutch Rennert,brocf901,Fred Brocklander,monte901,Ed Montague,,(none),,(none),johnd105,Davey Johnson,ferns001,Sid Fernandez,tunnl001,Lee Tunnell,mcdor001,Roger McDowell,pacit001,Tom Paciorek,ferns001,Sid Fernandez,dyksl001,Lenny Dykstra,8,backw001,Wally Backman,4,hernk001,Keith Hernandez,3,cartg001,Gary Carter,2,strad001,Darryl Strawberry,9,fostg001,George Foster,7,knigr001,Ray Knight,5,santr001,Rafael Santana,6,ferns001,Sid Fernandez,1,,Y,1985,-1,0,7,198508170,0,PIT,5481.0,1400.0,249.0,31.0,128.0,538.0,703.0,121.0,55.0,123.0,0.255428,0.32198,0.38223,0.70421,1006.0,301.0,58.0,6.0,27.0,99.0,169.0,25.0,11.0,24.0,0.299205,0.361991,0.449304,0.811295
2005100,20050726,0,Tue,NYN,NL,100,3,51,N,,,,DEN02,22518.0,175,001000101,35,9,1,0,1,3,1,0,0,2,0,10,1,1,1,0,8,3,1,1,1,0,24,17,1,0,1,0,barkl901,Lance Barksdale,herna901,Angel Hernandez,gibsg901,Greg Gibson,guccc901,Chris Guccione,,(none),,(none),randw001,Willie Randolph,franj003,Jeff Francis,ishik001,Kazuhisa Ishii,fuenb001,Brian Fuentes,shear001,Ryan Shealy,ishik001,Kazuhisa Ishii,reyej001,Jose Reyes,6,camem001,Mike Cameron,9,beltc001,Carlos Beltran,8,floyc001,Cliff Floyd,7,wrigd002,David Wright,5,piazm001,Mike Piazza,2,woodc001,Chris Woodward,3,cairm001,Miguel Cairo,4,ishik001,Kazuhisa Ishii,1,,Y,2005,1,1,7,200507260,0,COL,5459.0,1371.0,278.0,29.0,171.0,470.0,693.0,120.0,32.0,116.0,0.251145,0.310508,0.406668,0.717176,1022.0,267.0,58.0,8.0,33.0,67.0,150.0,32.0,5.0,23.0,0.261252,0.306703,0.430528,0.737232
2021098,20210726,1,Mon,NYN,NL,98,0,42,N,,,,NYC20,0.0,132,0000000,24,5,1,0,0,0,0,0,0,2,0,4,0,0,2,0,5,2,2,2,0,0,21,7,0,0,1,0,knigb901,Brian Knight,moorm901,Malachi Moore,navaj901,Jose Navas,eddid901,Doug Eddings,,(none),,(none),rojal801,Luis Rojas,mullk001,Kyle Muller,strom001,Marcus Stroman,smitw002,Will Smith,pedej001,Joc Pederson,strom001,Marcus Stroman,villj001,Jonathan Villar,6,alonp001,Pete Alonso,3,confm001,Michael Conforto,9,davij006,J.D. Davis,5,smitd008,Dominic Smith,7,pillk001,Kevin Pillar,8,nidot001,Tomas Nido,2,guill001,Luis Guillorme,4,strom001,Marcus Stroman,1,,Y,2021,-2,0,2,202107261,1,ATL,5261.0,1325.0,245.0,13.0,203.0,522.0,696.0,48.0,29.0,87.0,0.251853,0.319384,0.419122,0.738506,952.0,240.0,37.0,0.0,40.0,98.0,132.0,5.0,4.0,18.0,0.252101,0.321905,0.417017,0.738922
1982096,19820725,0,Sun,NYN,NL,96,2,58,D,,,,SAN01,12614.0,163,0001000010,34,4,0,0,1,2,0,0,0,1,0,7,1,0,1,0,3,3,3,3,0,0,28,15,1,0,3,0,froeb901,Bruce Froemming,quicj901,Jim Quick,willc901,Charlie Williams,kiblj901,John Kibler,,(none),,(none),bambg101,George Bamberger,delel001,Luis DeLeon,allen001,Neil Allen,,(none),gwynt001,Tony Gwynn,pulec001,Charlie Puleo,wilsm001,Mookie Wilson,8,bailb001,Bob Bailor,4,fostg001,George Foster,7,kingd001,Dave Kingman,3,valee001,Ellis Valentine,9,steaj001,John Stearns,2,brooh001,Hubie Brooks,5,gardr001,Ron Gardenhire,6,pulec001,Charlie Puleo,1,,Y,1982,1,1,5,198207250,0,SDN,5462.0,1352.0,216.0,37.0,99.0,457.0,603.0,150.0,66.0,185.0,0.247528,0.305626,0.354998,0.660624,1018.0,251.0,49.0,6.0,16.0,94.0,116.0,25.0,15.0,37.0,0.246562,0.310252,0.353635,0.663886
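
As a quick sanity check, the derived rate statistics can be recomputed by hand from the rolling sums in one row of the sample. The sketch below copies the 162-game sums from the `1995130` row above. The variable names are ours, and the formulas are the simplified ones used in `create_team_df` (OBP here ignores hit-by-pitch and sacrifice flies).

```python
# Rolling 162-game sums copied from the 1995130 row of the sample above.
AB, H, doubles, triples, HR, BB = 5639.0, 1483.0, 243.0, 37.0, 147.0, 478.0

batavg = H / AB                                   # hits per at-bat
obp = (H + BB) / (AB + BB)                        # simplified on-base percentage
slg = (H + doubles + 2 * triples + 3 * HR) / AB   # total bases per at-bat
obs = obp + slg                                   # on-base plus slugging

print(round(batavg, 6), round(obp, 6), round(slg, 6), round(obs, 6))
```

The results should agree with the `rollsum_BATAVG_162`, `rollsum_OBP_162`, `rollsum_SLG_162` and `rollsum_OBS_162` entries of that row to the displayed precision.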


In [23]:
# Make a dictionary that maps a team name to its data frame

In [24]:
# Create the team level dataframe for each team - put in dict for easy access
team_data_dict = {}
for team in df.team_v.unique():
    team_data_dict[team] = create_team_df(team)

In [25]:
# Go through the rows of the main dataframe, and augment it with home and visiting teams' features

In [26]:
## Create a variety of summarized statistics for each game
## For each game, we look up the home and visiting team in the team
## data dictionary, and then look up the game, and pull the relevant stats

BATAVG_162_h = np.zeros(df.shape[0])
BATAVG_162_v = np.zeros(df.shape[0])
OBP_162_h = np.zeros(df.shape[0])
OBP_162_v = np.zeros(df.shape[0])
SLG_162_h = np.zeros(df.shape[0])
SLG_162_v = np.zeros(df.shape[0])
OBS_162_h = np.zeros(df.shape[0])
OBS_162_v = np.zeros(df.shape[0])
SB_162_h = np.zeros(df.shape[0])
SB_162_v = np.zeros(df.shape[0])
CS_162_h = np.zeros(df.shape[0])
CS_162_v = np.zeros(df.shape[0])
ERR_162_h = np.zeros(df.shape[0])
ERR_162_v = np.zeros(df.shape[0])
BATAVG_30_h = np.zeros(df.shape[0])
BATAVG_30_v = np.zeros(df.shape[0])
OBP_30_h = np.zeros(df.shape[0])
OBP_30_v = np.zeros(df.shape[0])
SLG_30_h = np.zeros(df.shape[0])
SLG_30_v = np.zeros(df.shape[0])
OBS_30_h = np.zeros(df.shape[0])
OBS_30_v = np.zeros(df.shape[0])
SB_30_h = np.zeros(df.shape[0])
SB_30_v = np.zeros(df.shape[0])
CS_30_h = np.zeros(df.shape[0])
CS_30_v = np.zeros(df.shape[0])
ERR_30_h = np.zeros(df.shape[0])
ERR_30_v = np.zeros(df.shape[0])
i=0
for index, row in df.iterrows():
    if i%1000==0:
        print(i)
    home_team = row['team_h']
    visit_team = row['team_v']
    game_index_v = row['season']*1000 + row['game_no_v']
    game_index_h = row['season']*1000 + row['game_no_h']
    BATAVG_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_BATAVG_162']
    BATAVG_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_BATAVG_162']
    OBP_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_OBP_162']
    OBP_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_OBP_162']
    SLG_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_SLG_162']
    SLG_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_SLG_162']
    OBS_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_OBS_162']
    OBS_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_OBS_162']
    SB_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_SB_162']
    SB_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_SB_162']
    CS_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_CS_162']
    CS_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_CS_162']
    ERR_162_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_ERR_162']
    ERR_162_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_ERR_162']
    BATAVG_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_BATAVG_30']
    BATAVG_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_BATAVG_30']
    OBP_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_OBP_30']
    OBP_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_OBP_30']
    SLG_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_SLG_30']
    SLG_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_SLG_30']
    OBS_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_OBS_30']
    OBS_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_OBS_30']
    SB_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_SB_30']
    SB_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_SB_30']
    CS_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_CS_30']
    CS_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_CS_30']
    ERR_30_h[i] = team_data_dict[home_team].loc[game_index_h,'rollsum_ERR_30']
    ERR_30_v[i] = team_data_dict[visit_team].loc[game_index_v,'rollsum_ERR_30']
    i+=1
    

0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
21000
22000
23000
24000
25000
26000
27000
28000
29000
30000
31000
32000
33000
34000
35000
36000
37000
38000
39000
40000
41000
42000
43000
44000
45000
46000
47000
48000
49000
50000
51000
52000
53000
54000
55000
56000
57000
58000
59000
60000
61000
62000
63000
64000
65000
66000
67000
68000
69000
70000
71000
72000
73000
74000
75000
76000
77000
78000
79000
80000
81000
82000
83000
84000
85000
86000
87000
88000
89000
90000
91000
92000
93000
94000
95000
96000
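
The row-by-row lookup above works, but it makes 28 `.loc` calls for each of roughly 96,000 games. One possible alternative, sketched below on toy data, is to stack the per-team frames into one long table and merge on (team, season_game). Everything in the sketch is a hypothetical stand-in (`games`, `team_frames`, and the numbers), and it handles only one statistic. It is not the approach used in this notebook.

```python
import pandas as pd

# Toy game-level frame (hypothetical values) with the columns the loop relies on.
games = pd.DataFrame({
    "season":    [2022, 2022],
    "team_h":    ["NYN", "PHI"],
    "team_v":    ["PHI", "NYN"],
    "game_no_h": [1, 2],
    "game_no_v": [1, 2],
})

# Toy stand-in for team_data_dict: one frame per team, indexed by season_game.
team_frames = {
    "NYN": pd.DataFrame({"rollsum_OBP_162": [0.320, 0.321]},
                        index=pd.Index([2022001, 2022002], name="season_game")),
    "PHI": pd.DataFrame({"rollsum_OBP_162": [0.310, 0.312]},
                        index=pd.Index([2022001, 2022002], name="season_game")),
}

# Stack the per-team frames into one long table keyed by (team, season_game).
long = pd.concat(team_frames, names=["team", "season_game"]).reset_index()

for side in ["h", "v"]:
    # Same key construction as game_index_h / game_index_v in the loop.
    key = games["season"] * 1000 + games["game_no_" + side]
    lookup = pd.DataFrame({"team": games["team_" + side], "season_game": key})
    # A left merge keeps one output row per game, in the original order.
    merged = lookup.merge(long, on=["team", "season_game"], how="left")
    games["OBP_162_" + side] = merged["rollsum_OBP_162"].to_numpy()

print(games)
```

This relies on each (team, season_game) pair being unique in the long table. If it were not, the merge would return extra rows and the positional assignment would fail.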


In [27]:
## We then put the constructed arrays into the main game level dataframe
df['BATAVG_162_h'] = BATAVG_162_h
df['BATAVG_162_v'] = BATAVG_162_v
df['OBP_162_h'] = OBP_162_h
df['OBP_162_v'] = OBP_162_v
df['SLG_162_h'] = SLG_162_h
df['SLG_162_v'] = SLG_162_v
df['OBS_162_h'] = OBS_162_h
df['OBS_162_v'] = OBS_162_v
df['SB_162_h'] = SB_162_h
df['SB_162_v'] = SB_162_v
df['CS_162_h'] = CS_162_h
df['CS_162_v'] = CS_162_v
df['ERR_162_h'] = ERR_162_h
df['ERR_162_v'] = ERR_162_v
df['BATAVG_30_h'] = BATAVG_30_h
df['BATAVG_30_v'] = BATAVG_30_v
df['OBP_30_h'] = OBP_30_h
df['OBP_30_v'] = OBP_30_v
df['SLG_30_h'] = SLG_30_h
df['SLG_30_v'] = SLG_30_v
df['OBS_30_h'] = OBS_30_h
df['OBS_30_v'] = OBS_30_v
df['SB_30_h'] = SB_30_h
df['SB_30_v'] = SB_30_v
df['CS_30_h'] = CS_30_h
df['CS_30_v'] = CS_30_v
df['ERR_30_h'] = ERR_30_h
df['ERR_30_v'] = ERR_30_v


In [28]:
df.shape

(96276, 194)

In [29]:
df.sample(5)

Unnamed: 0,date,dblheader_code,day_of_week,team_v,league_v,game_no_v,team_h,league_h,game_no_h,runs_v,runs_h,outs_total,day_night,completion_info,forfeit_info,protest_info,ballpark_id,attendance,game_minutes,linescore_v,linescore_h,AB_v,H_v,2B_v,3B_v,HR_v,RBI_v,SH_v,SF_v,HBP_v,BB_v,IBB_v,SO_v,SB_v,CS_v,GIDP_v,CI_v,LOB_v,P_num_v,ERind_v,ERteam_v,WP_v,balk_v,PO_v,ASST_v,ERR_v,PB_v,DP_v,TP_v,AB_h,H_h,2B_h,3B_h,HR_h,RBI_h,SH_h,SF_h,HBP_h,BB_h,IBB_h,SO_h,SB_h,CS_h,GIDP_h,CI_h,LOB_h,P_num_h,ERind_h,ERteam_h,WP_h,balk_h,PO_h,ASST_h,ERR_h,PB_h,DP_h,TP_h,ump_HB_id,ump_HB_name,ump_1B_id,ump_1B_name,ump_2B_id,ump_2B_name,ump_3B_id,ump_3B_name,ump_LF_id,ump_LF_name,ump_RF_id,ump_RF_name,mgr_id_v,mgr_name_v,mgr_id_h,mgr_name_h,pitcher_id_w,pitcher_name_w,pitcher_id_l,pitcher_name_l,pitcher_id_s,pitcher_name_s,GWRBI_id,GWRBI_name,pitcher_start_id_v,pitcher_start_name_v,pitcher_start_id_h,pitcher_start_name_h,batter1_name_v,batter1_id_v,batter1_pos_v,batter2_name_v,batter2_id_v,batter2_pos_v,batter3_name_v,batter3_id_v,batter3_pos_v,batter4_name_v,batter4_id_v,batter4_pos_v,batter5_name_v,batter5_id_v,batter5_pos_v,batter6_name_v,batter6_id_v,batter6_pos_v,batter7_name_v,batter7_id_v,batter7_pos_v,batter8_name_v,batter8_id_v,batter8_pos_v,batter9_name_v,batter9_id_v,batter9_pos_v,batter1_name_h,batter1_id_h,batter1_pos_h,batter2_name_h,batter2_id_h,batter2_pos_h,batter3_name_h,batter3_id_h,batter3_pos_h,batter4_name_h,batter4_id_h,batter4_pos_h,batter5_name_h,batter5_id_h,batter5_pos_h,batter6_name_h,batter6_id_h,batter6_pos_h,batter7_name_h,batter7_id_h,batter7_pos_h,batter8_name_h,batter8_id_h,batter8_pos_h,batter9_name_h,batter9_id_h,batter9_pos_h,misc_info,acqui_info,season,run_diff,home_victory,run_total,date_dblhead,BATAVG_162_h,BATAVG_162_v,OBP_162_h,OBP_162_v,SLG_162_h,SLG_162_v,OBS_162_h,OBS_162_v,SB_162_h,SB_162_v,CS_162_h,CS_162_v,ERR_162_h,ERR_162_v,BATAVG_30_h,BATAVG_30_v,OBP_30_h,OBP_30_v,SLG_30_h,SLG_30_v,OBS_30_h,OBS_30_v,SB_30_h,SB_30_v,CS_30_h,CS_30_v,ERR_30_h,ERR_30_v
800,20130531,0,Fri,SEA,AL,55,MIN,AL,52,3,0,54,N,,,,MIN04,31430.0,173,3000,000000000,32,8,0,0,1,3,1,0,1,4,0,5,1,0,3,0,8,3,0,0,0,0,27,14,0,0,1,0,33,7,2,0,0,0,0,0,0,2,0,6,0,0,1,0,8,3,3,3,0,0,27,12,1,0,3,0,barrl901,Lance Barrett,mcclt901,Tim McClelland,hudsm901,Marvin Hudson,bellw901,Wally Bell,,(none),,(none),wedge001,Eric Wedge,gardr001,Ron Gardenhire,iwakh001,Hisashi Iwakuma,pelfm001,Mike Pelfrey,wilht001,Tom Wilhelmsen,morak001,Kendrys Morales,iwakh001,Hisashi Iwakuma,pelfm001,Mike Pelfrey,chave002,Endy Chavez,9,bay-j001,Jason Bay,7,seagk001,Kyle Seager,5,morak001,Kendrys Morales,3,ibanr001,Raul Ibanez,10,saunm001,Michael Saunders,8,frann001,Nick Franklin,4,sucrj001,Jesus Sucre,2,ryanb002,Brendan Ryan,6,carrj001,Jamey Carroll,5,dozib001,Brian Dozier,4,mauej001,Joe Mauer,2,willj004,Josh Willingham,7,mornj001,Justin Morneau,3,doumr001,Ryan Doumit,10,parmc001,Chris Parmelee,9,hicka001,Aaron Hicks,8,florp001,Pedro Florimon,6,,Y,2013,-3,0,3,201305310,0.26011,0.234846,0.320739,0.295808,0.392125,0.37441,0.712865,0.670218,128.0,85.0,37.0,31.0,98.0,70.0,0.241706,0.240506,0.306759,0.310954,0.390521,0.407011,0.69728,0.717965,13.0,11.0,5.0,4.0,7.0,12.0
1828,20020817,0,Sat,MIL,NL,122,PIT,NL,123,0,5,51,N,,,,PIT08,25277.0,161,0,00000500x,31,6,1,0,0,0,0,0,0,1,1,5,2,1,1,0,5,3,5,5,0,0,24,8,0,0,0,0,28,6,2,0,0,5,1,0,2,6,1,4,0,1,0,0,8,4,0,0,0,0,27,12,1,0,1,0,nelsj901,Jeff Nelson,kulpr901,Ron Kulpa,joycj901,Jim Joyce,scotd901,Dale Scott,,(none),,(none),roysj001,Jerry Royster,mccll001,Lloyd McClendon,meadb001,Brian Meadows,cabrj001,Jose Cabrera,,(none),hyzda001,Adam Hyzdu,ostij001,Jimmy Osting,meadb001,Brian Meadows,sanca003,Alex Sanchez,8,youne001,Eric Young,4,hammj001,Jeffrey Hammonds,7,sexsr001,Richie Sexson,3,staim001,Matt Stairs,9,hernj001,Jose Hernandez,6,lorem001,Mark Loretta,5,fabrj001,Jorge Fabregas,2,ostij001,Jimmy Osting,1,reesp001,Pokey Reese,4,wilsj002,Jack Wilson,6,kendj001,Jason Kendall,2,ramia001,Aramis Ramirez,5,younk001,Kevin Young,3,wilsc003,Craig Wilson,9,hyzda001,Adam Hyzdu,7,mackr001,Rob Mackowiak,8,meadb001,Brian Meadows,1,,Y,2002,5,1,5,200208170,0.241496,0.258266,0.307481,0.320316,0.374178,0.40687,0.681659,0.727186,87.0,101.0,61.0,47.0,122.0,97.0,0.247734,0.246964,0.323982,0.303371,0.390735,0.371457,0.714717,0.674828,10.0,17.0,3.0,8.0,29.0,18.0
539,20030511,0,Sun,TOR,AL,38,ANA,AL,36,4,2,54,D,,,,ANA01,32129.0,150,120000010,010000010,32,7,1,1,1,4,0,1,0,3,0,10,1,0,1,0,5,2,2,2,0,0,27,9,0,0,0,0,33,6,0,0,1,2,0,0,0,0,0,7,2,0,0,0,4,3,4,4,0,0,27,5,1,0,1,0,timmt901,Tim Timmons,reilm901,Mike Reilly,hohnb901,Bill Hohn,coope901,Eric Cooper,,(none),,(none),toscc801,Carlos Tosca,sciom001,Mike Scioscia,hallr001,Roy Halladay,washj001,Jarrod Washburn,polic001,Cliff Politte,wellv001,Vernon Wells,hallr001,Roy Halladay,washj001,Jarrod Washburn,stews002,Shannon Stewart,7,bordm001,Mike Bordick,6,wellv001,Vernon Wells,8,delgc001,Carlos Delgado,3,wilst003,Tom Wilson,2,phelj001,Josh Phelps,10,hudso001,Orlando Hudson,4,wertj001,Jayson Werth,9,bergd002,Dave Berg,5,ecksd001,David Eckstein,6,kenna001,Adam Kennedy,4,salmt001,Tim Salmon,9,andeg001,Garret Anderson,7,glaut001,Troy Glaus,5,fullb001,Brad Fullmer,10,spies001,Scott Spiezio,3,molib001,Bengie Molina,2,davaj001,Jeff DaVanon,8,,Y,2003,-2,0,6,200305110,0.282524,0.268611,0.33748,0.331534,0.435484,0.439692,0.772963,0.771226,119.0,56.0,55.0,21.0,80.0,110.0,0.277778,0.287476,0.335426,0.354256,0.417154,0.466793,0.75258,0.821049,21.0,7.0,16.0,5.0,16.0,27.0
24,19910410,0,Wed,SLN,NL,2,CHN,NL,2,0,2,51,D,,,,CHI11,11204.0,129,0,00110000x,29,7,1,0,0,0,1,0,0,1,0,6,0,0,2,0,4,3,2,2,0,0,24,9,0,0,0,0,31,8,2,0,0,2,0,1,1,0,0,4,1,0,0,0,7,2,0,0,0,0,27,13,0,0,3,0,grege901,Eric Gregg,wintm901,Mike Winters,westj901,Joe West,rungp901,Paul Runge,,(none),,(none),torrj101,Joe Torre,zimmd101,Don Zimmer,maddg002,Greg Maddux,delej001,Jose DeLeon,smitd001,Dave Smith,bellg001,George Bell,delej001,Jose DeLeon,maddg002,Greg Maddux,thomm001,Milt Thompson,8,smito001,Ozzie Smith,6,gilkb001,Bernard Gilkey,7,guerp001,Pedro Guerrero,3,josef001,Felix Jose,9,zeilt001,Todd Zeile,5,pagnt001,Tom Pagnozzi,2,oquej001,Jose Oquendo,4,delej001,Jose DeLeon,1,waltj001,Jerome Walton,8,sandr001,Ryne Sandberg,4,gracm001,Mark Grace,3,bellg001,George Bell,7,dawsa001,Andre Dawson,9,berrd002,Damon Berryhill,2,dunss001,Shawon Dunston,6,scotg001,Gary Scott,5,maddg002,Greg Maddux,1,,Y,1991,2,1,2,199104100,0.26219,0.255405,0.312073,0.319719,0.391677,0.356907,0.70375,0.676626,151.0,221.0,49.0,75.0,122.0,128.0,0.238671,0.248992,0.296089,0.309546,0.359517,0.370968,0.655606,0.680514,17.0,27.0,12.0,20.0,22.0,26.0
1559,19860820,0,Wed,NYN,NL,121,LAN,NL,121,7,5,54,N,,,,LOS03,36738.0,188,22200001,000050000,38,13,0,0,0,7,0,0,0,4,2,6,0,1,1,0,8,3,5,5,0,0,27,6,0,0,0,0,34,8,1,0,1,5,1,0,0,5,0,8,1,0,0,0,8,4,5,5,0,1,27,15,3,0,1,0,poncl901,Larry Poncino,willb901,Bill Williams,bonig901,Greg Bonin,pullf901,Frank Pulli,,(none),,(none),johnd105,Davey Johnson,lasot101,Tom Lasorda,ferns001,Sid Fernandez,powed001,Dennis Powell,orosj001,Jesse Orosco,ferns001,Sid Fernandez,ferns001,Sid Fernandez,powed001,Dennis Powell,wilsm001,Mookie Wilson,8,teuft001,Tim Teufel,4,hernk001,Keith Hernandez,3,mitck001,Kevin Mitchell,7,strad001,Darryl Strawberry,9,knigr001,Ray Knight,5,heare001,Ed Hearn,2,santr001,Rafael Santana,6,ferns001,Sid Fernandez,1,sax-s001,Steve Sax,4,russb001,Bill Russell,6,madlb001,Bill Madlock,5,marsm001,Mike Marshall,9,cabee001,Enos Cabell,3,treva001,Alex Trevino,2,willr001,Reggie Williams,7,gonzj001,Jose Gonzalez,8,powed001,Dennis Powell,1,,Y,1986,-2,0,12,198608200,0.259535,0.264947,0.322532,0.336705,0.37777,0.409964,0.700302,0.74667,165.0,121.0,73.0,52.0,176.0,139.0,0.264322,0.259728,0.324723,0.332456,0.378894,0.39786,0.703618,0.730316,26.0,21.0,11.0,12.0,30.0,31.0


In [30]:
df.to_csv('df_bp1.csv', index=False)
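
One thing to keep in mind when this file is read back: pandas infers column types again from the text. Several `linescore_v` values in the samples above (for example `3000`) appear to have lost their leading zeros in exactly this way. The sketch below uses a toy frame and an in-memory buffer (both assumptions for illustration) to show the effect and how passing `dtype` to `pd.read_csv` keeps such a column as text.

```python
import io
import pandas as pd

# Toy frame with a linescore string that starts with zeros (hypothetical row).
toy = pd.DataFrame({"linescore_v": ["000110010"], "runs_v": [3]})

buf = io.StringIO()
toy.to_csv(buf, index=False)   # same call style as the cell above

# Default read: the all-digit linescore is inferred as an integer.
buf.seek(0)
inferred = pd.read_csv(buf)

# Reading the column as text keeps the inning-by-inning string intact.
buf.seek(0)
as_text = pd.read_csv(buf, dtype={"linescore_v": str})

print(inferred["linescore_v"].iloc[0], as_text["linescore_v"].iloc[0])
```

None of the features built in this notebook depend on the linescore columns, so this only matters if later work parses runs by inning.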
