# Notebook Mission

The intended use of this notebook is to provide a space where the user can run all of the code written in the `common_tasks`, `data_loader`, `set_piece_extractor`, and `feature_engineering` scripts found in the `src.data` sub-module. The end result should be a collection of what we call, in this project, set piece sequences: each sequence is the collection of all of the events that correspond to the attacking effort started by a particular set piece event. These sequences are saved in the `compiled_sequences` sub-directory of the `data/interim` directory.

# Necessary Import Statements

In [1]:
# Allows for changes made to scripts in src to be included in the work
# done in this notebook.
%load_ext autoreload
%autoreload 2

In [6]:
# Data Manipulation
import pandas as pd

# Custom Modules
from src.data import common_tasks as ct
from src.data import data_loader as dl
from src.data import set_piece_extractor as spe
from src.data import feature_engineering as fe

# Load in Full Original Data

In [4]:
full_df = ct.EVENTS_DF
full_df.head(n=15)

Unnamed: 0,eventId,subEventName,tags,playerId,positions,matchId,eventName,teamId,matchPeriod,eventSec,subEventId,id
0,8,Simple pass,[{'id': 1801}],253784,"[{'y': 51, 'x': 50}, {'y': 46, 'x': 31}]",2500686,Pass,3799,1H,1.935181,85,176505119
1,8,High pass,[{'id': 1801}],29474,"[{'y': 46, 'x': 31}, {'y': 74, 'x': 68}]",2500686,Pass,3799,1H,3.599295,83,176505121
2,1,Air duel,"[{'id': 703}, {'id': 1801}]",253784,"[{'y': 74, 'x': 68}, {'y': 54, 'x': 72}]",2500686,Duel,3799,1H,6.827043,10,176505122
3,1,Air duel,"[{'id': 701}, {'id': 1802}]",56441,"[{'y': 26, 'x': 32}, {'y': 46, 'x': 28}]",2500686,Duel,3772,1H,6.985577,10,176505017
4,1,Ground attacking duel,"[{'id': 702}, {'id': 1801}]",366760,"[{'y': 54, 'x': 72}, {'y': 55, 'x': 73}]",2500686,Duel,3799,1H,9.511272,11,176505124
5,1,Ground defending duel,"[{'id': 702}, {'id': 1801}]",28529,"[{'y': 46, 'x': 28}, {'y': 45, 'x': 27}]",2500686,Duel,3772,1H,10.041222,12,176505018
6,1,Ground loose ball duel,"[{'id': 701}, {'id': 1802}]",366760,"[{'y': 55, 'x': 73}, {'y': 62, 'x': 70}]",2500686,Duel,3799,1H,10.499829,13,176505127
7,1,Ground loose ball duel,"[{'id': 703}, {'id': 1801}]",56441,"[{'y': 45, 'x': 27}, {'y': 38, 'x': 30}]",2500686,Duel,3772,1H,11.074277,13,176505019
8,7,Touch,[],26389,"[{'y': 38, 'x': 30}, {'y': 40, 'x': 27}]",2500686,Others on the ball,3772,1H,12.260977,72,176505020
9,7,Touch,[{'id': 1401}],26265,"[{'y': 80, 'x': 67}, {'y': 60, 'x': 73}]",2500686,Others on the ball,3799,1H,13.344344,72,176505132


In total, the data set contains north of 3 million events!

# Extract Out All Set Piece Events

To help us do this, we can load in the CSV file that describes the mapping between event IDs and the types of events.

In [5]:
event_ID_mapper = dl.event_id_mapper()
event_ID_mapper

Unnamed: 0,event,subevent,event_label,subevent_label
0,1,10,Duel,Air duel
1,1,11,Duel,Ground attacking duel
2,1,12,Duel,Ground defending duel
3,1,13,Duel,Ground loose ball duel
4,2,20,Foul,Foul
5,2,21,Foul,Hand foul
6,2,22,Foul,Late card foul
7,2,23,Foul,Out of game foul
8,2,24,Foul,Protest
9,2,25,Foul,Simulation
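
One way to scan a mapping like this is to collapse it to one row per top-level `event` ID and to list the sub-event labels grouped under each ID. The sketch below is only an illustration: `mapper_sample` is a hypothetical stand-in built from a few of the rows printed above, not the full table returned by `dl.event_id_mapper()`.

```python
import pandas as pd

# A few rows of the mapping, copied from the output above.
# `mapper_sample` is a hypothetical stand-in for the full mapper.
mapper_sample = pd.DataFrame(
    {
        "event": [1, 1, 2, 2],
        "subevent": [10, 11, 20, 21],
        "event_label": ["Duel", "Duel", "Foul", "Foul"],
        "subevent_label": [
            "Air duel",
            "Ground attacking duel",
            "Foul",
            "Hand foul",
        ],
    }
)

# One row per top-level event type, so each `event` ID appears once.
event_types = (
    mapper_sample[["event", "event_label"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(event_types)

# Sub-event labels grouped under each top-level `event` ID.
subevents_by_event = (
    mapper_sample.groupby("event")["subevent_label"].apply(list).to_dict()
)
print(subevents_by_event)
```

Run against the full mapper, the same two steps should make it easy to see which `event` ID groups a given family of sub-events.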


Investigating this DataFrame, we notice that all of the set pieces (free kicks, corners, goal kicks, penalties, and throw-ins) fall under `event` ID 3. Thus, to extract all of the set piece events, a good place to start is to take every event whose type has been classified with `event` ID 3. This is why the `set_piece_initating_events_extractor` function, found in the `src/data/set_piece_extractor` module, extracts all of the rows whose `event` entry is 3.
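The filtering idea described above can be sketched with plain pandas. This is a minimal illustration on a toy DataFrame, not the body of `set_piece_initating_events_extractor`. It assumes only that the events table has the `eventId` and `id` columns visible in the earlier output; the rows, and the names `toy_events` and `SET_PIECE_EVENT_ID`, are made up for the example.

```python
import pandas as pd

# Toy events table. Column names follow the events output shown
# earlier; the rows themselves are illustrative only.
toy_events = pd.DataFrame(
    {
        "id": [101, 102, 103, 104, 105],
        "eventId": [8, 3, 1, 3, 7],
        "matchId": [1, 1, 1, 1, 1],
    }
)

# `event` ID under which the mapper groups all set pieces.
SET_PIECE_EVENT_ID = 3

# A boolean mask keeps only the rows classified as set pieces.
set_piece_events = toy_events[toy_events["eventId"] == SET_PIECE_EVENT_ID]

# Unique identifiers of the events that start a set piece.
set_piece_ids = set_piece_events["id"].tolist()
print(set_piece_ids)
```

Keeping the event `id` values, rather than row positions, means the result can still be matched back to the full events table after it is re-sorted or re-indexed.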

In [20]:
sp_beginning_ids = spe.set_piece_initating_events_extractor()
print(len(sp_beginning_ids))

181171


So there are 181,171 set pieces for which we will have to compile sequences!

In [None]:
sp_sequences_df = spe.set_piece_sequences_compiler(do_backup=True)
# sp_sequences_df.head(n=15)
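
To make the idea of a set piece sequence concrete, here is a rough sketch of how events could be grouped into sequences. It is not the logic of `set_piece_sequences_compiler`, which may use a different rule. The sketch assumes that the events are already in chronological order and that a sequence runs from a set piece event up to, but not including, the next set piece in the same match and period. The helper `compile_sequences`, the `seq_id` column, and the toy rows are all hypothetical.

```python
import pandas as pd

# Toy, time-ordered events for one match; the rows are made up.
toy_events = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5, 6, 7],
        "matchId": [10] * 7,
        "matchPeriod": ["1H", "1H", "1H", "1H", "1H", "2H", "2H"],
        "eventId": [8, 3, 8, 1, 3, 8, 3],
    }
)


def compile_sequences(events, set_piece_event_id=3):
    """Hypothetical helper: tag each event with the set piece that precedes it.

    Assumed rule: a sequence starts at a set piece event and ends right
    before the next set piece in the same match and period.
    """
    events = events.copy()
    is_start = events["eventId"] == set_piece_event_id
    # Keep the id only on set piece rows; every other row becomes NaN.
    start_ids = events["id"].where(is_start)
    # Carry the most recent set piece id forward within each match period.
    events["seq_id"] = start_ids.groupby(
        [events["matchId"], events["matchPeriod"]]
    ).ffill()
    # Events before the first set piece of a period belong to no sequence.
    return events.dropna(subset=["seq_id"]).astype({"seq_id": int})


sequences = compile_sequences(toy_events)
events_per_sequence = sequences.groupby("seq_id")["id"].apply(list).to_dict()
print(events_per_sequence)
```

Grouping by match and period keeps a sequence from spilling over half-time or into another match. A real compiler would likely also need a rule for when the attacking effort ends before the next set piece, for example on a change of possession.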