# Feature Engineering II

**Now that we have some baseline results from the logistic regression model, we can kick it up a notch with some more features.**

Let's also include the following features:

- Game seconds
- Game period
- Coordinates (x, y, separate columns)
- Shot distance
- Shot angle
- Shot type
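
As a rough illustration of how the numeric features above could be derived, here is a minimal sketch. It is not the code in `tidy_data`: the helper names are hypothetical, and it assumes 20-minute periods, a period clock formatted as `MM:SS` that counts up, and an attacked net at (89, 0) in feet.

```python
import math

# Assumptions for illustration only (not taken from tidy_data):
# each period lasts 20 minutes and the attacked net sits at (89, 0) feet.
PERIOD_SECONDS = 20 * 60
GOAL_X, GOAL_Y = 89.0, 0.0


def game_seconds(period, clock):
    """Seconds elapsed since the start of the game for an 'MM:SS' period clock."""
    minutes, seconds = clock.split(":")
    return (period - 1) * PERIOD_SECONDS + int(minutes) * 60 + int(seconds)


def shot_distance(x, y):
    """Euclidean distance from the shot location to the assumed net position."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def shot_angle(x, y):
    """Angle in degrees off the rink's centre line; 0 means straight on."""
    return math.degrees(math.atan2(y - GOAL_Y, GOAL_X - x))


print(game_seconds(2, "05:30"))
print(shot_distance(86.0, 4.0))
print(shot_angle(79.0, 10.0))
```

In the real data the attacked net changes with the period and the team, so a production version would have to flip the coordinates accordingly.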


In [None]:
import pandas as pd
import numpy as np
import json
import os
from datetime import datetime, time, date
from tidy_data import *

In [None]:
# Load the dataset
df = pd.read_csv("nhl_data_train.csv").copy()

# keep only events that directly happen during the game
df = df[df['Event'].isin(['SHOT', 'GOAL', 'FACEOFF', 'HIT', 'GIVEAWAY', 'MISSED_SHOT',
                                  'BLOCKED_SHOT', 'PENALTY', 'TAKEAWAY'])]

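# convert GameTime from an 'MM:SS' string into a datetime.time object
df['GameTime'] = df['GameTime'].apply(lambda x: datetime.strptime(x, '%M:%S').time())

# reset_index returns a new frame, so assign it back to renumber the filtered rows
df = df.reset_index(drop=True)
df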

In [None]:
# add distance and angle columns
df = add_distance(df)
df = add_angle(df)

df

Next, we attach information about the immediately preceding event to each shot, as four new features:

- Last event type
- Coordinates of the last event (x, y, separate columns)
- Time from the last event (seconds)
- Distance from the last event
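
One way to build such features is to shift each game's event log by one row. The sketch below is an assumption about the approach, not the actual `add_previous_events` implementation: the function name, the `GameID` column, and the `GameSeconds` column are hypothetical, while the output column names follow the ones used later in this notebook.

```python
import numpy as np
import pandas as pd


def sketch_previous_events(events):
    """Attach previous-event features; 'GameID' and 'GameSeconds' are assumed columns."""
    out = events.sort_values(["GameID", "GameSeconds"], kind="stable").copy()
    # Shift within each game so the first event of a game has no predecessor.
    prev = out.groupby("GameID")[["Event", "XCoord", "YCoord", "GameSeconds"]].shift(1)
    out["LastEvent"] = prev["Event"]
    out["LastEvent_XCoord"] = prev["XCoord"]
    out["LastEvent_YCoord"] = prev["YCoord"]
    out["TimeLastEvent"] = out["GameSeconds"] - prev["GameSeconds"]
    out["DistanceLastEvent"] = np.hypot(
        out["XCoord"] - prev["XCoord"], out["YCoord"] - prev["YCoord"]
    )
    return out


# Toy data for illustration only
toy = pd.DataFrame({
    "GameID": [1, 1, 1, 2],
    "GameSeconds": [10, 14, 20, 5],
    "Event": ["FACEOFF", "SHOT", "GOAL", "HIT"],
    "XCoord": [0.0, 60.0, 63.0, 10.0],
    "YCoord": [0.0, 0.0, 4.0, 5.0],
})
result = sketch_previous_events(toy)
print(result[["Event", "LastEvent", "TimeLastEvent", "DistanceLastEvent"]])
```

Grouping by game keeps the last event of one game from leaking into the first event of the next, which a plain `shift` over the whole frame would not prevent.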


In [None]:
df = add_previous_events(df)

With this new information, we will try to quantify a few more interesting things about the state of the play with the following three features:

- Rebound (bool): True if the last event was also a shot, otherwise False
- Change in shot angle: computed only when the shot is a rebound, otherwise 0
- “Speed”: the distance from the previous event divided by the time since the previous event
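
The three definitions above can be sketched as follows. This is an illustration under stated assumptions, not the `add_rebound`, `angle_change`, or `add_speed` code from `tidy_data`: the function name and the `LastShootingAngle` column are hypothetical, the angle change is taken as an absolute difference, and speed is left as NaN when no time has elapsed to avoid dividing by zero.

```python
import numpy as np
import pandas as pd


def sketch_state_features(shots):
    """Add Rebound, AngleChange and Speed; 'LastShootingAngle' is an assumed column."""
    out = shots.copy()
    # A rebound is a shot whose previous event was also a shot.
    out["Rebound"] = out["LastEvent"].eq("SHOT")
    # Absolute change in angle, kept only for rebounds and set to 0 otherwise.
    change = (out["ShootingAngle"] - out["LastShootingAngle"]).abs()
    out["AngleChange"] = change.where(out["Rebound"], 0.0)
    # Mask non-positive elapsed times so the division yields NaN instead of inf.
    elapsed = out["TimeLastEvent"].where(out["TimeLastEvent"] > 0)
    out["Speed"] = out["DistanceLastEvent"] / elapsed
    return out


# Toy data for illustration only
toy_shots = pd.DataFrame({
    "LastEvent": ["SHOT", "FACEOFF", "SHOT"],
    "ShootingAngle": [30.0, 10.0, -15.0],
    "LastShootingAngle": [-20.0, np.nan, 5.0],
    "TimeLastEvent": [2.0, 8.0, 0.0],
    "DistanceLastEvent": [12.0, 40.0, 3.0],
})
features = sketch_state_features(toy_shots)
print(features[["Rebound", "AngleChange", "Speed"]])
```

Events logged in the same second are common in play-by-play data, so how zero elapsed time is handled (NaN here) is a choice worth making explicitly before modelling.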


In [None]:
df = add_rebound(df)
df = angle_change(df)
df = add_speed(df)

In [None]:
# keep only shots and goals
df = df[df['Event'].isin(['SHOT', 'GOAL'])]

# keep only the selected columns
df = df[['ShotType', 'Period', 'GameTime', 'XCoord', 'YCoord', 'isEmptyNet', 'isGoal',
         'DistanceToGoal', 'ShootingAngle', 'LastEvent', 'LastEvent_XCoord', 'LastEvent_YCoord', 
        'TimeLastEvent', 'DistanceLastEvent', 'Rebound', 'AngleChange', 'Speed']]
df

In [None]:
# save dataset for the advanced models
# reset_index is not in-place, so assign the result before saving
df = df.reset_index(drop=True)
df.to_csv('advanced_models_data.csv')

In [None]:
# def add_powerplay_time(df):
#     time_pp_started = np.zeros(df.shape[0])
#     time = 0
#     penalty_time = {'minor': 120, 'double': 240, 'major': 300}
#     max_time = 0
#     type_penalty = ''
#     team_penalized = ''

#     i = 0
#     for j, row in df.iterrows():
#         if time >= max_time:
#             time_pp_started[i]
        
#         if row['Event'] == 'PENALTY':
#             time = 0  # row['GameTime']
#             max_time += penalty_time[type_penalty]
            
            
    
#     df['TimePPStarted'] = time_pp_started
#     return df