## Feature Engineering

This notebook loads the output of the Get_All_Player_Data notebook, which is saved as `all_batters_game_data.csv` in the data folder. The goal is to create additional features that are not present in the data scraped from Baseball Reference.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('../data/all_batters_game_data.csv')

In [None]:
# The unnamed sixth column holds the home/away marker ('@' means an away game).
df = df.rename(columns={'Unnamed: 5': 'At'})

In [None]:
def label_is_home(row):
    if row['At'] == '@':
        return 0
    else:
        return 1

In [None]:
df['is_home'] = df.apply(label_is_home, axis=1)

In [None]:
del df['At']

In [None]:
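# Drop rows with any missing value and print the shape before and after.
# 'At' was deleted first; it is presumably blank for home games, so keeping it
# would cause dropna() to discard those rows.
print(df.shape)
df = df.dropna()
print(df.shape)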

### Functions 

These functions create additional features for the data. The idea is that it is important to know what kind of streak a batter is on, since baseball players tend to go through hot and cold streaks.

In [None]:
def label_consecutive_games_above_average(row, avg_points):
    if row['DFS(FD)'] > avg_points:
        return 1 
    else:
        return 0

In [None]:
def label_streak(y):
    return y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)
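
To make the one-liner in `label_streak` easier to follow, here is a small sketch that runs it on a toy series of 0/1 flags. The `flags` values and the `run_id` name are made up for illustration and are not taken from the scraped data. Every change in value starts a new run, `cumcount()` numbers the rows inside each run, and multiplying by `y` zeroes out the runs of 0s.

```python
import pandas as pd

def label_streak(y):
    # A new run starts whenever the value differs from the previous row.
    run_id = (y != y.shift()).cumsum()
    # Number the rows within each run (1, 2, 3, ...) and zero out runs of 0s.
    return y * (y.groupby(run_id).cumcount() + 1)

# Toy 0/1 flags (illustrative only).
flags = pd.Series([1, 1, 0, 1, 1, 1, 0, 0])
streaks = label_streak(flags)
print(streaks.tolist())  # [1, 2, 0, 1, 2, 3, 0, 0]
```

The input must contain only 0s and 1s for the result to read as a streak length.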

In [None]:
def label_got_hit(row):
    if int(row['H']) >= 1:
        return 1
    else:
        return 0

### Create New Features

In [None]:
df['got_hit'] = df.apply(label_got_hit, axis=1)

In [None]:
df['got_hit_prev_day'] = df['got_hit'].shift().fillna(0)
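
A plain `shift()` moves every value down one row regardless of which batter the row belongs to. If the CSV stacks several batters one after another, the first game of each batter would inherit the last game of the previous one. The sketch below shows the difference on a toy frame. The `player` column name is hypothetical, since the actual identifier column in the scraped data is not shown here, and the rows are assumed to be sorted by date within each player.

```python
import pandas as pd

# Toy frame; "player" is a hypothetical identifier column.
toy = pd.DataFrame({
    "player": ["a", "a", "a", "b", "b"],
    "got_hit": [1, 0, 1, 1, 1],
})

# Plain shift: the first row of player "b" picks up the last row of player "a".
toy["prev_plain"] = toy["got_hit"].shift().fillna(0).astype(int)

# Grouped shift: each player's first game has no previous game, so it gets 0.
toy["prev_grouped"] = (
    toy.groupby("player")["got_hit"].shift().fillna(0).astype(int)
)

print(toy["prev_plain"].tolist())    # [0, 1, 0, 1, 1]
print(toy["prev_grouped"].tolist())  # [0, 1, 0, 0, 1]
```

The same consideration would apply to the other shifted and rolling features in this notebook if the data does contain more than one batter.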

In [None]:
# The hit streak going into a game should depend only on earlier games,
# so it is built from got_hit_prev_day rather than got_hit.
df['hit_streak'] = label_streak(df['got_hit_prev_day'])

In [None]:
df['prev_points'] = df['DFS(FD)'].shift().fillna(0)

In [None]:
df['points_ma'] = df['prev_points'].rolling(window=3).mean().fillna(0)
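
As a quick illustration of what the 3-game moving average produces, the sketch below applies the same call to a toy series (the values are made up). With the default settings, the first two rows have fewer than three observations, so `rolling(...).mean()` returns NaN there and `fillna(0)` turns them into 0. Passing `min_periods=1` is one possible alternative that averages whatever games are available instead. It is shown only for comparison and is not what the notebook uses.

```python
import pandas as pd

# Toy previous-game point totals (illustrative only).
prev_points = pd.Series([0.0, 9.0, 3.0, 6.0, 12.0])

# Same call as in the notebook: incomplete windows become NaN, then 0.
ma_default = prev_points.rolling(window=3).mean().fillna(0)

# Alternative for comparison: average over however many games are available.
ma_partial = prev_points.rolling(window=3, min_periods=1).mean()

print(ma_default.round(2).tolist())  # [0.0, 0.0, 4.0, 6.0, 7.0]
print(ma_partial.round(2).tolist())  # [0.0, 4.5, 4.0, 6.0, 7.0]
```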

In [None]:
avg_points = df['DFS(FD)'].mean()
# Flag games that beat the overall average, then shift so each row reflects the previous game.
df['above_avg_points'] = df.apply(label_consecutive_games_above_average, axis=1, args=(avg_points,)).shift().fillna(0)

In [None]:
df['above_avg_streak'] = label_streak(df['above_avg_points'])

### Export To New CSV 

In [None]:
df.to_csv('../data/all_batters_with_feats.csv')