# normalize.ipynb
This notebook prepares the team statistics so they can be combined and weighted in our model.  
Values are scaled to the range [0,1] using the [min-max rescaling formula](https://en.wikipedia.org/wiki/Feature_scaling#Rescaling)
$$x'=\frac{x - \min(x)}{\max(x)-\min(x)}$$
Opponent stats are inverted (replaced by their reciprocal, $1/x$) before rescaling, so that low opponent stats positively impact a team's score.
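
As a rough illustration of the two steps described above, the sketch below applies the rescaling formula to a plain Python list, once directly and once after taking reciprocals. The helper `min_max` and the numbers in `opp_stat` are made up for this example and are not part of the notebook's data.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1] with x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical opponent stat for four teams (made-up numbers).
opp_stat = [2.0, 4.0, 6.0, 10.0]

# Plain rescaling: the largest raw value maps to 1.
scaled = min_max(opp_stat)

# Reciprocal first, then rescaling: the smallest raw value now maps to 1.
inverted_scaled = min_max([1 / v for v in opp_stat])

print(scaled)           # [0.0, 0.25, 0.5, 1.0]
print(inverted_scaled)  # ordering is reversed: first entry 1.0, last entry 0.0
```

Note that the reciprocal is not a linear flip: the inverted values are not simply `1 - scaled`, so the spacing between teams changes as well as the order.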

## import regular csv

In [None]:
import pandas as pd
import numpy as np
df = pd.read_csv("../csvs/kaggle/regular_season_stats.csv")
pd.set_option('display.max_columns', len(df.columns))

## invert opponent stats & seeds

In [None]:
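# Invert every column from 'OFGM' onward (the opponent stats) so lower values score higher.
opp = df.columns.get_loc('OFGM')
df.iloc[:, opp:] = 1 / df.iloc[:, opp:]
# Invert seeds so a better (lower) seed scores higher.
# Note: this assumes 'Seed' sits before 'OFGM'; otherwise it would be inverted twice.
df['Seed'] = 1 / df['Seed']
# Division by zero yields +/-inf; map those values to 0.
df = df.replace([np.inf, -np.inf], 0)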
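
To show what the infinity replacement does in practice, here is a small sketch on a toy frame. The data is invented, and the column names `FGM`, `OFGM`, and `OTO` are only assumed to resemble the real CSV header.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real file has many more columns and rows.
toy = pd.DataFrame({
    "FGM": [25.0, 30.0, 28.0],
    "OFGM": [20.0, 0.0, 40.0],   # second row has a zero opponent stat
    "OTO": [10.0, 16.0, 8.0],
})

# Same pattern as the cell above: invert everything from 'OFGM' onward.
opp = toy.columns.get_loc("OFGM")
inverted = toy.copy()
inverted.iloc[:, opp:] = 1 / toy.iloc[:, opp:]

# 1 / 0 becomes inf in pandas; replace it with 0 as the notebook does.
inverted = inverted.replace([np.inf, -np.inf], 0)
print(inverted)
```

One consequence worth keeping in mind: a raw opponent stat of 0 ends up as 0 after the replacement, which is the lowest inverted value in the column rather than the highest. Whether that matters depends on whether zeros actually occur in the data.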

## invert turnover stats
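Low team turnovers help a team win, so team turnover columns are inverted.  
High opponent turnovers also help a team win. Opponent turnover columns were already inverted with the other opponent stats, so inverting them again restores their original direction.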

In [None]:
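# Invert every column whose name contains 'TO' (team and opponent turnovers).
to_cols = [col for col in df.columns if 'TO' in col]
df[to_cols] = 1 / df[to_cols]
df = df.replace([np.inf, -np.inf], 0)  # map division-by-zero results to 0, as above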
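
The column selection relies on a substring test, so it helps to see which names it picks up. The sketch below uses a hypothetical header list (the real CSV may differ) and also checks that inverting a value twice gives back the original, which is what happens to an opponent turnover column that was already inverted once.

```python
import math

# Hypothetical column names; the real CSV header may differ.
columns = ["Season", "Seed", "FGM", "TO", "Stl", "OFGM", "OTO", "OStl"]

# Same case-sensitive substring test as the notebook cell.
to_cols = [col for col in columns if "TO" in col]
print(to_cols)  # ['TO', 'OTO']

# Inverting twice restores the original value (up to floating-point rounding).
oto = 12.5
once = 1 / oto
twice = 1 / once
print(once, twice)
```

Because the test is a plain substring match, any other column whose name happens to contain `TO` would be inverted too, so it is worth printing `to_cols` once against the real header.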

## rescale to [0,1]

In [None]:
# Rescale every column after 'Season' to [0, 1], one column at a time.
start = df.columns.get_loc('Season') + 1
sub = df.iloc[:, start:]
df.iloc[:, start:] = (sub - sub.min()) / (sub.max() - sub.min())
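
One edge case of this expression is a column whose values are all identical: the denominator is 0 and the result is `NaN`. The sketch below reproduces that on invented data. The `Flat` column is hypothetical, and filling with 0 is just one possible choice, not something the notebook does.

```python
import pandas as pd

# Hypothetical toy data with one constant column.
toy = pd.DataFrame({
    "Season": [2015, 2015, 2015],
    "FGM": [20.0, 25.0, 30.0],
    "Flat": [5.0, 5.0, 5.0],
})

# Same column-wise rescaling as the cell above.
start = toy.columns.get_loc("Season") + 1
sub = toy.iloc[:, start:]
scaled = (sub - sub.min()) / (sub.max() - sub.min())
print(scaled)  # 'Flat' is all NaN because max - min is 0

# Assumption for this sketch: treat a constant column as carrying no signal.
scaled_safe = scaled.fillna(0.0)
print(scaled_safe)
```

If the real data has no constant columns, the original cell needs no such guard; checking `sub.max() == sub.min()` once would confirm that.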

## save processed csv

In [None]:
# Forward slashes match the read path above and work on every platform.
df.to_csv("../csvs/kaggle/normalized_stats.csv", encoding='utf-8', index=False)
df