# Load/Pre-process Cleaned Dataset

Now that we have our cleaned dataset, we can load it and start building the model with PyTorch. First we drop some unnecessary fields, then we map the positions to integers so that the neural network only has to deal with numbers.

Reasoning for discarding the unnecessary fields:
* 'PPG' (Fantasy points per game) is derived from other fields and can be discarded.

* The player's name and team will be kept for display, but we discard them when training because we don't want the model to guess based on a player's name. Names repeat in this dataset: for example, RB Derrick Henry appears in all 4 of his years. We don't want the model to pick up on names like this and use them to guess the player's position.

In [8]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

df = pd.read_csv('cleaned_data/combined.csv')

df = df.drop(['Player', 'Team', 'PPG'], axis=1)

# encode position as an integer label: 'rb' -> 0, 'wr' -> 1
df['position'] = df['position'].map({'rb': 0, 'wr': 1})

# make sure the position column has dtype int64
df['position'] = df['position'].astype('int64')

df.dtypes

data_tensor = torch.tensor(df.values)

data_tensor.shape

torch.Size([1016, 14])
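
One thing to watch with the mapping step: `Series.map` with a dictionary leaves `NaN` for any label that is not a key, and the later `astype('int64')` then fails on those rows. The sketch below shows one possible way to fail early with a clearer message. It runs on a small made-up frame, not on `combined.csv`; the column names `rush_yds` and `rec_yds`, the values, and the helper `encode_positions` are all hypothetical and only there for illustration.

```python
import pandas as pd

# Toy stand-in for the cleaned dataset; column names and values are made up.
toy = pd.DataFrame({
    "position": ["rb", "wr", "wr", "rb"],
    "rush_yds": [1200.0, 35.0, 0.0, 640.0],
    "rec_yds": [210.0, 980.0, 1105.0, 330.0],
})

# Same encoding as in the cell above.
position_to_id = {"rb": 0, "wr": 1}


def encode_positions(frame, mapping):
    """Return a copy of frame with 'position' replaced by integer ids.

    Hypothetical helper: raises ValueError if a label is missing from mapping.
    """
    encoded = frame.copy()
    ids = encoded["position"].map(mapping)
    # Series.map yields NaN for labels that are not keys of the mapping.
    unknown = encoded.loc[ids.isna(), "position"].unique()
    if len(unknown) > 0:
        raise ValueError(f"unmapped position labels: {list(unknown)}")
    encoded["position"] = ids.astype("int64")
    return encoded


encoded_toy = encode_positions(toy, position_to_id)
print(encoded_toy["position"].tolist())  # [0, 1, 1, 0]
```

If the cleaned data is already guaranteed to contain only `rb` and `wr` rows, this check is redundant, but it is cheap insurance if more positions are added to the dataset later.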