In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

<p>Import libraries.</p>

In [2]:
class linearRegression:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        for _ in range(self.iterations):
            y_predicted = np.dot(X, self.weights) + self.bias
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

<p>__init__ function: Initialize class variables.</p>
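<p>fit function: fit the model to a feature matrix X and target vector y using batch gradient descent. Weights and bias start at zero. Each iteration computes predictions, takes the gradient of the mean squared error (up to a constant factor of 2) with respect to the weights and bias, and updates both at the bottom of the loop.</p>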
<p>predict function: predict values using the trained weights and bias.</p>
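<p>As a sanity check on the gradient descent loop, the sketch below compares it with the closed-form least-squares solution from np.linalg.lstsq. The data is synthetic and made up for illustration (it is not plays.csv), and the gradient_descent helper is a hypothetical standalone copy of the update rule in fit.</p>

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the plays dataset):
# two binary features, y = 2*x0 - 1*x1 + 0.5 + small noise.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2)).astype(float)
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 + rng.normal(0, 0.1, size=500)

def gradient_descent(X, y, learning_rate, iterations):
    """Same update rule as linearRegression.fit, written as a function."""
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(iterations):
        error = X @ weights + bias - y
        weights -= learning_rate * (X.T @ error) / n_samples
        bias -= learning_rate * error.sum() / n_samples
    return weights, bias

# Closed-form least squares: append a column of ones for the bias term.
A = np.column_stack([X, np.ones(len(X))])
exact, *_ = np.linalg.lstsq(A, y, rcond=None)

# The notebook's settings versus a longer run with a larger step size.
w_short, b_short = gradient_descent(X, y, learning_rate=0.01, iterations=1000)
w_long, b_long = gradient_descent(X, y, learning_rate=0.1, iterations=20000)

# Largest absolute difference from the exact coefficients.
gap_short = np.abs(np.append(w_short, b_short) - exact).max()
gap_long = np.abs(np.append(w_long, b_long) - exact).max()
print("gap with lr=0.01, 1000 iterations:", gap_short)
print("gap with lr=0.1, 20000 iterations:", gap_long)
```

<p>On this synthetic data the longer run matches the exact solution far more closely than the short one. That suggests a learning rate of 0.01 with 1000 iterations may stop before full convergence, so it is worth checking the fitted weights against a closed-form fit on the real data.</p>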

In [3]:
plays = pd.read_csv("data/plays.csv")

<p>Load data.</p>

In [4]:
def playType(description):
    if "pass" in description.lower():
        return "Pass"
    elif "run" in description.lower():
        return "Run"
    return "Unknown"

def side(description):
    if "right" in description.lower():
        return "Right"
    elif "left" in description.lower():
        return "Left"
    return "Center"

<p>Extract play type and side data.</p>
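<p>Because both functions use plain substring checks, it can help to try them on a few descriptions before trusting the labels. The strings below are made-up examples, not rows from plays.csv, and the two functions are repeated so the snippet runs on its own.</p>

```python
def playType(description):
    if "pass" in description.lower():
        return "Pass"
    elif "run" in description.lower():
        return "Run"
    return "Unknown"

def side(description):
    if "right" in description.lower():
        return "Right"
    elif "left" in description.lower():
        return "Left"
    return "Center"

# Hypothetical descriptions, written only to exercise the substring logic.
samples = [
    "(12:05) J.Doe pass short right to A.Smith for 7 yards",
    "(3:41) B.Jones left end for 4 yards",         # no "pass" or "run" keyword
    "(9:10) J.Wright pass short left to A.Smith",  # "right" hides inside "Wright"
]

results = [(playType(s), side(s)) for s in samples]
for s, r in zip(samples, results):
    print(r, "<-", s)
```

<p>The second sample is labeled "Unknown" because it contains neither keyword, and the third is labeled "Right" because the substring "right" appears in the player name. If the real descriptions contain cases like these, matching on whole words would be a safer choice.</p>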

In [5]:
plays["playType"] = plays["playDescription"].apply(playType)
plays["playSide"] = plays["playDescription"].apply(side)

<p>Use functions to create new columns.</p>

In [6]:
plays["yardsGained"] = plays["expectedPointsAdded"]

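<p>Set the target variable. Note that the yardsGained column is a copy of expectedPointsAdded, so the model predicts expected points added rather than yards.</p>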

In [7]:
filterPlays = plays[
   (plays["playType"].isin(["Pass", "Run"])) &
   (plays["playSide"].isin(["Right", "Left"])) 
]

<p>Filter out plays with an Unknown play type or a Center side.</p>

In [8]:
filterPlays = filterPlays.assign(
    encodePlayType=filterPlays["playType"].map({"Run": 0, "Pass": 1}),
    encodeSideType=filterPlays["playSide"].map({"Left": 0, "Right": 1}),
)

<p>Encode play type and side as 0/1 features. Using assign returns a new DataFrame, which avoids the SettingWithCopyWarning that pandas raises when columns are set directly on a filtered slice.</p>


In [9]:
X = filterPlays[["encodePlayType", "encodeSideType"]].values
y = filterPlays["yardsGained"].values

<p>Create feature matrix and target variable.</p>

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42)

<p>Create training and test data splits.</p>

In [11]:
linReg = linearRegression(learning_rate=0.01, iterations=1000)
linReg.fit(X_train, y_train)

<p>Train linear regression model.</p>

In [12]:
y_pred = linReg.predict(X_test)

<p>Make predictions on the test set.</p>

In [13]:
tester = np.mean((y_test - y_pred) ** 2)

<p>Test model effectiveness using mean squared error.</p>
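<p>A mean squared error is easier to judge next to a baseline. The sketch below compares a model's error with the error from always predicting the training mean. All arrays are hypothetical values chosen for illustration, not the notebook's data.</p>

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Hypothetical targets and predictions (assumed values for illustration).
y_train = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y_test = np.array([1.0, -0.5, 0.5, 2.0])
y_pred = np.array([0.8, 0.2, 0.4, 1.1])

# Baseline: predict the training mean for every test sample.
baseline_pred = np.full_like(y_test, y_train.mean())

model_mse = mean_squared_error(y_test, y_pred)
baseline_mse = mean_squared_error(y_test, baseline_pred)
print("Model MSE:   ", model_mse)
print("Baseline MSE:", baseline_mse)
```

<p>If the model's error is not clearly below the baseline's, the features add little beyond the average of the target.</p>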

In [14]:
print("Mean Squared Error: ", tester)
print("Weights: ", linReg.weights)
print("Bias: ", linReg.bias)

Mean Squared Error:  1.7789727395176655
Weights:  [ 0.31871904 -0.02917914]
Bias:  0.3083797422715831


<p>Print the results!!</p>

# Interpreting the Results
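<p>A Mean Squared Error of 1.779 corresponds to a root mean squared error of about 1.33, so the prediction error is on the order of 1.3 expected points. Whether that is close depends on how much the target itself varies, which this notebook does not measure. With only two binary features, the model can output just four distinct values.</p>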
<p>First Weight: 0.319. Since the first weight is positive and "Pass" was encoded as 1, the model predicts about 0.319 more expected points added for a "Pass" than for a "Run" (encoded as 0) to the same side.</p>
<p>Second Weight: -0.029. Since the second weight is negative and "Right" was encoded as 1, the model predicts slightly fewer expected points added for plays to the "Right" than for plays to the "Left" (encoded as 0). The effect is very small, roughly a tenth the size of the play-type effect.</p>
<p>Bias: 0.308. When both play type and play side are encoded as 0 (a "Run" to the "Left"), the linear regression predicts 0.308 expected points added.</p>
<p>To conclude, "Pass" plays have the stronger, positive influence on expected points added (the target stored in the yardsGained column). Plays to the "Left" have a slightly more positive effect than plays to the "Right".</p>
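<p>The interpretation above can be checked by plugging the printed weights and bias into the model for each of the four possible inputs. The snippet hard-codes the values from the output above. The combos dictionary is just a convenient way to list the encodings.</p>

```python
import numpy as np

# Values copied from the printed output of the trained model.
weights = np.array([0.31871904, -0.02917914])
bias = 0.3083797422715831

# Encodings used in the notebook: Run=0, Pass=1 and Left=0, Right=1.
combos = {
    ("Run", "Left"): [0, 0],
    ("Pass", "Left"): [1, 0],
    ("Run", "Right"): [0, 1],
    ("Pass", "Right"): [1, 1],
}

predictions = {
    label: float(np.dot(x, weights) + bias) for label, x in combos.items()
}
for (play, play_side), value in predictions.items():
    print(f"{play:>4} {play_side:<5} -> {value:.3f}")
```

<p>The four predictions are about 0.308, 0.627, 0.279, and 0.598, so switching from "Run" to "Pass" changes the prediction far more than switching sides does.</p>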