# The Perceptron Algorithm

This Jupyter Notebook walks through understanding and implementing the perceptron algorithm on soccer data. The data comes from the [2022-2023 Soccer Player Stats Dataset](https://www.kaggle.com/datasets/vivovinco/20222023-football-player-stats?resource=download) on Kaggle.

The following packages are required to run the attached code:

- [Plotly](https://plotly.com/python/)

- [Plotly Express](https://plotly.com/python/plotly-express/)

- [Pandas](https://pandas.pydata.org/docs/)

- [Matplotlib.pyplot](https://matplotlib.org/2.0.2/api/pyplot_api.html)

- [Numpy](https://numpy.org/doc/)

- [Seaborn](https://seaborn.pydata.org/)

## Setting Up:

***
Import the necessary modules and the data.
***

In [176]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Import the dataset.
soccer = pd.read_csv("/Users/pstern/Desktop/INDE-577/Datasets/soccer_stats.csv", encoding='ISO-8859-1', delimiter=';')

***
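Select the predictors (goals, assists, passes into the penalty area, and offside passes, stored in the `Goals`, `Assists`, `PPA`, and `PasOff` columns) as well as the target we are predicting (attacker or defender, from the `Pos` column).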
***

In [177]:
# Keep only players who have played at least 15 games' worth of time (the '90s' column).
data = soccer[soccer['90s'] >= 15.0]

# Just use the first 100 data points.
data = data[:100]

# For simplification, we will only use a few predictors.
predictors = data[['Goals', 'Assists', 'PPA', 'PasOff']]

# Now do the same for the position.
y = data[['Pos']]

# Convert each to a numpy array.
y = y.values
X = predictors.values

***
The perceptron algorithm performs binary classification, so we need to make position binary.
***

In [178]:
# Convert position into a binary variable where -1 is an attacker and 1 is everyone else (treated as a defender).
# Forwards (FW), forward/defenders (FWDF), forward/midfielders (FWMF), and midfielder/forwards (MFFW) count as attackers.
y = np.where((y == 'FW') | (y == 'FWDF') | (y == 'FWMF') | (y == 'MFFW'), -1, 1)
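
Because accuracy is easier to interpret when the class balance is known, it can help to count the two labels after encoding. The sketch below uses a small made-up position array (not the real dataset), and `attacker_codes` is a hypothetical name introduced here; it swaps the chained comparisons for `np.isin`, which should give the same labels.

```python
import numpy as np

# Made-up positions, shaped (n, 1) like the notebook's y before encoding.
pos = np.array([['FW'], ['DF'], ['MFFW'], ['MF'], ['GK']])

# Position codes treated as attackers (same four codes as in the cell above).
attacker_codes = ['FW', 'FWDF', 'FWMF', 'MFFW']

# -1 for attackers, 1 for everyone else.
labels = np.where(np.isin(pos, attacker_codes), -1, 1)

# Count how many samples fall into each class.
values, counts = np.unique(labels, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

If one class dominates, a model that always predicts that class already scores well, which is worth keeping in mind when reading the accuracy later on.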

***
Because the players played different amounts of time, we should also normalize each statistic by the number of 90s played.
***

In [179]:
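# Get the number of 90s played.
ninetys = data['90s'].values

# Divide each row (player) by that player's number of 90s to normalize.
# The filter above guarantees every value is at least 15, so there is no division by zero.
# Cast to float first so the result is not truncated if the array holds integers.
X = X.astype(float) / ninetys[:, np.newaxis]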
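
As a rough illustration of per-player normalization, the sketch below divides each row of a small made-up array by that row's playing time using NumPy broadcasting. It assumes the columns hold season totals; the arrays `totals` and `nineties` are invented for this example, and the zero-time guard is an extra precaution the notebook's filter makes unnecessary.

```python
import numpy as np

# Made-up season totals for three players (two statistics each).
totals = np.array([[10.0, 4.0],
                   [ 3.0, 9.0],
                   [ 0.0, 2.0]])

# Made-up number of 90-minute units played by each player.
nineties = np.array([20.0, 15.0, 0.0])

# Reshape to a column so every row is divided by its own player's value.
divisor = nineties[:, np.newaxis]

# Rows with zero playing time are left at 0 instead of dividing by zero.
per_90 = np.divide(totals, divisor, out=np.zeros_like(totals), where=divisor != 0)
print(per_90)
```

The key point is the shape of the divisor: a `(n, 1)` column broadcasts across the columns of an `(n, k)` array, so each player is scaled by their own playing time.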


## Implementing the Algorithm:

***
Implement a perceptron class that, for each epoch, makes predictions and calculates the number of errors.
***
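
For reference, the per-sample rule used by the class in this section can be written as follows. The symbols are notation introduced here for illustration: $w$ for the feature weights, $b$ for the bias, $\eta$ for the learning rate `eta`, $x_i$ for a sample, $y_i \in \{-1, 1\}$ for its label, and $\hat{y}_i$ for the prediction.

$$
\hat{y}_i =
\begin{cases}
1 & \text{if } w \cdot x_i + b \ge 0 \\
-1 & \text{otherwise}
\end{cases}
$$

$$
w \leftarrow w - \eta \, (\hat{y}_i - y_i) \, x_i, \qquad
b \leftarrow b - \eta \, (\hat{y}_i - y_i)
$$

When the prediction is correct, $\hat{y}_i - y_i = 0$ and nothing changes. When it is wrong, $\hat{y}_i - y_i = \pm 2$, so the weights move by $2\eta \, x_i$ toward the correct side of the boundary.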

In [180]:
class Perceptron(object):
    # Initialize Perceptron object.
    def __init__(self, eta = .5, epochs=50):
        self.eta = eta
        self.epochs = epochs
        
    def train(self, X, y):
        # Initialize random weights.
        self.weight = np.random.rand(1 + X.shape[1])
        self.errors_ = []
        
        # Iterate through the epochs.
        for _ in range(self.epochs):
            errors = 0

            # Iterate through each sample in the training set.
            for xi, target in zip(X, y):

                # Update weights based on the difference between predicted and actual class.
                update = self.eta * (self.predict(xi) - target)
                self.weight[:-1] -= update * xi
                self.weight[-1] -= update

                # Keep track of the number of errors.
                # np.any handles both scalar and one-element array updates.
                errors += int(np.any(update != 0))

            # If there are no errors in this epoch, return.
            if errors == 0:
                return self
            else:
                self.errors_.append(errors)
            
        return self
    
    def net_input(self, X):
        # Calculate net input (sum of weighted inputs plus bias).
        return np.dot(X, self.weight[:-1]) + self.weight[-1]
    
    def predict(self, X):
        # Predict class labels based on net input.
        return np.where(self.net_input(X) >= 0.0, 1, -1)
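
To check that this kind of update rule behaves as expected, it can be run on a tiny dataset that is known to be linearly separable. The sketch below is a function-based variant written for illustration, not the class itself: `train_perceptron`, the `seed` parameter, and the toy arrays are all hypothetical, and unlike the class it records the error count for every epoch, including the final error-free one.

```python
import numpy as np

def train_perceptron(X, y, eta=0.5, epochs=50, seed=0):
    """Hypothetical helper applying the same update rule as the class."""
    rng = np.random.default_rng(seed)
    w = rng.random(X.shape[1])  # random initial feature weights
    b = rng.random()            # random initial bias
    errors_per_epoch = []

    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            # Predict with the current weights, then correct if wrong.
            pred = 1 if xi @ w + b >= 0.0 else -1
            update = eta * (pred - target)
            w -= update * xi
            b -= update
            errors += int(update != 0)
        errors_per_epoch.append(errors)
        # Stop once an entire pass produces no misclassifications.
        if errors == 0:
            break
    return w, b, errors_per_epoch

# Made-up, linearly separable data: the label is 1 when x0 > x1.
X_toy = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
y_toy = np.array([1, 1, -1, -1])

w, b, history = train_perceptron(X_toy, y_toy)
preds = np.where(X_toy @ w + b >= 0.0, 1, -1)
print(history)
print(preds)
```

On separable data like this, the error count should reach zero within a few epochs. On real data that is not separable, the loop simply runs until the epoch limit.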


## Applying the Algorithm:

***
Create an instance of the class to train the model.
***

In [181]:
# Create an instance of the Perceptron class.
ptron = Perceptron(epochs = 1000)

# Train the model.
ptron.train(X, y)

<__main__.Perceptron at 0x13360cd50>

***
Make predictions.
***

In [183]:
# Make predictions on the training data.
y_hat = ptron.predict(X)

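# Compare our predictions with reality: the fraction of predictions that match the labels.
# y has shape (n, 1), so flatten it before comparing with the 1-D predictions.
accuracy = np.mean(y_hat == y.ravel())

accuracy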

0.86
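
An accuracy figure is easier to judge next to a simple baseline, such as always predicting the most common class. The sketch below shows one way to compute both on made-up labels. The helpers `accuracy` and `majority_baseline` are hypothetical names introduced here, and the toy arrays are not taken from the dataset.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    return float(np.mean(y_true == y_pred))

def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common class."""
    y_true = np.asarray(y_true).ravel()
    _, counts = np.unique(y_true, return_counts=True)
    return float(counts.max() / counts.sum())

# Made-up labels: y_true is shaped (n, 1) like the notebook's y.
y_true_toy = np.array([[1], [1], [1], [-1], [-1]])
y_pred_toy = np.array([1, 1, -1, -1, 1])

print(accuracy(y_true_toy, y_pred_toy))
print(majority_baseline(y_true_toy))
```

Note that the score above is measured on the same samples used for training, so it says how well the model fits this data rather than how it would do on unseen players.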