# Linear Regression: Predicting Player Market Values

## What is Linear Regression?
Linear regression is a fundamental supervised learning algorithm. It models the relationship between a dependent variable (the target) and one or more independent variables (the features) as a weighted sum of the features plus a bias. Fitting the model means:
1. Finding the best-fitting line (or, with several features, hyperplane) through the data points
2. Choosing the weights and bias that minimize the mean squared error between predictions and actual values
3. Using the fitted function to make continuous-valued predictions for new inputs
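For small problems, the weights in step 2 can be computed in closed form. The minimal NumPy sketch below fits made-up points that lie exactly on y = 2x + 1 (hypothetical toy data, not the player dataset). It recovers the slope and intercept with essentially zero mean squared error.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so the intercept is fitted as well
A = np.column_stack([x, np.ones_like(x)])

# Least squares: minimizes ||A @ [w, b] - y||^2, i.e. the (mean) squared error
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

mse = np.mean((A @ np.array([w, b]) - y) ** 2)
print(f"slope={w:.3f}, intercept={b:.3f}, mse={mse:.2e}")
```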

## Our Task
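We'll use linear regression to predict EA FC 24 player market values (`value_eur`) from in-game attributes. The model is implemented from scratch as a single neuron with an identity activation, trained by stochastic gradient descent on the squared error. In other words, it is a linear regression whose weights are fitted iteratively rather than in closed form.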

## Implementation

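```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def identity(z):
    """Identity activation: a single neuron with this activation is a linear model."""
    return z

def mean_squared_error(y_hat, y):
    """Per-sample half squared error, 0.5 * (y_hat - y)**2.

    The factor 0.5 makes the gradient simply (y_hat - y). Note that
    sklearn.metrics.mean_squared_error is a different function with the same name.
    """
    return 0.5 * (y_hat - y) ** 2

class SingleNeuron:
    """A single neuron trained with per-sample (stochastic) gradient descent.

    Weights are stored in ``w_``: ``w_[:-1]`` are the feature weights and
    ``w_[-1]`` is the bias.
    """

    def __init__(self, activation_function, cost_function):
        self.activation_function = activation_function
        self.cost_function = cost_function

    def train(self, X, y, alpha=0.005, epochs=50):
        self.w_ = np.random.rand(1 + X.shape[1])  # random initial feature weights + bias
        self.errors_ = []                          # average cost per epoch
        N = X.shape[0]

        for _ in range(epochs):
            errors = 0.0
            for xi, target in zip(X, y):
                y_hat = self.predict(xi)
                error = y_hat - target
                # Record this sample's cost before updating the weights
                errors += self.cost_function(y_hat, target)
                # Gradient step; with the identity activation,
                # d/dw 0.5*(y_hat - y)^2 = (y_hat - y) * x and d/db = (y_hat - y)
                self.w_[:-1] -= alpha * error * xi
                self.w_[-1] -= alpha * error
            self.errors_.append(errors / N)
        return self

    def predict(self, X):
        preactivation = np.dot(X, self.w_[:-1]) + self.w_[-1]
        return self.activation_function(preactivation)
```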
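The update in `train` is stochastic gradient descent. For the per-sample cost ½(ŷ − y)² with ŷ = w·x + b, the gradient is (ŷ − y)·x for the weights and (ŷ − y) for the bias. That is exactly what the two `-=` lines subtract, scaled by `alpha`. The sketch below checks this analytic gradient against central finite differences on arbitrary made-up numbers. `half_squared_error` is a hypothetical helper written here for the check.

```python
import numpy as np

def half_squared_error(w, b, x, y):
    """Per-sample cost 0.5 * (w.x + b - y)^2 for a linear (identity) neuron."""
    return 0.5 * (np.dot(x, w) + b - y) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=3)   # one hypothetical sample with 3 features
w = rng.normal(size=3)   # hypothetical current weights
b, y = 0.3, 1.2          # hypothetical bias and target

# Analytic gradient, as used in SingleNeuron.train
error = np.dot(x, w) + b - y
grad_w = error * x
grad_b = error

# Central finite differences for comparison
eps = 1e-6
num_grad_w = np.array([
    (half_squared_error(w + eps * e, b, x, y) - half_squared_error(w - eps * e, b, x, y)) / (2 * eps)
    for e in np.eye(3)
])
num_grad_b = (half_squared_error(w, b + eps, x, y) - half_squared_error(w, b - eps, x, y)) / (2 * eps)

diff_w = np.max(np.abs(grad_w - num_grad_w))
diff_b = abs(grad_b - num_grad_b)
print(f"max weight-gradient difference: {diff_w:.2e}, bias-gradient difference: {diff_b:.2e}")
```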

## Understanding Our Dataset
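Let's examine the attributes we'll use to predict player market values (column names in parentheses):
- Overall rating (`overall`), the game's main summary of player quality
- Key performance stats (`pace`, `shooting`, `passing`, `dribbling`, `defending`, `physic`)
- Age (`age`), since younger players may carry extra market value for their potential

We'll use these attributes to predict each player's market value. Treating the relationship between these features and market value as linear is a simplifying assumption. It makes linear regression a natural first model. However, if values rise much faster than linearly at the top of the ratings scale, a straight-line fit will systematically misfit the most expensive players, and modelling the logarithm of the value may work better.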
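One way to probe the linearity assumption is to compare how strongly a feature correlates with the raw target and with its logarithm. The sketch below uses synthetic values that grow exponentially with rating. This is an assumption for illustration, not a property measured on this dataset. On the real data, the same `np.corrcoef` comparison could be run on `players_df["overall"]` and `players_df["value_eur"]`, restricted to players with a positive value so the log is defined.

```python
import numpy as np

# Hypothetical ratings and market values that grow exponentially with rating
overall = np.arange(60, 91, dtype=float)
value = 1_000.0 * np.exp(0.2 * overall)

# Pearson correlation with the raw target and with its logarithm
r_raw = np.corrcoef(overall, value)[0, 1]
r_log = np.corrcoef(overall, np.log(value))[0, 1]
print(f"corr(overall, value)      = {r_raw:.3f}")
print(f"corr(overall, log(value)) = {r_log:.3f}")
```

Here log(value) is an exact linear function of `overall`, so its correlation is 1. The raw value's correlation is noticeably lower, even though the relationship is perfectly deterministic.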

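```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

players_df = pd.read_csv("../data/players_data.csv")

# Select features for prediction
features = ["overall", "age", "pace", "shooting", "passing", "dribbling", "defending", "physic"]

# Drop rows with missing feature or target values (e.g. goalkeepers may have no
# outfield stats); NaNs would otherwise propagate into the learned weights
players_df = players_df.dropna(subset=features + ["value_eur"])

X = players_df[features].values
y = players_df["value_eur"].values

# Split first, so the test set plays no part in fitting the scalers
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale features and target using statistics from the training set only
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train = scaler_X.fit_transform(X_train)
X_test = scaler_X.transform(X_test)
y_train = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test = scaler_y.transform(y_test.reshape(-1, 1)).ravel()

print(f"Training set shape: {X_train.shape}")
print(f"Test set shape: {X_test.shape}")
```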
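`StandardScaler` standardizes each column using the mean and population standard deviation (ddof=0) of the data it is fitted on. `transform` reuses those stored statistics, and `inverse_transform` undoes the map. Below is a NumPy sketch of the same arithmetic on a small made-up matrix (`X_toy` is hypothetical). It illustrates the math only and is not the library's implementation.

```python
import numpy as np

# Hypothetical feature matrix: 4 samples, 2 features
X_toy = np.array([[60.0, 20.0],
                  [70.0, 25.0],
                  [80.0, 30.0],
                  [90.0, 35.0]])

# "fit": per-column mean and population standard deviation
mu = X_toy.mean(axis=0)
sigma = X_toy.std(axis=0)   # NumPy's default is ddof=0

# "transform" and "inverse_transform"
Z = (X_toy - mu) / sigma
X_back = Z * sigma + mu

print(Z.mean(axis=0), Z.std(axis=0))  # each column now has mean 0 and std 1
```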

## Model Training

```python
# Set the Seaborn theme
sns.set_theme()

# Seed NumPy's global RNG so the random initial weights are reproducible
np.random.seed(42)

# Instantiate and train the linear regression model
clf = SingleNeuron(activation_function=identity, cost_function=mean_squared_error)
clf.train(X_train, y_train, epochs=1000, alpha=0.01)
```
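A useful sanity check is to compare the SGD weights with the closed-form least-squares solution on the same data. The sketch below re-implements the notebook's per-sample update in a few lines. It runs on synthetic, noise-free data with hypothetical coefficients, where the two approaches should agree closely. On the real data, the analogous check would compare `clf.w_` with `np.linalg.lstsq` on `X_train` and `y_train` (plus a column of ones). Agreement there will be looser, because with noisy data, constant-step SGD only hovers near the optimum.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical standardized data: y = X @ true_w + true_b, with no noise
X_syn = rng.normal(size=(200, 3))
true_w = np.array([0.8, -0.3, 0.5])
true_b = 0.1
y_syn = X_syn @ true_w + true_b

# Same per-sample SGD update as SingleNeuron.train (feature weights first, bias last)
w_sgd = rng.random(X_syn.shape[1] + 1)
alpha = 0.01
for _ in range(200):  # epochs
    for xi, target in zip(X_syn, y_syn):
        error = np.dot(xi, w_sgd[:-1]) + w_sgd[-1] - target
        w_sgd[:-1] -= alpha * error * xi
        w_sgd[-1] -= alpha * error

# Closed-form least squares, with a column of ones appended for the bias
A_syn = np.column_stack([X_syn, np.ones(len(X_syn))])
w_ls, *_ = np.linalg.lstsq(A_syn, y_syn, rcond=None)

print("SGD:          ", np.round(w_sgd, 4))
print("least squares:", np.round(w_ls, 4))
```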

## Feature Importance Analysis
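Let's analyze which features have the strongest influence on the model's predicted market values. Every feature and the target were standardized, so each weight is the predicted change in market value (in standard deviations) for a one-standard-deviation increase in that feature, holding the others fixed. This makes the weights directly comparable. Treat the ranking with some caution, though: `overall` is correlated with the individual stats, and correlated features can share or trade weight between them.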

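```python
# Feature importance plot
plt.figure(figsize=(10, 6))
feature_importance = pd.DataFrame({
    "Feature": features,
    "Weight": clf.w_[:-1],  # feature weights only (w_[-1] is the bias)
})
# Sort by absolute weight so the most influential features end up at the top
feature_importance = feature_importance.sort_values("Weight", ascending=True, key=abs)

plt.barh(feature_importance["Feature"], feature_importance["Weight"])
plt.axvline(0, color="black", linewidth=0.8)  # separates negative from positive effects
plt.title("Feature Importance in Market Value Prediction")
plt.xlabel("Weight (standardized units; sign shows direction of effect)")
plt.tight_layout()
plt.show()
```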
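A standardized weight w_j can be converted back to original units by undoing both scalings. The effect in original units is β_j = w_j · σ_y / σ_j, where σ_y is the target's standard deviation and σ_j is feature j's. In this notebook, that means euros per unit of the attribute. With the fitted objects above, this would presumably be `clf.w_[:-1] * scaler_y.scale_[0] / scaler_X.scale_`. The self-contained sketch below checks the formula, and the matching intercept, on synthetic data with known coefficients (all numbers hypothetical). A closed-form fit stands in for the trained neuron.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical raw data with known coefficients (euros per feature unit)
X_raw = rng.normal(loc=[70.0, 25.0], scale=[8.0, 4.0], size=(500, 2))
beta_true = np.array([150_000.0, -40_000.0])
y_raw = X_raw @ beta_true + 2_000_000.0

# Standardize features and target (population std, as StandardScaler does)
mu_x, sd_x = X_raw.mean(axis=0), X_raw.std(axis=0)
mu_y, sd_y = y_raw.mean(), y_raw.std()
Xs = (X_raw - mu_x) / sd_x
ys = (y_raw - mu_y) / sd_y

# Fit in standardized space (closed form, standing in for the trained neuron)
A = np.column_stack([Xs, np.ones(len(Xs))])
coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
w_std, b_std = coef[:-1], coef[-1]

# Convert back: beta_j = w_j * sd_y / sd_j, then the matching intercept
beta = w_std * sd_y / sd_x
intercept = mu_y + sd_y * b_std - np.sum(beta * mu_x)
print(beta, intercept)
```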

## Model Performance Evaluation

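```python
# Alias sklearn's metric so it does not overwrite the custom mean_squared_error
# defined earlier (re-running the training cell would otherwise pick up this one)
from sklearn.metrics import r2_score, mean_squared_error as sk_mean_squared_error

# Make predictions on the held-out test set
y_pred = clf.predict(X_test)

# Calculate R-squared and MSE (both on the standardized target)
r2 = r2_score(y_test, y_pred)
mse = sk_mean_squared_error(y_test, y_pred)

print(f"R-squared score: {r2:.4f}")
print(f"Mean squared error (standardized units): {mse:.4f}")

# Plot actual vs predicted values; points on the red dashed line are perfect predictions
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--", lw=2)
plt.xlabel("Actual Market Value (Standardized)")
plt.ylabel("Predicted Market Value (Standardized)")
plt.title("Actual vs Predicted Market Values")
plt.tight_layout()
plt.show()
```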
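Standardizing the target does not change R², provided actual and predicted values are transformed with the same mean and scale. It does change the MSE, which ends up in squared standard-deviation units. Multiplying the MSE by σ_y², or the RMSE by σ_y, expresses the error in euros. In the notebook, σ_y is `scaler_y.scale_[0]`, and predictions can be mapped back with `scaler_y.inverse_transform(y_pred.reshape(-1, 1))`. The NumPy check below demonstrates both facts on made-up values. `r_squared` is a hypothetical helper implementing 1 − SS_res / SS_tot.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical market values in euros and model predictions
y_eur = np.array([1.0e6, 2.5e6, 4.0e6, 8.0e6, 20.0e6])
pred_eur = np.array([1.5e6, 2.0e6, 5.0e6, 7.0e6, 18.0e6])

# Standardize both with one fixed mean and scale (taken from y_eur here for
# simplicity; scaler_y would use training-set statistics instead)
mu, sd = y_eur.mean(), y_eur.std()
y_std, pred_std = (y_eur - mu) / sd, (pred_eur - mu) / sd

mse_std = np.mean((y_std - pred_std) ** 2)
mse_eur = np.mean((y_eur - pred_eur) ** 2)

print(r_squared(y_eur, pred_eur), r_squared(y_std, pred_std))  # same value
print(np.sqrt(mse_std) * sd, np.sqrt(mse_eur))                 # RMSE in euros, both ways
```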