# 1. Introduction to Plus-Minus Rating in Sports Analysis

## Objectives:
- Understand the concept of the plus-minus rating system.
- Explore the adaptation of the plus-minus rating from basketball and ice-hockey to soccer.
- Learn about the evolution of player rating systems and the role of machine learning.
- Discover new methodologies for rating soccer players using expected goals (xG) and expected points (xP).

## Introduction to Plus-Minus Rating:
The plus-minus (PM) rating system is a straightforward yet insightful method originally used in sports like basketball and ice-hockey to evaluate a player's impact on their team's performance. It answers a fundamental question: How does a team perform with a player on the field compared to without them?

## Adaptation to Soccer:
Soccer presents unique challenges for player evaluation due to its team-based nature and the complexity of player contributions. This lesson introduces how the PM rating system has been adapted for soccer, aiming to identify the best players in European football and observe their performance trends over time.

## Evolution of Player Ratings:
Traditionally, player ratings in individual sports like tennis and chess were simpler to calculate than in team sports. The advent of machine learning has shifted the focus towards more sophisticated rating systems, such as the TrueSkill rating developed by Microsoft. This system, a generalization of the Elo rating system, has been applied beyond traditional sports to rate video game players.

## Challenges in Team Sports Ratings:
Rating players in team sports has always been more complex, often relying on assigning values to specific actions considered important within the game. However, this approach lacks context and fails to capture the nuances of how actions impact the game's outcome.

## The Promise of PM Ratings:
Unlike traditional rating systems that focus on quantifying individual actions, the PM rating system measures a player's overall contribution to the team's success. This is done by examining the change in a specific target metric, such as goals, while the player is in the game, without focusing on the player's individual actions.

## Methodological Innovations in Soccer:
To adapt the PM rating system for soccer, this lesson will cover two innovative approaches:
1. **Expected Goals Plus-Minus (xGPM) Rating:** Utilizes a model to estimate the likelihood of a shot resulting in a goal.
2. **Expected Points Plus-Minus (xPPM) Rating:** Employs an in-play model to estimate the probabilities of match outcomes (win, draw, loss) based on the current state of the game.

# 2. Data

We collected data from 11 European leagues over the last 8 seasons.
For every game in our data set, we collect:

- match date,
- starting line-ups,
- timings of any goals,
- timings and player names of any substitutions and red cards.

For the expected goals (xG) model, additional information about each shot is needed. The following are extracted from the Opta F24 feed:
- the shot time,
- the shooter's (x, y) coordinates,
- the type of shot (penalty, free kick, header or open play),
- the subjective “big chance” qualifier describing the shot situation.

On top of that, goalkeeper skills as reported by the **EA SPORTS FIFA** video games are collected:
- diving,
- ball handling,
- ball kicking,
- positioning,
- reflexes.

Mapping players between the Opta feed and EA SPORTS is done using the Google research tools to match players’ names and using the date of birth and the player’s country of birth for validation. Players not found by this method are mapped manually.

# 3. The Plus-Minus Rating
In this section we will first describe the basic plus-minus statistic, before presenting modifications that have been introduced in the literature. In what follows we will define everything in terms of soccer.

## 3.1 The Basic Plus-Minus Statistic

In its simplest form, the plus-minus statistic can be used to answer the question: **_“what happens when the player is on the pitch, compared to when he is off it?”_**

Goals (or points in basketball) have been the preferred way to measure **“what happened”**, and the raw plus-minus score **_calculates the player’s contribution to the goal difference of his team (per ninety minutes) whilst he is on the pitch_**.

#### Example:

Consider the PM rating of a player who takes part in two matches:

- **First Match Scenario:**
  - The player participates for the first 60 minutes.
  - During this period, the team concedes one goal and scores none, resulting in a 1-0 loss.
- **Second Match Scenario:**
  - The same player enters the game with 30 minutes remaining, with the team already leading 3-0.
  - By the end of these 30 minutes, the team has extended its lead to 5-0.

To calculate the player's plus-minus rating, we consider the goal difference while they were on the field across both matches:

- In the first match, the goal difference per minute is $(-1/60)$.
- In the second match, the goal difference per minute is $(2/30)$, as the team scored 2 additional goals during the player's time on the field.

Summing these per-minute differences and multiplying by 90 to normalize to a full match duration, we obtain:

$\text{PM Rating}_{\text{ON}} = \left( \frac{-1}{60} + \frac{2}{30} \right) \times 90 = +4.5$

The net plus-minus statistic can be used to measure the importance of a player to his team. This is simply the PM statistic when the player is on the pitch minus the PM statistic when the player is not on the pitch.

In our example, the plus-minus statistic without the player is:

$\text{PM Rating}_{\text{OFF}} = \left( \frac{0}{30} + \frac{3}{60} \right) \times 90 = +4.5$

so that the net plus-minus statistic is 0.

**It appears then that the team is equally successful with and without the player**.
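The arithmetic of this example can be reproduced with a short Python sketch. The function name `plus_minus` and the `(goal_difference, minutes)` stint representation are illustrative choices, not part of the original method description; the calculation follows the worked example, which adds the per-minute rates of the two matches before scaling to 90 minutes.

```python
def plus_minus(stints):
    """Sum the per-minute goal differences of all stints, scaled to 90 minutes.

    Each stint is a (goal_difference, minutes) pair. This mirrors the worked
    example in the text, which adds the per-minute rates of the two matches.
    """
    return 90 * sum(goal_diff / minutes for goal_diff, minutes in stints)


# Player on the pitch: 60 minutes at -1, then 30 minutes at +2.
on_pitch = [(-1, 60), (2, 30)]
# Player off the pitch: 30 minutes at 0, then 60 minutes at +3.
off_pitch = [(0, 30), (3, 60)]

pm_on = plus_minus(on_pitch)
pm_off = plus_minus(off_pitch)
net_pm = pm_on - pm_off

print(round(pm_on, 2), round(pm_off, 2), round(net_pm, 2))
```

Rounding is applied only when printing, because the intermediate divisions are floating-point operations.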

This is of course a very simplistic picture and several pieces of information are not taken into account.

#### Limitations of Basic Plus-Minus:
While the basic plus-minus statistic provides a snapshot of a player's impact on their team's goal difference, it simplifies the complex dynamics of a soccer game. Several critical factors are not considered in this simplistic approach, such as:

- The strength of teammates and opponents on the field.
- Game situations like numerical disadvantages due to red cards.
- The influence of playing at home versus away.

These omissions mean that the basic plus-minus rating can lack context, potentially leading to misleading evaluations of a player's true contribution to their team.

Further, if one were to use this simple plus-minus rating to compare players from different teams, the results would be almost meaningless.

#### The Problem with Comparing Players Across Teams:
Using the basic plus-minus rating to compare players from different teams can be problematic. For instance, consider two players:

- One plays for the league champions.
- The other plays for the team at the bottom of the league.

If both players have a plus-minus rating of 0, it's challenging to determine who is the better player based solely on this statistic. Intuitively, most would argue that maintaining a rating of 0 on a weaker team might indicate a higher level of individual performance, as the player has managed to neutralize their team's disadvantages.


## 3.2 Regularized Adjusted Plus-Minus

The adjusted plus-minus player metric was first described by [Rosenbaum (2004)](http://www.82games.com/comm30.htm), who presented the plus-minus statistic as a **regression** problem.

Doing so means ‘adjustments’ can be made to the basic plus-minus statistic to account for the strengths of the other players on a team, and of the opposition players. 

#### Defining Play Segments:
The setup is again simple. Define a segment of play to be a period during which the same set of players (usually two sides of 11 players) is on the pitch. A new segment begins every time the set of players on the pitch changes. This may occur when a substitution is made, when a player is sent off, or when a different match starts.
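As a rough illustration of this definition, the sketch below splits a single match into segments from the minutes at which the line-up changes. The function name `split_into_segments`, the 90-minute default and the event minutes are hypothetical; stoppage time and the identities of the players are ignored for simplicity.

```python
def split_into_segments(event_minutes, match_length=90):
    """Return (start, end) intervals during which the line-ups are unchanged.

    event_minutes: minutes at which a substitution or a red card occurs.
    Events in the same minute are merged into a single boundary, and events
    outside the open interval (0, match_length) are ignored.
    """
    inside = {m for m in event_minutes if 0 < m < match_length}
    boundaries = sorted({0, match_length} | inside)
    return list(zip(boundaries[:-1], boundaries[1:]))


# Hypothetical match: a substitution at 60', a double substitution at 75',
# and a red card at 82'.
segments = split_into_segments([60, 75, 75, 82])
print(segments)  # [(0, 60), (60, 75), (75, 82), (82, 90)]
```

A match without any line-up change yields a single segment covering the whole game.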

#### Formulating the Regression Model:
The goal difference per 90 minutes during each segment is treated as the dependent variable $y$, with one observation per segment: $y_1, \ldots, y_T$.

Let there be $N$ players in total (in the whole league). The independent variables then form a $T \times N$ design matrix $X$ of dummy variables defined as

- $(x_{tj} = 1)$ if player $j$ plays for the home team in the segment.
- $(x_{tj} = -1)$ if player $j$ plays for the away team in the segment.
- $(x_{tj} = 0)$ if the player $j$ does not participate in the segment.

Each player in the league is identified by a unique numeric index $j$, and the adjusted plus-minus statistic emerges as the solution to the regression model: 

$y_t \approx X_t\alpha$

where $X_t$ is row $t$ of $X$ and $\alpha$ is an $N \times 1$ vector of parameters **measuring each player's contribution to the goal difference**.
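A minimal sketch of how $X$ and $y$ might be assembled is given below. The toy league (six players, two per side), the segment dictionaries and the helper names are invented for illustration, and the target is assumed to be home goals minus away goals, which matches the $+1$/$-1$ coding of home and away players.

```python
def design_row(home_players, away_players, n_players):
    """One row of X: +1 for home players, -1 for away players, 0 otherwise."""
    row = [0] * n_players
    for j in home_players:
        row[j] = 1
    for j in away_players:
        row[j] = -1
    return row


def goal_diff_per_90(home_goals, away_goals, minutes):
    """Home-minus-away goal difference of a segment, scaled to 90 minutes."""
    return 90 * (home_goals - away_goals) / minutes


# Hypothetical data: player indices 0..5, two segments of the same match.
N_PLAYERS = 6
segments = [
    {"home": [0, 1], "away": [3, 4], "home_goals": 1, "away_goals": 0, "minutes": 60},
    {"home": [0, 2], "away": [3, 4], "home_goals": 0, "away_goals": 1, "minutes": 30},
]

X = [design_row(s["home"], s["away"], N_PLAYERS) for s in segments]
y = [goal_diff_per_90(s["home_goals"], s["away_goals"], s["minutes"]) for s in segments]

print(X)  # [[1, 1, 0, -1, -1, 0], [1, 0, 1, -1, -1, 0]]
print(y)  # [1.5, -3.0]
```

Player 5 never appears, so the corresponding column of $X$ is all zeros and that player's parameter cannot be identified from these segments alone.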

#### Segment Calculation
The substitution rules differ between basketball and soccer, and this changes both how many segments a game produces and how the matrix $X^{T}X$ used in the adjusted plus-minus regression behaves. A simplified comparison illustrates the point.

We'll compare a hypothetical basketball game with a soccer match under the following assumptions:
- In basketball, there are 5 players per team on the court, with up to 7 substitutes allowed, and substitutions are unlimited. We'll assume an average game has 30 segments due to frequent substitutions.
- In soccer, each team starts with 11 players and is allowed up to 5 substitutions, which are often not fully utilized. We'll assume an average game results in 10 segments because substitutions are limited and less frequent, while more players are on the pitch in each segment.

For both sports, the matrix $X^{T}X$ is crucial for estimating player contributions $\alpha$ through regression analysis. The "well-behaved" nature of this matrix, meaning it's suitably conditioned for **inversion or solving**, depends significantly on the relationship between the **number of segments** (observations) and the number of players (variables) involved.

Further, in soccer over the course of a match and season, **the same groupings of players will play together**. For example, a partnership between two centre backs is commonplace in soccer meaning they are on the pitch together for nearly every minute of play during an entire season. 

The result of all of this is that although the matrix $X^{T}X$ is well-behaved for basketball, **it is not so for soccer**: it is singular, or near-singular, so that attempts to estimate $\alpha$ using, for example, ordinary least squares will fail.
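The toy sketch below illustrates the problem for two centre backs who appear together in every segment. Their columns of $X$ are identical, so $X^{T}X$ has zero determinant. The text above does not say which regularization is used; a ridge penalty $\lambda$ added to the diagonal is assumed here as one common choice, and the target values are made up.

```python
def gram_matrix(X):
    """Return X^T X for a matrix given as a list of rows."""
    n = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def solve2(m, b):
    """Solve a 2x2 linear system m @ a = b with Cramer's rule."""
    d = det2(m)
    return [
        (b[0] * m[1][1] - m[0][1] * b[1]) / d,
        (m[0][0] * b[1] - b[0] * m[1][0]) / d,
    ]


# Both players are on the pitch for the home team in all three segments.
X = [[1, 1], [1, 1], [1, 1]]
y = [1.5, -3.0, 0.0]  # made-up goal differences per 90 minutes

G = gram_matrix(X)
Xty = [sum(row[i] * y_t for row, y_t in zip(X, y)) for i in range(2)]
print(det2(G))  # 0: ordinary least squares has no unique solution

lam = 1.0  # assumed ridge penalty
G_ridge = [[G[i][j] + (lam if i == j else 0.0) for j in range(2)] for i in range(2)]
alpha = solve2(G_ridge, Xty)
print(alpha)  # both players receive the same shrunken estimate
```

With the penalty in place the system has a unique solution, and the data cannot separate the two players, so the credit is split equally between them.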

# 4. New Plus-Minus Ratings for Soccer

Recent advancements in the plus-minus metric for soccer have been inspired by the need for more nuanced measures of team success, especially in low-scoring sports like ice-hockey. 

The dependent variable is often called the **_‘target’_** as it is in some sense what the players should be targeting to improve during the match.

Recognizing that traditional goal counts may not fully capture a player's contribution to their team, researchers have explored alternative dependent variables, or 'targets', that players influence during a match.

1. **Expected Goals (xG) Plus-Minus:** This variant of the plus-minus metric uses the difference in expected goals between two teams as the target variable. Expected goals provide a more precise measure of a team's offensive and defensive performance by **quantifying the quality of scoring opportunities**, rather than simply counting goals.

2. **Expected Points (xP) Plus-Minus:** Another innovation is the plus-minus metric based on changes in expected points. This approach **evaluates a player's impact on the team's likelihood of winning points during a match**, offering a broader perspective on contribution to team success beyond goal scoring opportunities alone.
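The two targets can be sketched for a single segment as follows. All probabilities are invented, the function names are hypothetical, and the usual league scoring of 3 points for a win and 1 for a draw is assumed. The exact definition of the targets used in the ratings may differ in detail.

```python
def xg_difference(home_shot_probs, away_shot_probs):
    """Expected-goals difference: sum of home shot probabilities minus away."""
    return sum(home_shot_probs) - sum(away_shot_probs)


def expected_points(p_win, p_draw):
    """Expected league points, assuming 3 for a win and 1 for a draw."""
    return 3 * p_win + 1 * p_draw


# xG target: the home side takes two shots, the away side one.
xg_target = xg_difference([0.10, 0.35], [0.05])

# xP target: change in the home side's expected points over the segment,
# using hypothetical in-play probabilities at its start and end.
xp_start = expected_points(p_win=0.40, p_draw=0.30)
xp_end = expected_points(p_win=0.55, p_draw=0.25)
xp_target = xp_end - xp_start

print(round(xg_target, 2), round(xp_target, 2))
```

Either value can replace the goal difference as the dependent variable of a segment in the regression.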

## 4.1 Expected Goals Plus-Minus
Expected Goals (xG) is a statistical measure that evaluates the quality of shots taken in a soccer game. It assigns a probability to each shot, indicating how likely it was to result in a goal. This concept has gained popularity because it offers a more informative perspective on team performance than just counting actual goals.

#### Categories of Shots
Shots in soccer arise from various situations, leading to their categorization into four types for the xG model:
1. **Penalty**: Shots taken from the penalty spot.
2. **Free Kick**: Shots resulting from free kicks.
3. **Header**: Shots attempted with the head.
4. **Open Play**: All footed shots not arising from set pieces.

Differentiating shot types is crucial since the nature and expected success rate of shots vary significantly across categories.

#### Designing the xG Model
To accurately predict the xG values, separate models are fitted for each shot category. This approach allows for tailored feature selection for each shot type, enhancing the model's accuracy by eliminating irrelevant data (e.g., shot location for penalties).

The breakdown of shots by type is given in Table 2 below.

| Type      | Shots   | Goals | Baseline Error |
|-----------|---------|-------|----------------|
| Free Kick | 21,368  | 1,282 | 0.056          |
| Header    | 99,620  | 11,438| 0.102          |
| Open Play | 476,123 | 43,834| 0.084          |
| Penalty   | 6,498   | 4,912 | 0.185          |
| **Total** | **603,609** | **61,466** | **0.091**      |
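The text does not state how the baseline error is defined. The following check suggests one reading that is consistent with the table: the mean squared error of always predicting the conversion rate $p$ of the category, which equals $p(1-p)$. The dictionary and function names are illustrative.

```python
# (shots, goals) per shot type, copied from the table above.
shots_and_goals = {
    "Free Kick": (21368, 1282),
    "Header": (99620, 11438),
    "Open Play": (476123, 43834),
    "Penalty": (6498, 4912),
    "Total": (603609, 61466),
}


def constant_predictor_error(shots, goals):
    """Mean squared error of predicting the conversion rate p for every shot.

    A fraction p of shots are goals (squared error (1 - p)^2) and a fraction
    1 - p are misses (squared error p^2), which simplifies to p * (1 - p).
    """
    p = goals / shots
    return p * (1 - p)


baseline = {
    shot_type: round(constant_predictor_error(shots, goals), 3)
    for shot_type, (shots, goals) in shots_and_goals.items()
}
print(baseline)
```

Under this reading, a model has to beat $p(1-p)$ to add information beyond the average conversion rate of its shot type.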


#### Features for Model Training
The models consider several normalized features, each ranging from 0 to 1, including:
- **Horizontal Pitch Coordinate ($x$)**
- **Adjusted Vertical Coordinate ($y_{adj}$)** (0 corresponds to a central position, 1 to either side of the pitch.)
- **Goal View Angle**
- **Inverse Distance to Goal**
- **Time of Play**
- **Goal Value**
- **Big Chance Indicator**
- **Goalkeeping Skills**

These features capture the shot's context, providing a comprehensive dataset for training the xG models.
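Two of the geometric features can be sketched as below. The text does not give their exact definitions or normalization, so everything here is an assumption: positions are measured in metres relative to the centre of the goal line, the goal is 7.32 m wide, the angle is divided by $\pi$, and the inverse distance is capped at 1.

```python
import math

GOAL_WIDTH_M = 7.32  # standard goal width in metres


def goal_view_angle(dx, dy):
    """Angle in radians subtended by the two goalposts at the shot location.

    dx: distance from the goal line (m); dy: lateral offset from the centre
    of the goal (m). Both conventions are assumptions of this sketch.
    """
    half = GOAL_WIDTH_M / 2
    return abs(math.atan2(dy + half, dx) - math.atan2(dy - half, dx))


def shot_features(dx, dy):
    """Return two features scaled to [0, 1] (assumed normalization)."""
    distance = math.hypot(dx, dy)
    return {
        "goal_view_angle": goal_view_angle(dx, dy) / math.pi,
        "inverse_distance": 1.0 if distance < 1.0 else 1.0 / distance,
    }


# A central shot taken 11 m from the goal line.
central_shot = shot_features(11.0, 0.0)
print(central_shot)
```

Shots from wide positions or from far out see a narrower angle, so both features fall as the chance becomes harder.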

#### Machine Learning Models Tested
Four main types of machine learning models are evaluated for their effectiveness in predicting xG values:
- **Logistic Regression**
- **Random Forest**
- **Gradient Boosting**
- **Neural Network (Multi-Layer Perceptron)**

#### Model Performance and Results
Each model is compared against the baseline error for its shot type, and lower values are better (the lowest error in each column is shown in bold). For penalties and free kicks no model improves meaningfully on the baseline (for example, 0.1844 against 0.1845 for penalties). The gains are larger for headers and open play, where the Neural Network achieves the lowest open-play error (0.0673 against a baseline of 0.0836).

| Model               | Penalty | Free Kick | Header | Open Play |
|---------------------|---------|-----------|--------|-----------|
| Baseline            | 0.1845  | 0.0564    | 0.1016 | 0.0836    |
| Logistic Regression | 0.1847  | 0.0560    | 0.0927 | 0.0718    |
| Random Forest       | 0.1845  | **0.0555**    | **0.0893** | 0.0714    |
| Gradient Boosting   | **0.1844**  | 0.0556    | 0.0894 | 0.0714    |
| Neural Network      | 0.1845  | 0.0564    | 0.0950 | **0.0673**    |
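To make the comparison easier to read, the short sketch below computes, for each shot type, the model with the lowest error and its relative improvement over the baseline. The numbers are copied from the table; the helper name `best_model` is illustrative.

```python
errors = {
    "Baseline": {"Penalty": 0.1845, "Free Kick": 0.0564, "Header": 0.1016, "Open Play": 0.0836},
    "Logistic Regression": {"Penalty": 0.1847, "Free Kick": 0.0560, "Header": 0.0927, "Open Play": 0.0718},
    "Random Forest": {"Penalty": 0.1845, "Free Kick": 0.0555, "Header": 0.0893, "Open Play": 0.0714},
    "Gradient Boosting": {"Penalty": 0.1844, "Free Kick": 0.0556, "Header": 0.0894, "Open Play": 0.0714},
    "Neural Network": {"Penalty": 0.1845, "Free Kick": 0.0564, "Header": 0.0950, "Open Play": 0.0673},
}


def best_model(shot_type):
    """Return (model name, relative improvement over the baseline)."""
    candidates = {
        model: by_type[shot_type]
        for model, by_type in errors.items()
        if model != "Baseline"
    }
    name = min(candidates, key=candidates.get)
    base = errors["Baseline"][shot_type]
    return name, (base - candidates[name]) / base


for shot_type in errors["Baseline"]:
    name, gain = best_model(shot_type)
    print(f"{shot_type}: {name} ({gain:.1%} below baseline)")
```

The relative gains show how little the features add for penalties and free kicks compared with headers and open-play shots.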



