# **Attacking Productivity in Premier League Players**
The following analysis looks at goals (G), assists (A), expected goals (xG), and expected assisted goals (xAG) to better understand the varying attacking profiles of Premier League players in the 2022-2023 season.
<br><br>
To focus on key attacking players, I filter the dataset down to the 190 players who have at least 3 attacking contributions (goals + assists). Additionally, statistics are calculated by player AND club, making it possible for some players to have two separate sets of metrics. For example, January transfer Leandro Trossard has one xG for his performances at Brighton and another for his performances at Arsenal.

## Some context
Before we begin, let's quickly discuss the last two metrics.


#### **What is xG?**
xG, or expected goals, is the probability that a shot will result in a goal based on the characteristics of that shot and the events leading up to it. Some of these characteristics/variables include:

> **Location of shooter**: How far was it from the goal and at what angle on the pitch? <br>
> **Body part**: Was it a header or off the shooter's foot? <br>
> **Type of pass**: Was it from a through ball, cross, set piece, etc? <br>
> **Type of attack**: Was it from an established possession? Was it off a rebound? Did the defense have time to get in position? Did it follow a dribble?

xG is a valuable tool that enables us to discern variations in the quality of goalscoring chances. While a tap-in goal and a long-range volley goal will appear the same on the scoresheet, their respective xG will differ significantly.
<br><br>
To exclude penalties, some analysts use non-penalty xG (npxG) instead. Each penalty is assigned an xG of 0.79.
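As a rough sketch of how npxG relates to xG, assuming every penalty attempt contributes exactly the flat 0.79 quoted above (the function name and the example player are made up for illustration):

```python
PENALTY_XG = 0.79  # flat xG per penalty attempt, as quoted above

def non_penalty_xg(total_xg, penalties_attempted, penalty_xg=PENALTY_XG):
    """Remove the xG that comes from penalties from a player's total xG."""
    return total_xg - penalty_xg * penalties_attempted

# hypothetical player: 10.0 total xG, of which 3 shots were penalties
print(round(non_penalty_xg(10.0, 3), 2))
```

A player who takes no penalties has npxG equal to xG, so the gap between the two only matters for designated penalty takers.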

#### **What is xA and xAG? How do they differ?** 
xA, or expected assists, is the likelihood that a given completed pass will become a goal assist. This statistic assigns a probability to all passes based on the type of the pass, the location on the pitch, the phase of play, and the distance covered. Players receive xA for every completed pass regardless of whether a shot occurred or not.
<br><br>
To isolate the xG of shots that directly follow a completed pass, there is expected assisted goals (xAG). Players receive xAG only when a shot is taken after a completed pass.
<br><br>
xAG is a good indication of a player's ability to set up scoring chances without having to rely on the actual result of the shot or the shooter's luck/ability. For our purposes of analysing direct goal contributions, I prefer xAG over xA.
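The difference between the two metrics can be illustrated with a toy example. This is only a sketch: the pass values are invented, and real providers calculate xA and xG with their own models. Every completed pass carries some xA, while only passes followed by a shot add to xAG.

```python
# each dict is one completed pass; all numbers are invented for illustration
passes = [
    {"xa": 0.05, "shot_xg": None},  # completed pass, no shot followed
    {"xa": 0.20, "shot_xg": 0.35},  # completed pass, shot worth 0.35 xG
    {"xa": 0.10, "shot_xg": 0.08},  # completed pass, shot worth 0.08 xG
]

def total_xa(passes):
    """xA: sum the assist likelihood of every completed pass."""
    return sum(p["xa"] for p in passes)

def total_xag(passes):
    """xAG: sum the xG of shots that followed a completed pass."""
    return sum(p["shot_xg"] for p in passes if p["shot_xg"] is not None)

print(round(total_xa(passes), 2), round(total_xag(passes), 2))
```

The first pass adds to xA but not to xAG, which is why xAG tracks direct chance creation more closely.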

## Data preparation

In [1]:
# import libraries
import pandas as pd
import seaborn as sns
import warnings
import plotly.express as px
import plotly.graph_objects as go
import urllib.request
from bs4 import BeautifulSoup
from google.colab import files

# ignore warnings
warnings.filterwarnings('ignore')

In [2]:
def xStatsHelper(id):
  # locate and read table contents
  source = urllib.request.urlopen(f'https://fbref.com/en/squads/{id}').read()
  soup = BeautifulSoup(source,'lxml')
  table = soup.find_all('table')[0]
  df = pd.read_html(str(table),flavor='bs4',header=0,skiprows=1)[0]

  # drop last 2 rows (copy so the new column below is not set on a slice)
  df = df[:-2].copy()

  # create column for club
  start_index = id.rfind('/') + 1
  end_index = id.rfind('-Stats')
  df['Club'] = id[start_index:end_index]

  # return the squad table with the added Club column
  return df

def xStats(id_list):
    dfs = [xStatsHelper(i) for i in id_list]
    df = pd.concat(dfs, ignore_index=True)
    return df

In [3]:
# list of URLs to scrape
club_list = ['b8fd03ef/Manchester-City-Stats','18bb7c10/Arsenal-Stats',
             '19538871/Manchester-United-Stats','b2b47a98/Newcastle-United-Stats',
             '822bd0ba/Liverpool-Stats','d07537b9/Brighton-and-Hove-Albion-Stats',
             '8602292d/Aston-Villa-Stats','361ca564/Tottenham-Hotspur-Stats',
             'cd051869/Brentford-Stats','fd962109/Fulham-Stats',
             '47c64c55/Crystal-Palace-Stats','cff3d9bb/Chelsea-Stats',
             '8cec06e1/Wolverhampton-Wanderers-Stats','7c21e445/West-Ham-United-Stats',
             '4ba7cbea/Bournemouth-Stats','e4a775cb/Nottingham-Forest-Stats',
             'd3fd31cc/Everton-Stats','a2d435b3/Leicester-City-Stats',
             '5bfb9659/Leeds-United-Stats','33c895d4/Southampton-Stats']

# load data
df = xStats(club_list)

In [4]:
# rename duplicated per-90 columns (suffixed '.1'); match the literal text, not a regex
df.columns = df.columns.str.replace('.1', ' Per 90', regex=False)

# clean Nation and Club columns 
df['Nation'] = df['Nation'].str[-3:]
df['Club'] = df['Club'].str.replace('-',' ')

# filter players with matches played
df = df[df['MP'] > 0]

# filter players with 3+ goal contributions
df_3ga = df[df['G+A']>=3]

In [5]:
# dictionary of club colours
color_mapping = {
    'Manchester City':'#6CABDD','Arsenal':'#EF0107',
    'Manchester United':'#EF3829','Newcastle United':'#BBBCBC',
    'Liverpool':'#C8102E','Brighton and Hove Albion':'#0057B8',
    'Aston Villa':'#670E36','Tottenham Hotspur':'#FFFFFF',
    'Brentford':'#E30613','Fulham':'#FFFFFF',
    'Crystal Palace':'#A7A5A6','Chelsea':'#034694',
    'Wolverhampton Wanderers':'#F6B000','West Ham United':'#1BB1E7',
    'Bournemouth':'#A89968','Nottingham Forest':'#E53233',
    'Everton':'#003399','Leicester City':'#003090',
    'Leeds United':'#FFCD00','Southampton':'#0DB14B'
}

In [50]:
def plotExpectedStats(df,x_input,y_input,line,axisnames):
  # create scatter plot
  fig = px.scatter(df,x=x_input,y=y_input,title=f'{axisnames[0]} vs {axisnames[1]}')

  # set theme
  fig.update_layout(template='plotly_dark')

  # edit hover text and marker colour
  fig.update_traces(
      hovertemplate='<b>Player:</b> %{customdata[0]}<br><b>Club:</b> %{customdata[1]}<br><b>' + x_input + ':</b> %{x}<br><b>' + y_input + ':</b> %{y}',
      customdata=df[['Player','Club']].values,
      marker=dict(color=[color_mapping.get(club,'gray') for club in df['Club']])
  )

  # add dotted y=x line
  if line:
    fig.add_shape(
        type='line', 
        x0=df[x_input].min(), y0=df[x_input].min(), 
        x1=df[x_input].max(), y1=df[x_input].max(),
        line=dict(dash='dot', color='gray')
    )

  # set x-axis and y-axis labels
  fig.update_xaxes(title=axisnames[0])
  fig.update_yaxes(title=axisnames[1])

  # return fig
  return fig

## xG vs xAG: Who were the top attacking threats over the season?

xG and xAG, when considered together, serve as a valuable measure of a player's total attacking contribution, encompassing both their goal-scoring prowess and their ability to create scoring opportunities for their teammates.

In [51]:
plotExpectedStats(df_3ga,'xG','xAG',False,['Expected Goals','Expected Assisted Goals'])

High xG players on the bottom right consistently get into favorable positions to score goals during matches. Here we find many top forwards such as Haaland, Kane, Salah, and Toney. Joining that group this season are Watkins and Wilson, who have enjoyed an upswing in individual form and team competitiveness.
<br><br>
High xAG players on the top left consistently provide quality passes, crosses, or setups that have a higher probability of resulting in goals. Here we find playmakers like KDB. We also see the cross merchants TAA and Trippier.
<br><br>
High xG and xAG players on the top right consistently do it all. They are influential players who excel in both aspects of offensive play, creating and converting a significant number of goal-scoring chances. Here we find a handful of players in great form.
<br><br>
**James Maddison (8.7,9.3)**: Man, if this guy had just stayed fit he would probably be right up there this season. With 8.7 xG and 9.3 xAG, James Maddison fought hard to keep Leicester out of the relegation zone; too bad none of his teammates seemed to care though. A quality player who will definitely get picked up by another team this summer.
<br><br>
**Solly March (8.1,8.4) & Kaoru Mitoma (8.1,6.4)**: These two were on fire this season, playing a large part in Brighton reaching their highest-ever table position and earning European football. If Mitoma had made a better start to the season before the World Cup, his numbers could be even higher.
<br><br>
**Bruno Fernandes (9.3,16.7)**: An absolute nutty season by Bruno Fernandes sees him with 16.7 xAG, more than anybody else in the league, even KDB. This is both surprising and unsurprising. Just from watching some United games, it is clear how much of the team's energy and offense centers around Fernandes. He is still a moany bastard though.
<br><br>
**Bukayo Saka (11.2,8.5), Martin Ødegaard (10,8.1), Gabriel Martinelli (9.3,9.1)**: Under Mikel Arteta, the young Gunners have found another gear this season and led Arsenal to a title challenge. Great form combined with an all-round quality squad meant that these three found many goal-scoring or chance-creating opportunities.
<br><br>
**Heung-min Son (10.1,6.5)**: Despite having one of his most underwhelming seasons at Tottenham, Son still put in some solid performances to end the season with 10.1 xG and 6.5 xAG. While that is well short of the 15.9 xG and 7.5 xAG from his Golden Boot season last year, his attacking threat is still a fair bit higher than many others.
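One simple way to read the two metrics together is to add them up. The sketch below does that for the players discussed above, using the (xG, xAG) pairs quoted in this section. Summing the two with equal weight is an assumption made for illustration, not a standard metric, and the function name is made up.

```python
# (xG, xAG) pairs as quoted in the player notes above
players = {
    "James Maddison": (8.7, 9.3),
    "Solly March": (8.1, 8.4),
    "Kaoru Mitoma": (8.1, 6.4),
    "Bruno Fernandes": (9.3, 16.7),
    "Bukayo Saka": (11.2, 8.5),
    "Martin Ødegaard": (10.0, 8.1),
    "Gabriel Martinelli": (9.3, 9.1),
    "Heung-min Son": (10.1, 6.5),
}

def combined_threat(xg, xag):
    """Equal-weight sum of xG and xAG, rounded to one decimal place."""
    return round(xg + xag, 1)

# rank the players from highest to lowest combined figure
ranked = sorted(
    ((name, combined_threat(xg, xag)) for name, (xg, xag) in players.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, total in ranked:
    print(f"{name}: {total}")
```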

## xGp90 vs xAGp90: Who were the top attacking threats over 90 minutes?
xGp90 and xAGp90 are standardised rate-based metrics that measure the number of goals a player is expected to score or assist per 90 minutes of play. They are calculated by dividing the total xG or xAG by the total minutes played and then multiplying the result by 90.
<br><br>
xGp90 and xAGp90 give us insight into the efficiency or consistency of a player's attacking contribution and allow for fair comparisons between players who have played different amounts of minutes.
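The calculation described above can be sketched as a small helper. The function name and the example numbers are hypothetical; in this notebook the per-90 columns come straight from the scraped tables rather than from a helper like this.

```python
def per_90(total, minutes_played):
    """Scale a season total (e.g. xG or xAG) to a per-90-minutes rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total / minutes_played * 90

# hypothetical player: 5.0 xG over 900 minutes (ten full matches)
print(round(per_90(5.0, 900), 2))
```

Rates based on very few minutes can swing wildly, so it is worth keeping minutes played in mind when comparing per-90 figures.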

In [53]:
plotExpectedStats(df_3ga,'xG Per 90','xAG Per 90',False,['Expected Goals Per 90','Expected Assisted Goals Per 90'])

Conclusions are not too different from the previous xG vs xAG plot. Top players like Haaland and KDB are still a cut above the rest. However, it does highlight some players who perform well given limited playtime.
<br><br>
**Eddie Nketiah (0.68,0.11), Leandro Trossard (0.18,0.46), Fábio Vieira (0.15,0.44)**: Three players who scored or created chances whenever called upon, two of them (Trossard and Vieira) new signings. Nketiah's goals and Trossard's linkup play during Gabriel Jesus's absence definitely kept Arsenal in title contention.
<br><br>
**Reiss Nelson (0.46,0.58)**: Despite playing just 212 minutes, Nelson was an efficient impact sub who drastically improved Arsenal's offense, averaging 0.46 xG and 0.58 xAG per 90 minutes. That works out to roughly 1 expected goal AND 1 expected assist every 2 games! His impressive late-game performance and 97th-minute screamer that sealed a 3-2 comeback against Bournemouth immediately come to mind.
<br><br>
**Callum Wilson (0.84,0.13)**: Wilson is the closest to Haaland in being a consistent scoring threat. With 0.84 xG per 90 minutes, he is expected to score almost a goal per match.

## xG vs G: Who were the clinical finishers?
While expected goals provide valuable insights into the quality of scoring opportunities, actual goals are the tangible output that directly impacts match results. If a high xG striker consistently finds himself in goal-scoring positions but misses half the shots, is he still a good striker? Probably not. A comparison of xG and G builds a more complete picture of an individual's goal-scoring ability.
<br><br>
G > xG (illustrated by a datapoint above the dotted y = x line) means a player has been efficient in converting their chances, outperforming the statistical prediction based on the quality of those chances.
<br><br>
On the other hand, G < xG (illustrated by a datapoint below the dotted y = x line) means a player is underperforming the statistical prediction, whether through poor finishing or poor decision-making in key moments.
<br><br>
Of course, strong goalkeeper performances and sheer luck can play a role, but in a result-driven environment over the course of a season, that is rarely a convincing excuse.
<br><br>
Below the plot, I analyse several key players to see if they have over or underperformed.

In [54]:
plotExpectedStats(df_3ga,'xG','Gls',True,['Expected Goals','Goals'])

**Erling Haaland (28.4,36)**: There is not much to say that has not been said already about Haaland. The robot not only puts himself in an absurd number of goal-scoring positions but also converts them frequently. With 36 G to 28.4 xG, Haaland has scored 7.6 goals more than expected. More terrifyingly, this is just his first (and hopefully best) season in the Premier League.
<br><br>
**Harry Kane (21.5,30)**: The only player to outperform his xG by more than Haaland is Harry Kane, who scored 8.5 goals more than expected. Year after year, Harry Kane keeps Spurs from being a mid-to-bottom table team; he should get a trophy for that.
<br><br>
**Ivan Toney (18.7,20)**: A decent number 9 who has capitalised on chances well. Considering the scarce striker market right now, he could have had a big move this summer if not for the suspension. What a shame.
<br><br>
**Mohamed Salah (21.7,19)**: A much quieter season for one of last year's Golden Boot winners. Salah still scored many goals (especially compared to Son) but was less clinical in his finishing. Given his 21.7 xG, he should have at least 2 more goals.
<br><br>
**Darwin Núñez (12.1,9)**: I always thought the memes about his end product were somewhat unjustified. Firstly, there are plenty of players who were more underwhelming, like Havertz. Secondly, at least Núñez is getting himself into a decent number of goal-scoring positions, as reflected by his 12.1 xG. He has already shown great improvement and will only get better. Him missing left, right, and center earlier this season was pretty hilarious though.
<br><br>
**Gabriel Jesus (14,11)**: Because of his involvement in build-up play, Jesus is often not in the box for key goal-scoring moments. And when he does find his way there, he is not as clinical as other strikers. Although Jesus brings a lot to the team through linkup play, ball-carrying ability, and overall creativity, Arteta should definitely look elsewhere for a more traditional goal-scoring number 9.
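The over- and underperformance figures quoted above are simply goals minus xG. A minimal sketch using the (xG, G) pairs from this section (the function name is made up for illustration):

```python
# (xG, goals) pairs as quoted in the player notes above
players = {
    "Erling Haaland": (28.4, 36),
    "Harry Kane": (21.5, 30),
    "Ivan Toney": (18.7, 20),
    "Mohamed Salah": (21.7, 19),
    "Darwin Núñez": (12.1, 9),
    "Gabriel Jesus": (14.0, 11),
}

def overperformance(expected, actual):
    """Positive: above the y = x line. Negative: below it."""
    return round(actual - expected, 1)

diffs = {name: overperformance(xg, goals) for name, (xg, goals) in players.items()}

# print from biggest overperformer to biggest underperformer
for name, diff in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {diff:+.1f}")
```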

## xAG vs A: Who were lucky with assists?
Expected assisted goals do not always lead to actual assists. Once a player makes the final pass, he relies on his teammate to convert that opportunity. A comparison of actual assists to expected assisted goals (xAG) allows us to see who has been lucky and who has not.

A > xAG (illustrated by a datapoint above the dotted y = x line) implies greater luck. Thanks to great finishing from teammates, a player achieved more assists than expected based on the quality of his passes.

On the other hand, A < xAG (illustrated by a datapoint below the dotted y = x line) implies poorer luck. A player can supply great chances but ultimately not get the assists because of his teammates' inability to finish the job and convert.

In [55]:
plotExpectedStats(df_3ga,'xAG','Ast',True,['Expected Assisted Goals','Assists'])

**Kevin De Bruyne (13.7,16)**: Surrounded by a team full of high quality players (which now includes superhuman Erling Haaland), it is not too surprising that KDB enjoyed more assists than expected.

**Bruno Fernandes (16.7,8)**: With 16.7 xAG but just 8 assists, it is ridiculous how unlucky Fernandes has been this season. Had his teammates been more precise in the box, he could have had more than double the assists. Man U needs a striker.

**Trent Alexander-Arnold (11.5,9), Kieran Trippier (12.4,7)**: As fullbacks responsible for crosses and corner kicks, both naturally have high xAG. However, below-average finishing by teammates means neither got as many assists as he should have (Trippier being a bit unluckier).

**Mohamed Salah (7.7,12)**: Although Salah has not been as clinical in putting the ball in the net this season, his teammates have been.

**Leandro Trossard (4.7,10)**: Like KDB, Trossard has strong teammates capable of turning his passes into goals. Because of players like Saka, Martinelli, and Ødegaard, Trossard ended the campaign with more than double his expected assists.
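Claims like "more than double" are easiest to check as a ratio of assists to xAG. The sketch below uses the (xAG, A) pairs quoted in this section; the function name is made up, and the ratio is just one convenient way to express the gap.

```python
# (xAG, assists) pairs as quoted in the player notes above
players = {
    "Kevin De Bruyne": (13.7, 16),
    "Bruno Fernandes": (16.7, 8),
    "Trent Alexander-Arnold": (11.5, 9),
    "Kieran Trippier": (12.4, 7),
    "Mohamed Salah": (7.7, 12),
    "Leandro Trossard": (4.7, 10),
}

def assist_ratio(xag, assists):
    """Assists per unit of xAG: above 1 is 'lucky', below 1 is 'unlucky'."""
    return assists / xag

ratios = {name: assist_ratio(xag, a) for name, (xag, a) in players.items()}

# print from luckiest to unluckiest
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```

A ratio above 2 (Trossard) means more than double the expected assists, while a ratio below 0.5 (Fernandes) means fewer than half.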

## Final thoughts
This is a good place to pause. Hopefully you found it somewhat interesting and now see how much a few metrics (G, A, xG, xAG) can tell us about nuances in attacking productivity. I have done some team-specific analysis but will share that in a separate document. Thanks for reading and let me know what you think!

## Data
All data is sourced from <a href='https://fbref.com/en/'>FBref</a>. I scraped content from the player statistics tables of the 20 Premier League clubs, performed the necessary data transformations, and visualised key metrics in interactive plotly charts. Everything was done in Python and by me.

## TL;DR
* Nelson, Trossard, Vieira, and Nketiah made good use of limited PL minutes.
* Bruno Fernandes needs more luck and a proper striker to finish his passes.
* Kane is good.
* Martinelli, Saka, and Ødegaard are good.
* Haaland is very good.