# **Attacking Productivity in Premier League Clubs**

The following analysis continues my previous study on individual player attacking productivity. It looks at players' goal differences (G-xG) and assist differences (A-xAG) to better understand the varying attacking profiles of Premier League clubs in the 2022-2023 season, identifying the clinical finishers and the unlucky playmakers.
<br><br>
Again, to focus on key attacking players, I filter the dataset down to the 190 players with 3 or more attacking contributions (goals + assists). Statistics are calculated per player per club, so some players have two separate sets of metrics. For example, January signing Leandro Trossard has separate xG figures for his spells at Brighton and at Arsenal.
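A minimal sketch of the per-club filter, using made-up rows (the player and club names below are placeholders, not real data). Because each row is one player at one club, a mid-season transfer can qualify once for each club:

```python
import pandas as pd

# Hypothetical per-club rows: 'Player A' moved from Club X to Club Y mid-season.
rows = pd.DataFrame({
    'Player': ['Player A', 'Player A', 'Player B', 'Player C'],
    'Club':   ['Club X',   'Club Y',   'Club X',   'Club Y'],
    'Gls':    [6, 1, 2, 0],
    'Ast':    [3, 2, 0, 1],
})

# goal contributions are counted separately for each (player, club) row
rows['G+A'] = rows['Gls'] + rows['Ast']

# keep rows with 3 or more contributions, mirroring df[df['G+A'] >= 3]
qualifying = rows[rows['G+A'] >= 3]
print(qualifying[['Player', 'Club', 'G+A']])
```

Both remaining rows belong to the same placeholder player, one per club.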

## What are goal differences and assist differences again?

As a recap, goal difference is the difference between a player's actual goals (G) and their expected goals (xG). A high G-xG value indicates that a player has scored more goals than would be expected from the quality of their chances, suggesting that the player has been efficient, or clinical, in converting opportunities into goals.
<br><br>
Similarly, assist difference is the difference between a player's actual assists (A) and their expected assists (xAG). A high A-xAG value indicates that a player has recorded more assists than would be expected from the quality of the chances they created. It suggests that the player's teammates have been effective at converting those passes into goals.
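Both metrics are simple column differences. A small sketch, assuming columns named `Gls`, `xG`, `Ast` and `xAG` as in the fbref tables used here; the numbers in the example are illustrative only:

```python
import pandas as pd

def add_differences(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with goal difference (G-xG) and assist difference (A-xAG) columns."""
    out = df.copy()
    out['Gls Difference'] = out['Gls'] - out['xG']   # > 0: scored more than chance quality suggests
    out['Ast Difference'] = out['Ast'] - out['xAG']  # > 0: more assists than chances created suggest
    return out

# illustrative numbers only, not a real player's season
example = pd.DataFrame({'Player': ['Player A'], 'Gls': [10], 'xG': [7.5],
                        'Ast': [2], 'xAG': [4.0]})
result = add_differences(example)
print(result[['Player', 'Gls Difference', 'Ast Difference']])
```

Here the placeholder player overperforms on goals (+2.5) but underperforms on assists (-2.0).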
<br><br>
Below are scatterplots with assist difference on the x-axis and goal difference on the y-axis. Data points in the top-right quadrant represent players who recorded more goals and assists than expected. Remember, these metrics only tell us whether players overperformed or underperformed their expected numbers; they do not reflect how many goals or assists a player actually contributed.
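The quadrant a player falls into can also be labelled directly. A sketch, assuming a value of exactly zero counts as "not over" (the plots split at 0, so this boundary choice is an assumption); the label strings are hypothetical:

```python
import numpy as np
import pandas as pd

def label_quadrant(gls_diff: pd.Series, ast_diff: pd.Series) -> pd.Series:
    """Label each player by quadrant of the G-xG (y) vs A-xAG (x) scatterplot."""
    conditions = [
        (ast_diff > 0) & (gls_diff > 0),    # top right
        (ast_diff <= 0) & (gls_diff > 0),   # top left
        (ast_diff > 0) & (gls_diff <= 0),   # bottom right
    ]
    choices = ['over on both', 'over on goals only', 'over on assists only']
    # anything left is bottom left: neither difference is positive
    labels = np.select(conditions, choices, default='under on both')
    return pd.Series(labels, index=gls_diff.index)

quadrants = label_quadrant(pd.Series([1.0, 1.0, -1.0, -1.0]),
                           pd.Series([1.0, -1.0, -1.0, 1.0]))
print(quadrants.tolist())
```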

## Data preparation
The same notebook, with code cells visible, is available on <a href='https://github.com/huimarco/FootballAnalytics/blob/main/AttackingProductivityPlayers.ipynb'>GitHub</a>.


In [None]:
# import libraries
import pandas as pd
import seaborn as sns
import warnings
import plotly.express as px
import plotly.graph_objects as go
import urllib.request  # 'import urllib' alone does not guarantee urllib.request is loaded
from bs4 import BeautifulSoup
import nbconvert

# ignore warnings
warnings.filterwarnings('ignore')

In [None]:
# dictionary of club colours
color_mapping = {
    'Manchester City':'#6CABDD','Arsenal':'#EF0107',
    'Manchester United':'#EF3829','Newcastle United':'#BBBCBC',
    'Liverpool':'#C8102E','Brighton and Hove Albion':'#0057B8',
    'Aston Villa':'#670E36','Tottenham Hotspur':'#FFFFFF',
    'Brentford':'#E30613','Fulham':'#FFFFFF',
    'Crystal Palace':'#A7A5A6','Chelsea':'#034694',
    'Wolverhampton Wanderers':'#F6B000','West Ham United':'#1BB1E7',
    'Bournemouth':'#A89968','Nottingham Forest':'#E53233',
    'Everton':'#003399','Leicester City':'#003090',
    'Leeds United':'#FFCD00','Southampton':'#0DB14B'
}
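
Since the plotting code looks colours up with `color_mapping.get(club, 'gray')`, a misspelt key silently falls back to gray. A small check, sketched under the assumption that club names are derived from the fbref URL slugs the same way the scraper does it (`missing_colours` is a hypothetical helper, not part of any library):

```python
def missing_colours(club_slugs, colours):
    """Return club names derived from fbref-style slugs that have no entry in colours."""
    missing = []
    for slug in club_slugs:
        # '18bb7c10/Arsenal-Stats' -> 'Arsenal', matching the scraper's slicing
        name = slug[slug.rfind('/') + 1:slug.rfind('-Stats')].replace('-', ' ')
        if name not in colours:
            missing.append(name)
    return missing

# a misspelt key is reported instead of silently rendering gray
print(missing_colours(['33c895d4/Southampton-Stats'], {'Southhampton': '#0DB14B'}))
# ['Southampton']
```

Running it as `missing_colours(club_list, color_mapping)` once both are defined should return an empty list.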

In [None]:
from io import StringIO

def xStatsHelper(id):
  # locate and read table contents
  source = urllib.request.urlopen(f'https://fbref.com/en/squads/{id}').read()
  soup = BeautifulSoup(source,'lxml')
  table = soup.find_all('table')[0]
  # wrap the HTML in StringIO: passing literal HTML to read_html is deprecated in newer pandas
  df = pd.read_html(StringIO(str(table)),flavor='bs4',header=0,skiprows=1)[0]

  # drop last 2 rows (the squad and opponent total rows)
  df = df.iloc[:-2].copy()

  # create column for club from the URL slug, e.g. 'Manchester-City'
  start_index = id.rfind('/') + 1
  end_index = id.rfind('-Stats')
  df['Club'] = id[start_index:end_index]

  # return the club's player table
  return df

def xStats(id_list):
    dfs = [xStatsHelper(i) for i in id_list]
    df = pd.concat(dfs, ignore_index=True)
    return df
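
`xStats` requests all 20 club pages back to back. Sites such as fbref may throttle or block rapid automated requests, so a gentler fetch loop could pause between requests and send an explicit User-Agent. This is a sketch: `fetch_pages`, the 3-second default delay and the header value are assumptions, not fbref's documented requirements:

```python
import time
import urllib.request

def fetch_pages(urls, delay_seconds=3.0, user_agent='Mozilla/5.0'):
    """Download each URL in turn, pausing between requests (hypothetical helper)."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # space out consecutive requests
        request = urllib.request.Request(url, headers={'User-Agent': user_agent})
        with urllib.request.urlopen(request) as response:
            pages.append(response.read())  # raw bytes, ready for BeautifulSoup
    return pages
```

Each returned page could then be parsed exactly as `xStatsHelper` does after its `urlopen(...).read()` call.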

In [None]:
# list of URLs to scrape
club_list = ['b8fd03ef/Manchester-City-Stats','18bb7c10/Arsenal-Stats',
             '19538871/Manchester-United-Stats','b2b47a98/Newcastle-United-Stats',
             '822bd0ba/Liverpool-Stats','d07537b9/Brighton-and-Hove-Albion-Stats',
             '8602292d/Aston-Villa-Stats','361ca564/Tottenham-Hotspur-Stats',
             'cd051869/Brentford-Stats','fd962109/Fulham-Stats',
             '47c64c55/Crystal-Palace-Stats','cff3d9bb/Chelsea-Stats',
             '8cec06e1/Wolverhampton-Wanderers-Stats','7c21e445/West-Ham-United-Stats',
             '4ba7cbea/Bournemouth-Stats','e4a775cb/Nottingham-Forest-Stats',
             'd3fd31cc/Everton-Stats','a2d435b3/Leicester-City-Stats',
             '5bfb9659/Leeds-United-Stats','33c895d4/Southampton-Stats']

# load data
df = xStats(club_list)

In [None]:
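# rename the duplicated per-90 columns (pandas suffixes them with '.1');
# regex=False makes '.1' a literal match rather than "any character followed by 1"
df.columns = df.columns.str.replace('.1', ' Per 90', regex=False)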

# clean Nation and Club columns
df['Nation'] = df['Nation'].str[-3:]
df['Club'] = df['Club'].str.replace('-',' ')

# filter players with matches played
df = df[df['MP'] > 0]

# filter players with 3+ goal contributions
df_3ga = df[df['G+A']>=3]
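
Renaming with `str.replace('.1', ...)` depends on how the pattern is interpreted: in pandas versions before 2.0, `str.replace` defaulted to `regex=True`, where `.` matches any character. A stricter alternative, sketched here, only renames columns that literally end in `.1` (`rename_per90` is a hypothetical helper):

```python
import pandas as pd

def rename_per90(columns: pd.Index) -> pd.Index:
    """Replace a literal trailing '.1' (pandas' duplicate-name suffix) with ' Per 90'."""
    return pd.Index([c[:-2] + ' Per 90' if c.endswith('.1') else c for c in columns])

cols = pd.Index(['Gls', 'Ast', 'Gls.1', 'Ast.1'])
print(list(rename_per90(cols)))
# ['Gls', 'Ast', 'Gls Per 90', 'Ast Per 90']
```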

In [None]:
def plotQuadrantClub(df,club):
  '''
  A function to create a scatterplot of goal difference vs assist difference, highlighting the specified club.
  '''
  # work on a copy so the caller's DataFrame is not modified
  df = df.copy()

  # create columns for goal and assist differences
  df['Gls Difference'] = df['Gls'] - df['xG']
  df['Ast Difference'] = df['Ast'] - df['xAG']

  # filter data for players from the specified club
  club_players = df[df['Club']==club]

  # create scatterplot
  fig = px.scatter(df, x='Ast Difference', y='Gls Difference',title=club)

  # set theme
  fig.update_layout(template='plotly_dark')

  # edit hover text and marker colour
  fig.update_traces(
      hovertemplate='<b>Player:</b> %{customdata[0]}<br><b>Club:</b> %{customdata[1]}<br><b>G-xG:</b> %{y}<br><b>A-xAG:</b> %{x}',
      customdata=df[['Player', 'Club']].values
  )

  # add text labels for the specified club's players
  fig.add_scatter(
      x=club_players['Ast Difference'],
      y=club_players['Gls Difference'],
      mode='markers+text',
      text=club_players['Player'],
      textposition='top center',
      marker=dict(color='red'),
      showlegend=False
  )

  # add quadrant colors
  fig.add_shape(
      type='rect',
      x0=df['Ast Difference'].min(), y0=df['Gls Difference'].min(),
      x1=0, y1=0,
      fillcolor='rgba(255, 0, 0, 0.3)', opacity=0.3, line=dict(width=0)
  )
  fig.add_shape(
      type='rect',
      x0=0, y0=df['Gls Difference'].min(),
      x1=df['Ast Difference'].max(), y1=0,
      fillcolor='rgba(255, 255, 0, 0.3)',
      opacity=0.3, line=dict(width=0)
  )
  fig.add_shape(
      type='rect',
      x0=df['Ast Difference'].min(), y0=0,
      x1=0, y1=df['Gls Difference'].max(),
      fillcolor='rgba(0, 0, 255, 0.3)', opacity=0.3, line=dict(width=0)
  )
  fig.add_shape(
      type='rect',
      x0=0, y0=0,
      x1=df['Ast Difference'].max(), y1=df['Gls Difference'].max(),
      fillcolor='rgba(0, 255, 0, 0.3)', opacity=0.3, line=dict(width=0)
  )

  # set axis labels
  fig.update_xaxes(title='A-xAG')
  fig.update_yaxes(title='G-xG')

  # return plot
  return fig
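
Both plotting functions draw the quadrant backgrounds with four near-identical `add_shape` calls. A sketch of a helper that builds the four rectangle specs once, keeping the same colours and opacity; `quadrant_shapes` is hypothetical, and `layer='below'` (draw shapes beneath the markers) is an added choice not in the original:

```python
def quadrant_shapes(x_min, x_max, y_min, y_max, alpha=0.3, opacity=0.3):
    """Return four rectangle specs split at x=0 and y=0, usable as fig.add_shape(**spec)."""
    quadrants = [
        (x_min, y_min, 0, 0, f'rgba(255, 0, 0, {alpha})'),    # bottom left: under on both
        (0, y_min, x_max, 0, f'rgba(255, 255, 0, {alpha})'),  # bottom right: over on assists only
        (x_min, 0, 0, y_max, f'rgba(0, 0, 255, {alpha})'),    # top left: over on goals only
        (0, 0, x_max, y_max, f'rgba(0, 255, 0, {alpha})'),    # top right: over on both
    ]
    return [
        dict(type='rect', x0=x0, y0=y0, x1=x1, y1=y1, fillcolor=colour,
             opacity=opacity, line=dict(width=0), layer='below')
        for x0, y0, x1, y1, colour in quadrants
    ]

shapes = quadrant_shapes(-2, 3, -4, 5)
```

Inside either plotting function, a loop such as `for spec in quadrant_shapes(df['Ast Difference'].min(), df['Ast Difference'].max(), df['Gls Difference'].min(), df['Gls Difference'].max()): fig.add_shape(**spec)` would replace the four explicit calls.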

In [None]:
def plotQuadrant(df):
    '''
    A function to create a scatterplot of goal difference vs assist difference for all players, coloured by club.
    '''
    # work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # create columns for goal and assist differences
    df['Gls Difference'] = df['Gls'] - df['xG']
    df['Ast Difference'] = df['Ast'] - df['xAG']

    # create scatterplot
    fig = go.Figure()

    # add traces for each club
    for club in df['Club'].unique():
        club_data = df[df['Club'] == club]
        fig.add_trace(go.Scatter(
            x=club_data['Ast Difference'],
            y=club_data['Gls Difference'],
            mode='markers',
            name=club,
            hovertemplate='<b>Player:</b> %{customdata[0]}<br><b>Club:</b> %{customdata[1]}<br><b>G-xG:</b> %{y}<br><b>A-xAG:</b> %{x}',
            customdata=club_data[['Player', 'Club']].values,
            marker=dict(
                color=color_mapping.get(club, 'gray'),
                size=club_data['G+A']
            )
        ))

    # add quadrant colors
    fig.add_shape(type='rect', x0=df['Ast Difference'].min(), y0=df['Gls Difference'].min(),
                  x1=0, y1=0, fillcolor='rgba(255, 0, 0, 0.3)', opacity=0.3, line=dict(width=0))
    fig.add_shape(type='rect', x0=0, y0=df['Gls Difference'].min(),
                  x1=df['Ast Difference'].max(), y1=0, fillcolor='rgba(255, 255, 0, 0.3)',
                  opacity=0.3, line=dict(width=0))
    fig.add_shape(type='rect', x0=df['Ast Difference'].min(), y0=0,
                  x1=0, y1=df['Gls Difference'].max(), fillcolor='rgba(0, 0, 255, 0.3)',
                  opacity=0.3, line=dict(width=0))
    fig.add_shape(type='rect', x0=0, y0=0,
                  x1=df['Ast Difference'].max(), y1=df['Gls Difference'].max(),
                  fillcolor='rgba(0, 255, 0, 0.3)', opacity=0.3, line=dict(width=0))

    # set axis labels
    fig.update_xaxes(title='A-xAG')
    fig.update_yaxes(title='G-xG')

    # add legend
    fig.update_layout(legend=dict(
        title='Club',
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='right',
        x=1
    ))

    # set theme
    fig.update_layout(template='plotly_dark')

    # return plot
    return fig

## Brief analysis of top 4 clubs this season

In [None]:
plotQuadrantClub(df_3ga,'Manchester City')

The strength of the Manchester City squad is evident at first glance. Most goal-contributing players sit in the green (top-right) quadrant, outperforming both their expected goals and expected assists over the season.
<br><br>
Some notes on individual players:
* Erling Haaland is an absolute monster in the box, scoring over 7 more goals than expected in his very first Premier League season.
* Phil Foden is not far behind with a goal difference of 5.1. The youngster is still very impressive despite playing fewer minutes this season.
* Riyad Mahrez has had better seasons.
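
The observation that most City contributors sit in the green quadrant can be quantified as a share per club. A sketch, assuming the same column names as the notebook and counting a player as "over" only when both differences are strictly positive; the toy rows are made up:

```python
import pandas as pd

def share_over_both(df: pd.DataFrame) -> pd.Series:
    """Fraction of each club's listed players with G-xG > 0 and A-xAG > 0."""
    over = (df['Gls'] - df['xG'] > 0) & (df['Ast'] - df['xAG'] > 0)
    return over.groupby(df['Club']).mean().sort_values(ascending=False)

# made-up rows for two placeholder clubs
toy = pd.DataFrame({
    'Club': ['X', 'X', 'Y', 'Y'],
    'Gls':  [5, 3, 2, 1],   'xG':  [4, 2, 1, 3],
    'Ast':  [2, 2, 1, 1],   'xAG': [1, 1.5, 0.5, 2],
})
print(share_over_both(toy))
```

Applied to `df_3ga`, this would rank clubs by the proportion of their 3+ contributors in the top-right quadrant.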

In [None]:
plotQuadrantClub(df_3ga,'Arsenal')

What immediately stands out about goal contributions at Arsenal is the number of players involved. This season Arsenal have 14 players with 3+ goal contributions, more than any other Premier League club. Under Arteta's possession-based, attack-minded system, goals and assists are not only abundant but also well distributed throughout the squad.
<br><br>
Some notes on individual players:
* Martinelli had his best season at Arsenal so far, scoring 5.7 more goals than expected. He was also unlucky with assists: there were many times when nobody was in the box to finish off his cutback passes.
* Eddie Nketiah and Gabriel Jesus have not been as clinical as one would hope from strikers.
* Leandro Trossard got over 5 more assists than expected thanks to some brilliant passes and strong teammates to convert chances.
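
The "14 players, more than any other club" comparison comes down to counting qualifying rows per club. A sketch with made-up rows (`contributors_per_club` is a hypothetical helper); note that it counts (player, club) rows, so a player who reached 3+ contributions at two clubs counts for both:

```python
import pandas as pd

def contributors_per_club(df: pd.DataFrame, min_ga: int = 3) -> pd.Series:
    """Count (player, club) rows with at least min_ga goal contributions, per club."""
    qualifying = df[df['G+A'] >= min_ga]
    return qualifying['Club'].value_counts()  # sorted from most to fewest

toy = pd.DataFrame({'Club': ['X', 'X', 'X', 'Y', 'Y'],
                    'G+A':  [5, 3, 4, 4, 1]})
counts = contributors_per_club(toy)
print(counts)
print('Most contributors:', counts.idxmax())
```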

In [None]:
plotQuadrantClub(df_3ga,'Manchester United')

Unlike the two clubs above, Manchester United has not had a particularly clinical or overperforming goalscorer this season. Most players have barely scored more goals than expected.
<br><br>
Some notes on individual players:
* Bruno Fernandes has been laughably unlucky, recording almost 9 fewer assists than expected given the chances he created. Somebody give him a good striker.
* A purple patch after the World Cup brought Marcus Rashford to the top.
* Defensive midfielder Casemiro has overperformed his expected goals more than forwards like Garnacho, Martial, and Antony.

In [None]:
plotQuadrantClub(df_3ga,'Newcastle United')

Nothing particularly striking stands out at Newcastle United. Smart tactics and several solid performers earned them a fourth-place finish this year. Eddie Howe has done well with a squad without any big names, giving many of his players their first taste of Champions League football.
<br><br>
Some notes on individual players:
* Although Miguel Almirón has dropped off a bit since the start of the season, he is still the squad's most clinical finisher.
* Trippier has been fairly unlucky with assists. A stronger forward in the box during set pieces could have meant over 5 more assists.
* Willock has a far lower goal difference than I expected.  
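
"Most clinical finisher" here means the largest G-xG within a club. A sketch of that lookup for every club at once, assuming a unique row index (as produced by `pd.concat(..., ignore_index=True)`); the toy rows use placeholder names:

```python
import pandas as pd

def top_finisher_by_club(df: pd.DataFrame) -> pd.DataFrame:
    """Return, for each club, the player with the largest goal difference (G-xG)."""
    diffs = df.assign(**{'Gls Difference': df['Gls'] - df['xG']})
    best = diffs.loc[diffs.groupby('Club')['Gls Difference'].idxmax()]
    return best[['Club', 'Player', 'Gls Difference']].reset_index(drop=True)

toy = pd.DataFrame({'Club': ['X', 'X', 'Y'],
                    'Player': ['A', 'B', 'C'],
                    'Gls': [5, 2, 3],
                    'xG': [3, 2.5, 1]})
print(top_finisher_by_club(toy))
```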

## Over and under performers

This is an interactive plot of every player with 3+ goal contributions (G+A) this season on one scatterplot. Data points are sized by each player's actual goal contributions (G+A).

Feel free to explore the players and teams I did not cover. Double-click a club in the legend to isolate its players; single-click to hide them.
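In `plotQuadrant`, marker size is set directly to G+A, and plotly's `marker.size` is in pixels, so a high scorer's marker can cover many others. One option, sketched here, is to rescale contributions linearly into a fixed pixel range; `scale_sizes` and the 6 to 30 px bounds are assumptions, not values from the notebook:

```python
import pandas as pd

def scale_sizes(values: pd.Series, min_px: float = 6, max_px: float = 30) -> pd.Series:
    """Linearly rescale values (e.g. G+A) to marker diameters between min_px and max_px."""
    lo, hi = values.min(), values.max()
    if hi == lo:  # all values equal: use the midpoint and avoid dividing by zero
        return pd.Series(float(min_px + max_px) / 2, index=values.index)
    return min_px + (values - lo) / (hi - lo) * (max_px - min_px)

print(scale_sizes(pd.Series([3, 10, 17])).tolist())
# [6.0, 18.0, 30.0]
```

This scales diameter linearly; scaling marker area instead (taking a square root first) is another common choice.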

In [None]:
plotQuadrant(df_3ga)

# **Miscellaneous**

Since there is some extra space here, I am including a few extra plots I made for fun.

## Where are Premier League players from?

In [None]:
# count players per nation (column names are the same across pandas versions)
nation_counts = (
    df['Nation'].value_counts()
    .rename_axis('Nation')
    .reset_index(name='Players')
)

# for each nation, join the 10 players with the most minutes ('Min') into one string
top_players = (
    df.sort_values('Min', ascending=False)
    .groupby('Nation')['Player']
    .apply(lambda s: '<br>'.join(s.head(10)))
    .reset_index(name='Top Players')
)

# merge so each nation's count and player list stay aligned
nation_df = nation_counts.merge(top_players, on='Nation')

# create the treemap, passing the player lists as custom data
fig = px.treemap(
    nation_df,
    path=['Nation'], values='Players',
    custom_data=['Top Players'],
    title='22/23 Season Premier League Player Nationalities'
)

# define the hovertemplate to display the top 10 player names
fig.update_traces(
    hovertemplate='There are <b>%{value}</b> players from %{label}<br><br>' +
                  '<b>Most Minutes Played:</b><br>' +
                  '%{customdata[0]}'
)

# set the layout and display the figure
fig.update_layout(template='plotly_dark')
fig.show()


## Which club is the youngest?

In [None]:
# compute average age by club
temp = df.groupby('Club')['Age'].mean().reset_index().sort_values('Age')
# create bar plot
fig = px.bar(temp,x='Club',y='Age')
# set theme
fig.update_layout(template='plotly_dark')
# colour each bar by club
fig.update_traces(marker=dict(color=[color_mapping.get(club,'gray') for club in temp['Club']]))
# set y-axis scale manually
y_axis_min = 18
y_axis_max = 30
#fig.update_yaxes(range=[y_axis_min, y_axis_max])
# show plot
#fig.show()
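
Averaging `Age` assumes the column is numeric. On some fbref pages age is shown as "years-days" text (for example `25-123`); whether that applies to the scraped tables is an assumption here. A sketch that converts such values to fractional years and leaves plain numbers unchanged (`age_in_years` is a hypothetical helper):

```python
import pandas as pd

def age_in_years(age: pd.Series) -> pd.Series:
    """Convert ages like '25-123' (years-days) to fractional years; plain numbers pass through."""
    parts = age.astype(str).str.split('-', n=1, expand=True)
    years = pd.to_numeric(parts[0], errors='coerce')
    if parts.shape[1] == 1:  # no value contained a '-', so ages are already whole numbers
        return years
    days = pd.to_numeric(parts[1], errors='coerce').fillna(0)  # missing day part -> 0
    return years + days / 365.25

ages = age_in_years(pd.Series(['25-0', '30-73', '21']))
print(ages.round(2).tolist())
```

With that in place, `df.assign(Age=age_in_years(df['Age'])).groupby('Club')['Age'].mean()` would give average ages regardless of which format was scraped.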

In [None]:
# compute average age by club
temp = df.groupby('Club')['Age'].mean().reset_index().sort_values('Age')

# Create boxplot trace
data = []
for club in temp['Club']:
    data.append(
        go.Box(
            x=[club] * len(df[df['Club'] == club]['Age']),
            y=df[df['Club'] == club]['Age'],
            name=club
        )
    )

# Create layout
layout = go.Layout(
    title='22/23 Season Premier League Players Age Distribution by Club (Sorted by Average)',
    template='plotly_dark',
    xaxis=dict(title='Club'),
    yaxis=dict(title='Age')
)

# Create figure
fig = go.Figure(data=data, layout=layout)

# Show plot
fig.show()


In [None]:
!jupyter nbconvert --to html --execute --no-input --HTMLExporter.theme=dark /content/AttackingProductivityTeams.ipynb

[NbConvertApp] Converting notebook /content/AttackingProductivityTeams.ipynb to html
[NbConvertApp] Writing 742433 bytes to /content/AttackingProductivityTeams.html
