# 2018 FIFA World Cup

<img src='./thumbnails/world_cup.png' alt="Panel Dashboard" align="right">

The FIFA World Cup is the premier international football tournament, held every four years and featuring teams from around the globe. It is a celebration of the sport, bringing together fans and players in a thrilling competition for the prestigious title. Each World Cup tournament offers a wealth of data on matches, players, and events, providing a rich resource for analysis and visualization.

In this notebook, we focus on the 2018 FIFA World Cup, hosted by Russia. Using `hvPlot` and `Panel`, we will create dynamic and interactive visualizations to explore the extensive dataset from this tournament. These tools enable us to investigate the statistics and uncover insights about player performances and more. 

The dataset used for this analysis is sourced from [Pappalardo, Luca; Massucco, Emanuele (2019)](https://doi.org/10.6084/m9.figshare.c.4415000.v5) Soccer match event dataset, figshare collection.

## Load the data

Here we load the `players` and `World Cup events` datasets from the figshare collection so that we can create plots and visualizations focused only on the 2018 World Cup.

In [None]:
import pandas as pd
import hvplot.pandas # noqa
import holoviews as hv
from holoviews import opts
import panel as pn

pn.extension()

In [None]:
players_df = pd.read_json('data/players.json', encoding='unicode-escape')
events_df = pd.read_json('data/events/events_World_Cup.json')

In [None]:
events_df.head()

In [None]:
players_df.tail()

## Event distribution

We can look at the unique event types recorded in a football match and plot how often each one occurs using an hvPlot bar chart:

In [None]:
pitch_events = list(events_df['eventName'].unique())
pitch_events

In [None]:
event_type_count = events_df['eventName'].value_counts()
event_type_distribution = event_type_count.hvplot.bar(title='Distribution of Event Types', height=400,
                                                      width=800, rot=45)
event_type_distribution

It is unsurprising that passes are the most common event in a football match. However, we would also like to see the areas of the pitch where most of these events occur.

First, we will use the `holoviews` library to draw an outline of a football pitch using the official dimensions:

In [None]:
opts.defaults(opts.Path(color='black'),
              opts.Rectangles(color=''),
              opts.Points(color='black', size=5))

In [None]:
import numpy as np

# Set the dimensions of the field
field_length = 105
field_width = 68
penalty_area_length = 16.5
penalty_area_width = 40.3
goal_area_length = 5.5
goal_area_width = 18.32
goal_width = 7.32
goal_depth = 2.44 
    
# Helper function to create arcs
def create_arc(center, radius, start_angle, end_angle, clockwise=False):
    if clockwise:
        angles = np.linspace(np.radians(start_angle), np.radians(end_angle), 100)
    else:
        if start_angle < end_angle:
            start_angle += 360
        angles = np.linspace(np.radians(start_angle), np.radians(end_angle), 100)
    x = center[0] + radius * np.cos(angles)
    y = center[1] + radius * np.sin(angles)
    return hv.Path([np.column_stack([x, y])])

# create football pitch
def plot_pitch():
    pitch_elements = [
        hv.Rectangles([(0, 0, field_length, field_width)]), # outer pitch rectangle
        hv.Ellipse(field_length/2, field_width/2, 18.3), # center circle
        hv.Points([(field_length/2, field_width/2)]), # center spot
        hv.Path([[(field_length/2, 0), (field_length/2, field_width)]]), # halfway line
        hv.Rectangles([(0, (field_width - penalty_area_width) / 2, penalty_area_length,
                        (field_width + penalty_area_width) / 2)]), # left penalty area
        hv.Rectangles([(field_length - penalty_area_length, (field_width - penalty_area_width) / 2,
                        field_length, (field_width + penalty_area_width) / 2)]), # right penalty area
        hv.Rectangles([(0, (field_width - goal_area_width) / 2, goal_area_length,
                        (field_width + goal_area_width) / 2)]), # left goal area
        hv.Rectangles([(field_length - goal_area_length, (field_width - goal_area_width) / 2,
                        field_length, (field_width + goal_area_width) / 2)]), # right goal area
        hv.Points([(11, field_width/2)]), # left penalty spot
        hv.Points([(field_length - 11, field_width/2)]), # right penalty spot
        create_arc((11, field_width/2), 9.15, 52, 308), # left penalty arc
        create_arc((field_length - 11, field_width/2), 9.15, 232, 128), # right penalty arc
        hv.Rectangles([(-goal_depth, (field_width - goal_width) / 2,
                        0, (field_width + goal_width) / 2)]), # left goal
        hv.Rectangles([(field_length, (field_width - goal_width) / 2,
                        field_length + goal_depth, (field_width + goal_width) / 2)]), # right goal
        hv.Arrow(20, 5, '', '>', ), # attack arrow
        hv.Text(10, 6, 'Attack', 11) # attack text
    ]
    
    field = hv.Overlay(pitch_elements).opts(width=880, height=600, xlim=(-5, field_length + 5),
                                            ylim=(-5, field_width + 5), xaxis=None, yaxis=None)
    return field

In [None]:
pitch = plot_pitch()
pitch
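The start and end angles passed to `create_arc` follow from the pitch geometry: the penalty spot is 11 m from the goal line, the edge of the penalty area is 16.5 m out, and the arc has a radius of 9.15 m. The short sketch below (standard library only, with illustrative variable names) computes the half-angle at which the arc meets the penalty-area line. It comes out at roughly 53°, close to the 52° and 308° used above.

```python
import math

penalty_spot_distance = 11.0   # metres from the goal line
penalty_area_length = 16.5     # metres from the goal line
arc_radius = 9.15              # metres

# Horizontal distance from the penalty spot to the edge of the penalty area
offset = penalty_area_length - penalty_spot_distance

# Angle (measured from the pitch's long axis) at which the circle crosses that edge
half_angle = math.degrees(math.acos(offset / arc_radius))

print(f"Arc spans roughly -{half_angle:.1f} to +{half_angle:.1f} degrees")
```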

In the `events_df` dataframe, the `positions` column holds a pair of coordinates (origin and destination) expressed as percentages of the pitch rather than in actual field dimensions, as described in the [data source](https://figshare.com/articles/dataset/Events/7770599?backTo=/collections/_/4415000). To match the coordinates of the drawn pitch, we have to transform those coordinates to their actual dimensions in meters:

In [None]:
def transform_positions(events_df):
    def scale_position(pos):
        # Scale every recorded position (origin and, if present, destination)
        # from percentages to meters, so single-position events are converted too
        return [
            {'x': p['x'] * field_length / 100, 'y': p['y'] * field_width / 100}
            for p in pos
        ]

    # Note: this overwrites the column, so re-running it rescales the values again
    events_df['positions'] = events_df['positions'].apply(scale_position)
    return events_df

In [None]:
events_df = transform_positions(events_df)
events_df.head()
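As a quick sanity check of the scaling, the sketch below converts a few percentage coordinates by hand. The helper name `percent_to_meters` is made up for this illustration and is not part of the notebook. The pitch dimensions are the same 105 m by 68 m used above.

```python
FIELD_LENGTH = 105  # meters, same value as field_length above
FIELD_WIDTH = 68    # meters, same value as field_width above

def percent_to_meters(x_pct, y_pct):
    """Convert a (x, y) position given in percent of the pitch to meters."""
    return (x_pct * FIELD_LENGTH / 100, y_pct * FIELD_WIDTH / 100)

center_spot = percent_to_meters(50, 50)     # middle of the pitch
far_corner = percent_to_meters(100, 100)    # corner opposite the origin

print(center_spot, far_corner)
```

The center of the pitch should land on the center spot drawn by `plot_pitch`, at half the field length and half the field width.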

Then, we can generate a heatmap to see where these events occur the most on the pitch:

In [None]:
def plot_event_heatmap(events_df, event_type, cmap='Greens'):
    """
    Plots a heatmap of the specified event type on a football pitch.
    
    Parameters:
    events_df (pd.DataFrame): The dataframe containing event data with the following columns:
        - eventId: The identifier of the event's type.
        - eventName: The name of the event's type.
        - subEventId: The identifier of the subevent's type.
        - subEventName: The name of the subevent's type.
        - tags: A list of event tags describing additional information about the event.
        - eventSec: The time when the event occurs (in seconds since the beginning of the current half).
        - id: A unique identifier of the event.
        - matchId: The identifier of the match the event refers to.
        - matchPeriod: The period of the match (1H, 2H, E1, E2, P).
        - playerId: The identifier of the player who generated the event.
        - positions: The origin and destination positions associated with the event.
        - teamId: The identifier of the player's team.
    event_type (str): The type of event to plot (e.g., 'Pass', 'Duel', 'Shot').
    cmap (str): The color map to use for the heatmap. Default is 'Greens'.
    
    Returns:
    hvPlot object: A heatmap plot of the specified event type overlaid on a football pitch.
    """
    event_type = event_type.lower()
    event = events_df[events_df['eventName'].str.lower() == event_type]
    positions = [(pos[0]['x'], pos[0]['y']) for pos in event['positions'] if len(pos) > 0]
    event_df = pd.DataFrame(positions, columns=['x', 'y'])
    pitch = plot_pitch() 
    title = f"{event_type.capitalize()}s Heatmap" if event_type != 'pass' else "Passes Heatmap"
    
    event_heatmap = event_df.hvplot.hexbin(x='x', y='y', cmap=cmap, min_count=1, title=title)
    
    event_heatmap_plot = (event_heatmap * pitch).opts(width=880, height=600,
                                                      xlim=(-5, 110), ylim=(-5, 73),
                                                      xaxis=None, yaxis=None)

    return event_heatmap_plot

For example, let us see the heatmap of all passes recorded in the tournament:

In [None]:
passes_map = plot_event_heatmap(events_df, 'pass')
passes_map

We can replace "pass" with another event type to see the heatmap for that event. However, Panel makes it easy to create widgets that we can use to select the different event types and immediately see the heatmap of that event.

First, we create a `Select` widget and use `pn.bind` to link the widget to the `plot_event_heatmap` function. Then we display both together using `pn.Column`:

In [None]:
event_type_selector = pn.widgets.Select(name='Event Type', options=pitch_events)
event_heatmap = pn.bind(plot_event_heatmap, events_df=events_df, event_type=event_type_selector)

pn.Column(event_type_selector, event_heatmap)

If you have a live Python process running, you can use the `Select` widget to switch between the different event types and see their heatmap on the football pitch.

## Player events

Using the `playerId` from the events dataframe, we can plot the top `n` players in any event category. First, we create a function to find the top players for any event type:

In [None]:
def find_top_players(events_df, players_df, event_type, top_n=10):
    """
    Finds the top players for a given event type.

    Parameters:
    events_df (pd.DataFrame): The dataframe containing event data.
    players_df (pd.DataFrame): The dataframe containing player data.
    event_type (str): The type of event to filter by.
    top_n (int): The number of top players to return.

    Returns:
    pd.DataFrame: A dataframe containing the top players for the given event type.
    """
    event_type = event_type.lower()
    event = events_df[events_df['eventName'].str.lower() == event_type]
    event_counts = event.groupby('playerId').size().reset_index(name=f'{event_type} count')
    
    top_players = event_counts.sort_values(by=f'{event_type} count', ascending=False).head(top_n)
    top_players = top_players.merge(players_df, left_on='playerId', right_on='wyId')
    top_players.set_index('playerId', inplace=True)
    
    return top_players[['shortName', f'{event_type} count']]

For example, we can check the 10 players who made the most passes in the World Cup:

In [None]:
pass_maestros = find_top_players(events_df, players_df, 'pass')
pass_maestros

We can then create a bar chart to visualize these players:

In [None]:
def plot_top_players(events_df, players_df, event_type, top_n=10):
    """
    Plots a bar chart of the top players for a given event type.

    Parameters:
    events_df (pd.DataFrame): The dataframe containing event data.
    players_df (pd.DataFrame): The dataframe containing player data.
    event_type (str): The type of event to filter by.
    top_n (int): The number of top players to return.

    Returns:
    hvPlot: A bar chart of the top players for the given event type.
    """
    top_players = find_top_players(events_df, players_df, event_type, top_n)
    event_type = event_type.lower()
    if event_type == 'pass':
        title = f'Top {top_n} Players for {event_type.capitalize()}es'
    else:
        title = f'Top {top_n} Players for {event_type.capitalize()}s'
    
    bar_plot = top_players.hvplot.bar(title=title, x='shortName', y=f'{event_type} count',
                                      xlabel='', ylabel=f'Number of {event_type}', height=300, width=600, rot=45)
    
    return bar_plot

In [None]:
pass_maestros_plot = plot_top_players(events_df, players_df, 'pass')
pass_maestros_plot
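The filter, count, sort and merge steps inside `find_top_players` are easier to follow on a tiny example. The sketch below uses made-up events and player names, with only the column names (`eventName`, `playerId`, `wyId`, `shortName`) taken from the real data.

```python
import pandas as pd

# Made-up events and players, using the same column names as the real data
toy_events = pd.DataFrame({
    'eventName': ['Pass', 'Pass', 'Shot', 'Pass', 'Pass', 'Duel', 'Pass'],
    'playerId':  [1, 2, 1, 1, 2, 3, 1],
})
toy_players = pd.DataFrame({
    'wyId': [1, 2, 3],
    'shortName': ['Player A', 'Player B', 'Player C'],
})

# 1. Keep only the event type of interest
passes = toy_events[toy_events['eventName'].str.lower() == 'pass']

# 2. Count events per player
pass_counts = passes.groupby('playerId').size().reset_index(name='pass count')

# 3. Sort, keep the top two, and attach player names
top = (pass_counts.sort_values(by='pass count', ascending=False)
       .head(2)
       .merge(toy_players, left_on='playerId', right_on='wyId'))

print(top[['shortName', 'pass count']])
```

Player C never appears in the result because that player has no pass events, so the filter in step 1 removes them before counting.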

We can also plot an individual player's activity for any type of event on the football pitch. First, we create a function that maps a player's name to their unique ID, then another function that plots the player's activity using that ID:

In [None]:
def get_player_id(player_name):
    player_name_to_id = dict(zip(players_df['shortName'], players_df['wyId']))
    return player_name_to_id.get(player_name)

In [None]:
def plot_player_events(events_df, players_df, player_name):
    """
    Plots a distribution of events performed by a specific player on a football pitch.
    
    Parameters:
    events_df (pd.DataFrame): The dataframe containing event data.
    players_df (pd.DataFrame): The dataframe containing player data.
    player_name (str): The name of the player to plot events for.
    
    Returns:
    hvPlot object: A scatter plot of the player's events overlaid on a football pitch.
    """
    if not player_name:
        return pn.pane.Markdown("Select a player to see the scatter plot.", height=200)
    
    player_id = get_player_id(player_name)
    if player_id is None:
        return pn.pane.Markdown("Please select a valid player.", height=200)
    
    player_events = events_df[events_df['playerId'] == player_id]
    
    if player_events.empty:
        return pn.pane.Markdown("No events found for the selected player.", height=200)
    
    positions = [(pos[0]['x'], pos[0]['y'], event) 
                 for pos, event in zip(player_events['positions'], player_events['eventName']) 
                 if len(pos) > 0]
    event_df = pd.DataFrame(positions, columns=['x', 'y', 'eventName'])
    pitch = plot_pitch()
    
    event_heatmap = event_df.hvplot.points(x='x', y='y', c='eventName', cmap='Category20',
                                           title=f'{player_name} Event Heatmap')
    
    player_heatmap_plot = (event_heatmap * pitch).opts(width=880, height=600,
                                                      xlim=(-5, 110), ylim=(-5, 73),
                                                      xaxis=None, yaxis=None)
    
    return player_heatmap_plot

In [None]:
isco_map = plot_player_events(events_df, players_df, 'Isco')
isco_map

Using the Panel `AutocompleteInput` widget, we can search for players by name and immediately see their event map on the football pitch:

In [None]:
player_name_selector = pn.widgets.AutocompleteInput(name='Player Name', options=list(players_df['shortName']),
                                                    placeholder='Type player name...', case_sensitive=False,
                                                   search_strategy='includes')

player_events = pn.bind(plot_player_events, events_df=events_df,
                         players_df=players_df, player_name=player_name_selector)

pn.Column(player_name_selector, player_events, sizing_mode='stretch_width')

Another insight we can glean is the "player passing heatmap", which shows the areas where a player makes the most passes and the directions of those passes.

One way to do this is by creating a passing heatmap for a selected player and then adding a callback function that shows the direction of the passes when each location on the pitch is clicked:

In [None]:
def plot_player_pass_heatmap(events_df, players_df, player_name, cmap='Greens'):
    player_id = get_player_id(player_name)
    
    if player_id is None:
        return pn.pane.Markdown("Please select a valid player.", height=200)
    
    player_events = events_df[events_df['playerId'] == player_id]
    
    if player_events.empty:
        return pn.pane.Markdown("No events found for the selected player.", height=200)
    
    passes = player_events[player_events['eventName'].str.lower() == 'pass']
    
    if passes.empty:
        return pn.pane.Markdown("No passes found for the selected player.", height=200)
    
    pass_positions = [(pos[0]['x'], pos[0]['y']) for pos in passes['positions'] if len(pos) > 1]
    pass_df = pd.DataFrame(pass_positions, columns=['x', 'y'])
    
    pitch = plot_pitch()
    title = f"{player_name}'s Passes Heatmap"
    
    pass_heatmap = pass_df.hvplot.hexbin(x='x', y='y', cmap=cmap, min_count=1, title=title)
    
    total_passes = hv.Text(75, 70, f'Total number of passes: {len(pass_df)}', halign='center', fontsize=12)
    
    
    # Callback to filter passes based on click location
    def filter_passes(x, y, radius=1.5):
        # Only passes with both an origin and a destination can be drawn as segments
        valid_passes = passes[passes['positions'].apply(len) > 1]
        start_x = valid_passes['positions'].apply(lambda pos: pos[0]['x'])
        start_y = valid_passes['positions'].apply(lambda pos: pos[0]['y'])
        # Keep passes that start within `radius` meters of the click on both axes
        filtered_passes = valid_passes[
            start_x.between(x - radius, x + radius) &
            start_y.between(y - radius, y + radius)
        ]
        
        if filtered_passes.empty:
            return hv.Overlay()
        
        pass_lines = [hv.Segments([(pos[0]['x'], pos[0]['y'], pos[1]['x'], pos[1]['y'])]) for pos in filtered_passes['positions']]
        pass_lines_overlay = hv.Overlay(pass_lines)
        
        return pass_lines_overlay
    
    # Define a function to handle clicks
    def click_callback(x, y):
        pass_lines_overlay = filter_passes(x, y)
        updated_plot = pitch * pass_heatmap * pass_lines_overlay * total_passes
        return updated_plot
    
    # Create a stream for handling clicks
    stream = hv.streams.Tap(source=pass_heatmap, x=0, y=0)
    dynamic_map = hv.DynamicMap(lambda x, y: click_callback(x, y), streams=[stream])
    
    return dynamic_map.opts(width=880, height=600,
                            xlim=(-5, 110), ylim=(-5, 73),
                            xaxis=None, yaxis=None)

Then we bind the previously defined `player_name_selector` widget to `plot_player_pass_heatmap` in order to make it easier to search for different players and view their passing heatmaps:

In [None]:
player_pass_heatmap = pn.bind(plot_player_pass_heatmap, events_df=events_df,
                         players_df=players_df, player_name=player_name_selector)

pn.Column(player_name_selector, player_pass_heatmap, sizing_mode='stretch_width')
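The click handling relies on a simple rule: a pass is shown if its starting point lies within a small square window around the clicked location. The sketch below illustrates that rule on made-up passes using plain Python. The `passes_near` helper and the tuple layout are assumptions for this example, not part of the notebook.

```python
# Made-up passes as (start_x, start_y, end_x, end_y), in meters
toy_passes = [
    (50.0, 30.0, 60.0, 35.0),
    (51.0, 31.0, 40.0, 20.0),
    (80.0, 10.0, 90.0, 15.0),
]

def passes_near(passes, x, y, radius=1.5):
    """Return passes whose start point is within `radius` of (x, y) on both axes."""
    return [p for p in passes
            if abs(p[0] - x) <= radius and abs(p[1] - y) <= radius]

# Simulate a click at (50.5, 30.5): the first two passes start close enough
nearby = passes_near(toy_passes, x=50.5, y=30.5)
print(len(nearby))
```

A larger `radius` makes clicks more forgiving at the cost of drawing more segments per click.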

## Dashboard

We can now combine all the different plots into one layout using `pn.Column`, but first we will create an `IntSlider` widget for the top n bar charts and then bind the widget to the `plot_top_players` function:

In [None]:
top_n_selector = pn.widgets.IntSlider(name='Top', start=1, end=20, value=10)

bar_chart = pn.bind(plot_top_players, events_df=events_df, players_df=players_df,
                                event_type=event_type_selector, top_n=top_n_selector)

pn.Column(pn.Row(top_n_selector, event_type_selector), bar_chart)

Finally, we can now arrange all the plots into a neat layout with the widgets at the top:

In [None]:
layout = pn.Column(
    pn.Row(event_type_selector, top_n_selector, player_name_selector, sizing_mode='stretch_width'),
    bar_chart,
    event_heatmap,
    player_pass_heatmap,
    player_events,
    sizing_mode='stretch_both')

layout

## Servable dashboard

Now that we have a fully interactive dashboard, we can deploy it in a template to give it a more polished look:

In [None]:
logo = '<img src="https://panel.holoviz.org/_static/logo_stacked.png" width=180 height=150>'

text = ''' **Use the selector widget to choose an event type and see where on the pitch it occurs most often.
            Use the slider widget to set the number of players to display.
            Search by player name to see their event map.**'''

template = pn.template.BootstrapTemplate(
    header_background='#18BB12',
    title='Interactive football dashboard',
    sidebar=[logo, text],
    main=[pn.panel(layout, sizing_mode='scale_both')]
)
template.servable();

The `.servable()` call above marks the template to be served. If you have a live Python process, serving the notebook opens a standalone dashboard in a new browser tab where you can select and explore the data to your heart’s content, and share it with anyone else interested in this topic.

To serve the dashboard on its own, run `panel serve --rest-session-info --session-history -1 world_cup.ipynb --show` from the command line.