# Barcelona vs Valencia - Match Event Analysis
## Professional Data Analysis & Tactical Insights

**Match:** Barcelona vs Valencia  
**Date:** Match Analysis Project  
**Dataset:** 4,465 match events  
**Analysis Date:** February 2026

---

### Executive Summary
This notebook presents an in-depth analysis of the Barcelona vs Valencia match, examining:
- Comprehensive event distribution and patterns
- Team performance metrics and comparison
- Temporal flow and match dynamics
- Passing networks and ball possession
- Player contributions and tactical positioning
- Shot analysis and attacking patterns
- Defensive actions and pressure maps

## 1. Environment Setup and Data Loading

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter, defaultdict
import warnings

warnings.filterwarnings('ignore')

# Configure visualization settings
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
plt.rcParams['figure.figsize'] = (14, 7)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12

# Team colors for consistency
TEAM_COLORS = {
    'Barcelona': '#A50044',  # Blaugrana red
    'Valencia': '#EE7923'     # Valencia orange
}

print("✓ Libraries imported and environment configured successfully")

In [None]:
# Load the match data
with open('16157.json', 'r', encoding='utf-8') as f:  # explicit encoding keeps accented player names intact
    data = json.load(f)

print("📊 Dataset Loaded Successfully")
print(f"   Total Events: {len(data):,}")
print(f"   Data Type: {type(data)}")
print("\n✓ Ready for analysis")

## 2. Data Exploration & Feature Engineering

In [None]:
# Convert to DataFrame
df = pd.DataFrame(data)

# Extract nested features for easier analysis
df['event_type'] = df['type'].apply(lambda x: x.get('name', 'Unknown') if isinstance(x, dict) else 'Unknown')
df['team_name'] = df['team'].apply(lambda x: x.get('name', 'Unknown') if isinstance(x, dict) else 'Unknown')
df['possession_team_name'] = df['possession_team'].apply(lambda x: x.get('name', 'Unknown') if isinstance(x, dict) else 'Unknown')
df['play_pattern_name'] = df['play_pattern'].apply(lambda x: x.get('name', 'Unknown') if isinstance(x, dict) else 'Unknown')

# Extract player information
df['player_name'] = df['player'].apply(lambda x: x.get('name', None) if isinstance(x, dict) else None)
df['player_id'] = df['player'].apply(lambda x: x.get('id', None) if isinstance(x, dict) else None)

# Extract position information if available
df['position_name'] = df['position'].apply(lambda x: x.get('name', None) if isinstance(x, dict) else None)

print("DATASET STRUCTURE")
print("="*80)
df.info()  # info() prints its report directly and returns None
print("\n" + "="*80)
print("\nFirst 5 rows:")
df.head()
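
The extraction lambdas above all follow one pattern: read a key from a nested dict and fall back to a default when the cell is not a dict. A minimal sketch of a reusable helper is shown below. The name `extract_field` and the toy records are hypothetical, introduced only for illustration, and are not part of the original notebook.

```python
import pandas as pd

def extract_field(value, key="name", default="Unknown"):
    """Return value[key] when value is a dict, otherwise the default."""
    if isinstance(value, dict):
        return value.get(key, default)
    return default

# Toy records mimicking the nested event structure (illustrative values only)
sample = pd.DataFrame([
    {"type": {"id": 1, "name": "Pass"}, "player": {"id": 10, "name": "Player A"}},
    {"type": {"id": 2, "name": "Starting XI"}, "player": None},
])

# Extra keyword arguments to Series.apply are forwarded to the helper
sample["event_type"] = sample["type"].apply(extract_field)
sample["player_name"] = sample["player"].apply(extract_field, default=None)

print(sample[["event_type", "player_name"]])
```

With such a helper, each extraction in the cell above would reduce to a single `apply` call with the appropriate `key` and `default`.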

In [None]:
# Dataset statistics
print("🎯 MATCH OVERVIEW")
print("="*80)
print(f"Total Events: {len(df):,}")
print(f"Unique Event Types: {df['event_type'].nunique()}")
print(f"Match Duration: {df['minute'].max()} minutes")
print(f"Number of Periods: {df['period'].nunique()}")
print(f"\nTeams:")
for team in df['team_name'].unique():
    if team != 'Unknown':
        count = len(df[df['team_name'] == team])
        print(f"  • {team}: {count:,} events ({count/len(df)*100:.1f}%)")

print(f"\nUnique Players Tracked: {df['player_name'].nunique()}")  # nunique() already ignores None/NaN
print(f"Average Event Duration: {df['duration'].mean():.3f} seconds")

## 3. Event Distribution Analysis

In [None]:
# Comprehensive event analysis
event_counts = df['event_type'].value_counts()

fig, axes = plt.subplots(2, 2, figsize=(18, 12))

# 1. Top 20 events - horizontal bar
top_20_events = event_counts.head(20)
axes[0, 0].barh(range(len(top_20_events)), top_20_events.values, color='steelblue', edgecolor='black')
axes[0, 0].set_yticks(range(len(top_20_events)))
axes[0, 0].set_yticklabels(top_20_events.index)
axes[0, 0].set_xlabel('Frequency', fontweight='bold')
axes[0, 0].set_title('Top 20 Event Types', fontweight='bold', pad=15)
axes[0, 0].grid(axis='x', alpha=0.3)
axes[0, 0].invert_yaxis()

# Add value labels
for i, v in enumerate(top_20_events.values):
    axes[0, 0].text(v, i, f' {v:,}', va='center', fontweight='bold')

# 2. Top 12 events - pie chart
colors = plt.cm.Set3(range(12))
wedges, texts, autotexts = axes[0, 1].pie(event_counts.head(12).values, 
                                            labels=event_counts.head(12).index,
                                            autopct='%1.1f%%',
                                            startangle=90,
                                            colors=colors)
axes[0, 1].set_title('Top 12 Event Types Distribution', fontweight='bold', pad=15)
for autotext in autotexts:
    autotext.set_color('white')
    autotext.set_fontweight('bold')

# 3. Events by team
team_event_dist = df.groupby(['team_name', 'event_type']).size().unstack(fill_value=0)
top_events_for_comparison = event_counts.head(10).index
team_event_subset = team_event_dist[top_events_for_comparison]

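x = np.arange(len(team_event_subset.index))
width = 0.35
n_bars = 5
bar_width = width / n_bars
for i, event in enumerate(top_events_for_comparison[:n_bars]):
    if event in team_event_subset.columns:
        # Offset each bar so the whole group is centered on its team tick
        offset = (i - (n_bars - 1) / 2) * bar_width
        axes[1, 0].bar(x + offset, team_event_subset[event], bar_width, label=event)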

axes[1, 0].set_xlabel('Team', fontweight='bold')
axes[1, 0].set_ylabel('Event Count', fontweight='bold')
axes[1, 0].set_title('Top 5 Events by Team', fontweight='bold', pad=15)
axes[1, 0].set_xticks(x)
axes[1, 0].set_xticklabels(team_event_subset.index, rotation=0)
axes[1, 0].legend(loc='best', fontsize=9)
axes[1, 0].grid(axis='y', alpha=0.3)

# 4. Cumulative event distribution
cumulative_pct = (event_counts.cumsum() / event_counts.sum() * 100).head(20)
axes[1, 1].plot(range(len(cumulative_pct)), cumulative_pct.values, marker='o', 
                linewidth=2.5, markersize=8, color='darkgreen')
axes[1, 1].fill_between(range(len(cumulative_pct)), cumulative_pct.values, alpha=0.3, color='lightgreen')
axes[1, 1].axhline(y=80, color='red', linestyle='--', label='80% threshold', alpha=0.7)
axes[1, 1].set_xlabel('Event Type Rank', fontweight='bold')
axes[1, 1].set_ylabel('Cumulative Percentage (%)', fontweight='bold')
axes[1, 1].set_title('Cumulative Event Distribution', fontweight='bold', pad=15)
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('comprehensive_event_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n📊 Event Type Summary:")
print("="*80)
print(event_counts.head(15).to_string())
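
The cumulative plot above draws an 80% threshold line, and the rank at which the curve crosses it can also be computed directly. The sketch below uses only the standard library. `types_to_reach` and the toy counts are hypothetical names and values chosen for illustration, not figures from this match.

```python
from collections import Counter

def types_to_reach(counts, threshold=0.80):
    """Number of most-frequent categories needed to cover `threshold` of all events."""
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running / total >= threshold:
            return rank
    return len(counts)

# Toy frequencies (illustrative only)
toy_counts = Counter({"Pass": 50, "Carry": 25, "Pressure": 15, "Shot": 6, "Foul": 4})
print(types_to_reach(toy_counts))
```

On the notebook's data, the same function could be called as `types_to_reach(Counter(df['event_type']))`.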

## 4. Team Performance Comparison

In [None]:
# Detailed team comparison
teams = [t for t in df['team_name'].unique() if t != 'Unknown']

fig, axes = plt.subplots(2, 2, figsize=(18, 12))

# 1. Overall event count comparison
team_totals = df['team_name'].value_counts()
team_totals = team_totals[team_totals.index != 'Unknown']
colors_teams = [TEAM_COLORS.get(team, 'gray') for team in team_totals.index]
bars = axes[0, 0].bar(team_totals.index, team_totals.values, color=colors_teams, edgecolor='black', linewidth=2)
axes[0, 0].set_ylabel('Total Events', fontweight='bold')
axes[0, 0].set_title('Total Events by Team', fontweight='bold', pad=15)
axes[0, 0].grid(axis='y', alpha=0.3)

# Add value labels
for bar in bars:
    height = bar.get_height()
    axes[0, 0].text(bar.get_x() + bar.get_width()/2., height,
                    f'{int(height):,}\n({height/len(df)*100:.1f}%)',
                    ha='center', va='bottom', fontweight='bold')

# 2. Key performance indicators
key_metrics = ['Pass', 'Shot', 'Duel', 'Interception', 'Clearance', 'Tackle']
team_metrics = df[df['event_type'].isin(key_metrics)].groupby(['team_name', 'event_type']).size().unstack(fill_value=0)
team_metrics = team_metrics[team_metrics.index != 'Unknown']

team_metrics.plot(kind='bar', ax=axes[0, 1], width=0.8, edgecolor='black')
axes[0, 1].set_xlabel('Team', fontweight='bold')
axes[0, 1].set_ylabel('Event Count', fontweight='bold')
axes[0, 1].set_title('Key Performance Indicators', fontweight='bold', pad=15)
axes[0, 1].legend(title='Action Type', bbox_to_anchor=(1.05, 1), loc='upper left')
axes[0, 1].set_xticklabels(axes[0, 1].get_xticklabels(), rotation=0)
axes[0, 1].grid(axis='y', alpha=0.3)

# 3. Possession events
possession_events = df['possession_team_name'].value_counts()
possession_events = possession_events[possession_events.index != 'Unknown']
colors_poss = [TEAM_COLORS.get(team, 'gray') for team in possession_events.index]

wedges, texts, autotexts = axes[1, 0].pie(possession_events.values,
                                            labels=possession_events.index,
                                            autopct='%1.1f%%',
                                            startangle=90,
                                            colors=colors_poss,
                                            explode=[0.05] * len(possession_events),
                                            shadow=True,
                                            textprops={'fontweight': 'bold'})
axes[1, 0].set_title('Possession Distribution', fontweight='bold', pad=15)

for autotext in autotexts:
    autotext.set_color('white')
    autotext.set_fontsize(12)

# 4. Attack vs Defense metrics
attack_events = ['Pass', 'Shot', 'Dribble', 'Carry', 'Offside']
defense_events = ['Pressure', 'Interception', 'Block', 'Clearance', 'Tackle']

attack_counts = df[df['event_type'].isin(attack_events)].groupby('team_name').size()
defense_counts = df[df['event_type'].isin(defense_events)].groupby('team_name').size()

# Align both series on the same team order (drops 'Unknown', fills missing teams with 0)
attack_counts = attack_counts.reindex(teams, fill_value=0)
defense_counts = defense_counts.reindex(teams, fill_value=0)

x = np.arange(len(attack_counts))
width = 0.35

bars1 = axes[1, 1].bar(x - width/2, attack_counts.values, width, label='Attacking Actions', 
                       color='#2ecc71', edgecolor='black')
bars2 = axes[1, 1].bar(x + width/2, defense_counts.values, width, label='Defensive Actions', 
                       color='#e74c3c', edgecolor='black')

axes[1, 1].set_xlabel('Team', fontweight='bold')
axes[1, 1].set_ylabel('Event Count', fontweight='bold')
axes[1, 1].set_title('Attacking vs Defensive Actions', fontweight='bold', pad=15)
axes[1, 1].set_xticks(x)
axes[1, 1].set_xticklabels(attack_counts.index)
axes[1, 1].legend()
axes[1, 1].grid(axis='y', alpha=0.3)

# Add value labels
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        axes[1, 1].text(bar.get_x() + bar.get_width()/2., height,
                        f'{int(height)}', ha='center', va='bottom', fontweight='bold', fontsize=9)

plt.tight_layout()
plt.savefig('team_performance_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n⚽ Team Performance Summary:")
print("="*80)
for team in teams:
    team_df = df[df['team_name'] == team]
    print(f"\n{team}:")
    print(f"  Total Events: {len(team_df):,}")
    print(f"  Passes: {len(team_df[team_df['event_type'] == 'Pass'])}")
    print(f"  Shots: {len(team_df[team_df['event_type'] == 'Shot'])}")
    print(f"  Duels: {len(team_df[team_df['event_type'] == 'Duel'])}")

## 5. Temporal Analysis - Match Flow

In [None]:
# Match timeline analysis
fig, axes = plt.subplots(3, 1, figsize=(16, 14))

# 1. Event intensity over time
time_events = df.groupby('minute').size()
axes[0].plot(time_events.index, time_events.values, marker='o', linewidth=2.5, 
             markersize=5, color='darkblue', alpha=0.8)
axes[0].fill_between(time_events.index, time_events.values, alpha=0.3, color='lightblue')
axes[0].axvline(x=45, color='red', linestyle='--', linewidth=2, label='Half-time', alpha=0.7)
axes[0].set_xlabel('Match Minute', fontweight='bold')
axes[0].set_ylabel('Number of Events', fontweight='bold')
axes[0].set_title('Match Event Intensity Over Time', fontweight='bold', pad=15, fontsize=14)
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# 2. Team activity timeline
for team in teams:
    team_timeline = df[df['team_name'] == team].groupby('minute').size()
    axes[1].plot(team_timeline.index, team_timeline.values, marker='o', linewidth=2,
                 markersize=4, label=team, color=TEAM_COLORS.get(team, 'gray'), alpha=0.8)

axes[1].axvline(x=45, color='red', linestyle='--', linewidth=2, alpha=0.5)
axes[1].set_xlabel('Match Minute', fontweight='bold')
axes[1].set_ylabel('Events per Minute', fontweight='bold')
axes[1].set_title('Team Activity Timeline', fontweight='bold', pad=15, fontsize=14)
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

# 3. Period comparison
period_stats = df.groupby(['period', 'team_name']).size().unstack(fill_value=0)
period_stats = period_stats[[t for t in teams if t in period_stats.columns]]

period_stats.plot(kind='bar', ax=axes[2], color=[TEAM_COLORS.get(t, 'gray') for t in period_stats.columns],
                  edgecolor='black', width=0.7)
axes[2].set_xlabel('Period', fontweight='bold')
axes[2].set_ylabel('Event Count', fontweight='bold')
axes[2].set_title('Events by Match Period', fontweight='bold', pad=15, fontsize=14)
axes[2].set_xticklabels(axes[2].get_xticklabels(), rotation=0)
axes[2].legend(title='Team', fontsize=10)
axes[2].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig('temporal_match_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n⏱️ Temporal Statistics:")
print("="*80)
for period in sorted(df['period'].unique()):
    period_data = df[df['period'] == period]
    print(f"\nPeriod {period}:")
    print(f"  Events: {len(period_data):,}")
    print(f"  Duration: {period_data['minute'].min()}-{period_data['minute'].max()} minutes")
    for team in teams:
        team_period = period_data[period_data['team_name'] == team]
        print(f"  {team}: {len(team_period)} events")

## 6. Advanced Pass Analysis

In [None]:
# Comprehensive passing analysis
passes = df[df['event_type'] == 'Pass'].copy()

if len(passes) > 0:
    # Extract pass details
    passes['pass_length'] = passes['pass'].apply(
        lambda x: x.get('length', np.nan) if isinstance(x, dict) else np.nan
    )
    passes['pass_angle'] = passes['pass'].apply(
        lambda x: x.get('angle', np.nan) if isinstance(x, dict) else np.nan
    )
    passes['pass_outcome'] = passes['pass'].apply(
        lambda x: x.get('outcome', {}).get('name', 'Complete') if isinstance(x, dict) else 'Complete'
    )
    passes['pass_height'] = passes['pass'].apply(
        lambda x: x.get('height', {}).get('name', 'Ground') if isinstance(x, dict) else 'Ground'
    )
    passes['pass_type'] = passes['pass'].apply(
        lambda x: x.get('type', {}).get('name', 'Regular') if isinstance(x, dict) else 'Regular'
    )
    
    fig, axes = plt.subplots(2, 3, figsize=(20, 12))
    
    # 1. Pass length distribution
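    # Assumption: the file follows the StatsBomb convention, where pass length is
    # recorded in yards; convert to meters so the labels and printed values below are correct
    valid_lengths = passes['pass_length'].dropna() * 0.9144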
    if len(valid_lengths) > 0:
        axes[0, 0].hist(valid_lengths, bins=40, color='skyblue', edgecolor='black', alpha=0.7)
        axes[0, 0].axvline(valid_lengths.mean(), color='red', linestyle='--', linewidth=2.5,
                          label=f'Mean: {valid_lengths.mean():.1f}m')
        axes[0, 0].axvline(valid_lengths.median(), color='green', linestyle='--', linewidth=2.5,
                          label=f'Median: {valid_lengths.median():.1f}m')
        axes[0, 0].set_xlabel('Pass Length (meters)', fontweight='bold')
        axes[0, 0].set_ylabel('Frequency', fontweight='bold')
        axes[0, 0].set_title('Pass Length Distribution', fontweight='bold', pad=15)
        axes[0, 0].legend()
        axes[0, 0].grid(axis='y', alpha=0.3)
    
    # 2. Pass outcomes
    outcome_counts = passes['pass_outcome'].value_counts()
    colors_outcome = plt.cm.Pastel1(range(len(outcome_counts)))
    wedges, texts, autotexts = axes[0, 1].pie(outcome_counts.values,
                                                labels=outcome_counts.index,
                                                autopct='%1.1f%%',
                                                startangle=90,
                                                colors=colors_outcome)
    axes[0, 1].set_title('Pass Outcome Distribution', fontweight='bold', pad=15)
    for autotext in autotexts:
        autotext.set_color('black')
        autotext.set_fontweight('bold')
    
    # 3. Passes by team
    team_passes = passes.groupby(['team_name', 'pass_outcome']).size().unstack(fill_value=0)
    team_passes = team_passes[team_passes.index != 'Unknown']
    team_passes.plot(kind='bar', stacked=False, ax=axes[0, 2], edgecolor='black', width=0.7)
    axes[0, 2].set_xlabel('Team', fontweight='bold')
    axes[0, 2].set_ylabel('Number of Passes', fontweight='bold')
    axes[0, 2].set_title('Pass Outcomes by Team', fontweight='bold', pad=15)
    axes[0, 2].set_xticklabels(axes[0, 2].get_xticklabels(), rotation=0)
    axes[0, 2].legend(title='Outcome', fontsize=9)
    axes[0, 2].grid(axis='y', alpha=0.3)
    
    # 4. Pass height distribution
    height_counts = passes['pass_height'].value_counts()
    axes[1, 0].bar(height_counts.index, height_counts.values, color='coral', edgecolor='black')
    axes[1, 0].set_xlabel('Pass Height', fontweight='bold')
    axes[1, 0].set_ylabel('Frequency', fontweight='bold')
    axes[1, 0].set_title('Pass Height Distribution', fontweight='bold', pad=15)
    axes[1, 0].grid(axis='y', alpha=0.3)
    plt.setp(axes[1, 0].xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    # 5. Pass type distribution
    type_counts = passes['pass_type'].value_counts().head(10)
    axes[1, 1].barh(range(len(type_counts)), type_counts.values, color='mediumseagreen', edgecolor='black')
    axes[1, 1].set_yticks(range(len(type_counts)))
    axes[1, 1].set_yticklabels(type_counts.index)
    axes[1, 1].set_xlabel('Frequency', fontweight='bold')
    axes[1, 1].set_title('Top 10 Pass Types', fontweight='bold', pad=15)
    axes[1, 1].grid(axis='x', alpha=0.3)
    axes[1, 1].invert_yaxis()
    
    # 6. Team passing comparison
    team_pass_stats = []
    for team in teams:
        team_pass = passes[passes['team_name'] == team]
        total = len(team_pass)
        complete = len(team_pass[team_pass['pass_outcome'] == 'Complete'])
        team_pass_stats.append({
            'Team': team,
            'Total': total,
            'Complete': complete,
            'Accuracy': (complete/total*100) if total > 0 else 0
        })
    
    pass_stats_df = pd.DataFrame(team_pass_stats)
    x = np.arange(len(pass_stats_df))
    width = 0.35
    
    bars1 = axes[1, 2].bar(x - width/2, pass_stats_df['Complete'], width, 
                           label='Complete', color='#2ecc71', edgecolor='black')
    bars2 = axes[1, 2].bar(x + width/2, pass_stats_df['Total'] - pass_stats_df['Complete'], 
                           width, label='Incomplete', color='#e74c3c', edgecolor='black')
    
    axes[1, 2].set_xlabel('Team', fontweight='bold')
    axes[1, 2].set_ylabel('Number of Passes', fontweight='bold')
    axes[1, 2].set_title('Passing Accuracy Comparison', fontweight='bold', pad=15)
    axes[1, 2].set_xticks(x)
    axes[1, 2].set_xticklabels(pass_stats_df['Team'])
    axes[1, 2].legend()
    axes[1, 2].grid(axis='y', alpha=0.3)
    
    # Add accuracy percentages on top
    for i, row in pass_stats_df.iterrows():
        axes[1, 2].text(i, row['Total'] + 10, f"{row['Accuracy']:.1f}%",
                       ha='center', va='bottom', fontweight='bold', fontsize=11)
    
    plt.tight_layout()
    plt.savefig('advanced_pass_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n⚽ Passing Statistics:")
    print("="*80)
    print(f"Total Passes: {len(passes):,}")
    if len(valid_lengths) > 0:
        print(f"Average Pass Length: {valid_lengths.mean():.2f} meters")
        print(f"Median Pass Length: {valid_lengths.median():.2f} meters")
        print(f"Longest Pass: {valid_lengths.max():.2f} meters")
    print(f"\nPass Success Rate: {len(passes[passes['pass_outcome'] == 'Complete'])/len(passes)*100:.1f}%")
    print("\nBy Team:")
    for _, row in pass_stats_df.iterrows():
        print(f"  {row['Team']}: {row['Complete']}/{row['Total']} ({row['Accuracy']:.1f}%)")
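
The accuracy loop above can be expressed as one grouped aggregation. The sketch below runs on toy data and assumes, as the cell above does, that a pass counts as completed when its `pass_outcome` is `'Complete'`. The frame `toy_passes` and its values are illustrative only.

```python
import pandas as pd

# Toy passes (illustrative only)
toy_passes = pd.DataFrame({
    "team_name": ["Barcelona"] * 4 + ["Valencia"] * 2,
    "pass_outcome": ["Complete", "Complete", "Incomplete", "Complete", "Out", "Complete"],
})

# The mean of a boolean column is the completion rate
pass_accuracy = (
    toy_passes.assign(completed=toy_passes["pass_outcome"].eq("Complete"))
    .groupby("team_name")["completed"]
    .agg(total="size", complete="sum", accuracy="mean")
)
pass_accuracy["accuracy"] *= 100
print(pass_accuracy)
```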

## 7. Shot Analysis

In [None]:
# Shot analysis
shots = df[df['event_type'] == 'Shot'].copy()

if len(shots) > 0:
    # Extract shot details
    shots['shot_outcome'] = shots['shot'].apply(
        lambda x: x.get('outcome', {}).get('name', 'Unknown') if isinstance(x, dict) else 'Unknown'
    )
    shots['shot_type'] = shots['shot'].apply(
        lambda x: x.get('type', {}).get('name', 'Unknown') if isinstance(x, dict) else 'Unknown'
    )
    shots['shot_body_part'] = shots['shot'].apply(
        lambda x: x.get('body_part', {}).get('name', 'Unknown') if isinstance(x, dict) else 'Unknown'
    )
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Shots by team
    team_shots = shots['team_name'].value_counts()
    team_shots = team_shots[team_shots.index != 'Unknown']
    colors_team = [TEAM_COLORS.get(team, 'gray') for team in team_shots.index]
    bars = axes[0, 0].bar(team_shots.index, team_shots.values, color=colors_team, 
                          edgecolor='black', linewidth=2)
    axes[0, 0].set_ylabel('Number of Shots', fontweight='bold')
    axes[0, 0].set_title('Total Shots by Team', fontweight='bold', pad=15)
    axes[0, 0].grid(axis='y', alpha=0.3)
    
    for bar in bars:
        height = bar.get_height()
        axes[0, 0].text(bar.get_x() + bar.get_width()/2., height,
                       f'{int(height)}', ha='center', va='bottom', fontweight='bold', fontsize=12)
    
    # 2. Shot outcomes
    outcome_counts = shots['shot_outcome'].value_counts()
    colors_outcomes = plt.cm.Set3(range(len(outcome_counts)))
    wedges, texts, autotexts = axes[0, 1].pie(outcome_counts.values,
                                                labels=outcome_counts.index,
                                                autopct='%1.1f%%',
                                                startangle=90,
                                                colors=colors_outcomes)
    axes[0, 1].set_title('Shot Outcomes', fontweight='bold', pad=15)
    for autotext in autotexts:
        autotext.set_color('white')
        autotext.set_fontweight('bold')
    
    # 3. Shot types
    type_counts = shots['shot_type'].value_counts()
    axes[1, 0].barh(range(len(type_counts)), type_counts.values, color='orange', edgecolor='black')
    axes[1, 0].set_yticks(range(len(type_counts)))
    axes[1, 0].set_yticklabels(type_counts.index)
    axes[1, 0].set_xlabel('Frequency', fontweight='bold')
    axes[1, 0].set_title('Shot Types', fontweight='bold', pad=15)
    axes[1, 0].grid(axis='x', alpha=0.3)
    axes[1, 0].invert_yaxis()
    
    # 4. Body part used
    body_counts = shots['shot_body_part'].value_counts()
    axes[1, 1].bar(body_counts.index, body_counts.values, color='steelblue', edgecolor='black')
    axes[1, 1].set_ylabel('Frequency', fontweight='bold')
    axes[1, 1].set_title('Shots by Body Part', fontweight='bold', pad=15)
    axes[1, 1].grid(axis='y', alpha=0.3)
    plt.setp(axes[1, 1].xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    plt.tight_layout()
    plt.savefig('shot_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n🎯 Shot Statistics:")
    print("="*80)
    print(f"Total Shots: {len(shots)}")
    print(f"\nBy Team:")
    for team in teams:
        team_shot = shots[shots['team_name'] == team]
        print(f"  {team}: {len(team_shot)} shots")
    print(f"\nShot Outcomes:")
    for outcome, count in outcome_counts.items():
        print(f"  {outcome}: {count} ({count/len(shots)*100:.1f}%)")
else:
    print("No shot data available in this match.")
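
Shot counts alone do not show how efficient each team was. A conversion rate (goals per shot) can be sketched as follows, assuming scored shots carry the outcome name `'Goal'`. That label, the frame `toy_shots`, and its values are assumptions made for illustration and should be checked against the actual `shot_outcome` values printed above.

```python
import pandas as pd

# Toy shots (illustrative only; outcome labels are assumed)
toy_shots = pd.DataFrame({
    "team_name": ["Barcelona", "Barcelona", "Barcelona", "Valencia", "Valencia"],
    "shot_outcome": ["Goal", "Saved", "Blocked", "Goal", "Blocked"],
})

conversion = toy_shots.groupby("team_name")["shot_outcome"].agg(
    shots="size",
    goals=lambda s: int((s == "Goal").sum()),
)
conversion["conversion_pct"] = conversion["goals"] / conversion["shots"] * 100
print(conversion.round(1))
```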

## 8. Player Performance Analysis

In [None]:
# Player contribution analysis
player_df = df[df['player_name'].notna()].copy()

if len(player_df) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(18, 14))
    
    # 1. Top 15 most active players
    top_players = player_df['player_name'].value_counts().head(15)
    colors_grad = plt.cm.viridis(np.linspace(0.3, 0.9, len(top_players)))
    bars = axes[0, 0].barh(range(len(top_players)), top_players.values, color=colors_grad, edgecolor='black')
    axes[0, 0].set_yticks(range(len(top_players)))
    axes[0, 0].set_yticklabels(top_players.index, fontsize=10)
    axes[0, 0].set_xlabel('Number of Events', fontweight='bold')
    axes[0, 0].set_title('Top 15 Most Active Players', fontweight='bold', pad=15, fontsize=13)
    axes[0, 0].grid(axis='x', alpha=0.3)
    axes[0, 0].invert_yaxis()
    
    for i, (bar, value) in enumerate(zip(bars, top_players.values)):
        axes[0, 0].text(value, i, f' {value}', va='center', fontweight='bold', fontsize=9)
    
    # 2. Player event diversity (top 10 players)
    top_10_players = top_players.head(10).index
    player_events = player_df[player_df['player_name'].isin(top_10_players)]
    player_event_matrix = player_events.groupby(['player_name', 'event_type']).size().unstack(fill_value=0)
    top_event_types = df['event_type'].value_counts().head(8).index
    player_event_subset = player_event_matrix[[e for e in top_event_types if e in player_event_matrix.columns]]
    
    # No figsize here: passing it together with ax would resize the whole 2x2 figure
    player_event_subset.plot(kind='bar', stacked=True, ax=axes[0, 1],
                            edgecolor='black', linewidth=0.5)
    axes[0, 1].set_xlabel('Player', fontweight='bold')
    axes[0, 1].set_ylabel('Event Count', fontweight='bold')
    axes[0, 1].set_title('Event Distribution - Top 10 Players', fontweight='bold', pad=15, fontsize=13)
    axes[0, 1].legend(title='Event Type', bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
    axes[0, 1].set_xticklabels(axes[0, 1].get_xticklabels(), rotation=45, ha='right', fontsize=9)
    axes[0, 1].grid(axis='y', alpha=0.3)
    
    # 3. Top passers
    player_passes = player_df[player_df['event_type'] == 'Pass']['player_name'].value_counts().head(10)
    axes[1, 0].bar(range(len(player_passes)), player_passes.values, color='lightcoral', edgecolor='black')
    axes[1, 0].set_xticks(range(len(player_passes)))
    axes[1, 0].set_xticklabels(player_passes.index, rotation=45, ha='right', fontsize=9)
    axes[1, 0].set_ylabel('Number of Passes', fontweight='bold')
    axes[1, 0].set_title('Top 10 Passers', fontweight='bold', pad=15, fontsize=13)
    axes[1, 0].grid(axis='y', alpha=0.3)
    
    # 4. Team player contribution
    team_player_counts = player_df.groupby(['team_name', 'player_name']).size().reset_index(name='events')
    team_player_counts = team_player_counts[team_player_counts['team_name'] != 'Unknown']
    
    for team in teams:
        team_data = team_player_counts[team_player_counts['team_name'] == team].nlargest(10, 'events')
        axes[1, 1].barh(team_data['player_name'], team_data['events'], 
                       label=team, color=TEAM_COLORS.get(team, 'gray'), alpha=0.7, edgecolor='black')
    
    axes[1, 1].set_xlabel('Number of Events', fontweight='bold')
    axes[1, 1].set_title('Top Players by Team', fontweight='bold', pad=15, fontsize=13)
    axes[1, 1].legend()
    axes[1, 1].grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('player_performance_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n👤 Player Statistics:")
    print("="*80)
    print(f"Total Players Tracked: {player_df['player_name'].nunique()}")
    print("\nTop 10 Most Active Players:")
    for i, (player, count) in enumerate(top_players.head(10).items(), 1):
        team = player_df[player_df['player_name'] == player]['team_name'].iloc[0]
        print(f"  {i}. {player} ({team}): {count} events")

## 9. Defensive Actions Analysis

In [None]:
# Defensive metrics
defensive_actions = ['Pressure', 'Interception', 'Block', 'Clearance', 'Tackle', 'Duel']
defense_df = df[df['event_type'].isin(defensive_actions)].copy()

if len(defense_df) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Defensive actions by type
    defense_counts = defense_df['event_type'].value_counts()
    axes[0, 0].bar(defense_counts.index, defense_counts.values, color='crimson', 
                   edgecolor='black', alpha=0.8)
    axes[0, 0].set_ylabel('Frequency', fontweight='bold')
    axes[0, 0].set_title('Defensive Actions Distribution', fontweight='bold', pad=15)
    axes[0, 0].grid(axis='y', alpha=0.3)
    plt.setp(axes[0, 0].xaxis.get_majorticklabels(), rotation=45, ha='right')
    
    # 2. Defensive actions by team
    team_defense = defense_df.groupby(['team_name', 'event_type']).size().unstack(fill_value=0)
    team_defense = team_defense[team_defense.index != 'Unknown']
    team_defense.plot(kind='bar', ax=axes[0, 1], stacked=True, edgecolor='black', linewidth=0.5)
    axes[0, 1].set_xlabel('Team', fontweight='bold')
    axes[0, 1].set_ylabel('Number of Actions', fontweight='bold')
    axes[0, 1].set_title('Defensive Actions by Team', fontweight='bold', pad=15)
    axes[0, 1].legend(title='Action Type', bbox_to_anchor=(1.05, 1), loc='upper left')
    axes[0, 1].set_xticklabels(axes[0, 1].get_xticklabels(), rotation=0)
    axes[0, 1].grid(axis='y', alpha=0.3)
    
    # 3. Pressure intensity over time
    pressure_df = df[df['event_type'] == 'Pressure']
    if len(pressure_df) > 0:
        pressure_timeline = pressure_df.groupby('minute').size()
        axes[1, 0].plot(pressure_timeline.index, pressure_timeline.values, 
                       marker='o', linewidth=2, color='darkred', alpha=0.8)
        axes[1, 0].fill_between(pressure_timeline.index, pressure_timeline.values, alpha=0.3, color='lightcoral')
        axes[1, 0].axvline(x=45, color='blue', linestyle='--', linewidth=2, alpha=0.5)
        axes[1, 0].set_xlabel('Match Minute', fontweight='bold')
        axes[1, 0].set_ylabel('Pressure Events', fontweight='bold')
        axes[1, 0].set_title('Pressure Intensity Timeline', fontweight='bold', pad=15)
        axes[1, 0].grid(True, alpha=0.3)
    
    # 4. Top defenders
    defenders = defense_df[defense_df['player_name'].notna()]['player_name'].value_counts().head(10)
    axes[1, 1].barh(range(len(defenders)), defenders.values, color='navy', edgecolor='black')
    axes[1, 1].set_yticks(range(len(defenders)))
    axes[1, 1].set_yticklabels(defenders.index, fontsize=10)
    axes[1, 1].set_xlabel('Defensive Actions', fontweight='bold')
    axes[1, 1].set_title('Top 10 Defensive Players', fontweight='bold', pad=15)
    axes[1, 1].grid(axis='x', alpha=0.3)
    axes[1, 1].invert_yaxis()
    
    for i, v in enumerate(defenders.values):
        axes[1, 1].text(v, i, f' {v}', va='center', fontweight='bold')
    
    plt.tight_layout()
    plt.savefig('defensive_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n🛡️ Defensive Statistics:")
    print("="*80)
    print(f"Total Defensive Actions: {len(defense_df)}")
    print("\nBy Action Type:")
    for action, count in defense_counts.items():
        print(f"  {action}: {count}")
    print("\nBy Team:")
    for team in teams:
        team_def = defense_df[defense_df['team_name'] == team]
        print(f"  {team}: {len(team_def)} defensive actions")

## 10. Play Pattern & Possession Analysis

In [None]:
# Play patterns and possession
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Play patterns distribution
play_patterns = df['play_pattern_name'].value_counts()
colors_patterns = plt.cm.Paired(range(len(play_patterns)))
bars = axes[0, 0].barh(range(len(play_patterns)), play_patterns.values, 
                       color=colors_patterns, edgecolor='black')
axes[0, 0].set_yticks(range(len(play_patterns)))
axes[0, 0].set_yticklabels(play_patterns.index)
axes[0, 0].set_xlabel('Frequency', fontweight='bold')
axes[0, 0].set_title('Play Patterns Distribution', fontweight='bold', pad=15)
axes[0, 0].grid(axis='x', alpha=0.3)
axes[0, 0].invert_yaxis()

for i, v in enumerate(play_patterns.values):
    axes[0, 0].text(v, i, f' {v:,}', va='center', fontweight='bold')

# 2. Possession distribution
possession_dist = df['possession_team_name'].value_counts()
possession_dist = possession_dist[possession_dist.index != 'Unknown']
colors_poss = [TEAM_COLORS.get(team, 'gray') for team in possession_dist.index]

wedges, texts, autotexts = axes[0, 1].pie(possession_dist.values,
                                            labels=possession_dist.index,
                                            autopct='%1.1f%%',
                                            startangle=90,
                                            colors=colors_poss,
                                            explode=[0.1] * len(possession_dist),
                                            shadow=True,
                                            textprops={'fontweight': 'bold', 'fontsize': 12})
axes[0, 1].set_title('Overall Possession Distribution', fontweight='bold', pad=15, fontsize=14)

for autotext in autotexts:
    autotext.set_color('white')
    autotext.set_fontsize(13)

# 3. Possession phases over time
possession_timeline = df.groupby(['minute', 'possession_team_name']).size().unstack(fill_value=0)
possession_timeline = possession_timeline[[c for c in teams if c in possession_timeline.columns]]

for team in possession_timeline.columns:
    axes[1, 0].plot(possession_timeline.index, possession_timeline[team], 
                   label=team, linewidth=2, color=TEAM_COLORS.get(team, 'gray'), alpha=0.8)

axes[1, 0].axvline(x=45, color='red', linestyle='--', linewidth=2, alpha=0.5, label='Half-time')
axes[1, 0].set_xlabel('Match Minute', fontweight='bold')
axes[1, 0].set_ylabel('Events in Possession', fontweight='bold')
axes[1, 0].set_title('Possession Flow Timeline', fontweight='bold', pad=15)
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 4. Carry events (ball progression)
carries = df[df['event_type'] == 'Carry']
if len(carries) > 0:
    team_carries = carries['team_name'].value_counts()
    team_carries = team_carries[team_carries.index != 'Unknown']
    colors_carry = [TEAM_COLORS.get(team, 'gray') for team in team_carries.index]
    
    bars = axes[1, 1].bar(team_carries.index, team_carries.values, 
                         color=colors_carry, edgecolor='black', linewidth=2)
    axes[1, 1].set_ylabel('Number of Carries', fontweight='bold')
    axes[1, 1].set_title('Ball Carries by Team', fontweight='bold', pad=15)
    axes[1, 1].grid(axis='y', alpha=0.3)
    
    for bar in bars:
        height = bar.get_height()
        axes[1, 1].text(bar.get_x() + bar.get_width()/2., height,
                       f'{int(height)}', ha='center', va='bottom', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.savefig('possession_playpattern_analysis.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n📊 Play Pattern & Possession Summary:")
print("="*80)
print("Play Patterns:")
for pattern, count in play_patterns.items():
    print(f"  {pattern}: {count:,} ({count/len(df)*100:.1f}%)")
print("\nPossession:")
for team, count in possession_dist.items():
    print(f"  {team}: {count:,} events ({count/len(df)*100:.1f}%)")
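
The possession figures above are shares of event counts, not of time on the ball, and the printed percentages are taken over all events, including any with an `Unknown` possession team. As a minimal sketch, assuming the same `Unknown` placeholder convention, the hypothetical helper `possession_share` below computes shares over known teams only, so they sum to roughly 100%. The toy frame and its team labels are illustrative and are not match data.

```python
import pandas as pd


def possession_share(events: pd.DataFrame, col: str = "possession_team_name") -> pd.Series:
    """Return each team's share of events in percent, ignoring 'Unknown' and missing values."""
    known = events.loc[events[col] != "Unknown", col].dropna()
    return (known.value_counts(normalize=True) * 100).round(1)


# Illustrative toy data (hypothetical teams, not taken from the match file)
toy = pd.DataFrame(
    {"possession_team_name": ["Team A"] * 6 + ["Team B"] * 3 + ["Unknown"]}
)
shares = possession_share(toy)
print(shares)
```

In the notebook, the same call would be `possession_share(df)`; whether to normalise over all events or over known teams only is a reporting choice.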

## 11. Event Heatmap & Correlations

In [None]:
# Event-count heatmaps: period and team cross-tabulated against event type
fig, axes = plt.subplots(2, 1, figsize=(16, 14))

# 1. Period vs Event Type
period_event_matrix = df.groupby(['period', 'event_type']).size().unstack(fill_value=0)
top_events_heatmap = df['event_type'].value_counts().head(12).index
period_event_subset = period_event_matrix[top_events_heatmap]

sns.heatmap(period_event_subset.T, annot=True, fmt='d', cmap='YlOrRd', 
            cbar_kws={'label': 'Event Count'}, ax=axes[0], linewidths=1, linecolor='white')
axes[0].set_title('Event Distribution: Period vs Event Type (Top 12 Events)', 
                 fontweight='bold', pad=20, fontsize=14)
axes[0].set_xlabel('Period', fontweight='bold', fontsize=12)
axes[0].set_ylabel('Event Type', fontweight='bold', fontsize=12)

# 2. Team vs Event Type
team_event_matrix = df.groupby(['team_name', 'event_type']).size().unstack(fill_value=0)
team_event_matrix = team_event_matrix[team_event_matrix.index != 'Unknown']
team_event_subset = team_event_matrix[top_events_heatmap]

sns.heatmap(team_event_subset.T, annot=True, fmt='d', cmap='Blues', 
            cbar_kws={'label': 'Event Count'}, ax=axes[1], linewidths=1, linecolor='white')
axes[1].set_title('Event Distribution: Team vs Event Type (Top 12 Events)', 
                 fontweight='bold', pad=20, fontsize=14)
axes[1].set_xlabel('Team', fontweight='bold', fontsize=12)
axes[1].set_ylabel('Event Type', fontweight='bold', fontsize=12)

plt.tight_layout()
plt.savefig('correlation_heatmaps.png', dpi=300, bbox_inches='tight')
plt.show()
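
Both heatmaps above plot raw event counts (a cross-tabulation), not correlation coefficients. As a small sketch on hypothetical toy data, the same count matrix can also be built with `pd.crosstab`, which should match the `groupby(...).size().unstack(fill_value=0)` pattern used in the cell.

```python
import pandas as pd

# Illustrative toy events (hypothetical teams, not taken from the match file)
toy = pd.DataFrame(
    {
        "team_name": ["Team A", "Team A", "Team B", "Team A", "Team B"],
        "event_type": ["Pass", "Shot", "Pass", "Pass", "Carry"],
    }
)

# Pattern used in the notebook: group, count, pivot event types into columns
via_groupby = toy.groupby(["team_name", "event_type"]).size().unstack(fill_value=0)

# Equivalent one-liner
via_crosstab = pd.crosstab(toy["team_name"], toy["event_type"])

print(via_crosstab)
same_counts = (via_groupby.to_numpy() == via_crosstab.to_numpy()).all()
print("Same counts:", same_counts)
```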

## 12. Executive Summary & Key Insights

In [None]:
print("="*90)
print("BARCELONA vs VALENCIA - MATCH ANALYSIS EXECUTIVE SUMMARY")
print("="*90)

print("\n📊 MATCH OVERVIEW")
print("-" * 90)
print(f"Total Events Analyzed: {len(df):,}")
print(f"Match Duration: {df['minute'].max()} minutes")
print(f"Number of Periods: {df['period'].nunique()}")
print(f"Unique Event Types: {df['event_type'].nunique()}")
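# Count distinct players, excluding the 'Unknown' placeholder explicitly
# rather than assuming it is present and subtracting 1
print(f"Total Players Tracked: {df.loc[df['player_name'] != 'Unknown', 'player_name'].nunique()}")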

print("\n⚽ TEAM STATISTICS")
print("-" * 90)
for team in teams:
    team_df = df[df['team_name'] == team]
    team_poss = df[df['possession_team_name'] == team]
    print(f"\n{team}:")
    print(f"  Total Actions: {len(team_df):,} ({len(team_df)/len(df)*100:.1f}%)")
    print(f"  Possession: {len(team_poss):,} events ({len(team_poss)/len(df)*100:.1f}%)")
    print(f"  Passes: {len(team_df[team_df['event_type'] == 'Pass'])}")
    print(f"  Shots: {len(team_df[team_df['event_type'] == 'Shot'])}")
    print(f"  Defensive Actions: {len(team_df[team_df['event_type'].isin(defensive_actions)])}")

print("\n🎯 TOP 5 EVENT TYPES")
print("-" * 90)
for i, (event, count) in enumerate(df['event_type'].value_counts().head(5).items(), 1):
    pct = (count / len(df)) * 100
    print(f"{i}. {event}: {count:,} ({pct:.1f}%)")

if len(passes) > 0:
    print("\n📍 PASSING INSIGHTS")
    print("-" * 90)
    print(f"Total Passes: {len(passes):,}")
    print(f"Pass Completion Rate: {len(passes[passes['pass_outcome'] == 'Complete'])/len(passes)*100:.1f}%")
    if len(valid_lengths) > 0:
        print(f"Average Pass Length: {valid_lengths.mean():.2f} meters")
    print(f"\nBy Team:")
    for team in teams:
        team_pass = passes[passes['team_name'] == team]
        if len(team_pass) > 0:
            complete = len(team_pass[team_pass['pass_outcome'] == 'Complete'])
            accuracy = complete / len(team_pass) * 100  # team_pass is non-empty here
            print(f"  {team}: {len(team_pass)} passes, {accuracy:.1f}% accuracy")

if len(shots) > 0:
    print("\n🎯 SHOOTING ANALYSIS")
    print("-" * 90)
    print(f"Total Shots: {len(shots)}")
    for team in teams:
        team_shots = shots[shots['team_name'] == team]
        print(f"  {team}: {len(team_shots)} shots")

print("\n👤 TOP 5 PLAYERS")
print("-" * 90)
top_5_players = player_df['player_name'].value_counts().head(5)
for i, (player, count) in enumerate(top_5_players.items(), 1):
    team = player_df[player_df['player_name'] == player]['team_name'].iloc[0]
    print(f"{i}. {player} ({team}): {count} events")

print("\n⏱️ TEMPORAL INSIGHTS")
print("-" * 90)
for period in sorted(df['period'].unique()):
    period_data = df[df['period'] == period]
    print(f"Period {period}: {len(period_data):,} events")

print("\n🛡️ DEFENSIVE METRICS")
print("-" * 90)
print(f"Total Defensive Actions: {len(defense_df):,}")
for team in teams:
    team_def = defense_df[defense_df['team_name'] == team]
    print(f"  {team}: {len(team_def)} defensive actions")

print("\n" + "="*90)
print("✅ COMPREHENSIVE ANALYSIS COMPLETE")
print("="*90)
print("\nAll visualizations have been saved as high-resolution PNG files.")
print("Check the current directory for:")
print("  • comprehensive_event_analysis.png")
print("  • team_performance_analysis.png")
print("  • temporal_match_analysis.png")
print("  • advanced_pass_analysis.png")
print("  • shot_analysis.png")
print("  • player_performance_analysis.png")
print("  • defensive_analysis.png")
print("  • possession_playpattern_analysis.png")
print("  • correlation_heatmaps.png")
print("\n" + "="*90)

## 13. Data Export

In [None]:
# Export processed data
export_columns = ['index', 'period', 'minute', 'second', 'event_type', 'team_name', 
                 'player_name', 'possession_team_name', 'play_pattern_name', 'duration']
df_export = df[export_columns].copy()

df_export.to_csv('barcelona_valencia_processed_data.csv', index=False)
print("✓ Processed data exported to 'barcelona_valencia_processed_data.csv'")

# Export comprehensive statistics
summary_stats = {
    'match_info': {
        'total_events': len(df),
        'duration_minutes': int(df['minute'].max()),
        'periods': int(df['period'].nunique()),
        'teams': list(teams)  # plain list, so json.dump also works if teams is an array or Index
    },
    'event_distribution': df['event_type'].value_counts().to_dict(),
    'team_statistics': {
        team: {
            'total_events': int(len(df[df['team_name'] == team])),
            'possession_events': int(len(df[df['possession_team_name'] == team])),
            'passes': int(len(df[(df['team_name'] == team) & (df['event_type'] == 'Pass')])),
            'shots': int(len(df[(df['team_name'] == team) & (df['event_type'] == 'Shot')]))
        } for team in teams
    },
    'period_distribution': df['period'].value_counts().to_dict(),
    'play_patterns': df['play_pattern_name'].value_counts().to_dict()
}

with open('barcelona_valencia_summary_stats.json', 'w') as f:
    json.dump(summary_stats, f, indent=2)
    
print("✓ Summary statistics exported to 'barcelona_valencia_summary_stats.json'")
print("\n✅ All exports completed successfully!")
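
The export cell wraps several values in `int(...)` because `json.dump` raises a `TypeError` for NumPy scalar types such as `np.int64`. As a hedged sketch, the hypothetical helper `to_builtin` below generalises that guard by recursively converting NumPy scalars, arrays, and non-string keys into built-in types before serialisation. The `sample` dictionary is illustrative and is not taken from the match data.

```python
import json

import numpy as np


def to_builtin(obj):
    """Recursively convert NumPy scalars/arrays and non-string dict keys to JSON-friendly built-ins."""
    if isinstance(obj, dict):
        return {str(key): to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [to_builtin(value) for value in obj]
    if isinstance(obj, np.ndarray):
        return [to_builtin(value) for value in obj.tolist()]
    if isinstance(obj, np.generic):
        return obj.item()  # e.g. np.int64 -> int, np.float64 -> float
    return obj


# Illustrative structure containing NumPy types that json would otherwise reject
sample = {
    "periods": np.int64(2),
    "teams": np.array(["Team A", "Team B"]),
    "counts": {np.int64(1): np.int64(10)},
}

encoded = json.dumps(to_builtin(sample), indent=2)
decoded = json.loads(encoded)
print(decoded)
```

With a helper like this, `json.dump(to_builtin(summary_stats), f, indent=2)` would tolerate NumPy values anywhere in the structure, at the cost of turning all dictionary keys into strings (which JSON does anyway).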

---

## Conclusion

This notebook analyzes the 4,465 recorded events of the Barcelona vs Valencia match.

### Areas Covered:
- **Match Dynamics**: Detailed event flow and temporal patterns throughout the match
- **Team Performance**: Comparative analysis of both teams' attacking and defensive metrics
- **Possession Control**: Clear visualization of possession distribution and ball progression
- **Passing Networks**: In-depth analysis of passing patterns, accuracy, and effectiveness
- **Player Impact**: Individual contributions and performance rankings
- **Tactical Insights**: Play patterns and strategic approaches employed by both teams
- **Defensive Organization**: Pressure maps and defensive action analysis

### Deliverables:
- 9 high-resolution visualization charts (300 DPI)
- Processed dataset (CSV format)
- Comprehensive summary statistics (JSON format)
- Detailed statistical analysis and insights
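
A quick way to confirm the deliverables were written is to check for the file names used in the `savefig` and export calls. The sketch below is one possible check. The helper `missing_outputs` is hypothetical, and the demonstration runs in a temporary directory so it does not depend on the notebook having been executed.

```python
import tempfile
from pathlib import Path

# File names as written by the notebook's savefig / export cells
EXPECTED_OUTPUTS = [
    "comprehensive_event_analysis.png",
    "team_performance_analysis.png",
    "temporal_match_analysis.png",
    "advanced_pass_analysis.png",
    "shot_analysis.png",
    "player_performance_analysis.png",
    "defensive_analysis.png",
    "possession_playpattern_analysis.png",
    "correlation_heatmaps.png",
    "barcelona_valencia_processed_data.csv",
    "barcelona_valencia_summary_stats.json",
]


def missing_outputs(directory, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that do not exist in `directory`."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]


# Demonstration in a throwaway directory; in the notebook, pass "." instead
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "shot_analysis.png").touch()
    demo_missing = missing_outputs(tmp)

print(f"{len(demo_missing)} of {len(EXPECTED_OUTPUTS)} expected files missing in the demo directory")
```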

---

**For further analysis or questions, refer to the exported files and visualizations.**

*Analysis completed using Python, pandas, matplotlib, and seaborn.*