# Creating Violin Plots for MMA Fight Analysis
## A Tutorial for Sports Science Data Visualisation

### Using This Notebook

**Running the code:**
1. **Code cells**: Click on grey boxes and press `Shift + Enter` or click the play button (▶️)
2. **Order matters**: Run cells from top to bottom - each cell builds on previous ones
3. **Text cells**: White boxes contain explanations - no need to run these
4. **Saving your work**: 
   - **Google Colab**: File → Save a copy in Drive (creates your own editable version)
   - **Jupyter**: File → Download as → Notebook (.ipynb)

---

## About This Tutorial

This tutorial recreates the violin plots from Barley et al. (2025) studying the influence of height and reach on fight-ending punches in the UFC. You'll learn to:

- Generate simulated MMA fight data that mimics the study's patterns
- Create violin plots with box plot overlays
- Understand the statistical patterns in the visualisations
- Apply these techniques to sports science data

### Target Visualisation

![Figure 1 from Barley et al. (2025)](https://raw.githubusercontent.com/yourusername/yourrepo/main/SCR-20250608-ivum.png)

*Note: To use this image in Google Colab, you'll need to upload the image file to a public repository (like GitHub) and update the URL above.*

### Understanding Violin Plots

A violin plot combines a box plot with a density plot. The versions built in this tutorial show:
- **Distribution shape** (the violin outline)
- **Median and quartiles** (inner box plot)
- **Individual data points** (dots overlaid on each violin)
- **Sample size** (labelled on the x-axis)

We examine how height and reach differences between winner and loser vary with the type of punch that ended the fight (hook, overhand, straight, uppercut).
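
To make the "box plot plus density plot" idea concrete, here is a minimal sketch that computes the two ingredients by hand: a kernel density estimate (the kind of curve a violin outline is built from) and the quartiles (the inner box). The sample values, the seed, and the variable names are illustrative assumptions, not data from the study.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative sample only: 200 simulated differences in cm (not study data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=10.0, size=200)

# Ingredient 1: a kernel density estimate, evaluated on a grid of y values.
# A violin is wider wherever this estimated density is higher.
kde = gaussian_kde(sample)
grid = np.linspace(sample.min(), sample.max(), 100)
density = kde(grid)

# Ingredient 2: the quartiles that the inner box plot draws
q1, median, q3 = np.percentile(sample, [25, 50, 75])

print(f"Density peaks near {grid[density.argmax()]:.1f} cm")
print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}")
```

The plotting functions used later do this work internally, so this step is only needed for understanding what the shapes mean.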

## Step 1: Installing and Importing Libraries

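Let's start by importing the Python libraries we'll need. Google Colab already includes all of them; in a local Jupyter setup you may need to install them first. Run this cell first!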

In [None]:
# Import necessary libraries for data analysis and visualisation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducible results
np.random.seed(42)

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully")
print("Ready to create violin plots.")

## Step 2: Understanding Our Data

Before we create the data, let's understand what we're measuring:

### Key Variables:
- **Height Difference**: Winner's height minus loser's height (cm)
- **Reach Difference**: Winner's reach minus loser's reach (cm)
- **Combined Difference**: Height difference plus reach difference (cm). In the simulated data below, an extra random noise term is added on top of this sum
- **Punch Type**: Hook, Overhand, Straight, Uppercut

### What the Numbers Mean:
- **Positive values**: Winner was taller/had longer reach
- **Negative values**: Winner was shorter/had shorter reach
- **Zero**: Both fighters were equal in that dimension

This helps determine whether physical advantages translate to fighting success.
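
As a worked example of the sign convention, the sketch below computes the three differences from raw measurements. The fights, the measurements, and the `winner_*` / `loser_*` column names are all invented for illustration.

```python
import pandas as pd

# Hypothetical fights: every measurement below is invented for illustration
fights = pd.DataFrame({
    'winner_height_cm': [180.0, 175.0, 188.0],
    'loser_height_cm':  [185.0, 175.0, 183.0],
    'winner_reach_cm':  [183.0, 178.0, 196.0],
    'loser_reach_cm':   [190.0, 175.0, 188.0],
})

# Differences are always winner minus loser
fights['height_difference'] = fights['winner_height_cm'] - fights['loser_height_cm']
fights['reach_difference'] = fights['winner_reach_cm'] - fights['loser_reach_cm']
fights['combined_difference'] = fights['height_difference'] + fights['reach_difference']

print(fights[['height_difference', 'reach_difference', 'combined_difference']])
```

In the first row the winner is 5 cm shorter with 7 cm less reach, so all three differences are negative (a combined difference of -12 cm). In the third row the winner is the larger fighter and all three are positive.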

## Step 3: Generating Realistic MMA Data

We'll simulate data that matches the patterns from the original study, using the study's sample sizes and means and spreads based on typical MMA fighter measurements and fight outcomes.

In [None]:
# We'll create pseudo data that matches the patterns from the original study

# Sample sizes for each punch type (matching the study)
hook_fights = 136
overhand_fights = 14  
straight_fights = 89
uppercut_fights = 23

# Set random seed so everyone gets the same "random" data
np.random.seed(42)

# Create empty lists to store our data
punch_types = []
height_diffs = []
reach_diffs = []
combined_diffs = []

# Generate data for Hook punches (centred near zero, with a very slight positive bias in reach)
for i in range(hook_fights):
    punch_types.append('Hook')
    height_diff = np.random.normal(0, 8)  # Mean 0, standard deviation 8
    reach_diff = np.random.normal(0.28, 12)  # Mean 0.28, standard deviation 12
    # Sum of the two differences plus an extra simulated noise term
    combined_diff = height_diff + reach_diff + np.random.normal(0.4, 15)
    
    height_diffs.append(height_diff)
    reach_diffs.append(reach_diff)
    combined_diffs.append(combined_diff)

# Generate data for Overhand punches (shorter fighters have advantage)
for i in range(overhand_fights):
    punch_types.append('Overhand')
    height_diff = np.random.normal(-4, 5)  # Negative bias (shorter fighters win more)
    reach_diff = np.random.normal(-5.43, 8)
    combined_diff = height_diff + reach_diff + np.random.normal(-7.93, 12)
    
    height_diffs.append(height_diff)
    reach_diffs.append(reach_diff)
    combined_diffs.append(combined_diff)

# Generate data for Straight punches (taller fighters have advantage)
for i in range(straight_fights):
    punch_types.append('Straight')
    height_diff = np.random.normal(3, 12)  # Positive bias (taller fighters win more)
    reach_diff = np.random.normal(1.63, 13)
    combined_diff = height_diff + reach_diff + np.random.normal(3.62, 18)
    
    height_diffs.append(height_diff)
    reach_diffs.append(reach_diff)
    combined_diffs.append(combined_diff)

# Generate data for Uppercut punches (shorter fighters have slight advantage)
for i in range(uppercut_fights):
    punch_types.append('Uppercut')
    height_diff = np.random.normal(-4, 8)  # Negative bias (shorter fighters win more)
    reach_diff = np.random.normal(-1.74, 10)
    combined_diff = height_diff + reach_diff + np.random.normal(-4.13, 16)
    
    height_diffs.append(height_diff)
    reach_diffs.append(reach_diff)
    combined_diffs.append(combined_diff)

# Create a table (DataFrame) with all our data
# Think of this like creating a spreadsheet with columns and rows
mma_data = pd.DataFrame({
    'punch_type': punch_types,
    'height_difference': height_diffs,
    'reach_difference': reach_diffs,
    'combined_difference': combined_diffs
})

# Display basic information about the data we created
print("MMA Fight Data Generated")
print(f"Total fights analysed: {len(mma_data)}")
print("\nFights by punch type:")
print(mma_data['punch_type'].value_counts())

# Show first few rows
print("\nFirst 5 rows of data:")
print(mma_data.head())
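
The four loops above repeat the same steps with different settings. As an alternative, here is a more compact sketch that stores the settings in a dictionary and draws whole arrays at once. It uses a local `np.random.default_rng` generator, so the numbers will differ from the loop version, and it assumes the combined column is the exact sum with no extra noise term. The names `params` and `sim` are hypothetical.

```python
import numpy as np
import pandas as pd

# (number of fights, (height mean, sd), (reach mean, sd)), copied from the loops above
params = {
    'Hook':     (136, (0, 8),   (0.28, 12)),
    'Overhand': (14,  (-4, 5),  (-5.43, 8)),
    'Straight': (89,  (3, 12),  (1.63, 13)),
    'Uppercut': (23,  (-4, 8),  (-1.74, 10)),
}

rng = np.random.default_rng(42)  # local generator instead of the global seed

frames = []
for punch, (n, (h_mu, h_sd), (r_mu, r_sd)) in params.items():
    frames.append(pd.DataFrame({
        'punch_type': punch,  # scalar is repeated for every row
        'height_difference': rng.normal(h_mu, h_sd, n),
        'reach_difference': rng.normal(r_mu, r_sd, n),
    }))

sim = pd.concat(frames, ignore_index=True)

# Assumption: combined difference is the plain sum, without the extra noise term
sim['combined_difference'] = sim['height_difference'] + sim['reach_difference']

print(sim['punch_type'].value_counts())
```

Keeping the settings in one place makes it easier to add a punch type or change a mean without copying a whole loop.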

## Step 4: Exploring Our Data

Calculate basic statistics to understand the data before creating visualisations.

In [None]:
# Let's look at the basic statistics for each punch type
# This helps us understand the patterns in the data

measurements = ['height_difference', 'reach_difference', 'combined_difference']

for measurement in measurements:
    print(f"\n{measurement.replace('_', ' ').title()} Statistics:")
    print("-" * 50)
    
    # Look at each punch type
    for punch_type in ['Hook', 'Overhand', 'Straight', 'Uppercut']:
        # Filter data for this punch type only
        data_for_this_punch = mma_data[mma_data['punch_type'] == punch_type][measurement]
        
        # Calculate basic statistics
        median_val = data_for_this_punch.median()  # Middle value when sorted
        mean_val = data_for_this_punch.mean()     # Average value
        n = len(data_for_this_punch)              # Number of fights
        
        print(f"{punch_type:>10} (n={n:>3}): Median = {median_val:>6.2f} cm, Mean = {mean_val:>6.2f} cm")

print("\n" + "="*60)
print("Key Insights:")
print("• Positive values = Winner was taller/had longer reach")
print("• Negative values = Winner was shorter/had shorter reach")
print("• Different punch types show different patterns")
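
The nested loops above can also be written as a single pandas `groupby` call. The sketch below builds a small stand-in table so that it runs on its own; in the notebook you would pass `mma_data` instead. The `demo` table and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in table so this cell runs on its own; in the notebook, use mma_data
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'punch_type': np.repeat(['Hook', 'Straight'], 50),
    'height_difference': rng.normal(0, 8, 100),
    'reach_difference': rng.normal(1, 12, 100),
})

# One row per punch type, with median, mean and count for each measurement
summary = (
    demo.groupby('punch_type')[['height_difference', 'reach_difference']]
        .agg(['median', 'mean', 'count'])
        .round(2)
)
print(summary)
```

The result is a table rather than printed lines, which makes it easier to export or to compare groups side by side.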

## Step 5: Creating Our Violin Plots

This cell builds violin plots styled after the figure in the research paper.

### Plot Elements:
- **Violin shape**: Distribution of data (wider = more data points at that value)
- **Box plot inside**: Median (middle line), quartiles (box edges), and outliers
- **Individual points**: Each dot represents one fight
- **Median labels**: The `μmedian` values shown in boxes beside each violin
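
Before the full three-panel figure, it may help to see the layering on a single axis. The sketch below draws the three layers listed above (violin, box plot, jittered points) for two made-up groups. The data and group names are illustrative assumptions, and the styling is deliberately minimal.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only: two made-up groups of differences in cm
rng = np.random.default_rng(2)
groups = [rng.normal(0, 8, 60), rng.normal(3, 12, 40)]
positions = [1, 2]

fig, ax = plt.subplots(figsize=(6, 4))

# Layer 1: violin bodies (distribution shape)
parts = ax.violinplot(groups, positions=positions, widths=0.8,
                      showmeans=False, showmedians=False, showextrema=False)

# Layer 2: narrow box plots drawn at the same x positions
bp = ax.boxplot(groups, positions=positions, widths=0.3)

# Layer 3: jittered raw points, one dot per observation
for pos, values in zip(positions, groups):
    x = rng.normal(pos, 0.05, len(values))
    ax.scatter(x, values, s=8, alpha=0.6, color='black')

ax.set_xticks(positions)
ax.set_xticklabels(['Group A', 'Group B'])
ax.set_ylabel('Difference (cm)')
```

Because all three layers share the same `positions`, they line up on top of one another. In a notebook the figure appears when the cell finishes; in a plain script, add `plt.show()`.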

In [None]:
# Create a figure with 3 subplots (one for each measurement)
# Think of this like creating 3 empty canvases stacked vertically
fig, axes = plt.subplots(3, 1, figsize=(12, 16))

# Add a main title for all plots
fig.suptitle('Box-violin plots illustrating (A) height difference, (B) reach difference, and (C) combined difference, stratified by punch type.', 
             fontsize=14, fontweight='bold', y=0.95)

# We'll create the same plot 3 times, once for each measurement
measurements_to_plot = [
    ('height_difference', 'Height difference (cm)', '(A)'),
    ('reach_difference', 'Reach difference (cm)', '(B)'),
    ('combined_difference', 'Combined difference (cm)', '(C)')
]

# Loop through each measurement and create a plot
for plot_num, (measure_name, y_label, title) in enumerate(measurements_to_plot):
    
    # Get the current subplot (canvas)
    ax = axes[plot_num]
    
    # Prepare data for each punch type
    punch_types = ['Hook', 'Overhand', 'Straight', 'Uppercut']
    data_by_punch = []
    
    for punch_type in punch_types:
        # Get all values for this punch type and measurement
        values = mma_data[mma_data['punch_type'] == punch_type][measure_name].values
        data_by_punch.append(values)
    
    # Create violin plots (the main "blob" shapes)
    # These show the distribution of data - wider parts mean more data points at that value
    parts = ax.violinplot(data_by_punch, positions=[1, 2, 3, 4], widths=0.8, 
                         showmeans=False, showmedians=False, showextrema=False)
    
    # Style the violin plots to look like the research paper
    for pc in parts['bodies']:
        pc.set_facecolor('#E6E6E6')  # Light grey colour
        pc.set_edgecolor('black')    # Black outline
        pc.set_linewidth(1)
        pc.set_alpha(0.7)            # Slightly transparent
    
    # Add box plots inside the violins
    # Box plots show median, quartiles, and outliers
    bp = ax.boxplot(data_by_punch, positions=[1, 2, 3, 4], widths=0.3, 
                   patch_artist=True,
                   boxprops=dict(facecolor='white', color='black', linewidth=1.5),
                   medianprops=dict(color='black', linewidth=2),
                   whiskerprops=dict(color='black', linewidth=1.5),
                   capprops=dict(color='black', linewidth=1.5),
                   flierprops=dict(marker='o', markerfacecolor='black', markersize=4, alpha=0.6))
    
    # Add individual data points as dots
    for i, punch_type in enumerate(punch_types):
        y_data = mma_data[mma_data['punch_type'] == punch_type][measure_name].values
        # Add some random spread (jitter) so dots don't overlap exactly
        x_data = np.random.normal(i+1, 0.1, len(y_data))
        ax.scatter(x_data, y_data, alpha=0.6, s=8, color='black')
    
    # Add median value labels (those little boxes with numbers)
    for i, punch_type in enumerate(punch_types):
        median_val = mma_data[mma_data['punch_type'] == punch_type][measure_name].median()
        
        # Create a text box showing the median value
        ax.text(i+1.35, median_val, f'μmedian = {median_val:.2f}', 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor='black'),
                fontsize=9, ha='left', va='center')
    
    # Customise the plot appearance
    ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
    ax.set_ylabel(y_label, fontsize=12)
    ax.set_xlabel('Punch type', fontsize=12)
    
    # Set x-axis labels with sample sizes
    sample_sizes = [len(mma_data[mma_data['punch_type'] == pt]) for pt in punch_types]
    x_labels = [f"{pt}\n(n = {n})" for pt, n in zip(punch_types, sample_sizes)]
    ax.set_xticks([1, 2, 3, 4])
    ax.set_xticklabels(x_labels)
    
    # Add horizontal line at zero (no advantage either way)
    ax.axhline(y=0, color='black', linestyle='--', alpha=0.5, linewidth=1)
    
    # Add grid for easier reading
    ax.grid(True, alpha=0.3)
    ax.set_axisbelow(True)  # Put grid behind the data

# Adjust spacing between plots
plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.show()

print("Violin plots created successfully")
print("\nThese plots show how physical advantages (height and reach) relate to")
print("fighting success for different punch types in MMA.")

## Step 6: Interpreting the Results

Because the data were simulated to follow the patterns reported in the original study, the violin plots should reproduce those patterns:

### Key Findings:

**Hook Punches:**
- Relatively balanced distribution around zero
- Slight advantage for fighters with longer reach
- Most common punch type in the dataset

**Overhand Punches:**
- Negative median values suggest shorter fighters have an advantage
- Biomechanically logical - shorter fighters can get under taller opponents
- Smallest sample size (n = 14), so interpret this pattern cautiously

**Straight Punches:**
- Positive median values suggest taller fighters with longer reach have advantages
- Aligns with expectations - longer reach improves straight punch effectiveness
- Large sample size provides confidence in this pattern

**Uppercut Punches:**
- Negative median suggests shorter fighters have advantages
- Biomechanically logical - shorter fighters can generate more upward power
- Moderate sample size shows consistent pattern
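
Visual patterns like these are usually checked with a statistical test. The sketch below is one possible approach, not the method used in the original study: a one-sample Wilcoxon signed-rank test of each group against zero, and a Kruskal-Wallis test across groups, both from `scipy.stats`. The stand-in groups are simulated with settings similar to the reach differences in Step 3.

```python
import numpy as np
from scipy import stats

# Simulated stand-in groups (not study data), with settings like those in Step 3
rng = np.random.default_rng(3)
groups = {
    'Hook': rng.normal(0.28, 12, 136),
    'Overhand': rng.normal(-5.43, 8, 14),
    'Straight': rng.normal(1.63, 13, 89),
    'Uppercut': rng.normal(-1.74, 10, 23),
}

# Within each punch type: are the differences centred on zero?
for punch, values in groups.items():
    result = stats.wilcoxon(values)  # one-sample signed-rank test against 0
    print(f"{punch:>10}: W = {result.statistic:.1f}, p = {result.pvalue:.3f}")

# Across punch types: do the groups differ from one another?
h_stat, p_value = stats.kruskal(*groups.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.3f}")
```

With groups as small as 14 or 23 fights, a large p-value does not show that there is no effect, only that this sample cannot distinguish one from zero.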

### Implications:
- **Fight strategy**: Coaches can use this data to develop game plans
- **Matchmaking**: Understanding physical matchups helps predict fight dynamics
- **Training focus**: Fighters can emphasise techniques that suit their physical attributes

## Step 7: Exporting Your Data and Plots

Want to save your work? Here's how to export your data and plots:

In [None]:
# Save the data to a CSV file (like an Excel spreadsheet)
mma_data.to_csv('mma_fight_data.csv', index=False)
print("Data saved to 'mma_fight_data.csv'")

# Save the plots as a high-resolution image
# We rebuild the figure here so that this cell can be run and saved on its own
fig, axes = plt.subplots(3, 1, figsize=(12, 16))
fig.suptitle('MMA Violin Plots - Height, Reach, and Combined Differences by Punch Type', 
             fontsize=14, fontweight='bold', y=0.95)

# Recreate the plots (same code as before, but condensed)
measurements_to_plot = [
    ('height_difference', 'Height difference (cm)', '(A)'),
    ('reach_difference', 'Reach difference (cm)', '(B)'),
    ('combined_difference', 'Combined difference (cm)', '(C)')
]

for plot_num, (measure_name, y_label, title) in enumerate(measurements_to_plot):
    ax = axes[plot_num]
    punch_types = ['Hook', 'Overhand', 'Straight', 'Uppercut']
    data_by_punch = [mma_data[mma_data['punch_type'] == pt][measure_name].values for pt in punch_types]
    
    # Create violin and box plots
    parts = ax.violinplot(data_by_punch, positions=[1, 2, 3, 4], widths=0.8, 
                         showmeans=False, showmedians=False, showextrema=False)
    
    for pc in parts['bodies']:
        pc.set_facecolor('#E6E6E6')
        pc.set_edgecolor('black')
        pc.set_linewidth(1)
        pc.set_alpha(0.7)
    
    bp = ax.boxplot(data_by_punch, positions=[1, 2, 3, 4], widths=0.3, patch_artist=True,
                   boxprops=dict(facecolor='white', color='black', linewidth=1.5),
                   medianprops=dict(color='black', linewidth=2),
                   whiskerprops=dict(color='black', linewidth=1.5),
                   capprops=dict(color='black', linewidth=1.5),
                   flierprops=dict(marker='o', markerfacecolor='black', markersize=4, alpha=0.6))
    
    # Add scatter points and median labels
    for i, punch_type in enumerate(punch_types):
        y_data = mma_data[mma_data['punch_type'] == punch_type][measure_name].values
        x_data = np.random.normal(i+1, 0.1, len(y_data))
        ax.scatter(x_data, y_data, alpha=0.6, s=8, color='black')
        
        median_val = mma_data[mma_data['punch_type'] == punch_type][measure_name].median()
        ax.text(i+1.35, median_val, f'μmedian = {median_val:.2f}', 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', edgecolor='black'),
                fontsize=9, ha='left', va='center')
    
    # Customise appearance
    ax.set_title(title, fontsize=14, fontweight='bold', pad=20)
    ax.set_ylabel(y_label, fontsize=12)
    ax.set_xlabel('Punch type', fontsize=12)
    
    sample_sizes = [len(mma_data[mma_data['punch_type'] == pt]) for pt in punch_types]
    x_labels = [f"{pt}\n(n = {n})" for pt, n in zip(punch_types, sample_sizes)]
    ax.set_xticks([1, 2, 3, 4])
    ax.set_xticklabels(x_labels)
    ax.axhline(y=0, color='black', linestyle='--', alpha=0.5, linewidth=1)
    ax.grid(True, alpha=0.3)
    ax.set_axisbelow(True)

plt.tight_layout(rect=[0, 0, 1, 0.96])
plt.savefig('mma_violin_plots.png', dpi=300, bbox_inches='tight')
plt.show()

print("High-resolution plot saved to 'mma_violin_plots.png'")
print("\nIn Google Colab, you can download these files from the Files panel on the left.")
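
The export cell repeats most of the Step 5 plotting code. One way to avoid that duplication is to wrap the drawing logic in a function and call it wherever a figure is needed. The sketch below is a simplified illustration: `draw_box_violin`, the `demo` table, and the output file name are hypothetical, and the jittered points and median labels are left out for brevity.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def draw_box_violin(ax, data, measure, groups):
    """Draw violins with box plot overlays for one measurement (hypothetical helper)."""
    values = [data.loc[data['punch_type'] == g, measure].values for g in groups]
    positions = list(range(1, len(groups) + 1))
    parts = ax.violinplot(values, positions=positions, widths=0.8,
                          showmeans=False, showmedians=False, showextrema=False)
    for body in parts['bodies']:
        body.set_facecolor('#E6E6E6')
        body.set_edgecolor('black')
    ax.boxplot(values, positions=positions, widths=0.3)
    ax.set_xticks(positions)
    ax.set_xticklabels(groups)
    ax.axhline(y=0, color='black', linestyle='--', alpha=0.5, linewidth=1)
    return parts

# Stand-in data so the sketch runs on its own; in the notebook, pass mma_data
rng = np.random.default_rng(4)
demo = pd.DataFrame({
    'punch_type': np.repeat(['Hook', 'Straight'], 40),
    'height_difference': rng.normal(0, 8, 80),
    'reach_difference': rng.normal(1, 12, 80),
})

measures = ['height_difference', 'reach_difference']
fig, axes = plt.subplots(len(measures), 1, figsize=(8, 8))
for ax, measure in zip(axes, measures):
    draw_box_violin(ax, demo, measure, ['Hook', 'Straight'])
    ax.set_ylabel(measure.replace('_', ' ').capitalize() + ' (cm)')

fig.tight_layout()
fig.savefig('demo_violin_plots.png', dpi=150)  # hypothetical file name
```

With a helper like this, the display cell and the export cell can share one definition, so a styling change only has to be made once.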

## Step 8: Next Steps and Applications

You've successfully created professional violin plots for sports science analysis. Next steps:

### For Sports Scientists:
- Apply this to your own combat sports data
- Analyse other physical attributes (weight, leg reach, etc.)
- Compare different fighting styles or weight classes
- Study temporal trends in fighting techniques

### For Data Analysts:
- Adapt the violin plot code for any categorical vs continuous analysis
- Add statistical tests (for example, with `scipy.stats`) when applying this to your own datasets
- Experiment with different colour schemes and styling
- Combine with other plot types for comprehensive analysis

### For MMA Enthusiasts:
- Analyse your favourite fighters' physical attributes
- Predict fight outcomes based on physical matchups
- Create similar analyses for other combat sports
- Share insights with the MMA community

### Learning Resources:
- **Matplotlib documentation**: https://matplotlib.org/
- **Seaborn tutorials**: https://seaborn.pydata.org/tutorial.html
- **Pandas for data analysis**: https://pandas.pydata.org/docs/
- **Sports analytics with Python**: Various online courses and books

---

### Key Takeaways:
1. **Violin plots** are excellent for showing both distribution shape and summary statistics
2. **Physical attributes** may influence fighting effectiveness
3. **Different techniques** favour different body types
4. **Data visualisation** is crucial for understanding complex relationships
5. **Statistical testing** (not covered in this tutorial) is needed to validate visual observations

You now have the skills to create professional sports science visualisations. The world of sports data contains many fascinating insights waiting to be discovered.