# Hypothesis Testing on IFSC and 8a.nu Climbing Data

## Introduction

This section explores statistical relationships between outdoor and competition climbing performance using a merged dataset built from two sources: **IFSC** (competition climbing data) and **8a.nu** (outdoor climbing data). Both sources were scraped and merged in the notebooks in the `01_data_collection` folder; refer to those notebooks for details of the collection and merging process. Using this unified dataset, we test whether measurable relationships exist between climbers' achievements in the two contexts.

## Objectives

The main goals of this analysis are to test the following hypotheses:

1. **Outdoor vs. Competition Performance**
   Do climbers with stronger outdoor climbing metrics, such as a higher maximum grade or more 8c+ ascents, also perform better in competitions?
   *A separate hypothesis test is run for each competition discipline: bouldering, lead, and combined.*

2. **Interdisciplinary Competition Correlation**
   Is there a statistically significant correlation between performance in different IFSC competition disciplines (e.g., do high lead scores correlate with high boulder scores)?

3. **Outdoor Performance Correlation**
   Within outdoor climbing, are the performance metrics significantly correlated with one another (e.g., do climbers with a higher average grade also have a higher best grade)?
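Hypotheses 2 and 3 call for correlation tests rather than t-tests. The sketch below assumes each pair of metrics sits in two numeric columns; it drops rows where either value is missing and reports Pearson's r (linear association) and Spearman's ρ (rank-based, monotonic association), each with its two-sided p-value. The helper `correlation_test` and the synthetic data are hypothetical; only `lead_points` and `boulder_points` are column names used in this notebook. Because the notebook encodes "no results in a discipline" as 0 points, applying this to hypothesis 2 would likely first mean restricting to athletes with positive points in both disciplines.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def correlation_test(data: pd.DataFrame, x: str, y: str) -> dict:
    """Hypothetical helper: Pearson and Spearman correlation between two columns.

    Rows where either column is missing are dropped before testing.
    """
    pair = data[[x, y]].dropna()
    pearson_r, pearson_p = stats.pearsonr(pair[x], pair[y])
    spearman_rho, spearman_p = stats.spearmanr(pair[x], pair[y])
    return {
        "n": len(pair),
        "pearson_r": float(pearson_r),
        "pearson_p": float(pearson_p),
        "spearman_rho": float(spearman_rho),
        "spearman_p": float(spearman_p),
    }


# Usage on synthetic data (not the real dataset): boulder points loosely track lead points
rng = np.random.default_rng(0)
lead = rng.uniform(0, 1000, size=50)
demo = pd.DataFrame({
    "lead_points": lead,
    "boulder_points": 0.8 * lead + rng.normal(0, 100, size=50),
})
result = correlation_test(demo, "lead_points", "boulder_points")
print(result)
```

Reporting both coefficients is a design choice: competition points are unlikely to be normally distributed, so the rank-based Spearman test is a useful cross-check on Pearson's r.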


```python
import pandas as pd
import scipy.stats as stats

# Load the merged IFSC + 8a.nu dataset produced in 01_data_collection
df = pd.read_csv('../data/final_data.csv')
```

## Hypothesis Test: Outdoor Grade 8c+ vs. Competition Performance

Do climbers perform better in **competitions** if they have climbed an **outdoor grade of 8c+ or above**?

### Methodology

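- **Test Type:** One-tailed Welch's two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`), which does not assume equal variances in the two groups
- **Grouping Variable:** Whether the climber has climbed 8c+ or above outdoors
- **Dependent Variable:** IFSC competition points, tested separately for bouldering, lead, and combined
- **Significance Level:** α = 0.05

Each test includes only athletes with a positive points total in the relevant discipline and asks whether climbers who have reached the 8c+ outdoor benchmark score significantly *higher* than those who have not. Because the data are observational, a significant result indicates an association, not that climbing 8c+ causes better competition results.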
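For reference, the Welch statistic is the difference in means divided by the combined standard error, and its degrees of freedom come from the Welch-Satterthwaite approximation; the one-tailed p-value is the upper-tail probability of the t distribution. A minimal sketch on made-up samples, checking that a hand computation matches `scipy.stats.ttest_ind(..., equal_var=False, alternative='greater')`. The helper `welch_one_tailed` is hypothetical.

```python
import numpy as np
import scipy.stats as stats


def welch_one_tailed(a, b):
    """Hypothetical helper: one-tailed Welch t-test of H_a: mean(a) > mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-group squared standard errors, using the sample variance (ddof=1)
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))
    # Upper-tail probability, matching alternative='greater'
    p = stats.t.sf(t, dof)
    return t, dof, p


# Made-up points totals, for illustration only
group_8c = [420, 510, 390, 605, 480, 550]
group_other = [300, 410, 280, 350, 460, 330, 290, 370]

t, dof, p = welch_one_tailed(group_8c, group_other)
ref = stats.ttest_ind(group_8c, group_other, equal_var=False, alternative="greater")
print(f"manual: t={t:.4f}, df={dof:.2f}, p={p:.4f}")
print(f"scipy:  t={ref.statistic:.4f}, p={ref.pvalue:.4f}")
```

The degrees of freedom are generally non-integer and smaller than n₁ + n₂ − 2, which is what makes Welch's test more conservative than Student's t-test when the group sizes and variances differ.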



### Test 1: Outdoor Grade 8c+ vs **Bouldering**
Do climbers perform better in **bouldering competitions** if they have climbed an **outdoor grade of 8c+ or above**?

#### Hypotheses

- **Null Hypothesis (H₀):** Mean bouldering competition points do not differ between climbers who have climbed 8c+ or above and those who have not.
  Mathematically: μ₁ = μ₂, where μ₁ is the mean boulder points for climbers who have climbed 8c+ or above and μ₂ is the mean boulder points for those who have not.

- **Alternative Hypothesis (Hₐ):** Climbers who have climbed 8c+ or above have higher mean bouldering competition points.
  Mathematically: μ₁ > μ₂

```python
# Keep only athletes with a positive bouldering points total
boulder_athletes = df[df['boulder_points'] > 0]

# Define the two samples
# Sample 1: climbers with at least one ascent of 8c+ or above
sample_1 = boulder_athletes[boulder_athletes['count_8c_plus'] > 0]['boulder_points']

# Sample 2: climbers without an 8c+ ascent (count_8c_plus is NaN for no ascents)
sample_2 = boulder_athletes[boulder_athletes['count_8c_plus'].isna()]['boulder_points']

# One-tailed Welch's t-test (Hₐ: μ₁ > μ₂); equal_var=False drops the equal-variance assumption
t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=False, alternative='greater')

print(f"t-statistic: {t_stat:.4f}")
print(f"One-tailed p-value: {p_value:.4f}")
```

```
t-statistic: 1.8086
One-tailed p-value: 0.0382
```


Since the p-value (0.0382) is **less than** the significance level (0.05), we **reject the null hypothesis (H₀)**.

**Conclusion:** Climbers who have climbed 8c+ or above score significantly higher, on average, in bouldering competitions than those who have not.
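The grouping in these tests relies on `count_8c_plus` being NaN (rather than 0) for climbers with no 8c+ ascent, as the code comment states. If the merged file ever stored explicit zeros, `isna()` would silently drop those climbers from both samples. A minimal sketch on a toy frame with made-up values, contrasting the notebook's grouping with a `fillna(0)`-based grouping that handles either encoding:

```python
import numpy as np
import pandas as pd

# Toy data: one climber has an explicit 0 instead of NaN for "no 8c+ ascents"
toy = pd.DataFrame({
    "boulder_points": [500.0, 320.0, 410.0, 280.0],
    "count_8c_plus": [3, np.nan, 0, np.nan],
})

# Notebook-style grouping: the explicit 0 lands in neither sample
with_8c = toy[toy["count_8c_plus"] > 0]["boulder_points"]
without_8c_isna = toy[toy["count_8c_plus"].isna()]["boulder_points"]

# Alternative grouping: treat NaN as zero ascents, so every row lands in exactly one sample
counts = toy["count_8c_plus"].fillna(0)
without_8c_robust = toy.loc[counts == 0, "boulder_points"]

print(len(with_8c), len(without_8c_isna), len(without_8c_robust))  # → 1 2 3
```

If the data truly use NaN for every climber without an 8c+ ascent, both groupings give identical samples; a quick check such as `(df['count_8c_plus'] == 0).sum()` would confirm whether any explicit zeros exist.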

### Test 2: Outdoor Grade 8c+ vs **Lead**
Do climbers perform better in **lead competitions** if they have climbed an **outdoor grade of 8c+ or above**?

#### Hypotheses

- **Null Hypothesis (H₀):** Mean lead competition points do not differ between climbers who have climbed 8c+ or above and those who have not.
  Mathematically: μ₁ = μ₂, where μ₁ is the mean lead points for climbers who have climbed 8c+ or above and μ₂ is the mean lead points for those who have not.

- **Alternative Hypothesis (Hₐ):** Climbers who have climbed 8c+ or above have higher mean lead competition points.
  Mathematically: μ₁ > μ₂

```python
# Keep only athletes with a positive lead points total
lead_athletes = df[df['lead_points'] > 0]

# Define the two samples
# Sample 1: climbers with at least one ascent of 8c+ or above
sample_1 = lead_athletes[lead_athletes['count_8c_plus'] > 0]['lead_points']

# Sample 2: climbers without an 8c+ ascent (count_8c_plus is NaN for no ascents)
sample_2 = lead_athletes[lead_athletes['count_8c_plus'].isna()]['lead_points']

# One-tailed Welch's t-test (Hₐ: μ₁ > μ₂); equal_var=False drops the equal-variance assumption
t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=False, alternative='greater')

print(f"t-statistic: {t_stat:.4f}")
print(f"One-tailed p-value: {p_value:.4f}")
```

```
t-statistic: 2.3655
One-tailed p-value: 0.0105
```


Since the p-value (0.0105) is **less than** the significance level (0.05), we **reject the null hypothesis (H₀)**.

**Conclusion:** Climbers who have climbed 8c+ or above score significantly higher, on average, in lead competitions than those who have not.
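Tests 1 and 2 repeat the same filtering, grouping, and t-test steps and differ only in the points column. A sketch of a helper that runs the one-tailed Welch test for any discipline and applies the α = 0.05 decision rule. The name `compare_8c_groups` is hypothetical; the column names `boulder_points`, `lead_points`, `combined_points`, and `count_8c_plus` follow this notebook, and the helper keeps its NaN-means-no-ascents encoding.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def compare_8c_groups(data: pd.DataFrame, points_col: str, alpha: float = 0.05) -> dict:
    """Hypothetical helper: one-tailed Welch t-test of points, 8c+ vs non-8c+ climbers."""
    # Keep only athletes with a positive points total in this discipline
    athletes = data[data[points_col] > 0]
    # Same encoding as the notebook: count_8c_plus is NaN when there are no 8c+ ascents
    sample_1 = athletes[athletes["count_8c_plus"] > 0][points_col]
    sample_2 = athletes[athletes["count_8c_plus"].isna()][points_col]
    t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=False, alternative="greater")
    return {
        "discipline": points_col,
        "n_8c": len(sample_1),
        "n_other": len(sample_2),
        "t": float(t_stat),
        "p": float(p_value),
        "reject_h0": bool(p_value < alpha),
    }


# Usage on a small synthetic frame (not the real dataset)
demo = pd.DataFrame({
    "boulder_points": [900, 850, 700, 200, 150, 300, 0, 250],
    "lead_points": [800, 0, 650, 300, 100, 0, 500, 200],
    "count_8c_plus": [2, 1, 4, np.nan, np.nan, np.nan, 1, np.nan],
})
for col in ["boulder_points", "lead_points"]:
    print(compare_8c_groups(demo, col))
```

On the merged dataset, the same call with `df` and `"combined_points"` would reproduce the combined test; returning the group sizes alongside the p-value makes it easy to spot a test resting on very few 8c+ climbers.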

### Test 3: Outdoor Grade 8c+ vs **Combined**
Do climbers perform better in **combined competitions** if they have climbed an **outdoor grade of 8c+ or above**?

#### Hypotheses

- **Null Hypothesis (H₀):** Mean combined competition points do not differ between climbers who have climbed 8c+ or above and those who have not.
  Mathematically: μ₁ = μ₂, where μ₁ is the mean combined points for climbers who have climbed 8c+ or above and μ₂ is the mean combined points for those who have not.

- **Alternative Hypothesis (Hₐ):** Climbers who have climbed 8c+ or above have higher mean combined competition points.
  Mathematically: μ₁ > μ₂

```python
# Keep only athletes with a positive combined points total
combined_athletes = df[df['combined_points'] > 0]

# Define the two samples
# Sample 1: climbers with at least one ascent of 8c+ or above
sample_1 = combined_athletes[combined_athletes['count_8c_plus'] > 0]['combined_points']

# Sample 2: climbers without an 8c+ ascent (count_8c_plus is NaN for no ascents)
sample_2 = combined_athletes[combined_athletes['count_8c_plus'].isna()]['combined_points']

# One-tailed Welch's t-test (Hₐ: μ₁ > μ₂); equal_var=False drops the equal-variance assumption
t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=False, alternative='greater')

print(f"t-statistic: {t_stat:.4f}")
print(f"One-tailed p-value: {p_value:.4f}")
```

```
t-statistic: 2.4045
One-tailed p-value: 0.0103
```


Since the p-value (0.0103) is **less than** the significance level (0.05), we **reject the null hypothesis (H₀)**.

**Conclusion:** Climbers who have climbed 8c+ or above score significantly higher, on average, in combined competitions than those who have not.
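Three one-tailed tests were each run at α = 0.05, so the chance of at least one false positive across the family exceeds 5%. One common check, not part of the original analysis, is the Holm-Bonferroni step-down correction. A minimal sketch applying it to the three p-values reported above; the helper `holm_reject` is hypothetical.

```python
def holm_reject(p_values: dict, alpha: float = 0.05) -> dict:
    """Hypothetical helper: Holm-Bonferroni step-down procedure.

    Sort p-values ascending and compare the k-th smallest (k = 0, 1, ...)
    with alpha / (m - k). Stop at the first p-value that exceeds its
    threshold; it and all larger p-values are not rejected.
    """
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    decisions = {name: False for name in p_values}
    for k, (name, p) in enumerate(ordered):
        if p > alpha / (m - k):
            break
        decisions[name] = True
    return decisions


# One-tailed p-values reported by Tests 1-3
reported = {"bouldering": 0.0382, "lead": 0.0105, "combined": 0.0103}
print(holm_reject(reported))  # → {'bouldering': True, 'lead': True, 'combined': True}
```

Under Holm's procedure all three rejections survive (0.0103 ≤ 0.05/3, 0.0105 ≤ 0.05/2, 0.0382 ≤ 0.05). The stricter plain Bonferroni threshold of 0.05/3 ≈ 0.0167 would retain H₀ for bouldering (p = 0.0382), so that result is the most sensitive to the choice of correction.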