# Hypothesis Testing on IFSC and 8a.nu Climbing Data

## Introduction

This notebook explores statistical relationships between outdoor and competition climbing performance using a dataset derived from two sources: **IFSC** (competition climbing data) and **8a.nu** (outdoor climbing data). Both were scraped and merged in the notebooks in the `01_data_collection` folder; refer to those notebooks for details on the collection and merging process. By analyzing this unified dataset, we aim to investigate whether measurable trends or correlations exist between climbers’ achievements across the two contexts.

## Objectives

The main goals of this analysis are to test the following hypotheses:

- **Outdoor vs. Competition Performance:**
  Do climbers with stronger outdoor climbing metrics, such as a higher maximum grade or an 8c+ ascent, also perform better in competitions?
  *Separate hypothesis tests will be conducted for each competition discipline: bouldering, lead, and combined.*

- **Interdisciplinary Competition Correlation:**
  Is there a statistically significant correlation between performance in different IFSC competition disciplines (e.g., do high lead scores correlate with high boulder scores)?

- **Outdoor Performance Correlation:**
  Within outdoor climbing, is there a meaningful correlation between the metrics (e.g., do higher average grades correlate with a higher best grade climbed)?


In [1]:
import pandas as pd
import scipy.stats as stats

In [2]:
df = pd.read_csv('../data/final_data.csv')

## Hypothesis Test 1: Outdoor Grade 8c+ vs. Competition Performance

Do climbers perform better in **competitions** if they have climbed an **outdoor grade of 8c+ or above**?

#### Hypotheses:

- **$H_0$**: Climbing an outdoor grade of 8c+ or above has no effect on competition performance in a specific discipline.
  μ₁ = μ₂
  *(Where μ₁ = mean competition points in a specific discipline (e.g. `boulder_points`) for climbers who have climbed 8c+ or above, and μ₂ = mean competition points in the same discipline for those who have not.)*

- **$H_a$**: Climbers who have climbed an outdoor grade of 8c+ or above perform better in competitions in that discipline.
  μ₁ > μ₂

#### Methodology
- **Test Type:** One-tailed, **two-sample t-test** (Welch's variant, which does not assume equal variances)
- **Grouping Variable:** Whether the climber has climbed 8c+ or above
- **Dependent Variable:** Competition points (tested separately for 3 disciplines)
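
For reference, here is a sketch of the statistic behind this test in its unequal-variance (Welch) form, which is the form `scipy.stats.ttest_ind` computes when called with `equal_var=False`, as in the function defined below. The symbols $\bar{x}_i$, $s_i^2$, and $n_i$ (sample mean, sample variance, and size of group $i$) are notation introduced here for illustration, not names from the dataset:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
$$

Under $H_0$, $t$ approximately follows a t-distribution with $\nu$ degrees of freedom, and the one-tailed p-value for $H_a$: μ₁ > μ₂ is the upper-tail probability $P(T_\nu \ge t)$.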

This test aims to determine whether there is a statistically significant difference in competition performance between climbers who have reached the 8c+ outdoor benchmark and those who have not.
To conduct this analysis, a Python function is defined and called separately for each discipline to evaluate the hypothesis:

In [3]:
def perform_t_test(df, points_column, alpha=0.05):
    # Filter athletes with non-zero points for the discipline
    athletes = df[df[points_column] > 0]

    # Sample 1: Climbers with 8c+ or above
    sample_1 = athletes[athletes['count_8c_plus'] > 0][points_column]
    # Sample 2: Climbers without 8c+ (count_8c_plus is NaN)
    sample_2 = athletes[athletes['count_8c_plus'].isna()][points_column]

    # Perform one-tailed t-test (Hₐ: μ₁ > μ₂)
    t_stat, p_value = stats.ttest_ind(sample_1, sample_2, equal_var=False, alternative='greater')

    # Print results
    discipline = points_column.replace('_points', '').capitalize()
    print(f"\n{discipline} Discipline:")
    print(f"T-Test p-value: {p_value:.4f}")
    print(f"Sample sizes: With 8c+ = {len(sample_1)}, Without 8c+ = {len(sample_2)}")

    # Interpret results
    if p_value < alpha:
        print(f"Reject H₀: Climbers with 8c+ perform significantly better in {discipline.lower()} discipline.")
    else:
        print(f"Fail to reject H₀: No significant evidence that climbers with 8c+ perform better in {discipline.lower()} discipline.")
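
The function above places a climber in the "with 8c+" group when `count_8c_plus > 0` and in the "without" group when `count_8c_plus` is NaN, so a row with a recorded count of exactly 0 would fall into neither group. A minimal sketch for checking how rows split, assuming the same column names; the helper `group_sizes` and the toy values are hypothetical and not part of the original notebook:

```python
import pandas as pd

def group_sizes(df, points_column, count_column="count_8c_plus"):
    """Count how competitors in one discipline fall into the two test groups."""
    athletes = df[df[points_column] > 0]
    with_8c_plus = int((athletes[count_column] > 0).sum())
    without_8c_plus = int(athletes[count_column].isna().sum())
    # Rows with a recorded, non-positive count belong to neither group.
    unassigned = len(athletes) - with_8c_plus - without_8c_plus
    return {"with": with_8c_plus, "without": without_8c_plus, "unassigned": unassigned}

# Hypothetical toy data, not taken from final_data.csv
toy = pd.DataFrame({
    "boulder_points": [120.0, 80.0, 0.0, 45.0, 60.0],
    "count_8c_plus": [3.0, None, 2.0, 0.0, None],
})
sizes = group_sizes(toy, "boulder_points")
print(sizes)
```

If `unassigned` is non-zero on the real data, those climbers are silently left out of the test, which may or may not be intended.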

In [4]:
# List of disciplines (points columns)
disciplines = ['boulder_points', 'lead_points', 'combined_points']

# Perform the t-test for each discipline
for points_column in disciplines:
    perform_t_test(df, points_column)


Boulder Discipline:
T-Test p-value: 0.0382
Sample sizes: With 8c+ = 41, Without 8c+ = 388
Reject H₀: Climbers with 8c+ perform significantly better in boulder discipline.

Lead Discipline:
T-Test p-value: 0.0105
Sample sizes: With 8c+ = 51, Without 8c+ = 265
Reject H₀: Climbers with 8c+ perform significantly better in lead discipline.

Combined Discipline:
T-Test p-value: 0.0103
Sample sizes: With 8c+ = 32, Without 8c+ = 163
Reject H₀: Climbers with 8c+ perform significantly better in combined discipline.


#### Conclusion:
Overall, climbers who have climbed 8c+ or above score significantly higher in competition in all three disciplines at the 0.05 level. The evidence is strongest for **lead** (p = 0.0105) and **combined** (p = 0.0103) and weaker for **bouldering** (p = 0.0382). Because a p-value reflects sample size as well as the size of the difference, this ordering only suggests, rather than establishes, that high-level outdoor climbing experience translates more directly to lead and combined formats than to bouldering.
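
To judge how large the difference between the groups is, rather than only whether it is significant, an effect size can be reported alongside the p-value. The following is a rough sketch of Cohen's d with a pooled standard deviation; the helper `cohens_d` and the toy values are hypothetical and were not part of the original analysis. Note that the pooled form assumes similar variances in both groups, unlike the Welch test used above.

```python
import math
import statistics

def cohens_d(sample_1, sample_2):
    """Standardized mean difference between two samples, using the pooled SD."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(sample_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(sample_1) - statistics.mean(sample_2)) / pooled_sd

# Hypothetical toy points, not taken from final_data.csv
with_8c_plus = [2.0, 4.0, 6.0]
without_8c_plus = [1.0, 2.0, 3.0]
d = cohens_d(with_8c_plus, without_8c_plus)
print(f"Cohen's d: {d:.2f}")
```

In the notebook, `list(sample_1)` and `list(sample_2)` from `perform_t_test` could be passed in the same way.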


## Hypothesis Test 2: Highest Grade Achieved in Outdoor Climbing vs. Competition Performance

Is there a statistically significant correlation between the **highest outdoor grade** a climber has achieved and their **performance in competition climbing**?

#### Hypotheses:

- **$H_0$**: There is **no correlation** between the highest grade ever climbed in outdoor climbing and competition performance.  ρ = 0
  *(Where ρ is the population correlation coefficient between `highest_grade` and a given competition performance metric, such as `boulder_points`.)*

- **$H_a$**: There **is a correlation** between the highest grade ever climbed in outdoor climbing and competition performance.  ρ ≠ 0

#### Methodology

- **Test Type:** Pearson correlation test
- **Independent Variable:** `highest_grade` (converted to a numeric scale during scraping)
- **Dependent Variables:** Competition points (`boulder_points`, `lead_points`, and `combined_points`)
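
For reference, the significance test behind a Pearson correlation can be sketched as follows, with $r$ the sample correlation over the $n$ paired observations $(x_i, y_i)$. This notation is introduced here for illustration only:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}}
$$

Under $H_0$: ρ = 0, and assuming the two variables are approximately bivariate normal, $t$ follows a t-distribution with $n - 2$ degrees of freedom, so the two-tailed p-value is $2\,P(T_{n-2} \ge |t|)$.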

To conduct this analysis, a Python function is defined and called separately for each discipline:


In [5]:
def test_correlation(df, metric1, metric2, alpha=0.05):
    # Keep athletes with a positive value for both metrics (e.g. those who compete in both boulder and lead)
    athletes = df[(df[metric1] > 0) & (df[metric2] > 0)]

    # Perform Pearson correlation test
    r, p_value = stats.pearsonr(athletes[metric1], athletes[metric2])

    # Print results
    print(f"\n{metric1} vs. {metric2}")
    print(f"Pearson Correlation Coefficient: {r:.4f}")
    print(f"Two-tailed p-value: {p_value:.4e}")
    print(f"Sample size: {len(athletes)}")

    # Interpret results
    if p_value < alpha:
        print(f"Reject H₀: Significant correlation between {metric1} and {metric2}.")
    else:
        print(f"Fail to reject H₀: No significant correlation between {metric1} and {metric2}.")

In [6]:
# List of disciplines (points columns)
disciplines = ['boulder_points', 'lead_points', 'combined_points']

# Perform correlation test for each discipline
for points_column in disciplines:
    test_correlation(df, points_column, 'highest_grade')


boulder_points vs. highest_grade
Pearson Correlation Coefficient: 0.2791
Two-tailed p-value: 2.4140e-03
Sample size: 116
Reject H₀: Significant correlation between boulder_points and highest_grade.

lead_points vs. highest_grade
Pearson Correlation Coefficient: 0.1689
Two-tailed p-value: 6.9930e-02
Sample size: 116
Fail to reject H₀: No significant correlation between lead_points and highest_grade.

combined_points vs. highest_grade
Pearson Correlation Coefficient: 0.2915
Two-tailed p-value: 1.0621e-02
Sample size: 76
Reject H₀: Significant correlation between combined_points and highest_grade.
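
As a sanity check on the output above, a reported p-value can be approximately reproduced from the correlation coefficient and the sample size alone, using the statistic t = r·√((n − 2) / (1 − r²)) with n − 2 degrees of freedom. This is a minimal sketch; `pearson_p_value` is a hypothetical helper, and because it starts from the rounded r it will match the printed value only approximately:

```python
import math
from scipy import stats

def pearson_p_value(r, n):
    """Two-tailed p-value for a Pearson r computed from n pairs, via the t-distribution."""
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)

# r and n as reported for boulder_points vs. highest_grade
p = pearson_p_value(0.2791, 116)
print(f"p-value: {p:.4e}")
```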


#### Conclusion:

- In the **bouldering** (r = 0.2791) and **combined** (r = 0.2915) disciplines, a **weak-to-moderate positive correlation** was found, indicating that climbers who have sent harder outdoor routes tend to perform better in bouldering and combined competitions.
- For the **lead** discipline, the correlation (r = 0.1689) was **not statistically significant** at the 0.05 level (p = 0.0699), so we cannot conclude that a relationship exists. This is surprising, since outdoor sport climbing most closely resembles lead climbing.

These results suggest that **outdoor climbing experience, particularly reaching higher grades, may be linked to improved competition performance**, especially in **bouldering** and **combined** formats. The absence of significance in lead climbing may mean that the relationship there is weaker, more complex, or non-linear.
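
One way to probe the non-linear possibility is a rank-based measure such as Spearman's correlation, which detects any monotonic relationship rather than only a linear one. This is a different technique from the Pearson test used in this notebook and is shown only as a sketch on hypothetical toy data:

```python
from scipy import stats

# Hypothetical toy data: y grows monotonically but not linearly with x
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

r, _ = stats.pearsonr(x, y)      # linear association
rho, _ = stats.spearmanr(x, y)   # monotonic (rank) association
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

On the real data, `stats.spearmanr(athletes[metric1], athletes[metric2])` could be called on the same filtered rows that `test_correlation` uses; a Spearman value clearly above the Pearson value would hint at a monotonic but non-linear relationship.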


## Hypothesis Test 3: Average Outdoor Grade vs. Competition Performance
Is there a statistically significant correlation between a climber's average outdoor climbing grade and their performance in competition climbing?

#### Hypotheses:

- **$H_0$**: There is **no correlation** between the average outdoor climbing grade and competition performance.  ρ = 0
  *(Where ρ is the population correlation coefficient between `avg_grade_first5` and a given competition performance metric, such as `boulder_points`.)*

- **$H_a$**: There **is a correlation** between the average outdoor climbing grade and competition performance.  ρ ≠ 0

#### Methodology

- **Test Type:** Pearson correlation test
- **Independent Variable:** `avg_grade_first5` (converted to a numeric scale during scraping)
- **Dependent Variables:** Competition points (`boulder_points`, `lead_points`, and `combined_points`)

The same Python function used in Test 2 (`test_correlation`) is applied here, with the only difference being the outdoor performance metric passed as a parameter. Instead of `highest_grade`, we now use `avg_grade_first5` to evaluate the correlation.

In [7]:
# Perform correlation test for each discipline
for points_column in disciplines:
    test_correlation(df, points_column, 'avg_grade_first5')


boulder_points vs. avg_grade_first5
Pearson Correlation Coefficient: 0.3271
Two-tailed p-value: 3.3878e-04
Sample size: 116
Reject H₀: Significant correlation between boulder_points and avg_grade_first5.

lead_points vs. avg_grade_first5
Pearson Correlation Coefficient: 0.1849
Two-tailed p-value: 4.6929e-02
Sample size: 116
Reject H₀: Significant correlation between lead_points and avg_grade_first5.

combined_points vs. avg_grade_first5
Pearson Correlation Coefficient: 0.3209
Two-tailed p-value: 4.7021e-03
Sample size: 76
Reject H₀: Significant correlation between combined_points and avg_grade_first5.
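
A confidence interval conveys how precisely each coefficient above is estimated. The sketch below uses the Fisher z-transform, a standard approximation that was not part of the original analysis; `pearson_ci` is a hypothetical helper, and its intervals can disagree slightly with the t-based p-values near the significance threshold:

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# r and n as reported in the output above
reported = {
    "boulder_points": (0.3271, 116),
    "lead_points": (0.1849, 116),
    "combined_points": (0.3209, 76),
}
for name, (r, n) in reported.items():
    low, high = pearson_ci(r, n)
    print(f"{name}: r = {r:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

For `lead_points`, the lower bound sits just above zero, which matches a p-value only slightly below 0.05.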


#### Conclusion:
All three disciplines show statistically significant positive correlations with average outdoor grade, though none is strong. The relationships are strongest in bouldering (r = 0.3271) and combined (r = 0.3209); in lead the correlation is weak (r = 0.1849) and only marginally significant (p = 0.0469). This suggests some alignment between outdoor climbing ability and competition outcomes, but the modest correlation values indicate that outdoor performance is only one of many factors influencing competition success.
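
Squaring a Pearson coefficient gives the share of variance in one variable that a linear relationship with the other accounts for, which makes the "one of many factors" point concrete. A short worked calculation using the coefficients reported in the Test 3 output:

```python
# r values as reported in the Test 3 output
reported_r = {
    "boulder_points": 0.3271,
    "lead_points": 0.1849,
    "combined_points": 0.3209,
}

# r squared: proportion of variance shared under a linear relationship
shared_variance = {name: r ** 2 for name, r in reported_r.items()}
for name, r2 in shared_variance.items():
    print(f"{name}: r^2 = {r2:.3f}")
```

That is roughly 11% for bouldering, 3% for lead, and 10% for combined, leaving most of the variation in competition points unaccounted for by average outdoor grade.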


## Hypothesis Test 4: Performance Across IFSC Competition Disciplines

Is there a statistically significant correlation between performance in different **IFSC competition disciplines**?

#### Hypotheses

- **$H_0$**: There is **no correlation** between performance in different competition disciplines.   Mathematically:  ρ = 0
  *(Where ρ is the population correlation coefficient between the points in two disciplines, e.g. `boulder_points` and `lead_points`.)*

- **$H_a$**:  There **is a correlation** between performance in different competition disciplines. Mathematically:  ρ ≠ 0

#### Methodology

- **Test Type:** Pearson correlation test

This test aims to assess whether success in one competition discipline is predictive of success in another, which could indicate shared skillsets or training transfer across disciplines.
The same `test_correlation` function used in previous tests is applied here, with combinations of competition points in different disciplines (`boulder_points`, `lead_points`,`combined_points`) as parameters.
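
The pairings can also be generated rather than written out by hand. A small sketch using `itertools.combinations` from the standard library; the call to `test_correlation` is left commented out so the snippet runs on its own:

```python
from itertools import combinations

# Column names as used in this notebook
competition_metrics = ['boulder_points', 'lead_points', 'combined_points']

# Every unordered pair of distinct columns, in input order
pairs = list(combinations(competition_metrics, 2))
for metric1, metric2 in pairs:
    print(f"{metric1} vs. {metric2}")
    # test_correlation(df, metric1, metric2)  # uncomment inside the notebook
```

This yields the same three pairings, in the same order, as the explicit calls in the next cell.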

In [8]:
test_correlation(df, 'boulder_points', 'lead_points')
test_correlation(df, 'boulder_points', 'combined_points')
test_correlation(df, 'lead_points', 'combined_points')


boulder_points vs. lead_points
Pearson Correlation Coefficient: 0.5940
Two-tailed p-value: 6.8724e-20
Sample size: 194
Reject H₀: Significant correlation between boulder_points and lead_points.

boulder_points vs. combined_points
Pearson Correlation Coefficient: 0.8283
Two-tailed p-value: 5.9989e-50
Sample size: 193
Reject H₀: Significant correlation between boulder_points and combined_points.

lead_points vs. combined_points
Pearson Correlation Coefficient: 0.8671
Two-tailed p-value: 4.9803e-60
Sample size: 194
Reject H₀: Significant correlation between lead_points and combined_points.


#### Conclusion:
The results show statistically significant positive correlations between all three pairs of competition disciplines. The correlation between boulder and lead is moderate to strong (r = 0.5940), while each correlates strongly with combined (r = 0.8283 and r = 0.8671). Success in one discipline is therefore predictive of success in the others, which is consistent with overlapping skillsets and transfer of training across bouldering, lead, and combined formats.
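
The three coefficients can also be viewed together as a correlation matrix. The sketch below mirrors the `> 0` filter of `test_correlation` by masking non-positive values before calling pandas' `DataFrame.corr`, which drops missing values pair by pair. The helper `positive_pairwise_corr` and the toy data are hypothetical, and the matrix gives coefficients only, not p-values:

```python
import pandas as pd

def positive_pairwise_corr(df, columns):
    """Pearson correlation matrix using, for each pair, rows where both values are > 0."""
    positive_only = df[columns].where(df[columns] > 0)  # non-positive values become NaN
    return positive_only.corr(method="pearson")         # NaN is dropped pair by pair

# Hypothetical toy data, not taken from final_data.csv
toy = pd.DataFrame({
    "boulder_points": [10.0, 20.0, 30.0, 0.0, 25.0],
    "lead_points": [1.0, 2.0, 3.0, 50.0, 0.0],
    "combined_points": [5.0, 0.0, 7.0, 9.0, 6.0],
})
matrix = positive_pairwise_corr(toy, ["boulder_points", "lead_points", "combined_points"])
print(matrix.round(3))
```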



## Hypothesis Test 5: Correlation Between Outdoor Climbing Performance Metrics

Is there a statistically significant correlation between different **outdoor climbing performance metrics**, such as a climber’s **average grade** and their **highest grade**?

#### Hypotheses

- **$H_0$**: There is **no correlation** between outdoor climbing metrics (e.g., average grade and highest grade).  Mathematically: ρ = 0
  *(Where ρ is the population correlation coefficient between two outdoor metrics, e.g. `avg_grade_first5` and `highest_grade`.)*

- **$H_a$**: There **is a correlation** between outdoor climbing metrics (e.g., average grade and highest grade). Mathematically: ρ ≠ 0

#### Methodology

- **Test Type:** Pearson correlation test

The same `test_correlation` function used in previous tests is applied here, with combinations of outdoor climbing metrics (`highest_grade`, `avg_grade_first5`,`count_8c_plus`) as parameters.


In [9]:
test_correlation(df, 'highest_grade', 'avg_grade_first5')
test_correlation(df, 'highest_grade', 'count_8c_plus')
test_correlation(df, 'avg_grade_first5', 'count_8c_plus')


highest_grade vs. avg_grade_first5
Pearson Correlation Coefficient: 0.9724
Two-tailed p-value: 7.5802e-100
Sample size: 157
Reject H₀: Significant correlation between highest_grade and avg_grade_first5.

highest_grade vs. count_8c_plus
Pearson Correlation Coefficient: 0.6463
Two-tailed p-value: 1.8443e-08
Sample size: 61
Reject H₀: Significant correlation between highest_grade and count_8c_plus.

avg_grade_first5 vs. count_8c_plus
Pearson Correlation Coefficient: 0.5852
Two-tailed p-value: 7.2829e-07
Sample size: 61
Reject H₀: Significant correlation between avg_grade_first5 and count_8c_plus.


#### Conclusion:
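The results reveal statistically significant positive correlations between all three outdoor climbing performance metrics. In particular, the exceptionally high correlation (r = 0.9724) between a climber’s highest grade and their average grade over their first five hardest ascents (`avg_grade_first5`) suggests strong consistency between peak and sustained outdoor performance. The correlations involving `count_8c_plus` are moderate to strong (r = 0.6463 and r = 0.5852) and, because `test_correlation` keeps only positive values, are computed only among the 61 climbers with at least one 8c+ ascent.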
