# Introduction

This Python script merges climbing data from two sources: the 8anu climbing dataset and the IFSC climbers dataset. It combines climber information and performance metrics from both into a single CSV file for further analysis.

### Datasets
1. **8anu Climbing Data** (`8anu_climbing_data.csv`): Contains climber names along with their highest recorded grade, count of 8c+ routes climbed, and average grade of their first five ascents.
2. **IFSC Climbers Data** (`ifsc_climbers.csv`): Provides climber details such as country, gender, and competition points across boulder, lead, and combined disciplines.

For details on how these data were scraped, see the previous notebooks.

### Output
The resulting dataset includes the following columns:
- `name`: Climber's name (as it appears in IFSC data)
- `country`: Climber's country (from IFSC data)
- `gender`: Climber's gender (from IFSC data)
- `boulder_points`: Points in boulder discipline (from IFSC data)
- `lead_points`: Points in lead discipline (from IFSC data)
- `combined_points`: Points in combined discipline (from IFSC data)
- `highest_grade`: Highest grade achieved (from 8anu data)
- `count_8c_plus`: Number of 8c+ routes climbed (from 8anu data)
- `avg_grade_first5`: Average grade of first five ascents (from 8anu data)

This merged dataset places each IFSC climber's competition points alongside their outdoor climbing metrics from 8anu. Climbers with no matching 8anu entry receive 0 in the three 8anu columns.

Let's start with the imports.

In [6]:
import pandas as pd
import os

Next, let's read the datasets `8anu_climbing_data.csv` and `ifsc_climbers.csv`, and build a lookup dictionary keyed by lowercased 8anu names.

In [7]:
# Read the datasets
anu_data = pd.read_csv("../data/8anu_data/8anu_climbing_data.csv")
ifsc_data = pd.read_csv("../data/ifsc_data/ifsc_climbers.csv")

# Create a dictionary from 8anu data for faster lookup (case-insensitive)
anu_dict = {row['name'].lower(): row for _, row in anu_data.iterrows()}

# Prepare the output data
merged_data = []
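
To illustrate how the case-insensitive lookup behaves, here is a minimal sketch on toy data. The names and values in `toy_anu` are hypothetical placeholders, not rows from the real CSV; only the column names come from this notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for 8anu_climbing_data.csv
toy_anu = pd.DataFrame({
    "name": ["Jane Doe", "Sam Sample"],
    "highest_grade": ["9a", "8c+"],
    "count_8c_plus": [4, 1],
    "avg_grade_first5": [2.5, 1.0],
})

# Same pattern as above: lowercase the name and use it as the key
toy_dict = {row["name"].lower(): row for _, row in toy_anu.iterrows()}

# A differently cased name still matches once it is lowercased
hit = toy_dict.get("jane DOE".lower())
# A name that is absent returns None instead of raising KeyError
miss = toy_dict.get("nobody HERE".lower())

print(hit["highest_grade"])  # 9a
print(miss)                  # None
```

Note that if two rows share the same lowercased name, the later row overwrites the earlier one in the dictionary.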

Now, let's loop through the IFSC dataset, take each climber's name, look it up (case-insensitively) in the 8anu dictionary, and add the combined row to the merged data.

In [8]:
# Loop through IFSC dataset
for _, ifsc_row in ifsc_data.iterrows():
    # Get the name (keep it as in IFSC data)
    name = ifsc_row['name']

    # Try to find matching 8anu data (case-insensitive)
    anu_row = anu_dict.get(name.lower())

    # Create new row with all required fields; 8anu columns default to 0 when no match is found
    new_row = {
        'name': name,
        'country': ifsc_row['country'] if pd.notna(ifsc_row['country']) else '',
        'gender': ifsc_row['gender'] if pd.notna(ifsc_row['gender']) else '',
        'boulder_points': ifsc_row['boulder_points'] if pd.notna(ifsc_row['boulder_points']) else '',
        'lead_points': ifsc_row['lead_points'] if pd.notna(ifsc_row['lead_points']) else '',
        'combined_points': ifsc_row['combined_points'] if pd.notna(ifsc_row['combined_points']) else '',
        'highest_grade': anu_row['highest_grade'] if anu_row is not None else 0,
        'count_8c_plus': anu_row['count_8c_plus'] if anu_row is not None else 0,
        'avg_grade_first5': anu_row['avg_grade_first5'] if anu_row is not None else 0
    }

    merged_data.append(new_row)
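
As an alternative to the explicit loop, the same kind of join can be sketched with `DataFrame.merge` on a temporary lowercase key. This is only an illustration under assumptions: the toy frames below are hypothetical, use a subset of the real columns, and the helper column `name_key` is introduced here for the example.

```python
import pandas as pd

# Hypothetical toy data with a subset of the real columns
toy_ifsc = pd.DataFrame({
    "name": ["jane DOE", "sam SAMPLE"],
    "country": ["AAA", "BBB"],
    "boulder_points": [100.0, 50.0],
})
toy_anu = pd.DataFrame({
    "name": ["Jane Doe"],
    "highest_grade": ["9a"],
    "count_8c_plus": [4],
})

# Build a temporary lowercase join key on both sides
ifsc_keyed = toy_ifsc.assign(name_key=toy_ifsc["name"].str.lower())
anu_keyed = (
    toy_anu.assign(name_key=toy_anu["name"].str.lower())
    .drop(columns="name")                      # keep the IFSC spelling of the name
    .drop_duplicates("name_key", keep="last")  # mimic dict overwrite on duplicate keys
)

# Left join keeps every IFSC climber, matched or not
merged = ifsc_keyed.merge(anu_keyed, on="name_key", how="left").drop(columns="name_key")

# Unmatched climbers get 0 in the 8anu columns, as in the loop above
outdoor_cols = ["highest_grade", "count_8c_plus"]
merged[outdoor_cols] = merged[outdoor_cols].fillna(0)

print(merged)
```

The left join mirrors the loop's behavior of keeping every IFSC row; the loop version is easier to follow step by step, while the merge avoids iterating row by row.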

Finally, let's convert the merged data into a DataFrame, reorder the columns, and save it to a CSV file.

In [9]:
# Create final DataFrame
result_df = pd.DataFrame(merged_data)

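# Define the column order and apply it
column_order = ['name', 'country', 'gender', 'boulder_points', 'lead_points',
                'combined_points', 'highest_grade', 'count_8c_plus', 'avg_grade_first5']
result_df = result_df[column_order]

# Save the merged dataset, creating the output directory if needed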
output_dir = "../data/"
os.makedirs(output_dir, exist_ok=True)
temp_output_path = os.path.join(output_dir, "merged_climbing_data.csv")
result_df.to_csv(temp_output_path, index=False)

print("Merged dataset saved to 'merged_climbing_data.csv'")

Merged dataset saved to 'merged_climbing_data.csv'
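
The `index=False` argument matters when the file is read back later. The following sketch, using a hypothetical one-row frame and a temporary directory, shows the difference in the columns that `pd.read_csv` returns with and without it.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame; values are placeholders
toy = pd.DataFrame({"name": ["jane DOE"], "lead_points": [10.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")

    # Without the index, the CSV contains only the data columns
    toy.to_csv(path, index=False)
    no_index_cols = pd.read_csv(path).columns.tolist()

    # With the default index=True, the row index is written as an unnamed first column
    toy.to_csv(path)
    with_index_cols = pd.read_csv(path).columns.tolist()

print(no_index_cols)    # ['name', 'lead_points']
print(with_index_cols)  # ['Unnamed: 0', 'name', 'lead_points']
```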


Let's preview the final dataset.

In [12]:
print("\nFirst few rows of the result:")
result_df.head(20)


First few rows of the result:


| | name | country | gender | boulder_points | lead_points | combined_points | highest_grade | count_8c_plus | avg_grade_first5 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | sorato ANRAKU | JPN | male | 3640.0 | 2971.0 | 6313.0 | 0 | 0 | 0.0 |
| 1 | dohyun LEE | KOR | male | 3183.0 | 2343.0 | 4320.0 | 0 | 0 | 0.0 |
| 2 | meichi NARASAKI | JPN | male | 2860.0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
| 3 | tomoa NARASAKI | JPN | male | 2849.0 | 765.0 | 3600.0 | 0 | 0 | 0.0 |
| 4 | sohta AMAGASA | JPN | male | 2619.0 | 0.0 | 0.0 | 0 | 0 | 0.0 |
| 5 | toby ROBERTS | GBR | male | 2365.0 | 3380.0 | 6490.0 | 0 | 0 | 0.0 |
| 6 | sam AVEZOU | FRA | male | 2118.0 | 3059.0 | 3898.0 | 0 | 0 | 0.0 |
| 7 | maximillian MILNE | GBR | male | 1879.7 | 490.0 | 2061.7 | 0 | 0 | 0.0 |
| 8 | jongwon CHON | KOR | male | 1506.5 | 267.5 | 1534.5 | 0 | 0 | 0.0 |
| 9 | manuel CORNU | FRA | male | 1444.5 | 0.0 | 0.0 | 0 | 0 | 0.0 |
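
Every row in this preview has 0 in the three 8anu columns, which is the default used when no 8anu match is found. A quick way to check how many names actually matched is sketched below on hypothetical toy names; whether the real data matches poorly is not something this notebook establishes.

```python
import pandas as pd

# Hypothetical names standing in for the two 'name' columns
toy_ifsc_names = pd.Series(["jane DOE", "sam SAMPLE", "alex EXAMPLE"])
toy_anu_names = pd.Series(["Jane Doe"])

# Compare on lowercased names, as the merge loop does
ifsc_keys = toy_ifsc_names.str.lower()
anu_keys = set(toy_anu_names.str.lower())

# Boolean mask: True where the IFSC name has an 8anu counterpart
matched = ifsc_keys.isin(anu_keys)
unmatched_names = toy_ifsc_names[~matched].tolist()

print(f"{matched.sum()} of {len(matched)} IFSC names matched")
print(unmatched_names)
```

On the real data, the same check could be run as `ifsc_data['name'].str.lower().isin(set(anu_dict))`, since `anu_dict` is already keyed by lowercased 8anu names.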
