# Introduction

This Python notebook merges climbing data from two sources: the 8anu climbing dataset and the IFSC climbers dataset. It combines each climber's outdoor performance metrics from 8anu with their competition results from IFSC and writes the result to a single CSV file for further analysis.

### Datasets
1. **8anu Climbing Data** (`8anu_climbing_data.csv`): Contains climber names along with their highest recorded grade, count of 8c+ routes climbed, and average grade of their first five ascents.
2. **IFSC Climbers Data** (`ifsc_climbers.csv`): Provides climber details such as country, gender, and competition points across boulder, lead, and combined disciplines.

For more information on how these data were scraped, you can check out the previous notebooks.

### Output
The resulting dataset includes the following columns:
- `name`: Climber's name (as it appears in the 8anu data)
- `country`: Climber's country (from IFSC data)
- `gender`: Climber's gender (from IFSC data)
- `boulder_points`: Points in boulder discipline (from IFSC data)
- `lead_points`: Points in lead discipline (from IFSC data)
- `combined_points`: Points in combined discipline (from IFSC data)
- `highest_grade`: Highest grade achieved (from 8anu data)
- `count_8c_plus`: Number of 8c+ routes climbed (from 8anu data)
- `avg_grade_first5`: Average grade of first five ascents (from 8anu data)

This merged dataset combines competition (IFSC) and outdoor (8anu) climbing metrics in one table. Every climber in the 8anu data appears in the output. The IFSC columns are left empty for climbers with no IFSC match.
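A quick way to confirm that a merged table has the columns listed above is a schema check. The sketch below is a minimal illustration that assumes exactly those column names. `EXPECTED_COLUMNS` and `check_merged_schema` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Expected output columns, in the order listed above (assumed, not read from a file).
EXPECTED_COLUMNS = [
    "name", "country", "gender",
    "boulder_points", "lead_points", "combined_points",
    "highest_grade", "count_8c_plus", "avg_grade_first5",
]

def check_merged_schema(df: pd.DataFrame) -> list:
    """Hypothetical helper: return expected columns missing from df (empty if none)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]

# Toy one-row frame with empty values, only to exercise the check.
toy = pd.DataFrame([{col: None for col in EXPECTED_COLUMNS}])
print(check_merged_schema(toy))                          # []
print(check_merged_schema(toy.drop(columns="gender")))   # ['gender']
```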

Let's start with the imports.

```python
import pandas as pd
import os
```

Next, let's read the datasets `8anu_climbing_data.csv` and `ifsc_climbers.csv` and build a case-insensitive lookup dictionary from the IFSC data.

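```python
# Read the datasets
anu_data = pd.read_csv("../data/8anu_data/8anu_climbing_data.csv")
ifsc_data = pd.read_csv("../data/ifsc_data/ifsc_climbers.csv")

# Create a dictionary from IFSC data for faster lookup (case-insensitive).
# Rows with a missing name (NaN) are skipped so .lower() never fails;
# if two IFSC rows share a name, the later one overwrites the earlier one.
ifsc_dict = {
    row['name'].lower(): row
    for _, row in ifsc_data.iterrows()
    if isinstance(row['name'], str)
}

# Prepare the output data
merged_data = []
```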
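Matching on `.lower()` alone misses names that differ only in stray or repeated whitespace. The sketch below shows one possible normalization, swapping in `str.casefold()` plus whitespace collapsing; it is not the notebook's method. `normalize_name` is a hypothetical helper, and the rows are made-up placeholders rather than real IFSC data.

```python
import pandas as pd

def normalize_name(name) -> str:
    """Hypothetical helper: collapse whitespace and casefold a name for matching."""
    if not isinstance(name, str):
        return ""  # missing names (NaN/None) never match anything
    return " ".join(name.split()).casefold()

# Made-up placeholder rows standing in for ifsc_climbers.csv.
ifsc = pd.DataFrame({
    "name": ["Alice Boulder", "BOB  LEAD ", None],
    "country": ["AAA", "BBB", "CCC"],
})

# Same idea as ifsc_dict, but keyed on the normalized name.
lookup = {
    normalize_name(row["name"]): row
    for _, row in ifsc.iterrows()
    if normalize_name(row["name"])  # skip rows whose name is missing or blank
}

print(lookup["bob lead"]["country"])   # BBB
print(normalize_name("  Bob   Lead"))  # bob lead
```

The same `normalize_name` would have to be applied to the 8anu names at lookup time, otherwise the keys on the two sides would not line up.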

Now, let's loop through the 8anu dataset, look up each climber's name (case-insensitively) in the IFSC dictionary, and append the combined row to `merged_data`. Climbers without an IFSC match are still kept, with empty IFSC fields.

```python
# Loop through 8anu dataset
for _, anu_row in anu_data.iterrows():
    # Get the name (keep the original capitalization for the output)
    name = anu_row['name']

    # Try to find matching IFSC data (case-insensitive);
    # None if there is no match or the name is missing
    ifsc_row = ifsc_dict.get(name.lower()) if isinstance(name, str) else None

    # Create new row with all required fields. Missing IFSC values are None,
    # so pandas treats them as missing rather than as empty strings;
    # either way they are written as empty fields in the CSV.
    new_row = {
        'name': name,
        'country': ifsc_row['country'] if ifsc_row is not None else None,
        'gender': ifsc_row['gender'] if ifsc_row is not None else None,
        'boulder_points': ifsc_row['boulder_points'] if ifsc_row is not None else None,
        'lead_points': ifsc_row['lead_points'] if ifsc_row is not None else None,
        'combined_points': ifsc_row['combined_points'] if ifsc_row is not None else None,
        'highest_grade': anu_row['highest_grade'],
        'count_8c_plus': anu_row['count_8c_plus'],
        'avg_grade_first5': anu_row['avg_grade_first5']
    }

    merged_data.append(new_row)
```
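The row-by-row loop above behaves like a left join on a lowercase name key. As a sketch of an alternative (not the notebook's method), the same join can be written with `DataFrame.merge`. The two small frames are made-up stand-ins containing only a few of the real columns.

```python
import pandas as pd

# Made-up stand-ins for the two CSVs, with only a few of the real columns.
anu = pd.DataFrame({
    "name": ["Alice Boulder", "Carol Crag"],
    "highest_grade": ["9a", "8c"],
})
ifsc = pd.DataFrame({
    "name": ["alice boulder", "Dave Dyno"],
    "country": ["AAA", "DDD"],
    "boulder_points": [120.0, 80.0],
})

# Lowercase join key on the IFSC side; drop missing names and keep the last
# duplicate, mirroring how the dictionary in the notebook overwrites earlier rows.
ifsc_keyed = (
    ifsc.assign(_key=ifsc["name"].str.lower())
    .dropna(subset=["_key"])
    .drop(columns="name")
    .drop_duplicates(subset="_key", keep="last")
)

# Left join: every 8anu climber is kept; unmatched ones get NaN IFSC fields.
merged = (
    anu.assign(_key=anu["name"].str.lower())
    .merge(ifsc_keyed, on="_key", how="left", validate="many_to_one")
    .drop(columns="_key")
)
print(merged)
```

`how="left"` keeps the 8anu row order and all 8anu climbers, and `validate="many_to_one"` makes pandas raise a `MergeError` if the IFSC side still contained duplicate keys.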

Finally, let's convert the merged data into a DataFrame with the columns in the order listed above and save it as a CSV file.

```python
# Define the column order
column_order = ['name', 'country', 'gender', 'boulder_points', 'lead_points',
                'combined_points', 'highest_grade', 'count_8c_plus', 'avg_grade_first5']

# Create final DataFrame; passing columns= applies the order
# (and still yields the right columns if merged_data is empty)
result_df = pd.DataFrame(merged_data, columns=column_order)

# Save to ../data/merged_climbing_data.csv, creating the directory if needed
output_dir = "../data/"
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, "merged_climbing_data.csv")
result_df.to_csv(output_path, index=False)

print("Merged dataset saved to 'merged_climbing_data.csv'")
```

```
Merged dataset saved to 'merged_climbing_data.csv'
```
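Because empty strings and missing values are both written as empty fields, reading the CSV back turns them into NaN. The sketch below checks this round trip in a temporary directory. The frame is made-up and only mimics a few of the output columns.

```python
import os
import tempfile

import pandas as pd

# Made-up frame: one matched climber, one with empty/missing IFSC fields.
df = pd.DataFrame({
    "name": ["Alice Boulder", "Carol Crag"],
    "country": ["AAA", ""],           # empty string, as an unmatched row might hold
    "boulder_points": [120.0, None],  # missing value
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "merged_climbing_data.csv")
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)  # read fully before the directory is deleted

# Both the empty string and the missing value come back as NaN.
print(reloaded.dtypes["boulder_points"])  # float64
print(reloaded.isna().sum().to_dict())
```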


Let's preview the final dataset.

```python
print("\nFirst few rows of the result:")
result_df.head()
```
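To see how many 8anu climbers were actually found in the IFSC data, one option is to count rows with a filled `country` field. This sketch assumes every matched IFSC row has a non-empty `country`. The small frame below is made-up and only stands in for `result_df`.

```python
import pandas as pd

# Made-up stand-in for result_df (only the columns the check needs).
result = pd.DataFrame({
    "name": ["Alice Boulder", "Carol Crag", "Eve Edge"],
    "country": ["AAA", None, "EEE"],
})

# Assumption: a climber counts as matched when the IFSC-sourced country is filled
# (treating both NaN/None and empty strings as "no match").
matched = result["country"].notna() & (result["country"] != "")
print(f"{matched.sum()} of {len(result)} climbers matched ({matched.mean():.0%})")
print(result.loc[~matched, "name"].tolist())  # unmatched names, worth checking by hand
```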