# **IFSC Climbing Scraper**

## **Introduction**

This notebook scrapes the official **IFSC (International Federation of Sport Climbing) rankings** from [IFSC Rankings](https://www.ifsc-climbing.org/rankings/index) using **Selenium and BeautifulSoup**. It extracts the names, total points, and nationalities of climbers across three disciplines:

- **Bouldering** (max 100 points per comp)
- **Lead Climbing** (max 100 points per comp)
- **Combined (Bouldering & Lead)** (max 200 points per comp)

The final dataset consolidates ranking data for both male and female climbers, ensuring each athlete is uniquely recorded with their respective points in different disciplines.

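For more details on how IFSC scoring works, refer to [this guide](https://www.redbull.com/int-en/climbing-competition-scoring-guide#:~:text=As%20mentioned%20above%20the%20final,score%20each%20athlete%20can%20achieve). Note that these maxima describe the in-competition scoring format. The ranking points scraped here are totals accumulated across competitions, so they can be far larger: several thousand for the top climbers.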
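The relationship between the three maxima listed above can be sketched as a small function. It assumes, in line with the 100 + 100 = 200 maxima, that a combined score is simply the boulder score plus the lead score. `combined_score` is a hypothetical helper and does not model how the individual discipline scores are earned.

```python
# Hypothetical helper: assumes a combined (Boulder & Lead) score is the sum
# of one boulder score and one lead score, each out of 100.
MAX_DISCIPLINE_SCORE = 100.0


def combined_score(boulder: float, lead: float) -> float:
    """Return boulder + lead after checking that each lies in [0, 100]."""
    for label, score in (("boulder", boulder), ("lead", lead)):
        if not 0.0 <= score <= MAX_DISCIPLINE_SCORE:
            raise ValueError(f"{label} score {score} outside 0-{MAX_DISCIPLINE_SCORE:g}")
    return boulder + lead


print(combined_score(64.5, 88.0))
print(combined_score(100.0, 100.0))  # the 200-point maximum
```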

---

### **Challenges Encountered**

During the development of this scraper, several challenges had to be addressed:

- **Dynamic Page Elements**: Initially, BeautifulSoup alone was used for scraping, but IFSC rankings pages load data dynamically using JavaScript. This required adding **Selenium** to render the page and wait for content to fully load; BeautifulSoup is still used to parse the rendered HTML.

- **Cookie Popups**: Some pages display a cookie consent popup that blocks access to the rankings. A short delay and an automated click on the consent button were added to dismiss it.

- **Missing Data**: Some climbers do not have recorded points in certain disciplines. These missing values were replaced with **0**.

- **Bias in Total Points**: The dataset records **total points across all competitions**, which can be misleading. Climbers who have competed in more events will generally have higher points, even if their individual performances were inconsistent.
  - **Potential Improvement**: Scraping the **average rank** of each climber per event would provide a better measure of relative performance. However, this would require navigating additional pages and handling more complex data structures.
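To make the total-points bias concrete, here is a small sketch with invented per-event numbers (illustrative only; they do not follow the IFSC points table). It compares ordering climbers by total points with ordering them by average points per event. The same normalization idea would apply to the average rank suggested above.

```python
from statistics import mean

# Hypothetical per-event points for two made-up climbers
per_event_points = {
    "climber_a": [300, 300, 300, 300, 300],  # many events, middling results
    "climber_b": [600, 600],                 # few events, strong results
}

# Ordering by total rewards sheer participation...
rank_by_total = sorted(per_event_points, key=lambda n: sum(per_event_points[n]), reverse=True)
# ...while ordering by the per-event average rewards consistent results
rank_by_average = sorted(per_event_points, key=lambda n: mean(per_event_points[n]), reverse=True)

for name, points in per_event_points.items():
    print(f"{name}: events={len(points)} total={sum(points)} average={mean(points)}")
print("by total:  ", rank_by_total)
print("by average:", rank_by_average)
```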

---
Let's get to the scraping.

#### **Setup and Imports**
Before scraping the data, we need to import the necessary libraries.

- **Selenium**: Used for interacting with the dynamic IFSC rankings website.
- **BeautifulSoup**: Used for parsing the page source once Selenium loads it.
- **Pandas**: Used for structuring and saving the scraped data.
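- **OS & Time**: Used for file handling and managing delays in page interactions.
- **webdriver-manager**: Downloads a ChromeDriver binary matching the installed Chrome, so Selenium can launch the browser.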


```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
import os
import time
```

#### **Web Scraping Function**
To extract ranking data, we define a function that:

- **Loads the IFSC rankings page** using Selenium.
- **Handles cookie popups** to ensure smooth navigation.
- **Clicks the appropriate discipline tab** (Boulder, Lead, or Combined).
- **Extracts climber names, countries, and points** using BeautifulSoup.
- **Handles missing data** (if a climber has no recorded points in a discipline, we assign `0`).

This function will be called separately for each category (Men's and Women's Boulder, Lead, and Combined rankings).
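The points-extraction and missing-data rules above can be tested without a browser. The sketch below isolates them as a pure function over the text of a points cell's `<span>` elements: take the first text containing a digit, convert it to a float, and fall back to 0. `parse_points` is a hypothetical helper name, and stripping commas is an added assumption about how large totals might be formatted.

```python
def parse_points(span_texts):
    """Hypothetical helper mirroring the scraper's points rule.

    Returns the first span text containing a digit as a float, or 0.0 if no
    span qualifies or the chosen text is not a valid number.
    """
    for text in span_texts:
        text = text.strip()
        if text and any(c.isdigit() for c in text):
            try:
                # Assumption: totals may use commas as thousands separators
                return float(text.replace(",", ""))
            except ValueError:
                return 0.0
    return 0.0  # no recorded points in this discipline


print(parse_points(["", "3640"]))
print(parse_points([]))
```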


```python
# Scraping IFSC with Points Extraction
def scrape_ifsc_data(url, category_name):
    """Scrape climber names, countries, and points for one ranking category using Selenium."""
    # Determine discipline (and the tab label to click) from category_name
    if "combined" in category_name:
        discipline = "combined"
        tab_name = "Boulder & Lead"
    elif "lead" in category_name:
        discipline = "lead"
        tab_name = "Lead"
    else:
        discipline = "boulder"
        tab_name = "Boulder"

    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service)

    try:
        print(f"Scraping {category_name} from {url}")
        driver.get(url)

        # Handle cookie popup with a slight delay
        try:
            time.sleep(1)
            accept_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Accept')]"))
            )
            driver.execute_script("arguments[0].click();", accept_button)
            print(f"Accepted cookie popup for {category_name}")
        except Exception as e:
            print(f"No cookie popup found or error accepting it for {category_name}: {e}")

        # Click the appropriate tab, then wait for the name spans to render
        try:
            tab = WebDriverWait(driver, 15).until(
                EC.element_to_be_clickable(
                    (By.XPATH, f"//a[contains(@class, 'd3-ty-navigation-large') and contains(text(), '{tab_name}')]"))
            )
            tab.click()
            print(f"Clicked '{tab_name}' tab for {category_name}")

            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CLASS_NAME, "font-normal"))
            )
            print(f"{tab_name} rankings data loaded for {category_name}!")
        except Exception as e:
            print(f"Could not click '{tab_name}' tab or load data for {category_name}: {e}")
            print("Attempting to proceed with URL as-is...")

        # Capture the rendered page source
        html = driver.page_source
    finally:
        # Always close the browser, even if loading the page fails
        driver.quit()

    soup = BeautifulSoup(html, "html.parser")

    # Extract climber data
    climbers = []
    rows = soup.find_all("tr")
    debug_printed = False
    for row in rows:
        fname = row.find("span", class_="font-normal")
        sname = row.find("span", class_="font-bold uppercase")
        if fname and sname:  # Only process rows with names
            full_name = f"{fname.text.strip()} {sname.text.strip()}"
            columns = row.find_all("td")
            if len(columns) >= 4:  # Expect picture, name, country, points
                # Country: third column (index 2)
                country_td = columns[2]
                country_span = country_td.find("span")
                country = country_span.text.strip() if country_span else "N/A"

                # Points: fourth column (index 3); take the first span containing a digit
                points_td = columns[3]
                points_spans = points_td.find_all("span")
                points = "0"
                for span in points_spans:
                    text = span.text.strip()
                    if text and any(c.isdigit() for c in text):
                        points = text
                        break

                # Debug if points are 0 for known climbers
                if points == "0" and full_name in ["jongwon CHON", "anze PEHARC"] and not debug_printed:
                    print(f"Debug for {full_name}:")
                    print(f"  Points TD: {points_td.prettify()[:300]}...")
                    print(f"  Points Spans Found: {[s.text.strip() for s in points_spans]}")
                    debug_printed = True

                # Convert points to float (dropping any thousands separators); default to 0 if invalid
                try:
                    points_value = float(points.replace(",", ""))
                except ValueError:
                    points_value = 0.0
            else:
                country = "N/A"
                points_value = 0.0
                print(f"Debug: Row for {full_name} has {len(columns)} columns: {row.prettify()[:200]}...")

            climbers.append({
                "name": full_name,
                "country": country,
                f"{discipline}_points": points_value
            })

    print(f"Collected {len(climbers)} climbers for {category_name}")
    return climbers
```

#### **Merging and Saving the Data**
Once we have scraped rankings for each category, we need to merge them into a single dataset.

- **Combining disciplines**: Climbers may have rankings in Boulder, Lead, and Combined events. We ensure each climber appears once, with separate columns for each discipline’s points.
- **Adding gender information**: We label each climber as "male" or "female" based on their category.
- **Saving to CSV**: The final dataset is stored as a CSV file for further analysis.
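The merge described above amounts to a keyed update with zero defaults. Here is a compact, stdlib-only sketch of that logic; `merge_disciplines` is a hypothetical name. Like the notebook's `merge_gender_data`, it identifies climbers by name alone and keeps the country from the first discipline in which a climber appears.

```python
DISCIPLINES = ("boulder", "lead", "combined")


def merge_disciplines(results_by_discipline, gender):
    """Merge per-discipline climber lists into one record per name.

    Any discipline in which a climber has no ranking keeps 0.0 points.
    """
    merged = {}  # dicts preserve insertion order, so first-seen order is kept
    for discipline in DISCIPLINES:
        for climber in results_by_discipline.get(discipline, []):
            record = merged.setdefault(climber["name"], {
                "name": climber["name"],
                "country": climber["country"],
                "gender": gender,
                **{f"{d}_points": 0.0 for d in DISCIPLINES},
            })
            record[f"{discipline}_points"] = climber[f"{discipline}_points"]
    return list(merged.values())


# Made-up example data
men = merge_disciplines({
    "boulder": [{"name": "climber_a", "country": "AAA", "boulder_points": 100.0}],
    "lead": [
        {"name": "climber_a", "country": "AAA", "lead_points": 50.0},
        {"name": "climber_b", "country": "BBB", "lead_points": 80.0},
    ],
}, "male")
for record in men:
    print(record)
```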


```python
# Merging and Saving Data to CSV
def merge_and_save_data(men_boulder, men_lead, men_combined, women_boulder, women_lead, women_combined):
    """Merge all climber data into one dataset with gender attribute and save to CSV."""
    os.makedirs(os.path.join("../data", "ifsc_data"), exist_ok=True)

    def merge_gender_data(boulder_data, lead_data, combined_data, gender):
        climbers_dict = {}
        for climber in boulder_data:
            climbers_dict[climber["name"]] = {
                "name": climber["name"],
                "country": climber["country"],
                "gender": gender,
                "boulder_points": climber["boulder_points"],
                "lead_points": 0.0,
                "combined_points": 0.0
            }
        for climber in lead_data:
            if climber["name"] in climbers_dict:
                climbers_dict[climber["name"]]["lead_points"] = climber["lead_points"]
            else:
                climbers_dict[climber["name"]] = {
                    "name": climber["name"],
                    "country": climber["country"],
                    "gender": gender,
                    "boulder_points": 0.0,
                    "lead_points": climber["lead_points"],
                    "combined_points": 0.0
                }
        for climber in combined_data:
            if climber["name"] in climbers_dict:
                climbers_dict[climber["name"]]["combined_points"] = climber["combined_points"]
            else:
                climbers_dict[climber["name"]] = {
                    "name": climber["name"],
                    "country": climber["country"],
                    "gender": gender,
                    "boulder_points": 0.0,
                    "lead_points": 0.0,
                    "combined_points": climber["combined_points"]
                }
        return list(climbers_dict.values())

    # Merge data for men and women with gender attribute
    men_data = merge_gender_data(men_boulder, men_lead, men_combined, "male")
    women_data = merge_gender_data(women_boulder, women_lead, women_combined, "female")

    # Combine men and women data
    all_climbers = men_data + women_data

    # Define columns including gender
    columns = ["name", "country", "gender", "boulder_points", "lead_points", "combined_points"]
    climbers_df = pd.DataFrame(all_climbers, columns=columns)

    # Save to single CSV file
    filepath = os.path.join("../data", "ifsc_data", "ifsc_climbers.csv")
    climbers_df.to_csv(filepath, index=False)

    print(f"Saved {len(all_climbers)} unique climbers (men and women) to {filepath}")
```

Now that we have defined our scraping and data processing functions, we can:

- **Run the scraper** for each category (Men’s and Women’s Boulder, Lead, and Combined).
- **Merge the results** into a single dataset.
- **Save the dataset** as a CSV file for further analysis.
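The six category URLs used in the next cell share one pattern, so they could also be generated rather than typed out. This is a sketch: `build_categories` is a hypothetical helper, and the `boulder-lead` query value is copied from the notebook's own URLs.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ifsc-climbing.org/rankings/index"
# Query-string values as they appear in the notebook's category URLs
DISCIPLINE_PARAMS = {"boulder": "boulder", "lead": "lead", "combined": "boulder-lead"}


def build_categories():
    """Return (category_name, url) pairs in the same order as the notebook's list."""
    categories = []
    for discipline, param in DISCIPLINE_PARAMS.items():
        for gender in ("men", "women"):
            query = urlencode({"discipline": param, "category": gender})
            categories.append((f"{discipline}_{gender}", f"{BASE_URL}?{query}"))
    return categories


for name, url in build_categories():
    print(name, url)
```

The generated pairs could then drive a single loop such as `results = {name: scrape_ifsc_data(url, name) for name, url in build_categories()}` instead of six index-based calls.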

```python
# Running the Scraper
categories = [
    ("boulder_men", "https://www.ifsc-climbing.org/rankings/index?discipline=boulder&category=men"),
    ("boulder_women", "https://www.ifsc-climbing.org/rankings/index?discipline=boulder&category=women"),
    ("lead_men", "https://www.ifsc-climbing.org/rankings/index?discipline=lead&category=men"),
    ("lead_women", "https://www.ifsc-climbing.org/rankings/index?discipline=lead&category=women"),
    ("combined_men", "https://www.ifsc-climbing.org/rankings/index?discipline=boulder-lead&category=men"),
    ("combined_women", "https://www.ifsc-climbing.org/rankings/index?discipline=boulder-lead&category=women")
]

print("Starting scraping process...")
men_boulder = scrape_ifsc_data(categories[0][1], categories[0][0])
women_boulder = scrape_ifsc_data(categories[1][1], categories[1][0])
men_lead = scrape_ifsc_data(categories[2][1], categories[2][0])
women_lead = scrape_ifsc_data(categories[3][1], categories[3][0])
men_combined = scrape_ifsc_data(categories[4][1], categories[4][0])
women_combined = scrape_ifsc_data(categories[5][1], categories[5][0])
merge_and_save_data(men_boulder, men_lead, men_combined, women_boulder, women_lead, women_combined)
print("Scraping and merging process completed!")
```

```text
Starting scraping process...
Scraping boulder_men from https://www.ifsc-climbing.org/rankings/index?discipline=boulder&category=men
Accepted cookie popup for boulder_men
Clicked 'Boulder' tab for boulder_men
Boulder rankings data loaded for boulder_men!
Collected 202 climbers for boulder_men
Scraping boulder_women from https://www.ifsc-climbing.org/rankings/index?discipline=boulder&category=women
Accepted cookie popup for boulder_women
Clicked 'Boulder' tab for boulder_women
Boulder rankings data loaded for boulder_women!
Collected 179 climbers for boulder_women
Scraping lead_men from https://www.ifsc-climbing.org/rankings/index?discipline=lead&category=men
Accepted cookie popup for lead_men
Clicked 'Lead' tab for lead_men
Lead rankings data loaded for lead_men!
Collected 170 climbers for lead_men
Scraping lead_women from https://www.ifsc-climbing.org/rankings/index?discipline=lead&category=women
Accepted cookie popup for lead_women
Clicked 'Lead' tab for lead_women
Lead rankings data 
```

The final dataset contains information about competitive climbers from the IFSC rankings. Each row represents a unique climber with the following attributes:

- **name**: The full name of the climber.
- **country**: The climber's country of representation.
- **gender**: Male or female.
- **boulder_points**: Total points earned in bouldering.
- **lead_points**: Total points earned in lead climbing.
- **combined_points**: Total points earned in the combined event.
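Before analysis, it can be worth checking that the saved file matches the schema described above. Here is a minimal sketch using only the standard library `csv` module. `check_rows` is a hypothetical helper, and the two sample rows are copied from the preview at the end of this notebook. A stray leading index column (which pandas reports as `Unnamed: 0` when reading) would fail the header check.

```python
import csv
import io

EXPECTED_COLUMNS = ["name", "country", "gender", "boulder_points", "lead_points", "combined_points"]


def check_rows(csv_text):
    """Hypothetical sanity check: verify the header and basic value constraints.

    Returns the number of data rows, or raises ValueError on the first problem.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = list(reader)
    for row in rows:
        if row["gender"] not in ("male", "female"):
            raise ValueError(f"bad gender for {row['name']}: {row['gender']}")
        for col in EXPECTED_COLUMNS[3:]:
            if float(row[col]) < 0:
                raise ValueError(f"negative {col} for {row['name']}")
    return len(rows)


sample = """name,country,gender,boulder_points,lead_points,combined_points
sorato ANRAKU,JPN,male,3640.0,2971.0,6313.0
meichi NARASAKI,JPN,male,2860.0,0.0,0.0
"""
print(check_rows(sample))
```

On the real file, `with open("../data/ifsc_data/ifsc_climbers.csv") as f: check_rows(f.read())` would apply the same checks.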

As a last step, we can preview the first few rows of our IFSC dataset.

```python
# Load the saved dataset to preview the results
df = pd.read_csv("../data/ifsc_data/ifsc_climbers.csv")
df.head()
```

| Unnamed: 0 | name | country | gender | boulder_points | lead_points | combined_points |
|---|---|---|---|---|---|---|
| 0 | sorato ANRAKU | JPN | male | 3640.0 | 2971.0 | 6313.0 |
| 1 | dohyun LEE | KOR | male | 3183.0 | 2343.0 | 4320.0 |
| 2 | meichi NARASAKI | JPN | male | 2860.0 | 0.0 | 0.0 |
| 3 | tomoa NARASAKI | JPN | male | 2849.0 | 765.0 | 3600.0 |
| 4 | sohta AMAGASA | JPN | male | 2619.0 | 0.0 | 0.0 |
