# **8a.nu Climber Profile Scraper**

## **Introduction**

This notebook identifies and saves the [8a.nu](https://www.8a.nu/) profiles of IFSC competition climbers whose data was scraped in the previous notebook (**1.1_ifsc_scraper**). Using Selenium for web automation, it searches for each climber's name directly via 8a.nu’s user search URL and extracts the most relevant profile based on name similarity. To improve accuracy, it employs **fuzzy string matching** (`fuzzywuzzy`) to account for variations in names, plus a lookup table for common nicknames.

To ensure meaningful results, only profiles with recorded ascents are considered. The final data, including climber names and the highest-probability profile links, is saved to a CSV file for further analysis.

---

## **Challenges Encountered**

**Direct URL-Based Search Instead of Input-Based Search:**
   - Instead of relying on the website's search bar, the script queries the 8a.nu user search page directly, reducing the risk of automation blocks and ensuring faster navigation.
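
As a minimal sketch of this approach, the search URL can be built by percent-encoding the climber's name. The helper name `build_search_url` is hypothetical; the URL pattern is the one used by the search function later in this notebook.

```python
import urllib.parse

SEARCH_URL = "https://www.8a.nu/search/users?query={query}"

def build_search_url(climber_name: str) -> str:
    """Percent-encode the name and place it in the user search URL."""
    return SEARCH_URL.format(query=urllib.parse.quote(climber_name))

print(build_search_url("Toby Roberts"))
# https://www.8a.nu/search/users?query=Toby%20Roberts
```

Encoding matters for names with spaces or non-ASCII letters, which would otherwise produce an invalid or ambiguous URL.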

**Non-Exact Name Matching:**
   - Climbers' usernames on 8a.nu often differ from their official competition names, requiring a more flexible matching approach.
   - To handle variations, the script uses **fuzzy string matching (`fuzzywuzzy`)** with a **90% similarity threshold**, selecting only the most probable match.
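
The thresholding idea can be illustrated with a small sketch. It uses Python's standard-library `difflib` as a stand-in for `fuzzywuzzy`, so the scores are only indicative and may differ from those produced by `fuzz.partial_ratio` in this notebook. The function names `similarity` and `is_match` are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> int:
    """Return a case-insensitive similarity score between 0 and 100."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def is_match(ifsc_name: str, profile_name: str, threshold: int = 90) -> bool:
    """Accept a profile only if its name is at least `threshold` percent similar."""
    return similarity(ifsc_name, profile_name) >= threshold

print(is_match("Toby Roberts", "toby roberts"))    # True
print(is_match("Alexander Megos", "Alex Megos"))   # False
```

The second case shows why a strict threshold alone can miss shortened first names, which is the motivation for the nickname handling described next.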

**Nicknames Used as Usernames:**
   - Some climbers use nicknames instead of their full names (e.g., *Alexander Megos* vs. *Alex Megos*).
   - To improve accuracy, the script includes a **nickname lookup table** that accounts for common name variations.

**Profiles Without Ascent Data:**
   - Some climbers have 8a.nu accounts but no recorded ascents, making their profiles irrelevant for performance analysis.
   - The script **automatically discards profiles with zero ascents**, ensuring that only active climbers are considered.
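
A minimal sketch of the ascent filter follows. It assumes the ascent count arrives as the text of a table cell and may contain spaces (for example as a thousands separator); the helper names `parse_ascent_count` and `has_ascents` are hypothetical.

```python
def parse_ascent_count(cell_text: str) -> int:
    """Convert cell text such as '1 234' to an int; return 0 if empty or not numeric."""
    digits = "".join(cell_text.split())  # remove all whitespace
    try:
        return int(digits) if digits else 0
    except ValueError:
        return 0

def has_ascents(cell_text: str) -> bool:
    """A profile is kept only if it has at least one recorded ascent."""
    return parse_ascent_count(cell_text) > 0

print(has_ascents("4"))   # True
print(has_ascents(""))    # False
```

Treating unparseable text as zero errs on the side of discarding a profile rather than keeping one with unknown activity.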

---



#### **Setup and Imports**
Before automating the search, we need to import the necessary libraries.

- **Selenium**: Used for interacting with the dynamic 8a.nu page.
- **BeautifulSoup**: Used for parsing the page source once Selenium loads it.
- **Pandas**: Used for structuring and saving the scraped data.
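- **webdriver-manager**: Used for downloading and locating a matching ChromeDriver.
- **OS**: Used for file and directory handling.
- **urllib.parse**: Used for encoding climber names into the search URL.
- **fuzzywuzzy**: Used for fuzzy name matching.

Delays in page interactions are handled with Selenium's `WebDriverWait` rather than fixed sleeps.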


In [1]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
import os
import urllib.parse
from fuzzywuzzy import fuzz

Next, we define a function to preprocess the climbers' names from the `ifsc_climbers.csv` file, converting them to a more searchable form by capitalizing the first letter of each word and lowercasing the rest.

In [2]:
def capitalize_name(name):
    return " ".join(word.capitalize() for word in name.split())
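
A quick usage sketch of this helper is shown here. The third name is a made-up example, included only to show a limitation of `str.capitalize`: the part after a hyphen is lowercased.

```python
def capitalize_name(name):
    return " ".join(word.capitalize() for word in name.split())

print(capitalize_name("TOBY ROBERTS"))        # Toby Roberts
print(capitalize_name("hannes van duysen"))   # Hannes Van Duysen
print(capitalize_name("jean-pierre dupont"))  # Jean-pierre Dupont
```

Because matching is later done on lowercased strings, this limitation mainly affects the printed output and the keys used for the nickname lookup.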

Next, we create a lookup table for known nicknames.

In [3]:
NICKNAME_LOOKUP = {
    "Nikolay Rusev": ["Niki Rusev"],
    "Alexander Megos": ["Alex Megos"],
}
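
The table is consulted with a case-insensitive substring check. The sketch below isolates that check in a hypothetical helper, `matches_nickname`; in the notebook the same logic sits inline in the search function.

```python
NICKNAME_LOOKUP = {
    "Nikolay Rusev": ["Niki Rusev"],
    "Alexander Megos": ["Alex Megos"],
}

def matches_nickname(ifsc_name: str, profile_name: str) -> bool:
    """True if any known nickname for `ifsc_name` appears in the profile name."""
    nicknames = NICKNAME_LOOKUP.get(ifsc_name, [])
    return any(nick.lower() in profile_name.lower() for nick in nicknames)

print(matches_nickname("Alexander Megos", "Alex Megos"))   # True
print(matches_nickname("Toby Roberts", "Toby Roberts"))    # False
```

Note that the keys must match the capitalized form of the IFSC name exactly, since the lookup itself is an exact dictionary access.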

Next, let's define a function that appends each climber's profile link to the CSV file incrementally, so that results already found are kept if the run is interrupted.

In [4]:
def append_to_csv(data, output_file):
    """Append a single row to the CSV file."""
    df = pd.DataFrame([data])
    if os.path.exists(output_file):
        df.to_csv(output_file, mode='a', header=False, index=False)
    else:
        df.to_csv(output_file, mode='w', header=True, index=False)
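
The same append-or-create pattern can be tried without pandas. The sketch below swaps in the standard-library `csv` module and a temporary directory; the function name `append_row` is hypothetical.

```python
import csv
import os
import tempfile

def append_row(row: dict, output_file: str) -> None:
    """Append one row, writing the header only when the file is first created."""
    is_new_file = not os.path.exists(output_file)
    with open(output_file, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new_file:
            writer.writeheader()
        writer.writerow(row)

# Usage on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "profiles.csv")
    append_row({"name": "Toby Roberts",
                "possible_profile_link_1": "https://www.8a.nu/user/toby-roberts-e1619"}, path)
    append_row({"name": "Jakob Schubert",
                "possible_profile_link_1": "https://www.8a.nu/user/jakob-schubert"}, path)
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

print(rows[0])    # ['name', 'possible_profile_link_1']
print(len(rows))  # 3
```

Because rows are only ever appended, rerunning the notebook without deleting the output file adds duplicate rows for climbers that were already processed.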

#### Climber Profile Search Function
To identify a climber's profile on 8a.nu, let's define a function that:
- **Constructs the search URL** and navigates to it using Selenium.
- **Parses the search results** with BeautifulSoup to extract profile links.
- **Calculates name similarity** using fuzzy matching and checks for known nicknames.
- **Filters candidates** to include only profiles with at least one recorded ascent.
- **Selects the most probable profile** based on similarity score, and reports whether its country matches the climber's IFSC country.

This function will be called separately for each climber.

In [5]:
def search_8a_nu(climber_name, country, driver, output_file, similarity_threshold=90):
    """Search 8a.nu for a climber and save the highest probable profile link with ascents to CSV."""
    # Capitalize only first letters
    climber_name = capitalize_name(climber_name)
    encoded_name = urllib.parse.quote(climber_name)
    search_url = f"https://www.8a.nu/search/users?query={encoded_name}"
    print(f"Searching for {climber_name} from {country} on 8a.nu...")

    try:
        # Navigate directly to the search URL
        driver.get(search_url)

        # Wait for search results table rows to load
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.TAG_NAME, "tr"))
        )

        # Parse the search results page
        soup = BeautifulSoup(driver.page_source, "html.parser")

        # Find climber profile links with ascent counts
        candidates = []
        result_rows = soup.find_all("tr")
        nicknames = NICKNAME_LOOKUP.get(climber_name, [])  # Get possible nicknames
        for row in result_rows:
            name_link = row.find("a", href=lambda href: href and "/user/" in href)
            if name_link:
                link_text = name_link.text.strip()
                link_href = name_link["href"]
                # Calculate similarity with IFSC name or check nicknames
                similarity = fuzz.partial_ratio(climber_name.lower(), link_text.lower())
                is_nickname = any(nick.lower() in link_text.lower() for nick in nicknames)
                if similarity >= similarity_threshold or is_nickname:
                    # Extract country from the row
                    country_td = row.find("td", class_="col-flag")
                    found_country = "N/A"
                    country_code = "N/A"
                    if country_td:
                        country_text = country_td.text.strip()
                        found_country = country_text if country_text else "N/A"
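                        country_span = country_td.find("span", class_=lambda x: x and x.startswith("f-"))
                        if country_span:
                            # Take the class that actually carries the flag code (prefixed with "f-"),
                            # which is not necessarily the first class on the span
                            flag_classes = [c for c in country_span.get("class", []) if c.startswith("f-")]
                            if flag_classes:
                                country_code = flag_classes[0].split("-", 1)[1].upper()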

                    # Extract ascent count
                    ascent_td = row.find("td", class_="col-ascents")
                    ascent_count = 0
                    if ascent_td:
                        ascent_text = ascent_td.text.strip().replace(" ", "")
                        try:
                            ascent_count = int(ascent_text) if ascent_text else 0
                        except ValueError:
                            ascent_count = 0

                    # Only include profiles with ascents > 0
                    full_url = f"https://www.8a.nu{link_href}"
                    if ascent_count > 0:
                        candidates.append({
                            "url": full_url,
                            "name": link_text,
                            "similarity": similarity if not is_nickname else 100,  # Nicknames get max score
                            "country": found_country,
                            "country_code": country_code,
                            "ascents": ascent_count,
                            "verified": country == found_country or country == country_code
                        })
                        print(f"Found candidate profile for {climber_name}: {full_url} (Name: {link_text}, Similarity: {similarity}%, Country: {found_country}, Ascents: {ascent_count})")

        # Select the highest probable profile
        profile_link = None
        if candidates:
            best_candidate = max(candidates, key=lambda x: x["similarity"])  # Highest similarity
            profile_link = best_candidate["url"]
            if best_candidate["verified"]:
                print(f"Selected verified profile: {profile_link} (Similarity: {best_candidate['similarity']}%)")
            else:
                print(f"Selected potential profile: {profile_link} (Similarity: {best_candidate['similarity']}%, expected country: {country})")

        # Only save to CSV if a profile with ascents is found
        if profile_link:
            data = {"name": climber_name, "possible_profile_link_1": profile_link}
            print(f"Selected 1 profile with ascents for {climber_name}")
            append_to_csv(data, output_file)
            return data
        else:
            print(f"No profile with ascents found for {climber_name}")
            return None

    except Exception as e:
        print(f"Error searching for {climber_name}: {e}")
        return None
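
As written, `search_8a_nu` ranks candidates by similarity alone and uses the country match only to label the result as verified or potential. A hypothetical variation, not part of this notebook, could use the country match as a tie-breaker between equally similar candidates. The function name `pick_best` and the example URLs are placeholders.

```python
def pick_best(candidates):
    """Return the most similar candidate, preferring verified ones on ties."""
    if not candidates:
        return None
    # Tuples compare element by element; True sorts above False
    return max(candidates, key=lambda c: (c["similarity"], c["verified"]))

candidates = [
    {"url": "https://www.8a.nu/user/example-a", "similarity": 100, "verified": False},
    {"url": "https://www.8a.nu/user/example-b", "similarity": 100, "verified": True},
]

print(pick_best(candidates)["url"])
# https://www.8a.nu/user/example-b
```

With `max` keyed on similarity only, a tie goes to whichever candidate appears first in the search results.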

Now that we have defined our searching and data processing functions, we can:

- **Process the automated search** for each climber and save the climbers' profile links for scraping later.

In [6]:
def process_profiles(ifsc_dir="../data/ifsc_data", output_dir="../data/8anu_data"):
    """Process IFSC climbers and save their highest probable 8a.nu profile link with ascents to CSV."""
    climbers_df = pd.read_csv(f"{ifsc_dir}/ifsc_climbers.csv")

    output_file = f"{output_dir}/8a_nu_profiles.csv"
    os.makedirs(output_dir, exist_ok=True)

    # Initialize a single WebDriver instance
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service)

    total_climbers = len(climbers_df)
    processed_with_profiles = 0
    try:
        for index, row in climbers_df.iterrows():
            climber_name = row["name"]
            country = row["country"]
            result = search_8a_nu(climber_name, country, driver, output_file)
            if result:
                processed_with_profiles += 1
            print(f"Processed {index + 1}/{total_climbers} climbers ({processed_with_profiles} with profiles and ascents)")
    finally:
        driver.quit()  # Ensure driver closes even if an error occurs

Let's run the script.

In [7]:
print("Starting 8a.nu scraping process...")
process_profiles()
print("8a.nu scraping process completed!")

Starting 8a.nu scraping process...
Searching for Sorato Anraku from JPN on 8a.nu...
No profile with ascents found for Sorato Anraku
Processed 1/517 climbers (0 with profiles and ascents)
Searching for Dohyun Lee from KOR on 8a.nu...
No profile with ascents found for Dohyun Lee
Processed 2/517 climbers (0 with profiles and ascents)
Searching for Meichi Narasaki from JPN on 8a.nu...
No profile with ascents found for Meichi Narasaki
Processed 3/517 climbers (0 with profiles and ascents)
Searching for Tomoa Narasaki from JPN on 8a.nu...
No profile with ascents found for Tomoa Narasaki
Processed 4/517 climbers (0 with profiles and ascents)
Searching for Sohta Amagasa from JPN on 8a.nu...
No profile with ascents found for Sohta Amagasa
Processed 5/517 climbers (0 with profiles and ascents)
Searching for Toby Roberts from GBR on 8a.nu...
Found candidate profile for Toby Roberts: https://www.8a.nu/user/toby-roberts-e1619 (Name: Toby  Roberts, Similarity: 92%, Country: GBR, Ascents: 4)
Selected

Finally, we can preview our dataset, which contains climber names and their 8a.nu profile URLs.

In [8]:
# Load the saved dataset to preview the results
df = pd.read_csv("../data/8anu_data/8a_nu_profiles.csv")
df.head()

| Unnamed: 0 | name | profile_url |
|---|---|---|
| 0 | Toby Roberts | https://www.8a.nu/user/toby-roberts-e1619 |
| 1 | Anze Peharc | https://www.8a.nu/user/ane-peharc |
| 2 | Hannes Van Duysen | https://www.8a.nu/user/hannes-van-duysen |
| 3 | Jakob Schubert | https://www.8a.nu/user/jakob-schubert |
| 4 | Mejdi Schalck | https://www.8a.nu/user/mejdi-schalck |
