# Coach Level Scraping

As a first draft, the following code extracts data from the Super League game **Servette FC - Lugano (23.12.2023, Result = 2:2)**.

In [1]:
import time
import os
import re
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

# Specify the path to the directory containing the ChromeDriver executable
chrome_driver_directory = "C:/Users/moren/Downloads/chromedriver-win64"  # Insert your own path here (user moreno: 'moren')

# Add the ChromeDriver directory to the PATH environment variable
os.environ["PATH"] += os.pathsep + chrome_driver_directory



## Line-Ups per Game

TO DO:
- MySQL connection or CSV file

**Page Link:** https://www.transfermarkt.com/servette-fc_fc-lugano/aufstellung/spielbericht/4089797 

**Description:** Shows the line-ups of both teams and their substitutes, along with statistics such as average age and market value.



We aim to extract the following attributes for each Game:

Table **lineups_df**
- Position (GK, CB, RM, CF, etc.)
- Player Name
- Player Age
- Market Value (in Euros)
- Club
- H/A (Home Team / Away Team)
- Status (Starting 11 or Substitute)

Table **lineups_stats_df**
- Club
- H/A (Home Team / Away Team)
- Manager
- Foreigners Starting (Amount of Foreigners in Starting LineUp)
- Foreigners Subs (Amount of Foreigners as Subs)
- Avg Age Starting
- Avg Age Subs
- Purchase Value Starting (Aggregated value of players in Starting LineUp that have been purchased by Club [in EUR])
- Purchase Value Subs (Aggregated value of players as Subs that have been purchased by Club [in EUR])
- Total Market Value Starting
- Total Market Value Subs
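
The market and purchase values are scraped as strings such as `€300k` or `€2.00m`, while the attribute lists above ask for amounts in euros. A minimal sketch of one possible conversion follows; the helper name `parse_eur` is hypothetical, and it assumes that only the `k` and `m` suffixes seen in this notebook's output occur.

```python
import re

# Multipliers for the suffixes seen in the scraped values
# (assumption: only "k" and "m" occur, as in "€300k" and "€2.00m").
SUFFIX_FACTORS = {"k": 1_000, "m": 1_000_000}

def parse_eur(value):
    """Convert a string such as '€2.00m' into an integer amount in euros.

    Returns None for missing or unrecognised values.
    """
    if value is None:
        return None
    match = re.fullmatch(r"€\s*(\d+(?:\.\d+)?)\s*(k|m)?", value.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    factor = SUFFIX_FACTORS.get(suffix, 1)  # no suffix: value is already in euros
    return round(float(number) * factor)
```

Applied to a DataFrame column, this could look like `lineups_df['Market Value'].map(parse_eur)`, leaving unparsable entries as missing values.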

In [6]:
## PAGE NAVIGATION ##
# Initialize the Chrome driver
driver = webdriver.Chrome()

# Navigate to the tm page
driver.get('https://www.transfermarkt.com/servette-fc_fc-lugano/aufstellung/spielbericht/4089797') 

# Wait for page to load
time.sleep(2) 

# Wait for the iframe to be present and switch to it
wait = WebDriverWait(driver, 10)
iframe = wait.until(EC.presence_of_element_located((By.ID, "sp_message_iframe_953358")))
driver.switch_to.frame(iframe)

# Now wait for the 'Accept & continue' button to be clickable inside the iframe
accept_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@class, 'accept')]")))
accept_button.click()

# Switch back to the main document
driver.switch_to.default_content()

## SCRAPING ## 

# Extract the home and away club names from the 'title' attribute
home_club_name = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[1]/div/div/div[2]/div[1]/a[2]').get_attribute("title")
away_club_name = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[1]/div/div/div[2]/div[3]/a[2]').get_attribute("title")

# Function to extract data from a table given its rows
def extract_table_data(table_rows, club_name):
    positions = []
    players = []
    ages = []
    market_values = []
    club_names = [club_name] * len(range(0, len(table_rows), 3))  # One club name per player row, matching the loop below
    
    for i in range(0, len(table_rows), 3):  # Increment by 3 for each player's data set
        cells = table_rows[i].find_elements(By.TAG_NAME, "td")
        player_info = cells[1].text
        name_age_parts = player_info.split(' (')
        player_name = name_age_parts[0].strip()
        age_part = name_age_parts[1] if len(name_age_parts) > 1 else ''
        age_match = re.search(r'(\d+) years old', age_part)
        age = age_match.group(1) if age_match else None

        position_market_value = cells[4].text
        if ', ' in position_market_value:
            position, market_value = position_market_value.split(', ')
        else:
            position = position_market_value
            market_value = None
        
        players.append(player_name)
        ages.append(age)
        positions.append(position)
        market_values.append(market_value)
    
    return pd.DataFrame({
        'Position': positions,
        'Player': players,
        'Age': ages,
        'Market Value': market_values,
        'Club': club_names 
    })

# XPath or CSS Selector for each table
tables_xpaths = {
    'starting_lineup_home': '//*[@id="main"]/main/div[5]/div[1]/div/div[1]/table', 
    'substitutes_home': '//*[@id="main"]/main/div[6]/div[1]/div/div[1]/table',
    'starting_lineup_away': '//*[@id="main"]/main/div[5]/div[2]/div/div[1]/table',
    'substitutes_away': '//*[@id="main"]/main/div[6]/div[2]/div/div[1]/table'
}

all_tables_df = []

# Loop through the table paths and extract data
for key, value in tables_xpaths.items():
    table = driver.find_element(By.XPATH, value)
    rows = table.find_elements(By.TAG_NAME, "tr")
    team_type = 'Home' if 'home' in key else 'Away'
    club_name = home_club_name if 'home' in key else away_club_name
    df = extract_table_data(rows, club_name)
    df['H/A'] = team_type
    df['Status'] = 'Starting' if 'starting' in key else 'Substitute'
    all_tables_df.append(df)

# Combine all dataframes
lineups_df = pd.concat(all_tables_df, ignore_index=True)

# Convert 'Age' to int, handling missing or malformed data
lineups_df['Age'] = pd.to_numeric(lineups_df['Age'], errors='coerce').astype('Int64')


# home_club_name and away_club_name were already extracted above and are reused here

# Extract the home and away managers' names using the updated XPaths
home_manager_element = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[6]/div[1]/div/div/table/tbody/tr/td[1]/table/tbody/tr[1]/td[2]')
home_manager_name = home_manager_element.text
away_manager_element = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[6]/div[2]/div/div/table/tbody/tr/td[1]/table/tbody/tr[1]/td[2]')
away_manager_name = away_manager_element.text

# Extract additional information for both home and away teams
foreigners_starting_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[1]/div/div[2]/table/tbody/tr/td[1]').text
foreigners_subs_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[1]/div/div[2]/table/tbody/tr/td[1]').text
avg_age_starting_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[1]/div/div[2]/table/tbody/tr/td[2]').text
avg_age_subs_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[1]/div/div[2]/table/tbody/tr/td[2]').text
purchase_value_starting_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[1]/div/div[2]/table/tbody/tr/td[3]').text
purchase_value_subs_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[1]/div/div[2]/table/tbody/tr/td[3]').text
total_market_value_starting_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[1]/div/div[2]/table/tbody/tr/td[4]').text
total_market_value_subs_home = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[1]/div/div[2]/table/tbody/tr/td[4]').text

foreigners_starting_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[2]/div/div[2]/table/tbody/tr/td[1]').text
foreigners_subs_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[2]/div/div[2]/table/tbody/tr/td[1]').text
avg_age_starting_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[2]/div/div[2]/table/tbody/tr/td[2]').text
avg_age_subs_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[2]/div/div[2]/table/tbody/tr/td[2]').text
purchase_value_starting_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[2]/div/div[2]/table/tbody/tr/td[3]').text
purchase_value_subs_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[2]/div/div[2]/table/tbody/tr/td[3]').text
total_market_value_starting_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[4]/div[2]/div/div[2]/table/tbody/tr/td[4]').text
total_market_value_subs_away = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div[2]/div/div[2]/table/tbody/tr/td[4]').text

# Function to clean the extracted data by removing preceding text
def clean_data(text, keep_eur_sign=False):
    if keep_eur_sign:
        # Strip the known label prefixes and keep the value including the EUR sign
        for prefix in ('Purchase value: ', 'Total MV: '):
            if prefix in text:
                return text.replace(prefix, '')
        # No known prefix found: return the text unchanged instead of None
        return text
    # Use a regex to find the first number or percentage for the other columns
    match = re.search(r'\d+(\.\d+)?%?', text)
    return match.group(0) if match else text

# Create a DataFrame for the club and manager information along with the newly extracted data
lineups_stats_df = pd.DataFrame({
    'Club': [home_club_name, away_club_name],
    'H/A': ['Home', 'Away'],
    'Manager': [home_manager_name, away_manager_name],
    'Foreigners Starting': [clean_data(foreigners_starting_home), clean_data(foreigners_starting_away)],
    'Foreigners Subs': [clean_data(foreigners_subs_home), clean_data(foreigners_subs_away)],
    'Avg Age Starting': [clean_data(avg_age_starting_home), clean_data(avg_age_starting_away)],
    'Avg Age Subs': [clean_data(avg_age_subs_home), clean_data(avg_age_subs_away)],
    'Purchase Value Starting': [clean_data(purchase_value_starting_home, True), clean_data(purchase_value_starting_away, True)],
    'Purchase Value Subs': [clean_data(purchase_value_subs_home, True), clean_data(purchase_value_subs_away, True)],
    'Total Market Value Starting': [clean_data(total_market_value_starting_home, True), clean_data(total_market_value_starting_away, True)],
    'Total Market Value Subs': [clean_data(total_market_value_subs_home, True), clean_data(total_market_value_subs_away, True)]
})

## WRAP UP ##

# Close the driver after scraping is done
driver.quit()

# Print a success message
print("Webscraping successfully completed")

# Display the combined DataFrame
#lineups_df.head()
#lineups_stats_df.head()


Webscraping successfully completed


The scraping yields the following two tables, **lineups_df** and **lineups_stats_df**:

In [7]:
lineups_df

Unnamed: 0,Position,Player,Age,Market Value,Club,H/A,Status
0,Goalkeeper,Léo Besson,21.0,€100k,Servette FC,Home,Starting
1,Centre-Back,Anthony Baron,30.0,€300k,Servette FC,Home,Starting
2,Centre-Back,Nicolas Vouilloz,22.0,€2.00m,Servette FC,Home,Starting
3,Right-Back,Théo Magnin,20.0,€100k,Servette FC,Home,Starting
4,Central Midfield,Samba Diba,19.0,€300k,Servette FC,Home,Starting
5,Right Midfield,Bendegúz Bolla,24.0,€1.80m,Servette FC,Home,Starting
6,Left Winger,Hussayn Touati,22.0,€400k,Servette FC,Home,Starting
7,Left Winger,Tiemoko Ouattara,18.0,€300k,Servette FC,Home,Starting
8,Centre-Forward,Jérémy Guillemenot,25.0,€1.00m,Servette FC,Home,Starting
9,,,,,Servette FC,Home,Substitute


In [8]:
lineups_stats_df

Unnamed: 0,Club,H/A,Manager,Foreigners Starting,Foreigners Subs,Avg Age Starting,Avg Age Subs,Purchase Value Starting,Purchase Value Subs,Total Market Value Starting,Total Market Value Subs
0,Servette FC,Home,René Weiler,7,4,28.1,22.9,€420k,€500k,€14.20m,€6.30m
1,FC Lugano,Away,Mattia Croci-Torti,6,3,25.6,22.6,€7.77m,€475k,€12.20m,€5.01m


## Matchsheet (optional)

Page Link: https://www.transfermarkt.com/servette-fc_fc-lugano/index/spielbericht/4089797

Description: Shows match events such as goals, substitutions and cards.


We aim to extract the following attributes for each Game:

Table **events_df**
- Club
- H/A (Home Team / Away Team)
- Timestamp (of event in the game)*
- Event (Goal, Substitution, Card)
- Player Event (Name of the player relevant to the event)
- Remark Event (additional information)
- Player Assist (name of player if event = goal)
- Player Out (player substituted out if event = substitution)

\* *Events in stoppage time (45'+ and 90'+) get the timestamp 45' or 90', not the exact time such as 94' or 45'+2'.*
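
The match sheet does not expose the minute as text, so the scraping code below derives it from two pixel offsets in an element's `style` attribute. The following is a small worked sketch of that mapping, assuming (as the notebook's conversion function does) that each step of 36 px along x adds one minute and each step of 36 px along y adds ten. The function name `minute_from_style` and the example style strings are hypothetical.

```python
import re

CELL_PX = 36  # assumed step size, taken from the notebook's conversion function

def minute_from_style(style):
    """Return the minute encoded by the first two pixel offsets, or None."""
    offsets = [abs(int(px)) for px in re.findall(r"(-?\d+)px", style)]
    if len(offsets) < 2:
        return None  # not enough pixel values to derive a minute
    x_px, y_px = offsets[:2]
    units = x_px // CELL_PX + 1    # x step 0..9 -> 1..10
    tens = (y_px // CELL_PX) * 10  # y step 0, 1, 2, ... -> 0, 10, 20, ...
    return units + tens

# Hypothetical style string: x step 6, y step 2 -> 7 + 20 = 27
print(minute_from_style("background-position: -216px -72px;"))
```

Under this mapping every event lands on a whole minute of the grid, so an exact stoppage-time minute cannot be read from the offsets alone.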


In [9]:
## PAGE NAVIGATION ##
# Initialize the Chrome driver
driver = webdriver.Chrome()

# Navigate to the tm page
driver.get('https://www.transfermarkt.com/servette-fc_fc-lugano/index/spielbericht/4089797') 

# Wait for page to load
time.sleep(2) 

# Wait for the iframe to be present and switch to it
wait = WebDriverWait(driver, 10)
iframe = wait.until(EC.presence_of_element_located((By.ID, "sp_message_iframe_953358")))
driver.switch_to.frame(iframe)

# Now wait for the 'Accept & continue' button to be clickable inside the iframe
accept_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@class, 'accept')]")))
accept_button.click()

# Switch back to the main document
driver.switch_to.default_content()

## SCRAPING ## 

# Extracting club names
home_club_name = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div/div/div[1]/div[1]/div[2]/nobr/a').get_attribute("title")
away_club_name = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div/div/div[2]/div[1]/div[2]/nobr/a').get_attribute("title")

# Function to convert pixel values to minutes based on the pattern provided
def convert_px_to_minute(x_px, y_px):
    # No pixel offsets found in the style attribute: no timestamp can be derived
    if x_px is None or y_px is None:
        return None

    # Remove any non-numeric characters, convert to integer and drop the sign
    x_px = abs(int(re.sub(r'[^\d-]', '', str(x_px))))
    y_px = abs(int(re.sub(r'[^\d-]', '', str(y_px))))

    # Each 36 px step along x adds one minute, each 36 px step along y adds ten
    unit_minutes = (x_px // 36) + 1
    ten_minutes = (y_px // 36) * 10
    timestamp = f"{unit_minutes + ten_minutes}'"
    return timestamp


def extract_px_from_style(style_str):
    # Use regular expression to find all pixel values in the style string
    px_values = re.findall(r'-?\d+px', style_str)  # Include optional minus sign
    
    # Check if there are at least two pixel values
    if len(px_values) >= 2:
        x_px, y_px = [int(px.strip('px')) for px in px_values[:2]]  # Take the first two values
        return x_px, y_px
    else:
        # Handle the case when there are not enough values
        return None, None  # You can return None or some default values


# Function to extract events with Remark Event adjustment
def extract_events(event_type_xpath, event_type, home_club_name, away_club_name):
    events_list = driver.find_element(By.XPATH, event_type_xpath)
    events_items = events_list.find_elements(By.TAG_NAME, "li")
    events_data = []

    for item in events_items:
        team = "Home" if "heim" in item.get_attribute("class") else "Away"
        club = home_club_name if team == "Home" else away_club_name

        # Extract the style attribute for timestamp
        style_str = item.find_element(By.XPATH, ".//div/div[1]/span").get_attribute("style")
        x_px, y_px = extract_px_from_style(style_str)
        timestamp = convert_px_to_minute(x_px, y_px)

        player_event = "N/A"  # Default value if player name is not found
        player_out = None  # Initialize player_out to None
        remark_event = ""  # Initialize remark_event to empty string
        player_assist = None  # Ensure this variable is also initialized

        try:
            player_event_element = None
            full_text = item.find_element(By.XPATH, ".//div/div[4]").text.strip()
            if event_type == "Substitution":
                parts = full_text.split('\n')
                if len(parts) > 1:
                    player_out_part = parts[-1]
                    player_out_parts = player_out_part.split(', ')
                    if len(player_out_parts) > 1:
                        player_out = player_out_parts[0]
                        remark_event = player_out_parts[1]
                    else:
                        player_out = player_out_parts[0]
                player_event_element = item.find_element(By.XPATH, ".//div/div[4]/span[1]/a")
                player_event = player_event_element.get_attribute("title")


            else:
                player_event_element = item.find_element(By.XPATH, ".//div/div[4]/a")
                player_event = player_event_element.get_attribute("title")
                # Adjust this block to handle goals and cards specifically
                full_text = item.find_element(By.XPATH, ".//div/div[4]").text
                if event_type == "Goal":
                    parts = full_text.split(',')
                    if len(parts) > 2:  # If there are at least 3 parts, indicating a remark is present
                        remark_event = parts[1].strip()  # The part before the second ',' is the remark for goals
                        # Handling Assist information for goals
                        if "Assist:" in full_text:
                            assist_part = full_text.split('Assist:')[1].split(',')[0].strip()
                            player_assist = assist_part  # player_assist was initialized to None above
                    else:
                        remark_event = parts[0].strip() if len(parts) > 1 else ""
                else:
                    # For Cards, just an example, adjust as needed
                    remark_event = full_text.split(',')[-1].strip() if ',' in full_text else full_text
        except NoSuchElementException:
            pass



        card_type = event_type  # Default card type is the event type itself
        if event_type == "Card":
            card_span_class = item.find_element(By.XPATH, ".//div/div[2]/span").get_attribute("class")
            if "gelbrot" in card_span_class:
                card_type = "Yellow-Red Card"
            elif "gelb" in card_span_class and "rot" not in card_span_class:
                card_type = "Yellow Card"
            elif "rot" in card_span_class:
                card_type = "Direct Red Card"

        events_data.append({
            "Timestamp": timestamp,
            "Club": club,
            "H/A": team,
            "Event": card_type,
            "Player Event": player_event,
            "Remark Event": remark_event,
            "Player Assist": player_assist,
            "Player Out": player_out,
         }) 
    return events_data


all_events_data = []
event_types = {"Goal": '//*[@id="sb-tore"]/ul', "Substitution": '//*[@id="sb-wechsel"]/ul', "Card": '//*[@id="sb-karten"]/ul'}

# Iterate through each event type and extract data
for event_type, xpath in event_types.items():
    events_data = extract_events(xpath, event_type, home_club_name, away_club_name)
    all_events_data.extend(events_data)


# Create DataFrame and reorder columns to put 'Timestamp' third
events_df = pd.DataFrame(all_events_data)
columns_order = ['Club', 'H/A', 'Timestamp', 'Event', 'Player Event', 'Remark Event', 'Player Assist', 'Player Out']
events_df = events_df[columns_order]


## WRAP UP ##

# Close the driver after scraping is done
driver.quit()

# Print a success message
print("Webscraping successfully completed")

# Display the combined DataFrame
events_df.head()

Webscraping successfully completed


Unnamed: 0,Club,H/A,Timestamp,Event,Player Event,Remark Event,Player Assist,Player Out
0,Servette FC,Home,27',Goal,Chris Bedia,Penalty,,
1,Servette FC,Home,40',Goal,Chris Bedia,Left-footed shot,Keigo Tsunemoto,
2,FC Lugano,Away,53',Goal,Shkelqim Vladi,Right-footed shot,Renato Steffen,
3,FC Lugano,Away,84',Goal,Ayman El Wafi,Header,Renato Steffen,
4,FC Lugano,Away,46',Substitution,Johan Nkama,Tactical,,Yanis Cimignani


In [10]:
# Let's check the full event table
events_df

Unnamed: 0,Club,H/A,Timestamp,Event,Player Event,Remark Event,Player Assist,Player Out
0,Servette FC,Home,27',Goal,Chris Bedia,Penalty,,
1,Servette FC,Home,40',Goal,Chris Bedia,Left-footed shot,Keigo Tsunemoto,
2,FC Lugano,Away,53',Goal,Shkelqim Vladi,Right-footed shot,Renato Steffen,
3,FC Lugano,Away,84',Goal,Ayman El Wafi,Header,Renato Steffen,
4,FC Lugano,Away,46',Substitution,Johan Nkama,Tactical,,Yanis Cimignani
5,FC Lugano,Away,46',Substitution,Ayman El Wafi,Tactical,,Albian Hajdari
6,Servette FC,Home,71',Substitution,Bendegúz Bolla,Tactical,,Alexis Antunes
7,FC Lugano,Away,71',Substitution,Zan Celar,Tactical,,Shkelqim Vladi
8,FC Lugano,Away,78',Substitution,Romeo Morandi,Tactical,,Roman Macek
9,Servette FC,Home,81',Substitution,Samba Diba,Tactical,,Dereck Kutesa


And now take a beer, you deserved it big time! ;)

Let's try another game to see whether the code is generic and works for all of these pages:

In [11]:
## PAGE NAVIGATION ##
# Initialize the Chrome driver
driver = webdriver.Chrome()

# Navigate to the tm page
driver.get('https://www.transfermarkt.com/spielbericht/index/spielbericht/4089816') 

# Wait for page to load
time.sleep(2) 

# Wait for the iframe to be present and switch to it
wait = WebDriverWait(driver, 10)
iframe = wait.until(EC.presence_of_element_located((By.ID, "sp_message_iframe_953358")))
driver.switch_to.frame(iframe)

# Now wait for the 'Accept & continue' button to be clickable inside the iframe
accept_button = wait.until(EC.element_to_be_clickable((By.XPATH, "//button[contains(@class, 'accept')]")))
accept_button.click()

# Switch back to the main document
driver.switch_to.default_content()

## SCRAPING ## 

# Extracting club names
home_club_name = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div/div/div[1]/div[1]/div[2]/nobr/a').get_attribute("title")
away_club_name = driver.find_element(By.XPATH, '//*[@id="main"]/main/div[5]/div/div/div[2]/div[1]/div[2]/nobr/a').get_attribute("title")

# convert_px_to_minute, extract_px_from_style and extract_events are unchanged
# from the previous cell and are reused here; extract_events picks up the new
# global `driver` when it is called.


all_events_data = []
event_types = {"Goal": '//*[@id="sb-tore"]/ul', "Substitution": '//*[@id="sb-wechsel"]/ul', "Card": '//*[@id="sb-karten"]/ul'}

# Iterate through each event type and extract data
for event_type, xpath in event_types.items():
    events_data = extract_events(xpath, event_type, home_club_name, away_club_name)
    all_events_data.extend(events_data)


# Create DataFrame and reorder columns to put 'Timestamp' third
events_df = pd.DataFrame(all_events_data)
columns_order = ['Club', 'H/A', 'Timestamp', 'Event', 'Player Event', 'Remark Event', 'Player Assist', 'Player Out']
events_df = events_df[columns_order]


## WRAP UP ##

# Close the driver after scraping is done
driver.quit()

# Print a success message
print("Webscraping successfully completed")

# Display the combined DataFrame
events_df.head()

Webscraping successfully completed


Unnamed: 0,Club,H/A,Timestamp,Event,Player Event,Remark Event,Player Assist,Player Out
0,FC Basel 1893,Away,13',Goal,Thierno Barry,Right-footed shot,,
1,FC Basel 1893,Away,72',Goal,Thierno Barry,Right-footed shot,,
2,FC Winterthur,Home,79',Goal,Loïc Lüthi,Header,Samir Ramizi,
3,FC Basel 1893,Away,90',Goal,Renato Veiga,Left-footed shot,Leon Avdullahu,
4,FC Winterthur,Home,5',Substitution,Boubacar Fofana,Injury,,Aldin Turkes


In [12]:
events_df

Unnamed: 0,Club,H/A,Timestamp,Event,Player Event,Remark Event,Player Assist,Player Out
0,FC Basel 1893,Away,13',Goal,Thierno Barry,Right-footed shot,,
1,FC Basel 1893,Away,72',Goal,Thierno Barry,Right-footed shot,,
2,FC Winterthur,Home,79',Goal,Loïc Lüthi,Header,Samir Ramizi,
3,FC Basel 1893,Away,90',Goal,Renato Veiga,Left-footed shot,Leon Avdullahu,
4,FC Winterthur,Home,5',Substitution,Boubacar Fofana,Injury,,Aldin Turkes
5,FC Basel 1893,Away,62',Substitution,Jean-Kévin Augustin,Tactical,,Marvin Akahomen
6,FC Winterthur,Home,64',Substitution,Thibault Corbaz,Tactical,,Sayfallah Ltaief
7,FC Winterthur,Home,65',Substitution,Loïc Lüthi,Tactical,,Alexandre Jankewitz
8,FC Basel 1893,Away,70',Substitution,Yusuf Demir,Tactical,,Benjamin Kololli
9,FC Basel 1893,Away,70',Substitution,Roméo Beney,Tactical,,Juan Carlos Gauto


## Match statistics (optional)

Page Link: https://www.transfermarkt.com/servette-fc_fc-lugano/statistik/spielbericht/4089797

Description: