# Abstract

This project uses Selenium WebDriver to scrape football match data from a sports statistics website. The script opens the site, uses dropdown menus to filter the data by country and season, and extracts match dates, home and away teams, and scores from the result tables. The data is then organized into a pandas DataFrame and saved as a CSV file for further analysis. The project demonstrates how web scraping can automate the collection of sports statistics.

# Tools Used

- **Selenium WebDriver**: Automates browser interactions to navigate and scrape data from web pages.
- **ChromeDriver**: Provides the WebDriver implementation for Google Chrome, enabling browser automation.
- **Pandas**: A data manipulation and analysis library used to create and manage the DataFrame and export data to CSV.
- **Python**: The programming language used to write the script and perform the web scraping tasks.
- **Chrome Browser**: The web browser used for navigating and interacting with the website.


# Code

**Import Modules**: Imports the required classes from `selenium`, plus `time` and `pandas`, for browser automation and data handling.

In [2]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

**Set Paths**: Defines custom paths for the Chrome binary and ChromeDriver.

**Configure Driver**: Sets up ChromeDriver to use the custom Chrome binary.

**Open URL**: Navigates to the target website.


In [3]:
# Path to the custom Chrome binary (the testing build of Chrome)
chrome_test_path = "D:\\Webscraping\\chrome-win64\\chrome-win64\\chrome.exe"

# Path to the ChromeDriver executable
chromedriver_path = "D:\\Webscraping\\chromedriver-win64\\chromedriver.exe"

# Set ChromeOptions to use the specific Chrome binary
chrome_options = Options()
chrome_options.binary_location = chrome_test_path

# Initialize the ChromeDriver with the custom binary and service
service = Service(executable_path=chromedriver_path)
driver = webdriver.Chrome(service=service, options=chrome_options)


website = 'https://www.adamchoi.co.uk/teamgoals/detailed'
driver.get(website)


**Click Button**: Finds and clicks the button labeled "All matches."


**Select Country**: Chooses "England" from the country dropdown menu.


**Select Season**: Chooses "23/24" from the season dropdown menu.


In [4]:
all_matches_button = driver.find_element(By.XPATH, '//label[@analytics-event="All matches"]')
all_matches_button.click()

country_dropdown = Select(driver.find_element(By.ID, 'country'))
country_dropdown.select_by_visible_text('England')

time.sleep(3) # wait for 3 seconds

season_dropdown = Select(driver.find_element(By.ID, 'season'))
season_dropdown.select_by_visible_text('23/24')
time.sleep(3) # wait for 3 seconds


# XPath syntax: //tagName[@attributeName='value']
# To test an XPath, open the browser's developer tools, press Ctrl+F, and enter an expression such as //tr/td[1]
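
To make the XPath pattern concrete without opening a browser, the sketch below runs the same kind of expressions against a small, made-up HTML fragment using Python's standard-library `xml.etree.ElementTree`. That module supports only a limited XPath subset, so the expressions start with `.//` rather than `//`. The markup is a hypothetical stand-in, not the real page's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup that mimics the shape of the page; not the site's real HTML.
SAMPLE = """
<div>
  <label analytics-event="All matches">All matches</label>
  <table>
    <tr><td>12-08-2023</td><td>Arsenal</td><td>2 - 1</td><td>Nott'm Forest</td></tr>
    <tr><td>21-08-2023</td><td>Crystal Palace</td><td>0 - 1</td><td>Arsenal</td></tr>
  </table>
</div>
"""

root = ET.fromstring(SAMPLE)

# //tagName[@attributeName='value'] -> select a tag by one of its attribute values
label = root.find(".//label[@analytics-event='All matches']")

# //tr/td[1] -> the first <td> of every <tr> (XPath positions start at 1, not 0)
first_cells = [td.text for td in root.findall(".//tr/td[1]")]

print(label.text)
print(first_cells)
```

The same expressions, written with a leading `//`, are what `driver.find_element(By.XPATH, ...)` receives in the cell above.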

**Wait for Tables**: Waits up to 10 seconds for `<table>` elements to appear on the page, then returns every table found at that moment.

In [5]:
# Wait up to 10 seconds for <table> elements to appear, then collect all of them
tables = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.TAG_NAME, "table"))
)
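
Conceptually, an explicit wait polls a condition until it returns something truthy or the timeout expires. The sketch below illustrates that idea in plain Python. The names `wait_until` and `fake_find_tables` are hypothetical and exist only for this illustration; this is not Selenium's actual implementation.

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)

# Hypothetical stand-in for a page whose tables only appear on the third check.
attempts = {"count": 0}

def fake_find_tables():
    attempts["count"] += 1
    return ["table-1", "table-2"] if attempts["count"] >= 3 else []

tables_found = wait_until(fake_find_tables, timeout=2.0, poll_interval=0.01)
print(tables_found, "after", attempts["count"], "checks")
```

Compared with a fixed `time.sleep(3)`, this style returns as soon as the condition holds and fails loudly when it never does.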

**Iterate Tables**: Loops through each table found.

**Extract Rows**: Finds all rows (`<tr>`) in the table.

**Extract Columns**: For each row, finds and prints the text in all columns (`<td>`), separated by `|`.



In [6]:

# Iterate through each table and print its rows
for index, table in enumerate(tables):
    print(f"\nTable {index + 1}:")

    rows = table.find_elements(By.TAG_NAME, "tr")

    for row in rows:
        columns = row.find_elements(By.TAG_NAME, "td")

        for column in columns:
            print(column.text, end=' | ')
        print()  # move to the next line after each row


Table 1:
12-08-2023 | Arsenal | 2 - 1 | Nott'm Forest | 
21-08-2023 | Crystal Palace | 0 - 1 | Arsenal | 
26-08-2023 | Arsenal | 2 - 2 | Fulham | 
03-09-2023 | Arsenal | 3 - 1 | Man United | 
17-09-2023 | Everton | 0 - 1 | Arsenal | 
24-09-2023 | Arsenal | 2 - 2 | Tottenham | 
30-09-2023 | Bournemouth | 0 - 4 | Arsenal | 
08-10-2023 | Arsenal | 1 - 0 | Man City | 
21-10-2023 | Chelsea | 2 - 2 | Arsenal | 
28-10-2023 | Arsenal | 5 - 0 | Sheffield United | 
04-11-2023 | Newcastle | 1 - 0 | Arsenal | 
11-11-2023 | Arsenal | 3 - 1 | Burnley | 
25-11-2023 | Brentford | 0 - 1 | Arsenal | 
02-12-2023 | Arsenal | 2 - 1 | Wolves | 
05-12-2023 | Luton | 3 - 4 | Arsenal | 
09-12-2023 | Aston Villa | 1 - 0 | Arsenal | 
17-12-2023 | Arsenal | 2 - 0 | Brighton | 
23-12-2023 | Liverpool | 1 - 1 | Arsenal | 
28-12-2023 | Arsenal | 0 - 2 | West Ham | 
31-12-2023 | Fulham | 2 - 1 | Arsenal | 
20-01-2024 | Arsenal | 5 - 0 | Crystal Palace | 
30-01-2024 | Nott'm Forest | 1 - 2 | Arsenal | 
04-02-2024 | A

**Initialize Lists**: Creates empty lists for `date`, `home_team`, `score`, and `away_team`.

**Extract Data**: Loops through each table and each row to extract data from columns and append it to the respective lists.

**Quit Driver**: Closes the browser and ends the WebDriver session.


In [7]:
date = []
home_team = []
score = []
away_team = []

for table in tables:
    rows = table.find_elements(By.TAG_NAME, "tr")

    for row in rows:
        date.append(row.find_element(By.XPATH, './td[1]').text)
        home_team.append(row.find_element(By.XPATH, './td[2]').text)
        score.append(row.find_element(By.XPATH, './td[3]').text)
        away_team.append(row.find_element(By.XPATH, './td[4]').text)
        

driver.quit()
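
The extraction loop assumes every `<tr>` contains four `<td>` cells. In Selenium, `find_element` raises `NoSuchElementException` when a cell is missing, for example on a header row made of `<th>` cells. One defensive option, sketched here on plain lists of cell text rather than live elements, is to skip rows that do not have exactly four cells. The helper name `rows_to_columns` and the sample rows are hypothetical.

```python
def rows_to_columns(rows):
    """Split rows of [date, home, score, away] text into four parallel lists,
    skipping rows that do not have exactly four cells (e.g. header rows)."""
    date, home_team, score, away_team = [], [], [], []
    for cells in rows:
        if len(cells) != 4:
            continue  # header or malformed row: nothing to extract
        date.append(cells[0])
        home_team.append(cells[1])
        score.append(cells[2])
        away_team.append(cells[3])
    return date, home_team, score, away_team

# Sample rows: one empty header-like row followed by two match rows.
sample_rows = [
    [],
    ["12-08-2023", "Arsenal", "2 - 1", "Nott'm Forest"],
    ["21-08-2023", "Crystal Palace", "0 - 1", "Arsenal"],
]
date, home_team, score, away_team = rows_to_columns(sample_rows)
print(date, home_team, score, away_team)
```

With Selenium, each `cells` list could come from `[td.text for td in row.find_elements(By.TAG_NAME, "td")]`, since `find_elements` returns an empty list instead of raising when nothing matches.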

**Checking Score List**

In [10]:
score[:10]

['2 - 1',
 '0 - 1',
 '2 - 2',
 '3 - 1',
 '0 - 1',
 '2 - 2',
 '0 - 4',
 '1 - 0',
 '2 - 2',
 '5 - 0']
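
Each score is a string such as `'2 - 1'`, so numeric analysis needs the two goal counts as integers. A minimal sketch, assuming every score follows the `home - away` pattern seen in the output above; `parse_score` is a hypothetical helper name.

```python
def parse_score(text):
    """Convert a score string like '2 - 1' into (home_goals, away_goals)."""
    home, away = text.split("-")
    return int(home.strip()), int(away.strip())

sample_scores = ['2 - 1', '0 - 1', '2 - 2', '3 - 1', '0 - 4']
goals = [parse_score(s) for s in sample_scores]

# Separate lists are convenient if the values later become DataFrame columns.
home_goals = [h for h, _ in goals]
away_goals = [a for _, a in goals]
print(home_goals, away_goals)
```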

**Create Dictionary**: Constructs a dictionary with `date`, `home_team`, `score`, and `away_team` lists.


**Create DataFrame**: Converts the dictionary into a pandas DataFrame.


In [11]:
# Named `data` rather than `dict` to avoid shadowing the built-in type
data = {'date': date, 'home_team': home_team, 'score': score, 'away_team': away_team}
df = pd.DataFrame(data)

**Completed Data Frame**

In [12]:
df

|     | date       | home_team      | score | away_team      |
|-----|------------|----------------|-------|----------------|
| 0   | 12-08-2023 | Arsenal        | 2 - 1 | Nott'm Forest  |
| 1   | 21-08-2023 | Crystal Palace | 0 - 1 | Arsenal        |
| 2   | 26-08-2023 | Arsenal        | 2 - 2 | Fulham         |
| 3   | 03-09-2023 | Arsenal        | 3 - 1 | Man United     |
| 4   | 17-09-2023 | Everton        | 0 - 1 | Arsenal        |
| ... | ...        | ...            | ...   | ...            |
| 755 | 24-04-2024 | Wolves         | 0 - 1 | Bournemouth    |
| 756 | 27-04-2024 | Wolves         | 2 - 1 | Luton          |
| 757 | 04-05-2024 | Man City       | 5 - 1 | Wolves         |
| 758 | 11-05-2024 | Wolves         | 1 - 3 | Crystal Palace |
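
The `date` column holds day-first strings such as `12-08-2023`. If the dates need to be sorted or filtered, they can be parsed with the standard library. This sketch assumes the `DD-MM-YYYY` layout shown in the table holds for every row; the sample values are taken from the output above.

```python
from datetime import datetime

sample_dates = ["12-08-2023", "20-01-2024", "03-09-2023"]

# %d-%m-%Y matches day-month-year, so 12-08-2023 is 12 August 2023
parsed = [datetime.strptime(d, "%d-%m-%Y").date() for d in sample_dates]

# Parsed dates sort chronologically; sorting the raw strings would put 03-09-2023 first.
in_order = sorted(parsed)
print([d.isoformat() for d in in_order])
```

In pandas, the equivalent conversion is `pd.to_datetime(df['date'], format='%d-%m-%Y')`.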


**Save DataFrame**: Exports the DataFrame to a CSV file named `football_data.csv` without including the index.

In [None]:
df.to_csv('football_data.csv', index=False)

# What I Learned

- **How to Use Selenium**: I learned how to use Selenium to control a web browser automatically, which helps in scraping data from websites.
- **Extracting Data**: I figured out how to locate elements on a web page, interact with them (clicking buttons, selecting options from dropdown menus), and grab the text I needed from table cells.
- **Handling Data with Pandas**: I got the hang of using pandas to organize the data I scraped, turning it into a table and saving it as a CSV file.
- **Setting Up Browser Automation**: I set up ChromeDriver to work with Selenium, allowing me to automate browsing tasks.
- **Real-World Application**: I saw how these techniques can be used to collect and work with data from real websites.


# Conclusion

In this project, I successfully used web scraping to gather football match data from a website. By setting up Selenium and ChromeDriver, I was able to automate browsing and extract information like match dates, teams, and scores. I then used pandas to organize this data into a neat table and saved it as a CSV file. This experience showed me how to automate data collection and handle it efficiently, making it easier to analyze and use in future projects.
