## Longest Network Requests on ESPN Subpages

**Steps:**

1. **Initializes the WebDriver with performance logging enabled**
   - Sets up the WebDriver to interact with the Chrome browser and enable performance logging.

2. **Defines the base URL and subpages to measure**
   - Lists the subpages of ESPN's website for network request capture.

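3. **Captures the longest network requests for each subpage**
   - Records the 10 longest XHR, Fetch, Script, and Stylesheet requests for each subpage, measured from the end of sending the request to the arrival of the response headers.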

4. **Stores the data in a dictionary**
   - Organizes the captured network request data for each subpage.

5. **Creates DataFrames for each page**
   - Structures the data using pandas DataFrames for easier manipulation and analysis.

6. **Generates bar charts for each page**
   - Plots each page's longest requests as a bar chart so that slow resources can be compared at a glance.
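
The capture and ranking in steps 3 and 4 can be sketched as a pure function that works on log entries already pulled from the browser, which makes the logic checkable without launching Chrome. This is a minimal sketch rather than the notebook's exact code: `top_requests`, `make_entry`, `TRACKED_TYPES`, and the `example.com` URLs are hypothetical names and sample data, and the entry layout is assumed to follow Chrome's `Network.responseReceived` events.

```python
import json

# Resource types kept by the filter; assumed to mirror the list used in the script.
TRACKED_TYPES = {"XHR", "Fetch", "Script", "Stylesheet"}


def top_requests(log_entries, limit=10):
    """Return the `limit` slowest (url, duration_ms) pairs, slowest first."""
    durations = {}
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        params = message.get("params", {})
        if params.get("type") not in TRACKED_TYPES:
            continue
        response = params.get("response", {})
        timing = response.get("timing")
        url = response.get("url")
        if not timing or url is None:
            continue
        try:
            # Wait between finishing the send and receiving the response headers.
            duration = timing["receiveHeadersEnd"] - timing["sendEnd"]
        except KeyError:
            continue
        # Keep only the slowest observation for each URL.
        if duration > durations.get(url, float("-inf")):
            durations[url] = duration
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


def make_entry(url, resource_type, send_end, headers_end):
    """Build a hypothetical entry shaped like one item of a performance log."""
    payload = {
        "message": {
            "method": "Network.responseReceived",
            "params": {
                "type": resource_type,
                "response": {
                    "url": url,
                    "timing": {"sendEnd": send_end, "receiveHeadersEnd": headers_end},
                },
            },
        }
    }
    return {"message": json.dumps(payload)}


# Made-up sample data: one URL seen twice, plus an image that the filter skips.
sample_log = [
    make_entry("https://example.com/a.js", "Script", 1.0, 51.0),
    make_entry("https://example.com/b.css", "Stylesheet", 2.0, 12.0),
    make_entry("https://example.com/a.js", "Script", 1.0, 81.0),
    make_entry("https://example.com/logo.png", "Image", 1.0, 500.0),
]

print(top_requests(sample_log, limit=2))
```

Keeping the parsing separate from the browser session means the same function could be fed the list returned by `driver.get_log("performance")` or a saved copy of it.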

**Safety Concerns:**

1. **No data modification on the website, only read and log**
   - The script only loads pages with `driver.get` and reads Chrome's performance log; it submits no forms and sends no requests beyond those a normal page visit triggers.

2. **Safe to run without altering site content or its network requests**
   - Performance logging observes requests passively, so it does not intercept, block, or rewrite them.

3. **Read-only actions when exploring the subpages**
   - Each subpage is opened once and nothing is clicked or typed, so the run leaves no changes behind on the site.

**Installs Required:**

- `pip install selenium`
- `pip install webdriver-manager`
- `pip install pandas`
- `pip install matplotlib`

**Other Requirements:**

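- **Ensure that the ChromeDriver version matches your installed Chrome browser version**
   - A ChromeDriver built for a different Chrome major version typically fails to start a session. The script obtains its driver through `ChromeDriverManager().install()`, which is intended to download a driver matching the installed browser, so a mismatch is most likely when a driver is supplied manually.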
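
A version check along these lines could confirm the match before a long run. It is a minimal sketch: `major_version` and `versions_match` are hypothetical helpers, the version strings are made-up examples, and reading the real values from `driver.capabilities["browserVersion"]` and `driver.capabilities["chrome"]["chromedriverVersion"]` is an assumption about what the session reports.

```python
def major_version(version_string):
    """Return the leading integer of a dotted version such as '126.0.6478.62'."""
    return int(version_string.strip().split(".")[0])


def versions_match(browser_version, driver_version):
    """Return True when Chrome and ChromeDriver share the same major version."""
    # The driver string may carry extra text after the number, for example
    # '126.0.6478.62 (build-info)', so only the first token is compared.
    driver_number = driver_version.split()[0]
    return major_version(browser_version) == major_version(driver_number)


# Hypothetical version strings for illustration only.
print(versions_match("126.0.6478.127", "126.0.6478.62 (build-info)"))
print(versions_match("126.0.6478.127", "125.0.6422.141 (build-info)"))
```

Only the major version is compared here, on the assumption that minor and patch differences are usually tolerated.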

**Helpful Links:**

- [Stack Overflow: Retrieving all network requests required to load a webpage using Python](https://stackoverflow.com/questions/68516062/retrieving-all-network-requests-required-to-load-a-webpage-using-python)
- [R.K. Engler: How to capture network traffic when scraping with Selenium and Python](https://www.rkengler.com/how-to-capture-network-traffic-when-scraping-with-selenium-and-python/)
- [PyPI: Selenium Wire](https://pypi.org/project/selenium-wire/)


In [2]:
import json
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import matplotlib.pyplot as plt

BASE_URL = 'https://www.espn.com'
SUBPAGES = {
    'Home': BASE_URL,
    'NFL': f'{BASE_URL}/nfl/',
    'NBA': f'{BASE_URL}/nba/',
    'MLB': f'{BASE_URL}/mlb/'
}

# Initialize WebDriver with performance logging enabled
options = Options()
options.add_argument('--headless')  # Runs Chrome in headless mode (comment out for debugging)
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--disable-gpu')
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})  # Enable performance logging

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)

# Function to capture the longest network requests
def capture_longest_requests(driver, url):
    try:
        driver.get(url)  # Navigate to the URL
        time.sleep(10)  # Fixed wait so late requests reach the log; it does not guarantee a complete load

        # Capture performance logs
        logs = driver.get_log("performance")
        network_requests = {}

        for log in logs:
            log_data = json.loads(log["message"])["message"]
            if log_data["method"] == "Network.responseReceived":
                req_url = log_data["params"]["response"]["url"]
                if log_data["params"]["type"] in ["XHR", "Fetch", "Script", "Stylesheet"]:
                    try:
                        timing = log_data["params"]["response"]["timing"]
                        if timing:
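                            # Timing values are millisecond offsets from the request start, so this
                            # difference is the wait for response headers, not the full download time
                            duration = timing["receiveHeadersEnd"] - timing["sendEnd"]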
                            # Store the longest duration for each URL
                            if req_url not in network_requests or network_requests[req_url] < duration:
                                network_requests[req_url] = duration
                    except KeyError:
                        pass

        # Find the top 10 longest-running unique requests
        top_10_requests = sorted(network_requests.items(), key=lambda x: x[1], reverse=True)[:10]
        return top_10_requests
    except Exception as e:
        print(f"An error occurred: {e}")
        return []

# Store the data in a dictionary
data = {}

# Capture the longest network requests for each subpage
for page, url in SUBPAGES.items():  # Iterate over each page and its URL
    print(f"Capturing network requests for {page} page...")
    top_requests = capture_longest_requests(driver, url)
    if top_requests:
        data[page] = top_requests
        print(f"Top 10 longest network requests on {page} page:")
        for req_url, req_time in top_requests:
            print(f"  {req_url}: {req_time:.2f} ms")
    else:
        print(f"Failed to capture network requests for {page} page")

driver.quit()  # Close the browser

# Create DataFrames for each page
dfs = {page: pd.DataFrame(requests, columns=["URL", "Duration (ms)"]) for page, requests in data.items()}



Capturing network requests for Home page...
Top 10 longest network requests on Home page:
  https://sp.auth.adobe.com/adobe-services/config/ESPN: 259.05 ms
  https://cdn.registerdisney.go.com/v4/responder.js: 243.30 ms
  https://cdn1.espn.net/fitt/42044d91aaee-release-07-22-2024.2.0.1061/client/watch/watch.syndicatedplayer.watch-efd2f49d.js: 177.94 ms
  https://cdn1.espn.net/fitt/42044d91aaee-release-07-22-2024.2.0.1061/client/watch/watch-340a76b8.js: 151.25 ms
  https://site.api.espn.com/apis/personalized/v2/scoreboard/header?_ceID=4379198&configuration=SITE_LEGACY&lang=en&region=us&contentorigin=espn&tz=America%2FNew_York&platform=web&showAirings=buy%2Clive%2Creplay&showZipLookup=true&buyWindow=1m&postalCode=76104: 148.52 ms
  https://cdn1.espn.net/fitt/42044d91aaee-release-07-22-2024.2.0.1061/client/watch/runtime-74bf48fc.js: 145.03 ms
  https://cdn1.espn.net/fitt/42044d91aaee-release-07-22-2024.2.0.1061/client/watch/espn-en.watch-b0bb9671.js: 130.71 ms
  https://dcf.espn.com/TWDC-D