<a href="https://colab.research.google.com/github/silvia-denanni/DI-Bootcamp-nov25/blob/main/W8D2ExercisesXPDynamicWebScraping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 1: Exploring JavaScript Variables and Data Types
**Instructions**

Create a JavaScript script that defines variables of different data types and logs them to the console.

- Create a new HTML file with a `<script>` tag.
- Inside the `<script>` tag, declare variables of different data types (String, Number, Boolean, Undefined, Null).
- Use `console.log()` to print each variable and its type to the browser console.
- Open the HTML file in a web browser and inspect the console output.

```
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>JavaScript Variables and Data Types</title>
</head>
<body>
  <script>
    // Declare variables of different data types
    let myString = "Hello, JavaScript!"; // String
    let myNumber = 42;                    // Number
    let myBoolean = true;                 // Boolean
    let myUndefined;                      // Undefined (no value assigned)
    let myNull = null;                    // Null

    // Log each variable and its type to the console
    console.log(myString, typeof myString);
    console.log(myNumber, typeof myNumber);
    console.log(myBoolean, typeof myBoolean);
    console.log(myUndefined, typeof myUndefined);
    console.log(myNull, typeof myNull); // Note: typeof null returns "object" due to JS quirk
  </script>
</body>
</html>
```




# Exercise 2: JavaScript Page vs. HTML Page
**Instructions**

Compare the behavior of a static HTML page with a JavaScript-enhanced HTML page.


Create two HTML files – one with only HTML content and another with HTML and JavaScript.

In the first file, create a static page with headings, paragraphs, and a list.

In the second file, add JavaScript to dynamically modify one of the elements on page load (e.g., change the text of a heading).

Open both files in a web browser and observe the differences in behavior and content rendering.

**Expected Outcome**

The static HTML page should display content as is, whereas the JavaScript-enhanced page should show dynamically altered content, illustrating the interactivity added by JavaScript.

## A. Static HTML Page (static.html)
This page contains only HTML elements: headings, paragraphs, and a list. The content is fixed and does not change after loading.

```
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Static HTML Page</title>
</head>
<body>
  <h1>Welcome to My Static Page</h1>
  <p>This page contains only static HTML content.</p>
  <ul>
    <li>HTML is static</li>
    <li>No interactivity</li>
    <li>Content does not change</li>
  </ul>
</body>
</html>

```



## B. JavaScript-Enhanced HTML Page (dynamic.html)

This page has the same initial HTML content but includes JavaScript that dynamically changes the heading text when the page loads, illustrating interactivity.



```
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>JavaScript-Enhanced Page</title>
</head>
<body>
  <h1 id="main-heading">Welcome to My Static Page</h1>
  <p>This page contains HTML content enhanced with JavaScript.</p>
  <ul>
    <li>HTML is static</li>
    <li>JavaScript adds interactivity</li>
    <li>Content can change dynamically</li>
  </ul>

  <script>
    // Change the heading text dynamically on page load
    document.getElementById('main-heading').innerText = "Welcome to My Dynamic Page!";
  </script>
</body>
</html>
```



# Exercise 3: Scrape Dynamic Content from Rotten Tomatoes

**Task:**

- Use Selenium to navigate to the Rotten Tomatoes Certified Fresh Movies page.
- Extract the HTML content after it’s fully loaded.
- Use BeautifulSoup to parse and extract the movie titles, scores, and release dates.

**Instructions**
- Set up Selenium WebDriver and navigate to the Rotten Tomatoes page.
- Extract the HTML content using `driver.page_source`.
- Parse the HTML with BeautifulSoup.
- Find and extract the desired movie information.
- Print the extracted data.


The first step is to install Selenium to ensure the necessary libraries are available for WebDriver setup.



In [18]:
pip install selenium



Install `webdriver-manager` to download a ChromeDriver build that matches the installed Chrome version and to configure its path automatically.



In [17]:
pip install webdriver-manager



## Install Google Chrome

### Subtask:
Install Google Chrome directly using `wget` and `dpkg -i` to bypass APT repository and GPG key issues, and then resolve any dependencies.


In [22]:
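import subprocess

# Download the latest Google Chrome .deb package.
# check=True raises CalledProcessError if wget fails, so the message below only prints on success.
subprocess.run(
    ['wget', 'https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb'],
    check=True,
)

print('Google Chrome .deb package downloaded.')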

Google Chrome .deb package downloaded.


**Reasoning**:
Now that the Google Chrome .deb package is downloaded, the next step is to install it using `dpkg -i` as per the instructions.



In [23]:
import subprocess

# Install the downloaded .deb package
subprocess.run(['sudo', 'dpkg', '-i', 'google-chrome-stable_current_amd64.deb'])

print('Google Chrome .deb package installation initiated.')

Google Chrome .deb package installation initiated.


**Reasoning**:
The previous step initiated the installation of Google Chrome, and now it's necessary to resolve any potential broken dependencies as instructed by running `apt-get install -f`.



In [24]:
import subprocess

# Resolve any broken dependencies
subprocess.run(['sudo', 'apt-get', 'install', '-f', '-y'])

print('Attempting to resolve broken dependencies.')

Attempting to resolve broken dependencies.


In [35]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time
from webdriver_manager.chrome import ChromeDriverManager

# Set up Selenium WebDriver with ChromeDriverManager
service = Service(ChromeDriverManager().install())
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run in headless mode (optional)
options.add_argument('--no-sandbox') # Required for running as root/headless
options.add_argument('--disable-dev-shm-usage') # Required for running in environments with limited /dev/shm
options.add_argument('--window-size=1920,1080') # Set a specific window size to simulate a larger display
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36') # Add user-agent
driver = webdriver.Chrome(service=service, options=options)

try:
    # Navigate to Rotten Tomatoes Certified Fresh Movies page
    url = 'https://www.rottentomatoes.com/browse/movies_in_theaters/critics:certified_fresh~sort:popular'
    driver.get(url)

    # Maximize window for better rendering and element visibility
    driver.maximize_window()
    print("Window maximized.")

    # Give the page some initial time to load its basic structure
    time.sleep(10) # Increased initial sleep

    # Try to dismiss a potential cookie consent banner or other overlay
    try:
        # Look for a button with text 'Accept' or 'Agree' (common for cookie banners)
        # Also try more generic "Got it" or "Close" selectors
        accept_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//button[contains(., 'Accept')] | //button[contains(., 'Agree')] | //button[contains(., 'Got it')] | //button[contains(@aria-label, 'Close')] | //button[text()='I Accept'] | //button[contains(text(), 'I Accept')] | //button[contains(text(), 'Accept All')] "))
        )
        accept_button.click()
        print("Clicked 'Accept/Agree/Got it/Close' button on cookie banner/modal.")
        time.sleep(3) # Give time for the banner to disappear
    except Exception as e:
        # No banner found (or it was not clickable): report it and continue
        print(f"No common 'Accept/Agree/Got it/Close' button found or not clickable within 10 seconds. Proceeding... Error: {e}")

    # Robust scrolling to load all dynamic content
    last_height = driver.execute_script("return document.body.scrollHeight")
    scroll_attempts = 0
    max_scroll_attempts = 30 # Further increased max attempts
    previous_movie_count = 0

    print("Starting continuous scroll...")
    while scroll_attempts < max_scroll_attempts:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(5) # Increased sleep after scroll for more content to load

        new_height = driver.execute_script("return document.body.scrollHeight")
        current_movie_count = len(driver.find_elements(By.CSS_SELECTOR, 'a[data-qa="discovery-media-list-item"]'))

        print(f"Scrolled {scroll_attempts+1} times. Current height: {new_height}, Movies found: {current_movie_count}")

        if new_height == last_height and current_movie_count == previous_movie_count:
            # If height and movie count haven't changed, try to scroll again after a short pause
            time.sleep(2)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            new_height = driver.execute_script("return document.body.scrollHeight")
            current_movie_count = len(driver.find_elements(By.CSS_SELECTOR, 'a[data-qa="discovery-media-list-item"]'))
            if new_height == last_height and current_movie_count == previous_movie_count:
                print(f"Scroll height and movie count did not change after {scroll_attempts+1} attempts. Breaking scroll loop.")
                break # Truly at the end of the scroll or no more content to load

        last_height = new_height
        previous_movie_count = current_movie_count
        scroll_attempts += 1

    print(f"Finished scrolling. Final page height: {last_height}")

    # Wait until at least one movie list item is present after scrolling
    # This also acts as a final check that the page is ready
    wait = WebDriverWait(driver, 120) # Increased wait time significantly to 2 minutes
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a[data-qa="discovery-media-list-item"]')))
    print("Movie list elements are present.")

    # Give a final generous sleep to allow all loaded elements to render completely
    time.sleep(15)

    # Extract the fully loaded page source
    page_source = driver.page_source

    # Parse with BeautifulSoup
    soup = BeautifulSoup(page_source, 'html.parser')

    # Find all movie tiles using the new selector
    movies = soup.select('a[data-qa="discovery-media-list-item"]')
    print(f"Found {len(movies)} movies.")

    # Extract and print movie details
    if not movies:
        print("No movies found. Check selectors or page loading process.")

    for movie in movies:
        # Extract title using the new selector
        title_tag = movie.select_one('p.p--small')
        title = title_tag.text.strip() if title_tag else 'N/A'

        # Extract score (Tomatometer) using the new selector
        score_tag = movie.select_one('span.percentage')
        score = score_tag.text.strip() if score_tag else 'N/A'

        # Extract release date using the new selector
        release_tag = movie.select_one('p.minor-caption')
        release_date = release_tag.text.strip() if release_tag else 'N/A'

        print(f"Title: {title}, Score: {score}, Release Date: {release_date}")

finally:
    driver.quit()


Window maximized.
No common 'Accept/Agree/Got it/Close' button found or not clickable within 10 seconds. Proceeding... Error: Message: 

Starting continuous scroll...
Scrolled 1 times. Current height: 3108, Movies found: 3
Scrolled 2 times. Current height: 3108, Movies found: 3
Scroll height and movie count did not change after 2 attempts. Breaking scroll loop.
Finished scrolling. Final page height: 3108
Movie list elements are present.
Found 3 movies.
Title: N/A, Score: N/A, Release Date: N/A
Title: N/A, Score: N/A, Release Date: N/A
Title: N/A, Score: N/A, Release Date: N/A
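
The stop condition used by the scroll loop above (stop once neither the page height nor the item count changes) can be isolated into a small helper and exercised without a browser. This is a minimal sketch, not part of the original notebook: `scroll_until_stable`, `scroll`, and `measure` are hypothetical names. In the notebook, `scroll` would wrap the `driver.execute_script(...)` call and `measure` would return the `(height, movie_count)` pair.

```python
import time
from typing import Callable, Tuple


def scroll_until_stable(
    scroll: Callable[[], None],
    measure: Callable[[], Tuple[int, int]],
    max_attempts: int = 30,
    pause: float = 0.0,
) -> int:
    """Scroll until two consecutive measurements are identical.

    Returns the number of scrolls performed.
    """
    previous = measure()
    for attempt in range(1, max_attempts + 1):
        scroll()
        time.sleep(pause)  # give lazy-loaded content time to arrive
        current = measure()
        if current == previous:
            return attempt  # nothing new loaded, so stop
        previous = current
    return max_attempts


# Fake page for demonstration: content grows twice, then stops growing.
_states = iter([(1000, 3), (2000, 6), (3000, 9), (3000, 9)])
_current = [next(_states)]


def fake_scroll() -> None:
    # Advance to the next fake state; stay put once the states run out.
    _current[0] = next(_states, _current[0])


def fake_measure() -> Tuple[int, int]:
    return _current[0]


attempts = scroll_until_stable(fake_scroll, fake_measure)
print(attempts)
```

Injecting the two callables keeps the loop logic independent of Selenium, so the termination rule can be checked with fake data before it is pointed at a live page.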


**Reasoning**:
The previous scraping attempt still resulted in 'N/A' values, indicating persistent issues with element selection or page loading on Rotten Tomatoes.
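
When every field comes back as `N/A`, one way to investigate is to list which attribute values actually occur in the rendered HTML before guessing at class names. The sketch below uses only the standard library and counts `data-qa` values per tag. The sample markup and the `DataQaCollector` name are hypothetical and do not reflect the real Rotten Tomatoes page; in the notebook the parser would be fed `driver.page_source` instead.

```python
from collections import Counter
from html.parser import HTMLParser


class DataQaCollector(HTMLParser):
    """Count every data-qa attribute value, keyed by (tag, value)."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-qa" and value:
                self.seen[(tag, value)] += 1


# Hypothetical markup for illustration only.
sample_html = """
<a data-qa="tile-link" href="/m/example">
  <span data-qa="tile-title">Example Movie</span>
  <span data-qa="tile-date">Opened Jan 1, 2025</span>
</a>
<a data-qa="tile-link" href="/m/other">
  <span data-qa="tile-title">Other Movie</span>
</a>
"""

collector = DataQaCollector()
collector.feed(sample_html)
for (tag, value), count in sorted(collector.seen.items()):
    print(f"{tag}[data-qa={value!r}] x{count}")
```

The resulting list of tag and attribute pairs gives concrete candidates for the title, score, and date selectors, which can then be tried in `movie.select_one(...)`.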


# Exercise 4: Scrape and Categorize News Articles from a JavaScript-Enabled News Site
**Task:**

Visit this website https://www.bbc.com/innovation/technology

Scrape news article titles and their publication dates.

Categorize articles based on their publication month.

**Instructions:**
Use Selenium to navigate to a specific news section on the website.

Extract and parse the HTML content that is dynamically loaded via JavaScript.

Using BeautifulSoup, extract news article titles and publication dates.

Categorize articles by their publication month (e.g., ‘January’, ‘February’, etc.).

Print the categorized lists of articles.

**Reasoning**:
The first instruction is to verify the Google Chrome installation and its version using the `google-chrome --version` command.



In [59]:
import subprocess

# Verify Google Chrome installation and version
result = subprocess.run(['google-chrome', '--version'], capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    print('Google Chrome is not installed or not found in PATH.')
    print(result.stderr)
else:
    print('Google Chrome version verified.')

Google Chrome 143.0.7499.169 

Google Chrome version verified.


**Reasoning**:
The previous step verified the Google Chrome version. The next instruction is to test if Chrome can run in headless mode by attempting to launch it from the command line and dumping the DOM of a simple page.



In [60]:
import subprocess

# Attempt to launch Chrome in headless mode and dump the DOM
result = subprocess.run(
    ['google-chrome', '--headless', '--disable-gpu', '--dump-dom', 'https://www.google.com'],
    capture_output=True,
    text=True
)

print('Headless Chrome test initiated.')
if result.returncode != 0:
    print('Error launching headless Chrome or dumping DOM.')
    print(result.stderr)
else:
    print('Headless Chrome successfully launched and dumped DOM (partial output for brevity).')
    # Print only a portion of the output to avoid flooding the console with full HTML
    print(result.stdout[:1000] + ('...' if len(result.stdout) > 1000 else ''))

Headless Chrome test initiated.
Error launching headless Chrome or dumping DOM.
[37822:37822:1229/213239.707895:ERROR:content/browser/zygote_host/zygote_host_impl_linux.cc:101] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.



**Reasoning**:
The previous attempt to run headless Chrome failed with an error indicating that running as root without '--no-sandbox' is not supported. This step will re-attempt the headless Chrome test, explicitly adding the `--no-sandbox` argument to resolve this issue.



In [61]:
import subprocess

# Attempt to launch Chrome in headless mode and dump the DOM with --no-sandbox
result = subprocess.run(
    ['google-chrome', '--headless', '--disable-gpu', '--no-sandbox', '--dump-dom', 'https://www.google.com'],
    capture_output=True,
    text=True
)

print('Headless Chrome test initiated with --no-sandbox.')
if result.returncode != 0:
    print('Error launching headless Chrome or dumping DOM.')
    print(result.stderr)
else:
    print('Headless Chrome successfully launched and dumped DOM (partial output for brevity).')
    # Print only a portion of the output to avoid flooding the console with full HTML
    print(result.stdout[:1000] + ('...' if len(result.stdout) > 1000 else ''))

Headless Chrome test initiated with --no-sandbox.
Headless Chrome successfully launched and dumped DOM (partial output for brevity).
<!DOCTYPE html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta charset="UTF-8"><link href="//www.gstatic.com/images/branding/searchlogo/ico/favicon.ico" rel="icon"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="">window._hst=Date.now();</script><script nonce="">(function(){var _g={kEI:'jfNSaYHwHq2EwbkP2v7v-QI',kEXPI:'31',kBL:'lvLq',kOPI:89978449};(function(){var a;((a=window.google)==null?0:a.stvsc)?google.kEI=_g.kEI:window.google=_g;}).call(this);})();(function(){google.sn='webhp';google.kHL='en';google.rdn=false;})();(function(){
var g=this||self;function k(){return window.google&&window.google.kOPI||null};var l,m=[];function n(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||l}function p(a){for(var b=n

## Refine ChromeOptions for stability

### Subtask:
Add or modify ChromeOptions arguments to enhance stability in headless mode, such as `--disable-gpu`, `--no-sandbox`, and `--disable-dev-shm-usage`, which can mitigate common issues in containerized or cloud environments.


**Reasoning**:
The previous attempt to run Selenium in headless mode failed with a `SessionNotCreatedException`. To address this and follow the instructions, I will refine the `ChromeOptions` by ensuring all specified arguments (`--headless`, `--disable-gpu`, `--no-sandbox`, `--disable-dev-shm-usage`, `--window-size`, and `user-agent`) are correctly added, and integrate `webdriver_manager` for robust ChromeDriver setup, as was done in Exercise 3.



In [65]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
from datetime import datetime
import time
import json
from webdriver_manager.chrome import ChromeDriverManager

# Setup headless Chrome with refined options
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu') # Added for stability in headless environments
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--window-size=1920,1080') # Simulate a larger display
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36') # Mimic a real browser

# Use ChromeDriverManager to automatically handle ChromeDriver installation
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

try:
    driver.get('https://www.bbc.com/innovation/technology')

    # Wait for dynamic content to load. Using more general selectors for article containers.
    wait = WebDriverWait(driver, 30) # Increased wait time to ensure page loads fully
    wait.until(EC.presence_of_element_located((
        By.CSS_SELECTOR,
        'div.gs-c-promo-card, article, div.qa-story-block, div[data-component="promo-list"]'
    )))

    # Give a bit more time for all elements to settle after initial load
    time.sleep(10) # Increased sleep after initial wait

    # Get page source after JS loads
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    articles = []

    # Updated selectors based on general BBC news page structure analysis.
    # Looking for div elements that are likely wrappers for individual news items or promos.
    article_elements = soup.select(
        'div.gs-c-promo-card, ' # Common promo card container
        'div.qa-story-block, ' # Another common story block selector
        'article, ' # General semantic article tag
        'div[data-component="promo"]' # General promo component
    )

    print(f"Found {len(article_elements)} potential article containers.")

    if not article_elements:
        print("No article containers found with current selectors. Check page source for new selectors.")

    for article_el in article_elements:
        # Try multiple common title selectors within an article element
        title_el = article_el.select_one(
            'h3.gs-c-promo-heading__title a, ' # Title inside a link inside an h3
            'h2.gs-c-promo-heading__title a, ' # Title inside a link inside an h2
            'a.qa-story-headline, ' # Link with story headline class
            'a[data-testid="story-link"] h3, ' # Link containing an h3 as title
            'a[data-testid="story-link"] h2, ' # Link containing an h2 as title
            'h3 a, h2 a, p.gs-c-promo-summary' # More generic title patterns, p for summary if main title isn't found
        )
        # Try multiple common date selectors within an article element
        date_el = article_el.select_one(
            'time[datetime], ' # Standard time tag with datetime attribute (preferred)
            'span.gs-c-promo-meta__time, ' # Common meta time span
            'span.qa-story-date, ' # Story date span
            'span[data-testid="timestamp"]'
        )

        title = title_el.get_text(strip=True) if title_el else 'N/A'
        date_str = date_el.get('datetime') if date_el and date_el.get('datetime') else (date_el.get_text(strip=True) if date_el else 'N/A')

        if title != 'N/A' and date_str != 'N/A' and not date_str.lower().endswith(('ago', 'hours', 'minutes', 'yesterday', 'today')):
            try:
                pub_date = None

                # ISO format: '2023-12-01T12:00:00Z' or '2023-12-01T12:00:00+00:00'
                if 'T' in date_str and ('Z' in date_str or '+' in date_str):
                    pub_date = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
                # Date like '12 Dec 2023'
                elif any(month_abbr in date_str for month_abbr in ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']) and len(date_str.split()) >= 3:
                    try: # Try 'DD Mon YYYY' like '21 Dec 2023'
                        pub_date = datetime.strptime(date_str, '%d %b %Y')
                    except ValueError:
                        try: # Try 'Mon DD, YYYY' like 'Dec 21, 2023'
                            pub_date = datetime.strptime(date_str, '%b %d, %Y')
                        except ValueError:
                            pass
                # Fallback for simpler YYYY-MM-DD or similar if it appears
                elif len(date_str) == 10 and date_str.count('-') == 2: # YYYY-MM-DD
                    try:
                        pub_date = datetime.strptime(date_str, '%Y-%m-%d')
                    except ValueError:
                        pass

                if pub_date:
                    month = pub_date.strftime('%B')  # e.g., 'December'
                    articles.append({'title': title, 'date': date_str, 'month': month})
                else:
                    # If unable to parse to a specific date for categorization, print a message.
                    print(f"Could not parse date format for '{date_str}' for title '{title}'. Skipping for month categorization.")

            except ValueError as ve:
                print(f"Failed to parse date '{date_str}' for title '{title}'. Error: {ve}. Skipping.")
        else:
            if date_str != 'N/A' and date_str.lower().endswith(('ago', 'hours', 'minutes', 'yesterday', 'today')):
                print(f"Skipping article '{title}' due to relative date format: '{date_str}'.")
            else:
                print(f"Skipping article due to missing title or date: Title='{title}', Date='{date_str}'.")

    print(f"Extracted {len(articles)} articles after parsing.")

    # Categorize by month
    categorized = {}
    for art in articles:
        month = art['month']
        if month not in categorized:
            categorized[month] = []
        categorized[month].append(art['title'])

    # Print categorized lists
    for month, titles in categorized.items():
        print(f"\n{month}:")
        for title in titles:
            print(f"- {title}")

    print(json.dumps(categorized, indent=2))  # Structured output

finally:
    driver.quit()


Found 1 potential article containers.
Skipping article due to missing title or date: Title='N/A', Date='N/A'.
Extracted 0 articles after parsing.
{}
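
The date handling and month grouping in the cell above can be tested separately from the scraping. The following is a minimal sketch using only the standard library: the helper names `parse_pub_date` and `categorize_by_month` and the sample records are hypothetical, and the formats tried are the ones the cell already targets. It assumes an English locale for month names.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional

# Formats tried in order after ISO 8601; they mirror the cell above.
DATE_FORMATS = ("%d %b %Y", "%b %d, %Y", "%Y-%m-%d")


def parse_pub_date(date_str: str) -> Optional[datetime]:
    """Return a datetime for an absolute date string, or None if unparseable."""
    text = date_str.strip()
    try:
        # Handles ISO 8601 values such as '2023-12-01T12:00:00Z'.
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None  # relative dates such as '3 hrs ago' end up here


def categorize_by_month(articles: List[dict]) -> Dict[str, List[str]]:
    """Group article titles by full month name; skip unparseable dates."""
    by_month: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        parsed = parse_pub_date(article["date"])
        if parsed is not None:
            by_month[parsed.strftime("%B")].append(article["title"])
    return dict(by_month)


# Hypothetical records, not scraped from the BBC page.
sample = [
    {"title": "Hypothetical headline A", "date": "2023-12-01T12:00:00Z"},
    {"title": "Hypothetical headline B", "date": "21 Dec 2023"},
    {"title": "Hypothetical headline C", "date": "Jan 5, 2024"},
    {"title": "Hypothetical headline D", "date": "3 hrs ago"},
]
categorized = categorize_by_month(sample)
print(categorized)
```

Returning `None` for anything unparseable keeps relative timestamps out of the month buckets without a separate string check.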


# Exercise 5: Scrape and Analyze Weather Data from a JavaScript-Enabled Weather Website
**Task:**

Visit https://www.accuweather.com/en/us/attica/30607/weather-forecast/2139413

Scrape weather forecast data including temperature, condition, and humidity.

Analyze the data to find the average temperature and most common weather condition.

**Instructions:**

Use Selenium to navigate to the weather forecast page of a specific city.

Extract and parse the HTML content, focusing on dynamically loaded weather data.

Using BeautifulSoup, extract relevant weather information like temperature, condition (sunny, cloudy, etc.), and humidity.

Calculate the average temperature and identify the most common weather condition.

Print the analysis results.


# Task
Scrape weather forecast data from `https://www.accuweather.com/en/us/attica/30607/weather-forecast/2139413`, then analyze and summarize the average temperature and the most common weather condition.

## Initialize Selenium WebDriver

### Subtask:
Set up the Selenium WebDriver (Chrome) in headless mode with necessary options for stable execution, using `webdriver_manager` for automatic ChromeDriver management.


**Reasoning**:
The subtask requires initializing the Selenium WebDriver with specific options. The provided instructions detail each step, including importing necessary modules, configuring ChromeOptions for headless and stable execution, using `webdriver_manager` for ChromeDriver, and finally initializing the WebDriver.



In [68]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

# Create an instance of Options to configure Chrome browser settings
options = Options()

# Add arguments for headless and stable execution
options.add_argument('--headless')
options.add_argument('--disable-gpu') # Added for stability in headless environments
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--window-size=1920,1080') # Simulate a larger display
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36') # Mimic a real browser

# Instantiate ChromeDriverManager and call its install() method, then pass to Service
service = Service(ChromeDriverManager().install())

# Initialize the Chrome WebDriver
driver = webdriver.Chrome(service=service, options=options)

print("Selenium WebDriver initialized successfully in headless mode.")

Selenium WebDriver initialized successfully in headless mode.


## Navigate to AccuWeather Page

### Subtask:
Use Selenium to navigate to the specified AccuWeather URL: https://www.accuweather.com/en/us/attica/30607/weather-forecast/2139413.


**Reasoning**:
The next step is to navigate to the specified AccuWeather URL using the initialized Selenium WebDriver and confirm the navigation.



In [69]:
accuweather_url = 'https://www.accuweather.com/en/us/attica/30607/weather-forecast/2139413'
driver.get(accuweather_url)
print(f"Navigated to AccuWeather page: {accuweather_url}")

Navigated to AccuWeather page: https://www.accuweather.com/en/us/attica/30607/weather-forecast/2139413


**Reasoning**:
After navigating to the AccuWeather page, the next step is to wait for the dynamic content to load to ensure all weather data is present before parsing the HTML.



In [70]:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time

# Wait for dynamic content to load, specifically for a common weather data element
wait = WebDriverWait(driver, 60) # Increased wait time

try:
    # Wait for the main content to load, using a known element that typically holds weather data
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.day-panel-container')))
    print("Main weather content loaded.")
except Exception as e:
    print(f"Error waiting for main weather content to load: {e}")

# Give it a bit more time for all elements to settle after initial load and potential cookie banners
time.sleep(5) # Additional sleep

# Try to dismiss any cookie consent banner that might appear and obstruct elements
try:
    # Look for common cookie consent button selectors
    accept_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((
            By.XPATH, "//button[contains(., 'Accept')] | //button[contains(., 'Agree')] | //button[contains(., 'Got it')] | //button[contains(@aria-label, 'Close')] | //div[contains(@class, 'fc-dialog-container')]//button[text()='I Accept']"
        ))
    )
    accept_button.click()
    print("Clicked 'Accept/Agree/Got it/Close' button on cookie banner/modal.")
    time.sleep(3) # Give time for the banner to disappear
except Exception as e:
    # No banner found (or it was not clickable): report it and continue
    print(f"No common 'Accept/Agree/Got it/Close' button found or not clickable within 10 seconds. Proceeding... Error: {e}")

# Maximize window if not already done, for better element visibility
driver.maximize_window()
print("Window maximized (if not already).")

Error waiting for main weather content to load: HTTPConnectionPool(host='localhost', port=37807): Read timed out. (read timeout=120)
No common 'Accept/Agree/Got it/Close' button found or not clickable within 10 seconds. Proceeding... Error: HTTPConnectionPool(host='localhost', port=37807): Read timed out. (read timeout=120)


ReadTimeoutError: HTTPConnectionPool(host='localhost', port=37807): Read timed out. (read timeout=120)
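
The run above stopped with a timeout before any data was extracted, so the analysis step of the task (average temperature and most common condition) never executed. Below is a minimal sketch of that step, assuming the scraper eventually yields one dictionary per forecast entry with `temperature` and `condition` strings. The record layout, the sample values, and the helper names `parse_temperature` and `summarize` are assumptions, not output from AccuWeather.

```python
import re
from collections import Counter
from statistics import mean
from typing import List, Optional, Tuple


def parse_temperature(text: str) -> Optional[int]:
    """Extract the first signed integer from text such as '72°F' or '-3°'."""
    match = re.search(r"-?\d+", text)
    return int(match.group()) if match else None


def summarize(forecast: List[dict]) -> Tuple[float, str]:
    """Return (average temperature, most common condition)."""
    parsed = (parse_temperature(entry["temperature"]) for entry in forecast)
    temps = [t for t in parsed if t is not None]
    conditions = Counter(entry["condition"] for entry in forecast)
    # mean() raises StatisticsError on an empty list, which flags a failed scrape.
    return mean(temps), conditions.most_common(1)[0][0]


# Hypothetical values, not scraped from AccuWeather.
forecast = [
    {"temperature": "70°F", "condition": "Sunny", "humidity": "40%"},
    {"temperature": "64°F", "condition": "Cloudy", "humidity": "55%"},
    {"temperature": "67°F", "condition": "Sunny", "humidity": "48%"},
]
average_temp, top_condition = summarize(forecast)
print(f"Average temperature: {average_temp:.1f}")
print(f"Most common condition: {top_condition}")
```

Keeping the analysis in plain functions means it can be verified on hand-written records while the page-loading problem is still being debugged.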