### **Handling Dynamic Content**
Many modern websites use JavaScript to dynamically load content after the initial page load (e.g., using frameworks like React or Angular). This means the content you need to scrape might not be present in the raw HTML source when the page first loads.
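
To make this concrete, here is a minimal sketch using only the standard library. `RAW_HTML` is a hand-written, hypothetical page (not taken from a real site) whose only content is injected by a script, and `VisibleTextParser` and `visible_text` are illustrative names. Because an HTML parser never runs the script, it finds no visible text.

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty until the script runs in a browser.
RAW_HTML = """
<html>
  <body>
    <div id="app"></div>
    <script>
      document.getElementById("app").textContent = "Book list loaded by JavaScript";
    </script>
  </body>
</html>
"""


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping everything inside <script> tags."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that is not JavaScript source.
        if not self._in_script and data.strip():
            self.text_chunks.append(data.strip())


def visible_text(html):
    """Return the list of visible text fragments found in html."""
    parser = VisibleTextParser()
    parser.feed(html)
    return parser.text_chunks


print(visible_text(RAW_HTML))  # → []
```

A browser would execute the script and show the text, which is the gap that a browser-driving tool is meant to close.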

### Selenium
Selenium stands out from other scraping tools because it interacts with a real browser. This makes it particularly useful for websites that rely heavily on JavaScript, AJAX, or dynamic content loading. Selenium can handle:
- Pages that load content dynamically with JavaScript.
- Interaction with forms, buttons, and scroll events.
- Real browser interactions.

In [None]:
from selenium import webdriver
bookstoscrape_url = "https://books.toscrape.com/"

In [None]:
# TODO: Spend 5 minutes searching the web for 'selenium python' and summarize what you found in a few short sentences.

### Basic Usage of Selenium

In [None]:
"""
Objective: Creating a browser instance
"""
firefox_driver = webdriver.Firefox()

In [None]:
""" 
Objective: Opening bookstoscrape_url
"""
firefox_driver.get(bookstoscrape_url)

In [None]:
""" 
Objective: Save the page content to the html variable, then use BeautifulSoup to extract the data
"""
html = firefox_driver.page_source
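
The BeautifulSoup step mentioned in the objective could look roughly like the sketch below. It runs against `SAMPLE_HTML`, a hand-written snippet with made-up titles, so it works without a browser. The selectors `article.product_pod`, `h3 a`, and `p.price_color` are assumptions about the page's markup that should be checked against the real `page_source`, and `extract_books` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for driver.page_source; the titles and prices are made up.
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <h3><a href="catalogue/sample-book-one/index.html" title="Sample Book One">Sample Book ...</a></h3>
    <p class="price_color">10.00</p>
  </article>
  <article class="product_pod">
    <h3><a href="catalogue/sample-book-two/index.html" title="Sample Book Two">Sample Book ...</a></h3>
    <p class="price_color">12.50</p>
  </article>
</section>
"""


def extract_books(html):
    """Return one dict per book card with its full title and price text."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.select("article.product_pod"):
        link = card.select_one("h3 a")
        price = card.select_one("p.price_color")
        books.append({
            "title": link["title"],  # the title attribute holds the untruncated name
            "price": price.get_text(strip=True),
        })
    return books


for book in extract_books(SAMPLE_HTML):
    print(book["title"], book["price"])
```

With a live browser, the same function would be called as `extract_books(html)` using the `html` variable from the cell above.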


In [None]:
""" 
Objective: Close the browser instance
"""
firefox_driver.quit()
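
If an exception is raised between creating the browser and calling `quit()`, the browser process is left running. One way to guard against that is sketched below: `managed_driver` and `fetch_html` are hypothetical helper names, and the `factory` parameter is an assumption added so the pattern can be tried with any object that offers `get`, `page_source`, and `quit`.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always quit it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the body of the with-block raises


def fetch_html(url, factory=None):
    """Open url in a browser and return the rendered page source."""
    if factory is None:
        # Imported lazily so managed_driver can be reused without Selenium installed.
        from selenium import webdriver
        factory = webdriver.Firefox
    with managed_driver(factory) as driver:
        driver.get(url)
        return driver.page_source
```

Calling `fetch_html(bookstoscrape_url)` would then open Firefox, return the HTML, and close the browser in one step.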

### Browser Instance Variations

In [None]:
""" 
Objective: Compare the Firefox webdriver with the Chrome webdriver
"""
# TODO: Create a Chrome browser instance
# TODO: Create a Firefox browser instance
# TODO: Analyze the differences between the Firefox and Chrome browser instances

In [None]:
""" 
Objective: Change the browser window size

Set the window to a particular width x and height y --> driver.set_window_size(x, y)
Maximize the window --> driver.maximize_window()
Get the current window size --> driver.get_window_size()
"""
# TODO: Change the window size of the Firefox browser instance to 200 x 400 (width x height)
# TODO: Print the current window size of the Chrome browser instance
# TODO: Maximize the window of the Chrome browser instance
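
As a small sketch of how the two size methods fit together, the hypothetical helper `resize_and_report` below requests a size and then reads back what the driver reports. `get_window_size()` returns a dictionary with `"width"` and `"height"` keys. A browser may apply its own minimum window size, so the reported values can differ from the requested ones. That is an assumption worth checking on your own machine.

```python
def resize_and_report(driver, width, height):
    """Request a window size, then return the (width, height) the driver reports."""
    driver.set_window_size(width, height)
    size = driver.get_window_size()  # dict with "width" and "height" keys
    return size["width"], size["height"]
```

With an open browser instance, `print(resize_and_report(firefox_driver, 200, 400))` shows whether the request was honored exactly.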

In [None]:
# TODO: Close all browser instances using the quit() method

In [None]:
""" 
Objective: Customize the browser instance using Options
"""
# TODO: Execute this cell before continuing
# TODO: Choose one of these 2 methods to create the options object for a webdriver instance

## Method 1:
# from selenium.webdriver.chrome.options import Options
# options = Options()

## Method 2:
# options = webdriver.ChromeOptions()

In [None]:
""" 
Objective: Maximize the browser window with Options
Options let you create a browser instance with custom settings.

Example:
## Create the options object using one of the two methods from the previous cell
# options = Options()

## Add a customization as an argument
options.add_argument("--start-maximized")

## You can add multiple arguments, one after another
options.add_argument("argument 1")
options.add_argument("argument 2")
## Experimental options take a name and a value
options.add_experimental_option("option name", option_value)

## Add options object to webdriver instance
driver = webdriver.Chrome(options=options)
"""
# TODO: Create options object
# TODO: Add the argument that maximizes the window
# TODO: Create a browser instance with options
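
Since every customization is just a string passed to `add_argument`, one possible way to organize them is to build the list of switches first and copy it into an `Options` object afterwards. The sketch below uses only switches that appear in this notebook. `chrome_arguments` and `build_chrome_options` are hypothetical helper names, not part of Selenium.

```python
def chrome_arguments(maximized=False, headless=False, load_images=True):
    """Return the Chrome command-line switches for the chosen settings."""
    arguments = []
    if maximized:
        arguments.append("--start-maximized")
    if headless:
        arguments.append("--headless")
    if not load_images:
        arguments.append("--blink-settings=imagesEnabled=false")
    return arguments


def build_chrome_options(arguments):
    """Copy each switch into a Selenium Chrome Options object."""
    # Imported lazily so chrome_arguments can be tried without Selenium installed.
    from selenium.webdriver.chrome.options import Options
    options = Options()
    for argument in arguments:
        options.add_argument(argument)
    return options


print(chrome_arguments(headless=True, load_images=False))
# → ['--headless', '--blink-settings=imagesEnabled=false']
```

The result would be passed to the driver as `webdriver.Chrome(options=build_chrome_options(chrome_arguments(maximized=True)))`.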

In [None]:
""" 
Objective: Run a headless browser
"""
# TODO: Create options object
# TODO: Add argument "--headless" to the options
# TODO: Create a browser instance with options
# TODO: Open bookstoscrape_url using get method
# TODO: Print the page title using driver.title
# TODO: Close the browser instance using quit method


In [None]:
""" 
Objective: Open a page without loading images
"""
# TODO: Create options object
# TODO: Add argument "--blink-settings=imagesEnabled=false" to the options
# TODO: Create a browser instance with options
# TODO: Open bookstoscrape_url using get method

In [None]:
""" 
Objective: Explore other available options, for example:
options.add_argument("--disable-gpu")  # Disable GPU rendering
options.add_argument("--no-sandbox")  # Disable sandbox for Docker
options.add_argument("--disable-dev-shm-usage")  # Prevent shared memory issues
options.add_argument("--disable-extensions")  # Disable extensions
options.add_argument("--disable-infobars")  # Remove info bars
"""
# TODO: Explore the official documentation from https://www.selenium.dev/documentation/webdriver/browsers/chrome/

In [None]:
""" 
Objective: Understand what information we can get from a webdriver instance
"""
# TODO: Create a new browser instance with any options you like
# TODO: Open any website of your choice
# TODO: Print the current page title by using driver.title
# TODO: Print the current URL by using driver.current_url
# TODO: Get the HTML content by using driver.page_source
# TODO: Compare the HTML content from Selenium with the HTML content from requests, then share your insights
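
For the comparison, it can help to measure both approaches the same way. The sketch below is one possible setup rather than a prescribed solution. `timed`, `fetch_with_requests`, `fetch_with_selenium`, and `summarize` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and the Selenium timing deliberately includes browser start-up and shutdown.

```python
import time


def timed(fetch, *args):
    """Call fetch(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fetch(*args)
    return result, time.perf_counter() - start


def fetch_with_requests(url):
    """Return the raw HTML sent by the server."""
    import requests  # lazy import: only needed when this function is called
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def fetch_with_selenium(url):
    """Return the HTML after a real browser has loaded the page."""
    from selenium import webdriver  # lazy import: only needed when this function is called
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def summarize(label, html, seconds):
    """Format one line of the comparison."""
    return f"{label}: {len(html)} characters in {seconds:.2f} s"
```

A run could then look like `html, seconds = timed(fetch_with_requests, bookstoscrape_url)` followed by `print(summarize("requests", html, seconds))`, repeated with `fetch_with_selenium`.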

### **Reflection**
Which is faster for retrieving HTML content: sending an HTTP request directly using the requests library or creating a browser instance with Selenium?

(answer here)

### **Exploration**
Explore what can be done manually in a real browser and what can be achieved using Selenium.