## What's the point of Selenium?

As the lecture notes touch on, the internet is full of data,
but unfortunately for us (as data scientists), that data is wrapped in a lot of
HTML whose job is to make pages visually appealing. Fortunately,
much of the data is embedded in HTML that follows the same structure from page to page. That makes
extracting it a repetitive task, exactly the type of thing computers are best at! This is why
web scraping can be an extremely helpful tool for getting data on just about anything
on the internet.  

Selenium lets us drive a web browser entirely from code: loading pages, clicking buttons, and filling in forms.
This means we can automate navigation and scrape sites without
doing every step by hand. **It is important to respect a site's restrictions
(for example, its terms of service and `robots.txt`) when doing so, but this can be quite powerful.**
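
One restriction many sites publish is a `robots.txt` file listing the paths that automated clients are asked to avoid. As a minimal sketch, Python's standard-library `urllib.robotparser` can evaluate such rules. The rules, the `lab-scraper` agent name, and the `example.com` URLs below are made up for illustration; a real script would load the site's actual file with `set_url()` and `read()`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real site publishes its own rules.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(url, user_agent="lab-scraper", rules=SAMPLE_RULES):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/products", "https://example.com/private/data"):
        print(url, "allowed" if is_allowed(url) else "disallowed")
```

Running a check like this before each `driver.get(...)` is one simple way to keep a scraper inside the paths a site has opened to automated clients.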

### Part 1

In this lab, we will practice some of the basic skills needed to start working with Selenium and collecting data from the web.

First, we need an example website to start working on this lab. Please run:

```
python3 lab6site.py
```

After the Flask app has launched, you can view the hosted site at `http://<VM IP ADDRESS>:5000`.  
You’ll see a very simple website.  
Behind the scenes, the backend can detect whether the visitor is a person or a web scraping client!

If you click the button in your own browser, an alert will pop up; click OK to dismiss it.
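
How `lab6site.py` tells people and scrapers apart is not shown here. As a purely illustrative sketch, one signal a backend could check is the `User-Agent` request header, since headless Chrome has commonly identified itself with a `HeadlessChrome` token. The function name and marker list below are hypothetical and are not taken from the lab site.

```python
# Hypothetical marker list: substrings that often appear in the User-Agent
# of automated clients. This is an assumption, not the lab site's logic.
AUTOMATION_MARKERS = ("headlesschrome", "python-requests", "curl")


def looks_automated(user_agent):
    """Guess whether a User-Agent string belongs to an automated client."""
    if not user_agent:
        # A missing User-Agent header is treated as suspicious in this sketch.
        return True
    ua = user_agent.lower()
    return any(marker in ua for marker in AUTOMATION_MARKERS)


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
        "HeadlessChrome/138.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36",
    ]
    for ua in samples:
        print(looks_automated(ua), "-", ua[:60])
```

A header check like this is easy to sidestep by sending a different header, so it is at best one signal among several.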

The first task is to press the button using a Selenium WebDriver and observe the output.

I highly recommend poking around in the HTML via Inspect Element/dev tools. There are other ways to inspect the HTML, but I find this one the simplest.

Here is some starter code:

In [10]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set up headless Chrome
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

try:
    # 1. load page
    driver.get("http://34.67.179.64:5000")

    # 2. locate & click the button
    #    adjust the locator to match the HTML (e.g. ID, class, tag)
    button = driver.find_element(By.ID, "BUTTON")
    button.click()

    # 3. wait up to 10s for the alert to be present
    WebDriverWait(driver, 10).until(EC.alert_is_present())

    # 4. switch to it, print its text, then accept
    alert = driver.switch_to.alert
    print("Alert text:", alert.text)
    alert.accept()

finally:
    driver.quit()



Alert text: You are using Selenium! Welcome, webscraper!


Hint: You will need to find a WebElement that you can call `.click()` on.
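
The starter code's `WebDriverWait(driver, 10).until(...)` is an explicit wait: it keeps re-checking a condition until the condition returns something truthy or the timeout expires. The idea can be sketched in plain Python as below. `wait_until` is a hypothetical helper written for illustration, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


if __name__ == "__main__":
    start = time.monotonic()
    # This condition becomes truthy roughly 0.3 seconds after the start time.
    ready = wait_until(lambda: time.monotonic() - start > 0.3, timeout=2.0, poll=0.1)
    print("Condition met:", ready)
```

Waiting on a condition this way is generally more reliable than a fixed `time.sleep(...)`, because the script continues as soon as the alert (or element) actually appears.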

### Part 2

For the second part of this lab, we will be working on sending a string to an input box.

You might be thinking... *“I didn’t see an input box on that site.”*  
Well, look again! You're right—there isn't one that's **visibly** displayed,  
but if you inspect the HTML, you’ll find one hidden (what a sneaky TA).

There is also a hidden code in the HTML—find it.

Then, use Selenium to:
1. Enter the hidden code into the hidden input box,
2. Press the button,
3. Print the text of the alert that appears.

You can reuse the starter code from above to begin this part as well.
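
Besides the browser's dev tools, hidden elements can also be listed programmatically. The sketch below uses the standard-library `html.parser` on a made-up page. Apart from the `BUTTON` and `secret-input` IDs and the `verifyInput()` handler used in this lab, the sample markup is an assumption, and the real page may hide its input and code differently. In practice the HTML string could come from Selenium's `driver.page_source`.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; not the lab site's real HTML.
SAMPLE_HTML = """
<html><body>
  <!-- hidden code: EXAMPLE-123 -->
  <button id="BUTTON">Click me</button>
  <input id="secret-input" type="text" style="display:none">
  <button onclick="verifyInput()" hidden>Submit Hidden Answer</button>
</body></html>
"""


class HiddenFinder(HTMLParser):
    """Collect HTML comments and elements that are hidden from view."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.hidden = []  # list of (tag, attributes dict)

    def handle_comment(self, data):
        self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Normalize the inline style so "display: none" and "display:none" match.
        style = (attr.get("style") or "").replace(" ", "").lower()
        if ("hidden" in attr
                or attr.get("type") == "hidden"
                or "display:none" in style
                or "visibility:hidden" in style):
            self.hidden.append((tag, attr))


if __name__ == "__main__":
    finder = HiddenFinder()
    finder.feed(SAMPLE_HTML)
    print("Comments:", finder.comments)
    for tag, attr in finder.hidden:
        print("Hidden element:", tag, attr.get("id") or attr.get("onclick"))
```

This only catches elements hidden through attributes or inline styles; anything hidden by an external stylesheet or by JavaScript still needs a look in the dev tools.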


In [11]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ── Set up headless Chrome ─────────────────────────────────────────────────────
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

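try:
    # 1. Load the page
    driver.get("http://34.67.179.64:5000")

    # 2. The hidden code you found by inspecting the HTML.
    #    The value below is a placeholder, not the real code.
    secret = "PASTE_HIDDEN_CODE_HERE"

    # 3. Locate the hidden input box
    inp = driver.find_element(By.ID, "secret-input")

    # 4. Put the secret into the hidden input and click the “Submit Hidden Answer” button.
    #    send_keys() and click() raise ElementNotInteractableException on elements
    #    that are not displayed, so set the value and click through JavaScript instead.
    driver.execute_script("arguments[0].value = arguments[1];", inp, secret)
    submit = driver.find_element(By.CSS_SELECTOR, "button[onclick='verifyInput()']")
    driver.execute_script("arguments[0].click();", submit)

    # 5. Wait for the alert, print its text, then close it
    alert = WebDriverWait(driver, 5).until(EC.alert_is_present())
    print("Alert text:", alert.text)
    alert.accept()

finally:
    driver.quit()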


None


UnexpectedAlertPresentException: Alert Text: Wrong input. Try again!
Message: unexpected alert open: {Alert text : Wrong input. Try again!}
  (Session info: chrome=138.0.7204.49)


### Part 3

In this part, I want to give you the freedom to explore the power of Selenium and web scraping.

Go to **any website** you’re interested in—Amazon, eBay, ESPN, a movie theater, etc.—and try to extract a **meaningful piece of data**.  
By "meaningful," I mean something beyond just grabbing the raw HTML—extract something a user would actually care about.

Use **Inspect Element** to examine the HTML structure and locate the data you're interested in.  
Then, write a short script to extract and print that data using Selenium.

#### Example ideas:
- The price of an Amazon or eBay listing  
- The score of a recent sports game  
- The list of movies currently showing at a theater  

Feel free to get creative here—just make sure to comment your code and briefly explain what your scraper is doing.
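
Scraped text usually needs a little cleanup before it is useful as data. As a small sketch, the hypothetical helper below turns a price string, such as the text of a product listing, into a `Decimal`. The sample strings are made up, and the helper assumes US-style formatting with commas as thousands separators.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Extract the first US-style price from text, or return None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting to an exact decimal value.
    return Decimal(match.group().replace(",", ""))


if __name__ == "__main__":
    for raw in ("$1,299.99", "  Price: $24.50 ", "Currently unavailable"):
        print(repr(raw), "->", parse_price(raw))
```

Returning `None` when no number is found lets the calling code tell "no price on the page" apart from a real value, instead of crashing on unexpected text.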


In [6]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ── 1) Chrome‐options for stealth ───────────────────────────────────────────────
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--window-size=1920,1080')
# Spoof a normal browser User‑Agent:
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/114.0.0.0 Safari/537.36"
)
# Disable the automation flag
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)

driver = webdriver.Chrome(options=options)

# Remove webdriver from the navigator object
driver.execute_cdp_cmd(
    'Page.addScriptToEvaluateOnNewDocument',
    {
        'source': '''
            Object.defineProperty(navigator, 'webdriver', {
              get: () => undefined
            })
        '''
    }
)

try:
    # ── 2) Load your product page ────────────────────────────────────────────────
    URL = "https://www.amazon.com/dp/B08FC5L3RG"  # change to whatever you like
    driver.get(URL)

    # ── 3) Grab the product title ────────────────────────────────────────────────
    title_el = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "#productTitle"))
    )
    product_title = title_el.text.strip()

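    # ── 4) Grab the price (this CSS selector works in most layouts) ──────────────
    price_el = WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".a-price .a-offscreen"))
    )
    # .a-offscreen is visually hidden, so .text (visible text only) is usually empty;
    # read the element's textContent instead
    product_price = price_el.get_attribute("textContent").strip()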

    # ── 5) Print it out ──────────────────────────────────────────────────────────
    print("Product:", product_title)
    print("Price:  ", product_price)

finally:
    driver.quit()


TimeoutException: Message: 
