# Browser Automation Homework
Due 7-25<br>
Completed by: **MARCO DALLA STELLA**

We're going to visit the real estate site Zillow.com and search "For sale" listings in a town of your choosing.

We'll collect the listings in the first 5 pages (or all if you like), and get a feel for the price range in that town.

Ultimately I want to know the median price of that town.

Note: if you get asked if you're a bot, just complete the challenges manually.

In [52]:
import os
import random
import time

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

import chromedriver_binary

In [53]:
os.makedirs('data/', exist_ok=True)

### 1) Open the browser, hide automation signs, visit Zillow.com

In [54]:
def open_browser():
    """
    Opens a new automated browser window with the tell-tale signs of automation disabled.
    """
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    
    ## NOTE WHAT DO YOU DO TO HIDE BROWSER?
    # remove all signs of this being an automated browser
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    # open the browser with the new options
    driver = webdriver.Chrome(options=options)
    return driver

In [55]:
driver = open_browser()

In [56]:
# visit the page
url = 'https://zillow.com'
driver.get(url)

### 2) Find Search Box

Use selenium's `find_element` [function](https://selenium-python.readthedocs.io/locating-elements.html?highlight=find_element#locating-elements) to find the search box on the Zillow site.
You can use whichever `By` [option](https://selenium-python.readthedocs.io/api.html?highlight=By#locate-elements-by) you feel most comfortable with.

In [57]:
search_box = driver.find_element(
    By.XPATH,
    '//*[@id="search-box-input"]'
)
search_box

<selenium.webdriver.remote.webelement.WebElement (session="27f0617a13269c97c06e033aa75598ef", element="B4B4E6FADCAAB8A25EFB027ABFE27625_element_19")>

### 3) Input a geography into search bar

After you've found `search_box`, find a way to input or send `search_term` into the input field.

Feel free to change `search_term` to wherever you like.

In [58]:
search_term = 'Worcester, MA'
search_box.send_keys(search_term)

### 4) Make the search

Originally, I thought we could get away with just pressing "ENTER". If you try that you'll see that listings are not from the geography you're searching.

Instead, you'll see a list of suggestions. Click the first suggestion.

You can do that by first finding that suggestion, then [clicking](https://saucelabs.com/resources/blog/the-selenium-click-command) on it.

In [59]:
first_option = driver.find_element(
    By.XPATH,
    './/*[@id="react-autowhatever-1--item-0"]/div/div'
)
first_option

<selenium.webdriver.remote.webelement.WebElement (session="27f0617a13269c97c06e033aa75598ef", element="B4B4E6FADCAAB8A25EFB027ABFE27625_element_21")>

In [60]:
# how do you click `first_option` and how to make sure you don't throw an exception?
first_option.click()

### 5) Pick "For sale," if asked
You might be prompted to check for rentals or sales. This doesn't always show up, but be prepared to click "For sale" if you need to.

In [69]:
for_sale = driver.find_element(
    By.XPATH, '//button[normalize-space()="For sale"]'
)

In [70]:
for_sale.click() #TK what function to click? Same as 4.
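
Because the "For sale" prompt only shows up some of the time, `find_element` raises `NoSuchElementException` whenever it is missing. One possible way around that, sketched below, is to use `find_elements`, which returns an empty list when nothing matches. Note that `click_if_present` is a hypothetical helper name, not part of Selenium; it only assumes the driver exposes `find_elements(by, value)`.

```python
def click_if_present(driver, by, value):
    """Click the first element matching the locator, if there is one.

    Returns True when something was clicked and False when nothing matched.
    `click_if_present` is a hypothetical helper, not a Selenium function.
    """
    # find_elements returns an empty list when nothing matches,
    # whereas find_element raises NoSuchElementException.
    matches = driver.find_elements(by, value)
    if not matches:
        return False
    matches[0].click()
    return True
```

With the notebook's driver this could be called as `click_if_present(driver, By.XPATH, '//button[normalize-space()="For sale"]')`, and the cell carries on whether or not the prompt appeared.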

### 6) What are the prices of the houses on the first page?
Print the text of each listing's property price below:

In [71]:
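# note: "zpid" must be quoted; unquoted, XPath treats it as a child node
# and the `contains` test matches every element.
for card in driver.find_elements(
    By.XPATH,
    './/*[contains(@id, "zpid")]/div/div[1]/div[2]/div/span'
):
    print(card.text)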

$585,000
$549,000
$530,000
$474,000
$829,000
$735,000
$480,000
$445,000
$539,000
$875,000
$465,000
$699,000
$1,350,000
$1,988,000
$2,950,000
$1,980,000
$2,899,000
$1,695,000
$2,499,000
$1,980,000
$3,500,000
$2,100,000
$1,750,000
$3,900,000
$2,900,000
$3,699,000
$1,300,000
$1,950,000
$2,200,000
$5,500,000
$1,980,000
$2,950,000
$2,790,000
$2,590,000
$1,880,000
$1,580,000
$2,100,000
$2,499,998
$3,350,000
$6,680,000
$5,500,000


Note: you _should_ see more than nine listings.

You'll need to find a way to scroll down the page to load each new card. From my tests, each page holds up to 40.

This is not a simple task! I found one way to do this below. Can you find a better one?

In [72]:
N = 0
while True:
    # get all the listings, and scroll to the last one, then wait two seconds.
    cards = driver.find_elements(By.XPATH, './/span[@data-test="property-card-price"]')
    if not cards:
        break  # nothing has loaded; avoid an IndexError on cards[-1]
    last_listing = cards[-1]
    
    # you can use selenium to issue JavaScript commands:
    driver.execute_script("arguments[0].scrollIntoView();", last_listing)
    N_cards = len(cards)
    if N_cards == N:
        break
    N = N_cards
    time.sleep(2)

In [73]:
# how many postings do we have after loading them all?
len(cards)

41

Is there a better way to do this? Feel free to experiment, but it's not necessary for the assignment.
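
Not necessarily better, but here is a hedged variant of the loop above that adds two guards: it stops when no cards are found at all, and it gives up after a fixed number of rounds so it can't loop forever. `load_all_cards`, `pause`, and `max_rounds` are hypothetical names and defaults chosen for illustration. The function only assumes the driver has `find_elements(by, value)` and `execute_script(script, *args)`.

```python
import time


def load_all_cards(driver, by, value, pause=2.0, max_rounds=30):
    """Scroll to the last matching element until no new ones appear.

    Hypothetical helper: `pause` and `max_rounds` are illustrative defaults.
    Returns the final list of matching elements (possibly empty).
    """
    seen = 0
    cards = []
    for _ in range(max_rounds):
        cards = driver.find_elements(by, value)
        # stop if nothing matched, or the count did not grow since last round
        if not cards or len(cards) == seen:
            break
        seen = len(cards)
        # scrolling the last card into view prompts the page to load more
        driver.execute_script("arguments[0].scrollIntoView();", cards[-1])
        time.sleep(pause)
    return cards
```

For this notebook the call would look like `cards = load_all_cards(driver, By.XPATH, './/span[@data-test="property-card-price"]')`.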

### 7) Save the results as HTML
Save the page source to `html_out` as an HTML file

In [74]:
html_out = 'data/zillow_selenium_test.html'

In [75]:
source = driver.page_source
with open(html_out, 'w', encoding='utf-8') as f:
    f.write(source)

### 8) Go to the next page
After collecting the first page, go to the next one by clicking the "Next page" button.

In [21]:
# current_page = 2
# next_page = current_page+1

# next_page = driver.find_element(
#     By.XPATH,
#     f'.//*[@id="grid-search-results"]/div[2]/nav/ul/li[{next_page}]/a'
# )
# next_page

<selenium.webdriver.remote.webelement.WebElement (session="2ecb00fa6c1a5953e62181a34d3888c4", element="C6768DA281BEEAB8C016249820EADABD_element_652")>

In [22]:
# next_page.click()

In [76]:
n_pages = driver.find_elements(
    By.XPATH,
    '//*[contains(@id,"grid-search-results")]/div[2]/nav/ul/li'
)

n_pages = int(len(n_pages)/2)+1
n_pages

1

In [77]:
pages = range(2,n_pages)

for page in pages:
    print(f"Fetching page {page}")

    # print the prices on the current page ("zpid" must be a quoted string)
    for card in driver.find_elements(
        By.XPATH,
        './/*[contains(@id, "zpid")]/div/div[1]/div[2]/div/span'
    ):
        print(card.text)

    # find the link to the next page in the pagination bar, then click it
    next_page = driver.find_element(
        By.XPATH,
        f'.//*[@id="grid-search-results"]/div[2]/nav/ul/li[{page+1}]/a'
    )
    print(next_page)
    time.sleep(2)
    next_page.click()
    time.sleep(2)

pages

range(2, 1)

### 9) Cycle through each page of results
Above we outlined each step, now put it all together here and collect as many results as you can. Add some `time.sleep(2)` (or some other reasonable time) between each step.

You can stop after the 5th page to save time.

Note: you can parse the price from the listings directly with Selenium here, or save each page as HTML and parse them after you collect them. I recommend the latter, but for the sake of the homework feel free to take the shortcut.
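
The notebook imports `random` but only ever sleeps for a fixed two seconds. As a small sketch of "some other reasonable time", the pause could be randomized so that steps are not spaced perfectly evenly. `polite_sleep` and its default values are hypothetical, and whether this makes any difference to a site's bot checks is an assumption, not something tested here.

```python
import random
import time


def polite_sleep(base=2.0, jitter=1.0):
    """Sleep for `base` seconds plus a random extra of up to `jitter` seconds.

    Returns the number of seconds slept. Hypothetical helper.
    """
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

Calling `polite_sleep()` in place of `time.sleep(2)` waits somewhere between two and three seconds.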

In [32]:
# first close the browser to start anew
driver.quit()  # quit() ends the whole session; close() only closes the current window

In [27]:
def get_results_on_page(driver, fn_out):
    """
    Scrolls to load all listings and then saves them to `fn_out`.
    If you found a better approach, replace this function
    """
    N = 0
    while True:
        # get all the listings, and scroll to the last one, then wait two seconds.
        cards = driver.find_elements(By.XPATH, './/span[@data-test="property-card-price"]')
        if not cards:
            break  # nothing has loaded; avoid an IndexError on cards[-1]
        last_listing = cards[-1]

        # you can use selenium to issue JavaScript commands:
        driver.execute_script("arguments[0].scrollIntoView();", last_listing)
        N_cards = len(cards)
        if N_cards == N:
            break
        N = N_cards
        time.sleep(2)
        
    # how to save what the emulator sees
    with open(fn_out, 'w', encoding='utf-8') as f:
        f.write(driver.page_source)

In [38]:
search_term = 'Beacon, NY' # this can be anywhere

# open the browser and visit the `url`.
driver = open_browser()
url = 'https://zillow.com'
driver.get(url)
time.sleep(20)


# find the search box
search_box = driver.find_element(
    By.XPATH,
    '//*[@id="search-box-input"]'
)
# type the search term once (sending it twice would enter it twice)
search_box.send_keys(search_term)
time.sleep(2)

# select the first suggestion
first_option = driver.find_element(
    By.XPATH,
    './/*[@id="react-autowhatever-1--item-0"]/div/div'
)
first_option.click()
time.sleep(2)

# select only for sale listings...
for_sale = driver.find_element(
    By.XPATH, '//button[normalize-space()="For sale"]'
)
for_sale.click()
time.sleep(2)

# save each page of results
page_n = 0
max_pages = 5  # the assignment says you can stop after the 5th page

while page_n < max_pages:
    fn_out = f'data/zillow_page_{page_n}.html'
    get_results_on_page(driver, fn_out)
    page_n += 1
    time.sleep(2)

    # see if there are more pages of results; `find_elements` returns an
    # empty list (instead of raising) when nothing matches.
    # NOTE: the `title="Next page"` attribute is an assumption based on the
    # button's label; confirm it in the browser's dev tools.
    next_page = driver.find_elements(
        By.XPATH,
        '//*[contains(@id,"grid-search-results")]//a[@title="Next page"]'
    )
    if not next_page:
        break
    try:
        next_page[0].click()
    except Exception as e:
        print(e)
        break
    time.sleep(2)

### 10) Parse the prices

Parse the prices into a list or a Pandas Series, and list the median price.

In [2]:
# TK
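
As a starting point for this step, here is a minimal sketch that uses only the standard library. It pulls the text out of every `<span data-test="property-card-price">` (the same attribute used in the scrolling loop above) and takes the median. `PriceParser`, `to_number`, and `median_price` are hypothetical names. The sketch assumes prices are written out in full, like `$1,350,000`, and that the price span contains no nested `<span>`.

```python
import re
from html.parser import HTMLParser
from statistics import median


class PriceParser(HTMLParser):
    """Collect the text of every <span data-test="property-card-price">."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("data-test") == "property-card-price":
            self._in_price = True
            self.prices.append("")

    def handle_data(self, data):
        if self._in_price:
            self.prices[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def to_number(price_text):
    """Turn '$1,350,000' into 1350000; return None if there are no digits."""
    digits = re.sub(r"[^\d]", "", price_text)
    return int(digits) if digits else None


def median_price(html):
    """Return the median listing price found in `html`, or None if none found."""
    parser = PriceParser()
    parser.feed(html)
    numbers = [n for n in map(to_number, parser.prices) if n is not None]
    return median(numbers) if numbers else None
```

To cover several pages, one option is to read each saved `data/zillow_page_{n}.html` file, concatenate the contents, and pass the combined string to `median_price` so that the median is taken over all pages at once.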

## Extra credit
- What is the median price per square foot?
- Which realtor has the most listings?
- Can you stain listings over $1M in red and take a full-screenshot?