# Browser Automation Homework
Due 7-25<br>
Completed by: **TK YOUR NAME**

We're going to visit the real estate site Zillow.com and search "For sale" listings in a town of your choosing.

We'll collect the listings in the first 5 pages (or all if you like), and get a feel for the price range in that town.

Ultimately I want to know the median listing price in that town.

Note: if you get asked if you're a bot, just complete the challenges manually.

In [97]:
import os
import random
import time

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

import chromedriver_binary

In [98]:
os.makedirs('data/', exist_ok=True)

### 1) Open the browser, hide automation signs, visit Zillow.com

In [99]:
def open_browser():
    """
    Opens a new automated browser window with the usual tell-tale signs of automation disabled.
    """
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    
    ## NOTE WHAT DO YOU DO TO HIDE BROWSER?
    # remove all signs of this being an automated browser
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)

    # open the browser with the new options
    driver = webdriver.Chrome(options=options)
    return driver

In [100]:
driver = open_browser()

In [101]:
# visit the page
url = 'https://zillow.com'
driver.get(url)

### 2) Find Search Box

Use selenium's `find_element` [function](https://selenium-python.readthedocs.io/locating-elements.html?highlight=find_element#locating-elements) to find the search box on the Zillow site.
You can use whichever `By` [option](https://selenium-python.readthedocs.io/api.html?highlight=By#locate-elements-by) you feel most comfortable with.

In [105]:
search_box = driver.find_element(
    By.XPATH,
    './/input[@aria-label="Search: Suggestions appear below"]'
)
search_box

<selenium.webdriver.remote.webelement.WebElement (session="aa183aa84a9bb06d355c700fbcc1db76", element="1F9E9BA4F50DA370EBA048DE7C19F8B6_element_30")>

### 3) Input a geography into search bar

After you've found `search_box`, find a way to send `search_term` into the input field.

Feel free to change `search_term` to wherever you like.

In [107]:
# run this twice to remove the current location option
search_box.clear()
search_term = 'Berkeley, CA'
search_box.send_keys(search_term)

### 4) Make the search

Originally, I thought we could get away with just pressing "ENTER". If you try that, you'll see that the listings are not from the geography you searched for.

Instead, you'll see a list of suggestions. Click the first suggestion.

You can do that by first finding that suggestion, then [clicking](https://saucelabs.com/resources/blog/the-selenium-click-command) on it.
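The suggestion list sometimes leads with a "current location" entry, which is why the search cell above says to run it twice. A minimal sketch of choosing which suggestion to click from the suggestions' text follows. `pick_suggestion` is a hypothetical helper name, and the Selenium usage in the comments assumes you have already collected the suggestion elements into a list called `elements`.

```python
def pick_suggestion(texts, skip_phrase="current location"):
    """Return the index of the first suggestion that does not mention
    `skip_phrase`, or None if every suggestion is skipped."""
    for i, text in enumerate(texts):
        if skip_phrase not in text.lower():
            return i
    return None


# Hypothetical usage with Selenium, assuming `elements` is a list of
# suggestion WebElements you have already located:
# idx = pick_suggestion([el.text for el in elements])
# if idx is not None:
#     elements[idx].click()
```

Keeping the decision in a plain function makes it easy to check without opening a browser.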

In [108]:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

In [109]:
first_dropdown_element = driver.find_element(
    By.XPATH,
    '//*[@id="react-autowhatever-1--item-0"]'
)
first_dropdown_element

<selenium.webdriver.remote.webelement.WebElement (session="aa183aa84a9bb06d355c700fbcc1db76", element="1F9E9BA4F50DA370EBA048DE7C19F8B6_element_33")>

In [110]:
first_dropdown_element.click()

In [111]:
# elements = WebDriverWait(driver, 20).until(
#     EC.presence_of_all_elements_located(
#         (By.CSS_SELECTOR,
#          ".react-autosuggest__suggestion"
#         )
#     ) 
# )
# for el in elements:
#     if "current location" in el.text.lower():
#         continue
#     else:
#         el.click()
#         break

### 5) Pick "For sale," if asked
You might be prompted to choose between rentals and sales. This prompt doesn't always show up, but be prepared to click "For sale" if you need to.
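Because the prompt is optional, `find_element` will raise an exception on the runs where it does not appear. One way around that, sketched below, is to use `find_elements`, which returns an empty list when nothing matches. `click_if_present` is a hypothetical helper name; it only assumes the driver exposes `find_elements(by, value)` as Selenium's WebDriver does.

```python
def click_if_present(driver, by, value):
    """Click the first element matching the locator, if any.

    Returns True when something was clicked, False when nothing matched.
    """
    matches = driver.find_elements(by, value)
    if not matches:
        return False
    matches[0].click()
    return True


# Hypothetical usage with the locator from this notebook:
# click_if_present(
#     driver, By.XPATH,
#     '/html/body/div[10]/div/div[1]/div/div/div/ul/li[1]'
# )
```

The absolute XPath is brittle, so it may still need updating if the page layout changes.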

In [112]:
for_sale = driver.find_element(
    By.XPATH, 
    '/html/body/div[10]/div/div[1]/div/div/div/ul/li[1]')

In [113]:
for_sale.click()

### 6) What are the prices of the houses on the first page?
Print the text of each listing's property price below:

In [114]:
cards_counter = 0
for card in driver.find_elements(
    By.XPATH,
    '//*[@id="grid-search-results"]/ul/li'):
    cards_counter += 1
    print(card.text)

969 Cragmont Ave, Berkeley, CA 94708
$1,595,000
4 bds2 ba1,835 sqft - House for sale
Open: Sun. 2-4pm
Image 1 of 48
Save this home
47 Canyon Rd, Berkeley, CA 94704
$1,095,000
3 bds2 ba1,493 sqft - House for sale
30 days on Zillow
Image 1 of 30
Save this home
Loading...
1445 Kains Ave, Berkeley, CA 94702
$950,000
3 bds2 ba1,140 sqft - Condo for sale
Open: Sun. 2-4:30pm
Image 1 of 57
Save this home
912 Spruce St, Berkeley, CA 94707
$995,000
3 bds2 ba1,588 sqft - House for sale
Open: Sun. 2-4:30pm
Image 1 of 58
Save this home
2339 Curtis St, Berkeley, CA 94702
$1,049,000
2 bds2 ba1,107 sqft - House for sale
Open: Fri. 5-7pm
Image 1 of 51
Save this home
1614 6th St, Berkeley, CA 94710
$975,000
3 bds1 ba1,475 sqft - House for sale
Open: Sun. 2-4pm
Image 1 of 24
Save this home
2356 Marin Ave, Berkeley, CA 94708
$1,800,000
4 bds3 ba2,140 sqft - House for sale
Open: Sat. 2-4pm
Image 1 of 45
Save this home
1228 Hopkins St, Berkeley, CA 94702
$720,000
3 bds2 ba1,061 sqft - House for sale
Open: S

Note: you _should_ see more than nine listings.

You'll need to find a way to scroll down the page to load each new card. From my tests, each page holds up to 40.

This is not a simple task! I found one way to do this below; can you find a better one?

In [115]:
# N = 0
# while True:
#     # get all the listings, and scroll to the last one, then wait two seconds.
#     cards = driver.find_elements(By.XPATH, './/span[@data-test="property-card-price"]')
#     last_listing = cards[-1]
    
#     # you can use selenium to issue JavaScript commands:
#     driver.execute_script("arguments[0].scrollIntoView();", last_listing)
#     N_cards = len(cards)
#     if N_cards == N:
#         break
#     N = N_cards
#     time.sleep(2)

In [116]:
# how many postings do we have after loading them all?
cards_counter

42

Is there a better way to do this? Feel free to experiment, but it's not necessary for the assignment.
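One possible variation on the scrolling loop, offered as a sketch rather than a definitive answer: keep the same "scroll until the count stops growing" idea, but cap the number of rounds and tolerate a page with no cards. `scroll_until_stable`, `count_items`, and `scroll_to_last` are hypothetical names; the two callables are where the Selenium calls would go.

```python
import time


def scroll_until_stable(count_items, scroll_to_last, pause=2.0, max_rounds=30):
    """Scroll until the number of loaded items stops growing.

    count_items: callable returning how many items are currently loaded.
    scroll_to_last: callable that scrolls the last loaded item into view.
    Returns the last observed item count.
    """
    previous = -1
    for _ in range(max_rounds):
        current = count_items()
        if current == previous:
            break  # nothing new loaded since the last scroll
        previous = current
        if current > 0:
            scroll_to_last()
        time.sleep(pause)
    return previous


# Hypothetical usage with this notebook's selector:
# find_cards = lambda: driver.find_elements(
#     By.XPATH, './/span[@data-test="property-card-price"]')
# scroll_until_stable(
#     count_items=lambda: len(find_cards()),
#     scroll_to_last=lambda: driver.execute_script(
#         "arguments[0].scrollIntoView();", find_cards()[-1]),
# )
```

The `max_rounds` cap is a design choice to avoid looping forever if the page keeps loading new items.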

### 7) Save the results as HTML
Save the page source to `html_out` as an HTML file

In [117]:
html_out = 'data/zillow_selenium_test.html'

In [118]:
# TK save the page to `html_out`
source = driver.page_source
with open(html_out, 'w', encoding='utf-8') as f:
    f.write(source)

### 8) Go to the next page
After collecting the first page, go to the next one by clicking the "Next page" button.

In [119]:
next_page = driver.find_element(
    By.XPATH,
    '//*[@id="grid-search-results"]/div[2]/nav/ul/li[5]'
)

In [None]:
next_page.click()
driver.page_source

In [121]:
driver.page_source

### 9) Cycle through each page of results
Above we outlined each step. Now put it all together here and collect as many results as you can. Add some `time.sleep(2)` (or some other reasonable time) between each step.

You can stop after the 5th page to save time.

Note: you can parse the price from each listing directly in Selenium here, or save each page as HTML and parse the files after you collect them. I recommend the latter, but for the sake of the homework feel free to take the shortcut.
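The control flow for this step can be sketched independently of the browser: save a page, try to advance, pause, and stop at the page limit or when there is no next page. The sketch below uses a randomized pause instead of a fixed `time.sleep(2)`; that is my substitution, not a requirement of the assignment. `crawl_pages`, `save_page`, and `go_to_next` are hypothetical names, and the two callables stand in for the Selenium steps above.

```python
import random
import time


def crawl_pages(save_page, go_to_next, max_pages=5, min_pause=2.0, max_pause=4.0):
    """Save up to `max_pages` pages of results.

    save_page(n): stores page number n (0-based, like `page_n` above).
    go_to_next(): advances to the next page; returns False if there is none.
    Returns the number of pages saved.
    """
    saved = 0
    for page_n in range(max_pages):
        save_page(page_n)
        saved += 1
        # stop at the limit, or when no further page is available
        if page_n == max_pages - 1 or not go_to_next():
            break
        time.sleep(random.uniform(min_pause, max_pause))
    return saved


# Hypothetical wiring to this notebook's helper:
# crawl_pages(
#     save_page=lambda n: get_results_on_page(driver, f'data/zillow_page_{n}.html'),
#     go_to_next=lambda: click_next(driver),  # click_next is a placeholder you would write
# )
```

Clicking "next" exactly once per saved page is the point of this structure; clicking in two places makes it easy to skip a page.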

In [122]:
def get_results_on_page(driver, fn_out):
    """
    Scrolls to load all listings and then saves them to `fn_out`.
    If you found a better approach, replace this function
    """
    N = 0
    while True:
        # get all the listings, and scroll to the last one, then wait two seconds.
        cards = driver.find_elements(By.XPATH, './/span[@data-test="property-card-price"]')
        if not cards:
            # nothing has loaded (or the selector no longer matches), so stop scrolling
            break
        last_listing = cards[-1]

        # you can use selenium to issue JavaScript commands:
        driver.execute_script("arguments[0].scrollIntoView();", last_listing)
        N_cards = len(cards)
        if N_cards == N:
            break
        N = N_cards
        time.sleep(2)
        
    # how to save what the emulator sees
    with open(fn_out, 'w', encoding='utf-8') as f:
        page_source = driver.execute_script("return document.documentElement.outerHTML")
        f.write(page_source)

In [123]:
# first close the browser to start anew
driver.close()

In [124]:
search_term = 'Beacon, NY' # this can be anywhere
os.makedirs('data/', exist_ok=True)

# open the browser and visit the `url`.
driver = open_browser()
url = 'https://zillow.com'
driver.get(url)

In [125]:
# find the search box
search_box = driver.find_element(
    By.XPATH,
    './/input[@aria-label="Search: Suggestions appear below"]'
)

# type the search term
# run this twice to remove the current location option
search_box.clear()
search_box.send_keys(search_term)

In [126]:
search_box.clear()
search_box.send_keys(search_term)

In [127]:
first_dropdown_element = driver.find_element(
    By.XPATH,
    '//*[@id="react-autowhatever-1--item-0"]'
)
first_dropdown_element.click()

In [128]:
# select only for sale listings...
for_sale = driver.find_element(
    By.XPATH, 
    '/html/body/div[10]/div/div[1]/div/div/div/ul/li[1]')
for_sale.click()

In [129]:
##### save each page of results
page_n = 0

# save first page of results
fn_out = f'data/zillow_page_{page_n}.html'    
get_results_on_page(driver, fn_out)

In [130]:
# start watching a specific element for stale-state change
e = driver.find_element(By.ID, "grid-search-results")
print(e)

# find element to click through to next page
next_page = driver.find_elements(
    By.XPATH,
    '//*[@id="grid-search-results"]/div[2]/nav/ul/li[5]/a[contains(@aria-disabled, "false")]'
)
next_page[0].click()
print(len(next_page))

<selenium.webdriver.remote.webelement.WebElement (session="b7e7306e2c261119485893eed194ef18", element="87478D58AE0CA6F4FCAF443EA5C53908_element_93")>
1


In [131]:
# build the staleness condition for the starting page element
# (this only creates the predicate; it is evaluated inside a wait)
EC.staleness_of(e)

<function selenium.webdriver.support.expected_conditions.staleness_of.<locals>._predicate(_)>

In [132]:
try:
    # wait until the watched element detaches
    WebDriverWait(driver, 20).until(EC.staleness_of(e))
finally:
    # then write the next page's content (this runs even if the wait times out)
    fn_out = 'data/zillow_page_testing_2.html'

    with open(fn_out, 'w', encoding='utf-8') as f:
        print(driver.current_url)
        page_source = driver.execute_script("return document.documentElement.outerHTML")
        f.write(page_source)

https://www.zillow.com/beacon-ny/2_p/?searchQueryState=%7B%22pagination%22%3A%7B%22currentPage%22%3A2%7D%2C%22usersSearchTerm%22%3A%22Beacon%2C%20NY%22%2C%22mapBounds%22%3A%7B%22west%22%3A-74.000115%2C%22east%22%3A-73.897991%2C%22south%22%3A41.439007%2C%22north%22%3A41.534705%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A28337%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Afalse%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%7D


TimeoutException: Message: 
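The staleness wait above timed out even though the printed URL already points at page 2. A possible alternative check, sketched here, is to read the page number out of the URL itself. This assumes the URL keeps the shape shown in the output above, with a `searchQueryState` query parameter holding JSON that includes `pagination.currentPage`; I have only that one URL to go on, so treat the structure as an assumption. `current_page_from_url` is a hypothetical helper name.

```python
import json
from urllib.parse import urlparse, parse_qs


def current_page_from_url(url):
    """Return `pagination.currentPage` from the `searchQueryState`
    query parameter, or None if it is not present."""
    query = parse_qs(urlparse(url).query)
    raw_state = query.get("searchQueryState")
    if not raw_state:
        return None
    state = json.loads(raw_state[0])  # parse_qs has already percent-decoded it
    return state.get("pagination", {}).get("currentPage")


# Hypothetical usage: wait until the browser reports page 2.
# WebDriverWait(driver, 20).until(
#     lambda d: current_page_from_url(d.current_url) == 2
# )
```

Whether the first page carries a `pagination` entry at all is unknown from the output here, so the function returns `None` rather than guessing.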


In [134]:
######## IGNORE BELOW FOR NOW PLEASE

In [96]:
with open('./data/zillow_page_testing_6.html', 'w', encoding='utf-8') as f:
    page_source = driver.execute_script("return document.documentElement.outerHTML")
    f.write(page_source)

In [45]:
driver.page_source



In [164]:
# fn_out = f'data/zillow_page_testing.html'
# print(fn_out)
# get_results_on_page(driver, fn_out)

data/zillow_page_testing.html


In [123]:
while len(next_page) > 0:
    page_n += 1
    print(page_n)
    next_page[0].click()
    fn_out = f'data/zillow_page_{page_n}.html'
    print(fn_out)
    get_results_on_page(driver, fn_out)

    # stop after 10
    if page_n == 10:
        break
        
    # see if there are more pages of results
    next_page = driver.find_elements(
        By.XPATH,
        '//*[@id="grid-search-results"]/div[2]/nav/ul/li[5]/a[contains(@aria-disabled, "false")]'
    )
    print(len(next_page))

    # the next iteration clicks `next_page[0]`, so don't click it here as well
    time.sleep(2)

0


### 10) Parse the prices

Parse the prices into a list or a Pandas Series, and list the median price.
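A minimal sketch of the parsing step, assuming prices are written out in full as in the first-page output above (for example `$1,595,000`). Abbreviated prices such as `$1.2M` are not handled here. `parse_price` is a hypothetical helper name, and the sample strings are the eight prices printed in step 6.

```python
import re
from statistics import median


def parse_price(text):
    """Extract the first dollar amount like '$1,595,000' as an int,
    or return None if no such amount is found."""
    match = re.search(r"\$(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Prices copied from the first-page output in step 6.
sample = [
    "$1,595,000", "$1,095,000", "$950,000", "$995,000",
    "$1,049,000", "$975,000", "$1,800,000", "$720,000",
]
prices = [p for p in (parse_price(t) for t in sample) if p is not None]
print(median(prices))
```

For a full run you would feed in the price text from every saved page instead of this sample.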

In [2]:
# TK

## Extra credit
- What is the median price per square foot?
- Which realtor has the most listings?
- Can you highlight listings over $1M in red and take a full-page screenshot?
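For the price-per-square-foot question, one possible starting point is to pull both numbers out of each card's text. This sketch assumes the card text looks like the step 6 output, where the area appears as `1,835 sqft` and the price as `$1,595,000`. `price_per_sqft` is a hypothetical helper name, and the two sample cards are copied from that output.

```python
import re
from statistics import median


def price_per_sqft(card_text):
    """Return price divided by square footage for one listing card,
    or None if either number is missing."""
    price = re.search(r"\$(\d[\d,]*)", card_text)
    area = re.search(r"(\d[\d,]*)\s*sqft", card_text)
    if price is None or area is None:
        return None
    sqft = int(area.group(1).replace(",", ""))
    if sqft == 0:
        return None
    return int(price.group(1).replace(",", "")) / sqft


# Two cards copied from the first-page output in step 6.
cards = [
    "969 Cragmont Ave, Berkeley, CA 94708\n$1,595,000\n4 bds2 ba1,835 sqft - House for sale",
    "47 Canyon Rd, Berkeley, CA 94704\n$1,095,000\n3 bds2 ba1,493 sqft - House for sale",
]
values = [v for v in (price_per_sqft(c) for c in cards) if v is not None]
print(round(median(values), 2))
```

Cards without a square footage (land listings, for instance) are simply skipped.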