# Browser Automation Homework
Due 7-25<br>
Completed by: Stephanie Andrews

We're going to visit the real estate site Zillow.com and search "For sale" listings in a town of your choosing.

We'll collect the listings in the first 5 pages (or all if you like), and get a feel for the price range in that town.

Ultimately, I want to know the median listing price in that town.

Note: if you get asked if you're a bot, just complete the challenges manually.

In [76]:
import os
import random
import time

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

import chromedriver_binary

In [77]:
os.makedirs('data/', exist_ok=True)

### 1) Open the browser, hide automation signs, visit Zillow.com

In [78]:
def open_browser():
    """
    Opens a new automated Chrome window with the most common automation tell-tales disabled.
    """
    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    
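    # Remove the most common signs that this is an automated browser:
    # - stop Blink from flagging the session as automation-controlled (navigator.webdriver)
    options.add_argument('--disable-blink-features=AutomationControlled')
    # - hide the "Chrome is being controlled by automated test software" infobar
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    # - don't load Chrome's automation extension
    options.add_experimental_option('useAutomationExtension', False)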

    # open the browser with the new options
    driver = webdriver.Chrome(options=options)
    return driver
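The launch flags above cover Chrome's own automation switches. Page scripts can still read `navigator.webdriver`, so one extra step some people take is to ask Chrome, through the DevTools Protocol, to redefine that property before any page script runs. This is a minimal sketch. It assumes a Chromium-based driver: Selenium's Chrome driver provides `execute_cdp_cmd`, and selenium-wire's `webdriver.Chrome` builds on it. The helper name `hide_webdriver_flag` is hypothetical, and this notebook does not establish whether Zillow checks this property at all.

```python
# JavaScript that Chrome will run in every new document before the page's own scripts.
HIDE_WEBDRIVER_JS = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"


def hide_webdriver_flag(driver):
    """Register HIDE_WEBDRIVER_JS via the Chrome DevTools Protocol (hypothetical helper).

    Only documents loaded after this call are affected, so call it
    right after creating the driver and before driver.get(...).
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": HIDE_WEBDRIVER_JS},
    )
```

Usage: `driver = open_browser()`, then `hide_webdriver_flag(driver)`, then `driver.get(url)`.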

In [79]:
driver = open_browser()

In [80]:
# visit the page
url = 'https://zillow.com'
driver.get(url)

### 2) Find Search Box

Use selenium's `find_element` [function](https://selenium-python.readthedocs.io/locating-elements.html?highlight=find_element#locating-elements) to find the search box on the Zillow site.
You can use whichever `By` [option](https://selenium-python.readthedocs.io/api.html?highlight=By#locate-elements-by) you feel most comfortable with.

In [81]:
search_box = driver.find_element(
    By.XPATH,
    './/input[@aria-label="Search: Suggestions appear below"]'
)
search_box

<selenium.webdriver.remote.webelement.WebElement (session="93e0d8070ee5f2a97d61848ecea9ba73", element="F9D0A8FA28D47A8B070012697E10CFE7_element_18")>

### 3) Input a geography into search bar

After you've found `search_box`, find a way to input or send `search_term` into the input field.

Feel free to change `search_term` to wherever you like.

In [83]:
# run this twice to remove the current location option
search_box.clear()
search_term = 'Berkeley, CA'
search_box.send_keys(search_term)

### 4) Make the search

Originally, I thought we could get away with just pressing "ENTER". If you try that, you'll see that the listings are not from the geography you searched for.

Instead, you'll see a list of suggestions. Click the first suggestion.

You can do that by first finding that suggestion, then [clicking](https://saucelabs.com/resources/blog/the-selenium-click-command) on it.

In [84]:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

In [85]:
first_dropdown_element = driver.find_element(
    By.XPATH,
    '//*[@id="react-autowhatever-1--item-0"]'
)
first_dropdown_element

<selenium.webdriver.remote.webelement.WebElement (session="93e0d8070ee5f2a97d61848ecea9ba73", element="F9D0A8FA28D47A8B070012697E10CFE7_element_21")>

In [86]:
first_dropdown_element.click()

In [87]:
# elements = WebDriverWait(driver, 20).until(
#     EC.presence_of_all_elements_located(
#         (By.CSS_SELECTOR,
#          ".react-autosuggest__suggestion"
#         )
#     ) 
# )
# for el in elements:
#     if "current location" in el.text.lower():
#         continue
#     else:
#         el.click()
#         break
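Calling `find_element` right after `send_keys` can fail if the suggestion list hasn't rendered yet. Selenium's explicit wait handles this. For example, `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="react-autowhatever-1--item-0"]'))).click()` re-checks the condition every 0.5 seconds by default, ignoring `NoSuchElementException`, until it succeeds or times out. The stdlib-only sketch below shows the same polling idea using a hypothetical helper, `poll_until`. Unlike `WebDriverWait`, it treats every exception raised by the condition as "not ready yet".

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call `condition()` until it returns something truthy, then return that value.

    Any exception raised by `condition` (e.g. "element not found yet") counts as
    "not ready". Raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # not ready yet; try again after `interval`
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

For example: `poll_until(lambda: driver.find_element(By.XPATH, '//*[@id="react-autowhatever-1--item-0"]')).click()`.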

### 5) Pick "For sale," if asked
You might be prompted to choose between rentals and sales. The prompt doesn't always appear, but be prepared to click "For sale" if it does.

In [88]:
for_sale = driver.find_element(
    By.XPATH, 
    '/html/body/div[10]/div/div[1]/div/div/div/ul/li[1]')

In [89]:
for_sale.click()
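Because this prompt doesn't always appear, `find_element` would raise `NoSuchElementException` on runs where it is missing. `find_elements` (plural) returns an empty list instead, which makes an optional click easy to guard. Below is a minimal sketch with a hypothetical helper name. It passes the string `"xpath"`, which is the value of `By.XPATH`, so it works with any Selenium driver without extra imports.

```python
def click_if_present(driver, xpath):
    """Click the first element matching `xpath` if one exists (hypothetical helper).

    Returns True if something was clicked and False if nothing matched.
    """
    matches = driver.find_elements("xpath", xpath)  # "xpath" == By.XPATH; [] when absent
    if not matches:
        return False
    matches[0].click()
    return True
```

Usage: `click_if_present(driver, '/html/body/div[10]/div/div[1]/div/div/div/ul/li[1]')`. That absolute path breaks whenever Zillow adds or removes a `div`. A text-based locator such as `//li[normalize-space()="For sale"]` is likely sturdier, although it assumes the option's visible text is exactly "For sale".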

### 6) What are the prices of the houses on the first page?
Print the text of each listing's property price below:

In [90]:
cards_counter = 0
for card in driver.find_elements(
    By.XPATH,
    '//*[@id="grid-search-results"]/ul/li'):
    cards_counter += 1
    print(card.text)

1012 Addison St, Berkeley, CA 94710
$749,000
2 bds1 ba863 sqft - House for sale
3D Tour
Image 1 of 36
Save this home
969 Cragmont Ave, Berkeley, CA 94708
$1,595,000
4 bds2 ba1,835 sqft - House for sale
10 days on Zillow
Image 1 of 48
Save this home
Loading...
1009 Keith Ave, Berkeley, CA 94708
EASTWEST REALTY
$1,699,900
3 bds2 ba2,066 sqft - House for sale
23 hours ago
Image 1 of 36
Save this home
2339 Curtis St, Berkeley, CA 94702
$1,049,000
2 bds2 ba1,107 sqft - House for sale
Open: Mon. 5-7pm
Image 1 of 51
Save this home
1162 Arch St, Berkeley, CA 94708
$1,295,000
3 bds2 ba1,630 sqft - House for sale
3D Tour
Image 1 of 56
Save this home
912 Spruce St, Berkeley, CA 94707
$995,000
3 bds2 ba1,588 sqft - House for sale
12 days on Zillow
Image 1 of 58
Save this home
801 Contra Costa Ave, Berkeley, CA 94707
$1,295,000
3 bds2 ba1,393 sqft - House for sale
18 days on Zillow
Image 1 of 60
Save this home
1117 Channing Way, Berkeley, CA 94702
REAL ESTATE EBROKER, Marquetta Willingham-Broussard

Note: you _should_ see more than nine listings.

You'll need to find a way to scroll down the page to load each new card. From my tests, each page holds up to 40.

This is not a simple task! I found one way to do this below, can you find a better way to do this?

In [91]:
# N = 0
# while True:
#     # get all the listings, and scroll to the last one, then wait two seconds.
#     cards = driver.find_elements(By.XPATH, './/span[@data-test="property-card-price"]')
#     last_listing = cards[-1]
    
#     # you can use selenium to issue JavaScript commands:
#     driver.execute_script("arguments[0].scrollIntoView();", last_listing)
#     N_cards = len(cards)
#     if N_cards == N:
#         break
#     N = N_cards
#     time.sleep(2)

In [92]:
# how many <li> result slots did the loop in step 6 count? (the scrolling cell above is commented out)
cards_counter

42

Is there a better way to do this? Feel free to experiment, but it's not necessary for the assignment.
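One possible improvement over stopping at the first round with no new cards is to make the loop more tolerant and more bounded. It can require a couple of unchanged rounds in a row, so one slow network response doesn't end the loop early. It can cap the total number of rounds. And it can skip the scroll when nothing has rendered yet, instead of hitting `IndexError` on `cards[-1]`. Here is a sketch under those assumptions. It reuses the price-span XPath and the `scrollIntoView` call from this notebook. The function name and the `patience`/`max_rounds` defaults are hypothetical and have not been tuned against Zillow.

```python
import time

PRICE_XPATH = './/span[@data-test="property-card-price"]'


def scroll_until_stable(driver, xpath=PRICE_XPATH, pause=2.0, patience=2, max_rounds=30):
    """Scroll the last matching element into view until the count stops growing.

    Stops after `patience` consecutive rounds with no new elements, or after
    `max_rounds` rounds in total. Returns the largest count seen.
    """
    best = 0
    stale_rounds = 0
    for _ in range(max_rounds):
        cards = driver.find_elements("xpath", xpath)  # "xpath" == By.XPATH
        if cards:
            # bring the last loaded card into view to trigger lazy loading
            driver.execute_script("arguments[0].scrollIntoView();", cards[-1])
        if len(cards) > best:
            best = len(cards)
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break
        time.sleep(pause)
    return best
```

Inside `get_results_on_page`, a call like `scroll_until_stable(driver)` could stand in for the `while True` loop before the page source is saved.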

### 7) Save the results as HTML
Save the page source to `html_out` as an HTML file

In [93]:
html_out = 'data/zillow_page_1.html'

In [94]:
# save the rendered page source to `html_out`
source = driver.page_source
with open(html_out, 'w', encoding='utf-8') as f:
    f.write(source)

### 8) Go to the next page
After collecting the first page, go to the next one by clicking the "Next page" button.

In [95]:
next_page = driver.find_element(
    By.XPATH,
    '//*[@id="grid-search-results"]/div[2]/nav/ul/li[5]'
)

In [96]:
next_page.click()

### 9) Cycle through each page of results
Above, we outlined each step; now put it all together here and collect as many results as you can. Add some `time.sleep(2)` (or another reasonable delay) between each step.

You can stop after the 5th page to save time.

Note: you can parse prices from the listings directly with Selenium here, or save each page as HTML and parse them after you've collected them. I recommend the latter, but for the sake of the homework, feel free to take the shortcut.
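An alternative to clicking "Next page" is to load each results page by URL. The URLs printed by the loop in this step follow the pattern `https://www.zillow.com/los-angeles-ca/4_p/` for page 4. This sketch rests on two guesses drawn from that single observed URL. First, it assumes Zillow serves a page from that path alone, without the long `searchQueryState` parameter. Second, it assumes the slug is the lowercased search term with runs of non-alphanumeric characters turned into hyphens. The helper names are hypothetical.

```python
import re

ZILLOW_BASE = "https://www.zillow.com"


def make_slug(search_term):
    """'Los Angeles, CA' -> 'los-angeles-ca' (ASSUMED to match Zillow's URL slug)."""
    return re.sub(r"[^a-z0-9]+", "-", search_term.lower()).strip("-")


def zillow_page_url(slug, page):
    """Results URL for `page` (1-based), following the observed '/{slug}/{n}_p/' pattern.

    ASSUMPTION: page 1 is the bare slug URL, and later pages load without
    the searchQueryState query parameter.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    url = f"{ZILLOW_BASE}/{slug}/"
    return url if page == 1 else f"{url}{page}_p/"
```

A loop could then call `driver.get(zillow_page_url(make_slug(search_term), n))` for `n` in `range(1, 6)` and pass each page to `get_results_on_page`.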

In [98]:
# quit the browser (ending the WebDriver session) to start anew
driver.quit()

In [100]:
# search_term = 'Beacon, NY' # this can be anywhere
search_term = "Los Angeles, CA"
os.makedirs('data/', exist_ok=True)

# open the browser and visit the `url`.
driver = open_browser()
url = 'https://zillow.com'
driver.get(url)

In [101]:
# find the search box
search_box = driver.find_element(
    By.XPATH,
    './/input[@aria-label="Search: Suggestions appear below"]'
)

# type the search term
# run this twice to remove the current location option
search_box.clear()
search_box.send_keys(search_term)

In [105]:
search_box.clear()
search_box.send_keys(search_term)

In [106]:
first_dropdown_element = driver.find_element(
    By.XPATH,
    '//*[@id="react-autowhatever-1--item-0"]'
)
first_dropdown_element.click()

In [107]:
# select only for sale listings...
for_sale = driver.find_element(
    By.XPATH, 
    '/html/body/div[10]/div/div[1]/div/div/div/ul/li[1]')
for_sale.click()

In [108]:
def get_results_on_page(driver, fn_out):
    """
    Scrolls to load all listings and then saves them to `fn_out`.
    If you found a better approach, replace this function
    """
    N = 0
    while True:
        # get all the listings, and scroll to the last one, then wait two seconds.
        cards = driver.find_elements(By.XPATH, './/span[@data-test="property-card-price"]')
        if not cards:
            break  # nothing rendered (e.g. a bot check); save whatever is there

        # you can use selenium to issue JavaScript commands:
        driver.execute_script("arguments[0].scrollIntoView();", cards[-1])
        N_cards = len(cards)
        if N_cards == N:
            break
        N = N_cards
        time.sleep(2)

    # save the rendered page source (what the browser currently shows)
    with open(fn_out, 'w', encoding='utf-8') as f:
        f.write(driver.page_source)

In [109]:
# save each page of results
MAX_PAGES = 5  # stop after the 5th page to save time
NEXT_PAGE_XPATH = (
    '//*[@id="grid-search-results"]/div[2]/nav/ul/li[5]'
    '/a[contains(@aria-disabled, "false")]'  # only match an enabled "Next page" link
)

page_n = 0
while True:
    print(driver.current_url)
    fn_out = f'data/zillow_page_{page_n}.html'
    get_results_on_page(driver, fn_out)
    page_n += 1

    if page_n >= MAX_PAGES:
        break

    # see if there are more pages of results; find_elements returns [] if not
    next_page = driver.find_elements(By.XPATH, NEXT_PAGE_XPATH)
    if not next_page:
        break
    try:
        next_page[0].click()
    except Exception as e:
        # stop instead of saving the same page again under a new name
        print(e)
        break
    time.sleep(2)

https://www.zillow.com/homes/for_sale/Los-Angeles,-CA_rb/
https://www.zillow.com/los-angeles-ca/4_p/?searchQueryState=%7B%22pagination%22%3A%7B%22currentPage%22%3A4%7D%2C%22usersSearchTerm%22%3A%22Los%20Angeles%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-118.668176%2C%22east%22%3A-118.155289%2C%22south%22%3A33.703652%2C%22north%22%3A34.337306%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A12447%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Afalse%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%7D


### 10) Parse the prices

Parse the prices into a list or a Pandas Series, and list the median price.
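Stripping every non-digit character works for a plain price like `$749,000`, but it would glue a two-price range into one huge number. A slightly more defensive sketch takes the first dollar amount in the text and skips listings that show no price. The helper names are hypothetical, and whether Zillow cards ever show a price range is an assumption that this notebook does not test.

```python
import re
import statistics

# First dollar amount in a string: "$", a digit, then more digits or commas.
PRICE_RE = re.compile(r"\$(\d[\d,]*)")


def parse_price(text):
    """Return the first dollar amount in `text` as an int, or None if there is none."""
    match = PRICE_RE.search(text or "")
    return int(match.group(1).replace(",", "")) if match else None


def median_price(price_texts):
    """Median of every parseable price; texts without a price are skipped."""
    prices = [p for p in map(parse_price, price_texts) if p is not None]
    return statistics.median(prices) if prices else None
```

With the `soup` object from below, this could be called as `median_price(span.get_text() for span in soup.select("span[data-test='property-card-price']"))`.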

In [110]:
import lxml.html
import pandas as pd
import re

from bs4 import BeautifulSoup

In [111]:
fpath = "./data/zillow_page_0.html"
# read as UTF-8 explicitly instead of relying on the platform's default encoding;
# the `with` block closes the file automatically
with open(fpath, encoding="utf-8") as zfile:
    zpage_html = zfile.read()

In [112]:
soup = BeautifulSoup(zpage_html, "lxml")
soup.title

<title>Los Angeles CA Real Estate - Los Angeles CA Homes For Sale | Zillow</title>

In [113]:
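property_data_list = []

for d in soup.select("div.property-card-data"):
    price_el = d.select_one("span[data-test='property-card-price']")
    addr_el = d.select_one("address")
    if price_el is None or addr_el is None:
        continue  # skip cards missing a price or an address
    # get_text() still works when a tag has nested children (where .string is None);
    # the raw string r"\D" ("any non-digit") avoids an invalid-escape warning
    price_digits = re.sub(r"\D", "", price_el.get_text())
    if not price_digits:
        continue  # no numeric price on this card
    property_data_list.append({
        "addr": addr_el.get_text(strip=True),
        "price": int(price_digits),
    })

print(property_data_list)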

[{'addr': '1397 W 38th St, Los Angeles, CA 90062', 'price': 500000}, {'addr': '20418 Marilla St, Chatsworth, CA 91311', 'price': 820000}, {'addr': '5139 Balboa Blvd UNIT 202, Encino, CA 91316', 'price': 599999}, {'addr': '8400 Zelzah Ave, Northridge, CA 91325', 'price': 1250000}, {'addr': '10842 Owens Pl, Tujunga, CA 91042', 'price': 749000}, {'addr': '4745 S La Villa Mari #J, Marina Del Rey, CA 90292', 'price': 1450000}, {'addr': '7727 Babcock Ave, North Hollywood, CA 91605', 'price': 699999}, {'addr': '20629 Archwood St, Winnetka, CA 91306', 'price': 879900}, {'addr': '10423 Scenario Ln, Los Angeles, CA 90077', 'price': 599000}, {'addr': '12380 Covello St, North Hollywood, CA 91605', 'price': 729000}, {'addr': '903 Glendale Blvd, Los Angeles, CA 90026', 'price': 1100000}, {'addr': '5363 Yolanda Ave, Tarzana, CA 91356', 'price': 1399000}, {'addr': '2952 W Avenue 34, Los Angeles, CA 90065', 'price': 895000}, {'addr': '11014 Omelveny Ave, San Fernando, CA 91340', 'price': 699900}, {'add

In [114]:
for prop in property_data_list:
    print(f'{prop["addr"]}\nPrice: ${prop["price"]:,.0f}\n')

1397 W 38th St, Los Angeles, CA 90062
Price: $500,000

20418 Marilla St, Chatsworth, CA 91311
Price: $820,000

5139 Balboa Blvd UNIT 202, Encino, CA 91316
Price: $599,999

8400 Zelzah Ave, Northridge, CA 91325
Price: $1,250,000

10842 Owens Pl, Tujunga, CA 91042
Price: $749,000

4745 S La Villa Mari #J, Marina Del Rey, CA 90292
Price: $1,450,000

7727 Babcock Ave, North Hollywood, CA 91605
Price: $699,999

20629 Archwood St, Winnetka, CA 91306
Price: $879,900

10423 Scenario Ln, Los Angeles, CA 90077
Price: $599,000

12380 Covello St, North Hollywood, CA 91605
Price: $729,000

903 Glendale Blvd, Los Angeles, CA 90026
Price: $1,100,000

5363 Yolanda Ave, Tarzana, CA 91356
Price: $1,399,000

2952 W Avenue 34, Los Angeles, CA 90065
Price: $895,000

11014 Omelveny Ave, San Fernando, CA 91340
Price: $699,900

12960 Fernmont St, Sylmar, CA 91342
Price: $699,999

16515 Casey St, North Hills, CA 91343
Price: $999,000

4515 Cezanne Ave, Woodland Hills, CA 91364
Price: $1,699,000

8311 Topeka Dr

In [115]:
zillow_df = pd.DataFrame(property_data_list)
zillow_df

|   | addr | price |
|---|------|-------|
| 0 | 1397 W 38th St, Los Angeles, CA 90062 | 500000 |
| 1 | 20418 Marilla St, Chatsworth, CA 91311 | 820000 |
| 2 | 5139 Balboa Blvd UNIT 202, Encino, CA 91316 | 599999 |
| 3 | 8400 Zelzah Ave, Northridge, CA 91325 | 1250000 |
| 4 | 10842 Owens Pl, Tujunga, CA 91042 | 749000 |
| 5 | 4745 S La Villa Mari #J, Marina Del Rey, CA 90292 | 1450000 |
| 6 | 7727 Babcock Ave, North Hollywood, CA 91605 | 699999 |
| 7 | 20629 Archwood St, Winnetka, CA 91306 | 879900 |
| 8 | 10423 Scenario Ln, Los Angeles, CA 90077 | 599000 |
| 9 | 12380 Covello St, North Hollywood, CA 91605 | 729000 |


In [116]:
print(f'The median price for this area is: ${zillow_df["price"].median():,.0f}')

The median price for this area is: $850,000
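The median above is computed from `zillow_page_0.html` only. To use every page that the loop in step 9 saved, a stdlib-only sketch could glob the files and extract the same `span[data-test="property-card-price"]` text that the BeautifulSoup code targets. It uses `html.parser` instead of BeautifulSoup so it runs without extra packages. It assumes the saved files are UTF-8 and follow the `data/zillow_page_{n}.html` naming used above. The class and function names are hypothetical.

```python
import glob
import re
import statistics
from html.parser import HTMLParser

PRICE_RE = re.compile(r"\$(\d[\d,]*)")  # first "$1,234,567"-style amount


class PriceSpanParser(HTMLParser):
    """Collect the text of every <span data-test="property-card-price"> element."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._depth = 0  # > 0 while inside a matching price span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # a span nested inside the price span
        elif dict(attrs).get("data-test") == "property-card-price":
            self._depth = 1
            self.prices.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.prices[-1] += data


def prices_from_saved_pages(pattern="data/zillow_page_*.html"):
    """Raw price strings from every saved results page, read in sorted file-name order."""
    texts = []
    for path in sorted(glob.glob(pattern)):
        parser = PriceSpanParser()
        with open(path, encoding="utf-8") as f:
            parser.feed(f.read())
        parser.close()  # flush any buffered text before reading the results
        texts.extend(text.strip() for text in parser.prices)
    return texts


def median_from_saved_pages(pattern="data/zillow_page_*.html"):
    """Median price across all saved pages, or None if no price could be parsed."""
    prices = []
    for text in prices_from_saved_pages(pattern):
        match = PRICE_RE.search(text)
        if match:
            prices.append(int(match.group(1).replace(",", "")))
    return statistics.median(prices) if prices else None
```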


## Extra credit
- What is the median price per square foot?
- Which realtor has the most listings?
- Can you stain listings over $1M in red and take a full-page screenshot?
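Here is a starting-point sketch for the first two questions. It assumes square footage can be read from card text like `2 bds1 ba863 sqft - House for sale`, as printed in step 6. It also assumes you have already extracted one realtor or brokerage name per listing, for example the `EASTWEST REALTY` line that some cards show. How to pull those names out reliably is left open. The helper names are hypothetical.

```python
import re
import statistics
from collections import Counter

# Digits (with optional thousands commas) immediately before "sqft".
SQFT_RE = re.compile(r"(\d[\d,]*)\s*sqft")


def parse_sqft(card_text):
    """Square footage from text like '2 bds1 ba863 sqft - House for sale', or None."""
    match = SQFT_RE.search(card_text or "")
    return int(match.group(1).replace(",", "")) if match else None


def median_price_per_sqft(listings):
    """Median of price / sqft over (price, sqft) pairs; pairs missing either value are skipped."""
    ratios = [price / sqft for price, sqft in listings if price and sqft]
    return statistics.median(ratios) if ratios else None


def top_realtor(realtor_names):
    """(name, listing_count) for the most frequent non-blank realtor name, or None."""
    counts = Counter(name.strip() for name in realtor_names if name and name.strip())
    return counts.most_common(1)[0] if counts else None
```

For example, `median_price_per_sqft([(749000, parse_sqft("2 bds1 ba863 sqft - House for sale"))])` returns `749000 / 863`, about 867.9.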