<a href="https://colab.research.google.com/github/groovymarty/gracieslist/blob/main/scrape_aptdotcom.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Here is the apartments.com scraper!


## Execute these code blocks once to set things up.

In [5]:
# Here are the imports we need
import requests
import time
import random
from bs4 import BeautifulSoup
import pandas

In [10]:
# Functions to build and send requests
def build_url(where, page):
  if page == 1:
    return f"https://www.apartments.com/{where}/"
  else:
    return f"https://www.apartments.com/{where}/{page}/"

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0"}

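def send_request(url):
  # timeout keeps a stalled connection from hanging the notebook;
  # raise_for_status turns HTTP error responses (e.g. 403, 404) into exceptions
  response = requests.get(url, headers=headers, timeout=30)
  response.raise_for_status()
  return response.text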

In [25]:
# Functions to process the HTML that comes back from a request
# Return a list of result rows (one dict per listing) for the given place
def process_result(soup, where):
  rows = []
  placards = soup.find(id="placards")
  if placards is None:
    # No results container on this page, so there is nothing to collect
    return rows
  properties = placards.find_all("div", class_="property-info")
  for prop in properties:
    address_div = prop.find("div", class_="property-address")
    beds_div = prop.find("div", class_="bed-range")
    price_div = prop.find("div", class_="price-range")
    link_a = prop.find("a", class_="property-link")
    # Skip listings that are missing any of the three text fields
    if address_div and beds_div and price_div:
      address = address_div.get_text()
      beds = beds_div.get_text()
      price = price_div.get_text()
      # The link is optional; store None if the anchor is absent
      link = link_a.get("href") if link_a else None
      print(f'found: "{where}","{address}","{beds}","{price}"')
      rows.append({
          "Where": where,
          "Address": address,
          "Beds": beds,
          "Price": price,
          "Link": link
      })
  return rows


In [31]:
# Top-level functions to drive the scraping process
def random_delay():
  # Wait between 3 and 8 seconds so requests are not sent back to back
  time.sleep(3+random.random()*5)

def scrape_all_pages(where):
  rows = []
  page = 1
  while True:
    print("delaying...")
    random_delay()
    print(f"getting {where} page {page}")
    # send request to site and get result
    html_text = send_request(build_url(where, page))
    # parse and process result
    soup = BeautifulSoup(html_text, "html.parser")
    rows.extend(process_result(soup, where))
    # pagination logic: the last word of the pageRange text is the final page number
    page_range = soup.find("span", class_="pageRange")
    if page_range:
      last_page = int(page_range.get_text().split()[-1])
    else:
      last_page = 1
    if page >= last_page:
      break
    else:
      page += 1
  return rows
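
The pagination logic above depends on the `pageRange` span holding text whose last word is the final page number. The exact wording is not shown in this notebook, so the sketch below assumes it looks like "Page 1 of 4". It pulls the parsing step into a hypothetical helper, `parse_last_page`, which falls back to 1 when the text is missing or its last word is not a number.

```python
def parse_last_page(page_range_text):
  """Return the last page number from text such as "Page 1 of 4".

  Hypothetical helper; the "Page X of Y" wording is an assumption.
  Falls back to 1 when the text is missing or its last word is not a number.
  """
  if not page_range_text:
    return 1
  words = page_range_text.split()
  if not words or not words[-1].isdigit():
    return 1
  return int(words[-1])

print(parse_last_page("Page 1 of 4"))  # 4
print(parse_last_page(None))           # 1
```

Falling back to 1 mirrors what `scrape_all_pages` already does when the span is absent: it stops after the current page instead of raising an error.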

## Below are the parameters.  Edit them as you wish.
You must execute this code block at least once, and again whenever you change any of the parameter values.

In [44]:
places = []
places.append("wabasha-county-mn")
places.append("rochester-mn")
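
Each entry in `places` is dropped straight into the URL path by `build_url`, so it has to match the slug that apartments.com uses for that area. The sketch below repeats `build_url` from the setup cell (so the snippet runs on its own) and prints the first-page URL for each place. Whether other places follow the same lowercase, hyphenated pattern is an assumption based on these two examples, so it is worth checking the address bar on the site before adding a new one.

```python
def build_url(where, page):
  # Same logic as build_url in the setup cell, repeated so this snippet runs alone
  if page == 1:
    return f"https://www.apartments.com/{where}/"
  return f"https://www.apartments.com/{where}/{page}/"

places = ["wabasha-county-mn", "rochester-mn"]

# First results page for each place; later pages append the page number
first_page_urls = [build_url(place, 1) for place in places]
for url in first_page_urls:
  print(url)
```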

## This code block creates an empty dataframe to accumulate the results.
You must execute this code block at least once.  Run it again if you want to clear the results and start over.

In [45]:
df = pandas.DataFrame(columns=["Where", "Address", "Beds", "Price", "Link"])

## Execute the following code block to scrape the site.
It scrapes every results page for each place in your parameters, logs messages to show its progress, and adds the result rows to the dataframe.  Run it as often as you like; duplicate rows are dropped after each run.

In [46]:
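for where in places:
  rows = scrape_all_pages(where)
  # DataFrame.append was removed in pandas 2.0; pandas.concat does the same job
  df = pandas.concat([df, pandas.DataFrame(rows, columns=df.columns)])
df.drop_duplicates(inplace=True)
print("Done!")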

delaying...
getting wabasha-county-mn page 1
found: "wabasha-county-mn","524 Phelps Ave Wabasha, MN 55981","3 Beds, 1 Bath","$799 /mo"
delaying...
getting rochester-mn page 1
delaying...
getting rochester-mn page 2
delaying...
getting rochester-mn page 3
delaying...
getting rochester-mn page 4
found: "rochester-mn","624 7th Ave SE Rochester, MN 55904","3 Beds, 1 Bath","$1,649 /mo"
found: "rochester-mn","1105 11th Ave SE Rochester, MN 55904","2 Beds, 1 Bath","$1,600 /mo"
found: "rochester-mn","1801 3rd St SW Rochester, MN 55902","4 Beds, 2 Baths","$1,600 /mo"
found: "rochester-mn","4626 35 St NW Rochester, MN 55901","4 Beds, 3 Baths","$2,500 /mo"
found: "rochester-mn","6339 30th Ave NW Rochester, MN 55901","2 Beds","$1,650"
found: "rochester-mn","3284 Allison Ln NE Rochester, MN 55906","4 Beds, 2 Baths","$3,350 /mo"
found: "rochester-mn","3284 Allison Ln NE Rochester, MN 55906","4 Beds, 2 Baths","$3,650 /mo"
found: "rochester-mn","1816 34th St NW Rochester, MN 55901","1 Bed, 1 Bath","$6

## Execute the following block to view the result dataframe.

In [47]:
df

Unnamed: 0,Where,Address,Beds,Price,Link
0,wabasha-county-mn,"524 Phelps Ave Wabasha, MN 55981","3 Beds, 1 Bath",$799 /mo,https://www.apartments.com/524-phelps-ave-waba...
0,rochester-mn,"624 7th Ave SE Rochester, MN 55904","3 Beds, 1 Bath","$1,649 /mo",https://www.apartments.com/624-7th-ave-se-roch...
1,rochester-mn,"1105 11th Ave SE Rochester, MN 55904","2 Beds, 1 Bath","$1,600 /mo",https://www.apartments.com/1105-11th-ave-se-ro...
2,rochester-mn,"1801 3rd St SW Rochester, MN 55902","4 Beds, 2 Baths","$1,600 /mo",https://www.apartments.com/1801-3rd-st-sw-roch...
3,rochester-mn,"4626 35 St NW Rochester, MN 55901","4 Beds, 3 Baths","$2,500 /mo",https://www.apartments.com/4626-35-st-nw-roche...
...,...,...,...,...,...
74,rochester-mn,"15 8th St NW Rochester, MN 55901","3 Beds, 2 Baths","$1,400 /mo",https://www.apartments.com/15-8th-st-nw-roches...
75,rochester-mn,"3620 4th Pl NW Rochester, MN 55901","4 Beds, 1.5 Baths","$1,900 /mo",https://www.apartments.com/3620-4th-pl-nw-roch...
76,rochester-mn,"1545 2nd Ave NE Rochester, MN 55906",2 Beds,$795,https://www.apartments.com/1545-2nd-ave-ne-roc...
77,rochester-mn,"7229 Genoa Rd NW Byron, MN 55920","4 Beds, 2 Baths","$2,300 /mo",https://www.apartments.com/4-br-2-bath-house-7...
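
The `Price` column holds the text exactly as scraped, for example `$1,649 /mo` or `$795`, so it cannot be sorted or filtered numerically as is. The sketch below shows one possible way to pull out a whole-dollar number. `parse_price` is a hypothetical helper, not part of the scraper above, and it returns `None` when no dollar amount is found.

```python
import re

def parse_price(price_text):
  """Return the first dollar amount in price_text as an int, or None.

  Hypothetical helper, not part of the scraper above.
  """
  # A dollar sign followed by digits, optionally with thousands separators
  match = re.search(r"\$(\d[\d,]*)", price_text)
  if match is None:
    return None
  return int(match.group(1).replace(",", ""))

print(parse_price("$1,649 /mo"))  # 1649
print(parse_price("$795"))        # 795
```

If a price is given as a range, this returns only the first amount. It could be applied to the dataframe with something like `df["Price"].map(parse_price)`, stored in a new column so the original text is kept.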


## Execute the following block to save the result dataframe to a CSV file.
Edit the file name as you wish.  Use the file explorer on the left to find the CSV file.  (Look in the "content" folder.)  You can then download the file if desired.  Note that the files in the "content" folder will go away when you close the Colab notebook.

In [48]:
df.to_csv("results.csv")