# Project 1: Accommodation Price Predictions

Accommodation accounts for a significant share of travellers' expenses. People want the best deal on a room that fits their needs and preferences. However, with thousands of listings available online, it is easy to get stuck in a never-ending cycle of manually hunting for the best price. A convenient solution to this common problem is a price estimator that does the cumbersome work for travellers.

This project aims to build an accommodation price estimator that takes relevant features of a stay as input and predicts the corresponding price. Collecting the data requires scraping popular accommodation sites for a specific location, date and number of guests. To keep the data consistent, the listings for 1 adult in Tokyo, Japan on a specific date are scraped and used as the input data.

In [1]:
# import webdriver
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import numpy as np
# define path of the webdriver (raw string so the backslashes are not treated as escapes)
PATH = r"C:\Program Files (x86)\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument("--lang=en")
options.add_argument("--start-maximized")
options.add_argument("--auto-open-devtools-for-tabs")
# create a driver instance; Selenium 4 expects the driver path wrapped in a Service
from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service(PATH), options=options)


In [2]:
driver.get('https://www.airbnb.com/s/Tokyo--Japan/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_lengths%5B%5D=one_week&place_id=ChIJ51cu8IcbXWARiRtXIothAS4&date_picker_type=calendar&checkin=2022-08-04&checkout=2022-08-06&adults=1&source=structured_search_input_header&search_type=autocomplete_click')
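The search URL above hard-codes the location, dates and guest count. A minimal sketch of how a similar URL could be assembled from parameters is shown here. `build_search_url` is a hypothetical helper that is not part of the original notebook, and it assumes the site still returns results when only the `checkin`, `checkout` and `adults` query parameters are sent, which has not been verified.

```python
from urllib.parse import urlencode

def build_search_url(location_slug, checkin, checkout, adults=1):
    """Assemble a search URL from parameters visible in the URL above.

    Hypothetical helper: only a subset of the original query string is kept.
    """
    base = f"https://www.airbnb.com/s/{location_slug}/homes"
    query = urlencode({"checkin": checkin, "checkout": checkout, "adults": adults})
    return f"{base}?{query}"

url = build_search_url("Tokyo--Japan", "2022-08-04", "2022-08-06", adults=1)
print(url)
```

Keeping the dates and guest count as arguments would make it easier to repeat the scrape for other stays while holding the rest of the pipeline fixed.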

In [3]:
try:
    grid = WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="site-content"]/div/div/div/div/div/div/div/div[5]/div/div[2]/div/div[3]/div/div/div/div/div/div/div/div/div'))
    )
except Exception:
    # the results grid never appeared: close the browser and surface the error
    driver.quit()
    raise

In [4]:
items = grid.find_elements(By.CLASS_NAME, 'c4mnd7m')

In [5]:
print(items[0].text)

1 of 6 items showing
Rare find
Hotel room in Chuo City
Comfybed Ginza Long Term Stay Welcome Spacious Beds Single
1 small double bed
$37 
$32
 night
$32 per night, originally $37
·
$64 total
$64 total
View price breakdown
4.6 (113)
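The card text printed above is newline-separated, so some fields can be picked out by pattern rather than by position. The sketch below is one possible approach, not the method used in this notebook. `parse_card_text` is a hypothetical helper, and the patterns assume that other cards follow the same layout as this single sample (a bed line such as "1 small double bed" and a rating line such as "4.6 (113)").

```python
import re

def parse_card_text(text):
    """Extract the bed summary, rating and review count from one card's text."""
    beds = rating = reviews = None
    for line in (ln.strip() for ln in text.splitlines()):
        # A bed line starts with a count and ends with "bed" or "beds".
        if beds is None and re.fullmatch(r"\d+\s.*\bbeds?", line):
            beds = line
        # A rating line looks like "4.6 (113)".
        match = re.fullmatch(r"(\d(?:\.\d+)?) \((\d+)\)", line)
        if match:
            rating, reviews = float(match.group(1)), int(match.group(2))
    return {"beds": beds, "rating": rating, "reviews": reviews}

sample_card = """1 of 6 items showing
Rare find
Hotel room in Chuo City
Comfybed Ginza Long Term Stay Welcome Spacious Beds Single
1 small double bed
$32 per night, originally $37
4.6 (113)"""
print(parse_card_text(sample_card))
```

Fields that are missing from a card come back as `None`, so they can be filtered or imputed later instead of raising an error during scraping.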


In [6]:
bnb_descriptions = []
bnb_beds = []
bnb_prices = []

In [7]:
def page_scraping():
    grid = driver.find_element(By.CLASS_NAME, 'gh7uyir')
    items = grid.find_elements(By.CLASS_NAME, 'c4mnd7m')
    for item in items:
        description = item.find_element(By.CLASS_NAME, 't1jojoys')
        bed = item.find_elements(By.CLASS_NAME, 'f15liw5s')[3]
        price = item.find_element(By.CLASS_NAME, '_14y1gc')
        bnb_descriptions.append(description.text)
        bnb_beds.append(bed.text)
        bnb_prices.append(price.text)


In [8]:
def page_scrolling():
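    # scroll halfway first so lazily loaded content has time to render
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/2)")
    time.sleep(3)
    # then scroll to the bottom, where the pagination links are
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")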

In [9]:
def page_clicking(x):
    button = WebDriverWait(driver, 20).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, 'a[aria-label=Next]')))
    time.sleep(3)
    button.click()

In [10]:
def scraping_process():
    try:
        i = 1
        while True:
            try:
                # wait until at least one price element is present on the page
                WebDriverWait(driver, 20).until(
                    EC.presence_of_element_located((By.CLASS_NAME, "_14y1gc"))
                )
            except Exception:
                # no listings appeared within the timeout: stop
                break
            page_scraping()
            print(f'done scraping {i} pages')
            page_scrolling()
            print(f'done scrolling {i} pages')
            if i == 1:
                # click the close ("cross") button that is shown on the first page only
                WebDriverWait(driver, 20).until(
                    EC.visibility_of_element_located((By.XPATH, '/html/body/div[14]/div/div/div/div[3]/button'))).click()
            page_clicking(i)
            print(f'done clicking {i} pages')
            i += 1
    except Exception:
        # reached when the "Next" link cannot be found (last page) or on any other error
        print('scraping done!')
        # driver.quit()

In [11]:
scraping_process()

done scraping 1 pages
done scrolling 1 pages
done clicking 1 pages
done scraping 2 pages
done scrolling 2 pages
done clicking 2 pages
done scraping 3 pages
done scrolling 3 pages
done clicking 3 pages
done scraping 4 pages
done scrolling 4 pages
done clicking 4 pages
done scraping 5 pages
done scrolling 5 pages
done clicking 5 pages
done scraping 6 pages
done scrolling 6 pages
done clicking 6 pages
done scraping 7 pages
done scrolling 7 pages
done clicking 7 pages
done scraping 8 pages
done scrolling 8 pages
done clicking 8 pages
done scraping 9 pages
done scrolling 9 pages
done clicking 9 pages
done scraping 10 pages
done scrolling 10 pages
done clicking 10 pages
done scraping 11 pages
done scrolling 11 pages
done clicking 11 pages
done scraping 12 pages
done scrolling 12 pages
done clicking 12 pages
done scraping 13 pages
done scrolling 13 pages
done clicking 13 pages
done scraping 14 pages
done scrolling 14 pages
done clicking 14 pages
done scraping 15 pages
done scrolling 15 pages


In [12]:
import pandas as pd
df = pd.DataFrame({'Description': bnb_descriptions, 'Number of Beds': bnb_beds, 'Price (HKD)': bnb_prices})

In [13]:
df.head()

Unnamed: 0,Description,Number of Beds,Price (HKD)
0,Hotel room in Chuo City,1 small double bed,"$37 \n$32\n night\n$32 per night, originally $37"
1,Private room in Sumida-ku,1 double bed,"$40 \n$23\n night\n$23 per night, originally $40"
2,Hotel room in 中央区,3 beds,"$160 \n$62\n night\n$62 per night, originally ..."
3,Hotel room in Chuo City,1 double bed,"$45 \n$36\n night\n$36 per night, originally $45"
4,Apartment in Chuo City,1 bed,"$79 \n$55\n night\n$55 per night, originally $79"
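
The price column above still holds the raw multi-line text of each card, so it needs to be reduced to a single number before modelling. A minimal sketch of that step follows. `parse_nightly_price` is a hypothetical helper, and the fallback branch assumes that listings without a discount still show at least one dollar amount, which the sample rows do not confirm.

```python
import re

def parse_nightly_price(raw):
    """Return the current nightly price as a float, or None if none is found."""
    # Prefer the explicit "$32 per night" phrase, which holds the discounted price.
    match = re.search(r"\$([\d,]+(?:\.\d+)?)\s+per night", raw)
    if match is None:
        # Assumed fallback for cards without that phrase: take the first dollar amount.
        match = re.search(r"\$([\d,]+(?:\.\d+)?)", raw)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))

sample_price = "$37 \n$32\n night\n$32 per night, originally $37"
print(parse_nightly_price(sample_price))  # 32.0
```

The helper could then be applied to the whole column, for example with `df['Price (HKD)'].map(parse_nightly_price)`.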


In [14]:
df.shape

(300, 3)

In [15]:
df.to_csv('airbnb_tokyo.csv', sep=',', index=False)