# Code to obtain data from HomeAway
This code will scrape data from HomeAway using RESTful API calls. It will:

1. Send a POST request to https://ws.homeaway.com/oauth/token using our API Client's credentials with a Basic Auth header to obtain an `access_token` for subsequent API calls
2. Send GET requests to https://ws.homeaway.com/public/search to obtain information about listings in Atlanta in the form of a JSON file. **This might need to be modified to be more specific in the future.**
    - These GET requests require stay dates in order to return the average nightly price of each listing. Therefore, to be safe, they request listings for the timeframe of **July 1, 2019 (2019-07-01) - July 9, 2019 (2019-07-09).**
3. Extract the following information from the JSON file and put it in a Pandas DataFrame:
    - listingId
    - priceQuote -> amount
    - priceQuote -> averageNightly
    - priceQuote -> fees
    - priceQuote -> rent
    - location -> lat
    - location -> lng
    - location -> city
    - location -> state
    - location -> country
    - reviewCount
    - reviewAverage
    - bathrooms
    - bedrooms
    - listingUrl
    - bookWithConfidence **(maybe??)**
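
As a quick sanity check on the stay window, the number of nights between the two ISO dates used in the requests can be computed directly. The sketch below also checks one observation: in the sample rows further down, `averageNightly` appears to equal `rent` divided by the number of nights. That relationship is inferred from a few rows only and is not something the API is known to guarantee. The helper name `nights_between` is hypothetical.

```python
from datetime import date

def nights_between(start_iso, end_iso):
    """Number of nights in a stay from start_iso to end_iso (YYYY-MM-DD)."""
    start = date.fromisoformat(start_iso)
    end = date.fromisoformat(end_iso)
    return (end - start).days

nights = nights_between("2019-07-01", "2019-07-09")

# Rent of 1150 is taken from one row of the output table; dividing by the
# number of nights reproduces that row's priceAvgNightly of 143.75.
avg_nightly = 1150 / nights
print(nights, avg_nightly)
```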

In [5]:
# Imports
import http.client
import json
import time
import sys
import collections
import csv
import socket
import requests
import pandas as pd

### Send a POST request to the OAuth Server

In [6]:
## construct request
url = "https://ws.homeaway.com/oauth/token"

payload = ""
headers = {
    'Authorization': "Basic NzViNDgyOTktZGEyNS00YWNkLThiYjctM2EyMTJkMWVmOTljOmEyN2M1YmNlLWRhOTYtNGEwNS1hYzRmLTE3NDBkMDFiYWJjNQ==",
    'cache-control': "no-cache",
    'Postman-Token': "f60f6d42-07c7-4b8f-aaa5-f0a03d307012"
    }

## send POST request
print("| - - POST - - |")
print("Sending POST request to", url)
print("Authorization:", headers['Authorization'])
print("Postman Token:", headers['Postman-Token'])
response = requests.request("POST", url, data=payload, headers=headers)
response.raise_for_status()  # stop here if the token request failed
print("| - - SUCCESS - - |")
# print(response.text)

## extract `access_token`
access_token = response.json()['access_token']
print("Access Token is", access_token)

| - - POST - - |
Sending POST request to https://ws.homeaway.com/oauth/token
Authorization: Basic NzViNDgyOTktZGEyNS00YWNkLThiYjctM2EyMTJkMWVmOTljOmEyN2M1YmNlLWRhOTYtNGEwNS1hYzRmLTE3NDBkMDFiYWJjNQ==
Postman Token: f60f6d42-07c7-4b8f-aaa5-f0a03d307012
| - - SUCCESS - - |
Access Token is OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
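
Under HTTP Basic authentication, the `Authorization` value is the Base64 encoding of `client_id:client_secret`. Rather than pasting the encoded string into the notebook, it could be built from the two parts at run time. A minimal sketch, where `build_basic_auth_header` is a hypothetical helper and the credentials are placeholders:

```python
import base64

def build_basic_auth_header(client_id, client_secret):
    """Return the value for an HTTP Basic Authorization header."""
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")

# Placeholder credentials, not real ones
header_value = build_basic_auth_header("my-client-id", "my-client-secret")
print(header_value)
```

Reading the two parts from environment variables instead of literals would keep the secret out of the saved notebook and its printed output.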


In [47]:
url = "https://ws.homeaway.com/public/search"

val_pageSize = 25
val_pageNum = 22

date_availabilityStart = "2019-07-01"
date_availabilityEnd = "2019-07-09"

querystring = {"q":"Atlanta",
               "availabilityStart": date_availabilityStart,
               "availabilityEnd":date_availabilityEnd,
               "pageSize":str(val_pageSize),
               "page":str(val_pageNum)}

payload = ""
headers = {
    'Authorization': "Bearer " + access_token,
    'cache-control': "no-cache",
    'Postman-Token': "9a32c226-9a48-417f-b062-5070f405af71"
    }

response = requests.request("GET", url, data=payload, headers=headers, params=querystring).json()

In [48]:
response

{'nextPage': 'https://ws.homeaway.com/public/search?q=Atlanta&availabilityStart=2019-07-01&availabilityEnd=2019-07-09&pageSize=25&page=23',
 'prevPage': 'https://ws.homeaway.com/public/search?q=Atlanta&availabilityStart=2019-07-01&availabilityEnd=2019-07-09&pageSize=25&page=21',
 'pageSize': 25,
 'pageCount': 81,
 'page': 22,
 'size': 2009,
 'refinements': [],
 'entries': [{'listingId': '7415150',
   'listingSource': 'homeaway_us',
   'headline': 'ATL.CV 1116 - Deluxe Apartment One Bedroom 4 pax CV 1116',
   'description': 'Stake your claim to a life beyond the view. 1stHomeRent connects you to all of the best shopping, dining and activity of Old Fourth Ward, Inman Park and the Atlanta Beltline while allowing you the savings to enjoy them. Our apartment is designed w...',
   'accommodations': '1 BR, 1.0BA, Sleeps 4',
   'minStayRange': {'minStayHigh': None, 'minStayLow': None},
   'thumbnail': {'height': 100,
    'imageSize': 'SMALL',
    'secureUri': 'https://imagesus-ssl.homeaway.com

In [50]:
entries = response['entries']
numEntries = len(response['entries'])

In [53]:
numEntries

25
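
The pagination fields in the response above are related by simple arithmetic: `pageCount` is presumably the number of `pageSize`-sized pages needed to hold `size` results. A small sketch (the helper name `expected_page_count` is hypothetical) reproduces the value reported in the sample response:

```python
import math

def expected_page_count(total_results, page_size):
    """Pages needed to hold total_results items at page_size items per page."""
    return math.ceil(total_results / page_size)

# Values copied from the sample response: size=2009, pageSize=25
print(expected_page_count(2009, 25))  # 81, matching 'pageCount'
```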

In [55]:
entries[1]

{'listingId': '7415154',
 'listingSource': 'homeaway_us',
 'headline': 'ATL.CV 1414 - Deluxe Apartment One Bedroom 4 pax CV 1414',
 'description': 'Stake your claim to a life beyond the view. 1stHomeRent connects you to all of the best shopping, dining and activity of Old Fourth Ward, Inman Park and the Atlanta Beltline while allowing you the savings to enjoy them. Our apartment is designed w...',
 'accommodations': '1 BR, 1.0BA, Sleeps 4',
 'minStayRange': {'minStayHigh': None, 'minStayLow': None},
 'thumbnail': {'height': 100,
  'imageSize': 'SMALL',
  'secureUri': 'https://imagesus-ssl.homeaway.com/mda01/51711ef9-75ed-4011-89f7-48a27228579d.1.1',
  'uri': 'http://imagesus.homeaway.com/mda01/51711ef9-75ed-4011-89f7-48a27228579d.1.1',
  'width': 133},
 'priceQuote': {'amount': 1726.2,
  'averageNightly': 215.78,
  'currencyUnits': 'USD',
  'fees': None,
  'other': None,
  'rent': 1726.2,
  'tax': None},
 'priceRanges': [],
 'location': {'lat': 33.760714,
  'lng': -84.374001,
  'city':


### Send GET requests to obtain listings information and extract to DataFrame

In [11]:
def parseEntry(entries, i):
    # Build a one-row DataFrame holding the fields of interest from entries[i]
    tmp_data = pd.DataFrame(
        data=[entries[i]['listingId'],
              entries[i]['priceQuote']['amount'],
              entries[i]['priceQuote']['averageNightly'],
              entries[i]['priceQuote']['fees'],
              entries[i]['priceQuote']['rent'],
              entries[i]['location']['lat'],
              entries[i]['location']['lng'],
              entries[i]['location']['city'],
              entries[i]['location']['state'],
              entries[i]['location']['country'],
              entries[i]['reviewCount'],
              entries[i]['reviewAverage'],
              entries[i]['bathrooms'],
              entries[i]['bedrooms'],
              entries[i]['listingUrl'],
              entries[i]['bookWithConfidence']
             ], 
        index=[
            'listingId',
            'priceAmt',
            'priceAvgNightly',
            'priceFees',
            'priceRent',
            'lat',
            'lng',
            'city',
            'state',
            'country',
            'reviewCount',
            'reviewAverage',
            'bathrooms',
            'bedrooms',
            'listingUrl',
            'bookWithConfidence'
        ]
    ).transpose()
    return tmp_data
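
`parseEntry` indexes every field directly, so it raises if an entry lacks a key or if a nested object such as `priceQuote` comes back as `None`. Whether the API ever omits these fields is not known from the responses shown here; the sketch below is a defensive alternative under that assumption. It uses only plain dictionaries and a shortened field list, and both `FIELD_PATHS` and `flatten_entry` are hypothetical names.

```python
# Column name -> path of keys inside one search entry (shortened list)
FIELD_PATHS = {
    "listingId": ("listingId",),
    "priceAmt": ("priceQuote", "amount"),
    "priceAvgNightly": ("priceQuote", "averageNightly"),
    "lat": ("location", "lat"),
    "lng": ("location", "lng"),
    "reviewCount": ("reviewCount",),
}

def flatten_entry(entry, field_paths=FIELD_PATHS):
    """Return a flat dict for one entry; missing fields become None."""
    row = {}
    for column, path in field_paths.items():
        value = entry
        for key in path:
            # A missing level, or a level that is not a dict, yields None
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row

# Abbreviated entry modelled on the sample output; reviewCount left out on purpose
sample = {
    "listingId": "7415154",
    "priceQuote": {"amount": 1726.2, "averageNightly": 215.78},
    "location": {"lat": 33.760714, "lng": -84.374001},
}
print(flatten_entry(sample))
```

A list of such row dictionaries can be passed to `pd.DataFrame` in a single call, which avoids growing the DataFrame one row at a time.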

In [12]:
# Initialize `data` DataFrame
data = pd.DataFrame(columns=[
        'listingId',
        'priceAmt',
        'priceAvgNightly',
        'priceFees',
        'priceRent',
        'lat',
        'lng',
        'city',
        'state',
        'country',
        'reviewCount',
        'reviewAverage',
        'bathrooms',
        'bedrooms',
        'listingUrl',
        'bookWithConfidence'
    ])

In [13]:
boolParse = True
i = 0
while boolParse:
    time.sleep(1)
    i += 1 # increment pageNum counter by 1
    # construct GET request
    url = "https://ws.homeaway.com/public/search"

    val_pageSize = 25
    val_pageNum = i

    date_availabilityStart = "2019-07-01"
    date_availabilityEnd = "2019-07-09"

    querystring = {"q":"Atlanta",
                   "availabilityStart": date_availabilityStart,
                   "availabilityEnd":date_availabilityEnd,
                   "pageSize":str(val_pageSize),
                   "page":str(val_pageNum)}
    

    payload = ""
    headers = {
        'Authorization': "Bearer " + access_token,
        'cache-control': "no-cache",
        'Postman-Token': "9a32c226-9a48-417f-b062-5070f405af71"
        }


    # send GET request
    print("| - - GET - - |")
    print("| - - Sending GET request to", url)
    print("| - - Authorization:", headers['Authorization'])
    print("| - - Postman Token:", headers['Postman-Token'])
    print("| - - - - Query Params - - - - |")
    print("| - - - - - - q:", querystring['q'])
    print("| - - - - - - availabilityStart:", querystring['availabilityStart'])
    print("| - - - - - - availabilityEnd:", querystring['availabilityEnd'])
    print("| - - - - - - pageSize:", querystring['pageSize'])
    print("| - - - - - - page:", querystring['page'])
    response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
    response.raise_for_status()  # fail loudly instead of parsing an error body
    response = response.json()
    print("| - - SUCCESS - - |")
    
    # parse GET response to get entries
    entries = response['entries']
    numEntries = len(response['entries']) # calculate number of entries

    if numEntries > 0:
        # extract pertinent entries into a DataFrame
        print("| - - Parsing pertinent entries - - |")
        for j in range(numEntries):
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            data = pd.concat([data, parseEntry(entries, j)])
            print("| - - - - Row", j, "- - - - |")
    else:
        boolParse = False
        print("| - - COMPLETE - - |")

| - - GET - - |
| - - Sending POST request to https://ws.homeaway.com/public/search
| - - Authorization: Bearer OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
| - - Postman Token: 9a32c226-9a48-417f-b062-5070f405af71
| - - - - Query Params - - - - |
| - - - - - - q: Atlanta
| - - - - - - availabilityStart: 2019-07-01
| - - - - - - availabilityEnd: 2019-07-09
| - - - - - - pageSize: 25
| - - - - - - page: 1
| - - SUCCESS - - |
| - - Parsing pertinent entries - - |
| - - - - Row 0 - - - - |
| - - - - Row 1 - - - - |
| - - - - Row 2 - - - - |
| - - - - Row 3 - - - - |
| - - - - Row 4 - - - - |
| - - - - Row 5 - - - - |
| - - - - Row 6 - - - - |
| - - - - Row 7 - - - - |
| - - - - Row 8 - - - - |
| - - - - Row 9 - - - - |
| - - - - Row 10 - - - - |
| - - - - Row 11 - - - - |
| - - - - Row 12 - - - - |
| - - - - Row 13 - - - - |
| - - - - Row 14 - - - - |
| - - - - Row 15 - - - - |
| - - - - Row 16 - - - - |
| - - - - Row 17 - - - - |
| - - - - Row 18 - - - - |
| - - - - Row 19 - - - - |


| - - SUCCESS - - |
| - - Parsing pertinent entries - - |
| - - - - Row 0 - - - - |
| - - - - Row 1 - - - - |
| - - - - Row 2 - - - - |
| - - - - Row 3 - - - - |
| - - - - Row 4 - - - - |
| - - - - Row 5 - - - - |
| - - - - Row 6 - - - - |
| - - - - Row 7 - - - - |
| - - - - Row 8 - - - - |
| - - - - Row 9 - - - - |
| - - - - Row 10 - - - - |
| - - - - Row 11 - - - - |
| - - - - Row 12 - - - - |
| - - - - Row 13 - - - - |
| - - - - Row 14 - - - - |
| - - - - Row 15 - - - - |
| - - - - Row 16 - - - - |
| - - - - Row 17 - - - - |
| - - - - Row 18 - - - - |
| - - - - Row 19 - - - - |
| - - - - Row 20 - - - - |
| - - - - Row 21 - - - - |
| - - - - Row 22 - - - - |
| - - - - Row 23 - - - - |
| - - - - Row 24 - - - - |
| - - GET - - |
| - - Sending POST request to https://ws.homeaway.com/public/search
| - - Authorization: Bearer OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
| - - Postman Token: 9a32c226-9a48-417f-b062-5070f405af71
| - - - - Query Params - - - - |
| - - - - - - q: Atlanta


| - - GET - - |
| - - Sending POST request to https://ws.homeaway.com/public/search
| - - Authorization: Bearer OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
| - - Postman Token: 9a32c226-9a48-417f-b062-5070f405af71
| - - - - Query Params - - - - |
| - - - - - - q: Atlanta
| - - - - - - availabilityStart: 2019-07-01
| - - - - - - availabilityEnd: 2019-07-09
| - - - - - - pageSize: 25
| - - - - - - page: 16
| - - SUCCESS - - |
| - - Parsing pertinent entries - - |
| - - - - Row 0 - - - - |
| - - - - Row 1 - - - - |
| - - - - Row 2 - - - - |
| - - - - Row 3 - - - - |
| - - - - Row 4 - - - - |
| - - - - Row 5 - - - - |
| - - - - Row 6 - - - - |
| - - - - Row 7 - - - - |
| - - - - Row 8 - - - - |
| - - - - Row 9 - - - - |
| - - - - Row 10 - - - - |
| - - - - Row 11 - - - - |
| - - - - Row 12 - - - - |
| - - - - Row 13 - - - - |
| - - - - Row 14 - - - - |
| - - - - Row 15 - - - - |
| - - - - Row 16 - - - - |
| - - - - Row 17 - - - - |
| - - - - Row 18 - - - - |
| - - - - Row 19 - - - - |

| - - SUCCESS - - |
| - - Parsing pertinent entries - - |
| - - - - Row 0 - - - - |
| - - - - Row 1 - - - - |
| - - - - Row 2 - - - - |
| - - - - Row 3 - - - - |
| - - - - Row 4 - - - - |
| - - - - Row 5 - - - - |
| - - - - Row 6 - - - - |
| - - - - Row 7 - - - - |
| - - - - Row 8 - - - - |
| - - - - Row 9 - - - - |
| - - - - Row 10 - - - - |
| - - - - Row 11 - - - - |
| - - - - Row 12 - - - - |
| - - - - Row 13 - - - - |
| - - - - Row 14 - - - - |
| - - - - Row 15 - - - - |
| - - - - Row 16 - - - - |
| - - - - Row 17 - - - - |
| - - - - Row 18 - - - - |
| - - - - Row 19 - - - - |
| - - - - Row 20 - - - - |
| - - - - Row 21 - - - - |
| - - - - Row 22 - - - - |
| - - - - Row 23 - - - - |
| - - - - Row 24 - - - - |
| - - GET - - |
| - - Sending POST request to https://ws.homeaway.com/public/search
| - - Authorization: Bearer OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
| - - Postman Token: 9a32c226-9a48-417f-b062-5070f405af71
| - - - - Query Params - - - - |
| - - - - - - q: Atlanta


| - - GET - - |
| - - Sending POST request to https://ws.homeaway.com/public/search
| - - Authorization: Bearer OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
| - - Postman Token: 9a32c226-9a48-417f-b062-5070f405af71
| - - - - Query Params - - - - |
| - - - - - - q: Atlanta
| - - - - - - availabilityStart: 2019-07-01
| - - - - - - availabilityEnd: 2019-07-09
| - - - - - - pageSize: 25
| - - - - - - page: 33
| - - SUCCESS - - |
| - - Parsing pertinent entries - - |
| - - - - Row 0 - - - - |
| - - - - Row 1 - - - - |
| - - - - Row 2 - - - - |
| - - - - Row 3 - - - - |
| - - - - Row 4 - - - - |
| - - - - Row 5 - - - - |
| - - - - Row 6 - - - - |
| - - - - Row 7 - - - - |
| - - - - Row 8 - - - - |
| - - - - Row 9 - - - - |
| - - GET - - |
| - - Sending POST request to https://ws.homeaway.com/public/search
| - - Authorization: Bearer OTRhNTRhZWUtN2UwZS00NjU3LThkZTItYTNiMzA0YjYzY2Rj
| - - Postman Token: 9a32c226-9a48-417f-b062-5070f405af71
| - - - - Query Params - - - - |
| - - - - - - q: 
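
The loop above stops at the first page that returns no entries. The same control flow can be separated from the HTTP details, which makes it testable without network access. In this sketch `collect_entries` and `fetch_page` are hypothetical names; `fetch_page(page)` stands in for the GET request and is assumed to return the parsed JSON dictionary. The `max_pages` cap is an added safeguard, not part of the original loop.

```python
def collect_entries(fetch_page, max_pages=1000):
    """Request pages 1, 2, ... until one has no entries; return all entries."""
    all_entries = []
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        entries = response.get("entries", [])
        if not entries:
            break  # an empty page marks the end of the results
        all_entries.extend(entries)
    return all_entries

# Stand-in for the API: two pages of fake listings, then empty pages
fake_pages = {
    1: {"entries": [{"listingId": "a"}, {"listingId": "b"}]},
    2: {"entries": [{"listingId": "c"}]},
}
result = collect_entries(lambda page: fake_pages.get(page, {"entries": []}))
print([entry["listingId"] for entry in result])
```

A real `fetch_page` would wrap the `requests` call and keep the one-second pause between requests.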

In [None]:
parseEntry(entries, 1)

In [14]:
data.shape

(719, 16)

In [16]:
data

| | listingId | priceAmt | priceAvgNightly | priceFees | priceRent | lat | lng | city | state | country | reviewCount | reviewAverage | bathrooms | bedrooms | listingUrl | bookWithConfidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7512702 | 1150 | 143.75 | | 1150 | 33.7783 | -84.391 | Atlanta | GA | US | 0 | 0 | 1 | 1 | https://www.homeaway.com/vacation-rental/p7512... | True |
| 0 | 7554619 | 1520 | 190 | | 1520 | 33.7782 | -84.3807 | Atlanta | GA | US | 1 | 5 | 1 | 1 | https://www.homeaway.com/vacation-rental/p7554... | True |
| 0 | v625088 | 3464.76 | 433.1 | | 3464.76 | 33.7683 | -84.3512 | Atlanta | GA | US | 19 | 4.78947 | 2 | 4 | https://www.vrbo.com/625088?unitId=1172869 | True |
| 0 | v1424850 | 1750 | 218.75 | | 1750 | 33.6701 | -84.3831 | Atlanta | GA | US | 4 | 4.75 | 2 | 4 | https://www.vrbo.com/1424850?unitId=1983355 | True |
| 0 | 7130050 | 918 | 114.75 | | 918 | 33.8105 | -84.3674 | Atlanta | GA | US | 5 | 4.8 | 1 | 1 | https://www.homeaway.com/vacation-rental/p7130... | True |
| 0 | 7172028 | 1948 | 243.5 | | 1948 | 33.7733 | -84.3868 | Atlanta | GA | US | 11 | 4.54545 | 2 | 2 | https://www.homeaway.com/vacation-rental/p7172... | True |
| 0 | v382033 | 1520 | 190 | | 1520 | 33.7934 | -84.4009 | Atlanta | GA | US | 18 | 4.64706 | 1 | 1 | https://www.vrbo.com/382033?unitId=382033 | True |
| 0 | v468017 | 1098 | 137.25 | | 1098 | 33.7874 | -84.385 | Atlanta | GA | US | 15 | 4.42857 | 1 | 1 | https://www.vrbo.com/468017?unitId=1051128 | True |
| 0 | 7263368 | 1920 | 240 | | 1920 | 33.7417 | -84.3797 | Atlanta | GA | US | 3 | 4 | 2 | 3 | https://www.homeaway.com/vacation-rental/p7263... | True |
| 0 | v1376109 | 1648 | 206 | | 1648 | 33.7225 | -84.4337 | Atlanta | GA | US | 2 | 5 | 1 | 3 | https://www.vrbo.com/1376109?unitId=1934463 | True |


In [17]:
# save output dataframe to CSV
data.to_csv("homeaway_data.csv")
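
By default `DataFrame.to_csv` also writes the row index as a first, unnamed column, and every row in the table above carries index `0`; passing `index=False` to `to_csv` leaves that column out. For comparison, the standard-library `csv` module writes only the columns it is given. A minimal sketch with two abbreviated rows taken from the table above (the `rows` list and the `columns` subset are illustrative):

```python
import csv
import io

columns = ["listingId", "priceAmt", "priceAvgNightly"]
rows = [
    {"listingId": "7512702", "priceAmt": 1150, "priceAvgNightly": 143.75},
    {"listingId": "7554619", "priceAmt": 1520, "priceAvgNightly": 190},
]

buffer = io.StringIO()  # in-memory stand-in for a file on disk
writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```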