# Data Acquisition
28 Jan 2023

In this project, the [Booking.com API available on RapidAPI](https://rapidapi.com/tipsters/api/booking-com/details) is used to acquire hotel data for London. HTTP GET requests are sent to the API to retrieve hotel search results, and the results are concatenated into a single list of hotels. Finally, the hotel data is written to a JSON file, `data.json` in the `data` folder, for later analysis.

---

```python
# import necessary libraries
import requests
import collections
import json
```

To make a request to the Booking.com API, specific headers must be included: the RapidAPI key and host, which are defined in the `headers` variable below. The key is tied to a RapidAPI account and can be obtained for free from the [API page](https://rapidapi.com/tipsters/api/booking-com/details). Note that the free plan is limited to 550 calls per month.
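Because the free plan allows only 550 calls per month, it can help to count requests during a run. The sketch below is a hypothetical `CallBudget` helper, not part of `requests` or RapidAPI; the 550 figure comes from the text above, and the counter only tracks calls made in the current session, not calls already used earlier in the month.

```python
class CallBudget:
    """Hypothetical helper: count API calls against a monthly quota."""

    def __init__(self, limit=550):
        self.limit = limit  # free-plan quota quoted in the text
        self.used = 0

    def spend(self, n=1):
        """Record n calls, refusing to go over the quota."""
        if self.used + n > self.limit:
            raise RuntimeError(
                f"quota exceeded: {self.used} used, {n} requested, limit {self.limit}"
            )
        self.used += n

    @property
    def remaining(self):
        return self.limit - self.used


budget = CallBudget()
budget.spend(6)          # e.g. one first-page request per star-rating category
print(budget.remaining)  # 544
```

In practice, `budget.spend()` would be called next to each `requests.get` so that a run stops before the quota is exhausted.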

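```python
# account information: the RapidAPI key is read from an environment variable
# (RAPIDAPI_KEY is an arbitrary name) instead of being hard-coded in the notebook
import os

headers = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "booking-com.p.rapidapi.com"
}
```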

Since Booking.com only shows the first 1000 results of any search, the data is collected in batches by filtering on the hotels' star rating, so that each request covers a smaller subset of hotels. Specifically, a `get_hotel` function takes a `star_rating` argument and builds the API request URL and query parameters: the destination ID, the check-in and check-out dates, the number of adults and rooms, and other filters such as the currency and the hotel category (i.e., the star rating).

Note that the data was acquired on 28 Jan 2023, with check-in and check-out dates of 1 March 2023 and 2 March 2023 respectively (a one-night stay). Different search parameters will yield different data and therefore different analysis results.
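If the analysis is rerun for another stay, the two date parameters can be derived from a single check-in date so that they stay consistent. The helper below, `stay_dates`, is hypothetical; it only assumes the `YYYY-MM-DD` format already used for `checkin_date` and `checkout_date` in the query parameters.

```python
from datetime import date, timedelta


def stay_dates(checkin, nights=1):
    """Return (checkin_date, checkout_date) as ISO strings for the query."""
    checkout = checkin + timedelta(days=nights)
    return checkin.isoformat(), checkout.isoformat()


checkin_date, checkout_date = stay_dates(date(2023, 3, 1))
print(checkin_date, checkout_date)  # 2023-03-01 2023-03-02
```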

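```python
def get_hotel(star_rating):
    """Return all London hotels with the given star rating (0 = unrated)."""
    url = "https://booking-com.p.rapidapi.com/v1/hotels/search"

    # search parameters: London (dest_id -2601889), 2 adults, 1 room, 1 night
    querystring = {"dest_id": "-2601889",
                   "order_by": "popularity",
                   "filter_by_currency": "GBP",
                   "adults_number": "2",
                   "room_number": "1",
                   "checkout_date": "2023-03-02",
                   "units": "metric",
                   "checkin_date": "2023-03-01",
                   "dest_type": "city",
                   "locale": "en-gb",
                   "include_adjacency": "false",
                   # restrict the search to one star-rating category
                   "categories_filter_ids": f"class::{star_rating}",
                   "page_number": "0"}

    # the first page also reports the total number of matching hotels
    response = requests.get(url, headers=headers, params=querystring)
    response.raise_for_status()
    r = response.json()
    n_hotels = r["count"]
    hotel_list = list(r["result"])

    # the API returns 20 hotels per page; fetch the remaining pages
    # (reusing page 0 saves one call per star rating against the quota)
    for i in range(1, (n_hotels - 1) // 20 + 1):
        querystring["page_number"] = str(i)
        response = requests.get(url, headers=headers, params=querystring)
        response.raise_for_status()
        hotel_list.extend(response.json()["result"])

    return hotel_list
```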
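The number of pages requested per star rating comes from the expression `(n_hotels - 1) // 20 + 1`, which is an integer form of ⌈n_hotels / 20⌉. The small check below, using a hypothetical `n_pages` function and assuming 20 results per page as the loop does, confirms the two agree; Python's floor division also makes the formula return 0 when no hotels match. Under the 1000-result cap mentioned above, this means at most 50 pages per category.

```python
import math


def n_pages(n_hotels, page_size=20):
    """Number of result pages needed to cover n_hotels results."""
    # floor division rounds -1 // 20 down to -1, so n_hotels = 0 gives 0 pages
    return (n_hotels - 1) // page_size + 1


# the integer formula matches the ceiling of n_hotels / page_size
for n in (0, 1, 20, 21, 999, 1000):
    assert n_pages(n) == math.ceil(n / 20)

print([n_pages(n) for n in (1, 20, 21, 1000)])  # [1, 1, 2, 50]
```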

```python
# Get hotels separately by star rating (0 = unrated)
hotel_5_star = get_hotel(5)
hotel_4_star = get_hotel(4)
hotel_3_star = get_hotel(3)
hotel_2_star = get_hotel(2)
hotel_1_star = get_hotel(1)
hotel_unrated = get_hotel(0)

# Concatenate all lists
hotel_list = hotel_5_star + hotel_4_star + hotel_3_star + hotel_2_star + hotel_1_star + hotel_unrated
```
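Plain concatenation keeps every record, so a hotel returned under more than one category would appear twice. A quick safeguard is to drop repeats by ID before saving. The sketch below assumes each record has a `hotel_id` field, which should be checked against the actual API output; `dedupe_hotels` is a hypothetical helper.

```python
def dedupe_hotels(hotels, key="hotel_id"):
    """Drop repeated hotels, keeping the first occurrence of each id."""
    seen = set()
    unique = []
    for hotel in hotels:
        hotel_key = hotel.get(key)
        # records without the id field are kept as they are
        if hotel_key is not None:
            if hotel_key in seen:
                continue
            seen.add(hotel_key)
        unique.append(hotel)
    return unique


# toy records, not real API output
sample = [{"hotel_id": 1}, {"hotel_id": 2}, {"hotel_id": 1}]
print(len(dedupe_hotels(sample)))  # 2
```

Applied to the real data, comparing `len(dedupe_hotels(hotel_list))` with `len(hotel_list)` shows whether any duplicates were present.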

```python
len(hotel_list)
```

```
1813
```

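```python
# write the hotel list to a JSON file (creating the data/ folder if needed)
import os

os.makedirs("data", exist_ok=True)
with open("data/data.json", "w", encoding="utf-8") as f:
    json.dump(hotel_list, f)
```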
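Before moving on to the analysis, it can be worth confirming that the file reads back to the same data. The hypothetical `write_and_verify` helper below writes the records, loads them again and compares; this equality check assumes the records contain only JSON-native types (dicts, lists, strings, numbers, booleans, `None`), which holds for data parsed with `response.json()`.

```python
import json
import os
import tempfile


def write_and_verify(records, path):
    """Write records as JSON, read them back and confirm nothing changed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
    if loaded != records:
        raise ValueError("JSON round trip changed the data")
    return len(loaded)


# toy example written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data", "data.json")
    print(write_and_verify([{"hotel_id": 1, "class": 5}], path))  # 1
```

With the real data, `write_and_verify(hotel_list, "data/data.json")` would return the number of hotels saved.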