quark1482/wbee

WBee - Worker Bee

Cloudflare worker for scraping listings from vrbo.com.

Features

  • Uses the undocumented vrbo.com GraphQL API.
  • Extracts a given number of listings for a specific location.
  • Stores the listing details in a Cloudflare D1 database.
  • Returns a simplified JSON response with the relevant data.

Installation

Prerequisites:

  1. npm.
  2. git.
  3. a Cloudflare account.

@ dash.cloudflare.com:

  • Click on Workers
  • Create a subdomain (choose a unique name; let's say it's 'my-subdomain')
  • Create a service (enter 'wbee' as service name)
  • Test the service by clicking on Preview.
    It should go to https://wbee.my-subdomain.workers.dev,
    which shows a simple 'Hello world' message.

@ shell/command-line:

  • npm install -g wrangler
  • cd to the directory where you want to put the worker.
    • git clone https://github.com/quark1482/wbee
    • cd wbee
    • npm install
    • wrangler login
    • wrangler d1 create wbee
      Copy the database_id. Let's say it's '2ba86d35-c3e1-6a21-db83-e963b4789720'.
    • echo 'name = "wbee"' > wrangler.toml
    • echo 'main = "src/index.js"' >> wrangler.toml
    • echo 'compatibility_date = "2023-03-07"' >> wrangler.toml
    • echo '[[ d1_databases ]]' >> wrangler.toml
    • echo 'binding = "DB"' >> wrangler.toml
    • echo 'database_name = "wbee"' >> wrangler.toml
    • echo 'database_id = "2ba86d35-c3e1-6a21-db83-e963b4789720"' >> wrangler.toml
      Use the database_id value from the previous wrangler command.
    • wrangler publish
      Browsing to https://wbee.my-subdomain.workers.dev at this point should show
      something like {"error":"Missing parameter 'location'"}, which is expected.
    • wrangler d1 execute wbee --file=./schema.sql
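
The 'binding = "DB"' line written to wrangler.toml above is what exposes the D1 database
to the worker as 'env.DB'. The sketch below shows one way a worker could reach it. The
helper names 'getDatabase' and 'countListings' are hypothetical and are not taken from
'./src/index.js'; only the 'prepare()' and 'first()' calls belong to the D1 client API.

```javascript
// Hypothetical helper: returns the D1 binding declared in wrangler.toml
// (binding = "DB"), or throws a descriptive error when it is missing.
function getDatabase(env) {
  if (!env || !env.DB) {
    throw new Error('D1 binding "DB" not found; check [[ d1_databases ]] in wrangler.toml');
  }
  return env.DB;
}

// Hypothetical helper: counts the rows currently stored in the Listings table.
// first() resolves to the first result row as an object, or null if there is none.
async function countListings(env) {
  const row = await getDatabase(env)
    .prepare("SELECT COUNT(*) AS total FROM Listings")
    .first();
  return row ? row.total : 0;
}
```

Inside a worker's fetch(request, env) handler, 'await countListings(env)' would return
the number of stored listings, assuming schema.sql has already been applied.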

Testing the worker:

  • Browse to https://wbee.my-subdomain.workers.dev?location=boston&count=10.
    Location can be a full name, like 'boston, massachusetts, united states'.
    Invalid locations should show {"error":"Unexpected content: suggestions array came empty"}.
    The parameter 'count' is optional and its default value is 50.
    'count' is an upper bound: small cities may return fewer results.
  • @ dash.cloudflare.com, click on Workers, and then on D1.
    • The database 'wbee' is now visible. Click on its name.
    • 'Listings' appears in the list of tables. Click on it.
      The table is cleared on every worker request, to save resources.
      To keep rows between requests, remove the 'Delete From' statement in the function
      'saveResults()' located in './src/index.js', and publish the worker again.
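
The handling of 'location' and 'count' described above can be sketched as follows. This
is a minimal illustration, not the code in './src/index.js': the function name
'parseParams' and the "Invalid parameter 'count'" message are assumptions, while the
default of 50 and the "Missing parameter 'location'" message come from this README.

```javascript
const DEFAULT_COUNT = 50; // default value stated in this README

// Hypothetical helper: extracts and validates the query parameters of a request URL.
// Returns { location, count } on success, or { error } when validation fails.
function parseParams(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const location = (params.get("location") || "").trim();
  if (!location) {
    return { error: "Missing parameter 'location'" };
  }
  const raw = params.get("count");
  const count = raw === null ? DEFAULT_COUNT : Number.parseInt(raw, 10);
  if (!Number.isInteger(count) || count < 1) {
    return { error: "Invalid parameter 'count'" }; // assumed message, not from the worker
  }
  return { location, count };
}

console.log(parseParams("https://wbee.my-subdomain.workers.dev?location=boston&count=10"));
```

Because 'count' is only a maximum, a caller should not assume the response array has
exactly that many entries.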

Results

The response JSON contains a simplified array of listing details, which is quicker to gather
than the full result of the vrbo.com API's internal GraphQL query.

Pay attention to the file ./schema.sql to see how the Listings table is created:

CREATE TABLE Listings (
    ListingId       INT,
    URL             TEXT,
    Name            TEXT,
    Description     TEXT,
    Type            TEXT,
    Beds            INT,
    Bedrooms        INT,
    Bathrooms       INT,
    Guests          INT,
    Price           TEXT,
    Rating          REAL,
    Amenities       TEXT,
    Photos          TEXT,
    Location        TEXT,
    PRIMARY KEY (ListingId)
);

Because they hold composite values, the fields Price, Amenities, Photos and Location are stored
as JSON text, to work around the limitations of SQLite (D1's underlying database engine),
which has no native array or object column types.
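
A minimal sketch of that JSON round trip is shown below. The helper names 'toRow' and
'fromRow' and the shape of the sample listing are invented for illustration; the actual
structure of each field is defined by the worker, not by this example.

```javascript
// Columns of the Listings table that hold JSON text, as described above.
const JSON_FIELDS = ["Price", "Amenities", "Photos", "Location"];

// Hypothetical helper: serializes the composite fields so the object can be
// written to TEXT columns. Missing fields are stored as the JSON literal null.
function toRow(listing) {
  const row = { ...listing };
  for (const field of JSON_FIELDS) {
    row[field] = JSON.stringify(listing[field] ?? null);
  }
  return row;
}

// Hypothetical helper: parses the JSON columns of a row read back from D1.
function fromRow(row) {
  const listing = { ...row };
  for (const field of JSON_FIELDS) {
    listing[field] = row[field] == null ? null : JSON.parse(row[field]);
  }
  return listing;
}

// Sample data with an assumed shape, for illustration only.
const sample = {
  ListingId: 1,
  Name: "Example cottage",
  Price: { amount: 120, currency: "USD" },
  Amenities: ["wifi", "kitchen"],
  Photos: ["https://example.com/1.jpg"],
  Location: { city: "Boston" },
};
const stored = toRow(sample);
console.log(typeof stored.Amenities, fromRow(stored).Amenities.length);
```

Scalar columns such as ListingId and Name pass through unchanged; only the four JSON
columns are converted.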



This README file is under construction.
