imin-ltd/rpde-scripts

rpde-scripts

Scripts that can be used to walk and analyse RPDE feeds.

Set-up

npm install

SCRIPT: walkRpde.js

This script walks an RPDE feed, starting from a given URL, and saves the feed pages to a directory. It stops at the end of the feed, or earlier if it reaches END_URL or the MAX_PAGES limit.
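
The walk described above can be sketched roughly as follows. This is a minimal illustration, not the code in walkRpde.js: `walkFeed` and `fetchPage` are hypothetical names, the sketch assumes the feed signals its end with an empty `items` array (or a `next` URL equal to the URL just requested), and it assumes the page at END_URL is still fetched before stopping.

```javascript
// Walk an RPDE feed page by page and return the pages that were fetched.
// `fetchPage` is a hypothetical helper: it takes a URL and resolves to the
// parsed JSON page, which is expected to have `next` and `items` fields.
async function walkFeed(startUrl, fetchPage, { endUrl, maxPages = Infinity } = {}) {
  const pages = [];
  let url = startUrl;
  while (pages.length < maxPages) {
    const page = await fetchPage(url);
    pages.push(page);
    // Assumption: an empty page, or a `next` that points back at this page, means end of feed
    const isLastPage = page.items.length === 0 || page.next === url;
    // Assumption: the END_URL page itself is fetched, then the walk stops
    if (isLastPage || url === endUrl) break;
    url = page.next;
  }
  return pages;
}

// With Node 18+ a real fetcher could be passed in, for example:
//   walkFeed(process.env.START_URL, (url) => fetch(url).then((res) => res.json()))
```

Passing the fetcher in as an argument keeps the loop independent of any HTTP library and makes it easy to try against canned pages.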

Running

Run the following, replacing the URL with the URL of the RPDE feed you want to walk:

START_URL=https://acme.com/api/rpde/session-series \
  REL_OUTPUT_DIR=output \
  node walkRpde.js

This writes each page of the feed to a file named rpde-*.json (rpde-1.json, rpde-2.json, etc.) until the script reaches the end of the feed. The files are saved in the REL_OUTPUT_DIR directory, so in this case as output/rpde-1.json, output/rpde-2.json, etc.

You can then run queries against the saved pages (you will need to install jq), for example:

cat output/rpde-*.json | jq --slurp '[.[].items[]] | length'

This returns the total number of items across all downloaded pages.

Options

Options are set using environment variables. They are:

  • START_URL - (REQUIRED unless CONTINUE_PREVIOUS_RUN) The URL to start walking the RPDE feed from
  • REL_OUTPUT_DIR - (REQUIRED) Relative Output Directory. RPDE feed pages are written to this directory, which is relative to the current working directory.
    • e.g. export REL_OUTPUT_DIR=slots will output to ./slots/rpde-1.json, ./slots/rpde-2.json, etc
  • CONTINUE_PREVIOUS_RUN - (REQUIRED unless START_URL is set) If set to true, the script will continue from the last page it downloaded. Otherwise, it will start from the beginning of the feed.
  • END_URL - (OPTIONAL) The URL to stop walking the RPDE feed at. If omitted, the script will walk to the very end of the feed
  • MAX_PAGES - (OPTIONAL) The maximum number of pages to walk. If omitted, the script will walk to the very end of the feed
    • e.g. export MAX_PAGES=10
  • API_KEY - (OPTIONAL) API key used to make RPDE requests. It will be set as the X-Api-Key HTTP header
  • BEARER_TOKEN - (OPTIONAL) Bearer token used to make RPDE requests. It will be set as the Authorization HTTP header
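
To make the rules above concrete, here is a minimal sketch of reading and validating these options. `readOptions` is a hypothetical helper and not part of the repository, and the `Bearer <token>` form of the Authorization header is an assumption based on the usual bearer scheme.

```javascript
// Hypothetical helper: turn walkRpde.js-style environment variables into an options object.
function readOptions(env) {
  const continuePreviousRun = env.CONTINUE_PREVIOUS_RUN === 'true';
  // START_URL is required unless a previous run is being continued
  if (!env.START_URL && !continuePreviousRun) {
    throw new Error('Set START_URL or CONTINUE_PREVIOUS_RUN=true');
  }
  if (!env.REL_OUTPUT_DIR) {
    throw new Error('REL_OUTPUT_DIR is required');
  }
  const headers = {};
  if (env.API_KEY) headers['X-Api-Key'] = env.API_KEY;
  // Assumption: the token is sent as "Bearer <token>"
  if (env.BEARER_TOKEN) headers.Authorization = `Bearer ${env.BEARER_TOKEN}`;
  return {
    startUrl: env.START_URL,
    outputDir: env.REL_OUTPUT_DIR,
    continuePreviousRun,
    endUrl: env.END_URL,
    // No MAX_PAGES means "walk to the very end of the feed"
    maxPages: env.MAX_PAGES ? Number(env.MAX_PAGES) : Infinity,
    headers,
  };
}
```

For example, `readOptions(process.env)` would throw early if neither START_URL nor CONTINUE_PREVIOUS_RUN is set.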

Some other example queries:

How many items in the feed have state: 'updated':

cat output/rpde-*.json | jq -s '[.[].items[] | select(.state == "updated")] | length'

Latest startDate for items in this ScheduledSession/Slot feed:

cat output/rpde-*.json | jq --slurp '[.[].items[].data.startDate] | max'

Get the item with ID X:

cat output/rpde-*.json | jq -s '[.[].items[] | select(.data["@id"] == "X")]'

Get the first item with state=updated:

cat output/rpde-*.json | jq -s '[.[].items[] | select(.state == "updated")][0]'

Get the first 10 items with state=updated:

cat output/rpde-*.json | jq -s '[.[].items[] | select(.state == "updated")][0:10]'
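
If you would rather query the saved pages from Node, the sketch below shows one way to do it. The helper names are hypothetical. It also sorts the files by page number: a shell glob such as rpde-*.json expands in lexical order, so rpde-10.json comes before rpde-2.json, which can matter for "first item" style queries.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Read every rpde-<n>.json file in `dir` and return the parsed pages in numeric page order.
function loadPages(dir) {
  return fs.readdirSync(dir)
    .map((name) => /^rpde-(\d+)\.json$/.exec(name))
    .filter(Boolean) // ignore files that do not match the rpde-<n>.json pattern
    .sort((a, b) => Number(a[1]) - Number(b[1]))
    .map((match) => JSON.parse(fs.readFileSync(path.join(dir, match[0]), 'utf8')));
}

// Flatten all pages into a single array of RPDE items.
function allItems(pages) {
  return pages.flatMap((page) => page.items);
}

// Return up to `limit` items whose state is "updated", in feed order.
function updatedItems(pages, limit = Infinity) {
  return allItems(pages).filter((item) => item.state === 'updated').slice(0, limit);
}
```

For example, `updatedItems(loadPages('output'), 10)` mirrors the last jq query above, but reads the pages in the order they were downloaded.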

SCRIPT: perfTestRpde.js

perfTestRpde.js reports performance stats for an RPDE feed. Its interface is the same as that of walkRpde.js, e.g.:

export START_URL=<e.g. https://opensessions.io/api/rpde/session-series>
node perfTestRpde.js

This will not download the pages, but will instead walk to the end of the feed and then output stats about request times and total time taken. An example output:

Performance stats:

- Total time taken: 105711.00003899634
- Average request time: 600.5119533183223
- p90: 657.3589110001922
- p95: 808.9024389982224
- p99: 986.9369449988008
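
For readers unfamiliar with the p90/p95/p99 figures, the sketch below shows one common way to derive such numbers from a list of request times. It uses the nearest-rank percentile method, which is an assumption: perfTestRpde.js may calculate its percentiles differently. The function names are hypothetical.

```javascript
// Nearest-rank percentile: the smallest value such that at least p% of the
// samples are less than or equal to it. `sortedTimes` must be sorted ascending.
function percentile(sortedTimes, p) {
  const rank = Math.ceil((p * sortedTimes.length) / 100);
  return sortedTimes[Math.max(rank, 1) - 1];
}

// Summarise a list of per-request durations.
function summarise(requestTimes) {
  const sorted = [...requestTimes].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, t) => acc + t, 0);
  return {
    average: sum / sorted.length,
    p90: percentile(sorted, 90),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```

Read this way, a p90 of about 657 means that roughly 90% of requests completed within that time.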

SCRIPT: uploadToDb.js

This script uploads the files generated by walkRpde.js to a PostgreSQL database.

Usage:

export REL_INPUT_DIR=output
export POSTGRES_USER=<user>
export POSTGRES_PASSWORD=<password>
export POSTGRES_HOST=<host>
export POSTGRES_DATABASE=<db>
export CLEAR_EXISTING=true
node uploadToDb.js

Each option is described under uploadToDb.js Options.

uploadToDb.js Options

Options are set using environment variables. They are:

  • REL_INPUT_DIR - (REQUIRED) The directory containing the RPDE feed files to upload. This must be the directory that walkRpde.js wrote to, i.e. the value of its REL_OUTPUT_DIR environment variable.
  • POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_DATABASE - (REQUIRED) PostgreSQL connection details.
  • CLEAR_EXISTING - (OPTIONAL) If set to true, the script will clear the existing data in the database before uploading the new data. Otherwise, the script will append the new data to the existing data.
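
As an illustration of how these options fit together, here is a minimal sketch that checks the required variables and builds a connection config. `readUploadOptions` is a hypothetical helper. The `connection` keys (`user`, `password`, `host`, `database`) follow the naming used by the node-postgres client config, and it is an assumption that uploadToDb.js connects in a similar way.

```javascript
// Hypothetical helper: validate uploadToDb.js-style environment variables.
function readUploadOptions(env) {
  const required = [
    'REL_INPUT_DIR',
    'POSTGRES_USER',
    'POSTGRES_PASSWORD',
    'POSTGRES_HOST',
    'POSTGRES_DATABASE',
  ];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required options: ${missing.join(', ')}`);
  }
  return {
    inputDir: env.REL_INPUT_DIR,
    // Only the literal string "true" clears existing data; anything else appends
    clearExisting: env.CLEAR_EXISTING === 'true',
    connection: {
      user: env.POSTGRES_USER,
      password: env.POSTGRES_PASSWORD,
      host: env.POSTGRES_HOST,
      database: env.POSTGRES_DATABASE,
    },
  };
}
```

Failing fast on missing variables avoids a partial upload against the wrong database.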

License

Copyright © 2017 Imin Ltd.

Released under the MIT license.
