Scripts that can be used to walk and analyse RPDE (Realtime Paged Data Exchange) feeds.
Install the dependencies with:
npm install
walkRpde.js walks an RPDE feed, starting from a given URL, and saves the feed pages to a directory. It keeps walking until it reaches the end of the feed, the END_URL, or the MAX_PAGES limit (see the options below).
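The stopping rules above can be sketched as a small loop. This is a hypothetical illustration, not walkRpde.js's actual code: the function name walkFeed and the injected fetchPage and onPage callbacks are assumptions made so the loop can run without a network. It also assumes that the page at END_URL is not itself fetched, and it relies on the RPDE convention that the last page has an empty items array and a next URL equal to its own URL.

```javascript
// Hypothetical sketch of the walk loop; not the actual walkRpde.js implementation.
// fetchPage(url) returns a Promise of a parsed RPDE page and is injected so that
// the loop can be exercised without a network.
async function walkFeed({ startUrl, endUrl, maxPages, fetchPage, onPage }) {
  // Normalise URLs so that string comparisons are reliable.
  let url = new URL(startUrl).href;
  const stopUrl = endUrl === undefined ? undefined : new URL(endUrl).href;
  let pagesWalked = 0;

  for (;;) {
    // Assumption: the page at END_URL itself is not fetched.
    if (stopUrl !== undefined && url === stopUrl) {
      return { pagesWalked, stoppedBecause: 'END_URL' };
    }
    if (maxPages !== undefined && pagesWalked >= maxPages) {
      return { pagesWalked, stoppedBecause: 'MAX_PAGES' };
    }

    const page = await fetchPage(url);
    pagesWalked += 1;
    await onPage(page, pagesWalked, url);

    // Resolve `next` against the current URL in case the feed uses relative links.
    const nextUrl = new URL(page.next, url).href;
    // The last RPDE page has an empty items array and a next URL equal to its own
    // URL; stopping on either signal avoids looping forever on that page.
    if (page.items.length === 0 || nextUrl === url) {
      return { pagesWalked, stoppedBecause: 'END_OF_FEED' };
    }
    url = nextUrl;
  }
}
```

With Node 18 or later, fetchPage could be async (url) => (await fetch(url)).json(), and onPage could write each page to disk.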
Run it as follows, replacing the URL with that of the RPDE feed you want to walk:
START_URL=https://acme.com/api/rpde/session-series \
REL_OUTPUT_DIR=output \
node walkRpde.js
This will start writing files named rpde-*.json (e.g. rpde-1.json, rpde-2.json, etc.) until the script reaches the end of the feed. These files are all saved in the REL_OUTPUT_DIR directory, so in this case they will be saved as output/rpde-1.json, output/rpde-2.json, etc.
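The rpde-N.json layout, and one way CONTINUE_PREVIOUS_RUN could pick up from it, can be sketched as follows. The helper names are hypothetical, and the resume rule (request the next URL of the highest-numbered saved page) is an assumption about how "continue from the last page it downloaded" might work, not a description of walkRpde.js itself. Page numbers are compared numerically so that rpde-10.json counts as later than rpde-9.json.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helpers illustrating the rpde-N.json layout; not walkRpde.js's own code.
function pageFilePath(outputDir, pageNumber) {
  return path.join(outputDir, `rpde-${pageNumber}.json`);
}

// Writes one page, creating the output directory if needed.
function savePage(outputDir, pageNumber, page) {
  fs.mkdirSync(outputDir, { recursive: true });
  fs.writeFileSync(pageFilePath(outputDir, pageNumber), JSON.stringify(page, null, 2));
}

// Highest saved page number (0 if none), compared numerically so that
// rpde-10.json is later than rpde-9.json.
function lastSavedPageNumber(outputDir) {
  if (!fs.existsSync(outputDir)) return 0;
  let last = 0;
  for (const name of fs.readdirSync(outputDir)) {
    const match = /^rpde-(\d+)\.json$/.exec(name);
    if (match) last = Math.max(last, Number(match[1]));
  }
  return last;
}

// Assumption: resuming means requesting the `next` URL of the last saved page.
function resumePoint(outputDir) {
  const last = lastSavedPageNumber(outputDir);
  if (last === 0) return null;
  const page = JSON.parse(fs.readFileSync(pageFilePath(outputDir, last), 'utf8'));
  return { nextPageNumber: last + 1, nextUrl: page.next };
}
```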
You can then run queries over these files with jq (which you will need to install), for example:
cat output/rpde-*.json | jq --slurp '[.[].items[]] | length'
This returns the total number of items across all pages.
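If jq is not available, the same count can be computed in Node. This is a hypothetical sketch (readPages and countItems are illustrative names, not part of these scripts) that reads the saved pages from a directory in numeric page order and sums the lengths of their items arrays, which is what the jq query above does.

```javascript
const fs = require('fs');
const path = require('path');

// Reads every rpde-N.json page in dir, ordered by numeric page number.
function readPages(dir) {
  return fs
    .readdirSync(dir)
    .map((name) => ({ name, match: /^rpde-(\d+)\.json$/.exec(name) }))
    .filter(({ match }) => match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map(({ name }) => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));
}

// Same result as: jq --slurp '[.[].items[]] | length'
function countItems(pages) {
  return pages.reduce((total, page) => total + page.items.length, 0);
}
```

For example, console.log(countItems(readPages('output'))) prints the total for the files written by the command above.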
Options are set using environment variables. They are:
- START_URL - (REQUIRED unless CONTINUE_PREVIOUS_RUN is set) The URL to start walking the RPDE feed from.
- REL_OUTPUT_DIR - (REQUIRED) Relative output directory. RPDE feed pages will be written to this directory, which is relative to the current working directory.
  - e.g. export REL_OUTPUT_DIR=slots will output to ./slots/rpde-1.json, ./slots/rpde-2.json, etc.
- CONTINUE_PREVIOUS_RUN - (REQUIRED unless START_URL is set) If set to true, the script will continue from the last page it downloaded. Otherwise, it will start from the beginning of the feed.
- END_URL - (OPTIONAL) The URL to stop walking the RPDE feed at. If omitted, the script will walk to the very end of the feed.
- MAX_PAGES - (OPTIONAL) The maximum number of pages to walk. If omitted, the script will walk to the very end of the feed.
  - e.g. export MAX_PAGES=10
- API_KEY - (OPTIONAL) API key used to make RPDE requests. It will be set as the X-Api-Key HTTP header.
- BEARER_TOKEN - (OPTIONAL) Bearer token used to make RPDE requests. It will be set as the Authorization HTTP header.
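A hypothetical sketch of validating these options and turning them into request headers follows. The function name readOptions and the returned object shape are illustrative assumptions; the "Bearer " prefix on the Authorization header is the standard bearer-token scheme and is assumed rather than confirmed for walkRpde.js. It takes a plain object so it can be tested; in real use you would pass process.env.

```javascript
const path = require('path');

// Hypothetical sketch of validating the walkRpde.js options listed above.
// Pass process.env in real use; a plain object keeps it testable.
function readOptions(env) {
  const continuePreviousRun = env.CONTINUE_PREVIOUS_RUN === 'true';
  if (!env.START_URL && !continuePreviousRun) {
    throw new Error('START_URL is required unless CONTINUE_PREVIOUS_RUN=true');
  }
  if (!env.REL_OUTPUT_DIR) {
    throw new Error('REL_OUTPUT_DIR is required');
  }

  let maxPages;
  if (env.MAX_PAGES !== undefined) {
    maxPages = Number(env.MAX_PAGES);
    if (!Number.isInteger(maxPages) || maxPages < 1) {
      throw new Error('MAX_PAGES must be a positive integer');
    }
  }

  // Request headers; the "Bearer " prefix is the standard scheme and is assumed here.
  const headers = {};
  if (env.API_KEY) headers['X-Api-Key'] = env.API_KEY;
  if (env.BEARER_TOKEN) headers.Authorization = `Bearer ${env.BEARER_TOKEN}`;

  return {
    startUrl: env.START_URL,
    // path.resolve interprets a relative path against the current working directory.
    outputDir: path.resolve(env.REL_OUTPUT_DIR),
    continuePreviousRun,
    endUrl: env.END_URL,
    maxPages,
    headers,
  };
}
```

The headers object could then be passed with each request, e.g. fetch(url, { headers: options.headers }).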
Count the items in the feed that have state: 'updated':
cat output/rpde-*.json | jq -s '[.[].items[] | select(.state == "updated")] | length'
Latest startDate for items in a ScheduledSession/Slot feed:
cat output/rpde-*.json | jq --slurp '[.[].items[].data.startDate] | max'
Get the item with ID X (replace X with the item's @id):
cat output/rpde-*.json | jq -s '[.[].items[] | select(.data["@id"] == "X")]'
Get the first item with state=updated
cat output/rpde-*.json | jq -s '[.[].items[] | select(.state == "updated")][0]'
Get the first 10 items with state=updated
cat output/rpde-*.json | jq -s '[.[].items[] | select(.state == "updated")][0:10]'
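The same queries can be expressed over an array of parsed pages in Node. This is a hypothetical sketch with illustrative function names. One difference worth knowing: the shell glob output/rpde-*.json lists files in lexical order (rpde-10.json before rpde-2.json), so once there are ten or more pages, the "first" items returned by the last two jq queries follow that order rather than feed order; loading pages sorted by page number avoids this. Like jq's max, latestStartDate compares startDate values as strings, which orders ISO 8601 dates correctly only when they share the same format and UTC offset.

```javascript
// Hypothetical in-memory equivalents of the jq queries above, operating on an
// array of parsed RPDE pages in feed order.
function allItems(pages) {
  return pages.flatMap((page) => page.items);
}

// select(.state == "updated")
function updatedItems(pages) {
  return allItems(pages).filter((item) => item.state === 'updated');
}

// Latest startDate; deleted items carry no data, so they are skipped.
// Compared as strings, like jq's max.
function latestStartDate(pages) {
  const dates = allItems(pages)
    .map((item) => item.data && item.data.startDate)
    .filter((d) => typeof d === 'string');
  return dates.length === 0 ? null : dates.reduce((a, b) => (b > a ? b : a));
}

// select(.data["@id"] == id), returned as an array like the jq query.
function itemsById(pages, id) {
  return allItems(pages).filter((item) => item.data && item.data['@id'] === id);
}
```

The first updated item is then updatedItems(pages)[0], and the first ten are updatedItems(pages).slice(0, 10).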
You can get performance stats for an RPDE feed with perfTestRpde.js. Its interface is the same as walkRpde.js, e.g.:
export START_URL=<e.g. https://opensessions.io/api/rpde/session-series>
node perfTestRpde.js
This does not save the pages. Instead, it walks to the end of the feed and then outputs stats about request times and the total time taken. An example output:
Performance stats:
- Total time taken: 105711.00003899634
- Average request time: 600.5119533183223
- p90: 657.3589110001922
- p95: 808.9024389982224
- p99: 986.9369449988008
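Statistics like these can be computed from a list of per-request durations. The sketch below is hypothetical: it assumes times in milliseconds (as performance.now() returns), and it uses the nearest-rank percentile method, which perfTestRpde.js may or may not use; other percentile definitions give slightly different p90/p95/p99 values.

```javascript
// Hypothetical sketch of the statistics perfTestRpde.js reports, assuming request
// times in milliseconds. Percentiles use the nearest-rank method; the real script
// may use a different definition.
function percentile(sortedTimes, p) {
  // 1-based nearest rank; multiplying before dividing avoids float drift like 0.9 * n.
  const rank = Math.ceil((p * sortedTimes.length) / 100);
  return sortedTimes[Math.max(rank, 1) - 1];
}

function perfStats(requestTimes, totalTime) {
  if (requestTimes.length === 0) throw new Error('no request times recorded');
  const sorted = [...requestTimes].sort((a, b) => a - b);
  const average = sorted.reduce((sum, t) => sum + t, 0) / sorted.length;
  return {
    totalTime,
    average,
    p90: percentile(sorted, 90),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Times one request with performance.now(), which is a global in Node 16+.
async function timeRequest(doRequest) {
  const start = performance.now();
  await doRequest();
  return performance.now() - start;
}
```

Collecting timeRequest results for every page fetched during a walk, and timing the whole walk the same way, gives the inputs to perfStats.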
uploadToDb.js uploads the files generated by walkRpde.js to a PostgreSQL database.
Usage:
export REL_INPUT_DIR=output
export POSTGRES_USER=<user>
export POSTGRES_PASSWORD=<password>
export POSTGRES_HOST=<host>
export POSTGRES_DATABASE=<db>
export CLEAR_EXISTING=true
node uploadToDb.js
Options for uploadToDb.js are set using environment variables. They are:
- REL_INPUT_DIR - (REQUIRED) The directory containing the RPDE feed files to upload. This needs to be the same directory as the one set for walkRpde.js using the REL_OUTPUT_DIR environment variable.
- POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_DATABASE - (REQUIRED) PostgreSQL connection details.
- CLEAR_EXISTING - (OPTIONAL) If set to true, the script will clear the existing data in the database before uploading the new data. Otherwise, it will append the new data to the existing data.
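To illustrate how these options could map onto database operations, here is a hypothetical sketch. The table name rpde_items and its columns are illustrative assumptions, not the schema uploadToDb.js actually uses. It builds parameterized query objects of the form { text, values }, a shape the node-postgres (pg) package's client.query() accepts, and a connection config from the POSTGRES_* variables.

```javascript
// Hypothetical sketch only: the table name and columns are illustrative assumptions,
// not the schema uploadToDb.js actually uses. Queries are { text, values } objects,
// a shape node-postgres's client.query() accepts.
const TABLE = 'rpde_items'; // hypothetical table name

// Connection settings from the POSTGRES_* variables; all four are required.
function connectionConfig(env) {
  for (const key of ['POSTGRES_USER', 'POSTGRES_PASSWORD', 'POSTGRES_HOST', 'POSTGRES_DATABASE']) {
    if (!env[key]) throw new Error(`${key} is required`);
  }
  return {
    user: env.POSTGRES_USER,
    password: env.POSTGRES_PASSWORD,
    host: env.POSTGRES_HOST,
    database: env.POSTGRES_DATABASE,
  };
}

function buildUploadQueries(pages, { clearExisting }) {
  const queries = [];
  // CLEAR_EXISTING=true: remove old rows first; otherwise new rows are appended.
  if (clearExisting) {
    queries.push({ text: `DELETE FROM ${TABLE}`, values: [] });
  }
  for (const page of pages) {
    for (const item of page.items) {
      queries.push({
        text: `INSERT INTO ${TABLE} (id, kind, state, modified, data) VALUES ($1, $2, $3, $4, $5)`,
        // Deleted items carry no data, so data is stored as NULL for them.
        values: [
          String(item.id),
          item.kind ?? null,
          item.state,
          item.modified ?? null,
          item.data === undefined ? null : JSON.stringify(item.data),
        ],
      });
    }
  }
  return queries;
}
```

In real use, these queries could be run in order on a connected pg Client created with new Client(connectionConfig(process.env)), with clearExisting set from process.env.CLEAR_EXISTING === 'true'. Wrapping them in a single transaction would stop a failed upload from leaving the table cleared but only partly refilled.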
Copyright © 2017 Imin Ltd.
Released under the MIT license.