This repository has been archived by the owner on Apr 1, 2019. It is now read-only.
Create a CSV and JSON file of POMI data for all suppliers for latest period

nhsuk/pomi-data-etl

POMI Data ETL

ETL to download a number of data sets from the Patient Online Management Information (POMI) collection on the NHS Digital indicator portal.

The data sets used within this application are:

Additional information is available within the indicator portal. Search for the indicator reference number, e.g. P02154.

Run the application

Running scripts/start will bring up a Docker container hosting a web server and initiate the scrape at a scheduled time. The default is 11pm. To test locally, set the environment variable ETL_SCHEDULE to a new time, e.g. export ETL_SCHEDULE='25 15 * * *' to start the processing at 3:25pm. Note: the container time is GMT and does not take account of daylight saving, so during British Summer Time you may need to subtract an hour from the time.
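The GMT adjustment described above can be sketched as a small helper. This is an illustration only: dailyCronForLocalTime is a hypothetical name that does not exist in this repo, and the offset argument is assumed to be a whole number of hours (1 during British Summer Time, 0 otherwise).

```javascript
// Hypothetical helper, not part of this repo: build an ETL_SCHEDULE value
// for a daily run, converting a local wall-clock time to container time (GMT).
function dailyCronForLocalTime(hour, minute, offsetFromGmtHours = 0) {
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new RangeError('hour must be an integer from 0 to 23');
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError('minute must be an integer from 0 to 59');
  }
  // Wrap around midnight so that 00:30 local with a 1 hour offset becomes 23:30 GMT.
  const gmtHour = (((hour - offsetFromGmtHours) % 24) + 24) % 24;
  // Cron field order: minute, hour, day of month, month, day of week.
  return `${minute} ${gmtHour} * * *`;
}

console.log(dailyCronForLocalTime(15, 25));    // 25 15 * * *
console.log(dailyCronForLocalTime(15, 25, 1)); // 25 14 * * *
```

The printed value can be exported as ETL_SCHEDULE before running scripts/start.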

Further details available here.

The scheduler can be completely disabled by setting the DISABLE_SCHEDULER variable to true. Rather than removing the job, this schedules a single run far in the future, on Jan 1st, 2100.
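A minimal sketch of how the two scheduling variables might be resolved into one value. The function name resolveSchedule and the choice of midnight UTC for the far-future date are assumptions made for illustration; the repo's actual code may differ.

```javascript
// 11:00 pm, the documented default for ETL_SCHEDULE.
const DEFAULT_SCHEDULE = '0 23 * * *';

// Hypothetical helper: returns a cron expression for the normal case,
// or a one-off Date when the scheduler is "disabled".
function resolveSchedule(env = process.env) {
  if (env.DISABLE_SCHEDULER === 'true') {
    // Assumption: midnight UTC on Jan 1st, 2100.
    return new Date(Date.UTC(2100, 0, 1));
  }
  return env.ETL_SCHEDULE || DEFAULT_SCHEDULE;
}

console.log(resolveSchedule({}));                                // 0 23 * * *
console.log(resolveSchedule({ ETL_SCHEDULE: '25 15 * * *' }));   // 25 15 * * *
console.log(resolveSchedule({ DISABLE_SCHEDULER: 'true' }).toISOString());
// 2100-01-01T00:00:00.000Z
```

Only the exact string 'true' disables the scheduler in this sketch, matching the description in the environment variables table.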

Once initiated, the scrape will download the files, strip out any records that are not for the latest period (calculated from the records themselves), create CSV file(s) containing those records in the output directory (./html/json/) and create JSON files containing an array of objects of the form:

{
  "PeriodEnd": "dd/mm/yyyy",
  "GPPracticeCode": "B82050",
  "Supplier": "${supplier}"
}

Where ${supplier} is one of ["EMIS","INPS","Informatica","Microtest","NK","TPP"], or one of these values with (I) appended, e.g. EMIS (I). The (I) suffix indicates a GP that is now using the Informatica system.

Note: The list above was created by running jq -c '[.[].Supplier] | unique ' html/json/pomi.json
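The "latest period" filter and the supplier naming rule can be sketched as follows. This is not the repo's implementation: the function names are hypothetical, the sample records are made up, and PeriodEnd is assumed to always be a well-formed dd/mm/yyyy string.

```javascript
const SUPPLIERS = ['EMIS', 'INPS', 'Informatica', 'Microtest', 'NK', 'TPP'];

// Parse a dd/mm/yyyy string into a sortable UTC timestamp.
function parsePeriodEnd(text) {
  const [day, month, year] = text.split('/').map(Number);
  return Date.UTC(year, month - 1, day);
}

// Keep only the records whose PeriodEnd equals the most recent one present.
function latestPeriodRecords(records) {
  const latest = records.reduce(
    (max, record) => Math.max(max, parsePeriodEnd(record.PeriodEnd)),
    -Infinity
  );
  return records.filter((record) => parsePeriodEnd(record.PeriodEnd) === latest);
}

// True for a listed supplier, with or without the " (I)" suffix.
function isKnownSupplier(value) {
  const base = value.endsWith(' (I)') ? value.slice(0, -4) : value;
  return SUPPLIERS.includes(base);
}

// Sample data invented for illustration only.
const sample = [
  { PeriodEnd: '31/03/2017', GPPracticeCode: 'B82050', Supplier: 'EMIS' },
  { PeriodEnd: '30/04/2017', GPPracticeCode: 'B82050', Supplier: 'EMIS (I)' },
  { PeriodEnd: '30/04/2017', GPPracticeCode: 'A00001', Supplier: 'TPP' },
];

console.log(JSON.stringify(latestPeriodRecords(sample), null, 2));
console.log(sample.every((record) => isKnownSupplier(record.Supplier))); // true
```

Comparing timestamps rather than raw strings matters here because dd/mm/yyyy text does not sort chronologically.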

Upon completion the files will also be uploaded to the Azure storage location specified in the environment variable AZURE_STORAGE_CONNECTION_STRING. Each file will be uploaded twice: once to overwrite the current file and once as a date-stamped copy, e.g. booking.json and 20170530-booking.json.
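The naming of the two uploads can be illustrated with a short helper. blobNamesFor is a hypothetical name, and taking the date stamp from UTC is an assumption; the source only shows the YYYYMMDD prefix format.

```javascript
// Hypothetical helper: the two blob names used for one output file.
function blobNamesFor(fileName, date = new Date()) {
  // toISOString() is always UTC and starts with YYYY-MM-DD.
  const stamp = date.toISOString().slice(0, 10).replace(/-/g, '');
  return [fileName, `${stamp}-${fileName}`];
}

console.log(blobNamesFor('booking.json', new Date(Date.UTC(2017, 4, 30))));
// [ 'booking.json', '20170530-booking.json' ]
```

The first name overwrites the current file on every run, while the second keeps a dated copy of each run.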

Azure Blob Storage

If the recommended environment variables are used the JSON file created will be available in Azure storage at:

The Microsoft Azure Storage Explorer may be used to browse the contents of blob storage.

Environment variables

Environment variables are expected to be managed by the environment in which the application is being run. This is best practice as described by twelve-factor.

| Variable | Description | Default | Required |
| --- | --- | --- | --- |
| AZURE_STORAGE_CONNECTION_STRING | Azure storage connection string | | yes |
| AZURE_TIMEOUT_MINUTES | Maximum wait time when uploading file to Azure | 10 | |
| CONTAINER_NAME | Azure storage container name | etl-output | |
| DISABLE_SCHEDULER | Set to 'true' to disable the scheduler | false | |
| ETL_SCHEDULE | Time of day to run the ETL, in cron syntax | 0 23 * * * (11:00 pm) | |
| LOG_LEVEL | Log level | Depends on NODE_ENV | |
| NODE_ENV | Node environment | development | |
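One way the variables above could be read with their documented defaults is sketched here. loadConfig and the property names are hypothetical and are not taken from the repo. The default for LOG_LEVEL depends on NODE_ENV in a way the table does not specify, so the value is passed through unchanged.

```javascript
// Hypothetical helper: read settings from the environment, applying the
// documented defaults and failing fast on the one required variable.
function loadConfig(env = process.env) {
  if (!env.AZURE_STORAGE_CONNECTION_STRING) {
    throw new Error('AZURE_STORAGE_CONNECTION_STRING is required');
  }
  return {
    connectionString: env.AZURE_STORAGE_CONNECTION_STRING,
    timeoutMinutes: Number(env.AZURE_TIMEOUT_MINUTES || 10),
    containerName: env.CONTAINER_NAME || 'etl-output',
    schedulerDisabled: env.DISABLE_SCHEDULER === 'true',
    schedule: env.ETL_SCHEDULE || '0 23 * * *',
    nodeEnv: env.NODE_ENV || 'development',
    // Left undefined when unset; the NODE_ENV-dependent default is not documented here.
    logLevel: env.LOG_LEVEL,
  };
}

// The connection string below is a placeholder, not a real credential.
console.log(loadConfig({ AZURE_STORAGE_CONNECTION_STRING: 'placeholder' }));
```

Passing the environment as an argument keeps the function easy to exercise without touching process.env.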

Architecture Decision Records

This repo uses Architecture Decision Records to record architectural decisions for this project. They are stored in doc/adr.