Register Transformer PSC is a data transformer for the OpenOwnership Register project. It processes bulk data published to AWS S3 (for example, data emitted by AWS Kinesis Data Firehose), converts it into the Beneficial Ownership Data Standard (BODS) format, and stores the resulting records in Elasticsearch. Optionally, it can also use AWS Kinesis, either to process streamed data instead of bulk data from AWS S3, or to publish newly transformed records to a different stream.
Records are transformed to BODS version 0.2.
Install and boot Register.
Configure your environment using the example file:
cp .env.example .env
Create the Elasticsearch indexes:
docker compose run transformer-psc create-indexes
Run the tests:
docker compose run transformer-psc test
To transform the bulk data under a given prefix in AWS S3:
docker compose run transformer-psc transform-bulk raw_data/source=PSC/year=2023/month=10/
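The prefix in the command above follows a partitioned layout of source, year, and month. As a minimal sketch, assuming every prefix in your bucket has the same `raw_data/source=PSC/year=.../month=.../` shape as that example, a small shell helper can build it for you. The function name `psc_prefix` is hypothetical and not part of this project. It inserts the year and month verbatim, so pass them exactly as they appear in your bucket.

```shell
#!/bin/sh
# Hypothetical helper (not part of transformer-psc): builds an S3 prefix,
# assuming the raw_data/source=PSC/year=YYYY/month=MM/ layout from the
# example command. Values are inserted as given, with no padding applied.
psc_prefix() {
  # $1 = year, $2 = month, both written as they appear in the bucket
  printf 'raw_data/source=PSC/year=%s/month=%s/\n' "$1" "$2"
}

# Print the transform-bulk command for October 2023.
# Remove the leading `echo` to actually run it.
echo docker compose run transformer-psc transform-bulk "$(psc_prefix 2023 10)"
```

Keeping the `echo` in place lets you check the generated prefix before starting a bulk transformation.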
To transform a stream:
docker compose run transformer-psc transform-stream