pip-data-management

Overview

pip-data-management is a microservice that deals with most operations relating to data persistence within the Court and Tribunal Hearings Service (CaTH hereafter) written with Spring Boot/Java.

In practice, the service is usually containerized within a hosted kubernetes environment within Azure.

Broadly speaking, this service has two main components relating to the persistence, validation, retrieval and manipulation of court publications and canonical location information (reference data).

Publications

Uploading publications to the service
Validation of publications before publication (JSON)
Retrieval of existing publications and metadata
Archival processes to handle mandated data retention periods

Locations

Retrieving individual locations or lists of locations
Uploading and validation of location reference data
Deletion of locations where appropriate.

Most interactions with pip-data-management are performed through the API (specified in API Documentation) either as a standalone service or via connections to other microservices.

Features and Functionality

Uploading/retrieval/deletion of publications into the service.
Request PDF/Excel generation for JSON payload if the payload size is less than the set limit (currently default to 2MB).
Interfacing with local or hosted Postgres instances for metadata and retrieval and Azure Blob Storage for raw files.
Parsing and validation of ingested json files.
Flyway for database modifications via SQL ingestion.
Secure/Insecure Mode: Use of bearer tokens for authentication with the secure instance (if desired)
Azure Blob Storage: Handles interactions with the CaTH Azure Blob Storage instance (or local Azurite emulator/Azure Storage Explorer instances)
Endpoints which interact with scheduled cronjobs for daily tasks (e.g. retention period checks for archival purposes within pip-cron-trigger)
OpenAPI Spec/Swagger-UI: Documents and allows users or developers to access API resources within the browser.
Integration tests using TestContainers for dummy database operations.

Architecture Diagram

The above diagram is somewhat simplified for readability (e.g. it does not include secure/insecure communications, but those are covered elsewhere).

Getting Started

Prerequisites

General

Java JDK 21 - this is used throughout all of our services.
REST client of some description (e.g. Curl, Insomnia, Postman). Swagger-UI can also be used to send requests.
Docker - used to run integration tests due to our use of TestContainers

Local development

Azurite - Local Azure emulator used along with Azure Storage explorer for local storage.
Azure Storage Explorer - Used for viewing and storing blobs within an Azurite instance locally.

Nice-to-haves

pip-dev-env - This repo provides a development environment wherein ensure all microservices, as well as external services (e.g. postgres & redis) are all running in tandem within the service. It eases the development process and is particularly helpful when working with cross-service communication, as it also reduces strain on local performance from having many separate IDE windows open.
PostgreSQL - for local development, it will help to install Postgres. Ensure your postgres instance matches the relevant environment variables. Most devs on the project are just using this within a docker container.
Some means of interfacing with the postgres database either locally or remotely. Good options include DataGrip, pgAdmin or psql. This will allow you to verify the impacts of your requests on the underlying database.

Installation

Clone the repository
Ensure all required environment variables have been set.
Build using the command ./gradlew clean build
Start the service using the command ./gradlew bootrun in the newly created directory.

Configuration

Environment Variables

Environment variables are used by the service to control its behaviour in various ways.

These variables can be found within various separate CaTH Azure keyvaults. You may need to obtain access to this via a support ticket.

Runtime secrets are stored in pip-ss-{env}-kv (where {env} is the environment where the given instance is running (e.g. production, staging, test, sandbox)).
Test secrets are stored in pip-bootstrap-{env}-kv with the same convention.

Get environment variables with python scripts

Python scripts to quickly grab all environment variables (subject to Azure permissions) are available for both runtime and test secrets.

Runtime secrets

Below is a table of currently used environment variables for starting the service, along with a descriptor of their purpose and whether they are optional or required.

Variable	Description	Required?
SPRING_PROFILES_ACTIVE	If set equal to `dev`, the application will run in insecure mode (i.e. no bearer token authentication required for incoming requests.) Note - if you wish to communicate with other services, you will need to set them all to run in insecure mode in the same way.	No
APP_URI	Uniform Resource Identifier - the location where the application expects to receive bearer tokens after a successful authentication process. The application then validates received bearer tokens using the AUD parameter in the token	No
CLIENT_ID	Unique ID for the application within Azure AD. Used to identify the application during authentication.	No
TENANT_ID	Directory unique ID assigned to our Azure AD tenant. Represents the organisation that owns and manages the Azure AD instance.	No
CLIENT_SECRET	Secret key for authentication requests to the service.	No
CONNECTION_STRING	Connection string for connecting to the Azure Blob Storage service. Only required when running the application locally via Azurite.	Yes
STORAGE_ACCOUNT_NAME	Azure storage account name used to construct the storage account endpoint. Not required when running the application locally.	No
DB_HOST	Postgres Hostname	Yes
DB_PORT	Postgres Port	Yes
DB_NAME	Postgres Db name	Yes
DB_USER	Postgres Username	Yes
DB_PASS	Postgres Password	Yes
ACCOUNT_MANAGEMENT_URL	URL used for connecting to the pip-account-management service. Defaults to staging if not provided.	No
CHANNEL_MANAGEMENT_URL	URL used for connecting to the pip-channel-management service. Defaults to staging if not provided.	No
PUBLICATION_SERVICES_URL	URL used for connecting to the pip-publication-services service. Defaults to staging if not provided.	No
SUBSCRIPTION_MANAGEMENT_URL	URL used for connecting to the pip-subscription-management service. Defaults to staging if not provided.	No
CHANNEL_MANAGEMENT_AZ_API	Used as part of the `scope` parameter when requesting a token from Azure. Used for service-to-service communication with the pip-channel-management service	No
SUBSCRIPTION_MANAGEMENT_AZ_API	Used as part of the `scope` parameter when requesting a token from Azure. Used for service-to-service communication with the pip-subscription-management service	No
PUBLICATION_SERVICES_AZ_API	Used as part of the `scope` parameter when requesting a token from Azure. Used for service-to-service communication with the pip-publication-services service	No
ACCOUNT_MANAGEMENT_AZ_API	Used as part of the `scope` parameter when requesting a token from Azure. Used for service-to-service communication with the account management service	No
ENABLE_TESTING_SUPPORT_API	Used to conditionally enable testing support API. Default to `false` for the production environment only.	No
MAX_PAYLOAD_SIZE	The maximum size of input payload before we stop generating the PDF, excel and email summary for the publication. Default to 2MB.	No

Additional Test secrets

Secrets required for getting tests to run correctly can be found in the below table:

Variable	Description
CLIENT_ID	As above
CLIENT_SECRET	As above
APP_URI	As above
SUBSCRIPTION_MANAGEMENT_AZ_API	As above
TENANT_ID	As above
ACCOUNT_MANAGEMENT_AZ_API	As above
PUBLICATION_SERVICES_AZ_API	As above
SYSTEM_ADMIN_PROVENANCE_ID	Value for the provenance of a system admin used as a header on authentication-bound tests.
TEST_USER_ID	User ID for a test account used as a header for most publication tests.

Application.yaml files

The service can also be adapted using the yaml files found in the following locations:

src/main/resources/application.yaml for changes to the behaviour of the service itself.
src/main/resources/application-dev.yaml for changes to the behaviour of the service when running locally.
src/functionalTest/resources/application-functional.yaml for changes to the application when it's running functional tests.
src/functionalTest/resources/application-view.yaml for changes to postgres view tests.
src/test/resources/application-test.yaml for changes to other test types (e.g. unit tests).

Fortify

We use Fortify to scan for security vulnerabilities. This is run as part of our nightly pipelines.

API Documentation

Our full API specification can be found within our Swagger-UI page. It can be accessed locally by starting the service and going to http://localhost:8090/swagger-ui/swagger-ui/index.html Alternatively, if you're on our VPN, you can access the swagger endpoint at our staging URL (ask a teammate to give you this).

Search Criteria

The 'search' field forms part of the response back from the POST /publication endpoint.

The field contains values extracted from the payload that are then used by users in the frontend to search for publications when setting up subscriptions.

The values are extracted using JPATH (Jayway implementation). This is an example of the extracted values:

{
  "cases":[{
    "caseUrn": "ExampleURN",
    "caseName": "ExampleName",
    "caseNumber": "ExampleNumber"
  }],
  "parties": [
    {
      "cases": [{
        "caseUrn": "ExampleURN",
        "caseName": "ExampleName",
        "caseNumber": "ExampleNumber"
      }],
      "organisations": [
        "Org name"
      ],
      "individuals": [
        {
          "forename": "Forename",
          "middleName": "M",
          "surname": "Surname"
        }
      ]
    }
  ]
}

The 'parties' section is used for searching the matched publications when adding subscriptions in the frontend. It excludes representatives, and blank / null party roles. For individual names, only the surname is used for searching. However the forenames and middle names are also stored so that full names can be displayed on the subscription tables.

The legacy 'cases' section is used for search by case Number, URN or Name for publications created previously before "parties" were added.

Examples

As mentioned, the full api documentation can be found within swagger-ui, but some of the most common operations are highlighted below.

Most of the communication with this service benefits from using secure authentication. While possible to stand up locally in insecure mode, to simulate a production environment it is better to use secure mode. Before sending in any requests to the service, you'll need to obtain a bearer token using the following approach:

Requesting a bearer token

To request a bearer token, sending a post request following this template:

curl --request POST \
  --url https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token \
  --header 'Content-Type: multipart/form-data' \
  --form client_id={CLIENT_ID_FOR_ANOTHER_SERVICE} \
  --form scope={APP_URI}/.default \
  --form client_secret={CLIENT_SECRET_FOR_ANOTHER_SERVICE}\
  --form grant_type=client_credentials

You can copy the above curl command into either Postman or Insomnia and they will automatically be converted to the relevant formats for those programs.

Note - the _FOR_ANOTHER_SERVICE variables need to be extracted from another registered microservice within the broader CaTH umbrella (e.g. pip-subscription-management)

Uploading a new publication

The following request is a template which can be used to input a new list or publication at the /publication post endpoint.

curl --request POST \
  --url http://localhost:8090/publication \
  --header 'Authorization: Bearer {BEARER_TOKEN_HERE}' \
  --header 'Content-Type: application/json' \
  --header 'x-content-date: {DATE_IN_ISO_FORMAT_WITH_NO_OFFSET}' \
  --header 'x-court-id: {LOCATION_ID_OF_DESIRED_LOCATION}' \
  --header 'x-display-from: {DATE_IN_ISO_FORMAT_WITH_NO_OFFSET}' \
  --header 'x-display-to: {DATE_IN_ISO_FORMAT_WITH_NO_OFFSET}' \
  --header 'x-language: {ENGLISH or WELSH or BI_LINGUAL}' \
  --header 'x-list-type: {LIST_TYPE}' \
  --header 'x-provenance: {MANUAL_UPLOAD}' \
  --header 'x-source-artefact-id: {FILENAME_IF_PROVENANCE_IS_MANUAL_UPLOAD}' \
  --header 'x-type: {"LIST" or "GENERAL_PUBLICATION"}' \
  --data '{YOUR_JSON_HERE}'

Getting a list of all hearing locations

The following request returns a list of all hearing locations with metadata such as region, location-type, jurisdiction, welsh names etc. Hearing locations are ingested into the system using the reference data endpoint.

curl --request GET \                                                                                                    13:40:44
          --url http://localhost:8090/locations \
          --header 'Authorization: Bearer {BEARER_TOKEN_HERE}'

Getting a specific hearing location

The following request returns location metadata for an individual court.

curl --request GET \                                                                                                    13:40:44
          --url http://localhost:8090/locations/{LOCATION_ID_OF_DESIRED_LOCATION} \
          --header 'Authorization: Bearer {BEARER_TOKEN_HERE}'

Azure Blob Storage

This service uses Azure Blob storage to store the raw artefact data. This is configured in AzureBlobConfiguration.

The Workload Identity is used by default to authenticate with Azure Blob Storage which is present in the Azure environments. If the workload identity is not present (such as in a local environment), a connection string can be used instead by setting the CONNECTION_STRING environment variable.

For the local environment, Azurite docker images can be used to provide a local instance of Blob Storage.

Deployment

We use Jenkins as our CI/CD system. The deployment of this can be controlled within our application logic using the various Jenkinsfile-prepended files within the root directory of the repository.

Our builds run against our dev environment during the Jenkins build process. As this is a microservice, the build process involves standing up the service in a docker container in a Kubernetes cluster with the current staging master copies of the other interconnected microservices.

If your debugging leads you to conclude that you need to implement a pipeline fix, this can be done in the CNP Jenkins repo

Creating or debugging of SQL scripts with Flyway

Flyway is used to apply incremental schema changes (migrations) to our database.

Any modifications to the database schema must be done through flyway. Changes to the models will no longer be automatically applied to the database.

This behaviour can be overridden using the DB_UPDATE environment variable. This is useful for local development, but should not be used in production.

Pipeline

Flyway is enabled on the pipeline. It is only run on the pipeline, and not on startup. On startup, the app will validate that the flyway scripts have been applied.

Local

For local development, flyway is turned off by default. This is due to all tables existing within a single database locally. This can cause flyway to fail at startup due to mismatching scripts.

If you wish to test a flyway script locally, you will first need to clear the flyway_schema_history table then set the environment variable ENABLE_FLYWAY to true.

Monitoring and Logging

We utilise Azure Application Insights to store our logs. Ask a teammate for the specific resource in Azure to access these. Locally, we use Log4j.

In addition, this service is also monitored in production and staging environments by Dynatrace. The URL for viewing our specific Dynatrace instance can be had by asking a team member.

Application Insights

Application insights is configured via the lib/applicationinsights.json file. Alongside this, the Dockerfile is configured to copy in this file and also download the app insights client.

The client at runtime is attached as a javaagent, which allows it to send the logging to app insights.

To connect to app insights a connection string is used. This is configured to read from the KV Secret mounted inside the pod.

It is possible to connect to app insights locally, although somewhat tricky. The easiest way is to get the connection string from azure, set it as an environment variable (APPLICATIONINSIGHTS_CONNECTION_STRING), and add in the javaagent as VM argument. You will also need to remove / comment out the connection string line the config.

Security & Quality Considerations

We use a few automated tools to ensure quality and security within the service. A few examples can be found below:

SonarCloud - provides automated code analysis, finding vulnerabilities, bugs and code smells. Quality gates ensure that test coverage, code style and security are maintained where possible.
DependencyCheckAggregate - Ensures that dependencies are kept up to date and that those with known security vulnerabilities (based on the National Vulnerability Database(NVD)) are flagged to developers for mitigation or suppression.
JaCoCo Test Coverage - Produces code coverage metrics which allows developers to determine which lines of code are covered (or not) by unit testing. This also makes up one of SonarCloud's quality gates.
PMD - Static code analysis tool providing code quality guidance and identifying potential issues relating to coding standards, performance or security.
CheckStyle - Enforces coding standards and conventions such as formatting, naming conventions and structure.

Test Suite

This microservice is comprehensively tested using both unit and functional tests.

Unit tests

Unit tests can be run on demand using ./gradlew test.

Functional tests

Functional tests can be run using ./gradlew functional

For our functional tests, we are using Square's MockWebServer library. This allows us to test the full HTTP stack for our service-to-service interactions. We also use TestContainers to create throwaway postgres databases for testing to protect our prod and staging databases.

Contributing

We are happy to accept third-party contributions. See .github/CONTRIBUTING.md for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 385 Commits
.dependabot		.dependabot
.github		.github
bin		bin
charts/pip-data-management		charts/pip-data-management
config		config
gradle/wrapper		gradle/wrapper
infrastructure		infrastructure
lib		lib
src		src
.editorconfig		.editorconfig
.env		.env
.gitignore		.gitignore
Dockerfile		Dockerfile
Jenkinsfile_CNP		Jenkinsfile_CNP
Jenkinsfile_nightly		Jenkinsfile_nightly
Jenkinsfile_parameterized		Jenkinsfile_parameterized
LICENSE		LICENSE
README.md		README.md
build.gradle		build.gradle
data-man-arch.png		data-man-arch.png
docker-compose.yml		docker-compose.yml
gradlew		gradlew
gradlew.bat		gradlew.bat
hmctsLogo.png		hmctsLogo.png
lombok.config		lombok.config

License

hmcts/pip-data-management

Folders and files

Latest commit

History

Repository files navigation

pip-data-management

Table of Contents

Overview

Publications

Locations

Features and Functionality

Architecture Diagram

Getting Started

Prerequisites

General

Local development

Nice-to-haves

Installation

Configuration

Environment Variables

Get environment variables with python scripts

Runtime secrets

Additional Test secrets

Application.yaml files

Fortify

API Documentation

Search Criteria

Examples

Requesting a bearer token

Uploading a new publication

Getting a list of all hearing locations

Getting a specific hearing location

Azure Blob Storage

Deployment

Creating or debugging of SQL scripts with Flyway

Pipeline

Local

Monitoring and Logging

Application Insights

Security & Quality Considerations

Test Suite

Unit tests

Functional tests

Contributing

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages