
Refactor Validation code in preparation for Conditional Restrictions#215

Merged
joneubank merged 60 commits into develop from feat/validations-refactor on Jul 27, 2024
Conversation

Contributor

@joneubank joneubank commented Jul 1, 2024

Summary

This PR refactors the validation package from the version brought in from the original js-lectern-client into a new form that enables both the development of conditional restrictions and the use of the validation code in the Lectern server and in the browser.

To enable conditional restrictions we needed to be able to:

  1. Validate fields individually - previously, each restriction test was applied record by record, so we could not run conditional checks on a field before selecting the tests for that field. Now, when we validate a record, we can check the conditional restrictions on each field and apply only the tests for the applicable restrictions.

To enable running validation in the browser we need to:

  1. Remove Node specific modules - These were used in script validations to isolate arbitrary script execution from the Node global scope. Note: arbitrary script execution from untrusted dictionaries is always unsafe, so script restrictions are being phased out.
  2. Separate validation from data parsing - our use case for browser testing involves interacting with form data, not TSVs. Therefore we want to work on typed data.

Description of Changes

Separation of data parsing and data validation

Previously, data parsing/type conversion and data validation occurred together through the Lectern client's process, processRecords, and processSchemas functions. These functions were provided the raw string values as submitted to a server (i.e. from a TSV, as is expected for Lectern dictionary data); the process functions would then validate the structure of this data, parse it to convert it to the data types defined in a dictionary, and then validate those values using that dictionary.

With this refactor, validation is performed entirely on data that has already been transformed. This allowed us to write separate code for data conversion, which is also provided in this PR. The structure of the data is tested both during the data parsing and conversion step and again during the data validation step. It is important that the data structure and data type checks are repeated at validation time so that we can validate data that is provided programmatically through web forms or custom data parsers.
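The parse-then-validate flow described above can be sketched as two independent steps. The function names and result shapes below are illustrative assumptions for this sketch, not the actual exported Lectern API:

```typescript
// Hypothetical sketch of the two-step flow: convert first, validate second.
type ParseResult =
	| { success: true; value: number }
	| { success: false; error: string };

// Step 1: conversion only — turn a raw TSV string into a typed value.
function parseIntegerValue(raw: string): ParseResult {
	const value = Number(raw);
	if (!Number.isInteger(value)) {
		return { success: false, error: `'${raw}' is not an integer` };
	}
	return { success: true, value };
}

// Step 2: validation only — apply a restriction to an already-typed value.
function validateRange(value: number, min: number, max: number): boolean {
	return value >= min && value <= max;
}

const parsed = parseIntegerValue('42');
if (parsed.success) {
	// Conversion errors were surfaced before this point; here we only validate.
	console.log(validateRange(parsed.value, 0, 100));
}
```

Because the conversion step returns its own result, type errors can be reported and fixed before any restriction testing runs, which is the separation this PR introduces.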

Functions for validating different groups of data (field, record, schema, dictionary)

There are now functions with declarative names for validating data at each of these levels: validateField(), validateRecord(), validateSchema(), and validateDictionary(). Notably, directly testing individual fields was not possible in the previous client implementation - record level processing was the lowest level available. Each of the new validation functions requires as inputs only the relevant Lectern Dictionary data (the Schema definition for validateSchema, for instance). Importantly, each of these validation functions relies on the other validation functions to test its component parts: validateDictionary uses validateSchema to ensure every schema is valid, and validateSchema uses validateRecord, which uses validateField. This ensures the testing is done consistently across the validation suite. Each function introduces new validation tests that can only be done at that scale of data. For example, ForeignKey restrictions can only be tested when a full dictionary worth of data is provided.

Fully typed validation errors with logical structure

The validation functions now return a Result object that either indicates that the data is completely valid or provides the errors as structured data. Previously, errors were reported as a flat list of validation errors. Now, errors are grouped by schema, then by record, and finally by field.
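A minimal sketch of that nested grouping follows. The type and property names here are assumptions for illustration; the real exported types may differ:

```typescript
// Illustrative shapes for the schema -> record -> field error grouping.
type FieldError = { fieldName: string; reason: string };
type RecordError = { recordIndex: number; fieldErrors: FieldError[] };
type SchemaErrors = { schemaName: string; recordErrors: RecordError[] };

// Result is either completely valid, or carries the grouped errors.
type ValidationResult =
	| { valid: true }
	| { valid: false; errors: SchemaErrors[] };

// Example of an invalid result: one field error, located by schema and record.
const example: ValidationResult = {
	valid: false,
	errors: [
		{
			schemaName: 'donor',
			recordErrors: [
				{
					recordIndex: 3,
					fieldErrors: [{ fieldName: 'age', reason: 'value out of range' }],
				},
			],
		},
	],
};
```

The point of the grouping is that a consumer can walk directly to the schema, record, and field that failed instead of scanning a flat error list.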

Code Content

Validation package

The validation package now contains separate code for converting raw input values, and for validating data with restrictions from the Lectern dictionaries.

Parse Value Functions

These are functions to parse raw string values from submitted TSVs and convert them into the proper data types in JS. These apply validations to the data value to ensure it matches the required type (string, number, integer, boolean). This also includes all logic for parsing array values based on the default delimiter value (,).

This functionality was only partially implemented in the previous client code. Array parsing was not previously done, and data conversion was bundled with validation. Now, data conversion can be done separately from validation, allowing type conversion errors to be communicated and addressed before the data is submitted for validation.

Four parsing functions are provided:

  1. parseFieldValue
  2. parseRecordValues
  3. parseSchemaValues
  4. parseDictionaryValues
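The array handling mentioned above, splitting on the default `,` delimiter, can be sketched as follows. The function name and signature are illustrative, not the actual exported helpers:

```typescript
// Hypothetical sketch of array value parsing with the default ',' delimiter.
function parseArrayValue(raw: string, delimiter = ','): string[] {
	// Split on the delimiter, trim surrounding whitespace from each entry,
	// and drop empty entries left by stray delimiters.
	return raw
		.split(delimiter)
		.map((entry) => entry.trim())
		.filter((entry) => entry.length > 0);
}

console.log(parseArrayValue('red, green,blue')); // ['red', 'green', 'blue']
```

After splitting, each entry would still be passed through the single-value conversion (string, number, integer, boolean) so the whole array ends up typed.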

Validation functions

The validation library now exports the following four validation functions:

  1. validateField
    Validate a single field based on a SchemaField definition. This function requires as an argument the value of the entire DataRecord that this field belongs to. This will be used to determine which conditional restrictions need to be applied. This function tests that the field has the correct value type, and then tests all field restrictions:
    • codeList
    • range
    • regex
    • required
      Note that the unique restriction cannot be tested at the field level since it requires all of the data for the schema.
  2. validateRecord
    Validates all fields in a record by running the validateField function for them. In addition, reports unrecognized fields in the record.
  3. validateSchema
    Validates all records belonging to a schema. This will apply tests for the following restrictions that require all schema data:
    • unique
    • uniqueKey
  4. validateDictionary
    Given data for all schemas, validates all data in a data set for a dictionary. This includes running validateSchema for each schema, as well as applying foreignKey tests to each record of the schemas with this restriction.
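The layered composition described above, where each level delegates to the level below and adds only the tests that need its wider scope, can be sketched like this. All implementations here are simplified placeholders, not the real restriction logic:

```typescript
// Sketch of validateSchema -> validateRecord -> validateField delegation.
type DataRecord = Record<string, unknown>;

function validateField(name: string, record: DataRecord): string[] {
	// Field-level tests (required, range, regex, codeList) would run here;
	// this placeholder only checks a 'required'-style condition.
	return record[name] === undefined ? [`${name} is required`] : [];
}

function validateRecord(fields: string[], record: DataRecord): string[] {
	// Record level: run validateField for every field and collect the errors.
	return fields.flatMap((f) => validateField(f, record));
}

function validateSchema(fields: string[], records: DataRecord[]): string[] {
	// Schema level: per-record checks, plus cross-record tests such as
	// unique and uniqueKey (omitted in this placeholder).
	return records.flatMap((r) => validateRecord(fields, r));
}

const errors = validateSchema(['donor_id'], [{ donor_id: 'D1' }, {}]);
console.log(errors); // ['donor_id is required']
```

validateDictionary would sit one level above this, looping validateSchema over every schema and adding the foreignKey tests that need the full data set.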

Removed Features

Some small pieces of functionality have been intentionally removed:

  • default values: default values were being interpreted from a meta property, which is not a defined part of the Lectern specification. This has been removed. If this functionality is needed, we can create a standard mechanism in the Lectern spec, or recommend that users add logic to process their data, applying defaults, before passing data to the Lectern client.
  • script restrictions: script restriction testing has been removed. It will be replaced with conditional restrictions and the Lectern schema/documentation will be updated to reflect this. This was done because there is no safe way to run arbitrary code execution from a script in a Lectern dictionary, and the code that was in place to do this forced validation to happen on the server side.

Client package

The client package has had all data conversion and validation logic removed since this is now handled in the validation package. The processing code has been refactored to rely on the conversion and validation code instead.

Processing functions

The processing functions were providing a mechanism to perform both data conversion and data validation. To keep this functionality available, the old processing functions have been replaced with new ones performing similar roles.

  1. Input type updates
    The old processing functions required a pre-processing step that had already parsed all array fields into array data but had not converted the values from raw text fields. Now, the inputs should strictly contain string values, and the Lectern standard conversion functionality will be applied. This will separate the values in arrays based on a default delimiter value of ,. The ability to customize the delimiter value will be added to the Lectern schema.
  2. Processing Function name updates:
    The names of the processing functions have been updated to correspond to the now common Lectern terminology.
  • processSchemas() -> processDictionary() : This function will process an entire dictionary worth of data, taking as input all records for all schemas in the dictionary.
  • processRecords() -> processSchema(): This function will process a schema worth of data, taking all records for a single schema.
  • process -> processRecord(): This function processes a single record, taking only one object as input for a single schema.
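Internally, a processing function now chains the two concerns that used to be entangled: convert the raw string values first, then validate the typed result. The sketch below is a simplified illustration with assumed names and a toy conversion rule, not the real processRecord implementation:

```typescript
// Hypothetical sketch of a processing function: conversion, then validation.
type RawRecord = Record<string, string>;
type TypedRecord = Record<string, unknown>;

function processRecord(raw: RawRecord): { record: TypedRecord; errors: string[] } {
	const record: TypedRecord = {};
	const errors: string[] = [];
	for (const [field, value] of Object.entries(raw)) {
		// Conversion step (toy rule): numeric strings become numbers,
		// everything else stays a string. The real conversion follows the
		// field types defined in the dictionary.
		const asNumber = Number(value);
		record[field] = value !== '' && !Number.isNaN(asNumber) ? asNumber : value;
	}
	// Validation step would run restriction tests on `record` here and
	// push any failures into `errors` (omitted for brevity).
	return { record, errors };
}

console.log(processRecord({ age: '42', name: 'abc' }));
// { record: { age: 42, name: 'abc' }, errors: [] }
```

Because the two steps are now distinct modules, callers that already hold typed data (such as a web form) can skip conversion and call the validation functions directly.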

joneubank added 30 commits June 22, 2024 01:04
It should be mentioned here that the previously committed version of this had a mix-up with the exclusive conditions, where the edge cases (value equal to the edge of the range) were incorrectly validated in the version imported from js-lectern-client. This is fixed now, with tests to check these cases.
The original implementation had record processing logic built into the restriction test logic. This refactor separates restriction tests to apply to a specific value only, based on the defined restriction rule. This will enable future development to resolve the restrictions on each field based on conditional restrictions and then test based on the resolved conditions.

The next step is to add validation testing for full data records, full entity data sets, and then full dictionary data sets. These each loop over the previous element but also introduce new tests. For example, entity data sets enforce unique field and uniqueKey restrictions. Dictionary datasets enforce foreignKey restrictions.

Unit tests have been added for the fieldValidation function and for the individual restriction tests. The restriction tests check the detailed sets of values against restriction rules, while the fieldValidation testing checks that these restrictions are enforced based on SchemaField definitions.

Note that script validations are not being run in this code, and that the unique constraint cannot be applied at the field level. Additional changes are required to move the unique constraint to a field level property (like isArray) so that it can't be affected by conditional restrictions. It will be validated in the schema level validation (entity dataset).
…ingle schema

Applies the `validateRecord` validation to every record, plus runs validations that require knowledge of the entire set of records for this schema:
- unique
- uniqueKey
joneubank added 22 commits July 6, 2024 18:39
- applies foreign key restrictions
- detects unrecognized schemas
- collects foreign key errors with other record errors
- functions for converting values for field, record, schema, and dictionary
- update to validation error type names to be shared with convert value types
- includes test specs for each convert function exported
@joneubank joneubank marked this pull request as ready for review July 22, 2024 04:32
@joneubank
Contributor Author

I still owe some doc files that describe the validation and data conversion functionality before this is ready to be merged. At this point, the code is ready for review.

* Change `FieldDetails` property `value` to `fieldValue`

* Change TestResult invalid property to be `details`
