
Refactor Validation code in preparation for Conditional Restrictions#215

Merged
joneubank merged 60 commits into develop from feat/validations-refactor on Jul 27, 2024
Conversation

Contributor

@joneubank joneubank commented Jul 1, 2024

Summary

This PR refactors the validation package from the version brought in from the original js-lectern-client into a new form that enables both the development of conditional restrictions and the use of the validation code in the Lectern server and in the browser.

To enable conditional restrictions we needed to be able to:

  1. Validate fields individually - previously, each restriction test was applied record by record, so we could not run conditional checks on a field before selecting the tests for that field. Now, when we validate a record, we can check the conditional restrictions on each field and apply only the tests for the applicable restrictions.

To enable running validation in the browser we need to:

  1. Remove Node specific modules - These were used in script validations to isolate arbitrary script execution from the Node global scope. Note: arbitrary script execution from untrusted dictionaries is always unsafe, so script restrictions are being phased out.
  2. Separate validation from data parsing - our use case for browser testing involves interacting with form data, not TSVs. Therefore we want to work on typed data.

Description of Changes

Separation of data parsing and data validation

Previously, data parsing/type conversion and data validation occurred together through the Lectern client's process, processRecords, and processSchemas functions. These functions were provided the raw string values as submitted to a server (i.e. from a TSV, as is expected for Lectern dictionary data); the process functions would then validate the structure of this data, parse it to convert it to the data types defined in a dictionary, and then validate those values using that dictionary.

With this refactor, validation is performed entirely on data that has already been transformed. This allowed us to write separate code for data conversion, which is also provided in this PR. The structure of the data is tested both during the data parsing and conversion step and again during the data validation step. It is important that the data structure and data type checks are repeated at validation time so that we can validate data that is provided programmatically through web forms or custom data parsers.
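The parse-then-validate flow described above can be sketched as two independent steps. The function names and result shapes below are illustrative assumptions for this sketch, not the actual exported Lectern API:

```typescript
// Hypothetical sketch of the two-step flow: convert first, validate second.
type ParseResult =
	| { success: true; value: number }
	| { success: false; error: string };

// Step 1: conversion only — turn a raw TSV string into a typed value.
function parseIntegerValue(raw: string): ParseResult {
	const value = Number(raw);
	if (!Number.isInteger(value)) {
		return { success: false, error: `'${raw}' is not an integer` };
	}
	return { success: true, value };
}

// Step 2: validation only — apply a restriction to an already-typed value.
function validateRange(value: number, min: number, max: number): boolean {
	return value >= min && value <= max;
}

const parsed = parseIntegerValue('42');
if (parsed.success) {
	// Conversion errors were surfaced before this point; here we only validate.
	console.log(validateRange(parsed.value, 0, 100));
}
```

Because the conversion step returns its own result, type errors can be reported and fixed before any restriction testing runs, which is the separation this PR introduces.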

Functions for validating different groups of data (field, record, schema, dictionary)

There are now functions with declarative names for validating data at each of these levels: validateField(), validateRecord(), validateSchema(), and validateDictionary(). Notably, directly testing individual fields was not possible in the previous client implementation - record level processing was the lowest level available. Each of the new validation functions requires as inputs only the relevant Lectern Dictionary data (the Schema definition for validateSchema, for instance). Importantly, each of these validation functions relies on the other validation functions to test its component parts: validateDictionary uses validateSchema to ensure every schema is valid, and validateSchema uses validateRecord, which uses validateField. This ensures the testing is done consistently across the validation suite. Each function introduces new validation tests that can only be done at that scale of data. For example, ForeignKey restrictions can only be tested when a full dictionary worth of data is provided.

Fully typed validation errors with logical structure

The validation functions now return a Result object that either indicates that the data is completely valid or provides the errors as structured data. Previously, errors were reported as a flat list of validation errors. Now, errors are grouped by schema, then by record, and finally by field.
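A minimal sketch of that nested grouping follows. The type and property names here are assumptions for illustration; the real exported types may differ:

```typescript
// Illustrative shapes for the schema -> record -> field error grouping.
type FieldError = { fieldName: string; reason: string };
type RecordError = { recordIndex: number; fieldErrors: FieldError[] };
type SchemaErrors = { schemaName: string; recordErrors: RecordError[] };

// Result is either completely valid, or carries the grouped errors.
type ValidationResult =
	| { valid: true }
	| { valid: false; errors: SchemaErrors[] };

// Example of an invalid result: one field error, located by schema and record.
const example: ValidationResult = {
	valid: false,
	errors: [
		{
			schemaName: 'donor',
			recordErrors: [
				{
					recordIndex: 3,
					fieldErrors: [{ fieldName: 'age', reason: 'value out of range' }],
				},
			],
		},
	],
};
```

The point of the grouping is that a consumer can walk directly to the schema, record, and field that failed instead of scanning a flat error list.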

Code Content

Validation package

The validation package now contains separate code for converting raw input values, and for validating data with restrictions from the Lectern dictionaries.

Parse Value Functions

These are functions to parse raw string values from submitted TSVs and convert them into the proper data types in JS. These apply validations to the data value to ensure it matches the required type (string, number, integer, boolean). This also includes all logic for parsing array values based on the default delimiter value (,).

This functionality was only partially implemented in the previous client code. Array parsing was not previously done, and data conversion was bundled with validation. Now, data conversion can be done separately from validation, allowing type conversion errors to be communicated and addressed before the data is submitted for validation.

Four parsing functions are provided:

  1. parseFieldValue
  2. parseRecordValues
  3. parseSchemaValues
  4. parseDictionaryValues
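The array handling mentioned above, splitting on the default `,` delimiter, can be sketched as follows. The function name and signature are illustrative, not the actual exported helpers:

```typescript
// Hypothetical sketch of array value parsing with the default ',' delimiter.
function parseArrayValue(raw: string, delimiter = ','): string[] {
	// Split on the delimiter, trim surrounding whitespace from each entry,
	// and drop empty entries left by stray delimiters.
	return raw
		.split(delimiter)
		.map((entry) => entry.trim())
		.filter((entry) => entry.length > 0);
}

console.log(parseArrayValue('red, green,blue')); // ['red', 'green', 'blue']
```

After splitting, each entry would still be passed through the single-value conversion (string, number, integer, boolean) so the whole array ends up typed.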

Validation functions

The validation library now exports the following four validation functions:

  1. validateField
    Validate a single field based on a SchemaField definition. This function requires as an argument the value of the entire DataRecord that this field belongs to. This will be used to determine which conditional restrictions need to be applied. This function tests that the field has the correct value type, and then tests all field restrictions:
    • codeList
    • range
    • regex
    • required
      Note that the unique restriction cannot be tested at the field level since it requires all of the data for the schema.
  2. validateRecord
    Validates all fields in a record by running the validateField function for them. In addition, reports unrecognized fields in the record.
  3. validateSchema
    Validates all records belonging to a schema. This will apply tests for the following restrictions that require all schema data:
    • unique
    • uniqueKey
  4. validateDictionary
    Given data for all schemas, validates all data in a data set for a dictionary. This includes running validateSchema for each schema, as well as applying foreignKey tests to each record of the schemas with this restriction.
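The layered composition described above, where each level delegates to the level below and adds only the tests that need its wider scope, can be sketched like this. All implementations here are simplified placeholders, not the real restriction logic:

```typescript
// Sketch of validateSchema -> validateRecord -> validateField delegation.
type DataRecord = Record<string, unknown>;

function validateField(name: string, record: DataRecord): string[] {
	// Field-level tests (required, range, regex, codeList) would run here;
	// this placeholder only checks a 'required'-style condition.
	return record[name] === undefined ? [`${name} is required`] : [];
}

function validateRecord(fields: string[], record: DataRecord): string[] {
	// Record level: run validateField for every field and collect the errors.
	return fields.flatMap((f) => validateField(f, record));
}

function validateSchema(fields: string[], records: DataRecord[]): string[] {
	// Schema level: per-record checks, plus cross-record tests such as
	// unique and uniqueKey (omitted in this placeholder).
	return records.flatMap((r) => validateRecord(fields, r));
}

const errors = validateSchema(['donor_id'], [{ donor_id: 'D1' }, {}]);
console.log(errors); // ['donor_id is required']
```

validateDictionary would sit one level above this, looping validateSchema over every schema and adding the foreignKey tests that need the full data set.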

Removed Features

Some small pieces of functionality have been intentionally removed:

  • default values: default values were being interpreted from a meta property, which is not a defined part of the Lectern specification. This has been removed. If this functionality is needed, we can create a standard mechanism in the Lectern spec, or recommend that users add logic to process their data, applying defaults, before passing data to the Lectern client.
  • script restrictions: script restriction testing has been removed. It will be replaced with conditional restrictions and the Lectern schema/documentation will be updated to reflect this. This was done because there is no safe way to run arbitrary code execution from a script in a Lectern dictionary, and the code that was in place to do this forced validation to happen on the server side.

Client package

The client package has had all data conversion and validation logic removed since this is now handled in the validation package. The processing code has been refactored to rely on the conversion and validation code instead.

Processing functions

The processing functions were providing a mechanism to perform both data conversion and data validation. To keep this functionality available, the old processing functions have been replaced with new ones performing similar roles.

  1. Input type updates
    The old processing functions required a pre-processing step that had already parsed all array fields into array data but had not converted the values from raw text fields. Now, the inputs should strictly contain string values, and the Lectern standard conversion functionality will be applied. This will separate the values in arrays based on a default delimiter value of ,. The ability to customize the delimiter value will be added to the Lectern schema.
  2. Processing Function name updates:
    The names of the processing functions have been updated to correspond to the now common Lectern terminology.
  • processSchemas() -> processDictionary() : This function will process an entire dictionary worth of data, taking as input all records for all schemas in the dictionary.
  • processRecords() -> processSchema(): This function will process a schema worth of data, taking all records for a single schema.
  • process -> processRecord(): This function processes a single record, taking only one object as input for a single schema.
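Internally, a processing function now chains the two concerns that used to be entangled: convert the raw string values first, then validate the typed result. The sketch below is a simplified illustration with assumed names and a toy conversion rule, not the real processRecord implementation:

```typescript
// Hypothetical sketch of a processing function: conversion, then validation.
type RawRecord = Record<string, string>;
type TypedRecord = Record<string, unknown>;

function processRecord(raw: RawRecord): { record: TypedRecord; errors: string[] } {
	const record: TypedRecord = {};
	const errors: string[] = [];
	for (const [field, value] of Object.entries(raw)) {
		// Conversion step (toy rule): numeric strings become numbers,
		// everything else stays a string. The real conversion follows the
		// field types defined in the dictionary.
		const asNumber = Number(value);
		record[field] = value !== '' && !Number.isNaN(asNumber) ? asNumber : value;
	}
	// Validation step would run restriction tests on `record` here and
	// push any failures into `errors` (omitted for brevity).
	return { record, errors };
}

console.log(processRecord({ age: '42', name: 'abc' }));
// { record: { age: 42, name: 'abc' }, errors: [] }
```

Because the two steps are now distinct modules, callers that already hold typed data (such as a web form) can skip conversion and call the validation functions directly.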

joneubank added 30 commits June 22, 2024 01:04
It should be mentioned here that the previously committed version of this had a mix-up with the exclusive conditions, where the edge cases (value equal to the edge of the range) were incorrectly validated in the version imported from js-lectern-client. This is fixed now, with tests to check these cases.
The original implementation had record processing logic built into the restriction test logic. This refactor separates restriction tests to apply to a specific value only, based on the defined restriction rule. This will enable future development to resolve the restrictions on each field based on conditional restrictions and then test based on the resolved conditions.

The next step is to add validation testing for full data records, full entity data sets, and then full dictionary data sets. These each loop over the previous element but also introduce new tests. For example, entity data sets enforce unique field and uniqueKey restrictions. Dictionary datasets enforce foreignKey restrictions.

Unit tests have been added for the fieldValidation function and for the individual restriction tests. The restriction tests check the detailed sets of values against restriction rules, while the fieldValidation testing checks that these restrictions are enforced based on SchemaField definitions.

Note that script validations are not being run in this code, and that the unique constraint cannot be applied at the field level. Additional changes are required to move the unique constraint to a field level property (like isArray) so that it can't be affected by conditional restrictions. It will be validated in the schema level validation (entity dataset).
…ingle schema

Applies the `validateRecord` validation to every record, plus runs validations that require knowledge of the entire set of records for this schema:
- unique
- uniqueKey
joneubank added 22 commits July 6, 2024 18:39
- applies foreign key restrictions
- detects unrecognized schemas
- collects foreign key errors with other record errors
- functions for converting values for field, record, schema, and dictionary
- update to validation error type names to be shared with convert value types
- includes test specs for each convert function exported
@joneubank joneubank marked this pull request as ready for review July 22, 2024 04:32
@joneubank
Contributor Author

I still owe some doc files that describe the validation and data conversion functionality before this is ready to be merged. At this point, the code is ready for review.

* Change `FieldDetails` property `value` to `fieldValue`

* Change TestResult invalid property to be `details`
