Added schema validation to config files #1186

Merged: 15 commits, May 28, 2021

tgaddair
Collaborator

The purpose of this PR is to catch typos or other misconfigurations in the Ludwig config.yaml before training.

A common failure mode is that a user sets a parameter in the config expecting it to take effect, but because Ludwig does not validate the schema and falls back to reasonable defaults, a misspelled or misplaced parameter is silently ignored and the error goes undetected.

This PR introduces schema validation based on jsonschema that is called at LudwigModel creation time. It enforces that certain fields are set, and that certain fields (like preprocessing) do not have any unrecognized subfields.
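As a rough illustration of the idea (not the schema this PR actually adds), the sketch below uses `jsonschema` to require two top-level fields and to reject unknown keys under `preprocessing`. The schema contents, the `config_errors` helper, and the sample configs are assumptions made up for this example.

```python
from jsonschema import Draft7Validator

# Hypothetical, heavily simplified schema; Ludwig's real schema is larger.
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "input_features": {"type": "array", "minItems": 1},
        "output_features": {"type": "array", "minItems": 1},
        "preprocessing": {
            "type": "object",
            # Illustrative subset of recognized preprocessing keys.
            "properties": {
                "force_split": {"type": "boolean"},
                "split_probabilities": {"type": "array"},
            },
            # Any key not listed above is reported as an error.
            "additionalProperties": False,
        },
    },
    "required": ["input_features", "output_features"],
}


def config_errors(config):
    """Return validation messages for config (empty list if it is valid)."""
    validator = Draft7Validator(CONFIG_SCHEMA)
    return sorted(error.message for error in validator.iter_errors(config))


good = {
    "input_features": [{"name": "text", "type": "text"}],
    "output_features": [{"name": "label", "type": "category"}],
    "preprocessing": {"force_split": True},
}
# Same config, but with a misspelled preprocessing key.
typo = dict(good, preprocessing={"force_splt": True})

print(config_errors(good))
print(config_errors(typo))
```

Collecting every message with `iter_errors` instead of stopping at the first failure lets a user fix all typos in one pass; whether the PR reports errors this way is not stated here.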

Note that we do not apply strict subfield checking to every field in the config; this makes it easier for users to switch between encoders that take different parameters, for example. We do, however, check that feature types, encoders, tokenizers, and other common parameters all have valid values, and that preprocessing in particular does not have unrecognized fields.
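The trade-off described above can be sketched as a feature schema that restricts `type` to a known set of values while leaving other keys open. This is a minimal sketch under assumptions: the list of feature types, the `FEATURE_SCHEMA` name, and the `is_valid_feature` helper are hypothetical and not taken from the PR.

```python
from jsonschema import ValidationError, validate

# Hypothetical feature schema: "type" must be one of a fixed set of values.
FEATURE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "type": {"enum": ["binary", "category", "numerical", "text", "vector"]},
    },
    "required": ["name", "type"],
    # "additionalProperties" is deliberately left unset, so encoder-specific
    # parameters (which differ between encoders) are accepted as-is.
}


def is_valid_feature(feature):
    """Return True if feature satisfies FEATURE_SCHEMA, False otherwise."""
    try:
        validate(instance=feature, schema=FEATURE_SCHEMA)
        return True
    except ValidationError:
        return False


# Extra encoder parameters pass; an unknown feature type does not.
print(is_valid_feature({"name": "x", "type": "text", "encoder": "rnn", "cell_type": "lstm"}))
print(is_valid_feature({"name": "x", "type": "txt"}))
```

Leaving the feature object open means a typo in an encoder parameter would still slip through, which is the cost of letting users swap encoders without editing the rest of the config.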

tgaddair marked this pull request as ready for review May 27, 2021 16:28
tgaddair requested a review from w4nderlust May 27, 2021 16:28
ludwig/features/binary_feature.py: review comment (outdated, resolved)
ludwig/features/vector_feature.py: review comment (outdated, resolved)
w4nderlust merged commit 9f03752 into master May 28, 2021
w4nderlust deleted the schema branch May 28, 2021 04:55