feat: Add env var `LUDWIG_SCHEMA_VALIDATION_POLICY` to change marshmallow validation strictness #3226

tgaddair · 2023-03-08T18:05:42Z

Closes #3218.

ksbrar

My only question: how extensively have you tested this locally? (Most recently I was fiddling with pytest environment variables and was about to try the Ludwig CLI.)

In particular, I want to check if there are any breakages when RAISE is set for schemas that explicitly allow additional params via JSON. That means (based on a ctrl-f):

- input features (top-level)
- output features (top-level)
- preprocessing
- augmentation

Really, just one of these cases suffices as a sanity check for the rest.

(Also, I believe we're covered but we have to be careful that any internal parameters are in fact declared on their respective schemas (e.g. proc_column for input features) - otherwise RAISE will reject them.)
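
To make the concern concrete, here is a minimal stdlib-only sketch of how an unknown-field policy behaves. It does not use Ludwig or marshmallow: `DECLARED`, `load_with_policy`, and the policy strings are hypothetical names that only mirror the exclude/raise behavior discussed here.

```python
# Hypothetical sketch: mimics an "exclude" vs "raise" unknown-field policy.
# None of these names come from Ludwig or marshmallow.

DECLARED = {"name", "type", "proc_column"}  # fields declared on the schema


def load_with_policy(params: dict, policy: str = "exclude") -> dict:
    """Return the declared params; handle undeclared ones per `policy`."""
    unknown = sorted(set(params) - DECLARED)
    if unknown and policy == "raise":
        raise ValueError(f"Unknown fields: {unknown}")
    # "exclude": silently drop anything that is not declared.
    return {k: v for k, v in params.items() if k in DECLARED}


feature = {"name": "foo", "type": "category", "proc_column": "foo_x", "typo_param": 1}

# Lenient policy: the undeclared key is dropped without complaint.
lenient = load_with_policy(feature, policy="exclude")

# Strict policy: the same input is rejected.
try:
    load_with_policy(feature, policy="raise")
    strict_error = None
except ValueError as err:
    strict_error = str(err)
```

If `proc_column` were missing from `DECLARED`, the strict call would reject it as well, which is the internal-parameter case to watch for.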

tgaddair · 2023-03-08T18:46:11Z

It's a good point @ksbrar, I haven't tested this extensively! If you have some cycles to test it out, please take a look and let me know what you find.

github-actions · 2023-03-08T20:06:06Z

Unit Test Results

6 files, 6 suites, 6h 45m 55s ⏱️
4 087 tests: 4 044 ✔️, 43 💤, 0 ❌
12 282 runs: 12 147 ✔️, 135 💤, 0 ❌

Results for commit 8530a26.

♻️ This comment has been updated with latest results.

ksbrar · 2023-03-14T04:54:34Z

I've tested out this PR. It doesn't work as-is because the deprecation warning PR I merged a little while ago is actually performing the wrong task and is too aggressive in filtering parameters.

This code (#3118) filters out every unknown parameter and logs it as deprecated, before marshmallow even has a chance to raise an error about it. That is the incorrect thing to do, as there is a clear distinction between a purely "unknown" or invalid parameter and a "deprecated" one.

I just pushed a simple fix that gets us to what we want (a user sets this environment flag to raise, and they will see marshmallow validation errors for unknown parameters), but I want to sync with @justinxzhao (or @tgaddair ) about how deprecated parameters should be marked and parsed.
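
As a rough illustration of the distinction (not the actual fix that was pushed), the stdlib-only sketch below warns about and drops only parameters on an explicit deprecated list, and passes everything else through so that a later validation step can reject genuinely unknown keys. `DEPRECATED`, `filter_deprecated`, and the parameter names are hypothetical.

```python
import warnings

# Hypothetical list of parameters that used to be valid and are now retired.
DEPRECATED = {"old_lr"}


def filter_deprecated(params: dict) -> dict:
    """Drop deprecated keys with a warning; leave unknown keys untouched."""
    kept = {}
    for key, value in params.items():
        if key in DEPRECATED:
            warnings.warn(f"'{key}' is deprecated and will be ignored", DeprecationWarning)
            continue
        # Known AND unknown keys pass through to schema validation.
        kept[key] = value
    return kept


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = filter_deprecated({"epochs": 3, "old_lr": 0.1, "unknown": "parameter"})
```

Here `unknown` survives the filter, so a strict validator still gets the chance to raise on it.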

justinxzhao · 2023-03-14T21:20:30Z

Synced offline -- upgrading the config, removing known deprecated parameters, and logging warnings for them is handled by upgrade_config_dict_to_latest_version.

We should be good to proceed with this PR (just blocked on writing a test).

Follow-up PRs to continue progress towards making STRICT the default policy for Ludwig configs:

Make STRICT the default:
- Clean up preload deprecation logging (feat: Raise deprecation warnings for unknown parameters #3118), as it's no longer needed
- Set policy to STRICT by default
- Set additionalProperties to False on all JSON schemas
- Remove the os.environ variable

Remove the explicit call to JSON validation and rely purely on marshmallow deserialization:
- Remove the call to check_schema() in model_types::base.py (assuming parity)
- Keep check_schema() around for unit tests to verify pure-JSON-validation behavior

ksbrar · 2023-03-14T23:32:21Z

@justinxzhao I spent some time on this and just couldn't get the environment variable switching to work properly for the test. If someone else wants to give it a go, here is some rough code showing a couple of different attempts. I can clean it up or clarify if need be:

from importlib import reload
from types import ModuleType


def rreload(module):
    """Recursively reload modules."""
    reload(module)
    for attribute_name in dir(module):
        attribute = getattr(module, attribute_name)
        # Recurse into any attribute that is itself a module.
        if type(attribute) is ModuleType:
            rreload(attribute)


# @mock.patch.dict(os.environ, {LUDWIG_SCHEMA_UNKNOWN_POLICY: "raise"})
def test_user_config_validation_policy():
    from importlib import reload, import_module
    import sys
    from ludwig.constants import LUDWIG_SCHEMA_VALIDATION_POLICY

    # from ludwig.schema.model_types import base

    config = {
        "input_features": [{"name": "foo", "type": "category"}],
        "output_features": [category_feature(encoder={"vocab_size": 2}, reduce_input="sum")],
        "combiner": {"type": "concat", "output_size": 14},
        "trainer": {"unknown": "parameter"},
    }

    # Test default policy (unknown parameters are ignored):
    # ModelConfig.from_dict(config)

    # reload(lsb)

    print(os.environ.get(LUDWIG_SCHEMA_VALIDATION_POLICY))
    os.environ[LUDWIG_SCHEMA_VALIDATION_POLICY] = RAISE
    print(os.environ.get(LUDWIG_SCHEMA_VALIDATION_POLICY))
    # reload(sys.modules["ludwig.schema.model_types.base"])
    # reload(sys.modules["ludwig.schema.model_types.ecd"])
    # reload(sys.modules["ludwig.schema.utils"])
    # reload(sys.modules["ludwig.schema"])

    # mod = import_module("ludwig.schema.model_types.base")
    import ludwig.schema.model_types.base as base

    rreload(base)

    # for name, module in sys.modules.items():
    #     if "ludwig.schema" in name:
    #         reload(module)

    # Test RAISE policy:
    # monkeypatch.setenv(LUDWIG_SCHEMA_UNKNOWN_POLICY, "raise")
    with pytest.raises(ConfigValidationError):
        base.ModelConfig.from_dict(config)

But changing the validation policy is easy enough to verify locally in a terminal, so I think we can proceed with just landing this PR.
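
One plausible reason the switch is hard to exercise in a test, assuming the policy is read from the environment once at import time (an assumption, not something this thread confirms), is that later changes to `os.environ` are invisible unless the module is reloaded. The stdlib-only sketch below contrasts a value captured once with a lookup done at call time. `POLICY_AT_IMPORT`, `current_policy`, and the `"exclude"` default string are hypothetical.

```python
import os
from unittest import mock

POLICY_VAR = "LUDWIG_SCHEMA_VALIDATION_POLICY"


def current_policy() -> str:
    """Look the policy up on every call, so tests can override it."""
    return os.environ.get(POLICY_VAR, "exclude").lower()


# patch.dict restores os.environ to its original contents on exit.
with mock.patch.dict(os.environ):
    os.environ.pop(POLICY_VAR, None)

    # Captured once, the way a module-level constant would be at import time.
    POLICY_AT_IMPORT = os.environ.get(POLICY_VAR, "exclude").lower()

    # A test now flips the environment variable...
    os.environ[POLICY_VAR] = "RAISE"

    frozen = POLICY_AT_IMPORT   # still the value captured earlier
    live = current_policy()     # reflects the updated environment
```

Under that assumption, a test would either need to reload the module holding the constant or have the code read the variable lazily, as `current_policy` does here.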

ksbrar

Good to go from my standpoint.

Added env var LUDWIG_SCHEMA_UNKNOWN_POLICY to handle unknown fields i…

6e37b98

…n config

tgaddair added the release-0.7 Needs cherry-pick into 0.7 release branch label Mar 8, 2023

tgaddair requested a review from ksbrar March 8, 2023 18:05

tgaddair mentioned this pull request Mar 8, 2023

feat: Allow users to change validation strictness policy #3224

Closed

ksbrar reviewed Mar 8, 2023

ksbrar added 2 commits March 14, 2023 00:53

remove unnecessary feature_type

8ab0d3a

change deprecation warning logging

cd2409d

change names

8530a26

ksbrar requested a review from justinxzhao March 14, 2023 23:32

ksbrar approved these changes Mar 14, 2023

ksbrar changed the title ~~Added env var LUDWIG_SCHEMA_UNKNOWN_POLICY to handle unknown fields in config~~ feat: Add env var LUDWIG_SCHEMA_VALIDATION_POLICY to change marshmallow validation strictness Mar 14, 2023

uppercase

7df93b5

justinxzhao approved these changes Mar 16, 2023

ksbrar merged commit d5d49b8 into master Mar 16, 2023

ksbrar deleted the unknown-env branch March 16, 2023 20:12

tgaddair added a commit that referenced this pull request Mar 16, 2023

feat: Add env var LUDWIG_SCHEMA_VALIDATION_POLICY to change marshma…

203f880

…llow validation strictness (#3226) Co-authored-by: ksbrar <kabir@brar.xyz>

tgaddair added a commit that referenced this pull request Mar 16, 2023

feat: Add env var LUDWIG_SCHEMA_VALIDATION_POLICY to change marshma…

b4000a0

…llow validation strictness (#3226) Co-authored-by: ksbrar <kabir@brar.xyz>
