Add combiner schema validation #1347

Merged
merged 63 commits into from
Oct 25, 2021

Conversation

Collaborator

@ksbrar ksbrar commented Oct 4, 2021

The general approach here is as follows:

  1. Define a dataclass containing the config data for the combiner (e.g. ConcatCombinerConfig). The reason for making this a separate class instead of using the existing combiner (a tf.keras.Model or torch.nn.Module) is to separate out the params passed in internally (like input_features) from the pure user config values.
  2. Define default values, valid parameter ranges, data types, and other validation logic using Marshmallow, specifically the marshmallow-dataclass package, which lets us combine the validation schema with the data.
  3. Serialize this schema into a JSON Schema using marshmallow-jsonschema, and validate all user configs against this schema. This is useful for allowing the schema to be consumed by other non-Python systems, like GUIs or code editors.
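The three steps above can be sketched with the standard library alone. This is an illustration of the pattern only, not the code in this PR: it stands in for marshmallow-dataclass and marshmallow-jsonschema with plain dataclasses, and the class name `ConcatCombinerConfigSketch`, the field names, ranges, defaults, and both helper functions are assumptions made for the example.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Dict, List

# JSON Schema type names for the Python types used in this sketch.
_JSON_TYPES = {int: "integer", float: "number", str: "string"}
_PY_TYPES = {"integer": int, "number": (int, float), "string": str}


@dataclass
class ConcatCombinerConfigSketch:
    # Step 1: user-facing values only. Internal arguments such as
    # input_features are deliberately not part of this class.
    # Step 2: defaults and constraints live next to the data (hypothetical values).
    num_fc_layers: int = field(default=0, metadata={"minimum": 0})
    dropout: float = field(default=0.0, metadata={"minimum": 0.0, "maximum": 1.0})
    activation: str = field(default="relu", metadata={"enum": ["relu", "tanh"]})


def to_json_schema(config_cls: type) -> Dict[str, Any]:
    """Step 3: turn the dataclass into a JSON Schema style dict."""
    props: Dict[str, Any] = {}
    for f in fields(config_cls):
        prop = {"type": _JSON_TYPES[f.type], **f.metadata}
        if f.default is not MISSING:
            prop["default"] = f.default
        props[f.name] = prop
    return {"type": "object", "properties": props, "additionalProperties": False}


def validate_config(user_config: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Check a user config against the schema and return error messages."""
    errors: List[str] = []
    for key, value in user_config.items():
        prop = schema["properties"].get(key)
        if prop is None:
            errors.append(f"{key}: unknown parameter")
            continue
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, _PY_TYPES[prop["type"]]):
            errors.append(f"{key}: expected {prop['type']}")
            continue
        if "enum" in prop and value not in prop["enum"]:
            errors.append(f"{key}: must be one of {prop['enum']}")
        if "minimum" in prop and value < prop["minimum"]:
            errors.append(f"{key}: must be >= {prop['minimum']}")
        if "maximum" in prop and value > prop["maximum"]:
            errors.append(f"{key}: must be <= {prop['maximum']}")
    return errors
```

Because the result of `to_json_schema` is a plain dict, it can be written out with `json.dumps` and handed to non-Python consumers, which is the motivation given in step 3.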

For this PR, we're only focusing on combiners, but the idea is to expand this to other parts of the config over time.

@ksbrar ksbrar self-assigned this Oct 4, 2021
@ksbrar ksbrar marked this pull request as ready for review October 4, 2021 12:03
@ksbrar ksbrar requested a review from tgaddair October 4, 2021 12:06
@ksbrar ksbrar requested a review from tgaddair October 7, 2021 21:39
Collaborator

@tgaddair tgaddair left a comment

Really nice work on this! I especially like all the tests.

Looks like a few tests are still failing in CI. Can you take a look?

ludwig/combiners/combiners.py (outdated, resolved)
ludwig/combiners/combiners.py (outdated, resolved)
ludwig/combiners/combiners.py (outdated, resolved)
return SequenceCombinerParams


class TabNetCombinerParams(BaseModel):
Collaborator

I would probably err on the side of being more permissive here, since I don't know offhand what the valid ranges are for all these.

I would say we should make the following changes:

  • relaxation_factor: NonNegativeFloat
  • bn_epsilon: float
  • bn_momentum: float
  • sparsity: float
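A rough standard-library sketch of what these relaxed constraints amount to is below. It is not the BaseModel class under review: the class name `TabNetParamsSketch`, the `_require_float` helper, and all default values are placeholders invented for the example. Only the four field names and their suggested types come from the comment above.

```python
from dataclasses import dataclass
from numbers import Real


def _require_float(name: str, value: object) -> float:
    # Accept any real number; bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(f"{name} must be a number, got {type(value).__name__}")
    return float(value)


@dataclass
class TabNetParamsSketch:
    # Defaults are placeholders, not the library's actual defaults.
    relaxation_factor: float = 1.0  # NonNegativeFloat: the only range check kept
    bn_epsilon: float = 0.001       # plain float, no range enforced
    bn_momentum: float = 0.5        # plain float, no range enforced
    sparsity: float = 0.0           # plain float, no range enforced

    def __post_init__(self) -> None:
        for name in ("relaxation_factor", "bn_epsilon", "bn_momentum", "sparsity"):
            setattr(self, name, _require_float(name, getattr(self, name)))
        if self.relaxation_factor < 0:
            raise ValueError("relaxation_factor must be non-negative")
```

The point of the permissive version is that a value outside an assumed range is still accepted unless the constraint is known to be correct, so only the non-negativity of `relaxation_factor` is enforced here.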

Collaborator Author

changed.

ludwig/combiners/combiners.py (outdated, resolved)
ludwig/combiners/combiners.py (outdated, resolved)
conds.append(combiner_cond)
return conds

def get_custom_definitions():
Collaborator

This is interesting. What does this do?

Collaborator Author

@ksbrar ksbrar Oct 8, 2021

This pulls any $refs (e.g. for the enums) out of each combiner's schema and moves them to a root-level 'definitions' sub-schema. That way there is less nesting and the logic and schemas are cleaner overall. But we can move it to a different part of the full schema if need be.
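As a rough illustration of that hoisting step (not the actual get_custom_definitions implementation), the sketch below assumes each combiner schema carries its own 'definitions' block and that the combined schema keeps the per-combiner schemas under a hypothetical 'combiners' key. The function name `hoist_definitions` is likewise made up for the example.

```python
import copy
from typing import Any, Dict


def hoist_definitions(combiner_schemas: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Move each combiner's 'definitions' block to a single root-level one."""
    root_definitions: Dict[str, Any] = {}
    cleaned: Dict[str, Any] = {}
    for name, schema in combiner_schemas.items():
        schema = copy.deepcopy(schema)  # leave the caller's schemas untouched
        for def_name, definition in schema.pop("definitions", {}).items():
            existing = root_definitions.setdefault(def_name, definition)
            if existing != definition:
                raise ValueError(f"conflicting definition for {def_name!r}")
        cleaned[name] = schema
    # "#/definitions/<name>" pointers resolve against the document root,
    # so the $ref values inside each combiner schema need no rewriting.
    return {"definitions": root_definitions, "combiners": cleaned}


activation_enum = {"type": "string", "enum": ["relu", "tanh"]}
example_schemas = {
    "concat": {
        "type": "object",
        "properties": {"activation": {"$ref": "#/definitions/Activation"}},
        "definitions": {"Activation": activation_enum},
    },
    "tabnet": {
        "type": "object",
        "properties": {"activation": {"$ref": "#/definitions/Activation"}},
        "definitions": {"Activation": activation_enum},
    },
}
full_schema = hoist_definitions(example_schemas)
```

Shared definitions such as the enum end up stored once at the root, and two combiners that disagree on the meaning of the same definition name are reported instead of being silently overwritten.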

tests/ludwig/utils/test_schema.py (outdated, resolved)
@ksbrar ksbrar requested a review from tgaddair October 20, 2021 18:42
@tgaddair
Collaborator

@ksbrar, wish you had mentioned this concern earlier, I could have saved you some time ;)

https://pypi.org/project/marshmallow-dataclass/

Basically, there's a package we can use to combine the schema and the data into a single object.

@tgaddair tgaddair merged commit 2eaffcd into master Oct 25, 2021
@tgaddair tgaddair deleted the add_combiner_schema_validation branch October 25, 2021 19:03

2 participants