Add ability to store and load constraints from file #2891

@frances-h

Problem Description

With the auto-detection of constraints, we can now easily discover and understand which constraints to apply to complex, multi-table datasets. However, there is no good way to list or store the constraints that should be used for a given dataset. This makes it difficult to run demos or benchmarks on datasets with constraints.

We should add a way to store the constraints that should be applied to a dataset, and load them to apply them to a synthesizer.

Expected behavior

Add the set_constraints method to all synthesizers. Given a JSON file of constraints, the method should instantiate and apply all the constraints to the synthesizer.

The get_constraints method should also gain a new parameter, output_filepath, which defaults to None. If provided, the method should write the constraints currently applied to the synthesizer to the given JSON file.

Constraints JSON File Spec

The constraints for a dataset can be specified as a list of dictionaries, where each dictionary defines a particular constraint. Each dictionary has the following keys:

  • class_name: The name of the constraint class. This can be a class in either the sdv.cag module or the sdv.cag.sandbox module. Ad-hoc programmable constraints are not supported here! They must be added to the sandbox module to be used.
  • parameters: A dictionary that contains all the input parameters for that particular constraint class.

The constraints should be listed in the order in which they need to be applied.
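
As a rough illustration of the spec above, the sketch below builds a constraints list in Python and serializes it to JSON. The table name, column names, and parameter names are hypothetical, and the two class names are assumed to be available in sdv.cag; none of them are confirmed by this issue.

```python
import json

# Hypothetical constraints for an illustrative "guests" table. The class
# names and parameter names are assumptions, not confirmed by this issue.
constraints = [
    {
        "class_name": "FixedCombinations",
        "parameters": {
            "table_name": "guests",
            "column_names": ["room_type", "has_rewards"],
        },
    },
    {
        "class_name": "Inequality",
        "parameters": {
            "table_name": "guests",
            "low_column_name": "checkin_date",
            "high_column_name": "checkout_date",
        },
    },
]

# List order is the order in which the constraints should be applied.
constraints_json = json.dumps(constraints, indent=4)
print(constraints_json)
```

The printed text is what the constraints JSON file would contain under these assumptions.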

<synthesizer>.set_constraints

Given the file of constraints, this method should create the constraint objects and add them to the synthesizer.

  • If any constraints have already been added to the synthesizer, then this method should delete them and warn the user that the existing constraints are being deleted.
  • This method should add constraints from the file one at a time. The constraint classes should be looked up in the main sdv.cag module or in sdv.cag.sandbox (we should check both places, preferring the sdv.cag module).
  • If a constraint cannot be added for some reason (e.g. the class is not found, or a table/column it references cannot be found), then the method should warn the user that the constraint cannot be added and skip to the next constraint.
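
The lookup-and-skip behavior described in the bullets above could be sketched roughly as follows. This is not the SDV implementation: build_constraints is a hypothetical helper, the primary and sandbox arguments stand in for the sdv.cag and sdv.cag.sandbox modules, and the stand-in Inequality class exists only to make the example runnable. Deleting existing constraints and validating tables/columns against the metadata are left out.

```python
import warnings
from types import SimpleNamespace


def build_constraints(specs, primary, sandbox):
    """Instantiate constraints from spec dicts, in order (hypothetical helper).

    `primary` and `sandbox` stand in for the sdv.cag and sdv.cag.sandbox
    modules; any object that exposes classes as attributes works.
    """
    built = []
    for spec in specs:
        name = spec.get("class_name")
        cls = None
        if isinstance(name, str):
            # Check the primary namespace first, then fall back to the sandbox.
            cls = getattr(primary, name, None) or getattr(sandbox, name, None)
        if cls is None:
            warnings.warn(f"Constraint {name!r} cannot be added: class not found.")
            continue
        try:
            built.append(cls(**spec.get("parameters", {})))
        except Exception as error:
            # Warn and move on to the next constraint instead of failing.
            warnings.warn(f"Constraint {name!r} cannot be added: {error}")
    return built


class Inequality:
    """Stand-in class for the demo; not the real SDV constraint."""

    def __init__(self, low_column_name, high_column_name):
        self.low_column_name = low_column_name
        self.high_column_name = high_column_name


primary = SimpleNamespace(Inequality=Inequality)
sandbox = SimpleNamespace()

specs = [
    {"class_name": "DoesNotExist", "parameters": {}},
    {
        "class_name": "Inequality",
        "parameters": {"low_column_name": "start", "high_column_name": "end"},
    },
]
constraints = build_constraints(specs, primary, sandbox)
print(len(constraints))
```

In this sketch the unknown class produces a warning and is skipped, while the valid entry is instantiated.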

Parameters:

  • filepath (str, required): A filepath of the constraints JSON file, which should be in the format specified in the previous section.

Output: None

NOTE: After setting constraints from a file, a user should still be able to add additional constraints through the add_constraints method.

<synthesizer>.get_constraints

This function already exists. Currently it returns a list of all the constraint objects that the synthesizer contains.

We should modify this function to include a parameter called output_filepath. If provided, the function should write all the constraint information to a JSON file at that path.

Parameters:

  • output_filepath (str, optional): The path of the JSON file to write the constraints to. The file should not already exist in the filesystem. Defaults to None.

Output: The function should always return a list of constraints. If the output filepath is provided, then it should additionally write a file with the constraints JSON.
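
A minimal sketch of the "always return, optionally write" behavior is shown below. The helper name export_constraints is hypothetical, and plain dictionaries stand in for constraint objects, since the issue does not specify how constraint objects are converted to dictionaries. Raising an error when the file already exists is also an assumption; the issue only says the file should not already exist.

```python
import json


def export_constraints(constraint_specs, output_filepath=None):
    """Return the constraint specs, optionally writing them to a new JSON file.

    Hypothetical helper: dictionaries stand in for constraint objects.
    """
    if output_filepath is not None:
        # Mode "x" creates the file and raises FileExistsError if it exists.
        with open(output_filepath, "x", encoding="utf-8") as file:
            json.dump(constraint_specs, file, indent=4)
    return constraint_specs
```

Opening the file in exclusive-creation mode keeps an existing constraints file from being silently overwritten.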

Additional context

Moved to Community from datacebo/sdv-enterprise#2060
