Problem Description
As a user, I would like a metric that checks the integrity of my inter-table relationships.
Expected behavior
Add a new column_pairs metric that calculates the proportion of foreign key values that reference an existing parent (primary key) value.
This metric takes a primary key column and a foreign key column as a pair.
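One way to write the intended score, assuming $P$ is the set of values in the primary key column and $f_1, \dots, f_n$ are the $n$ values in the foreign key column (this notation is introduced here for illustration):

```latex
\text{score} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[ f_i \in P \right]
```

Because each indicator term is 0 or 1, the score always lies between 0.0 and 1.0.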
Attributes
The metric should have the following attributes:
name: 'ReferentialIntegrity'
goal: Goal.MAXIMIZE
min_value: 0.0
max_value: 1.0
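A minimal sketch of how these attributes might be declared as class-level attributes. The `Goal` enum below is a hypothetical stand-in so the snippet runs on its own; a real implementation would use the library's existing `Goal` type instead.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical stand-in for the library's Goal enum."""

    IGNORE = 'ignore'
    MAXIMIZE = 'maximize'
    MINIMIZE = 'minimize'


class ReferentialIntegrity:
    """Skeleton of the proposed metric: attributes only, no methods yet."""

    name = 'ReferentialIntegrity'
    goal = Goal.MAXIMIZE  # a higher score means better referential integrity
    min_value = 0.0       # no foreign key value references a parent
    max_value = 1.0       # every foreign key value references a parent
```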
Methods
The metric should also define the following method:
compute(real_data, synthetic_data): Compute the score for the metric. The returned score should be the fraction (between 0.0 and 1.0) of foreign key values that reference a value in the primary key column.
Parameters:
(required) real_data: a tuple of two pandas.Series objects. The first is the primary key column and the second is the foreign key column from the real data. (Note that this differs from other column_pairs metrics.)
(required) synthetic_data: a tuple of two pandas.Series objects. The first is the primary key column and the second is the foreign key column from the synthetic data. (Note that this differs from other column_pairs metrics.)
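A minimal sketch of `compute`, written as a standalone function for brevity (in the metric it would be a method of `ReferentialIntegrity`). It rests on assumptions the issue does not specify: the score is measured on the synthetic pair only, an empty foreign key column scores 1.0, and missing values are not special-cased.

```python
import pandas as pd


def compute(real_data, synthetic_data):
    """Return the fraction of synthetic foreign key values found in the synthetic primary key.

    real_data and synthetic_data are each a (primary_key, foreign_key) tuple of pandas.Series.
    Assumption: only the synthetic pair contributes to the score; the real pair is
    unpacked just to check that it has the expected two-element shape.
    """
    _real_pk, _real_fk = real_data
    synthetic_pk, synthetic_fk = synthetic_data

    # Assumption: with no foreign key values there is nothing to violate, so score 1.0.
    if len(synthetic_fk) == 0:
        return 1.0

    # Boolean Series: True where the foreign key value appears in the primary key column.
    matches = synthetic_fk.isin(synthetic_pk)

    # The mean of a boolean Series is the fraction of True values.
    return float(matches.mean())


# Usage: two of the four synthetic foreign keys (1 and 2) exist in the primary key.
real = (pd.Series([1, 2, 3]), pd.Series([1, 1, 2, 3]))
synthetic = (pd.Series([1, 2, 3]), pd.Series([1, 2, 4, 5]))
score = compute(real, synthetic)  # 0.5
```

Using `Series.isin` keeps the check vectorized, so the cost is roughly linear in the column sizes rather than comparing every foreign key against every primary key.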