Reports should not crash if there are no relationships #481

Closed
npatki opened this issue Oct 24, 2023 · 0 comments · Fixed by #489
Labels
bug Something isn't working

npatki commented Oct 24, 2023

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDMetrics version: 0.11.1 (latest)
  • Python version: 3.10
  • Operating System: Linux (Colab Notebook)

Error Description

When a metric or property in the report cannot be computed, I expect the report to be fault tolerant: it should skip that metric or property and record NaN as its score. That way I can still see the other metrics and properties that can be computed. Overall, the report should not crash.
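The behavior described above can be sketched as a small wrapper that records NaN whenever a property's computation raises. This is only an illustration of the idea, not SDMetrics code: `safe_score` and `run_properties` are hypothetical names, and the properties are assumed to be plain callables that return a float.

```python
import math


def safe_score(compute, *args, **kwargs):
    """Return compute(*args, **kwargs), or NaN if the call raises."""
    try:
        return compute(*args, **kwargs)
    except Exception:
        # Hypothetical policy: any failure becomes a NaN score
        # instead of aborting the whole report.
        return math.nan


def run_properties(properties, *args):
    """Score every property, recording NaN for any that fails."""
    return {name: safe_score(fn, *args) for name, fn in properties.items()}
```

With this shape, one failing property leaves the scores of the others intact, which is the fault tolerance the report is expected to have.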

In practice, the report crashes in one specific case: when the metadata has no 'relationships' section.

Note that multi-table metadata without any 'relationships' is invalid from the SDV perspective. However, SDMetrics should still be able to handle it in a fault-tolerant way.
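One way to tolerate the missing key is to read it with a default instead of indexing the dictionary directly. The sketch below assumes the metadata is a plain `dict`, as in the reproduction steps; `get_relationships`, `relationship_property_score`, and the `score_relationship` callback are hypothetical names used for illustration, not part of the SDMetrics API.

```python
import math


def get_relationships(metadata):
    """Return the relationships list, or an empty list if the key is absent."""
    return metadata.get('relationships', [])


def relationship_property_score(metadata, score_relationship):
    """Average a per-relationship score; NaN when there is nothing to score."""
    relationships = get_relationships(metadata)
    if not relationships:
        # No relationships to evaluate, so the property has no defined score.
        return math.nan
    scores = [score_relationship(rel) for rel in relationships]
    return sum(scores) / len(scores)
```

Using `dict.get` with a default avoids the `KeyError` raised by `metadata['relationships']` when the section is absent.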

Steps to reproduce

from sdmetrics import load_demo
from sdmetrics.reports.multi_table import QualityReport

real_data, synthetic_data, metadata = load_demo(modality='multi_table')
del metadata['relationships']
report = QualityReport()

report.generate(real_data, synthetic_data, metadata, verbose=True)

Output:

KeyError: 'relationships'

See stack trace below.
stack_trace.txt

Expected

Properties that rely on relationships (such as Cardinality or the new Intertable Trends) should be recorded as NaN instead of crashing the report. The tqdm progress bar can still progress to 100%.

Generating report ...
(1/4) Evaluating Column Shapes: : 100%|██████████| 13/13 [00:00<00:00, 846.49it/s]
(2/4) Evaluating Column Pair Trends: : 100%|██████████| 22/22 [00:00<00:00, 131.90it/s]
(3/4) Evaluating Cardinality: 100%|██████████| 22/22 [00:00<00:00, 131.90it/s]
(4/4) Evaluating Intertable Trends: 100%|██████████| 22/22 [00:00<00:00, 131.90it/s]

Overall Quality Score: 90.4%

Properties:
- Column Shapes: 75.2%
- Column Pair Trends: 90.37%
- Cardinality: NaN
- Intertable Trends: NaN