CategoricalCAP metric returns 0 if no overlap in known fields #692

@frances-h

Description

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDMetrics version:
  • Python version:
  • Operating System:

Error Description

The CategoricalCAP metric skips evaluation of rows whose known (key field) values do not occur in the synthetic data. However, if all rows are skipped, the metric returns a score of 0. In this case, we should return NaN as the score, since nothing was computed.

We should also check the other CAP metrics and make sure they return NaN in this case as well.
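
To make the expected behavior concrete, here is a minimal sketch of a CAP-style score that returns NaN when every row is skipped. It is not the SDMetrics implementation: `cap_score_sketch` is a hypothetical helper written only for illustration, and its scoring rule (the share of matching synthetic rows that carry the same sensitive values) is a simplifying assumption.

```python
import math

import pandas as pd


def cap_score_sketch(real, synthetic, key_fields, sensitive_fields):
    """Hypothetical, simplified CAP-style score (not the SDMetrics code)."""
    # Plain tuples make the key and sensitive values easy to compare.
    synth_keys = list(synthetic[key_fields].itertuples(index=False, name=None))
    synth_sens = list(synthetic[sensitive_fields].itertuples(index=False, name=None))
    real_keys = real[key_fields].itertuples(index=False, name=None)
    real_sens = real[sensitive_fields].itertuples(index=False, name=None)

    row_scores = []
    for key, sensitive in zip(real_keys, real_sens):
        # Sensitive values of the synthetic rows that share this key.
        matches = [s for k, s in zip(synth_keys, synth_sens) if k == key]
        if not matches:
            continue  # key never occurs in the synthetic data: skip the row
        row_scores.append(sum(s == sensitive for s in matches) / len(matches))

    if not row_scores:
        # Every row was skipped, so nothing was computed.
        return float('nan')
    return sum(row_scores) / len(row_scores)


real_data = pd.DataFrame({'col_A': ['a', 'b'], 'col_B': ['yes', 'yes']})
synthetic_data = pd.DataFrame({'col_A': ['x', 'x'], 'col_B': ['yes', 'yes']})

score = cap_score_sketch(real_data, synthetic_data, ['col_A'], ['col_B'])
print(math.isnan(score))  # True
```

Returning NaN rather than 0 keeps "nothing could be evaluated" distinguishable from a genuinely computed score of 0.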

Steps to reproduce

import pandas as pd
from sdmetrics.single_table import CategoricalCAP

real_data = pd.DataFrame(data={
    'col_A': ['a', 'b'],
    'col_B': ['yes', 'yes']
})

# None of the real col_A values ('a', 'b') appear in the synthetic data,
# so every row is skipped during evaluation.
synthetic_data = pd.DataFrame(data={
    'col_A': ['x', 'x'],
    'col_B': ['yes', 'yes']
})

# Currently returns 0; expected NaN.
CategoricalCAP.compute(real_data, synthetic_data, key_fields=['col_A'], sensitive_fields=['col_B'])

Labels: bug