When running Quality Report, ContingencySimilarity produces a RuntimeWarning (The values in the array are unorderable.) #656

@npatki

Environment Details

  • SDMetrics version: 0.16.0 (latest)

Error Description

The ContingencySimilarity metric produces a RuntimeWarning whenever a column contains NaN values and the real and synthetic data contain different combinations of values.

Fortunately, it appears that the computed score is unaffected -- i.e. if you replace all the NaN values in the real/synthetic data with a non-null value, the warning goes away and the score remains the same. So this is not really a concern for the overall quality score, but it is still annoying to see the warning printed out so many times.
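
As a rough illustration of the NaN-replacement check described above, here is a minimal sketch that uses only pandas. The `tvd_score` helper and the `'missing'` placeholder are hypothetical names, not part of SDMetrics. The helper assumes the score is one minus the total variation distance between the two frequency tables, which is how the metric is documented as far as I know and which is consistent with the 0.75 reported in the steps to reproduce.

```python
import numpy as np
import pandas as pd


def tvd_score(real, synthetic, columns):
    """Hypothetical helper: 1 minus the total variation distance
    between the normalized frequency tables of the two frames."""
    real_freq = real.value_counts(subset=columns, normalize=True)
    synth_freq = synthetic.value_counts(subset=columns, normalize=True)
    # Combinations missing on one side count as frequency 0.
    diff = real_freq.sub(synth_freq, fill_value=0).abs()
    return 1 - 0.5 * diff.sum()


real_data = pd.DataFrame({'A': ['value'] * 4, 'B': ['1', '2', '3', np.nan]})
synthetic_data = pd.DataFrame({'A': ['value'] * 3, 'B': ['1', '2', np.nan]})

# Replace NaN with a non-null placeholder so every combination is orderable.
real_filled = real_data.fillna('missing')
synthetic_filled = synthetic_data.fillna('missing')

filled_score = tvd_score(real_filled, synthetic_filled, ['A', 'B'])
print(filled_score)
```

The filled frames could equally be passed to `ContingencySimilarity.compute` in place of the originals. Per the description above, that should give the same score without the warning.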

Steps to reproduce

import pandas as pd
import numpy as np

from sdmetrics.column_pairs import ContingencySimilarity

real_data = pd.DataFrame(data={
    'A': ['value']*4,
    'B': ['1', '2', '3', np.nan]
})

synthetic_data = pd.DataFrame(data={
    'A': ['value']*3,
    'B': ['1', '2', np.nan]
})

ContingencySimilarity.compute(
    real_data=real_data[['A', 'B']],
    synthetic_data=synthetic_data[['A', 'B']]
)
/usr/local/lib/python3.10/dist-packages/sdmetrics/column_pairs/statistical/contingency_similarity.py:47: RuntimeWarning: The values in the array are unorderable. Pass `sort=False` to suppress this warning.
  combined_index = contingency_real.index.union(contingency_synthetic.index)
0.75

Additional Context

From @pvk-developer: This may be happening due to the changes we made in #625. Before, we used crosstab; now we use union.

When using union, it doesn't really seem like the result needs to be sorted, so we might be able to fix it by just turning the sorting off.
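
A small sketch of what turning the sorting off could look like, using only pandas. The two MultiIndex objects are hand-built stand-ins for the real and synthetic contingency indexes, so they may not reproduce the warning exactly as the metric does. The only library call relied on is `Index.union(other, sort=False)`.

```python
import warnings

import numpy as np
import pandas as pd

# Hand-built stand-ins for the contingency indexes (an assumption for illustration).
real_index = pd.MultiIndex.from_tuples(
    [('value', '1'), ('value', '2'), ('value', '3'), ('value', np.nan)],
    names=['A', 'B'],
)
synthetic_index = pd.MultiIndex.from_tuples(
    [('value', '1'), ('value', '2'), ('value', np.nan)],
    names=['A', 'B'],
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # sort=False skips the sort step, which is where the warning is raised.
    combined_index = real_index.union(synthetic_index, sort=False)

runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]
print(len(combined_index), len(runtime_warnings))
```

Since the combined index is only used to align the two frequency tables before taking their difference, the order of its entries should not change the resulting score.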

Labels: bug
