The DCRBaselineProtection metric is not creating the correct size of random data #743

@npatki

Description

Environment Details

  • SDMetrics version: 0.19.1 (DCR Branch)
  • Python version: Python 3.11
  • Operating System: Linux Colab

Error Description

The new DCRBaselineProtection metric measures the privacy of the synthetic data. It asks the question: if I were to use random data instead of synthetic data, how much more private would it be?

For an accurate comparison point, the size of the random data should be the same as the size of the synthetic data. Otherwise, the distance to closest record calculation may give an unfair advantage to either the synthetic or random dataset.
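To make the comparison concrete, here is a minimal sketch of the distance-to-closest-record (DCR) idea with equally sized candidate sets. All names and the exact distance formula here are illustrative assumptions, not SDMetrics' implementation:

```python
import numpy as np

def mean_dcr(candidate: np.ndarray, real: np.ndarray) -> float:
    # Illustrative DCR: for each candidate row, find the Euclidean
    # distance to its closest real record, then average over the
    # candidate set (hypothetical helper, not SDMetrics' code).
    dists = np.linalg.norm(candidate[:, None, :] - real[None, :, :], axis=2)
    return dists.min(axis=1).mean()

rng = np.random.default_rng(42)
real = rng.uniform(0, 1, size=(100, 2))
synthetic_subsample = rng.uniform(0, 1, size=(1_000, 2))
random_baseline = rng.uniform(0, 1, size=(1_000, 2))  # same size as the subsample

score_synth = mean_dcr(synthetic_subsample, real)
score_random = mean_dcr(random_baseline, real)
```

Because both candidate sets have the same number of rows, the two mean-DCR values are directly comparable.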

By default, the size of the random dataset is correct. However, if I use the num_rows_subsample option, it is not correct.

For example, say my synthetic data has 50K rows but, when calling the metric, I ask for a subsample of only 1000 rows. In this case, the metric should create only 1000 random data rows (to match the synthetic data subsample). Instead, it currently creates the full 50K rows.

>>> DCRBaselineProtection.compute(
...     real_data=real_df,
...     synthetic_data=synthetic_df,
...     metadata=my_metadata,
...     num_rows_subsample=1000,
...     num_iterations=3)
0.58808
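The expected sizing behavior can be sketched as follows. The `make_random_baseline` helper is hypothetical (the real metric derives random data from the real data's column ranges); the point is only that the random baseline should have `num_rows_subsample` rows, not the full synthetic row count:

```python
import numpy as np
import pandas as pd

def make_random_baseline(real_data: pd.DataFrame, num_rows: int, seed: int = 0) -> pd.DataFrame:
    # Hypothetical helper: draw each column uniformly between the real
    # column's min and max, producing exactly num_rows rows.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        col: rng.uniform(real_data[col].min(), real_data[col].max(), size=num_rows)
        for col in real_data.columns
    })

real_df = pd.DataFrame({'a': [0.0, 1.0, 2.0], 'b': [10.0, 20.0, 30.0]})

synthetic_rows = 50_000
num_rows_subsample = 1_000

# Expected: the random baseline matches the subsample size (1000 rows),
# not the full synthetic size (50,000 rows).
n_random = min(num_rows_subsample, synthetic_rows)
random_df = make_random_baseline(real_df, n_random)
```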

Labels

bug (Something isn't working)
