The `DCRBaselineProtection` metric is not producing the expected score #742

### Environment Details
* SDMetrics version: 0.19.1 (DCR Branch)
* Python version: Python 3.11
* Operating System: Linux Colab

### Error Description
The `DCRBaselineProtection` metric is not producing the correct score. There seems to be something wrong in the computation of the median DCR between a dataset and the real data.

We discussed two potential root causes for this issue:

1. The DCR computation must be done for both the `synthetic vs. real` and the `random vs. real` dataset pairs. More specifically, we want to compare:
    - The _median_ DCR for a synthetic data row and the real dataset and
    - The _median_ DCR for a random data row and the real dataset
2. For numerical data, the distance computation should be based on the range of the _real data column_ (not the synthetic data or random data)
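
To make the expected numbers concrete, here is a minimal reference sketch of the computation described above for a single numerical column without nulls. It is not the SDMetrics implementation: `median_dcr` is a hypothetical helper, and capping distances at 1 and assuming a non-zero real range are my own assumptions.

```python
import numpy as np
import pandas as pd


def median_dcr(candidate: pd.Series, real: pd.Series) -> float:
    """Median, over candidate rows, of the distance to the closest real row.

    Hypothetical helper for a single numerical column with no nulls.
    """
    real_values = real.to_numpy(dtype=float)
    candidate_values = candidate.to_numpy(dtype=float)

    # Normalize by the range of the REAL column only (point 2 above).
    # Assumption: the real column is not constant, so the range is non-zero.
    real_range = real_values.max() - real_values.min()

    # Pairwise normalized distances, shape (n_candidate, n_real).
    distances = np.abs(candidate_values[:, None] - real_values[None, :]) / real_range

    # Assumption: values outside the real range are capped at a distance of 1.
    distances = np.minimum(distances, 1.0)

    # DCR of each candidate row is its distance to the closest real row.
    dcr_per_row = distances.min(axis=1)
    return float(np.median(dcr_per_row))


real = pd.Series([0, 10, 3, 4, 1])
synthetic = pd.Series([5])
random_data = pd.Series([7])  # stand-in for a randomly generated row

# These are the two medians that point 1 says should be compared.
synthetic_median = median_dcr(synthetic, real)
random_median = median_dcr(random_data, real)
print(synthetic_median, random_median)
```

Both calls pass the same `real` column, so the normalization never depends on the synthetic or random values.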

### Steps to reproduce

In this case, I expect the median DCR between the synthetic and real data to be 0.1:
```python
from sdmetrics.single_table.privacy import DCRBaselineProtection
import pandas as pd
import numpy as np

# The range of this column is 10
real_data = pd.DataFrame(data={
    'A': [0, 10, 3, 4, 1],
})

# The DCR between this row and the real dataset is 1/10 = 0.1
synthetic_data = pd.DataFrame(data={
    'A': [5],
})

metadata = {
    'columns': {
        'A': {'sdtype': 'numerical'},
    },
}

# I expect that the median distance between real data and synthetic is 0.1
DCRBaselineProtection.compute_breakdown(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
)
```

In this case, I expect the median DCR between the synthetic and real data to be 0, because the synthetic row is a null value and the real data also contains a null value:
```python
from sdmetrics.single_table.privacy import DCRBaselineProtection
import pandas as pd
import numpy as np

real_data = pd.DataFrame(data={
    'A': [0, 10, 3, 4, 1, np.nan],
})

# The DCR between this row and the real data is 0 (np.nan exactly matches)
synthetic_data = pd.DataFrame(data={
    'A': [np.nan],
})

metadata = {
    'columns': {
        'A': {'sdtype': 'numerical'},
    },
}

# I expect that the median distance between real data and synthetic is 0
DCRBaselineProtection.compute_breakdown(
    real_data=real_data,
    synthetic_data=synthetic_data,
    metadata=metadata,
)
```
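
For reference, the null-handling expectation in this second case can be written out by hand. This is only a sketch of the behavior I expect, not the library's code: `numerical_distance` is a hypothetical helper, and treating a null compared with a non-null value as the maximum distance of 1 is an assumption on my part (only the null-vs-null case being 0 is stated above).

```python
import numpy as np


def numerical_distance(candidate_value, real_value, real_range):
    """Hypothetical per-cell distance for a numerical column with nulls."""
    candidate_is_null = np.isnan(candidate_value)
    real_is_null = np.isnan(real_value)

    if candidate_is_null and real_is_null:
        # Two nulls count as an exact match.
        return 0.0
    if candidate_is_null or real_is_null:
        # Assumption: a null compared with a number is maximally distant.
        return 1.0
    # Normalize by the range of the real column; cap at 1 (assumption).
    return min(abs(candidate_value - real_value) / real_range, 1.0)


real_values = [0, 10, 3, 4, 1, np.nan]
# The range ignores nulls: 10 - 0 = 10.
real_range = np.nanmax(real_values) - np.nanmin(real_values)

# DCR of the single synthetic row (np.nan) is the distance to its closest real row.
dcr = min(numerical_distance(np.nan, value, real_range) for value in real_values)
print(dcr)
```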

