DCROverfitting and DCRBaseline metrics produce too many warnings about missing columns.

### Environment Details

Please indicate the following details about the environment in which you found the bug:

* SDMetrics version: 0.19.1 (DCR Branch)
* Python version: Python 3.11
* Operating System: Jupyter, MacOS

### Error Description

The following `UserWarning` appears multiple times:
```
'The columns ('billing_address', 'credit_card_number', 'guest_email') are in the metadata but they are not present in the data.' 
```
This happens because the call chain `_process_data_with_metadata` -> `_remove_missing_columns_metadata` runs multiple times while the input is sanitized.

_[Edit from Neha] In our meeting on Mar 13, 2025, we discussed the following root cause: internally, the metrics drop any columns that are not used for the computation (i.e. ID and PII columns). However, these columns are not removed from the metadata, which causes this inconsistency. The warnings do not have any impact on the final score._
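
To illustrate the root cause described above, here is a minimal standard-library sketch. It is not SDMetrics code: `warn_if_metadata_has_missing_columns` and `drop_unused_columns` are hypothetical helpers, the rule "unused means `sdtype == 'id'` or `pii` is true" is an assumption, and the metadata dict is a hand-written stand-in loosely modeled on the demo dataset. It shows that the warning fires when columns are dropped from the data only, and stays silent when the metadata is trimmed in the same step.

```python
import copy
import warnings


def warn_if_metadata_has_missing_columns(data_columns, metadata):
    """Hypothetical stand-in for the library's data/metadata consistency check."""
    missing = sorted(set(metadata['columns']) - set(data_columns))
    if missing:
        warnings.warn(
            f'The columns {tuple(missing)} are in the metadata but they are '
            'not present in the data.',
            UserWarning,
        )


def drop_unused_columns(data_columns, metadata):
    """Drop id/PII columns from the column list AND from a copy of the metadata.

    Assumption: a column is unused when its sdtype is 'id' or its 'pii' flag is true.
    """
    trimmed = copy.deepcopy(metadata)
    unused = {
        name
        for name, info in trimmed['columns'].items()
        if info.get('sdtype') == 'id' or info.get('pii', False)
    }
    kept_columns = [name for name in data_columns if name not in unused]
    for name in unused:
        del trimmed['columns'][name]
    return kept_columns, trimmed


# Hand-written example metadata; the real demo metadata has more columns.
metadata = {
    'columns': {
        'guest_email': {'sdtype': 'email', 'pii': True},
        'billing_address': {'sdtype': 'address', 'pii': True},
        'credit_card_number': {'sdtype': 'credit_card_number', 'pii': True},
        'room_rate': {'sdtype': 'numerical'},
        'has_rewards': {'sdtype': 'boolean'},
    }
}
data_columns = list(metadata['columns'])

kept_columns, trimmed_metadata = drop_unused_columns(data_columns, metadata)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # Stale metadata still lists the dropped columns, so this call warns.
    warn_if_metadata_has_missing_columns(kept_columns, metadata)
    # Metadata trimmed together with the data produces no warning.
    warn_if_metadata_has_missing_columns(kept_columns, trimmed_metadata)

print(f'warnings raised: {len(caught)}')
```

Trimming a copy of the metadata rather than mutating it in place keeps the caller's metadata untouched, which matters if the same dict is reused across metrics.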


### Steps to reproduce
From BugHunt [link](https://colab.research.google.com/drive/1vU_CNVm84vEFGrFy832LCHlmTSf5d0Ei#scrollTo=hQhSNDrAVafa&line=12&uniqifier=1):
```
# Import path assumed; it may differ on the DCR branch
from sdmetrics.single_table import DCROverfittingProtection
from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer
from sklearn.model_selection import train_test_split

real_data, metadata = download_demo('single_table', 'fake_hotel_guests')

# Use train_test_split to get a training set and a holdout set
train_df, holdout_df = train_test_split(real_data, test_size=0.2)
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(train_df)
synthetic_data = synthesizer.sample(1000)

num_rows_subsample = 100
num_iterations = 1
# SDMetrics uses a dictionary of single-table metadata, not the Metadata class
metadata_dict = metadata._convert_to_single_table().to_dict()
compute_breakdown_result = DCROverfittingProtection.compute_breakdown(
    train_df, synthetic_data, holdout_df, metadata_dict, num_rows_subsample, num_iterations
)
```
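
Since the warnings reportedly do not affect the final score, one possible stopgap until the metadata handling is fixed is to filter this specific message with Python's standard `warnings` module. The sketch below is only an illustration: `noisy_metric_call` is a hypothetical stand-in for the `compute_breakdown` call above, and the message pattern is copied from the warning text in this report.

```python
import warnings

# Regex matched against the start of the warning message.
MESSAGE_PATTERN = r'The columns .* are in the metadata but they are not present in the data'


def noisy_metric_call():
    """Hypothetical stand-in for DCROverfittingProtection.compute_breakdown(...)."""
    for _ in range(3):
        warnings.warn(
            "The columns ('billing_address', 'credit_card_number', 'guest_email') "
            'are in the metadata but they are not present in the data.',
            UserWarning,
        )
    return 'placeholder result'


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # The most recently added filter takes precedence, so only this message is ignored.
    warnings.filterwarnings('ignore', message=MESSAGE_PATTERN, category=UserWarning)
    result = noisy_metric_call()

print(f'warnings that got through: {len(caught)}')
```

In a notebook, the same `filterwarnings` call inside a plain `with warnings.catch_warnings():` block around the metric call would limit the suppression to that call and leave other warnings visible.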


DCROverfitting and DCRBaseline metrics produce too many warnings about missing columns. #737
