Problem Description
Many privacy metrics (including DCRBaselineProtection, DCROverfittingProtection, BinaryClassifierPrecisionEfficacy, etc.) allow the user to input metadata that describes each column's sdtype.
The handling of datetime columns is currently done via the internal `_convert_datetime_columns` function. We should improve this function so that datetime columns are handled more consistently.
Expected behavior
SDMetrics should not be responsible for trying to infer datetime formats from datetime strings. This means that SDMetrics should only accept the following types of inputs:
- Columns that are present as string values, with accompanying metadata that includes a `datetime_format` string
- Columns that are already converted to `pd.datetime` (in this case, no `datetime_format` specification is needed)
If the datetime values are present as strings but there is no `datetime_format` in the metadata, we should throw an error:
Error: Datetime column 'start_date' does not have a specified 'datetime_format'. Please add the required datetime_format to the metadata or convert this column to 'pd.datetime' to bypass this requirement.
SDMetrics should expect metadata in the SDV 1.0 format. This means that the datetime format should be specified via the `datetime_format` property, not `format`.
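The expected behavior described above could be sketched roughly as follows. This is only an illustration, not the actual SDMetrics implementation: the function name `convert_datetime_columns`, the use of `ValueError`, and the assumption that the metadata is a plain dictionary with a top-level `columns` key are all hypothetical choices made for the example.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def convert_datetime_columns(data, metadata):
    """Hypothetical sketch of the proposed datetime handling.

    Assumes `metadata` is a dict shaped like
    {'columns': {name: {'sdtype': ..., 'datetime_format': ...}}}.
    """
    converted = data.copy()
    for column, column_meta in metadata.get('columns', {}).items():
        if column_meta.get('sdtype') != 'datetime' or column not in converted:
            continue

        # Columns that are already datetimes need no format and are left alone.
        if is_datetime64_any_dtype(converted[column]):
            continue

        # Only 'datetime_format' is read; a legacy 'format' key is ignored.
        datetime_format = column_meta.get('datetime_format')
        if datetime_format is None:
            # No format inference: a missing format is an error (ValueError is assumed).
            raise ValueError(
                f"Datetime column '{column}' does not have a specified "
                "'datetime_format'. Please add the required datetime_format to the "
                "metadata or convert this column to 'pd.datetime' to bypass this "
                'requirement.'
            )

        converted[column] = pd.to_datetime(converted[column], format=datetime_format)

    return converted


# Example usage with a string column and an explicit format.
data = pd.DataFrame({'start_date': ['2024-01-02', '2024-03-04']})
metadata = {
    'columns': {
        'start_date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'}
    }
}
result = convert_datetime_columns(data, metadata)
```

In this sketch, metadata that uses the `format` key instead of `datetime_format` would trigger the error for a string column, which matches the requirement that only the SDV 1.0 property name is recognized.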