
Improve handling of datetime columns for metrics #740

@npatki


Problem Description

Many metrics (including DCRBaselineProtection, DCROverfittingProtection, BinaryClassifierPrecisionEfficacy, etc.) let the user pass metadata that describes each column's sdtype.

Datetime columns are currently handled by the internal _convert_datetime_columns function. We should improve this function so that datetime columns are handled consistently across metrics.

Expected behavior

SDMetrics should not be responsible for inferring datetime formats from datetime strings. Therefore, for datetime columns SDMetrics should accept only the following inputs:

  1. Columns stored as string values, whose metadata includes a datetime_format string
  2. Columns already converted to a pandas datetime type (e.g. with pd.to_datetime). In this case, no datetime_format is needed.

If a datetime column contains string values but its metadata has no datetime_format, SDMetrics should raise an error:

Error: Datetime column 'start_date' does not have a specified 'datetime_format'. Please add the required datetime_format to the metadata or convert this column to 'pd.datetime' to bypass this requirement.
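A minimal sketch of the proposed rules, assuming SDV 1.0-style metadata with a 'columns' mapping. The function name convert_datetime_columns is hypothetical; this is not the actual SDMetrics _convert_datetime_columns implementation. Columns that already have a pandas datetime dtype pass through untouched, string columns are parsed only with the declared datetime_format, and string columns without a format raise instead of guessing.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def convert_datetime_columns(data, metadata):
    """Return a copy of ``data`` with every datetime column parsed.

    Hypothetical helper illustrating the proposed behavior; it is not
    the real SDMetrics implementation.
    """
    data = data.copy()  # never mutate the caller's DataFrame
    for column_name, column_meta in metadata.get('columns', {}).items():
        # Only datetime columns that actually appear in the data matter here.
        if column_meta.get('sdtype') != 'datetime' or column_name not in data:
            continue

        column = data[column_name]
        if is_datetime64_any_dtype(column):
            # Input type 2: already a pandas datetime column, no format needed.
            continue

        datetime_format = column_meta.get('datetime_format')
        if datetime_format is None:
            # Strings without a declared format: refuse to infer one.
            raise ValueError(
                f"Datetime column '{column_name}' does not have a specified "
                "'datetime_format'. Please add the required datetime_format to "
                "the metadata or convert this column to 'pd.datetime' to bypass "
                "this requirement."
            )

        # Input type 1: parse the strings using only the declared format.
        data[column_name] = pd.to_datetime(column, format=datetime_format)

    return data


# Usage: a string column accompanied by a datetime_format in the metadata.
data = pd.DataFrame({'start_date': ['2021-01-05', '2021-02-10']})
metadata = {
    'columns': {
        'start_date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
    }
}
converted = convert_datetime_columns(data, metadata)
```

Because pd.to_datetime is always called with an explicit format, a value that does not match the declared format surfaces as a parsing error rather than being silently reinterpreted.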

SDMetrics should expect metadata in the SDV 1.0 format. This means that the datetime format is specified via the 'datetime_format' property, not 'format'.
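For illustration, here is a sketch of SDV 1.0-style single-table metadata next to a column that uses the 'format' key instead. The helper get_datetime_format is hypothetical: it reads only the SDV 1.0 'datetime_format' key, so a string column described only with 'format' is treated as having no datetime_format.

```python
# SDV 1.0-style single-table metadata: per-column dicts under 'columns',
# with the datetime format stored under 'datetime_format'.
SDV_1_METADATA = {
    'columns': {
        'start_date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
        'amount': {'sdtype': 'numerical'},
    },
}

# A column that stores the format under 'format', which SDMetrics should not read.
FORMAT_KEY_COLUMN = {'sdtype': 'datetime', 'format': '%Y-%m-%d'}


def get_datetime_format(column_metadata):
    """Return the column's datetime format, reading only the SDV 1.0 key.

    Hypothetical helper; returns None when 'datetime_format' is absent.
    """
    return column_metadata.get('datetime_format')


start_date_format = get_datetime_format(SDV_1_METADATA['columns']['start_date'])
```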
