Consolidate handling of datetime columns #741

@npatki

Problem Description

Our reports (Quality Report, Diagnostic Report) and some of our metrics (e.g. DCRBaselineProtection) currently require the user to input metadata, which describes the sdtype of each column. Based on the sdtype, the report/metric internally applies a different computation to the column.

For datetime columns, the report/metric is also ultimately responsible for converting the different formats. Currently, this conversion happens in two different places.

We should consolidate this logic into a single utility function.

Expected behavior

Consolidate the datetime logic into a single utility function. Make sure the logic follows the behavior below.

SDMetrics should not be responsible for trying to infer datetime formats from datetime strings. This means that SDMetrics should only accept the following types of inputs:

  • Columns that are present as string values with accompanying metadata that includes a datetime_format string OR
  • Columns that are already converted to pd.datetime (In this case no datetime_format specification is needed)

If the datetime values are present as strings but there is no datetime_format in the metadata, we should raise an error.

Error: Datetime column 'start_date' does not have a specified 'datetime_format'. Please add the required
datetime_format to the metadata or convert this column to 'pd.datetime' to bypass this requirement.

SDMetrics should expect metadata in the SDV 1.0 format. This means that the datetime format should be specified via the 'datetime_format' property, not 'format'.
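
For discussion, here is a minimal sketch of what the consolidated utility could look like under the rules above. The function name `convert_datetime_column`, its signature, and the choice of `ValueError` are assumptions for illustration only; they are not the existing SDMetrics API. The sketch never infers a format: it passes datetime columns through unchanged, converts strings using `datetime_format`, and raises an error otherwise.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def convert_datetime_column(column_name, column_data, column_metadata):
    """Hypothetical helper: return the column as pandas datetimes.

    `column_metadata` is assumed to be the per-column dict from SDV 1.0 style
    metadata, e.g. {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'}.
    """
    # Already a datetime column: no 'datetime_format' is needed.
    if is_datetime64_any_dtype(column_data):
        return column_data

    # Only the 'datetime_format' key is consulted; a legacy 'format' key is ignored.
    datetime_format = column_metadata.get('datetime_format')
    if datetime_format is None:
        # ValueError is an assumed exception type, not confirmed by the issue.
        raise ValueError(
            f"Datetime column '{column_name}' does not have a specified "
            "'datetime_format'. Please add the required datetime_format to the "
            "metadata or convert this column to 'pd.datetime' to bypass this "
            "requirement."
        )

    # Strings are converted with the explicit format; nothing is inferred.
    return pd.to_datetime(column_data, format=datetime_format)


# Example usage with made-up data.
dates = pd.Series(['2020-01-01', '2021-06-15'])
converted = convert_datetime_column(
    'start_date', dates, {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'}
)
```

With a single helper like this, both the reports and the metrics could call the same code path instead of each doing their own conversion.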

Additional Context

In #740 we are updating the logic of the _convert_datetime_columns function that all the metrics use to the same logic described above.
