Problem Description
Our reports (Quality Report, Diagnostic Report) and some of our metrics (e.g. DCRBaselineProtection) currently require the user to input metadata that describes the sdtype of each column. Based on the sdtype, the report or metric internally applies different computations to the column.
For datetime columns, the report or metric is also ultimately responsible for converting values in different formats into datetimes. Currently, this happens in two different places:
- Metrics use the _convert_datetime_columns function
- Reports use a different convert_to_datetime function
We should consolidate this logic into a single utility function.
Expected behavior
Consolidate the datetime logic into a single utility function. Make sure the logic follows the behavior below.
SDMetrics should not be responsible for trying to infer datetime formats from datetime strings. This means that SDMetrics should only accept the following types of inputs:
- Columns that contain string values, with accompanying metadata that includes a datetime_format string, OR
- Columns that have already been converted to pd.datetime (in this case, no datetime_format specification is needed)
If the datetime values are strings but the metadata has no datetime_format, we should raise an error:
Error: Datetime column 'start_date' does not have a specified 'datetime_format'. Please add the required datetime_format to the metadata or convert this column to 'pd.datetime' to bypass this requirement.
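A minimal sketch of what the consolidated utility could look like, assuming it works on one pandas Series at a time and signals the missing-format case with a ValueError. The name convert_datetime_column and its parameters are hypothetical, not an existing SDMetrics API; pd.api.types.is_datetime64_any_dtype and pd.to_datetime are standard pandas calls.

```python
import pandas as pd


def convert_datetime_column(column, column_name, datetime_format=None):
    """Return ``column`` as datetime values, following the rules above.

    Hypothetical consolidated helper, not an existing SDMetrics function.
    """
    # Case 1: the column is already a pandas datetime dtype, so no format is needed.
    if pd.api.types.is_datetime64_any_dtype(column):
        return column

    # Case 2: string values must come with an explicit datetime_format.
    if datetime_format is None:
        raise ValueError(
            f"Datetime column '{column_name}' does not have a specified "
            "'datetime_format'. Please add the required datetime_format to the "
            "metadata or convert this column to 'pd.datetime' to bypass this "
            "requirement."
        )

    # Parse strictly with the given format; no format inference is attempted.
    return pd.to_datetime(column, format=datetime_format)


# Usage (illustrative values).
dates = pd.Series(['2023-01-15', '2023-02-01'])
parsed = convert_datetime_column(dates, 'start_date', datetime_format='%Y-%m-%d')
```

Passing an explicit format to pd.to_datetime means a string that does not match raises instead of being silently guessed, which is consistent with SDMetrics not inferring formats.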
SDMetrics should expect metadata in the SDV 1.0 format. This means that the datetime format should be specified via the 'datetime_format' property, not 'format'.
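For illustration, here is metadata in the SDV 1.0 single-table layout (a 'columns' mapping keyed by column name) and a lookup that reads only the 'datetime_format' key. The column names and the get_datetime_format helper are hypothetical; a legacy 'format' entry is deliberately treated as missing, so such a column would hit the error described above.

```python
# Metadata in the SDV 1.0 single-table layout (illustrative column names).
metadata = {
    'columns': {
        'start_date': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'},
        'end_date': {'sdtype': 'datetime', 'format': '%Y-%m-%d'},  # pre-1.0 key
        'amount': {'sdtype': 'numerical'},
    }
}


def get_datetime_format(metadata, column_name):
    """Return the column's 'datetime_format', or None if it is not specified.

    Hypothetical helper: only the SDV 1.0 key is read, so a legacy 'format'
    entry is treated the same as a missing format.
    """
    column_meta = metadata['columns'][column_name]
    return column_meta.get('datetime_format')


print(get_datetime_format(metadata, 'start_date'))  # %Y-%m-%d
print(get_datetime_format(metadata, 'end_date'))    # None
```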
Additional Context
In #740, we are updating the _convert_datetime_columns function, which all the metrics use, to follow the logic described above.