-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve datetime handling in ScalarInequality
and ScalarRange
#853
Improve datetime handling in ScalarInequality
and ScalarRange
#853
Conversation
c5206b5
to
2a7b48d
Compare
Codecov Report
@@ Coverage Diff @@
## master #853 +/- ##
==========================================
+ Coverage 67.65% 68.15% +0.49%
==========================================
Files 38 38
Lines 2823 2867 +44
==========================================
+ Hits 1910 1954 +44
Misses 913 913
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Mostly minor comments about restructuring
sdv/constraints/tabular.py
Outdated
@@ -424,10 +425,17 @@ def _validate_inputs(column_name, value, relation): | |||
if not (isinstance(value, (int, float)) or is_datetime_type(value)): | |||
raise ValueError('`value` must be a number or datetime.') | |||
|
|||
if is_datetime_type(value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is called above can we capture the result in a variable?
sdv/constraints/tabular.py
Outdated
return cast_to_datetime64(value) | ||
|
||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the _validate_inputs
method should return a value. If we want to group the logic we can make a parse_value
method that calls _validate_inputs
and then parses to datetime if necessary
@@ -531,6 +541,11 @@ def reverse_transform(self, table_data): | |||
original_column = self._value - diff_column | |||
|
|||
table_data[self._column_name] = pd.Series(original_column).astype(self._dtype) | |||
if self._is_datetime and self._datetime_format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there ever a case where self._datetime_format
is True, but self._is_datetime
isn't? There shouldn't be right because the former is only set if the latter is True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If self._is_datetime
is True
but we are not able to detect
the format of the date time, then we can't format with None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the other way around. Couldn't the if statement just be if self._datetime_format
? Why do we need to check the self._is_datetime
as well?
Has the test coverage been updated? If not it seems like some cases are missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, I just had a few questions + some typos.
@@ -531,6 +541,11 @@ def reverse_transform(self, table_data): | |||
original_column = self._value - diff_column | |||
|
|||
table_data[self._column_name] = pd.Series(original_column).astype(self._dtype) | |||
if self._is_datetime and self._datetime_format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the self._datetime_format
check really necessary? It's only False when the data is empty or filled with nans, but in those cases self._is_datetime
should be False, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes self._datetime_format
may not be learnt (it can return None
) since the learning of the format does not support ms
:
In [32]: detect_datetime_format('2021-02-02T00:00:00.000000')
In [33]: detect_datetime_format('2021-02-02T00:00:00')
Out[33]: '%Y-%m-%dT%H:%M:%S'
|
||
if self._is_datetime: | ||
table_data[self.column_name] = pd.to_datetime(data) | ||
table_data[self._column_name] = pd.to_datetime(data) | ||
if self._datetime_format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, do we really need this check?
``numpy.datetime64`` value or values. | ||
""" | ||
if isinstance(value, str): | ||
value = pd.to_datetime(value).to_datetime64() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to do to_datetime
and to_datetime64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used it to ensure it ends up being numpy.datetime64[ns]
that way all the datetimes are in the same magnitude.
sdv/constraints/utils.py
Outdated
|
||
|
||
def detect_datetime_format(value): | ||
"""Detect if possible the datetime format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring missing a word?
def test__validate_inputs_with_numerical_value(self): | ||
"""Test the ``_validate_inputs`` method. | ||
|
||
Ensure the method crashes when the column name is not a string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect, method doesn't crash.
def test__validate_inputs_with_datetime_value(self): | ||
"""Test the ``_validate_inputs`` method. | ||
|
||
Ensure the method crashes when the column name is not a string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect, method doesn't crash.
tests/unit/constraints/test_utils.py
Outdated
def test_is_datetime_type_with_datetime_str(): | ||
"""Test the ``is_datetime_type`` function when an valid datetime string is passed. | ||
|
||
Expect to return False when an int variable is passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be expected to return True.
tests/unit/constraints/test_utils.py
Outdated
Input: | ||
- string | ||
Output: | ||
- True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
False
tests/unit/constraints/test_utils.py
Outdated
def test_is_datetime_type_with_invalid_str(): | ||
"""Test the ``is_datetime_type`` function when an invalid string is passed. | ||
|
||
Expect to return False when an int variable is passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Str, not int
|
||
# Assert | ||
pd.testing.assert_frame_equal(table_data, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you delete this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple small comments and address Felipe's comment on the test file and then this is good to go!
sdv/constraints/tabular.py
Outdated
if value_is_datetime: | ||
if not isinstance(value, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can just be an and on one level right? ie
if value_is_datetime and not isinstance(value, str):
942bd90
to
be3e27c
Compare
sdv/constraints/tabular.py
Outdated
raise ValueError('`value` must be a number or datetime.') | ||
|
||
if value_is_datetime not isinstance(value, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should have an and in between
2413f01
to
8594098
Compare
8594098
to
aa33b3e
Compare
Resolves #819