GreaterThan constraint raises TypeError when using datetime #596

Closed
amontanez24 opened this issue Sep 20, 2021 · 2 comments · Fixed by #597

@amontanez24

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: Any
  • Python version: Any
  • Operating System: Any

Error Description

Using the GreaterThan constraint with a datetime value as the scalar high raises the following error:

    438         high = self._get_value(table_data, 'high')
    439 
--> 440         return self.operator(high, low).all(axis=1)
    441 
    442     def _transform(self, table_data):

TypeError: '>=' not supported between instances of 'Timestamp' and 'int'

I suspect the fix is similar to the one for issue #590. The problem seems to be in how is_valid is implemented in the GreaterThan constraint.
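For reference, the error message itself can be reproduced outside SDV by comparing a pandas Timestamp with a plain int. This is only a minimal sketch: the `low = 0` operand is a hypothetical stand-in, and the report does not show how an int actually ends up on the other side of the comparison inside is_valid.

```python
import pandas as pd

high = pd.Timestamp('2021-09-20')  # stand-in for the datetime scalar passed as high
low = 0                            # hypothetical int operand, for illustration only

try:
    high >= low
    message = None
except TypeError as error:
    # Timestamp has no ordering defined against int, so Python raises TypeError
    message = str(error)

print(message)
```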

Steps to reproduce

I used the student_placements dataset in this case.

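import pandas as pd
from sdv.constraints import GreaterThan
from sdv.demo import load_tabular_demo
from sdv.tabular import GaussianCopula

student_placements = load_tabular_demo('student_placements')

# end_date must be greater than or equal to start_date
transform_gt_constraint = GreaterThan(
    low='start_date',
    high='end_date',
    handling_strategy='reject_sampling')

# end_date must not be later than the current time (scalar datetime as high)
upper_limit_constraint = GreaterThan(
    low='end_date',
    high=pd.to_datetime('now'),
    scalar='high',
    handling_strategy='reject_sampling')

# Add the constraints and fit the data
constraints = [transform_gt_constraint, upper_limit_constraint]
multi_constraint_gc = GaussianCopula(constraints=constraints)
multi_constraint_gc.fit(student_placements)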

npatki commented Sep 20, 2021

I am able to reproduce this error.

Note that if you specify transform instead of reject_sampling, there is a different error:

/usr/local/lib/python3.7/dist-packages/sdv/constraints/tabular.py in _transform(self, table_data)
    458         """
    459         table_data = table_data.copy()
--> 460         diff = self._get_value(table_data, 'high') - self._get_value(table_data, 'low')
    461 
    462         if self._is_datetime:

UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('O') and dtype('<M8[ns]')

Steps to Reproduce

Use the same setup as above, but pass a single constraint that uses the transform strategy:

upper_limit_constraint = GreaterThan(
    low='end_date',
    high=pd.to_datetime('now'),
    scalar='high',
    handling_strategy='transform')

# Add the constraints and fit the data
constraints = [upper_limit_constraint]
multi_constraint_gc = GaussianCopula(constraints=constraints)
multi_constraint_gc.fit(student_placements)


katxiao commented Sep 20, 2021

One workaround is to specify the datetime as a numpy.datetime64 instead of a pandas.Timestamp.
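A rough sketch of that conversion, assuming the scalar is built with pd.to_datetime('now') as in the report. The `end_dates` series is made-up data used only to check that the converted scalar compares cleanly against a datetime column; the converted value would then be passed as `high` to GreaterThan.

```python
import numpy as np
import pandas as pd

upper_limit = pd.to_datetime('now')           # pandas.Timestamp, as in the report
upper_limit_np = upper_limit.to_datetime64()  # same instant as a numpy.datetime64

# Hypothetical datetime column, standing in for student_placements['end_date']
end_dates = pd.Series(pd.to_datetime(['2020-01-31', '2020-06-30']))

# Element-wise comparison of a datetime column against the numpy scalar
is_valid = end_dates <= upper_limit_np

print(type(upper_limit_np).__name__, bool(is_valid.all()))
```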

We are working on a fix now!
