Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Unique constraint cannot handle non-default index #617

Closed
csala opened this issue Oct 28, 2021 · 0 comments · Fixed by #619
Closed

The Unique constraint cannot handle non-default index #617

csala opened this issue Oct 28, 2021 · 0 comments · Fixed by #619
Labels
bug Something isn't working
Milestone

Comments

@csala
Copy link
Contributor

csala commented Oct 28, 2021

Environment Details

  • SDV version: v0.12.0

Error Description

The internal implementation of the Unique constraint relies on the input data index.
Because of this, errors such as #616 and #610 happen, but other weird behaviors can be observed, such as the constraint consider that valid data is invalid:

In [30]: data = pd.DataFrame({'unique': [1, 2, 3]}, index=[0, 0, 0])

In [31]: unique.is_valid(data)
Out[31]: 
0     True
1    False
2    False
dtype: bool

Or producing inconsistent results:

In [38]: data = pd.DataFrame({'unique': [1, 2, 1]}, index=[0, 2, 1])

In [39]: unique.is_valid(data)
Out[39]: 
0     True
1    False
2     True
dtype: bool

We should review its implementation to prevent all these scenarios.

@csala csala added bug Something isn't working pending review labels Oct 28, 2021
katxiao pushed a commit that referenced this issue Nov 17, 2021
…#619)

* fix unique constraint on data with column named index
- alters implementation of Unique constraint
- adds test for this behavior

* - tests to show problem of #616 and #617
- first implementation that fixes this

* refactors unique constraint
- removes usage of list
- uses cumcount on group by
  - numbers occurences of group
  - group nr. 0 is first occurence of this group
  - cumcount equal to 0 are the valid rows

* reference Github issues in test

* remove the call of reset_index
- keeps the original index specified

* remove metadata specification
- pass constraint directly as parameter

* remove metadata specification
- specify constraint directly in model creation
- change docstrings to new test

* add test unique constraint with index column
- unique constraint set on test_column
- column named index in dataframe

* use older graphviz version
- version 0.18.1 errors

* return to old setup.py
@amontanez24 amontanez24 added this to the 0.13.0 milestone Nov 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants