AttributeError on UniqueCombinations constraint with non-strings #196

LihuaXiong2020 · 2020-09-17T22:27:45Z

SDV version: 0.4.0
Python version: 3.6.8
Operating System: Windows

Description & What I did

Defined certain columns of the dataframe as categorical in the metadata class
Specified a UniqueCombinations constraint based on those columns
Passed the constraint to SDV with GaussianCopula
Called sdv.fit() and got "AttributeError: Can only use .str accessor with string values...", which can be traced back to line 98 (the _valid_separator function) of sdv/constraints/tabular.py

Reproduce

```python
import pandas as pd
from sdv.constraints import UniqueCombinations
from sdv.tabular import GaussianCopula

df = pd.DataFrame({"cat_a": [1, 2, 3], "cat_b": [4, 5, 6], "value": [0.5, 1.0, 1.5]})
unique_comb_segments = UniqueCombinations(
    columns=[
        "cat_a",
        "cat_b"
    ],
    handling_strategy="transform"
)
model = GaussianCopula(constraints=[unique_comb_segments])
model.fit(df)
```

Error:

```
AttributeError Traceback (most recent call last)
in
      8 )
      9 model = GaussianCopula(constraints=[unique_comb_segments])
---> 10 model.fit(df)

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/tabular/base.py in fit(self, data)
    100         """
    101         if not self._metadata_fitted:
--> 102             self._metadata.fit(data)
    103
    104         self._num_rows = len(data)

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/metadata/table.py in fit(self, data)
    446         data = self._anonymize(data)
    447
--> 448         data = self._fit_transform_constraints(data)
    449         self._fit_hyper_transformer(data)
    450         self.fitted = True

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/metadata/table.py in _fit_transform_constraints(self, data)
    330                 self._constraints[idx] = constraint
    331
--> 332             data = constraint.fit_transform(data)
    333
    334         return data

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/constraints/base.py in fit_transform(self, table_data)
    124             Transformed data.
    125         """
--> 126         self.fit(table_data)
    127         return self.transform(table_data)
    128

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/constraints/tabular.py in fit(self, table_data)
    119         """
    120         self._separator = '#'
--> 121         while not self._valid_separator(table_data):
    122             self._separator += '#'
    123

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/constraints/tabular.py in _valid_separator(self, table_data)
     96         """
     97         for column in self._columns:
---> 98             if table_data[column].str.contains(self._separator).any():
     99                 return False
    100

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5173             or name in self._accessors
   5174         ):
-> 5175             return object.__getattribute__(self, name)
   5176         else:
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
    173             # we're accessing the attribute of the class, i.e., Dataset.geo
    174             return self._accessor
--> 175         accessor_obj = self._accessor(obj)
    176         # Replace the property with the accessor object. Inspired by:
    177         # http://www.pydanny.com/cached-property.html

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/strings.py in __init__(self, data)
   1915
   1916     def __init__(self, data):
-> 1917         self._inferred_dtype = self._validate(data)
   1918         self._is_categorical = is_categorical_dtype(data)
   1919

~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/strings.py in _validate(data)
   1965
   1966         if inferred_dtype not in allowed_types:
-> 1967             raise AttributeError("Can only use .str accessor with string " "values!")
   1968         return inferred_dtype
   1969

AttributeError: Can only use .str accessor with string values!
```
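
The failure can be reproduced without SDV. The sketch below assumes only pandas: it shows that the `.str` accessor raises on an integer column, and that casting to `str` first avoids the error. The helper name `separator_is_free` is hypothetical and is not SDV's implementation; it only illustrates what a dtype-agnostic version of the check in the traceback might look like.

```python
import pandas as pd


def separator_is_free(table_data, columns, separator):
    """Return True if `separator` appears in none of the given columns.

    Hypothetical helper, not part of SDV: values are cast to str first,
    so the check also works for integer, float, or mixed-type columns.
    """
    for column in columns:
        as_text = table_data[column].astype(str)
        # regex=False treats the separator as a literal substring.
        if as_text.str.contains(separator, regex=False).any():
            return False
    return True


df = pd.DataFrame({"cat_a": [1, 2, 3], "cat_b": ["x", "y#z", "w"]})

# The accessor used in the traceback fails on the integer column.
try:
    df["cat_a"].str.contains("#")
    raised = False
except AttributeError:
    raised = True

# "#" occurs in cat_b, so it is rejected; "##" does not occur anywhere.
free_hash = separator_is_free(df, ["cat_a", "cat_b"], "#")
free_double_hash = separator_is_free(df, ["cat_a", "cat_b"], "##")
```

Casting only makes the separator check pass. It does not by itself solve the question of restoring the original dtypes after the combined column is split again.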

csala · 2020-09-18T13:01:46Z

Thanks for reporting this @LihuaXiong2020

I think that the problem is not really the categorical data in general but just the categorical data made of integer values, so the title might be a bit misleading.

Would you mind editing the title to something like: "AttributeError when using UniqueCombinations constraint with integer values"?

It would also be helpful if you could post a short snippet of code showing how to reproduce the error.

LihuaXiong2020 · 2020-09-18T19:16:23Z

Hi @csala, I don't think it's limited to integers: I converted the integers into categoricals, and it appears that UniqueCombinations only works with strings. Would it be possible to extend it to cover other dtypes?

Sure, I'll try to construct a reproducible example.

csala · 2020-09-18T19:28:06Z

Oh, yes, that is what I actually meant: values that are not strings, regardless of the type they have in the metadata.
I have updated the title again to reflect that.

This will be a tricky one, because even if we convert the values into strings on the fly inside the constraint, if we have mixed types it will be hard to keep track of what the original type was.

For example, if we have a column that contains two categories with different dtypes, like ["a", 1], converting the 1 into a string when we combine this column with the other one will be easy, but knowing that we have to cast the "1" back to a 1 when we split the columns again will be harder. And, going even further, we might have something like ["1", 1] (so, integers and their string representation mixed). We will have to think carefully about how to keep track of this!
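
The round-trip problem described above can be seen in a few lines of plain Python. This sketch is not SDV code: `combine_as_text`, `split_text`, and the key lookup tables are hypothetical names, used only to illustrate the issue and one possible way around it (keeping the original tuples in a mapping instead of parsing them back out of a string).

```python
import uuid

SEPARATOR = "#"


def combine_as_text(row):
    """Join the values of a row into one string (type information is lost)."""
    return SEPARATOR.join(str(value) for value in row)


def split_text(combined):
    """Split a combined string back into parts; every part comes back as str."""
    return tuple(combined.split(SEPARATOR))


rows = [("a", 4), (1, 5), ("1", 6)]

# Naive round trip: the integer 1 and the string "1" become indistinguishable.
naive = [split_text(combine_as_text(row)) for row in rows]

# Alternative: map each distinct combination to an opaque key and keep a
# reverse lookup, so the original values (and their types) are never parsed.
key_by_combination = {row: uuid.uuid4().hex for row in set(rows)}
combination_by_key = {key: row for row, key in key_by_combination.items()}

encoded = [key_by_combination[row] for row in rows]
restored = [combination_by_key[key] for key in encoded]
```

One caveat of the lookup approach: Python treats `1`, `1.0`, and `True` as equal dictionary keys, so combinations that differ only in those values would still collapse into one entry.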

LihuaXiong2020 · 2020-09-18T20:52:47Z

Hi @csala, I reproduced it on Python 3.7, but the behavior is the same as on Python 3.6.8. The example also omits the Categorical type specification through the metadata, since plain integers trigger the same error.

csala added the bug label Sep 18, 2020

csala modified the milestones: 0.4.1, 0.4.2 Sep 18, 2020

LihuaXiong2020 changed the title ~~Categorical data incompatible with UniqueCombinations~~ AttributeError when using UniqueCombinations constraint with integer values Sep 18, 2020

csala changed the title ~~AttributeError when using UniqueCombinations constraint with integer values~~ AttributeError on UniqueCombinations constraint with non-strings Sep 18, 2020

csala modified the milestones: 0.4.2, 0.4.3 Sep 19, 2020

csala modified the milestones: 0.4.3, 0.4.4 Sep 28, 2020

csala modified the milestones: 0.4.4, 0.4.5 Oct 6, 2020

csala modified the milestones: 0.4.5, 0.4.6 Oct 16, 2020

csala removed this from the 0.4.6 milestone Nov 23, 2020

csala mentioned this issue May 20, 2021

UniqueCombination constraint with numerical values #434

Closed

katxiao linked a pull request Jun 25, 2021 that will close this issue

Expand UniqueCombinations constraint to handle non-strings #480

Closed

This was referenced Jun 25, 2021

Expand UniqueCombinations constraint to handle non-strings #480

Closed

Expand UniqueCombinations constraint to handle non-strings #481

Merged

katxiao removed a link to a pull request Jun 25, 2021

Expand UniqueCombinations constraint to handle non-strings #480

Closed

katxiao linked a pull request Jun 25, 2021 that will close this issue

Expand UniqueCombinations constraint to handle non-strings #481

Merged

csala closed this as completed in #481 Jun 29, 2021

csala assigned katxiao Jun 29, 2021

csala added this to the 0.11.0 milestone Jun 29, 2021
