Not able to use constraints and conditions in the same time. #379

Closed
dyuliu opened this issue Apr 9, 2021 · 1 comment · Fixed by #433

dyuliu commented Apr 9, 2021

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 0.9.0
  • Python version: Anaconda python 3.6.13
  • Operating System: macOS 11.2.3

Error Description

Constraints and conditions cannot be used at the same time.

Steps to reproduce

Take this tutorial notebook as an example (https://github.com/sdv-dev/SDV/blob/master/tutorials/single_table_data/05_Handling_Constraints.ipynb). If you try to generate samples using conditions like the following, an error is raised (see the screenshot).

sampled = gc.sample(10, conditions={'age': 40})

You will see an error message like the following:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-5410734405d8> in <module>
----> 1 sampled = gc.sample(10, conditions={'age': 40, 'age_when_joined': 30})

~/anaconda3/envs/sreg/lib/python3.6/site-packages/sdv/tabular/base.py in sample(self, num_rows, max_retries, max_rows_multiplier, conditions, float_rtol, graceful_reject_sampling)
    386                 raise ValueError(f'Invalid column name `{column}`')
    387 
--> 388             if len(self._metadata.transform(conditions[[column]]).columns) == 0:
    389                 raise ValueError(f'Conditioning on column `{column}` is not possible')
    390 

~/anaconda3/envs/sreg/lib/python3.6/site-packages/sdv/metadata/table.py in transform(self, data)
    515         LOGGER.debug('Transforming constraints for table %s', self.name)
    516         for constraint in self._constraints:
--> 517             data = constraint.transform(data)
    518 
    519         LOGGER.debug('Transforming table %s', self.name)

~/anaconda3/envs/sreg/lib/python3.6/site-packages/sdv/constraints/tabular.py in transform(self, table_data)
    164         # print(table_data[self._columns].values.tolist())
    165 
--> 166         lists_series = pd.Series(table_data[self._columns].values.tolist())
    167         table_data = table_data.drop(self._columns, axis=1)
    168         table_data[self._joint_column] = lists_series.str.join(self._separator)

~/anaconda3/envs/sreg/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2910             if is_iterator(key):
   2911                 key = list(key)
-> 2912             indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
   2913 
   2914         # take() does not accept boolean indexers

~/anaconda3/envs/sreg/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/anaconda3/envs/sreg/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['company', 'department'], dtype='object')] are in the [columns]"
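
The final KeyError can be reproduced with pandas alone, outside SDV. The snippet below is a minimal sketch, assuming the frame passed to the constraint holds only the condition column; the variable names are illustrative and not taken from the SDV code.

```python
import pandas as pd

# A one-column frame, similar to the `conditions[[column]]` slice in base.py.
conditions = pd.DataFrame({'age': [40]})

try:
    # The constraint indexes the frame by its own columns, which are absent here.
    conditions[['company', 'department']]
except KeyError as error:
    message = str(error)
    print(message)
```

Selecting a list of labels that are all missing from the frame raises KeyError, which matches the last frame of the traceback.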

If you go a step further and look at the original code:

~/anaconda3/envs/sreg/lib/python3.6/site-packages/sdv/tabular/base.py
--> 388             if len(self._metadata.transform(conditions[[column]]).columns) == 0:
    389                 raise ValueError(f'Conditioning on column `{column}` is not possible')

You will find that only a single condition column is passed to metadata.transform, but inside metadata.transform every constraint is applied, even those unrelated to the condition column. Here the constraint on company and department looks up columns that are not in the one-column frame, which causes the KeyError.
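
One possible way to avoid this is to skip constraints whose columns are not all present in the data being transformed. The sketch below only illustrates that idea and is not necessarily the change made in #433. `JoinColumns` and `transform_applicable` are hypothetical names; the class body imitates the three lines of sdv/constraints/tabular.py shown in the traceback, and the `'#'` separator is an assumption.

```python
import pandas as pd


class JoinColumns:
    """Hypothetical stand-in for a constraint that merges several columns into one."""

    def __init__(self, columns, separator='#'):
        self._columns = columns
        self._separator = separator
        self._joint_column = separator.join(columns)

    def transform(self, table_data):
        # Imitates the failing lines in the traceback: index by all constraint columns.
        lists_series = pd.Series(table_data[self._columns].values.tolist())
        table_data = table_data.drop(self._columns, axis=1)
        table_data[self._joint_column] = lists_series.str.join(self._separator)
        return table_data


def transform_applicable(constraints, data):
    """Apply only the constraints whose columns are all present in `data`."""
    for constraint in constraints:
        if set(constraint._columns).issubset(data.columns):
            data = constraint.transform(data)
    return data


constraints = [JoinColumns(['company', 'department'])]
conditions = pd.DataFrame({'age': [40]})

# The unrelated constraint is skipped, so the single condition column survives.
result = transform_applicable(constraints, conditions)
print(list(result.columns))
```

With a guard like this, a full table that contains company and department is still transformed as before, while a frame holding only the condition column passes through unchanged.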

@dyuliu dyuliu added bug Something isn't working pending review labels Apr 9, 2021
@amontanez24

@dyuliu Thanks for filing! We will take a look at this now.

@sync-by-unito sync-by-unito bot changed the title Not able to use constraints and conditions in the same time. SDV - #379 Not able to use constraints and conditions in the same time. Apr 13, 2021
@sync-by-unito sync-by-unito bot changed the title SDV - #379 Not able to use constraints and conditions in the same time. Not able to use constraints and conditions in the same time. Apr 13, 2021
@katxiao katxiao added this to the 0.10.0 milestone May 21, 2021