Cannot fit twice if I add constraints: ValueError: There are non-numerical values in your data. #1258

Closed
npatki opened this issue Feb 12, 2023 · 0 comments
Labels: bug (Something isn't working), data:multi-table (Related to multi-table, relational datasets)

npatki (Contributor) commented Feb 12, 2023

Environment Details

  • SDV version: SDV 1.0 (in progress)
  • Python version: 3.8
  • Operating System: Linux (Colab Notebook)

Error Description

If I add constraints to an HMASynthesizer, I am unable to call fit more than once. The first fit call works, but every later fit call raises ValueError: There are non-numerical values in your data.

Steps to reproduce

Observe that the first fit call succeeds but the second one fails.

from sdv.multi_table import HMASynthesizer
from sdv.datasets.demo import download_demo

real_data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

synthesizer = HMASynthesizer(metadata)

fixed_location_combinations = {
    'constraint_class': 'FixedCombinations',
    'table_name': 'hotels',
    'constraint_parameters': {
        'column_names': ['city', 'state']
    }
}

synthesizer.add_constraints([fixed_location_combinations])

synthesizer.fit(real_data)
print('FITTING 1: DONE')

synthesizer.fit(real_data)
print('FITTING 2: DONE')

Stack Trace

FITTING 1: DONE
/usr/local/lib/python3.8/dist-packages/rdt/hyper_transformer.py:400: UserWarning: For this change to take effect, please refit your data using 'fit' or 'fit_transform'.
  warnings.warn(self._REFIT_MESSAGE)
/usr/local/lib/python3.8/dist-packages/sdv/single_table/base.py:398: UserWarning: This model has already been fitted. To use the new preprocessed data, please refit the model using 'fit' or 'fit_processed_data'.
  warnings.warn(
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-3b08a927e949> in <module>
     21 synthesizer.fit(real_data)
     22 print('FITTING 1: DONE')
---> 23 synthesizer.fit(real_data)
     24 print('FITTING 2: DONE')

/usr/local/lib/python3.8/dist-packages/sdv/multi_table/base.py in fit(self, data)
    324         self._fitted = False
    325         processed_data = self.preprocess(data)
--> 326         self.fit_processed_data(processed_data)
    327 
    328     def reset_sampling(self):

/usr/local/lib/python3.8/dist-packages/sdv/multi_table/base.py in fit_processed_data(self, processed_data)
    309                 Dictionary mapping each table name to a preprocessed ``pandas.DataFrame``.
    310         """
--> 311         self._fit(processed_data)
    312         self._fitted = True
    313         self._fitted_date = datetime.datetime.today().strftime('%Y-%m-%d')

/usr/local/lib/python3.8/dist-packages/sdv/multi_table/hma.py in _fit(self, processed_data)
    207         for table_name in processed_data:
    208             if not parent_map.get(table_name):
--> 209                 self._model_table(table_name, processed_data)
    210 
    211         LOGGER.info('Modeling Complete')

/usr/local/lib/python3.8/dist-packages/sdv/multi_table/hma.py in _model_table(self, table_name, tables)
    180         self._table_sizes[table_name] = len(table)
    181 
--> 182         table = self._extend_table(table, tables, table_name)
    183         keys = self._pop_foreign_keys(table, table_name)
    184         self._clear_nans(table)

/usr/local/lib/python3.8/dist-packages/sdv/multi_table/hma.py in _extend_table(self, table, tables, table_name)
    117         for child_name in self.metadata._get_child_map()[table_name]:
    118             if child_name not in self._modeled_tables:
--> 119                 child_table = self._model_table(child_name, tables)
    120             else:
    121                 child_table = tables[child_name]

/usr/local/lib/python3.8/dist-packages/sdv/multi_table/hma.py in _model_table(self, table_name, tables)
    186                     table_name, table.shape)
    187 
--> 188         self._table_synthesizers[table_name].fit_processed_data(table)
    189 
    190         for name, values in keys.items():

/usr/local/lib/python3.8/dist-packages/sdv/single_table/base.py in fit_processed_data(self, processed_data)
    419                 The transformed data used to fit the model to.
    420         """
--> 421         self._fit(processed_data)
    422         self._fitted = True
    423         self._fitted_date = datetime.datetime.today().strftime('%Y-%m-%d')

/usr/local/lib/python3.8/dist-packages/sdv/single_table/copulas.py in _fit(self, processed_data)
    130         with warnings.catch_warnings():
    131             warnings.filterwarnings('ignore', module='scipy')
--> 132             self._model.fit(processed_data)
    133 
    134     def _warn_for_update_transformers(self, column_name_to_transformer):

/usr/local/lib/python3.8/dist-packages/copulas/__init__.py in decorated(self, X, *args, **kwargs)
    251 
    252         if not (np.issubdtype(W.dtype, np.floating) or np.issubdtype(W.dtype, np.integer)):
--> 253             raise ValueError('There are non-numerical values in your data.')
    254 
    255         if np.isnan(W).any().any():

ValueError: There are non-numerical values in your data.
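
The last frame rejects any input whose dtype is neither floating nor integer. To narrow down which columns are responsible, something like the rough sketch below might help. It is pure Python and only approximates that dtype check: the helper name `find_non_numeric_columns`, the dict-of-lists table layout, and the `rating` and `has_rewards` columns are all hypothetical and not part of SDV or copulas.

```python
from numbers import Real


def find_non_numeric_columns(table):
    """Return the names of columns holding at least one non-numeric value.

    `table` is assumed to map column names to lists of values.
    Booleans are treated as non-numeric, loosely mirroring a check that
    only accepts integer and floating-point dtypes.
    """
    offending = []
    for column, values in table.items():
        for value in values:
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(value, Real) and not isinstance(value, bool)
            if not is_number:
                offending.append(column)
                break
    return offending


# Hypothetical stand-in for one processed table (not real demo data).
example_table = {
    'rating': [4.5, 3.0, 5.0],
    'city': ['Boston', 'Austin', 'Boston'],
    'has_rewards': [True, False, True],
}
print(find_non_numeric_columns(example_table))
```

Against the reproduction script, each table returned by `synthesizer.preprocess(real_data)` could be converted with `DataFrame.to_dict('list')` and passed to this helper before the second fit. The HMA code extends tables further before modeling, so this would only approximate what the copula model actually receives.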