Pandas performance warning #223

Closed
oriel9p opened this issue Jun 7, 2022 · 1 comment
Labels: bug, resolution:duplicate

Comments

oriel9p commented Jun 7, 2022

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • CTGAN version: latest (June 7th, 2022)
  • Python version: 3.9.10
  • Operating System: Windows (Python venv)

Error Description

pandas performance warning

Steps to reproduce

I was simply running the model with arbitrary columns on the MIMIC-IV data.

There was no crash, but multiple performance warnings were raised. I believe this is related to the pandas version and to the CTGAN training process not using the suggested pd.concat.
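
For context, here is a minimal sketch of the pattern that pandas typically flags with a PerformanceWarning and of the pd.concat alternative it suggests. It assumes the warning in question is the "highly fragmented DataFrame" one; the helper names and the toy data are hypothetical and are not taken from CTGAN.

```python
import numpy as np
import pandas as pd


def add_columns_one_by_one(df, new_columns):
    """Insert columns in a loop; with many columns pandas may emit a PerformanceWarning."""
    out = df.copy()
    for name, values in new_columns.items():
        out[name] = values
    return out


def add_columns_with_concat(df, new_columns):
    """Build all new columns first, then join them to df in a single pd.concat call."""
    extra = pd.DataFrame(new_columns, index=df.index)
    return pd.concat([df, extra], axis=1)


# Hypothetical toy data: one existing column plus 150 columns to add.
base = pd.DataFrame({"id": np.arange(5)})
new_columns = {f"feature_{i}": np.full(5, i) for i in range(150)}

slow = add_columns_one_by_one(base, new_columns)
fast = add_columns_with_concat(base, new_columns)
```

Both helpers produce the same columns and values; only the way the frame is assembled differs.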
@oriel9p added the bug and pending review labels Jun 7, 2022
npatki (Contributor) commented Jul 14, 2022

Hi @oriel9p, nice to meet you. I can confirm that there are many warnings when fitting the model. Many are repeated. There seem to be only 2 unique sources that I'm pasting below:

/usr/local/lib/python3.7/dist-packages/sklearn/mixture/_base.py:282: ConvergenceWarning: Initialization 1 did not converge. Try different init parameters, or increase max_iter, tol or check for degenerate data.
  ConvergenceWarning,

/usr/local/lib/python3.7/dist-packages/ctgan/data_transformer.py:111: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data[column_name] = data[column_name].to_numpy().flatten()

For the first (ConvergenceWarning), we are using #202 to track the possibility of silencing these if they do not impact the end use.

For the second (SettingWithCopyWarning), the change may require some more investigation. You can follow along in #215.
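
As background on why this warning appears, the sketch below reproduces the general situation with toy data: a column is assigned on a DataFrame that was itself selected from another DataFrame. Taking an explicit `.copy()` of the selection first is one common way to make the intent unambiguous. The column names and values are hypothetical, and this is not necessarily the change that #215 will make.

```python
import pandas as pd

# Hypothetical toy table standing in for the training data.
raw = pd.DataFrame(
    {"age": [31, 47, 52], "ward": ["a", "b", "a"], "stay_days": [2.0, 5.5, 1.0]}
)

# An explicit copy makes `data` independent of `raw`, so the column
# assignments below are unambiguous and do not touch `raw`.
data = raw[["age", "stay_days"]].copy()

for column_name in data.columns:
    # Same assignment pattern as the line quoted in the warning above.
    data[column_name] = data[column_name].to_numpy().flatten()
```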

Since we have existing issues for these warnings, I'll close this issue off as a duplicate. If there is other output you are observing, feel free to reply and I'll reopen this issue.

@npatki npatki closed this as completed Jul 14, 2022
@npatki added the resolution:duplicate label and removed the pending review label Jul 14, 2022