Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider initiating generator with synthesizer #4

Open
kevinykuo opened this issue Nov 6, 2019 · 4 comments
Open

Consider initiating generator with synthesizer #4

kevinykuo opened this issue Nov 6, 2019 · 4 comments
Labels
internal The issue doesn't change the API or functionality

Comments

@kevinykuo
Copy link
Contributor

kevinykuo commented Nov 6, 2019

Not too familiar with pytorch so let me know if this makes sense...

It seems like we're instantiating the model each time fit() is called https://github.com/DAI-Lab/CTGAN/blob/7aa29685045ffdba84bd87432354c133e05699e6/ctgan/ctgan_model.py#L458-L465
Would it make sense to do this once when we instantiate CTGANSynthesizer so we can e.g. look at the behavior of generated data as we train for more epochs?

@Baukebrenninkmeijer
Copy link
Contributor

Both have advantages. With the parameters in fit, you can more easily change out the training parameters and try different things. But your point is very valid as well.

@csala
Copy link
Contributor

csala commented Nov 26, 2019

I like the idea behind your suggestion, @kevinykuo , of moving the epochs argument to the fitmethod and allowing one to resume a previous fitting process (see #5 (comment)).

However, doing this is a bit more complex than it looks, because the Generator instance needs to be passed the data_dim argument, which is deduced from the data that is currently only known during fit.

This means that we cannot simply move the creation of this instance to the __init__ method, but rather figure out another way to implement a "warm start" behavior.

One option would be still create all the model instances inside the fit method, but only do it if they do not exist beforehand.
However, if this is done, some checks need to be also added to make sure that the data which is passed to second fit calls is still compatible with the model instances (if not the same).

@kevinykuo
Copy link
Contributor Author

Got it, sounds like there a couple ways to proceed, dictated by what you think a "model" represents, i.e., if it should be identified with the metadata of a dataset.

  1. (The option you mention above) Instantiate the generator at the first fit call and cache it.
  2. Require the user to pass some sort of metadata or a sample dataset at model instantiation.

@csala csala added discussion needed internal The issue doesn't change the API or functionality labels Jun 22, 2020
@Baukebrenninkmeijer
Copy link
Contributor

This is now mostly solved, correct? I think this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal The issue doesn't change the API or functionality
Projects
None yet
Development

No branches or pull requests

4 participants