Consider initiating generator with synthesizer #4

kevinykuo · 2019-11-06T16:45:54Z

Not too familiar with PyTorch so let me know if this makes sense...

It seems like we're instantiating the model each time fit() is called https://github.com/DAI-Lab/CTGAN/blob/7aa29685045ffdba84bd87432354c133e05699e6/ctgan/ctgan_model.py#L458-L465
Would it make sense to do this once when we instantiate CTGANSynthesizer so we can e.g. look at the behavior of generated data as we train for more epochs?

Baukebrenninkmeijer · 2019-11-26T15:32:27Z

Both have advantages. With the parameters in fit, you can more easily change out the training parameters and try different things. But your point is very valid as well.

csala · 2019-11-26T16:43:04Z

I like the idea behind your suggestion, @kevinykuo, of moving the epochs argument to the fit method and allowing one to resume a previous fitting process (see #5 (comment)).

However, doing this is a bit more complex than it looks, because the Generator instance needs to be passed the data_dim argument, which is deduced from the data that is currently only known during fit.

This means that we cannot simply move the creation of this instance to the __init__ method; instead, we need to find another way to implement a "warm start" behavior.

One option would be to still create all the model instances inside the fit method, but only if they do not already exist.
However, if this is done, some checks also need to be added to make sure that the data passed to subsequent fit calls is still compatible with the existing model instances (even if it is not the same data).
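
To make the option above concrete, here is a minimal sketch of lazy creation with a compatibility check. It is not the CTGAN implementation: `Generator` is a plain stand-in rather than a PyTorch module, `WarmStartSynthesizer` and `trained_epochs` are hypothetical names, and `data_dim` is taken to be the number of columns purely for illustration.

```python
class Generator:
    """Stand-in for the real network; it only records the dimension it was built for."""

    def __init__(self, data_dim):
        self.data_dim = data_dim


class WarmStartSynthesizer:
    """Hypothetical synthesizer that creates its model lazily inside fit()."""

    def __init__(self):
        self.generator = None    # created on the first fit() call
        self.trained_epochs = 0  # running total across fit() calls

    def fit(self, data, epochs=1):
        # Assumption: data is a non-empty list of equal-length rows, and
        # data_dim is simply the row width.
        data_dim = len(data[0])

        if self.generator is None:
            # First call: build the model now that data_dim is known.
            self.generator = Generator(data_dim)
        elif self.generator.data_dim != data_dim:
            # Later call: refuse data the existing model cannot handle.
            raise ValueError(
                f"model was built for data_dim={self.generator.data_dim}, "
                f"but got data_dim={data_dim}"
            )

        # The real training loop would go here; this sketch only counts epochs.
        self.trained_epochs += epochs
        return self
```

With something like this, calling `fit(data, epochs=5)` twice would reuse the same generator and accumulate ten epochs, while a call with differently shaped data would fail fast instead of silently retraining from scratch.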

kevinykuo · 2019-11-26T17:18:52Z

Got it, sounds like there are a couple of ways to proceed, dictated by what you think a "model" represents, i.e., whether it should be identified with the metadata of a dataset.

1. (The option you mention above) Instantiate the generator at the first fit call and cache it.
2. Require the user to pass some sort of metadata or a sample dataset at model instantiation.
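
As a rough illustration of the second route (metadata supplied at instantiation), the sketch below builds the generator in `__init__` from a list of column names. Everything here is an assumption made for the example: `MetadataSynthesizer`, the `column_names` argument, and the plain `Generator` stand-in are hypothetical and not part of CTGAN.

```python
class Generator:
    """Stand-in for the real network; it only records the dimension it was built for."""

    def __init__(self, data_dim):
        self.data_dim = data_dim


class MetadataSynthesizer:
    """Hypothetical synthesizer tied to a dataset's metadata at construction time."""

    def __init__(self, column_names):
        self.column_names = list(column_names)
        # The model can be created immediately because the metadata fixes
        # data_dim (assumed here to equal the number of columns).
        self.generator = Generator(len(self.column_names))
        self.trained_epochs = 0

    def fit(self, rows, epochs=1):
        # Assumption: rows is a list of dicts keyed by column name.
        expected = set(self.column_names)
        for row in rows:
            if set(row) != expected:
                raise ValueError(
                    f"row columns {sorted(row)} do not match {sorted(expected)}"
                )

        # The real training loop would go here; this sketch only counts epochs.
        self.trained_epochs += epochs
        return self
```

The trade-off is that the model is then identified with one dataset schema from the start, which makes repeated `fit` calls straightforward but requires the user to know the schema up front.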

Baukebrenninkmeijer · 2020-12-03T12:43:12Z

This is now mostly solved, correct? I think this can be closed.

csala mentioned this issue Nov 26, 2019

Consider moving training related parameters to fit() #5

Closed

csala mentioned this issue Dec 5, 2019

Reorganize the project structure #10

Closed

csala added the discussion needed and internal labels Jun 22, 2020

npatki removed the needs discussion label Jul 11, 2022
