Consider verbosity parameter for per-epoch losses #15

Closed · kevinykuo opened this issue Jan 8, 2020 · 8 comments
Labels: feature request (Request for a new feature)

Comments

@kevinykuo (Contributor) commented Jan 8, 2020

Either on/off or maybe a frequency (e.g. every N epochs)
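
A minimal sketch of how a bool-or-int verbose argument could gate the per-epoch print (the name and the bool-or-int convention are illustrative assumptions, not the released CTGAN API):

def _should_print(verbose, epoch):
    # True -> print every epoch; int N > 0 -> print every N epochs.
    if verbose is True:
        return True
    if isinstance(verbose, int) and verbose > 0:
        return epoch % verbose == 0
    return False

# Inside the training loop:
#     if _should_print(verbose, epoch):
#         print(f'Epoch {epoch}, Loss G: {loss_g:.4f}, Loss D: {loss_d:.4f}')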

csala added this to the 0.2.1 milestone Jan 9, 2020
csala added the internal (The issue doesn't change the API or functionality) and good first issue labels Jan 9, 2020
@kevinykuo (Contributor, Author):
@csala I think for this we should have a verbose parameter that turns the printing on/off. However, in either case I think it'd be helpful for fit() to return the training history, so users can inspect and plot it afterwards. Maybe a wrapper around a pandas DataFrame, but you'd probably have a better idea of the most Pythonic approach. Let me know your thoughts and I'd be happy to whip something together.
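
A rough sketch of what a history-returning fit() could look like (pandas-based history and a hypothetical _train_epoch helper; none of this is the current CTGAN API):

import pandas as pd

def fit(self, train_data, discrete_columns=(), epochs=300, verbose=True):
    records = []
    for epoch in range(epochs):
        # _train_epoch is a hypothetical helper returning the epoch's losses.
        loss_g, loss_d = self._train_epoch(train_data, discrete_columns)
        records.append({'epoch': epoch, 'loss_g': loss_g, 'loss_d': loss_d})
        if verbose:
            print(f'Epoch {epoch}, Loss G: {loss_g:.4f}, Loss D: {loss_d:.4f}')
    self.history = pd.DataFrame(records)
    return self.history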

csala removed this from the 0.2.1 milestone Jan 27, 2020
csala modified the milestone: 0.2.1 Jan 27, 2020
@oregonpillow (Contributor):
Any updates on this issue?
Playing around in Colab, I put this together: https://colab.research.google.com/drive/1JA_Ap1bQDmlhm_tC1k8RL0MNYKBluJNa
and added some new arguments to the fit() method in synthesizer.py. However, I'm certain my implementation is probably completely off. Any feedback is greatly appreciated.

Args:
            train_data (numpy.ndarray or pandas.DataFrame):
                Training data. It must be a 2-dimensional numpy array or a
                pandas.DataFrame.
            discrete_columns (list-like):
                List of discrete columns to be used to generate the Conditional
                Vector. If ``train_data`` is a numpy array, this list should
                contain the integer indices of the columns. Otherwise, if it is
                a ``pandas.DataFrame``, this list should contain the column names.
            verbosity (boolean):
                Whether to display per-epoch losses during the run.
                Defaults to ``True``.
            epochs (int):
                Number of training epochs. Defaults to 300.
            log_frequency (boolean):
                Whether to use log frequency of categorical levels in conditional
                sampling. Defaults to ``True``.
            gpu_stats (boolean):
                Whether to display GPU stats for each epoch. Fitting may be
                slowed down with this option turned on. Only NVIDIA GPUs are
                supported at this time. Defaults to ``False``.
            early_stopping (boolean):
                Whether to stop fitting early if the loss has not improved for
                ``patience`` epochs. Defaults to ``False``.
            patience (int):
                Number of epochs without improvement to wait before stopping.
                Defaults to ``10`` when ``early_stopping`` is turned on.
            logging (boolean):
                Whether to store the generator loss and discriminator loss in a
                timestamped CSV log file. Defaults to ``False``.
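
For context, a call under this proposed signature might look like the following (verbosity, gpu_stats, early_stopping, patience and logging are all proposed arguments from the Colab experiment above, not the released API):

from ctgan import CTGANSynthesizer

ctgan = CTGANSynthesizer()
ctgan.fit(
    train_data,
    discrete_columns,
    verbosity=True,       # proposed: print per-epoch losses
    epochs=300,
    early_stopping=True,  # proposed: stop when the loss stalls
    patience=10,          # proposed: epochs to wait for improvement
    logging=True,         # proposed: write losses to a timestamped CSV
)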

@elisim commented Nov 4, 2020

@csala it would be very helpful.
IMHO, something similar to the Keras model.fit output could be considered:

ctgan = CTGANSynthesizer()
hist = ctgan.fit(data, discrete_columns)

where hist is a dictionary containing the generator and discriminator loss per epoch, and may be extended to other metrics in the future.
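
As a sketch of how such a hist object could be consumed afterwards (the keys 'loss_g' and 'loss_d' are assumptions for illustration, not an existing API):

import matplotlib.pyplot as plt

# Assumes hist maps hypothetical keys to per-epoch loss lists.
epochs = range(1, len(hist['loss_g']) + 1)
plt.plot(epochs, hist['loss_g'], label='Generator loss')
plt.plot(epochs, hist['loss_d'], label='Discriminator loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()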

@Baukebrenninkmeijer (Contributor) commented Dec 3, 2020

In my own implementation I wrapped both the epoch and step loops with tqdm (progress bars). You can add logging information like the loss there as well.

Regarding how this information should be logged, and also @oregonpillow's proposal, I think the following:

  1. The information you're logging is really good and I like it a lot! The GPU stats are also a nice added bonus.
  2. The history should not be returned by fit. To me at least, this does not feel intuitive. I think this information can be exposed as an attribute, like ctgan.hist or ctgan.logs or something.
  3. Writing directly to files seems a bit much for an implementation in CTGAN.
  4. I think an option to facilitate many of these things is a callback system, similar to FastAI: we call on_epoch_end, on_epoch_start and other methods on the objects in ctgan.callbacks. These callbacks can be anything, ranging from logging objects to early stopping (see the sketch after this list).
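
A minimal sketch of such a callback system (all names here are illustrative; CTGAN does not ship a callbacks API):

class Callback:
    """Base class: subclasses override only the hooks they need."""

    def on_epoch_start(self, epoch, logs):
        pass

    def on_epoch_end(self, epoch, logs):
        pass


class EarlyStopping(Callback):
    """Stop training when the monitored loss stops improving."""

    def __init__(self, monitor='loss_g', patience=10):
        self.monitor = monitor
        self.patience = patience
        self.best = float('inf')
        self.wait = 0
        self.stop_training = False

    def on_epoch_end(self, epoch, logs):
        # logs is assumed to be a dict of this epoch's metrics.
        if logs[self.monitor] < self.best:
            self.best = logs[self.monitor]
            self.wait = 0
        else:
            self.wait += 1
            self.stop_training = self.wait >= self.patience


# The training loop would then call, once per epoch:
#     for cb in self.callbacks: cb.on_epoch_start(epoch, logs)
#     ... training steps ...
#     for cb in self.callbacks: cb.on_epoch_end(epoch, logs)
#     if any(getattr(cb, 'stop_training', False) for cb in self.callbacks):
#         break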

@oregonpillow (Contributor):
I'll be honest, the only reason I added GPU status was because I liked watching the temperature go up with more epochs 😏

csala added the feature request (Request for a new feature) and needs discussion labels, and removed the good first issue and internal (The issue doesn't change the API or functionality) labels Sep 6, 2021
@NadeemNicoR:
Can I please know what metric is used in the loss calculation here?
Epoch 105, Loss G: -7.7396, Loss D: -0.3223 is what I get when I try to fit the model on the training data.

@Baukebrenninkmeijer (Contributor):
@NadeemNicoR The metric is the raw logit output, IIRC. The loss of G is just the average error of the samples produced by G, and the loss of D is the loss of G minus the loss of the real samples. I'm doing this from memory, so let me know if this is incorrect.
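
For reference, CTGAN trains with a WGAN-GP-style objective, so the printed values are raw, unbounded critic scores rather than probabilities, which is why they can be negative. A rough sketch of the two loss terms (omitting the gradient penalty and CTGAN's conditional-vector loss term):

import torch

def generator_loss(critic_fake_scores):
    # G tries to maximize the critic's score on generated samples.
    return -torch.mean(critic_fake_scores)

def critic_loss(critic_real_scores, critic_fake_scores):
    # The critic pushes fake scores down and real scores up; the raw
    # outputs are unbounded, so the printed losses can be negative.
    return torch.mean(critic_fake_scores) - torch.mean(critic_real_scores)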

@npatki (Contributor) commented Jul 11, 2022

#147 addressed this issue, so I'm closing it off. For further discussion about the verbosity parameter, let's use the overall SDV GitHub.

npatki closed this as completed Jul 11, 2022