
Tracking metrics while training #460

Open
onacrame opened this issue Jun 7, 2021 · 3 comments

Labels
feature request Request for a new feature

Comments

onacrame commented Jun 7, 2021

Generator/discriminator loss is not particularly interpretable. If one prints these metrics via the verbose option, it's difficult to assess whether the GAN is improving or not. I've run scenarios on my data where the G/D loss followed a similar pattern, yet one model produced "better" synthetic outputs after playing around with certain hyperparameters like batch size. The base parameters, I've found, are not one-size-fits-all and depend on the data at hand.

With a GAN for images, one can see the output and judge it based on appearance to the human eye, but with tabular data this is not the case.

A callback during training that computes certain metrics, such as the KS (Kolmogorov–Smirnov) score, every N epochs would be useful; a rough sketch of such checks follows. A visual comparison of the synthetic data against the real data via kernel density plots for selected features might also give better insight into how training is going.
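For illustration, a minimal sketch of both checks, assuming the real and synthetic data are pandas DataFrames that share a numeric column. `ks_2samp` (scipy) and `kdeplot` (seaborn) are standard library calls; the DataFrames and column name are placeholders:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ks_2samp

def ks_score(real, synthetic, column):
    """Two-sample KS statistic between a real and a synthetic column
    (0 = identical empirical distributions, 1 = completely disjoint)."""
    statistic, _p_value = ks_2samp(real[column], synthetic[column])
    return statistic

def kde_comparison(real, synthetic, column):
    """Overlay kernel density estimates of a real vs. synthetic feature."""
    sns.kdeplot(real[column], label="real")
    sns.kdeplot(synthetic[column], label="synthetic")
    plt.legend()
    plt.title(f"{column}: KS = {ks_score(real, synthetic, column):.3f}")
    plt.show()
```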

The ability to save the "best" iteration would also be useful; a select-the-best workaround is sketched below.
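A rough select-the-best workaround, assuming SDV's single-table API (a CTGANSynthesizer plus a prepared metadata object) and reusing the ks_score helper sketched above; the epoch budgets and filename are illustrative only:

```python
# Train a few candidate synthesizers and persist whichever one scores
# best (lowest mean KS statistic across the chosen columns).
from sdv.single_table import CTGANSynthesizer

def train_and_keep_best(real_data, metadata, columns,
                        epoch_budgets=(100, 300, 500)):
    best_score, best_model = float("inf"), None
    for epochs in epoch_budgets:
        synthesizer = CTGANSynthesizer(metadata, epochs=epochs)
        synthesizer.fit(real_data)
        synthetic = synthesizer.sample(num_rows=len(real_data))
        # Lower mean KS = synthetic marginals closer to the real ones.
        score = sum(ks_score(real_data, synthetic, c)
                    for c in columns) / len(columns)
        if score < best_score:
            best_score, best_model = score, synthesizer
    best_model.save("best_synthesizer.pkl")  # SDV pickles the fitted model
    return best_model, best_score
```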

onacrame added the feature request and pending review labels on Jun 7, 2021
nhenscheid commented

Agreed, I think training diagnostics could be improved. I like the suggestion of storing and/or displaying intermediate synthetic data to evaluate "statistics of interest". Implementing this as a callback is probably the way to go, since each user will have a slightly different set of stats they wish to view; one possible shape is sketched below.
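A hypothetical shape for such a hook; SDV does not currently expose a callback API, so the class and method names here are invented purely to illustrate how user-defined "statistics of interest" could plug in:

```python
from typing import Callable, Dict
import pandas as pd

# A metric maps (real, synthetic) DataFrames to a single score.
MetricFn = Callable[[pd.DataFrame, pd.DataFrame], float]

class EvalCallback:
    """Would be invoked by the trainer every `every_n` epochs with a
    fresh synthetic sample, recording each user-supplied metric."""

    def __init__(self, real: pd.DataFrame, metrics: Dict[str, MetricFn],
                 every_n: int = 10):
        self.real, self.metrics, self.every_n = real, metrics, every_n
        self.history = []

    def on_epoch_end(self, epoch: int, synthetic: pd.DataFrame) -> None:
        if epoch % self.every_n == 0:
            scores = {name: fn(self.real, synthetic)
                      for name, fn in self.metrics.items()}
            self.history.append((epoch, scores))
            print(f"epoch {epoch}: {scores}")
```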

npatki (Contributor) commented Jun 30, 2022

Hi @onacrame and @nhenscheid, thanks for filing this issue. We'll keep it open and use it to communicate any updates we make on this front. Are there any particular statistics that you're interested in?

To help us prioritize this feature request, it would be great to learn a little more about your use case. What kind of datasets are you working with and how are you planning to use the synthetic data?

RaphaelRoyerRivard commented Dec 29, 2023

Wow, this is quite an old issue, and I am currently facing the same problem. My PARSynthesizer is training and I don't know whether it is getting better or worse because I have no metric to track its progress. I thought adding a callback would be possible, but after searching for a few hours it seems it is not. I'm writing here to revive this potential feature.

For what it's worth, my use case is generating fake sequences of position reports (GPS locations). I am not getting great results with PAR yet and am trying to figure out why, which proves quite difficult because I can't track the models' progress during training. I also checked synthesizer._model.loss_values, but unlike CTGANSynthesizer, which provides separate losses for the generator and the discriminator, it shows only a single loss.
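For the CTGAN case, a quick way to eyeball training from the losses mentioned above. This assumes a fitted CTGANSynthesizer whose private _model.loss_values is a pandas DataFrame with 'Epoch', 'Generator Loss', and 'Discriminator Loss' columns, which is how recent SDV/CTGAN versions appear to store it; column names may differ by version:

```python
import matplotlib.pyplot as plt

# `synthesizer` is assumed to be an already-fitted CTGANSynthesizer.
losses = synthesizer._model.loss_values  # private attribute, see above
plt.plot(losses["Epoch"], losses["Generator Loss"], label="generator")
plt.plot(losses["Epoch"], losses["Discriminator Loss"], label="discriminator")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.title("CTGAN training losses")
plt.show()
```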
