
Tracking metrics while training #460

Open
onacrame opened this issue Jun 7, 2021 · 3 comments

Labels
feature request Request for a new feature

Comments

onacrame commented Jun 7, 2021

Generator/discriminator loss is not particularly interpretable. If one prints these metrics via the verbose option, it's difficult to assess whether the GAN is improving or not. I've run scenarios on my data where the G/D loss followed a similar pattern, yet one model produced "better" synthetic outputs after playing around with certain hyperparameters like batch size. The base parameters, I've found, are not one-size-fits-all and depend on the data at hand.

With a GAN for images, one can see the output and judge it based on appearance to the human eye, but with tabular data this is not the case.

A callback during training that computes certain metrics, such as the KS (Kolmogorov–Smirnov) score, every N epochs would be useful; a rough sketch of such checks follows. A visual comparison of the synthetic data against the real data via kernel density plots for selected features might also give better insight into how training is going.
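For illustration, a minimal sketch of both checks, assuming the real and synthetic data are pandas DataFrames that share a numeric column. `ks_2samp` (scipy) and `kdeplot` (seaborn) are standard library calls; the DataFrames and column name are placeholders:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ks_2samp

def ks_score(real, synthetic, column):
    """Two-sample KS statistic between a real and a synthetic column
    (0 = identical empirical distributions, 1 = completely disjoint)."""
    statistic, _p_value = ks_2samp(real[column], synthetic[column])
    return statistic

def kde_comparison(real, synthetic, column):
    """Overlay kernel density estimates of a real vs. synthetic feature."""
    sns.kdeplot(real[column], label="real")
    sns.kdeplot(synthetic[column], label="synthetic")
    plt.legend()
    plt.title(f"{column}: KS = {ks_score(real, synthetic, column):.3f}")
    plt.show()
```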

The ability to save the "best" iteration would also be useful; a select-the-best workaround is sketched below.
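A rough select-the-best workaround, assuming SDV's single-table API (a CTGANSynthesizer plus a prepared metadata object) and reusing the ks_score helper sketched above; the epoch budgets and filename are illustrative only:

```python
# Train a few candidate synthesizers and persist whichever one scores
# best (lowest mean KS statistic across the chosen columns).
from sdv.single_table import CTGANSynthesizer

def train_and_keep_best(real_data, metadata, columns,
                        epoch_budgets=(100, 300, 500)):
    best_score, best_model = float("inf"), None
    for epochs in epoch_budgets:
        synthesizer = CTGANSynthesizer(metadata, epochs=epochs)
        synthesizer.fit(real_data)
        synthetic = synthesizer.sample(num_rows=len(real_data))
        # Lower mean KS = synthetic marginals closer to the real ones.
        score = sum(ks_score(real_data, synthetic, c)
                    for c in columns) / len(columns)
        if score < best_score:
            best_score, best_model = score, synthesizer
    best_model.save("best_synthesizer.pkl")  # SDV pickles the fitted model
    return best_model, best_score
```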

onacrame added the feature request and pending review labels on Jun 7, 2021
nhenscheid commented

Agreed, I think training diagnostics could be improved. I like the suggestion of storing and/or displaying intermediate synthetic data to evaluate "statistics of interest". Implementing this as a callback is probably the way to go, since each user will have a slightly different set of stats they wish to view; one possible shape is sketched below.
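A hypothetical shape for such a hook; SDV does not currently expose a callback API, so the class and method names here are invented purely to illustrate how user-defined "statistics of interest" could plug in:

```python
from typing import Callable, Dict
import pandas as pd

# A metric maps (real, synthetic) DataFrames to a single score.
MetricFn = Callable[[pd.DataFrame, pd.DataFrame], float]

class EvalCallback:
    """Would be invoked by the trainer every `every_n` epochs with a
    fresh synthetic sample, recording each user-supplied metric."""

    def __init__(self, real: pd.DataFrame, metrics: Dict[str, MetricFn],
                 every_n: int = 10):
        self.real, self.metrics, self.every_n = real, metrics, every_n
        self.history = []

    def on_epoch_end(self, epoch: int, synthetic: pd.DataFrame) -> None:
        if epoch % self.every_n == 0:
            scores = {name: fn(self.real, synthetic)
                      for name, fn in self.metrics.items()}
            self.history.append((epoch, scores))
            print(f"epoch {epoch}: {scores}")
```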

npatki (Contributor) commented Jun 30, 2022

Hi @onacrame and @nhenscheid, thanks for filing this issue. We'll keep it open and use it to communicate any updates we make on this front. Are there any particular statistics that you're interested in?

To help us prioritize this feature request, it would be great to learn a little more about your use case. What kind of datasets are you working with and how are you planning to use the synthetic data?

RaphaelRoyerRivard commented Dec 29, 2023

Wow, this is quite an old issue, and I am currently facing the same problem. My PARSynthesizer is training and I don't know whether it is getting better or worse because I have no metric to track its progress. I thought adding a callback would be possible, but after searching for a few hours it seems it is not. I'm writing here to revive this potential feature.

For what it's worth, my use case is generating fake sequences of position reports (GPS locations). I am not getting great results with PAR yet and am trying to figure out why, which proves quite difficult because I can't track the models' progress during training. I also checked synthesizer._model.loss_values, but unlike CTGANSynthesizer, which provides separate losses for the generator and the discriminator, it shows only a single loss.
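For the CTGAN case, a quick way to eyeball training from the losses mentioned above. This assumes a fitted CTGANSynthesizer whose private _model.loss_values is a pandas DataFrame with 'Epoch', 'Generator Loss', and 'Discriminator Loss' columns, which is how recent SDV/CTGAN versions appear to store it; column names may differ by version:

```python
import matplotlib.pyplot as plt

# `synthesizer` is assumed to be an already-fitted CTGANSynthesizer.
losses = synthesizer._model.loss_values  # private attribute, see above
plt.plot(losses["Epoch"], losses["Generator Loss"], label="generator")
plt.plot(losses["Epoch"], losses["Discriminator Loss"], label="discriminator")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.title("CTGAN training losses")
plt.show()
```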
