TVAESynthesizer Model Details and Parameters #2094

Closed
pnimeesha opened this issue Jun 27, 2024 · 2 comments
Labels
question General question about the software resolution:resolved The issue was fixed, the question was answered, etc.

Comments

@pnimeesha

Environment details

  • SDV version: 1.13.1
  • Python version: 3.9.12
  • Operating System: Windows

Problem description

  1. My first question relates to the paper here. In Section 4.5 (TVAE model), you state that the model outputs a joint distribution of 2Nc + Nd variables. The equation (attached below) also introduces two variables, αbar i,j and αhat i,j. Could you explain these, as well as the combined distribution in the last line of the equation?
    [Image: tvae_model_equation]

  2. I have observed that I cannot change the activation function in TVAESynthesizer. Below is a snippet of the model parameters I can change (obtained with synthesizer.get_parameters(), as mentioned in the SDV docs).

  • What is the reasoning for not allowing the activation function to be changed, and for using the ones given in the paper?
  • The l2scale regularization term has a default value of 1e-5. Can you explain exactly what l2scale does and how it affects the model?
  • I see that loss_factor, applied to the reconstruction error, has a default value of 2. The total loss = reconstruction_loss + kl_loss. Does kl_loss also have a scaling factor, and how would that affect training and the total loss?
  • synthesizer.get_loss_values() returns only the total loss. Is there a way to track reconstruction_loss and kl_loss separately?
  • Why must batch_size always be a multiple of 10 rather than a value such as 512 or 256, which are commonly used in training?
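To make the loss_factor and kl_loss bullets concrete, here is a minimal sketch of a VAE-style objective in plain Python. It assumes, as the question describes, that only the reconstruction term is scaled by loss_factor and that the KL term is the standard closed form for a diagonal Gaussian posterior against a standard normal prior. The function names gaussian_kl and total_loss are hypothetical and are not part of the SDV or CTGAN API; the thread does not confirm how the library computes these terms.

```python
import math


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def total_loss(reconstruction_loss, kl_loss, loss_factor=2.0):
    """Weighted objective: only the reconstruction term is scaled.

    Assumption for illustration: the KL term carries an implicit weight of 1.
    """
    return loss_factor * reconstruction_loss + kl_loss


# Keeping the two terms in separate variables is what allows them to be
# logged individually, which is what the question asks about.
kl = gaussian_kl(mu=[1.0, 0.0], logvar=[0.0, 0.0])
loss = total_loss(reconstruction_loss=1.5, kl_loss=kl)
print(f"kl={kl:.4f} total={loss:.4f}")
```

Under this assumption, raising loss_factor makes the optimizer favor accurate reconstruction over keeping the latent distribution close to the prior, which has the same effect as down-weighting the KL term.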

[Image: model parameters snippet]
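As a rough illustration of the l2scale question, the sketch below shows how an L2 penalty coefficient typically enters a gradient update. It assumes l2scale behaves like a standard weight-decay coefficient and uses plain gradient descent for clarity; the thread does not confirm which optimizer the library uses or how it applies this value. The function sgd_step_with_l2 is hypothetical.

```python
def sgd_step_with_l2(weights, grads, lr=0.1, l2scale=1e-5):
    """One gradient step on: loss + (l2scale / 2) * sum(w ** 2).

    The penalty adds l2scale * w to each gradient, so every weight is
    pulled toward zero in proportion to its current size.
    """
    return [w - lr * (g + l2scale * w) for w, g in zip(weights, grads)]


# With zero data gradient, only the penalty acts on the weights.
shrunk = sgd_step_with_l2([1.0, -2.0], [0.0, 0.0], lr=0.1, l2scale=0.5)
unchanged = sgd_step_with_l2([1.0, -2.0], [0.0, 0.0], lr=0.1, l2scale=0.0)
print(shrunk, unchanged)
```

With a coefficient as small as 1e-5, the pull toward zero per step is tiny, so under this assumption the term acts as a mild guard against very large weights rather than a strong constraint.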

I hope my questions are clear.
Thanks in advance!

@pnimeesha pnimeesha added new Automatic label applied to new issues question General question about the software labels Jun 27, 2024
@srinify
Contributor

srinify commented Jul 3, 2024

Hi there @pnimeesha

The SDV library got its start at MIT but is now shepherded by the DataCebo team. Our focus at DataCebo is to help enterprises create synthetic data.

Re: your questions about the math or the algorithms -- nearly all of the paper's authors are independent and not affiliated with DataCebo, so I'd recommend reaching out to them directly to get your questions answered about the intricacies of the models & techniques :)

Re: your questions about why specific functionality isn't available (e.g. loss values aren't exposed for tinkering, or activation functions can't be changed): we prioritize features based on the needs of teams and projects that are creating enterprise-quality synthetic data. So we'd love to learn more about your use case for enterprise synthetic data, if that's relevant!

For TVAESynthesizer in particular, the code lives in the CTGAN repository (https://github.com/sdv-dev/CTGAN/). I recommend reading our license (https://github.com/sdv-dev/CTGAN/blob/main/LICENSE) and exploring any relevant customizations.

@srinify srinify added under discussion Issue is currently being discussed and removed new Automatic label applied to new issues labels Jul 3, 2024
@pnimeesha
Author

Hi Srini,

Thank you for the reply!
I will email the authors of the paper for the explanation and check out the code at the GitHub link.

@srinify srinify added resolution:resolved The issue was fixed, the question was answered, etc. and removed under discussion Issue is currently being discussed labels Jul 6, 2024