
Demo for datasets containing discrete variables? #14

Open
xintao-xiang opened this issue Jun 5, 2021 · 24 comments

@xintao-xiang

xintao-xiang commented Jun 5, 2021

Hi,

I'm trying to apply it to a dataset with discrete variables.
I noticed that there is an MLPDEncoder in modules. However, in the paper you said that we could still use Equation (6) as the encoder. Is there any particular reason that you added a softmax in that encoder?

I'd really appreciate it if you could provide sample code for dealing with discrete variables.

Thank you very much.

==== To add:
In MLPDiscreteDecoder, it seems you assume all the discrete variables have the same number of categories, which is often not the case in practice. How do you deal with that? And what should be done with mixed continuous and discrete variables?

@ItsyPetkov

Hey @xintao-xiang, I think by "using Equation (6)" they mean the architecture remains the same. As you can see, the MLPDEncoder uses the same architecture and just applies one-hot encoding to the input and a softmax to the output. I am trying to get the discrete version of the code to work as well, but so far I have only managed to establish, based on the paper, that I need to use the MLPDEncoder.

I have no idea whether to use the original Decoder or the DiscreteDecoder. I am also not sure if I have to change the loss functions. Let me know how you get on with your code. Maybe we can help each other out?

@xintao-xiang
Author

Hi @ItsyPetkov,
I can describe how I tried to implement it, for your reference, though I have no proof that it is what we want.

I assume the encoder is just producing a latent representation, so I still use the MLPEncoder for all variables. The weight shape depends on the MAX input dimension (i.e., the one-hot representation of the discrete variable with the most categories). For variables with fewer categories, I pad with zeros and do not use those entries in the calculation.
For the decoder, I modify the MLPDiscreteDecoder; similar to the encoder, all the output variables are assumed to have the same dimension. Different softmax layers are used (somewhat hard-coded) for the outputs of the different discrete variables, just like the image shows (the first one has 6 discrete values while the other two have 2 each).
[image: decoder output with a separate softmax per variable; the first variable has 6 categories, the other two have 2 each]
For the loss function, I use a categorical NLL (nll_catogorical) for the discrete variables and keep everything else the same as in the code provided by the authors.
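
Roughly, the per-variable softmax and the masked loss I described look like this (a minimal sketch of my own setup, not the authors' code; the category sizes 6/2/2 are just the toy example from the image, and all names are mine):

```python
import torch
import torch.nn.functional as F

# Toy example: the first variable has 6 categories, the other two have 2 each;
# everything is zero-padded to max_dim = 6, as in the encoder.
n_categories = [6, 2, 2]

def per_variable_softmax(logits):
    # logits: [N, n_vars, max_dim] raw decoder outputs (zero-padded).
    # Apply a separate softmax per variable over its valid categories only.
    probs = torch.zeros_like(logits)
    for i, k in enumerate(n_categories):
        probs[:, i, :k] = F.softmax(logits[:, i, :k], dim=-1)
    return probs

def nll_categorical_padded(probs, target_onehot, eps=1e-16):
    # Categorical NLL computed only on the meaningful (non-padded) dimensions.
    nll = 0.0
    for i, k in enumerate(n_categories):
        nll = nll - (target_onehot[:, i, :k] * torch.log(probs[:, i, :k] + eps)).sum()
    return nll / probs.size(0)
```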

Hope that helps, and I'd really appreciate any ideas you have to share.

@ItsyPetkov

Hi @xintao-xiang, if you are using the MLPEncoder, how do you one-hot encode, or do you do something else?

@xintao-xiang
Author

xintao-xiang commented Jun 27, 2021

@ItsyPetkov I just one-hot encode all the discrete variables and feed them to the encoder. Say we have X1 (2 values) and X2 (3 values): I one-hot encode both and append a column of zeros to X1, so we get an N x 2 x 3 data matrix. Then the input dimension is 3 and the hidden dimension is whatever we want.
And again, I don't know if it is correct, but it looks reasonable, since the latent space is just some representation and does not need a softmax...
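
Concretely, the padding for that example could look like this (a minimal sketch assuming the raw data are integer category indices; all names are mine):

```python
import torch
import torch.nn.functional as F

N = 5
x1 = torch.randint(0, 2, (N,))   # X1: 2 categories
x2 = torch.randint(0, 3, (N,))   # X2: 3 categories

max_dim = 3  # largest number of categories across the variables

# one_hot with num_classes=max_dim pads the smaller variable with zero columns
x1_onehot = F.one_hot(x1, num_classes=max_dim).float()  # [N, 3], last column all zeros
x2_onehot = F.one_hot(x2, num_classes=max_dim).float()  # [N, 3]

data = torch.stack([x1_onehot, x2_onehot], dim=1)       # [N, 2, 3]
```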

@ItsyPetkov

@xintao-xiang Yeah, that makes sense. I do the same thing, but with the MLPDEncoder on benchmark data that has a fixed cardinality of 4, meaning every variable in the dataset has only 4 possible categories. So the output of my version of the encoder has shape X x Y x 4. However, what is the output shape of your decoder?

@xintao-xiang
Author

@ItsyPetkov The output shape of the decoder is just the same as the input of the encoder. So, following the example, the output shape is N x 2 x 3, but with a softmax over two dimensions for X1 and a softmax over three dimensions for X2. Then I just ignore the redundant (padded) dimensions and compute the loss only on the meaningful ones.

@ItsyPetkov

@xintao-xiang Alright, that makes sense. I did the same thing. The only difference I see so far is that your KL-divergence term is calculated using the same function the authors have provided. However, they have also provided two such functions for categorical data. Maybe try using them? They are in utils.py. I haven't tried them yet, so I do not know what will happen, but it is worth a shot?

@xintao-xiang
Author

@ItsyPetkov Yeah, it is worth a try, but I don't see any mathematical justification for using those two; do you have any idea?

@ItsyPetkov

@xintao-xiang Well, I have tried both of them and they do not improve the result at all. But I think my version of the model is wrong, because I use a softmax in the encoder, so I cannot say whether it is good to use them or not.

@ItsyPetkov

Hey @xintao-xiang, have you checked the torch.matmul() line in the forward function of the discrete decoder? There is broadcasting happening there, which might be causing the result to be wrong.

@xintao-xiang
Author

Hi @ItsyPetkov, matmul should only broadcast matrix A there, which should be correct.
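
For anyone following along, torch.matmul broadcasts over the leading (batch) dimensions, so only the un-batched operand gets expanded; a toy illustration (shapes unrelated to the actual tensors in the repo):

```python
import torch

A = torch.randn(2, 3)        # no batch dimension
B = torch.randn(5, 3, 4)     # batch of 5 matrices

out = torch.matmul(A, B)     # A is broadcast across B's batch dimension
print(out.shape)             # torch.Size([5, 2, 4])
```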

@ItsyPetkov

@xintao-xiang Hmm, well, if that is the case, I literally have no idea where a potential mistake might be. What do you do in your forward function for the decoder? I assume you go through the identity function, then the matmul matrix multiplication, and then the result goes through the subsequent layers you have. Is that assumption correct, or do you have more added in there?

@xintao-xiang
Author

@ItsyPetkov Yes, that's correct. Did any problems arise with this setting on your side?

@ItsyPetkov

@xintao-xiang No, that is the problem. I cannot prove that what I am doing is right at this point. :(

@xintao-xiang
Author

@ItsyPetkov Well, I'm not sure if that's correct either. But I guess you could try creating a synthetic dataset with some really simple relationships and see if it works as expected. And please tell me if you do, because I'm also curious about the result, haha :)
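
For example, a toy generator with a single known edge could look something like this (a made-up illustration for a sanity check, not from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# X1 is uniform over 3 categories; X2 copies X1 with probability 0.9,
# otherwise it is random, so the ground-truth graph is just X1 -> X2.
x1 = rng.integers(0, 3, size=N)
x2 = np.where(rng.random(N) < 0.9, x1, rng.integers(0, 3, size=N))

data = np.stack([x1, x2], axis=1)
```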

@ItsyPetkov

@xintao-xiang I managed to get my hands on one of the benchmark datasets, so I am testing with that, but the true positive rate is 43% and the false discovery rate is about 66%. We are on the right track, but it is not completely right at the moment.

@xintao-xiang
Author

@ItsyPetkov Have you tried tuning the hyperparameters? Models like this can sometimes be quite sensitive to them.

@ItsyPetkov

@xintao-xiang Not really, that is a good idea though. I'll try and see what I find. Thank you!

@ItsyPetkov

@xintao-xiang What are you using for the one-hot encoding of the data before feeding it into the encoder? Are you using nn.Embedding?


@xintao-xiang
Author

@ItsyPetkov Sorry for the late reply; I use plain one-hot encoding, but I guess in theory nn.Embedding should also work.
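
For the record, the two options side by side (a small sketch; the embedding dimension is arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([0, 2, 1])                     # integer category indices, 3 categories

onehot = F.one_hot(x, num_classes=3).float()    # fixed, parameter-free: shape [3, 3]

emb = nn.Embedding(num_embeddings=3, embedding_dim=4)
learned = emb(x)                                # learned, trainable: shape [3, 4]
```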

@ItsyPetkov

@xintao-xiang I think there is a fundamental problem with the model: as written it is an AE, not a VAE. You need to add a reparameterization step, and you need to fix the KLD term, because it is wrong.
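
For concreteness, the standard Gaussian reparameterization and KLD terms look like this (textbook VAE formulation, not the repo's actual code):

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I); keeps sampling differentiable.
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

def kl_divergence(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
    return -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
```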

@xintao-xiang
Author

@ItsyPetkov Yes, it does look like an AE rather than a VAE. But does fixing that give better results? In fact, I noticed this and modified the code, but it produced some strange results and could not even manage to reconstruct the input samples.

@ItsyPetkov

@xintao-xiang In theory it should. I haven't managed to make it work yet, though; I have only managed to get the same result, and that was by tweaking hyperparameters.
