Support mini-batch training in VAE #3
The current implementation in #10 mostly works, but with a gotcha concerning the scaling of losses. In fact, there are three open questions, and one puzzle which is probably related.

Question 1: scaling between neighbours and non-neighbours
When computing the reconstruction loss on the adjacency matrix, we previously used

Question 2: scaling between KL loss and reconstruction loss
After the correction for point 1 of question 1, we are effectively training with a KL loss and a reconstruction loss that have more or less the same magnitude (something around
So, three possibilities:

Question 3: mini-batch sparsity
In the case where we have a sparse network, mini-batching with randomly selected nodes will almost always select nodes that are not directly connected, such that the adjacency matrix restricted to the mini-batch will almost always be the identity matrix. So training on this is bound to be very bad. Should we instead select nodes that have a bigger chance of being connected? Or maybe we can scale the reconstruction loss so that the training works even if most of the sub-adjacency matrices it sees are identity matrices (but it jumps on the occasion when it sees a connection between nodes)?

Puzzle: why does the KL fall to 0
When training with the situation described in question 2 (so equally scaled KL and reconstruction), most often the KL drops to almost 0 (
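
To put a rough number on the sparsity concern in Question 3, here is a minimal sketch, not taken from the repository. It assumes an undirected graph stored as an edge list and a mini-batch drawn uniformly without replacement. The function names and the graph sizes (1000 nodes, 2000 edges, batch of 10) are hypothetical and chosen only for illustration.

```python
import random

def expected_batch_edges(n_nodes, n_edges, batch_size):
    """Expected number of edges with both endpoints in a uniform random batch."""
    # A given edge is kept only if both endpoints are drawn:
    # probability b(b-1) / (N(N-1)) when sampling without replacement.
    p_both = batch_size * (batch_size - 1) / (n_nodes * (n_nodes - 1))
    return n_edges * p_both

def count_batch_edges(edges, batch):
    """Count edges whose two endpoints both lie in the batch."""
    batch = set(batch)
    return sum(1 for u, v in edges if u in batch and v in batch)

def random_sparse_graph(n_nodes, n_edges, rng):
    """Sample distinct undirected edges (no self-loops) uniformly at random."""
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(range(n_nodes), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)

# Hypothetical sizes, for illustration only.
rng = random.Random(0)
n_nodes, n_edges, batch_size = 1000, 2000, 10
edges = random_sparse_graph(n_nodes, n_edges, rng)

# Fraction of random batches that contain no edge at all, i.e. whose
# sub-adjacency matrix has nothing off the diagonal.
trials = 1000
empty = sum(
    count_batch_edges(edges, rng.sample(range(n_nodes), batch_size)) == 0
    for _ in range(trials)
)
print("expected edges per batch:", expected_batch_edges(n_nodes, n_edges, batch_size))
print("fraction of edge-free batches:", empty / trials)
```

With these sizes the expected number of edges per batch is about 0.18, so most batches carry no connection at all. The same formula gives a quick way to check how large the batch would have to be before uniform sampling becomes usable.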
After the meeting with Marton and @jaklevab.
Question 1
Question 2
Question 3
Puzzle
Note that a mini-batch with convolutions needs access to all the neighbouring nodes it will include in convolutions, on top of the nodes in the mini-batch for which we compute an update.
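
The point above can be sketched as a neighbourhood expansion: start from the mini-batch nodes and collect everything within reach of the convolutions. This is a minimal sketch, not the project's code. It assumes each convolution layer aggregates over direct neighbours only, so `n_hops` would equal the number of layers, and it assumes the graph is available as a dict of neighbour lists. `expand_batch` and `adjacency` are hypothetical names.

```python
def expand_batch(adjacency, batch, n_hops=1):
    """Return the batch nodes plus all nodes within n_hops of them.

    adjacency: dict mapping node -> iterable of neighbours (assumed format).
    """
    needed = set(batch)
    frontier = set(batch)
    for _ in range(n_hops):
        # Neighbours of the current frontier that we have not collected yet.
        frontier = {v for u in frontier for v in adjacency.get(u, ())} - needed
        needed |= frontier
    return needed

# Toy path graph 0 - 1 - 2 - 3, plus an isolated node 4.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
batch = [0]
print(sorted(expand_batch(adjacency, batch, n_hops=1)))  # [0, 1]
print(sorted(expand_batch(adjacency, batch, n_hops=2)))  # [0, 1, 2]
```

The loss would still be computed only on the original batch nodes; the extra nodes are there just to feed the convolutions.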