batch_normalize=True doesn't work accurately with phase=Phase.* setting #23
Comments
In most typical usage, batch normalization is only computed from the batch statistics during training, and a moving average is tracked for inference time, when the batch size tends to be 1. Because of this, Phase.infer and Phase.test use variables in the graph that tracked the mean/stddev of the batches during training. I feel like the infer/test paths may need better documentation to clear this up. Are you having the problem when using pt.Phase.train as well? |
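As a rough sketch of the behavior described above (a toy NumPy model, not PrettyTensor's implementation): training normalizes with the current batch's statistics while tracking a moving average, and the test/infer path reuses the moving average so that even a batch of 1 gets stable outputs. The class and parameter names here are illustrative.

```python
import numpy as np

class SketchBatchNorm:
    """Toy batch norm illustrating the train vs. test phase split."""
    def __init__(self, dim, decay=0.9, eps=1e-5):
        # Moving statistics start at mean 0, variance 1.
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)
        self.decay, self.eps = decay, eps

    def __call__(self, x, phase="train"):
        if phase == "train":
            # Normalize with this batch's own statistics...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and track a moving average for later inference.
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:  # analogue of Phase.test / Phase.infer
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)
```

After enough training batches the moving statistics approximate the data distribution, so a test-time batch of size 1 is normalized sensibly instead of being forced to mean 0 by its own (degenerate) batch statistics.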
Yes, that is correct @eiderman. I use |
I've checked the implementation and it should be doing the correct thing. @jramapuram, would you mind explaining the graph to me? Also, how does this impact the evaluation metrics for the relevant loss on the test set? |
I have a convolutional variational autoencoder which is mapping to a two-dimensional latent space; thus it disentangles the manifold seen above (of MNIST). When I do not use the phase parameter the problem does not occur. My train/test objects are simply this [note: in train, the phase is default valued to Phase.train]:

```python
with tf.variable_scope("z"):  # Encode our data into z and return the mean and covariance
    self.z_mean, self.z_log_sigma_sq = self.encoder(self.inputs, latent_size)
    self.z = tf.add(self.z_mean,
                    tf.mul(tf.sqrt(tf.exp(self.z_log_sigma_sq)), eps))
    # Get the reconstructed mean from the decoder
    self.x_reconstr_mean = self.decoder(self.z, self.input_size)
    self.z_summary = tf.histogram_summary("z", self.z)

with tf.variable_scope("z", reuse=True):  # The test z
    self.z_mean_test, self.z_log_sigma_sq_test = self.encoder(self.inputs, latent_size,
                                                              phase=pt.Phase.test)
    self.z_test = tf.add(self.z_mean_test,
                         tf.mul(tf.sqrt(tf.exp(self.z_log_sigma_sq_test)), eps))
    # Get the reconstructed mean from the decoder
    self.x_reconstr_mean_test = self.decoder(self.z_test, self.input_size, phase=pt.Phase.test)
```
|
Batch normalization is behaving correctly, but I would really like to understand this phenomenon more because it may have modeling implications on best practice for BN. One experiment that may help verify it is whether your test results are as good when running smaller batches rather than all 10k at once. It may be that normalizing the output based on all test examples results in a cleaner embedding; the default inference behavior of BN is geared towards generating correct and stable predictions for small batch sizes.

It would be interesting to see how the accuracy changes on the test set if you were to attach a softmax layer to the embedding (and not train the lower layers).

Yet another aspect that would be interesting to test is which projection works better as a VAE. Since one of the goals is to make a decoder that can be easily sampled to generate new results, I suspect that a denser region of digits may work better, since there is less likely to be junk space that produces non-digits within the sample space. |
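The batch-size experiment suggested above can be sketched numerically (the data and names here are illustrative, not from the repo): when train-style normalization runs at test time, each batch is normalized by its own statistics, so the embedding an example receives depends on which batch it lands in and on the batch size.

```python
import numpy as np

rng = np.random.default_rng(1)
test_set = rng.normal(3.0, 1.5, size=(10000, 2))  # stand-in for 10k test embeddings

def normalize_by_batch_stats(x):
    """Train-phase-style BN: use the statistics of whatever batch you got."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)

# Normalizing all 10k examples at once uses very stable statistics...
full = normalize_by_batch_stats(test_set)

# ...while normalizing in small batches lets each batch's noisy mean/std
# shift its members around, so the same example gets a different embedding.
small = np.vstack([normalize_by_batch_stats(b) for b in test_set.reshape(1000, 10, 2)])

per_example_shift = np.abs(full - small).mean()
```

The nonzero `per_example_shift` is exactly why inference-phase BN prefers the tracked moving averages: they make the output of a given example independent of its batch.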
@eiderman : Will give it a shot for smaller batch sizes (i.e. the same as training). However, this still doesn't answer why it works when no phase parameter is provided. Does batch normalization turn off without a provided phase parameter? I'm not sure the softmax layer makes sense: this is a purely unsupervised problem, so there are no class labels that could be used to update the softmax's weights & biases. I'm assuming you are talking about softmax+cross-entropy as an optimization objective. |
Jason, with the phase set, batch normalization uses the statistics tracked during training. If you do not set the phase, it defaults to 'train' in both cases.

To test 1 & 2, I would recommend computing the test reconstruction on smaller batches. To test 3, just sample from the model and make sure to hit the white space.

On Sat, Jun 4, 2016 at 5:23 AM, Jason Ramapuram notifications@github.com
|
@eiderman : I updated my logic to do inference using only FLAGS.batch_size examples at a time:

```python
def plot_2d_cvae(sess, source, cvae):
    z_mu = []
    y_sample = []
    for _ in range(np.floor(10000.0 / FLAGS.batch_size).astype(int)):
        x_sample, y = source.test.next_batch(FLAGS.batch_size)
        z_mu.append(cvae.transform(sess, x_sample))
        y_sample.append(y)

    z_mu = np.vstack(z_mu)
    y_sample = np.vstack(y_sample)
    print 'z.shape = ', z_mu.shape, ' | y_sample.shape = ', y_sample.shape

    plt.figure(figsize=(8, 6))
    plt.scatter(z_mu[:, 0], z_mu[:, 1], c=np.argmax(y_sample, 1))
    plt.colorbar()
    plt.savefig("models/2d_cluster.png", bbox_inches='tight')
    #plt.show()
```

When the phase is set to test it looks like the same issue is present. However, setting phase=train for both test & train does not show the problem.

To address your points, listed below is the reconstruction when Phase.test is set accurately:

And here is when using Phase.train:

When using batch normalization with the running mean, it appears to be projecting to ~the same location (as per the reconstruction). Thus I believe that there is something wrong with the batch_normalization implementation on the conv2d op. |
My apologies for being obtuse. BN is working as intended, but there is a catch: the moving averages used at inference time are only updated when the corresponding update ops are run. These are executed by adding a dependency on pt.with_update_ops. This is really a poor API to trickle out to other users, so I will fix it.

On Mon, Jun 6, 2016 at 7:16 AM, Jason Ramapuram notifications@github.com
|
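The failure mode described above can be sketched outside of TensorFlow (an illustrative model of the behavior, not PrettyTensor's actual code): if the moving-average "update ops" never run during training, the running statistics stay at their initial values (mean 0.0, variance 1.0), and the test-phase graph normalizes with the wrong statistics.

```python
import numpy as np

def train_bn(batches, run_update_ops):
    """Return the moving (mean, var) after 'training' over the batches."""
    moving_mean, moving_var, decay = 0.0, 1.0, 0.9
    for batch in batches:
        # The forward pass would normalize with batch statistics here;
        # the moving averages only change if the update ops actually run
        # (analogous to adding a dependency on pt.with_update_ops).
        if run_update_ops:
            moving_mean = decay * moving_mean + (1 - decay) * batch.mean()
            moving_var = decay * moving_var + (1 - decay) * batch.var()
    return moving_mean, moving_var

rng = np.random.default_rng(2)
batches = [rng.normal(4.0, 2.0, size=128) for _ in range(100)]

# Without the update ops the stats are stuck at init (0.0, 1.0), so a
# test-phase graph would shift and scale activations incorrectly.
stale = train_bn(batches, run_update_ops=False)
fresh = train_bn(batches, run_update_ops=True)
```

With `stale` statistics, every test activation is normalized against mean 0 / variance 1 regardless of what the network actually produces during training, which matches the collapsed reconstructions reported above.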
Great! Thanks! |
I added a fix to automatically compute the running variance/mean for inference time. If you have any other issues, please let me know!

I'm a little surprised at how poorly the model did with the initial variance (1.0) and mean (0.0); I would have expected training to have made it somewhat resilient to scale and shift of the features. |
Great! Will give it a shot and get back |
Thanks for the assistance @eiderman ! It is working as intended now. |
I believe that there is an error when using `phase` in the `default_scope` coupled with `batch_normalize=True`. Basically it looks like this:

Full code here: https://github.com/jramapuram/CVAE/blob/master/cvae.py

If I remove `phase=phase` within the scope assignment my model produces the following:

However, when setting the phase appropriately I get the following:

This is trained for the same number of iterations using the same model.