# Backpropagation Through a Single *Sampled* Logit
We test whether we can backpropagate through a single sampled logit. If gradients are allowed to flow through only one logit, then each training iteration affects only one column of the weight matrix: the column whose weights produce that logit.
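The one-column claim can be checked by hand with a small backward pass. The sketch below is plain Python rather than TensorFlow, and every weight value in it is made up for illustration; only the shapes and the one-hot masking mirror the model in this notebook. The helper names `matmul`, `transpose`, and `grad_W_gen` are hypothetical.

```python
# Hypothetical example: all weight values below are invented for illustration.
input_dim, hidden_dim, output_dim = 3, 5, 3

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Swap the rows and columns of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

x = [[1.0] * input_dim]    # shape [1, input_dim]
y = [[1.0] * output_dim]   # shape [1, output_dim]
W_gen = [[0.1 * (i + j + 1) for j in range(hidden_dim)] for i in range(input_dim)]
W_dis = [[0.05 * (i - j + 3) for j in range(output_dim)] for i in range(hidden_dim)]

def grad_W_gen(index):
    """Gradient of the mean squared error w.r.t. W_gen when only logit `index` passes."""
    one_hot = [1.0 if j == index else 0.0 for j in range(hidden_dim)]
    logits = matmul(x, W_gen)                                   # [1, hidden_dim]
    masked = [[l * m for l, m in zip(logits[0], one_hot)]]      # zero all but one logit
    output = matmul(masked, W_dis)                              # [1, output_dim]
    # d loss / d output for loss = mean((output - y)^2)
    d_output = [[2.0 * (o - t) / output_dim for o, t in zip(output[0], y[0])]]
    d_masked = matmul(d_output, transpose(W_dis))               # [1, hidden_dim]
    d_logits = [[g * m for g, m in zip(d_masked[0], one_hot)]]  # the mask blocks the rest
    return matmul(transpose(x), d_logits)                       # [input_dim, hidden_dim]

grad = grad_W_gen(index=1)
nonzero_columns = sorted({j for row in grad for j, g in enumerate(row) if g != 0.0})
print(nonzero_columns)  # prints [1]
```

With these values, the raw gradient is nonzero only in the column of the selected logit, whichever index is chosen. What an optimizer then does with that gradient is a separate question.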

In [36]:
import tensorflow as tf
from tensorflow.contrib.distributions import Categorical
tf.reset_default_graph()

input_dim = 3
hidden_dim = 5
output_dim = 3
lr = 1e-1
num_iterations = 50
print_every = 1

x = tf.fill([1, input_dim], 1.)
y = tf.fill([1, output_dim], 1.)


with tf.name_scope('Model'):
    W_gen = tf.Variable(tf.random_uniform([input_dim, hidden_dim]), name = 'W_gen')
    logits = tf.matmul(x, W_gen)

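    # Sample one logit index; stop_gradient treats the sample as a constant.
    sample_op = tf.stop_gradient(Categorical(logits).sample(n=1))
    index = tf.squeeze(sample_op)
    # Fixed-index alternative (no sampling):
    # index = tf.constant(0, dtype=tf.int32)
    one_hot = tf.one_hot(index, hidden_dim, dtype = tf.float32)
    # Keep only the sampled logit; every other entry becomes zero.
    logits = logits * one_hot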

    W_dis = tf.Variable(tf.random_uniform([hidden_dim, output_dim]), name = 'W_dis')
    output = tf.matmul(logits, W_dis)

        
with tf.name_scope('Loss'):
    loss_op = tf.reduce_mean(tf.squared_difference(output, y))

with tf.name_scope('Train'):
    train_vars = [W_gen]
    train_op = tf.train.AdamOptimizer(lr).minimize(loss_op, var_list = train_vars)
    
    
with tf.Session() as sess:
    init_op = tf.initialize_all_variables()
    sess.run(init_op)

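    for i in range(num_iterations):
        if i % print_every == 0:
            # Note: each sess.run call draws a fresh sample, so the index printed
            # here is not necessarily the one train_op uses below.
            print('Loss at iteration %d: %f' % (i, sess.run(loss_op)))
            print('Sample: [%d]' % sess.run(index))
            print(sess.run(W_gen))
        sess.run(train_op)
    print(sess.run(output))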


Loss at iteration 0: 0.149865
Sample: [1]
[[ 0.81418228  0.8655349   0.16064     0.55864608  0.35103011]
 [ 0.26089203  0.69035411  0.64020491  0.02805829  0.99758911]
 [ 0.56620026  0.52124786  0.23499095  0.59907818  0.44014001]]
Loss at iteration 1: 0.149865
Sample: [1]
[[ 0.81418228  0.76553494  0.16064     0.55864608  0.35103011]
 [ 0.26089203  0.59035414  0.64020491  0.02805829  0.99758911]
 [ 0.56620026  0.4212479   0.23499095  0.59907818  0.44014001]]
Loss at iteration 2: 0.181932
Sample: [0]
[[ 0.81418228  0.69852936  0.23505335  0.55864608  0.35103011]
 [ 0.26089203  0.52334857  0.71461827  0.02805829  0.99758911]
 [ 0.56620026  0.35424232  0.30940431  0.59907818  0.44014001]]
Loss at iteration 3: 0.431700
Sample: [3]
[[ 0.81418228  0.64673406  0.29257485  0.55864608  0.28714925]
 [ 0.26089203  0.47155327  0.77213979  0.02805829  0.93370825]
 [ 0.56620026  0.30244702  0.36692584  0.59907818  0.37625915]]
Loss at iteration 4: 0.023184
Sample: [1]
[[ 0.81418228  0.59733492  0.3

### Analysis
Training is very volatile, even for the most trivial task imaginable: over the first five iterations the reported loss moves from 0.150 to 0.432 and then down to 0.023. This does not appear to be a plausible way to proceed.

### TensorFlow Bug
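This looks like a mathematical impossibility: for a sample $i$, columns $j \neq i$ of the generator weight matrix $W_{gen}$ are also updating! This only occurs when I'm using `Categorical`. This is filed as [Issue 4074](https://github.com/tensorflow/tensorflow/issues/4074). Two properties of the setup may account for at least part of this: each `sess.run` call draws a fresh sample, so the printed index need not be the one `train_op` used, and Adam keeps applying momentum-driven updates to a column after its gradient returns to zero.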
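Before attributing the stray updates to `Categorical`, it is worth checking how Adam treats a weight that received one nonzero gradient and then only zeros. The sketch below is a scalar Adam update in plain Python, using the documented defaults of `tf.train.AdamOptimizer` (`beta1=0.9`, `beta2=0.999`, `epsilon=1e-8`) and the notebook's `lr = 1e-1`. The function name `adam_steps` and the gradient history `[1.0, 0.0, 0.0]` are assumptions made for illustration, not values read from the graph.

```python
import math

def adam_steps(grads, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the per-step parameter change Adam applies to one scalar weight."""
    m, v, deltas = 0.0, 0.0, []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1.0 - beta1) * g       # first-moment (momentum) estimate
        v = beta2 * v + (1.0 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)          # bias-corrected moments
        v_hat = v / (1.0 - beta2 ** t)
        deltas.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return deltas

# Hypothetical history for one weight: its logit is sampled once, then never again.
deltas = adam_steps([1.0, 0.0, 0.0])
print([round(d, 4) for d in deltas])
```

The step sizes come out at roughly 0.100, 0.067, and 0.052, even though the last two gradients are exactly zero. These match the successive changes in column 1 of `W_gen` in the printed output (0.8655 → 0.7655 → 0.6985 → 0.6467), which suggests that at least those updates come from Adam's moving averages rather than from gradients leaking into unsampled columns. Swapping in plain gradient descent would be one way to test this.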


In [None]:
# Backpropagation Through Distirbut