# Autoregressive Models
**Jin Yeom**  
jin.yeom@hudl.com

In [1]:
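import tensorflow as tf
import tensorflow_probability as tfp

# TF 1.x: eager execution must be enabled at program startup (it is the default only in TF 2.x)
tf.enable_eager_execution()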


## 1. Warmup

First, run the following code. It will generate a dataset of samples $x \in \{1, \dots, 100\}$. Take the first 80% of the samples as a training set and the remaining 20% as a test set.

In [2]:
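import numpy as np

def sample_data():
    count = 10000
    rand = np.random.RandomState(0)
    # Equal-weight mixture of two Gaussians: N(0.3, 0.1^2) and N(0.8, 0.05^2)
    a = 0.3 + 0.1 * rand.randn(count)
    b = 0.8 + 0.05 * rand.randn(count)
    mask = rand.rand(count) < 0.5
    samples = np.clip(a * mask + b * (1 - mask), 0.0, 1.0)
    # Discretize [0, 1] into integer bins; np.digitize returns values in {1, ..., 100}
    return np.digitize(samples, np.linspace(0.0, 1.0, 100))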

In [3]:
data = sample_data()
training_set = data[:int(len(data)*0.8)]
test_set = data[int(len(data)*0.8):]

print("total data =", len(data))
print("training data =", len(training_set))
print("test data =", len(test_set))

total data = 10000
training data = 8000
test data = 2000


Let $\theta = (\theta_{1}, \dots, \theta_{100}) \in \mathbb{R}^{100}$, and define the model

$$
p_{\theta}(x) = \frac{e^{\theta_{x}}}{\sum_{x'}{e^{\theta_{x'}}}}
$$
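Because this model can represent any distribution over the 100 values, maximum likelihood also has a closed-form answer that is handy for sanity-checking the SGD result: the optimum makes $p_\theta$ equal to the empirical frequencies, i.e. $\theta_x = \log \hat{p}(x)$ up to an additive constant, where $\hat{p}(x)$ is the fraction of training samples equal to $x$. The training NLL it attains is the entropy of the empirical distribution, a lower bound for the SGD curve. A minimal NumPy sketch, assuming integer samples in $\{1, \dots, 100\}$; `mle_theta` and `empirical_entropy_bits` are hypothetical helper names, not part of the assignment.

```python
import numpy as np

def mle_theta(samples, num_values=100):
    """Closed-form maximum likelihood logits for p_theta(x) = softmax(theta)_x.

    Assumes integer samples in {1, ..., num_values}. The MLE sets softmax(theta)
    equal to the empirical frequencies, so theta_x = log(count_x / N) is one
    solution (adding the same constant to every entry gives the same model).
    Values that never occur get theta_x = -inf.
    """
    counts = np.bincount(samples - 1, minlength=num_values)
    with np.errstate(divide="ignore"):  # log(0) -> -inf for unseen values
        return np.log(counts / len(samples))

def empirical_entropy_bits(samples, num_values=100):
    """Average training NLL of the MLE in bits: the entropy of the empirical distribution."""
    freqs = np.bincount(samples - 1, minlength=num_values) / len(samples)
    nonzero = freqs[freqs > 0]
    return -np.sum(nonzero * np.log2(nonzero))

# Stand-in data over {1, ..., 4} (not the notebook's dataset):
toy = np.array([1, 1, 2, 4])
theta_hat = mle_theta(toy, num_values=4)
print(np.exp(theta_hat))                          # recovers the empirical frequencies
print(empirical_entropy_bits(toy, num_values=4))  # 1.5
```

Note that the unseen value gets $\theta_x = -\infty$, so this exact MLE would assign infinite NLL to any test sample it never saw in training; the SGD fit from $\theta = 0$ keeps every $\theta_x$ finite.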

Fit $p_{\theta}$ with maximum likelihood via stochastic gradient descent on the training set, using $\theta$ initialized to zero. Use your favorite version of stochastic gradient descent, and optimize your hyperparameters on a validation set of your choice.
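One way to get a validation set is to hold out a random slice of the 8000 training samples, leaving the 20% test set untouched until the final report. A minimal sketch; the helper name `split_train_val`, the 10% fraction, and the fixed seed are all assumptions for illustration.

```python
import numpy as np

def split_train_val(train, val_fraction=0.1, seed=0):
    """Shuffle the training samples once and hold out val_fraction of them.

    Returns (train_part, val_part). The test set is never touched here.
    """
    rng = np.random.RandomState(seed)
    perm = rng.permutation(train)
    n_val = int(len(train) * val_fraction)
    return perm[n_val:], perm[:n_val]

# Stand-in for the notebook's 8000-sample training_set:
train_part, val_part = split_train_val(np.arange(8000))
print(len(train_part), len(val_part))  # 7200 800
```

Hyperparameters such as the learning rate and batch size would then be compared by validation NLL, with the test set used once at the end.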

In [4]:
# theta = 0 makes p_theta uniform over the 100 values
theta = tf.get_variable('theta', shape=[100], dtype=tf.float32, initializer=tf.zeros_initializer())

In [11]:
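def softmax_dist(theta):
    # Categorical(logits=theta) is exactly p_theta(x) = softmax(theta)_x; passing the
    # logits rather than precomputed softmax probabilities is more numerically stable
    return tfp.distributions.Categorical(logits=theta)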

Over the course of training, record the average negative log likelihood of the training data (per minibatch) and of the validation data (over your entire validation set). Plot both on the same graph: the x-axis should be training steps, and the y-axis should be negative log likelihood. Feel free to compute and report the validation performance less frequently. Report the test set performance of your final model. Be sure to report all negative log likelihoods in bits.
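The bookkeeping asked for above can be sketched without TensorFlow: for this model the gradient of the mean minibatch NLL with respect to $\theta$ is simply $\mathrm{softmax}(\theta)$ minus the minibatch's empirical frequencies, so plain SGD fits in a few lines of NumPy. This is a minimal sketch under stated assumptions: it swaps in plain SGD with a fixed learning rate for the Adam optimizer used in the TensorFlow cell, and `train_sgd`, `avg_nll_bits`, `lr=0.5`, `batch_size=16`, `epochs=5`, and `eval_every=50` are hypothetical choices, not tuned values. Dividing a natural-log NLL by $\ln 2$ converts nats to bits.

```python
import numpy as np

def log_softmax(theta):
    """log p_theta(x) for every x, computed stably by subtracting max(theta)."""
    shifted = theta - np.max(theta)
    return shifted - np.log(np.sum(np.exp(shifted)))

def avg_nll_bits(theta, samples):
    """Average NLL in bits of integer samples in {1, ..., K} under softmax(theta)."""
    return -np.mean(log_softmax(theta)[samples - 1]) / np.log(2.0)

def train_sgd(train, val, num_values=100, lr=0.5, batch_size=16,
              epochs=5, eval_every=50, seed=0):
    """Plain minibatch SGD on the softmax model, recording NLL curves in bits.

    Returns theta plus two lists of (step, nll_bits) pairs: one entry per training
    minibatch, and one every eval_every steps for the full validation set.
    """
    rng = np.random.RandomState(seed)
    theta = np.zeros(num_values)  # start from theta = 0 (uniform model)
    train_curve, val_curve = [], []
    step = 0
    for _ in range(epochs):
        perm = rng.permutation(train)
        for i in range(0, len(perm), batch_size):
            batch = perm[i:i + batch_size]
            train_curve.append((step, avg_nll_bits(theta, batch)))
            if step % eval_every == 0:
                val_curve.append((step, avg_nll_bits(theta, val)))
            # d(mean NLL)/d(theta) = softmax(theta) - empirical frequencies of the batch
            probs = np.exp(log_softmax(theta))
            freqs = np.bincount(batch - 1, minlength=num_values) / len(batch)
            theta -= lr * (probs - freqs)
            step += 1
    return theta, train_curve, val_curve

# Toy usage on stand-in data over {1, ..., 4} (not the notebook's dataset):
rng = np.random.RandomState(1)
p_true = [0.7, 0.1, 0.1, 0.1]
train = rng.choice(np.arange(1, 5), size=2000, p=p_true)
val = rng.choice(np.arange(1, 5), size=500, p=p_true)
theta, train_curve, val_curve = train_sgd(train, val, num_values=4)
print(val_curve[0][1])           # about 2.0 bits: uniform over 4 values at theta = 0
print(avg_nll_bits(theta, val))  # below 2.0; the toy distribution's entropy is about 1.36 bits
```

Each curve is a list of `(step, nll_bits)` pairs, so `matplotlib.pyplot.plot(*zip(*train_curve))` followed by the same call on `val_curve` draws both on one set of axes.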

In [6]:
def minibatches(data, batch_size):
    # Shuffle once per pass over `data`, then yield consecutive slices of batch_size samples
    perm = np.random.permutation(data)
    for i in range(0, len(data), batch_size):
        yield tf.constant(perm[i:i+batch_size])

In [12]:
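optimizer = tf.train.AdamOptimizer()
train_nll_bits = []  # average NLL of each training minibatch, in bits
for batch in minibatches(training_set, 16):
    with tf.GradientTape() as tape:
        model = softmax_dist(theta)
        # sample_data() returns values in {1, ..., 100}; the Categorical indexes classes 0, ..., 99
        nll = tf.reduce_mean(-model.log_prob(batch - 1))
    # Under eager execution, optimizer.minimize() expects a callable loss rather than a tensor,
    # so take the gradient from the tape and apply it explicitly
    grads = tape.gradient(nll, [theta])
    optimizer.apply_gradients(zip(grads, [theta]))
    train_nll_bits.append(nll.numpy() / np.log(2))  # log_prob is in nats; convert to bits

At $\theta = 0$ the model is uniform, so the first minibatch's NLL is $\ln 100 \approx 4.6052$ nats (printed as `tf.Tensor(4.6051702, shape=(), dtype=float32)`), or $\log_2 100 \approx 6.644$ bits. Passing the tensor `nll` directly to `optimizer.minimize(nll)` fails under eager execution with:

RuntimeError: `loss` passed to Optimizer.compute_gradients should be a function when eager execution is enabled.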

I take back EVERYTHING I said about eager execution. It sucks.

TODO: redo these without eager execution