## STATS 507 HW10: Google TensorFlow

## Problem 1: Warm up: constructing a 3-tensor
* Create a TensorFlow constant tensor __`tflogo`__ with shape 5-by-4-by-3. (five cells tall, four wide, three deep)
* This tensor will represent the 5-by-4-by-3 volume that contains the orange structure depicted in the logo (said another way, the orange structure is inscribed in this 5-by-4-by-3 volume). 
* Each cell of your tensor should correspond to one cell in this volume. 
* Each entry of your tensor should be 1 if and only if the corresponding cell is part of the orange structure, and should be 0 otherwise. 
* Looking at the logo, we see that the orange structure can be __broken into 11 cubic cells, so your tensor `tflogo` should have precisely 11 non-zero entries__. 
* For the sake of consistency, the (0, 3, 2)-entry of your tensor (using 0-indexing) should correspond to the top rear corner of the structure where the cross of the “T” meets the top of the “F”. 
* __Note__: if you look carefully, the shadows in the logo do not correctly reflect the orange structure—the shadow of the “T” is incorrectly drawn. Do not let this fool you!
* __Hint__: you may find it easier to create a Numpy array representing the structure first, then turn that Numpy array into a TensorFlow constant. 
* __Second hint__: as a sanity check, try printing your tensor. You should see a series of 4-by-3 matrices, as though you were looking at one horizontal slice of the tensor at a time, working your way from top to bottom.


In [51]:
import numpy as np
import tensorflow as tf

In [52]:
# five horizontal 4-by-3 layers, from top (layer0) to bottom (layer4)
layer0 = np.array([[0, 0, 1], 
                   [0, 0, 1],
                   [0, 0, 1],
                   [1, 1, 1]])
layer1 = np.array([[0, 0, 0], 
                   [0, 0, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
layer2 = np.array([[0, 0, 0], 
                   [0, 0, 0],
                   [0, 1, 1],
                   [0, 0, 0]])
layer3 = np.array([[0, 0, 0], 
                   [0, 0, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
layer4 = np.array([[0, 0, 0], 
                   [0, 0, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
combine = np.array([layer0, layer1, layer2, layer3, layer4])
combine.shape

(5, 4, 3)

In [53]:
# create tf constant
tflogo = tf.constant(combine)
# T: add up axis 2 (z-direction)
T = tf.reduce_sum(tflogo, axis=2)
# F: add up axis 1 (y-direction)
F = tf.reduce_sum(tflogo, axis=1)
# check the "T" and "F" 
with tf.Session() as sess:
    print(sess.run(T))
    print(sess.run(F))

[[1 1 1 3]
 [0 0 1 0]
 [0 0 2 0]
 [0 0 1 0]
 [0 0 1 0]]
[[1 1 4]
 [0 0 1]
 [0 1 1]
 [0 0 1]
 [0 0 1]]
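As a cross-check that does not need a TensorFlow session, here is a minimal NumPy-only sketch that builds the same 5-by-4-by-3 array by switching on the 11 cells directly (indices read off `layer0` to `layer4` above) and verifies the constraints stated in the problem. `cells` and `tflogo_np` are names chosen for this sketch only.

```python
import numpy as np

# (layer, row, col) indices of the 11 orange cells, read off layer0..layer4 above;
# layer 0 is the top horizontal slice. Names here are illustrative.
cells = (
    [(0, r, 2) for r in range(4)]        # top layer: column 2 in all four rows
    + [(0, 3, c) for c in range(2)]      # top layer: row 3, columns 0 and 1
    + [(d, 2, 2) for d in range(1, 5)]   # layers 1-4: row 2, column 2
    + [(2, 2, 1)]                        # layer 2: the extra cell at row 2, column 1
)

tflogo_np = np.zeros((5, 4, 3), dtype=np.int64)
for layer, row, col in cells:
    tflogo_np[layer, row, col] = 1

# Constraints stated in the problem
assert tflogo_np.sum() == 11
assert tflogo_np[0, 3, 2] == 1

# Same projections as the TensorFlow check: sum over axis 2 ("T") and axis 1 ("F")
T_proj = tflogo_np.sum(axis=2)
F_proj = tflogo_np.sum(axis=1)
print(T_proj)
print(F_proj)
```

Printing `T_proj` and `F_proj` should reproduce the two matrices shown in the output above.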


## Problem 2: Building and training sample models
* In this problem, you’ll use TensorFlow to build the loss functions for a pair of commonly used statistical models.
* In all cases, your answer should include __placeholder variables `x` and `ytrue`, which will serve as the predictor (independent variable) and response (dependent variable)__, respectively. 
* Please use `W` to denote a parameter that multiplies the predictor, and `b` to denote a bias parameter (i.e., a parameter that is added).


### 1. Logistic regression with a negative log-likelihood loss. 
* In this model, which we discussed briefly in class, the binary variable `Y` is distributed as a Bernoulli random variable with success parameter $\sigma(W^{T}X + b)$, where $\sigma(z) = (1 + \exp(-z))^{-1}$ is the logistic function, $X \in \mathbb{R}^{6}$ is the predictor random variable, and $W \in \mathbb{R}^{6}$, $b \in \mathbb{R}$ are the model parameters. 
* Derive the log-likelihood of `Y`, and write the TensorFlow code that represents the negative log-likelihood loss function. 
* Hint: the loss should be a sum over all observations of a negative log-likelihood term.

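(1) Set $z = W^{T}X + b$. Since $p = \sigma(z)$ and $\sigma(z) = (1 + \exp(-z))^{-1}$, we get $p = [1 + \exp(-(W^{T}X + b))]^{-1}$.  
(2) Bernoulli probability mass function: $f(y) = p^{y}(1-p)^{1-y}$ for $y \in \{0, 1\}$.  
(3) With $p_{n} = \sigma(W^{T}X_{n} + b)$, the log-likelihood of $N$ observations is $\sum_{n=1}^{N} \log\left(p_{n}^{y_n}(1-p_{n})^{1-y_n}\right) = \sum_{n=1}^{N} \left(\log (1-p_{n}) + y_{n} \log \frac{p_{n}}{1-p_{n}}\right) = \sum_{n=1}^{N} \left(y_n(W^{T}X_{n} + b) - \log(1 + \exp(W^{T}X_{n} + b))\right)$, because $\log \frac{p_{n}}{1-p_{n}} = W^{T}X_{n} + b$ and $\log(1-p_{n}) = -\log(1 + \exp(W^{T}X_{n} + b))$.  
The negative log-likelihood loss is therefore $\sum_{n=1}^{N} \left(\log(1 + \exp(W^{T}X_{n} + b)) - y_n(W^{T}X_{n} + b)\right)$.  
(4) $X \in \mathbb{R}^{6}$, $W \in \mathbb{R}^{6}$, $b \in \mathbb{R}$  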
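The algebra in step (3) can be checked numerically. This sketch uses hypothetical toy data (not the homework files) and compares the negative log-likelihood written directly from the Bernoulli pmf with the simplified form; `np.logaddexp(0, z)` is used as a numerically stable way to compute $\log(1 + e^{z})$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data (not the homework files): 50 observations in R^6
X = rng.normal(size=(50, 6))
W = 0.1 * np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0])
b = -1.0

z = X @ W + b                     # W^T X_n + b for every row n
p = 1.0 / (1.0 + np.exp(-z))      # p_n = sigma(z_n)
y = (rng.uniform(size=50) < p).astype(float)   # Bernoulli(p_n) draws

# Negative log-likelihood written directly from the Bernoulli pmf
nll_direct = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
# Simplified form from step (3); logaddexp(0, z) = log(1 + exp(z)) without overflow
nll_simplified = np.sum(np.logaddexp(0.0, z) - y * z)

print(nll_direct, nll_simplified)
```

The two numbers agree up to floating-point rounding.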

In [54]:
# inputs
x = tf.placeholder(tf.float64, shape=(None, 6))
ytrue = tf.placeholder(tf.float64)

# regression parameters, initialized with 1
w = tf.Variable(tf.ones([6,1], dtype=tf.float64))
b = tf.Variable(tf.ones([1], dtype=tf.float64))
one = tf.cast(1, dtype=tf.float64)
score = tf.matmul(x, w) + b
# reshape the labels to a column so they line up with score (shape [N, 1]) instead of broadcasting
y_col = tf.reshape(ytrue, [-1, 1])
# negative log-likelihood
loss = tf.reduce_sum(-y_col*score + tf.log(one + tf.exp(score)))
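TensorFlow element-wise ops broadcast the same way NumPy does, so a label vector of shape (N,) multiplied by a score of shape (N, 1) silently becomes an N-by-N matrix before `tf.reduce_sum`. The NumPy sketch below, with toy numbers chosen purely for illustration, shows the effect and the column reshape that avoids it.

```python
import numpy as np

score = np.array([[0.5], [-1.0], [2.0]])   # shape (3, 1), like tf.matmul(x, w) + b
y_flat = np.array([1.0, 0.0, 1.0])         # shape (3,), a flat label vector

# (3,) times (3, 1) broadcasts to a (3, 3) matrix of cross terms
mismatched = -y_flat * score
# a (3, 1) label column keeps exactly one term per observation
matched = -y_flat.reshape(-1, 1) * score

print(mismatched.shape, matched.shape)    # (3, 3) (3, 1)
```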

### 2. Estimating parameters in logistic regression.
The zip file at http://www-personal.umich.edu/~klevin/teaching/Winter2019/STATS507/HW10_logistic.zip contains four Numpy `.npy` files with train and test data generated from a logistic model:  
(1) `logistic_xtest.npy` : contains a 500-by-6 matrix whose rows are the independent variables (predictors) from the test set.  
(2) `logistic_xtrain.npy` : contains a 2000-by-6 matrix whose rows are the independent variables (predictors) from the train set.  
(3) `logistic_ytest.npy` : contains a binary 500-dimensional vector of dependent variables (responses) from the test set.  
(4) `logistic_ytrain.npy` : contains a binary 2000-dimensional vector of dependent variables (responses) from the train set.  
* The i-th row of the matrix in `logistic_xtrain.npy` is the predictor for the response in the i-th entry of the vector in `logistic_ytrain.npy`, and analogously for the two test set files. 

* Please include these files in your submission so that we can run your code without downloading them again. 
* Note: we didn’t discuss reading Numpy data from files. To load the files, you can simply call `xtrain = np.load('xtrain.npy')` to read the data into the variable `xtrain`, which will be a Numpy array.
* Load the training data and use it to obtain estimates of W and b by minimizing the negative log-likelihood via __gradient descent__. 
* Another note: you’ll have to play around with the learning rate and the number of steps. Two good ways to check if optimization is finding a good minimizer:
    * Try printing the training data loss before and after optimization. 
    * Use the test data to validate your estimated parameters.
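For intuition about what `tf.train.GradientDescentOptimizer` does with this loss, here is a NumPy sketch of plain gradient descent on the summed negative log-likelihood, using its analytic gradient $\sum_n (\sigma(z_n) - y_n) X_n$ for $W$ and $\sum_n (\sigma(z_n) - y_n)$ for $b$. The data are hypothetical, simulated from the true parameters stated in part 4, not the homework files, and the learning rate is a conservative choice for this sketch rather than a recommendation.

```python
import numpy as np

def nll(X, y, W, b):
    """Summed negative log-likelihood: sum_n log(1 + exp(z_n)) - y_n * z_n."""
    z = X @ W + b
    return np.sum(np.logaddexp(0.0, z) - y * z)

def nll_grad(X, y, W, b):
    """Gradient of nll with respect to W and b."""
    z = X @ W + b
    resid = 1.0 / (1.0 + np.exp(-z)) - y    # sigma(z_n) - y_n
    return X.T @ resid, np.sum(resid)

# Hypothetical simulated data (not the homework files), drawn from the model
# with the true parameters stated in part 4
rng = np.random.default_rng(1)
W_true, b_true = np.array([1.0, 1.0, 2.0, 3.0, 5.0, 8.0]), -1.0
X = rng.normal(size=(2000, 6))
p_true = 1.0 / (1.0 + np.exp(-(X @ W_true + b_true)))
y = (rng.uniform(size=2000) < p_true).astype(float)

W, b = np.ones(6), 1.0      # same starting point as the TensorFlow variables
lr = 0.001                  # conservative step size for a loss summed over 2000 rows
loss_before = nll(X, y, W, b)
for step in range(3000):
    gW, gb = nll_grad(X, y, W, b)
    W -= lr * gW
    b -= lr * gb
loss_after = nll(X, y, W, b)

print("loss before: {:.2f}, after: {:.2f}".format(loss_before, loss_after))
print("W estimate:", np.round(W, 2), "b estimate:", round(b, 2))
```

Printing the loss before and after, as the note above suggests, is the quickest sign that the step size is reasonable: a too-large step makes the loss grow or oscillate instead of falling.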

In [55]:
# load data
X_train = np.load("logistic_data/logistic_xtrain.npy")
Y_train = np.load("logistic_data/logistic_ytrain.npy")
X_test = np.load("logistic_data/logistic_xtest.npy")
Y_test = np.load("logistic_data/logistic_ytest.npy")

# set learning rate, create optimizer
global_step = tf.Variable(0, trainable=False) 
optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.005)
train = optimizer.minimize(loss, global_step=global_step)

# initialize
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# gradient descent
for step in range(10000+1):
    sess.run(train, feed_dict = {x: X_train, ytrue: Y_train})
    # print 
    if step %1000 == 0:
        loss_print = sess.run(loss, feed_dict = {x: X_train, ytrue: Y_train})
        print("Step:{}, loss:{:.2f}".format(step, loss_print))

print("")
print("estimate w:")
print(sess.run(w))
print("estimate b")
print(sess.run(b))

Step:0, loss:1417.36
Step:1000, loss:680.23
Step:2000, loss:680.23
Step:3000, loss:680.23
Step:4000, loss:680.23
Step:5000, loss:680.23
Step:6000, loss:680.23
Step:7000, loss:680.23
Step:8000, loss:680.23
Step:9000, loss:680.23
Step:10000, loss:680.23

estimate w:
[[0.97808246]
 [1.2326892 ]
 [1.49821729]
 [3.01531266]
 [4.64403196]
 [7.52841393]]
estimate b
[-0.95492091]


### 3. Evaluating logistic regression on test data. 
* Load the test data. 
* What is the negative log-likelihood of your model on this test data? That is, what is the negative log-likelihood when you use your estimated parameters with the previously unseen test data?

In [56]:
loss_output = sess.run(loss, feed_dict = {x: X_test, ytrue: Y_test})
print("negative log-likelihood is :{:.3f} for test data.".format(loss_output))

negative log-likelihood is :163.014 for test data.
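Because `loss` is a sum over observations, the training value (2000 rows) and the test value (500 rows) are on different scales. A small sketch, using the numbers printed above, puts them on a per-observation footing.

```python
# Summed losses and sample sizes as printed above (train: step 10000; test: this cell)
train_nll, n_train = 680.23, 2000
test_nll, n_test = 163.014, 500

# Per-observation negative log-likelihood on each split
per_obs_train = train_nll / n_train
per_obs_test = test_nll / n_test
print("train: {:.3f}, test: {:.3f}".format(per_obs_train, per_obs_test))
```

Roughly 0.340 per training observation versus 0.326 per test observation suggests the estimates generalise to the unseen test data rather than overfitting the training set.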


### 4. Evaluating the estimated logistic parameters.
The data were, in reality, generated with
$$W = (1,1,2,3,5,8), \quad b = -1$$
* Write TensorFlow expressions to compute the squared error between your estimated parameters and their true values.
* Evaluate the error in recovering W and b separately. 
* What are the squared errors? 
* Note: you need only evaluate the error of your final estimates, not at every step.
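A quick NumPy sketch of the same squared-error computation, using the estimates printed in the training cell. Keeping both parameter vectors one-dimensional (or both as 6-by-1 columns) matters: subtracting a shape-(6,) array from a shape-(6, 1) array broadcasts to a 6-by-6 matrix of every pairwise difference.

```python
import numpy as np

# Estimates printed in the training cell above, and the true values from part 4
W_hat = np.array([0.97808246, 1.2326892, 1.49821729, 3.01531266, 4.64403196, 7.52841393])
b_hat = -0.95492091
W_true = np.array([1, 1, 2, 3, 5, 8], dtype=float)
b_true = -1.0

# Both W vectors have shape (6,), so the subtraction pairs entries one-to-one
W_sq_err = np.sum((W_hat - W_true) ** 2)
b_sq_err = (b_hat - b_true) ** 2
print("W: {:.4f}, b: {:.4f}, total: {:.4f}".format(W_sq_err, b_sq_err, W_sq_err + b_sq_err))
```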

In [57]:
# estimates
est_W, est_b = sess.run([w, b])
print("Estimates of W")
print(est_W)
print("Estimates of b")
print(est_b)

Estimates of W
[[0.97808246]
 [1.2326892 ]
 [1.49821729]
 [3.01531266]
 [4.64403196]
 [7.52841393]]
Estimates of b
[-0.95492091]


In [58]:
# true values; reshape W to (6, 1) so it has the same shape as est_W
wtrue = tf.constant(np.array([1,1,2,3,5,8]).reshape(6, 1), dtype=tf.float64)
btrue = tf.constant(-1, dtype=tf.float64)
# element-wise squared error
sq_err_w = tf.square(wtrue - est_W)
sq_err_b = tf.square(btrue - est_b)
# sum of squared errors
sum_sqer_w = tf.reduce_sum(sq_err_w)
sum_sqer_b = tf.reduce_sum(sq_err_b)
tot_sqer = sum_sqer_w + sum_sqer_b
# run w
tf_sq_err_w, tf_sum_w = sess.run([sq_err_w, sum_sqer_w])
print("Square error for w:{}, sum:{:.3f}".format(tf_sq_err_w, tf_sum_w))
# run b
tf_sq_err_b, tf_sum_b = sess.run([sq_err_b, sum_sqer_b]) 
print("Square error for b:{:.3f}, sum:{:.3f}".format(tf_sq_err_b[0], tf_sum_b))
# total
tf_tot_err = sess.run([tot_sqer]) 
print("Total sum of square error:{:.3f}".format(tf_tot_err[0]))

Square error for w:[[4.80378446e-04]
 [5.41442628e-02]
 [2.51785888e-01]
 [2.34477624e-04]
 [1.26713243e-01]
 [2.22393422e-01]], sum:0.656
Square error for b:0.002, sum:0.002
Total sum of square error:0.658


### 5. Make the variables from the above problems available in a dictionary called `results_logistic`. 
* The dictionary should have keys `'W', 'Wsqerr', 'b', 'bsqerr', 'log_lik_test'`, with respective values `sess.run(x)` where x ranges over the corresponding quantities. 
* For example, if my squared error for `W` is stored in a TF variable called `W_squared_error`, then the key `'Wsqerr'` should have value `sess.run(W_squared_error)`.

In [59]:
# 'log_lik_test' holds the test-set negative log-likelihood from part 3
results_logistic = {'W': est_W, 'Wsqerr': tf_sq_err_w, 'b': est_b, 'bsqerr': tf_sq_err_b, 'log_lik_test': loss_output }

In [60]:
type(results_logistic)

dict

### 6. Classification of normally distributed data. 
* The .zip file at http://www-personal.umich.edu/~klevin/teaching/Winter2019/STATS507/HW10_normal.zip contains four Numpy `.npy` files that contain train and test data generated from K = 3 different classes. 
* Each class $k \in \{1, 2, 3\}$ has an associated mean $\mu_{k} \in \mathbb{R}$ and variance $\sigma_{k}^{2} \in \mathbb{R}$, and all observations from a given class are i.i.d. $N(\mu_{k},\sigma_{k}^{2})$. The four files are:
    * `normal_xtest.npy` : contains a 500-vector whose entries are the independent variables (predictors) from the test set.
    * `normal_xtrain.npy` : contains a 2000-vector whose entries are the independent variables (predictors) from the train set.
    * `normal_ytest.npy` : contains a 500-by-3 dimensional matrix whose rows are one-hot encodings of the class labels for the test set.
    * `normal_ytrain.npy` : contains a 2000-by-3 dimensional matrix whose rows are one-hot encodings of the class labels for the train set.

* The i-th entry of the vector in `normal_xtrain.npy` is the observed random variable from class with label given by the i-th row of the matrix in `normal_ytrain.npy`, and analogously for the two test set files. 
* Please include these files in your submission so that we can run your code without downloading them again.
* Load the training data and use it to obtain estimates of the vector of class means $\mu = (\mu_{0}, \mu_{1}, \mu_{2})$ and variances $\sigma^{2} = (\sigma_{0}^{2}, \sigma_{1}^{2}, \sigma_{2}^{2})$ by __minimizing the cross-entropy__ between the estimated normals and the one-hot encodings of the class labels (as we did in our __`softmax regression`__ example in class). 
* Please name the corresponding variables __`mu`__ and __`sigma2`__. 
* This time, instead of using gradient descent, use __`Adagrad`__, supplied by TensorFlow as the function __`tf.train.AdagradOptimizer`__. 
* Adagrad is a stochastic gradient descent algorithm, popular in machine learning. 
* You can call this just like the gradient descent optimizer we used in class—just supply a learning rate. Documentation for the TF implementation of Adagrad can be found here: https://www.tensorflow.org/api_docs/python/tf/train/AdagradOptimizer. See https://en.wikipedia.org/wiki/Stochastic_gradient_descent for more information about stochastic gradient descent and the Adagrad algorithm.
* Note: you’ll no longer be able to use the built-in logit cross-entropy that we used for training our models in lecture. 
* Your cross-entropy for one observation should now look something like __$-\sum_{k} y'_{k}\log p_{k}$__, where $y'$ is the one-hot encoded vector and $p$ is the vector whose k-th entry is the (estimated) probability density of the observation under class $k$. 
* Another note: do not include any estimation of the mixing coefficients (i.e., the class priors) in your model. 
* You only need to estimate three means and three variances, because we are building a discriminative model in this problem.
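To make the per-observation cross-entropy concrete, here is a NumPy sketch with the Gaussian log-density written out in terms of the variance. When translating to TensorFlow, note that `tf.distributions.Normal` takes `loc` and `scale`, and `scale` is the standard deviation, not the variance. The parameter values and toy inputs below are illustrative only.

```python
import numpy as np

def normal_log_density(x, mu, sigma2):
    """Log of the N(mu, sigma2) density, parameterised by the variance sigma2."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

def cross_entropy(x, y_onehot, mu, sigma2):
    """Per-observation cross-entropy -sum_k y'_k log p_k.

    x has shape (N, 1), y_onehot has shape (N, K), mu and sigma2 have shape (K,).
    """
    log_p = normal_log_density(x, mu, sigma2)   # broadcasts to shape (N, K)
    return -np.sum(y_onehot * log_p, axis=1)

# Illustrative parameters (the true values listed in part 8) and two toy observations
mu = np.array([-1.0, 0.0, 3.0])
sigma2 = np.array([0.5, 1.0, 1.5])
x = np.array([[0.0], [3.0]])
y = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
ce = cross_entropy(x, y, mu, sigma2)
print(ce, ce.mean())
```

Each toy observation sits exactly at its class mean, so its cross-entropy reduces to $\tfrac{1}{2}\log(2\pi\sigma_{k}^{2})$.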

In [61]:
X_train = np.load("normal_data/normal_xtrain.npy")
Y_train = np.load("normal_data/normal_ytrain.npy")
X_test = np.load("normal_data/normal_xtest.npy")
Y_test = np.load("normal_data/normal_ytest.npy")

In [62]:
# data
tf.reset_default_graph()
x = tf.placeholder(tf.float32, shape=(None, 1))
ytrue = tf.placeholder(tf.float32, shape=(None, 3))
# parameters
mu = tf.Variable(tf.ones([1,3], tf.float32, name = 'mu'))
sigma2 = tf.Variable(tf.ones([1,3], tf.float32, name = 'sigma'))

In [63]:
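# cross entropy; the variable sigma2 enters as Normal's scale (a standard deviation),
# so the variance estimate reported below is tf.square(sigma2)
log_p = tf.distributions.Normal(loc = mu, scale = sigma2).log_prob(x)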
cross_entropy = -tf.reduce_sum(ytrue*log_p, 1)
# train: tf.train.AdagradOptimizer
train_ada = tf.train.AdagradOptimizer(learning_rate = 0.05).minimize(cross_entropy)
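Adagrad differs from plain gradient descent in that each coordinate gets its own step size, which shrinks with the running sum of that coordinate's squared gradients. Below is a minimal NumPy sketch of that update on a toy quadratic whose two coordinates have very different curvature (an illustrative objective, not the homework loss). The starting accumulator of 0.1 is meant to mirror the default `initial_accumulator_value=0.1` of `tf.train.AdagradOptimizer`; treat that as an assumption to check against the linked documentation.

```python
import numpy as np

def adagrad(grad_fn, theta0, lr=0.05, n_steps=5000, init_accum=0.1):
    """Minimal Adagrad: each coordinate moves by lr * g / sqrt(running sum of g**2)."""
    theta = np.array(theta0, dtype=float)
    accum = np.full_like(theta, init_accum)   # per-coordinate sum of squared gradients
    for _ in range(n_steps):
        g = grad_fn(theta)
        accum += g ** 2
        theta -= lr * g / np.sqrt(accum)
    return theta

def toy_grad(theta):
    """Gradient of the illustrative f(theta) = 100*(theta_0 - 1)**2 + 0.01*(theta_1 + 1)**2."""
    return np.array([200.0 * (theta[0] - 1.0), 0.02 * (theta[1] + 1.0)])

theta_hat = adagrad(toy_grad, [0.0, 0.0])
print(theta_hat)
```

Both coordinates reach the minimiser (1, -1) even though their gradients differ by four orders of magnitude. In the training cell the full training set is fed at every step, so the updates there are deterministic full-batch Adagrad steps.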

In [64]:
# initialize
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
# adagrad
for step in range(10000+1):
    sess.run(train_ada, feed_dict = {x: X_train, ytrue: Y_train})
    # print 
    if step %1000 == 0:
        est_mu, est_sig2 = sess.run([mu, tf.square(sigma2)])
        print("Step:{}, est mu:{}, est sigma2:{}".format(step, est_mu, est_sig2))

Step:0, est mu:[[0.95 0.95 1.05]], est sigma2:[[1.1024998 1.1024998 1.1024998]]
Step:1000, est mu:[[-0.9823299   0.00450897  2.8057144 ]], est sigma2:[[0.5352919 1.008755  1.6795727]]
Step:2000, est mu:[[-1.0075966   0.00449135  2.989315  ]], est sigma2:[[0.533382 1.008755 1.517913]]
Step:3000, est mu:[[-1.0076314   0.00449135  3.003731  ]], est sigma2:[[0.533382  1.008755  1.5159032]]
Step:4000, est mu:[[-1.0076314   0.00449135  3.004786  ]], est sigma2:[[0.533382  1.008755  1.5159012]]
Step:5000, est mu:[[-1.0076314   0.00449135  3.0048313 ]], est sigma2:[[0.533382  1.008755  1.5159012]]
Step:6000, est mu:[[-1.0076314   0.00449135  3.0048313 ]], est sigma2:[[0.533382  1.008755  1.5159012]]
Step:7000, est mu:[[-1.0076314   0.00449135  3.0048313 ]], est sigma2:[[0.533382  1.008755  1.5159012]]
Step:8000, est mu:[[-1.0076314   0.00449135  3.0048313 ]], est sigma2:[[0.533382  1.008755  1.5159012]]
Step:9000, est mu:[[-1.0076314   0.00449135  3.0048313 ]], est sigma2:[[0.533382  1.008755 

In [65]:
sess.run([mu, tf.square(sigma2)])

[array([[-1.0076314 ,  0.00449135,  3.0048313 ]], dtype=float32),
 array([[0.533382 , 1.008755 , 1.5159012]], dtype=float32)]

### 7. Evaluating loss on test data. 
* Load the test data. 
* What is the cross-entropy of your model on this test data? That is, what is the cross-entropy when you use your estimated parameters with the previously unseen test data?

In [66]:
output = sess.run(cross_entropy, feed_dict = {x: X_test, ytrue: Y_test})
print("The cross-entropy is: {:.3f} for test data.".format(np.mean(output)))

The cross-entropy is: 1.373 for test data.


### 8. Evaluating parameter estimation on test data. 
* The true parameter values for the three classes were  
$\mu_{0} =−1$, $\mu_{1} = 0$, $\mu_{2} = 3$,  
$\sigma_{0}^{2} =0.5$, $\sigma_{1}^{2} = 1$, $\sigma_{2}^{2} = 1.5$.
* Write a TensorFlow expression to compute the total squared error (i.e., summed over the six parameters) between your estimates and their true values. 
* What is the squared error? Note: you need only evaluate the error of your final estimates, not at every step.


In [67]:
# estimate
est_mu, est_sig = sess.run([mu, tf.square(sigma2)])
# true
mutrue = tf.constant(np.array([-1, 0, 3]), dtype=tf.float32)
sigtrue = tf.constant(np.array([0.5, 1, 1.5]), dtype=tf.float32)

In [68]:
# square error
sq_err_mu = tf.square(mutrue - est_mu)
sq_err_sig = tf.square(sigtrue - est_sig)
# sum of square error
sum_sqer_mu = tf.reduce_sum(sq_err_mu)
sum_sqer_sig = tf.reduce_sum(sq_err_sig)
tot_sqer = sum_sqer_mu + sum_sqer_sig
# run mu
tf_sq_err_mu, tf_sum_mu = sess.run([sq_err_mu, sum_sqer_mu])
# run sigma
tf_sq_err_sig, tf_sum_sig = sess.run([sq_err_sig, sum_sqer_sig]) 
# total
tf_tot_err = sess.run([tot_sqer]) 
print("Total sum of square error: {:.5f}".format(tf_tot_err[0]))

Total sum of square error: 0.00155


### 9. Evaluating classification error on test data. 
* Write and evaluate a TensorFlow expression that computes the classification error of your estimated model averaged over the test data.
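Classification error is one minus accuracy: the fraction of test observations whose highest-density class (argmax of the log-densities, with no class priors, as specified in part 6) differs from the one-hot label. A NumPy sketch with illustrative inputs, repeating the variance-parameterised log-density so the block stands alone:

```python
import numpy as np

def normal_log_density(x, mu, sigma2):
    """Log of the N(mu, sigma2) density, with sigma2 the variance."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

def classification_error(x, y_onehot, mu, sigma2):
    """Fraction of rows whose most likely class differs from the labelled class."""
    log_p = normal_log_density(x, mu, sigma2)        # shape (N, K)
    predicted = np.argmax(log_p, axis=1)
    actual = np.argmax(y_onehot, axis=1)
    return np.mean(predicted != actual)

# Illustrative parameters (the true values from part 8) and four toy observations
mu = np.array([-1.0, 0.0, 3.0])
sigma2 = np.array([0.5, 1.0, 1.5])
x = np.array([[-1.0], [0.0], [3.0], [4.0]])
y = np.eye(3)[[0, 1, 2, 1]]      # one-hot labels; the last one is deliberately "wrong"
print(classification_error(x, y, mu, sigma2))
```

Under these illustrative parameters, x = 4 is assigned to class 2 (the class with mean 3), so the mislabelled last row is the only error.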

In [69]:
# prediction on test data; Normal's scale is a standard deviation, so use the square root of the variance estimates
pre = tf.distributions.Normal(loc = est_mu, scale = np.sqrt(est_sig)).log_prob(x)
prediction = tf.argmax(pre,1)
correct = tf.equal(prediction, tf.argmax(ytrue,1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
# classification error: fraction of misclassified test observations
class_error = 1 - accuracy
class_error_out = sess.run(class_error, feed_dict = {x: X_test, ytrue: Y_test})
print(class_error_out) 

# save model (for Problem 4)
tf.saved_model.simple_save(
    session = sess,
    export_dir = "normal_trained",
    inputs = {'x':x}, 
    outputs = {'prediction':prediction}
)


### 10. Again, for ease of grading, define a dictionary called `results_class`
* with keys `'mu', 'sigma2', 'crossent_test', 'class_error'` and values corresponding to the evaluation (again using `sess.run`) of your estimates of $\mu$ and $\sigma^{2}$, the cross-entropy on the test set, and the classification error from the previous problem.

In [70]:
results_class = {'mu': est_mu, 'sigma2': est_sig, 'crossent_test': np.mean(output), 'class_error': class_error_out}

## Problem 3: Building a Complicated Model 

* The TensorFlow documentation includes tutorials on building a number of more complicated neural models in TensorFlow: https://www.tensorflow.org/tutorials/. In the left side panel, choose any one tutorial from under one of the headings “ML at production scale”, “Generative models”, “Images” or “Sequences” and follow it. Some of the tutorials include instructions along the lines of “We didn’t discuss this trick, try adding it!”. __You do not need to do any of these additional steps (though you will certainly learn something if you do!).__ 
* Warning: some of the tutorials require large amounts of training data. If this is the case, please do not include the training data in your submission! Instead, include a line of code to download the data from wherever it is stored. Also, some of the tutorials require especially long training time, (e.g., the neural models) so budget your time accordingly!
* Your submission for this problem should be a separate jupyter notebook called __`tutorial.ipynb`__ (no need to include your uniqname), which includes code to load the training and test data, build and train a model, and evaluate that model on test data. __That is, the code in `tutorial.ipynb` should perform all the training and testing steps performed in the tutorial, but without having to be run from the command line.__ Depending on which model you choose, training may take a long time if you use the preset number of training steps, so be sure to include a variable called __`nsteps`__ that controls the number of training steps, and set it to be something moderately small for your submission.
* Note: it will not be enough to simply copy the tutorial’s python code into your jupyter notebook, since the demo code supplied in the tutorials is meant to be run from the command line.
* Another note: If it was not clear, you are, for this problem and this problem only, __permitted to copy-paste code from the TensorFlow tutorials as much as you like without penalty__.
* One more note: Please make sure that in both __`tutorial.ipynb`__ and your main submission notebook __`uniqname.hw10.ipynb`__ you do not set any training times to be excessively long. You are free to set the number of training steps as you like for running on your own machine, but please set these parameters to something more reasonable in your submission so that we do not need to wait too long when running your notebook. 
* Aim to set the number of training steps so that we can run each of your submitted notebooks in less than a minute.

## Problem 4: Running Models on Google Cloud Platform (9 points)
In this problem, you’ll get a bit of experience running TensorFlow jobs on Google Cloud Platform (GCP), Google’s cloud computing service. Google has provided us with a grant, which will provide each of you with free compute time on GCP.  

__Important__: this problem is very hard. It involves a sequence of fairly complicated operations in GCP. As such, I do not expect every student to complete it. Don’t worry about that. Unless you’ve done a lot of programming in the past, this problem is likely your first foray into learning a new tool largely from scratch instead of having my lectures to guide you. The ability to do this is a crucial one for any data scientist, so consider this a learning opportunity (and a sort of miniature final exam). Start early, read the documentation carefully, and come to office hours if you’re having trouble.
Good luck, and have fun!

The first thing you should do is claim your share of the grant money by visiting this link: https://google.secure.force.com/GCPEDU?cid=VYZbhLIwytS0UVxuWxYyRYgNVxPMOf37oBx0hRmx7 
You will need to supply your name and your UMich email. Please use the email address associated to your unique name (i.e., uniqname@umich.edu), so that we can easily determine which account belongs to which student. Once you have submitted this form, you will receive a confirmation email through which you can claim your compute credits.
These credits are valid on GCP until they expire in January 2020. Any credits left over after completing this homework are yours to use as you wish. Make sure that you claim your credits while signed in under your University of Michigan email, rather than a personal Gmail account, so that your project is correctly associated with your UMich email. If you accidentally claim the credits under a different address, add your uniqname email as an owner.  

Once you have claimed your credits, you should create a project, which will serve as a repository for your work on this problem. You should name your project `uniqname-stats507w19`, where uniqname is your unique name in all lower-case letters. Your project’s billing should be automatically linked to your credits, but you can verify this fact in the billing section dashboard in the GCP browser console. Please add both me (UMID `klevin`) and your GSI Roger Fan (UMID `rogerfan`) as owners. You can do this in the IAM tab of the IAM & admin dashboard by clicking “Add” near the top of the page, listing our UMich emails, and specifying our Roles as Project → Owner.  
 
__Note__: this problem is comparatively complicated, and involves a lot of moving parts. At the end of this problem (several pages below), I have included a list of all the files that should be included in your submission for this problem, as well as a list of what should be on your GCP project upon submission.  

__Important__: after the deadline (May 2nd at 10:00am) you should not edit your GCP project in any way until you receive a grade for the assignment in canvas. If your project indicates that any files or running processes have been altered after the deadline by a user other than klevin or rogerfan, we will assume this to be an instance of editing your assignment after the deadline, and you will receive a penalty.


### 1. Follow the tutorial
* Follow the tutorial at https://cloud.google.com/ml-engine/docs/distributed-tensorflow-mnist-cloud-datalab, which will walk you through the process of training a CNN similar to the one we saw in class, but this time using resources on GCP instead of your own machine. 
* This tutorial will also have you set up a DataLab notebook, which is Google’s version of a Jupyter notebook, in which you can interactively draw your own digits and pass them to your neural net for classification. 
* __Important__: the tutorial will tell you to tear your nodes and storage down at the end. Do not do that. Leave everything running so that we can verify that you set things up correctly. It should only cost a few dollars to leave the datalab server and storage buckets running, but if you wish to conserve your credits, you can tear everything down and go through the tutorial again on the evening of May 1st or the (early!) morning of May 2nd.

### 2. Let us return to the classifier that you trained above on the normally-distributed data. 
* In this and the next several subproblems, we will take an adaptation of that model and upload it to GCP where it will serve as a prediction node similar to the one you built in the tutorial above. Train the same classifier on the same training data, but this time, save the resulting trained model in a directory called __`normal_trained`__. 
* You’ll want to use the __`tf.saved_model.simple_save`__ function.   
    Refer to the GCP documentation at https://cloud.google.com/ml-engine/docs/deploying-models, and the documentation on the `tf.saved_model.simple_save` function, here: https://www.tensorflow.org/programmers_guide/saved_model#save_and_restore_models   
* __Please include a copy of this model directory in your submission.__   
* __Hint__: a stumbling block in this problem is figuring out what to supply as the `inputs` and `outputs` arguments to the `simple_save` function. Your arguments should look something like `inputs = {'x':x}, outputs = {'prediction':prediction}`.

In [71]:
# The trained classifier is exported in cell In [69] above with this call:
# tf.saved_model.simple_save(
#     session = sess,
#     export_dir = "normal_trained",
#     inputs = {'x':x}, 
#     outputs = {'prediction':prediction}
# )

### 3. Let’s upload that model to GCP. 
* First, we need somewhere to put your model. You already set up a bucket in the tutorial, but let’s build a separate one. 
* Create a new bucket called __`mandyho_stats507w19-hw10-normal`__. You should be able to do this by making minor changes to the commands you ran in the tutorial, or by following the instructions at https://cloud.google.com/solutions/running-distributed-tensorflow-on-compute-engine#creating_a_cloud_storage_bucket.   
* Now, we need to upload your saved model to this bucket. There are several ways to do this, but the easiest is to follow the instructions at https://cloud.google.com/storage/docs/uploading-objects and upload your model through the GUI.   
* __Optional challenge (worth no extra points, just bragging rights)__: Instead of using the GUI, download and install the Google Cloud SDK, available at https://cloud.google.com/sdk/ and use the gsutil command line tool to upload your model to a storage bucket.

### 4. Now we need to create a version of your model. 
* Versions are how the GCP machine learning tools organize different instances of the same model (e.g., the same model trained on two different data sets). To do this, follow the instructions located at https://cloud.google.com/ml-engine/docs/deploying-models#creating_a_model_version, which will ask you to  
    * Upload a SavedModel directory (which you just did)  
    * Create a Cloud ML Engine model resource  
    * Create a Cloud ML Engine version resource (this specifies where your model is stored, among other information) 
    * Enable the appropriate permissions on your account.  

* Please name your model __`stats507w19_hw10_normal`__ (note the underscores here as opposed to the hyphens in the bucket name; see the documentation for the gcloud ml-engine versions command for how to delete versions, if need be). 
* __Important__: there are a number of pitfalls that you may encounter here, which I want to warn you about: A good way to check that your model resource and version are set up correctly is to run the command __`gcloud ml-engine versions describe "your_version_name" --model "your_model_name"`__. 
* The resulting output should include a line reading `state: READY`. You may notice that the Python version for the model appears as, say, `pythonVersion: '2.7'`, even though you used, say, Python 3.6. This should not be a problem, but you should make sure that the `runtimeVersion` is set correctly. __If the line `runtimeVersion: '1.0'` appears when you describe your version, you are likely headed for a bug__. You can prevent this bug by adding the flag `--runtime-version 1.6` to your `gcloud ml-engine versions create` command, and making sure that you are running TensorFlow version 1.6 on your local machine (i.e., the machine where you’re running Jupyter). Running version 1.7 locally while running 1.6 on GCP also seems to work fine.


### 5. Create a `.json` file corresponding to a single prediction instance on the input observation x = 4. 
* Name this `.json` file `instance.hw10.json`, and please include a copy of it in your submission.   
* __Hint__: you will likely find it easiest to use `nano/vim/emacs` to edit the `.json` file from the tutorial (GCP Cloud Shell has versions of all three of these editors). Doing this will allow you to edit a copy of the `.json` file directly in the GCP shell instead of going through the trouble of repeatedly downloading and uploading files. Being proficient with a shell-based text editor is also, generally speaking, a good skill for a data scientist to have.
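If you prefer to generate the file from Python rather than a shell editor, a minimal sketch follows. It assumes the newline-delimited format read by `gcloud ml-engine predict --json-instances` (one JSON object per line), with each instance keyed by the SavedModel input alias `x` from `inputs={'x': x}`, and a one-element list because the placeholder has shape `(None, 1)`; check these assumptions against the online-prediction documentation linked below.

```python
import json

# One prediction instance per line. Assumption: the key "x" matches the input
# alias used in simple_save, and [4.0] matches the placeholder shape (None, 1).
instance = {"x": [4.0]}

with open("instance.hw10.json", "w") as f:
    f.write(json.dumps(instance) + "\n")

with open("instance.hw10.json") as f:
    print(f.read())
```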

### 6. Okay, it’s time to make a prediction.     
* Follow the instructions at https://cloud.google.com/ml-engine/docs/online-predict#requesting_predictions to submit the observation in your `.json` file to your running model. Your model will make a prediction, and print the output of the model to the screen. Please include a copy-paste of the command you ran to request this prediction as well as the resulting output.   
* __Which cluster does your model think x = 4 came from?__  
* __Hint__: if you are getting errors about dimensions being wrong, make sure that your instance has the correct dimension expected by your model.  
* __Second hint__: if you are encountering an error along the lines of Error during model execution: `AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"NodeDef mentions attr 'output_type'`, this is an indication that there is a mismatch between the version of TensorFlow that you used to create your model and the one that you are running on GCP. See the discussion of `gcloud ml-engine versions create` above.  

#### Which cluster does your model think x = 4 came from? Answer: cluster 2 (0-indexed, i.e., the class whose true mean is $\mu_{2} = 3$)

### That’s all of it! Great work! Here is a list of all files that should be included for this problem in your submission, as well as a list of what processes or resources should be left running in your GCP project:

* You should leave the __datalab notebook and its supporting resources__ (i.e., the prediction node and storage bucket) from the GCP ML tutorial running in your GCP project.
* Include in your submission a copy of the __saved model directory constructed from your classifier__. You should also have a copy of this directory in a storage bucket on GCP.
* Leave a storage bucket running on GCP containing your uploaded model directory. This storage bucket should contain a model with a __single version__.
* Include in your submission a __`.json`__ file representing a __single observation__. You need not include a copy of this file in a storage bucket on GCP; it will be stored by default in your GCP home directory if you created it in a text editor in the GCP shell.
* Include in your jupyter notebook a copy-paste of the __command__ you ran to request your model’s prediction on the `.json` file, and please include the __output__ that was printed to the screen in response to that prediction request.     

__Note__: Please make sure that the cell(s) that you copy-paste into is/are set to be Raw NBconvert cell(s), so that your commands display as code but are not run as code by Jupyter.

### Submission file:
* instance.hw10.json
* mandyho_hw10.ipynb
* tutorial.ipynb
* logistic_data
* normal_data
* normal_trained (on both local and GCP)