# TensorFlow

What is `TensorFlow`? 
The official page says:   
>TensorFlow is an open-source machine learning library for research and production. TensorFlow offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud

pinpointing at least two reasons (or, alternatively, nowadays buzz-words) to learn it:
 - machine learning
 - open-source



In this tutorial, I would like to share my basic knowledge concerning this popular framework. It is made as an assignment project for *Learning from Unstructured Data* course. Therefore, it consists of datasets and exercises used during these classes.




### Everything's set up? 

Before digging into Tensorflow, let's make sure we have it installed, by displaying current Tensorflow's version. From now on, we will have the library imported under the alias `tf`.

In [1]:
import tensorflow as tf
print("Tensorflow", tf.__version__)

Tensorflow 1.10.0


Great!
Let's start the tutorial with understanding what is the idea of `TensorFlow`.

Tensorflow is a **dataflow programming framework**.

This means that we define and run a so called **computation graph**:
 - In each node, we can store operations, such as addition, multiplication. 
 - On the edges, we store inputs / outputs of these functions. These can be represented as N-ranked matrices, which are called **Tensors**.
 
The **Tensors** carry the data **flowing** in the graph.
> Control question: Why is Tensorflow called Tensorflow? :)

What is the reason for organizing data flow in such way? 

 - For a graph given, the dependencies between nodes are described explicitly. This makes it easier to exploit parallelism and distribution across multiple devices: CPUs, GPUS and others. Imagine a visualization of a graph, so that we can see at a quick glance which nodes can be executed in parallel, as the data flows in different channels (on different edges). 
 
Great promise of optimized computations! But we also have to note, that no matter how great is the graph we design, it will remain **static at run time**. Once the graph will be running, it is impossible to change it. 



### The APIs

TensorFlow exposes so called `Graph API`. We can use not only Python, but also GoLang, Java and C++ and provided library, helping to write out a graph in a special format, known as protobuf. 
> Protocol buffers are Google's language-neutral, platform-neutral, extensible **mechanism for serializing structured data** – think XML, but smaller, faster, and simpler. 

There is also `Session API`, providing an interface to the *Tensorflow C++ Runtime*. This is where all the *heavy lifting* and *logic behind computational nodes* happens, as well as ultimate distribution of operations to be executed on the hardware.  

### Creating first graph

Let's create first simple computation graph, that will add two numbers together. Because this tutorial is built of jupyter notebook cells, we begin with a cell that will make sure we have everything cleaned up:

In [2]:
# Clear the default graph stack 
# and reset the global default graph.

# Use to play around and avoid 'dead nodes'...

tf.reset_default_graph()

Three main data types of TensorFlow are:
1. Constants
2. Variables
3. Placeholders

To begin with, we will store two numbers in variables that are constants. We can put a `value` inside, which is of `dtype`, will not change (because is constant), and give it a `name`. For addition, we use `tf.add` function, which will return variable to store the result.

In [3]:
tf.reset_default_graph()

a = tf.constant(1.0, dtype=tf.float32, name='a')
b = tf.constant(2.0, dtype=tf.float32, name='b')
result = tf.add(a, b, name='result')

What can we do next?

In [4]:
print(result)

Tensor("result:0", shape=(), dtype=float32)


Using Python's `print` operation, we clearly see that we did not print the actual result. Instead, we printed information concerning tensor what will store that result. This is because we defined a graph, put some constant values, but did not let the data flow yet.

Therefore, we have to make use of already mentioned `Session API` to run the graph (or we can also run part of graph, if we would like to).

In [5]:
with tf.Session() as sess:
    print("Result =", sess.run(result))
    
### Alternatively:
# sess = tf.Session()
# print("Result =", sess.run(result))
# sess.close()

Result = 3.0


A default session is defined by calling `tf.Session()`. Then, we fire up the graph and calculate the result by running `sess.run(result)`. 

In the cell above, there is an alternative way of more manual management of session, but it is better to make use of Python's `with` statement for safety.

____________________________

After this introduction, we should move on to some machine learning! Recall all computations' and matrices' of data friend, `numpy`:

In [6]:
import numpy as np

and we can start with implementing Logistic Regression in Tensorflow.

### Logistic Regression


Let's use gene activity data, which is already stored in *data* directory. First, we need to read data from file, take Xs and Ys, as well as perform z-score normalization.

In [7]:
mat = np.loadtxt('data/gene_data.txt', delimiter='\t', dtype=np.float32)
Ys = mat[:, [-1]]
Xs = mat[:, :-1]
means = np.mean(Xs, 0)
stdevs = np.std(Xs, 0)
Xs = (Xs-means)/stdevs

print("{}:\n {} data records, described by {} features".format('data/gene_data.txt', str(Xs.shape[0]), str(Xs.shape[1])))

data/gene_data.txt:
 36 data records, described by 2 features


We prepare **constant** tensors to store data which we just read from file. 

Then, we need to create **variables** that will store parameters of logistic regression. These are `weights` and `bias`, that we expect to be altered by the algorithm, heading more optimal solution. In TensorFlow, variables represent a tensors whose values can be changed by running operations on them.
> A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program.

 - for `weights`, we make use of `tf.random_normal` that outputs random values from a normal distribution, for given shape: (2, 1), which stands for 2 features we have, per one record of data at a time *although we will in general perform matrix operations*
 - we set initial value of `bias` to 0.0
 - we define `net` tensor, using `tf.matmul` and `tf.add` for calculating weighted linear combination of the input, with a bias
 - and `output` tensor, which we obtain by applying `sigmoid` function for the tensor above
 
Looks like we have all components of logistic regression - we can input some data and receive some output prediction - but we still need another ones to make the learning possible. Therefore, we further define:

 - logistic `cost` function: notice that we use tensors joined together with arithmetical operations, that also results in a suitable tensor; applying `tf.reduce_mean` is equivalent to `np.mean` - simply averaging over all elements
 - an `optimizer`, which is an instance of `GradientDescentOptimizer` class, that implements algorithm specified in its name; we set its `learning_rate` to 0.1
 - and `training_op`, result of calling `minimize` function which is a method of the class used above
 
At a first glance, it seems as we didn't need to know much about Gradient Descent algorithm, as it is already implemented and easy to use. This is one of the advantages TensorFlow has, that does not concern performance and low-level characteristics, but rather convenience and flexibility at the same time.

>What happens in `.minimize(loss=cost)`?
<br>This function first computes gradients of all variables provided. TensorFlow automatically assumes that user-defined variables will be *trained*. Indeed, we defined complete graph, as well as `cost` that depends on `output`, and so on... 
For each variable, the gradient can be another Tensor, or can represent None when no gradient exists. 



In [8]:
tf.reset_default_graph() # :) 

X = tf.constant(Xs, name="X")
y = tf.constant(Ys, name="y")

weights = tf.Variable(tf.random_normal(shape=(2, 1)), name="weights")
bias = tf.Variable(0.0, name="bias")
net = tf.add(tf.matmul(X, weights), bias, name="net")
output = tf.nn.sigmoid(net, name="output")

cost = -tf.reduce_mean(y * tf.log(output) + (1-y) * (tf.log(1-output)))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.25)
training_op = optimizer.minimize(loss=cost)

Now we have all set and will run the graph 20 times. This means that we will calculate current loss 20 times, calculating the gradient each time and adjusting the weights based on learning rate.

Because as the documentation states, variable initializers must be run explicitly before other operations (and we surely have declared some), we will use convenient one-line method that will do the trick. This is included in the first line below.

Notice how we pass tensors as parameters to the `sess.run` method, so that they are executed and returned. We skip the first variable returned by `.minimize` by a `_` variable, for it will not output any number that is in our interest; we rather want it to perform calculations and result in weights changing. 

Running the cell below will train our model 20 times, and print current logistic loss for each epoch.

In [9]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(20):
        _, current_loss = sess.run([training_op, cost])
        print("{} : {}".format(str(epoch+1), str(current_loss)))
    sess.close()

1 : 0.5519187
2 : 0.50460744
3 : 0.46623346
4 : 0.43458194
5 : 0.40806475
6 : 0.38553414
7 : 0.36614892
8 : 0.3492829
9 : 0.33446258
10 : 0.3213244
11 : 0.30958506
12 : 0.29902112
13 : 0.28945413
14 : 0.28073993
15 : 0.272761
16 : 0.26542044
17 : 0.25863796
18 : 0.2523462
19 : 0.24648847
20 : 0.2410166


Now we should be happy to see that after each epoch, the loss value tends to decrease, so our graph is working. 

But we want something more from TensorFlow, and this 'something' is probably *neural networks*. How would a very simple neural network look like?

We already have an example with logistic regression, and we can think of it as a one layer neural network. Therefore, the example of single neuron that represents OR should be from now understandable. It only differs from the Logistic Regression example by the objective function, which is not a logistic loss, but a mean squared error. It is not a big change at all; we take same tensors to determine the value of loss, but apply different operations.



### One neuron that learns OR function

In [10]:
tf.reset_default_graph()

# Pairs of inputs (possible combinations)
Xs = np.array([(0,0), (0,1), (1,0), (1,1)])
# Results on applying OR function on the pairs above
Ys = np.array([0, 1, 1, 1])

X = tf.constant(Xs.astype(np.float32), name="X")
y = tf.constant(Ys.astype(np.float32).reshape((-1,1)), name="y")

weights = tf.Variable(tf.random_normal((2,1)), name="weights")
bias = tf.Variable(0.0, name="bias")
net = tf.add(tf.matmul(X, weights), bias, name="net")
output = tf.nn.sigmoid(net, name="output")

mse = tf.reduce_mean(tf.square(y-output),name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
training_op = optimizer.minimize(loss=mse)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(1001):
        _, current_mse = sess.run([training_op, mse])
        if epoch % 100 == 0:
            print("{} : mse={}".format(str(epoch), str(current_mse)))

0 : mse=0.24710208
100 : mse=0.1285279
200 : mse=0.106404796
300 : mse=0.090100385
400 : mse=0.07716124
500 : mse=0.06685236
600 : mse=0.05856094
700 : mse=0.051808838
800 : mse=0.046241608
900 : mse=0.04159805
1000 : mse=0.037684083


### XOR Troubles

The disadvantage of having only one neuron is that we can question it by a popular *counter-argument* - can one neuron learn XOR function? 
> No, because one neuron is just a linear classifier, and we can not separate XOR with only one line to split the space

Therefore, we will now implement a small neural network with one hidden layer, that can solve the XOR problem, and see how we can connect more units together.



More neurons and layers naturally imply that we will need more tensors. It is clearly visible in the code cell below - we define weights, bias, net and output tensors for each of the neurons. Two integer numbers are added to variables' names', first of them indicates number of layer (hidden has number 1, the next one is output layer and has number 2), and the next number is for numbering a neuron within its layer.

> Notice how we multiply tensor `X` with both `weights11` and `weights12` for each neuron, subsequently adding bias.
<br>To join outputs of two neurons in hidden layer together, we make use of `tf.concat` function, that concatenates two hidden layer outputs along columns (it is specified by axis=1, as we index axis starting from 0 - rows)

It wasn't that hard, many things look similar to what we already know! Run the cell below and see how it works, will those neurons learn?

In [11]:
tf.reset_default_graph()

# Pairs of inputs (possible combinations)
X = tf.constant(np.array([(0,0),(0,1),(1,0),(1,1)]).astype(np.float32), name="X")
# Results on applying XOR function on the pairs above
y = tf.constant(np.array([0,1,1,0]).astype(np.float32).reshape((-1,1)), name="y")

# HIDDEN LAYER:
#     1st neuron:
weights11 = tf.Variable(tf.random_normal((2, 1)), name="weights11")
bias11 = tf.Variable(0.0, name="bias11")
net11 = tf.add(tf.matmul(X, weights11), bias11, name="net11")
output11 = tf.nn.sigmoid(net11, name="output11")

#     2nd neuron:
weights12 = tf.Variable(tf.random_normal((2, 1)), name="weights12")
bias12 = tf.Variable(0.0, name="bias12")
net12 = tf.add(tf.matmul(X, weights12), bias12, name="net12")
output12 = tf.nn.sigmoid(net12, name="output12")

# OUTPUT LAYER:
#     just one neuron:
weights21 = tf.Variable(tf.random_normal((2, 1)), name="weights21")
bias21 = tf.Variable(0.0, name="bias21")

input21 = tf.concat([output11, output12], axis=1)

net21 = tf.add(tf.matmul(input21, weights21), bias21, name="net12")
output = tf.nn.sigmoid(net21, name="output")

mse = tf.reduce_mean(tf.square(y-output),name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
training_op = optimizer.minimize(loss=mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(20001):
        _, cost = sess.run([training_op, mse])
        if epoch % 2000 == 0:
            print("{} : mse={}".format(str(epoch), str(cost)))

0 : mse=0.25095218
2000 : mse=0.24822871
4000 : mse=0.23079756
6000 : mse=0.10747802
8000 : mse=0.02328165
10000 : mse=0.010499645
12000 : mse=0.006452598
14000 : mse=0.0045708744
16000 : mse=0.0035062141
18000 : mse=0.0028287258
20000 : mse=0.0023626816


Well, we see that the network is learning, but the cell above is made up of 42 lines. For only four different pairs of binary values. We declared tensors that store each neuron's parameters separately. And used all examples existing at a time, each epoch...

This won't scale too good if the problem gets complicated. Thus, time to move to such problem! 

#### MNIST dataset

MNIST dataset is a very famous set of images of handwritten digits. It provides 60 000 examples in a training set, and 10 000 examples in a test set, all of which have been size-normalized and centered in a fixed-size image. Because of that, we can apply it easly in this tutorial, without spending time on preprocessing and formatting the data.

There is a function already provided that will prepare all the data we need. If needed, the dataset will be downloaded from the Internet, and then it will be cached locally.

In [12]:
from tensorflow import keras

mnist  = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images.shape, test_images.shape

((60000, 28, 28), (10000, 28, 28))

We can see that the dataset we loaded has matching sizes of training and test set, and that each image's size is 28x28 pixels in grayscale. 

Let's keep `test_images` data for the very end, final evaluation of the model we are about to create. Thus, we need to split entire training set to training and validation set. Say we need only 10.000 for validation and keep the rest for training.

Another thing that we need to do with our data is altering the dimensions. We are going to create a fully-connected layer, so that we can't really input 2D image into it. To achieve that, we will use:
 - `.reshape(-1, 28*28)`, in which we specified expected shape as `-1, 28*28`. What does it mean? `28*28` indicates that we want to create a 1D vector out of 2D image matrix. We further use `-1` that will tell the function to infer the remaining dimensions, in case we already forgot how many image samples we have ;)

In [13]:
train_images = train_images[:50000].reshape(-1, 28*28)
train_labels = train_labels[:50000]

valid_images = train_images[50000:].reshape(-1, 28*28)
valid_labels = train_labels[50000:]

test_images = test_images.reshape(-1, 28*28)

num_features = train_images.shape[1]
num_labels = len(np.unique(train_labels))

train_images.shape

(50000, 784)

The new size of `train_images` shows that we managed to *flatten* the images into 1D vectors, with 784 elements each. For readability, we also defined `num_features` variable, that should actually store number 784, for this is the amount of features that describe every single image. We also easly determine `num_labels` by counting all unique labels that can be found in `train_images` set of data.

Now, based on the previous conclusions about too much lines of code, let's write a function that will be able to create one layer of neurons. We would like to tell such function:

 - what is the input tensor that will provide thata to the layer we are creating,
 - how many neurons we would like to have in a layer,
 - which activation function we want to apply on the neurons, since we know many other than *sigmoid*,
 - and preferably give this layer a *name*, as we already saw that we can name the tensors
 
We mentioned above that it would be convenient to change each layer's activation function via the parameter. Well, in TensorFlow, there are many activation functions ready to use. You can check out all the possibities __[here](https://www.tensorflow.org/api_docs/python/tf/keras/activations)__.

Let us see the implementation of `layer()` function below:


In [15]:
def layer(inputs, neurons, name, activation_f=None):
    with tf.name_scope(name):
        weights = tf.Variable(tf.random_normal(
            (int(inputs.shape[1]), neurons) ), name="weights")
        bias = tf.Variable(tf.zeros(neurons), name="bias")
        net = tf.add(tf.matmul(inputs, weights), bias, name="net")
        if activation_f:
            try:
                return activation_f(net)
            except:
                print("Failed to apply activation_f")
                return net
        else:
            return net
        
def layer(inputs, neurons, name, activation_f=None):
    with tf.name_scope(name):
        weights = tf.Variable(tf.random_normal(
            (int(inputs.shape[1]), neurons) ), name="weights")
        bias = tf.Variable(tf.zeros([neurons]), name="bias")
        net = tf.add(tf.matmul(inputs, weights), bias, name="net")
        if activation_f:
            if activation_f == 'sigmoid':
                return tf.nn.sigmoid(net)
            elif activation_f == 'relu':
                return tf.nn.relu(net)
            elif activation_f == 'relu6':
                return tf.nn.relu6(net)
            elif activation_f == 'leaky_relu':
                return tf.nn.leaky_relu(net)
            elif activation_f == 'elu':
                return tf.nn.elu(net)
            else:
                print("Unknown activation function name")
                return net
        else:
            return net

In [16]:
tf.reset_default_graph()

# placeholders
features = tf.placeholder(tf.float32, shape=(None, num_features))
labels = tf.placeholder(tf.int64, shape=(None))
one_hot_labels = tf.one_hot(labels, depth = num_labels,
                            on_value = 1.0, off_value = 0.0, axis = -1)

#network
layer1 = layer(features, 30, 'layer1', activation_f='sigmoid')
layer2 = layer(layer1, 20,'layer2', activation_f='sigmoid')
layer3 = layer(layer2, 10,'layer3')

last_layer = layer3

# loss
loss = tf.reduce_mean(tf.losses.softmax_cross_entropy(
    onehot_labels=one_hot_labels,
    logits=last_layer))

# optimizer
learning_rate = 0.25
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
class_probabilities = tf.nn.softmax(last_layer)

prediction = tf.argmax(class_probabilities, axis=1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(labels, prediction), tf.float32))

In [17]:
shuffled = np.array(range(train_images.shape[0]))

batch_size = 100
num_epochs = 50

with tf.Session() as session:
    tf.global_variables_initializer().run(session=session)
    for epoch in range(num_epochs):
        offset = 0
        np.random.shuffle(shuffled)
        while offset<train_images.shape[0]:
            batch_images = train_images[shuffled[offset:offset+batch_size]]
            batch_labels = train_labels[shuffled[offset:offset+batch_size]]
            batch_dict = {features : batch_images, labels : batch_labels}
            session.run([optimizer], feed_dict=batch_dict)
            offset += batch_size

        validation_dict = {features : valid_images, labels : valid_labels}
        acc = session.run([accuracy], feed_dict=validation_dict)
        if epoch % 5 == 0:
            print("Epoch {}, accuracy: {}".format(str(epoch), str(acc)))

AbortedError: Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file tensorflow/core/kernels/mkl_softmax_op.cc:136
	 [[Node: Softmax = _MklSoftmax[T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](layer3/net, DMT/_0)]]

Caused by op 'Softmax', defined at:
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
    self.io_loop.start()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/asyncio/base_events.py", line 438, in run_forever
    self._run_once()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/asyncio/base_events.py", line 1451, in _run_once
    handle._run()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 346, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 259, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 513, in execute_request
    user_expressions, allow_stdin,
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2817, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2843, in _run_cell
    return runner(coro)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3018, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3183, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-16-0d4043d1bd3c>", line 24, in <module>
    class_probabilities = tf.nn.softmax(last_layer)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1738, in softmax
    return _softmax(logits, gen_nn_ops.softmax, axis, name)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1673, in _softmax
    return compute_op(logits, name=name)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 7137, in softmax
    "Softmax", logits=logits, name=name)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/jells123/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

AbortedError (see above for traceback): Operation received an exception:Status: 3, message: could not initialize a memory descriptor, in file tensorflow/core/kernels/mkl_softmax_op.cc:136
	 [[Node: Softmax = _MklSoftmax[T=DT_FLOAT, _kernel="MklOp", _device="/job:localhost/replica:0/task:0/device:CPU:0"](layer3/net, DMT/_0)]]


REFERENCES:
    https://medium.com/@ouwenhuang/tensorflow-graphs-are-just-protobufs-9df51fc7d08d
    https://medium.com/themlblog/getting-started-with-tensorflow-constants-variables-placeholders-and-sessions-80900727b489
    https://developers.google.com/protocol-buffers/
    http://yann.lecun.com/exdb/mnist/