# TensorFlow

What is `TensorFlow`? 
The official page says:   
>TensorFlow is an open-source machine learning library for research and production. TensorFlow offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud

This points to at least two reasons (or, if you prefer, today's buzzwords) to learn it:
 - machine learning
 - open-source



In this tutorial, I would like to share my basic knowledge of this popular framework. It was prepared as an assignment project for the *Learning from Unstructured Data* course, so it uses the datasets and exercises from those classes.




### Everything's set up? 

Before digging into TensorFlow, let's make sure it is installed by printing its version. From now on, the library is imported under the alias `tf`.

In [3]:
import tensorflow as tf
print("Tensorflow", tf.__version__)

Tensorflow 1.10.0


Great!
Let's start the tutorial by understanding the idea behind `TensorFlow`.

TensorFlow is a **dataflow programming framework**.

This means that we define and run a so-called **computation graph**:
 - Each node holds an operation, such as addition or multiplication. 
 - The edges carry the inputs and outputs of these operations. They are represented as N-dimensional arrays (arrays of rank N), which are called **Tensors**.
 
The **Tensors** carry the data **flowing** in the graph.
> Control question: Why is Tensorflow called Tensorflow? :)
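
To make the "define first, run later" idea concrete, here is a minimal pure-Python sketch of a computation graph. The `Node` class and the `run` function are hypothetical names invented for this illustration; they only mimic the idea and say nothing about how TensorFlow is implemented internally.

```python
# Toy illustration only: a hypothetical Node class, not TensorFlow's implementation.
import operator


class Node:
    """A graph node holding an operation and the nodes that feed it."""

    def __init__(self, name, op=None, inputs=(), value=None):
        self.name = name
        self.op = op          # callable applied to the input values
        self.inputs = inputs  # upstream nodes (the incoming edges)
        self.value = value    # only set for constant nodes

    def __repr__(self):
        return "Node({!r})".format(self.name)


def run(node):
    """Evaluate a node by first evaluating everything it depends on."""
    if node.op is None:
        return node.value
    return node.op(*(run(n) for n in node.inputs))


# Phase 1: define the graph. Nothing is computed here.
a = Node("a", value=1.0)
b = Node("b", value=2.0)
total = Node("total", op=operator.add, inputs=(a, b))
scaled = Node("scaled", op=operator.mul, inputs=(total, b))

print(total)        # a description of the node, not a number
# Phase 2: run the graph (or just the part a node depends on).
print(run(total))   # 3.0
print(run(scaled))  # 6.0
```

Printing a node only describes it; a value appears only when the node is explicitly run, and running `total` never touches `scaled`.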

What is the reason for organizing the data flow this way? 

 - For a given graph, the dependencies between nodes are described explicitly. This makes it easier to exploit parallelism and to distribute work across multiple devices: CPUs, GPUs and others. Imagine a visualization of the graph: at a quick glance we can see which nodes can be executed in parallel, because their data flows along different edges. 
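
The parallelism argument can be sketched in a few lines of plain Python. The graph below and the `parallel_levels` helper are made up for this illustration: nodes are grouped into levels so that everything in one level depends only on earlier levels, which means the nodes within a level could in principle be executed at the same time. This is only a sketch of the idea, not a description of TensorFlow's actual scheduler.

```python
# Hypothetical graph given as {node: [nodes it depends on]}.
graph = {
    "a": [],
    "b": [],
    "c": [],
    "sum_ab": ["a", "b"],
    "double_c": ["c"],
    "result": ["sum_ab", "double_c"],
}


def parallel_levels(deps):
    """Group nodes into levels; nodes in one level do not depend on each other."""
    remaining = dict(deps)
    done = set()
    levels = []
    while remaining:
        # A node is ready once all of its dependencies have been computed.
        ready = sorted(n for n, d in remaining.items() if all(x in done for x in d))
        if not ready:
            raise ValueError("the graph contains a cycle")
        levels.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return levels


levels = parallel_levels(graph)
for i, level in enumerate(levels):
    print("step", i, "->", level)
```

Here `sum_ab` and `double_c` end up in the same level, because neither needs the other's output.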
 
That is a great promise of optimized computation! But we also have to note that, no matter how well we design the graph, it remains **static at run time**: once the graph is running, it cannot be changed. 



### The APIs

TensorFlow exposes the so-called `Graph API`. We can use not only Python, but also Go, Java and C++, together with the provided library, to write out a graph in a special format known as protobuf (protocol buffers). 
> Protocol buffers are Google's language-neutral, platform-neutral, extensible **mechanism for serializing structured data** – think XML, but smaller, faster, and simpler. 

There is also the `Session API`, which provides an interface to the *TensorFlow C++ runtime*. This is where all the *heavy lifting* and the *logic behind computational nodes* happen, as well as the final distribution of operations to the hardware that executes them.  

### Creating the first graph

Let's create our first simple computation graph, which adds two numbers together. Because this tutorial is built from Jupyter notebook cells, we begin with a cell that makes sure everything is cleaned up:

In [10]:
# Clear the default graph stack 
# and reset the global default graph.

# Use to play around and avoid 'dead nodes'...

tf.reset_default_graph()

The three main kinds of data holders in TensorFlow are:
1. Constants
2. Variables
3. Placeholders

To begin with, we store two numbers as constants. A constant holds a `value` of a given `dtype` that will not change, and it can be given a `name`. For the addition we use the `tf.add` function, which returns a tensor representing the result.

In [11]:
tf.reset_default_graph()

a = tf.constant(1.0, dtype=tf.float32, name='a')
b = tf.constant(2.0, dtype=tf.float32, name='b')
result = tf.add(a, b, name='result')

What can we do next?

In [12]:
print(result)

Tensor("result:0", shape=(), dtype=float32)


Using Python's `print`, we clearly see that we did not print the actual result. Instead, we printed information about the tensor that will hold that result. This is because we defined a graph and put some constant values in it, but did not let the data flow yet.

Therefore, we have to use the already mentioned `Session API` to run the graph (or just a part of it, if we want to).

In [18]:
with tf.Session() as sess:
    print("Result =", sess.run(result))
    
### Alternatively:
# sess = tf.Session()
# print("Result =", sess.run(result))
# sess.close()

Result = 3.0


A session is created by calling `tf.Session()`; inside the `with` block it also serves as the default session. Then we fire up the graph and compute the result by running `sess.run(result)`. 

The cell above also shows a more manual way of managing the session, but it is better to use Python's `with` statement for safety: the session is then closed even if an error occurs inside the block.
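
Why the `with` statement is the safer choice can be shown without TensorFlow at all. The `ToySession` class below is a hypothetical stand-in written for this sketch; it only imitates the context-manager behaviour of a session and has nothing to do with the real runtime.

```python
class ToySession:
    """Hypothetical stand-in for a session that owns some resource."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow the exception

    def close(self):
        self.closed = True


session = ToySession()
try:
    with session as sess:
        raise RuntimeError("something failed while running the graph")
except RuntimeError:
    pass

print("closed after the error:", session.closed)  # True
```

With manual management, the same error would skip the explicit `close()` call unless it were wrapped in `try`/`finally`.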

____________________________

After this introduction, we should move on to some machine learning! Let's bring in `numpy`, the trusted friend of all numerical computations and data matrices:

In [20]:
import numpy as np

and we can start implementing Logistic Regression in TensorFlow.

### Logistic Regression


Let's use gene activity data, which is already stored in the *data* directory. First, we need to read the data from the file, split it into Xs and Ys, and apply z-score normalization to the features.

In [34]:
mat = np.loadtxt('data/gene_data.txt', delimiter='\t', dtype=np.float32)
Ys = mat[:, [-1]]
Xs = mat[:, :-1]
means = np.mean(Xs, 0)
stdevs = np.std(Xs, 0)
Xs = (Xs-means)/stdevs

print("{}:\n {} data records, described by {} features".format('data/gene_data.txt', str(Xs.shape[0]), str(Xs.shape[1])))

data/gene_data.txt:
 36 data records, described by 2 features
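
The z-score normalization applied to the features can be checked by hand on a tiny example. The sketch below uses made-up numbers (not the gene data) and the standard-library `statistics` module; `pstdev` is the population standard deviation, which is also what `np.std` computes by default.

```python
import statistics

# Made-up values for one feature column (not taken from gene_data.txt).
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(column)     # 5.0
stdev = statistics.pstdev(column)  # 2.0, population standard deviation
z_scores = [(x - mean) / stdev for x in column]

print(z_scores)
# After normalization the column has mean 0 and standard deviation 1.
print(statistics.mean(z_scores), statistics.pstdev(z_scores))
```

Normalizing each feature this way puts all features on a comparable scale, which usually helps gradient descent behave well.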


We prepare **constant** tensors to store the data we just read from the file. 

Then, we need to create **variables** that will store the parameters of logistic regression. These are the `weights` and the `bias`, which we expect the algorithm to alter while heading towards a better solution. In TensorFlow, a variable represents a tensor whose value can be changed by running operations on it.
> A TensorFlow variable is the best way to represent shared, persistent state manipulated by your program.

 - for `weights`, we use `tf.random_normal`, which outputs random values from a normal distribution for a given shape: (2, 1), one weight for each of the 2 features that describe a single record (*although in general we perform matrix operations on all records at once*)
 - we set the initial value of `bias` to 0.0
 - we define the `net` tensor, using `tf.matmul` and `tf.add` to calculate the weighted linear combination of the input, plus the bias
 - and the `output` tensor, which we obtain by applying the `sigmoid` function to the tensor above
 
It looks like we have all the components of logistic regression (we can feed in some data and receive a prediction), but we still need a few more to make learning possible. Therefore, we further define:

 - the logistic `cost` function: notice that we join tensors together with arithmetic operations, which again results in a tensor; applying `tf.reduce_mean` is equivalent to `np.mean`, simply averaging over all elements
 - an `optimizer`, which is an instance of the `GradientDescentOptimizer` class that implements the algorithm specified in its name; we set its `learning_rate` to 0.25
 - and `training_op`, the result of calling `minimize`, a method of the class used above
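
The forward pass and the logistic cost described above can be written out in plain Python to see what the tensors will compute. This is a minimal sketch on made-up records and parameters; the helper names `predict` and `logistic_cost` are chosen for this illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Weighted linear combination of the inputs, squashed by the sigmoid."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(net)


def logistic_cost(outputs, targets):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all records."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for p, y in zip(outputs, targets)]
    return sum(losses) / len(losses)


# Made-up records with 2 features each, and made-up parameters.
records = [(0.5, -1.2), (-0.3, 0.8), (1.5, 0.1)]
targets = [1.0, 0.0, 1.0]
weights, bias = [0.0, 0.0], 0.0

outputs = [predict(x, weights, bias) for x in records]
print(outputs)                          # every output is 0.5 for all-zero parameters
print(logistic_cost(outputs, targets))  # log(2), about 0.693
```

With all parameters at zero the model is maximally unsure, so every prediction is 0.5 and the cost equals log(2); training should push the cost below that value.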
 
At first glance, it seems we do not need to know much about the Gradient Descent algorithm, as it is already implemented and easy to use. This is one of TensorFlow's advantages that is not about performance or low-level characteristics, but about convenience and flexibility at the same time.

>What happens in `.minimize(loss=cost)`?
<br>This function first computes the gradients of the loss with respect to the trainable variables, and then applies them to update those variables. By default, TensorFlow assumes that user-defined variables are to be *trained*. Indeed, we defined a complete graph, in which `cost` depends on `output`, which depends on `net`, and so on... 
For each variable, the gradient is either another Tensor or `None` when no gradient exists. 
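
To get a feeling for what such a minimization step does, here is a hand-written version of gradient descent for logistic regression in plain Python. It is a sketch under stated assumptions: the data is made up, and the gradients are derived by hand for this particular cost (the mean of `(p - y) * x` for each weight and the mean of `p - y` for the bias), whereas TensorFlow obtains them automatically from the graph.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost_and_gradients(records, targets, weights, bias):
    """Logistic cost and its partial derivatives w.r.t. weights and bias."""
    n = len(records)
    cost = 0.0
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(records, targets):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        cost -= (y * math.log(p) + (1 - y) * math.log(1 - p)) / n
        for j, xi in enumerate(x):
            grad_w[j] += (p - y) * xi / n
        grad_b += (p - y) / n
    return cost, grad_w, grad_b


# Made-up, linearly separable records with 2 features each.
records = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
targets = [1.0, 1.0, 0.0, 0.0]
weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.25

history = []
for step in range(5):
    cost, grad_w, grad_b = cost_and_gradients(records, targets, weights, bias)
    history.append(cost)
    # The update rule of gradient descent: move against the gradient.
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b
    print(step + 1, ":", cost)
```

Each pass computes the cost, computes the gradients and nudges the parameters against them, so the printed cost shrinks from step to step.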



In [57]:
tf.reset_default_graph() # :) 

X = tf.constant(Xs, name="X")
y = tf.constant(Ys, name="y")

weights = tf.Variable(tf.random_normal(shape=(2, 1)), name="weights")
bias = tf.Variable(0.0, name="bias")
net = tf.add(tf.matmul(X, weights), bias, name="net")
output = tf.nn.sigmoid(net, name="output")

cost = -tf.reduce_mean(y * tf.log(output) + (1-y) * (tf.log(1-output)))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.25)
training_op = optimizer.minimize(loss=cost)

Now we are all set and will run the graph 20 times. Each run computes the current loss, computes the gradients and adjusts the weights according to the learning rate.

As the documentation states, variable initializers must be run explicitly before other operations (and we have surely declared some variables), so we use a convenient one-line method that does the trick. It is the first line of the cell below.

Notice how we pass tensors and operations as a list to the `sess.run` method, so that they are executed and their values returned. We assign the first returned value, which belongs to `training_op`, to a throwaway `_` variable, because it is not a number of interest to us; we only want the operation to perform its calculations and update the weights. 

Running the cell below performs 20 training steps (epochs) and prints the current logistic loss for each of them.

In [64]:
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(20):
        _, current_loss = sess.run([training_op, cost])
        print("{} : {}".format(str(epoch+1), str(current_loss)))

1 : 0.442709
2 : 0.41387594
3 : 0.38964933
4 : 0.3690012
5 : 0.35118067
6 : 0.3356298
7 : 0.32192647
8 : 0.3097464
9 : 0.29883677
10 : 0.28899753
11 : 0.28006873
12 : 0.27192074
13 : 0.2644478
14 : 0.2575625
15 : 0.251192
16 : 0.24527533
17 : 0.2397609
18 : 0.23460478
19 : 0.2297694
20 : 0.22522235


Now we should be happy to see that the loss value decreases after each epoch, so our graph is working. 

But we want something more from TensorFlow, and this 'something' is probably *neural networks*. What would a very simple neural network look like?

We already have an example with logistic regression, which we can think of as a one-layer neural network. Therefore, the following example of a single neuron that represents the OR function should now be understandable. It differs from the Logistic Regression example only in the objective function, which is not a logistic loss but a mean squared error. It is not a big change at all: we take the same tensors to determine the value of the loss, but apply different operations to them.



### One neuron that learns OR function

In [76]:
tf.reset_default_graph()

# Pairs of inputs (possible combinations)
Xs = np.array([(0,0), (0,1), (1,0), (1,1)])
# Results on applying OR function on the pairs above
Ys = np.array([0, 1, 1, 1])

X = tf.constant(Xs.astype(np.float32), name="X")
y = tf.constant(Ys.astype(np.float32).reshape((-1,1)), name="y")

weights = tf.Variable(tf.random_normal((2,1)), name="weights")
bias = tf.Variable(0.0, name="bias")
net = tf.add(tf.matmul(X, weights), bias, name="net")
output = tf.nn.sigmoid(net, name="output")

mse = tf.reduce_mean(tf.square(y-output),name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
training_op = optimizer.minimize(loss=mse)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(1001):
        _, current_mse = sess.run([training_op, mse])
        if epoch % 100 == 0:
            print("{} : mse={}".format(str(epoch), str(current_mse)))

0 : mse=0.39391166
100 : mse=0.15326461
200 : mse=0.12623389
300 : mse=0.10645351
400 : mse=0.09030949
500 : mse=0.07739184
600 : mse=0.067077994
700 : mse=0.058769055
800 : mse=0.051994704
900 : mse=0.046404857
1000 : mse=0.041740462
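
That a single sigmoid neuron can represent OR at all is easy to confirm by hand. The sketch below uses hand-picked parameters (they are an assumption of this example, not the values learned in the cell above) and computes the same mean squared error in plain Python.

```python
import math


def neuron(x, weights, bias):
    """Sigmoid of the weighted sum of the inputs plus the bias."""
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-net))


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]

# Hand-picked (not learned) parameters: any single active input pushes net above 0.
weights, bias = [10.0, 10.0], -5.0

outputs = [neuron(x, weights, bias) for x in inputs]
mse = sum((y - o) ** 2 for y, o in zip(targets, outputs)) / len(inputs)

for x, o in zip(inputs, outputs):
    print(x, "->", round(o, 4))
print("mse =", mse)
```

Since such parameters exist, gradient descent has something to converge towards, which is consistent with the steadily falling error printed above.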


### XOR Troubles

The disadvantage of having only one neuron is revealed by a popular *counter-argument*: can one neuron learn the XOR function? 
> No, because one neuron is just a linear classifier, and we cannot separate XOR with only one line splitting the space
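
The claim can be illustrated with a small brute-force search in plain Python. The sketch tries every line `w1*x1 + w2*x2 + b = 0` with parameters on a coarse grid (the grid and the `find_line` helper are choices made for this illustration) and checks whether it classifies all four inputs correctly. A finite grid is an illustration rather than a proof, but a short argument shows the result holds for any parameters: XOR would require `b <= 0`, `w1 + b > 0`, `w2 + b > 0` and `w1 + w2 + b <= 0`, and adding the two middle inequalities gives `w1 + w2 + b > -b >= 0`, a contradiction.

```python
import itertools

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_targets = [0, 1, 1, 1]
xor_targets = [0, 1, 1, 0]

# Candidate parameters on a coarse grid from -3 to 3 in steps of 0.5.
grid = [i / 2 for i in range(-6, 7)]


def find_line(targets):
    """Return the first (w1, w2, b) whose line classifies every input, else None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        predictions = [int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in inputs]
        if predictions == targets:
            return (w1, w2, b)
    return None


or_line = find_line(or_targets)
xor_line = find_line(xor_targets)
print("OR :", or_line)
print("XOR:", xor_line)  # None: no candidate on the grid works
```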

Therefore, we will now implement a small neural network with one hidden layer that can solve the XOR problem, and see how to connect more units together.
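
Before training anything, it may help to see that a network of this shape really can represent XOR. The plain-Python sketch below wires two hidden neurons and one output neuron with hand-picked parameters; the values are an assumption of this example (one of many possible solutions), not weights learned by TensorFlow, and the variable names simply follow the layer-and-neuron numbering used in this tutorial.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]

# Hand-picked (not learned) parameters.
weights11, bias11 = [20.0, 20.0], -10.0    # hidden neuron 1 behaves like OR
weights12, bias12 = [-20.0, -20.0], 30.0   # hidden neuron 2 behaves like NAND
weights21, bias21 = [20.0, 20.0], -30.0    # output neuron behaves like AND

outputs = []
for x in inputs:
    output11 = neuron(x, weights11, bias11)
    output12 = neuron(x, weights12, bias12)
    input21 = (output11, output12)  # the joined outputs of the hidden layer
    outputs.append(neuron(input21, weights21, bias21))

for x, o in zip(inputs, outputs):
    print(x, "->", round(o, 4))
```

The hidden layer turns the inputs into "at least one is on" and "not both are on"; the output neuron fires only when both hold, which is exactly XOR.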



More neurons and layers naturally imply more tensors. This is clearly visible in the code cell below: we define weights, bias, net and output tensors for each of the neurons. Two digits are appended to each variable name: the first indicates the layer (the hidden layer is number 1, the output layer is number 2), and the second numbers the neuron within its layer.

> Notice how we multiply tensor `X` by `weights11` and by `weights12`, once for each hidden neuron, and then add the corresponding bias.
<br>To join the outputs of the two hidden-layer neurons, we use the `tf.concat` function, which concatenates them along columns (specified by axis=1, as axes are indexed from 0, which stands for rows)

It wasn't that hard; many things look similar to what we already know! Run the cell below and see how it works. Will those neurons learn?

In [82]:
tf.reset_default_graph()

# Pairs of inputs (possible combinations)
X = tf.constant(np.array([(0,0),(0,1),(1,0),(1,1)]).astype(np.float32), name="X")
# Results on applying XOR function on the pairs above
y = tf.constant(np.array([0,1,1,0]).astype(np.float32).reshape((-1,1)), name="y")

# HIDDEN LAYER:
#     1st neuron:
weights11 = tf.Variable(tf.random_normal((2, 1)), name="weights11")
bias11 = tf.Variable(0.0, name="bias11")
net11 = tf.add(tf.matmul(X, weights11), bias11, name="net11")
output11 = tf.nn.sigmoid(net11, name="output11")

#     2nd neuron:
weights12 = tf.Variable(tf.random_normal((2, 1)), name="weights12")
bias12 = tf.Variable(0.0, name="bias12")
net12 = tf.add(tf.matmul(X, weights12), bias12, name="net12")
output12 = tf.nn.sigmoid(net12, name="output12")

# OUTPUT LAYER:
#     just one neuron:
weights21 = tf.Variable(tf.random_normal((2, 1)), name="weights21")
bias21 = tf.Variable(0.0, name="bias21")

input21 = tf.concat([output11, output12], axis=1)

net21 = tf.add(tf.matmul(input21, weights21), bias21, name="net21")
output = tf.nn.sigmoid(net21, name="output")

mse = tf.reduce_mean(tf.square(y-output),name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
training_op = optimizer.minimize(loss=mse)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(20001):
        _, cost = sess.run([training_op, mse])
        if epoch % 2000 == 0:
            print("{} : mse={}".format(str(epoch), str(cost)))

0 : mse=0.2536455
2000 : mse=0.24906723
4000 : mse=0.23994184
6000 : mse=0.18723357
8000 : mse=0.1490287
10000 : mse=0.13765413
12000 : mse=0.13320681
14000 : mse=0.13096595
16000 : mse=0.12964542
18000 : mse=0.12878445
20000 : mse=0.12818265


Well, we see that the network is learning, but the cell above is made up of 42 lines, for only four different pairs of binary values. This won't scale well if the problem gets more complicated... Thus, it is time to move on to such a problem! 

#### MNIST dataset

REFERENCES:
 - https://medium.com/@ouwenhuang/tensorflow-graphs-are-just-protobufs-9df51fc7d08d
 - https://medium.com/themlblog/getting-started-with-tensorflow-constants-variables-placeholders-and-sessions-80900727b489
 - https://developers.google.com/protocol-buffers/