## Neural networks and images in TensorFlow
**How to install TensorFlow: https://www.tensorflow.org/install/**

* `pip install tensorflow` -- **cpu-only** version for Linux & Mac OSX
* if you want GPU support, try `pip install tensorflow-gpu`

For this notebook:
```
conda create -n py36_tensorflow python=3.6 anaconda
source activate py36_tensorflow
pip install tensorflow
```

#### Colab link: https://colab.research.google.com/drive/18xjvLspViCwTUXTBNiz_xKxlUblQuGPU

### About CNNs
Convolutional layers extract features: quantitative representations of attributes of the input image.

Once extracted, these features can be used for a downstream task such as classification.
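
To make "feature extraction" concrete, here is a minimal NumPy sketch of what a single convolutional filter does. The helper `conv2d_valid`, the toy image, and the hand-picked edge kernel are assumptions made for illustration; in a real CNN the kernel values are learned, and the layer is not implemented with Python loops.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products.

    This is the cross-correlation that deep-learning libraries call "convolution".
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges.
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0],
                        [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, edge_kernel)
print(feature_map)  # every row is [0, 3, 3, 0]: large values mark the edge
```

The resulting feature map is large only where the window straddles the edge, which is the sense in which the filter "extracts" that attribute.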

<img src="img/act.png" width="800">

## Let's look at popular architectures.

### VGG

<img src="img/vgg.png" width="600">

### ResNet (Shortcut + Batch Normalization)
 
<img src="img/resnet.png" width="800">
 
### GoogLeNet (auxiliary classifiers predict classes at several depths)
 
<img src="img/gln.png" width="800">


## Deeper layer $\to$ more complex features.

<img src="img/feat.png" width="800">

## In practice it is easier to start from a pre-trained NN (Fine-Tuning)

<img src="img/ft.jpg" width="600">


## Dark Magic 

<img src="img/dm.png" width="600">

In [None]:
import random
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

**Let's play with tensorflow.**

In [None]:
tf.__version__

In [None]:
tf.test.is_gpu_available()

# How does it work?
1. define placeholders where you'll send inputs;
2. build a symbolic graph: a recipe for a mathematical transformation of those placeholders;
3. compute the outputs of your graph for particular values of each placeholder, where `s` is a `tf.Session`:
  * `output.eval({placeholder: value}, session=s)`
  * `s.run(output, {placeholder: value})`

* So far there are two main entities: "placeholder" and "transformation"
* Both can be numbers, vectors, matrices, tensors, etc.
* Both can be int32/64, floats, or booleans (uint8) of various sizes.

* You can define new transformations as an arbitrary operation on placeholders and other transformations
 * `tf.reduce_sum(tf.range(N)**2)` is 3 sequential transformations of placeholder `N`
 * Most common numpy functions have a symbolic tensorflow counterpart
   * `a+b, a/b, a**b, ...` behave just like in numpy
   * np.mean -> tf.reduce_mean
   * np.arange -> tf.range
   * np.cumsum -> tf.cumsum
   * If you can't find the op you need, see the [docs](https://www.tensorflow.org/api_docs/python).

### Important note
Functions are similar to numpy. But remember that here we only declare a computational graph: these functions are actually computed only when the graph is run inside a session.
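
The "declare first, compute later" idea can be sketched in a few lines of pure Python. The `Node` class and `run` function below are hypothetical names invented for this illustration; they are not part of tensorflow and only mimic the placeholder/session workflow in miniature.

```python
class Node:
    """A graph vertex: an input slot (fn is None) or an operation on other nodes."""

    def __init__(self, fn=None, parents=()):
        self.fn = fn
        self.parents = parents

    def __add__(self, other):
        return Node(lambda a, b: a + b, (self, other))

    def __mul__(self, other):
        return Node(lambda a, b: a * b, (self, other))


def run(node, feed):
    """Evaluate `node`, taking input values from the dict {input_node: value}."""
    if node in feed:
        return feed[node]
    if node.fn is None:
        raise KeyError("no value was fed for an input node")
    return node.fn(*(run(parent, feed) for parent in node.parents))


x = Node()           # plays the role of a placeholder
y = Node()
expr = x * x + y     # only builds the graph; nothing is computed yet

value = run(expr, {x: 3, y: 4})   # the computation happens here
print(value)                      # 13
```

The same `expr` can be evaluated again with a different feed dict, which is the pattern `sess.run(output, {placeholder: value})` follows.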


## Example 1 - placeholders
`A placeholder is a tensor of fixed type (and, optionally, fixed shape) whose value you supply at run time.`

In [None]:
N = tf.placeholder('int64', name="input_to_your_name")
# N squared.
result_0 = tf.pow(N, 2)
# Sum of squares of the integers 0..N-1.
result_1 = tf.reduce_sum(tf.range(N)**2)

In [None]:
result_0

In [None]:
result_1

#### All input values are passed to the session as a dict ({placeholder: value}).

In [None]:
sess = tf.Session()

In [None]:
print(sess.run(result_0, {N:100}))

In [None]:
print(sess.run(result_1, {N:100}))
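
As a cross-check, the same two quantities can be computed eagerly in plain NumPy, with no graph or session involved. This is a sketch for comparison only; `N = 100` mirrors the value fed above, and the `_np` suffixes are just illustrative names.

```python
import numpy as np

N = 100

# N squared, computed immediately.
result_0_np = N ** 2

# Sum of squares of the integers 0..N-1, the NumPy counterpart of
# tf.reduce_sum(tf.range(N)**2).
result_1_np = int(np.sum(np.arange(N) ** 2))

print(result_0_np)  # 10000
print(result_1_np)  # 328350
```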

## Example 2 - placeholders

In [None]:
x = tf.placeholder(dtype='float32', shape=(10,))
y = tf.placeholder(dtype='float32', shape=(10,))

In [None]:
func = 100 * tf.multiply(y,x)

In [None]:
dummy = np.arange(10).astype('float32')
print(func.eval({x:dummy, y:dummy}, session=sess))

## Example 3 - gradients
It is really easy to compute gradients.

In [None]:
my_scalar = tf.placeholder('float32')
scalar_squared = my_scalar**2

derivative = tf.gradients(scalar_squared, my_scalar)[0]

x = np.linspace(-3,3)
x_squared, x_squared_der = sess.run(
    [scalar_squared, derivative],
    {my_scalar:x})

plt.plot(x, x_squared, label="x^2")
plt.plot(x, x_squared_der, label="derivative")
plt.legend();

### Important note
This way we can automatically compute gradients even for nasty functions. Importantly, this is **not numerical differentiation**: tensorflow knows the derivative formula of each operation, walks through the graph vertices, and applies the chain rule to compute the derivative of the composite function.
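
The chain-rule walk over graph vertices can be illustrated with a tiny reverse-mode differentiation sketch in pure Python. The `Var` class, `sin`, and `backprop` are hypothetical helpers written for this example, not tensorflow internals, and the naive recursion is only suitable for very small graphs. The result is compared with the hand-derived formula and with a finite-difference estimate.

```python
import math


class Var:
    """A scalar graph vertex that remembers its parents and local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent_vertex, d(self)/d(parent))
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])


def sin(v):
    return Var(math.sin(v.value), [(v, math.cos(v.value))])


def backprop(vertex, upstream=1.0):
    """Chain rule: push d(output)/d(vertex) down to every ancestor."""
    vertex.grad += upstream
    for parent, local in vertex.parents:
        backprop(parent, upstream * local)


x = Var(1.5)
f = sin(x * x)            # f(x) = sin(x^2)
backprop(f)

analytic = 2 * 1.5 * math.cos(1.5 ** 2)   # d/dx sin(x^2) = 2x cos(x^2)
h = 1e-6
numeric = (math.sin((1.5 + h) ** 2) - math.sin((1.5 - h) ** 2)) / (2 * h)
print(x.grad, analytic, numeric)
```

The graph-based gradient agrees with the analytic formula up to floating-point rounding, while the finite-difference value is only an approximation that depends on the step `h`.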


## Example 4 - variables

The inputs and transformations have no value outside of an `s.run(...)` call. This is inconvenient if you want your model to have parameters (e.g. network weights) that are always present but can change their value over time.

Tensorflow solves this with `tf.Variable` objects.
* You can assign a variable a new value at any point in your graph
* Unlike placeholders, there's no need to explicitly pass values to variables when `s.run(...)`-ing
* You can use variables the same way you use transformations

In [None]:
v_1 = tf.Variable(initial_value=np.ones(5))

In [None]:
# IMPORTANT:
# initialize variable(s) with their initial values
sess.run(tf.global_variables_initializer())

# evaluating a variable (outside the symbolic graph)
print("initial value", sess.run(v_1))

# within the symbolic graph you use variables like any other input or transformation; no "get value" is needed

## Example 5 - logistic regression with tensorflow

In [None]:
from sklearn.datasets import load_digits
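# sklearn's 8x8 digits dataset restricted to two classes (0 and 1);
# recent scikit-learn versions require `n_class` to be passed by keyword
mnist = load_digits(n_class=2)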
X, y = mnist.data, mnist.target

print("y [shape - %s]:" % (str(y.shape)), y[:10])
print("X [shape - %s]:" % (str(X.shape)))

In [None]:
plt.imshow(X[0].reshape([8,8]))

In [None]:
weights = tf.Variable(np.zeros((64, 1)), dtype='float32')
input_X = tf.placeholder('float32', shape=(None, 64))
input_y = tf.placeholder('float32', shape=(None, 1))

In [None]:
logits = tf.matmul(input_X, weights)
# hand-made sigmoid
predicted_y = 1. / (1. + tf.exp(-1. * logits))
epsilon = 0.005
# hand-made logloss
loss = -1. * tf.reduce_sum( 
    tf.multiply(input_y, tf.log( epsilon + predicted_y)) 
    + tf.multiply((1 - input_y), tf.log(1 + epsilon - predicted_y)))

Questions: Why do we need this epsilon?
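
One way to explore this question is to evaluate the same loss formula in NumPy on a few hand-picked predictions. The arrays below are made-up illustrative values, not outputs of the model; only `epsilon = 0.005` is taken from the cell above.

```python
import numpy as np

epsilon = 0.005
labels = np.array([1.0, 1.0, 0.0])
predicted = np.array([0.0, 0.5, 1.0])   # first and last are confidently wrong

# Without epsilon: log(0) = -inf, so the per-example loss becomes infinite.
with np.errstate(divide="ignore"):
    raw = -(labels * np.log(predicted)
            + (1 - labels) * np.log(1 - predicted))

# With epsilon, as in the hand-made logloss: every log argument is >= epsilon.
stable = -(labels * np.log(epsilon + predicted)
           + (1 - labels) * np.log(1 + epsilon - predicted))

print(raw)      # contains inf for the confidently wrong predictions
print(stable)   # all values are finite
```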

In [None]:
# We will use an optimizer instead of hand-made gradient descent.
optimizer_step = (
    tf.train.GradientDescentOptimizer(0.001, use_locking=True)
    .minimize(loss, var_list=[weights]))

In [None]:
# a bit of theano-like logic :)
# a function that takes X and y, updates the weights, and returns the log loss
def train_function(X, y, s=sess):
    # -1 lets numpy infer the batch dimension, so no global batch_size is needed
    X = X.reshape((-1, 64))
    y = y.reshape((-1, 1))
    s.run(optimizer_step, {input_X: X, input_y: y})
    log_loss = s.run(loss, {input_X: X, input_y: y})
    return log_loss

# build the rounding op once, so repeated calls do not keep adding nodes to the graph
predicted_class = tf.round(predicted_y)
predict_function = lambda X: sess.run(predicted_class, {input_X: X})

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
sess.run(tf.global_variables_initializer())
batch_size=20

In [None]:
for i in range(10):
    batch_indices = random.sample(
        range(0, X_train.shape[0]), batch_size)
    X_sample = X_train[batch_indices]
    y_sample = y_train[batch_indices]
    loss_i = train_function(X_sample, y_sample)
    print("loss at iter %i:%.4f" % (i, loss_i))
    print("train auc:", roc_auc_score(y_train, predict_function(X_train)))
    print("test auc:", roc_auc_score(y_test, predict_function(X_test)))

In [None]:
# Let's look at our model weights
print("resulting weights:")
ax = sns.heatmap(weights.eval(session=sess).reshape([8, 8]))
plt.show()

Questions: why such weights?

## Example 6 - real-but-toy CNN in TF

#### Let's define a fairly standard architecture, like
`conv-pool-conv-pool-dense-dense-everybody`

1) Convolutional Layer #1: Applies 32 5x5 filters (extracting 5x5-pixel subregions), with ReLU activation function

2) Pooling Layer #1: Performs max pooling with a 2x2 filter and stride of 2 (which specifies that pooled regions do not overlap)

3) Convolutional Layer #2: Applies 64 5x5 filters, with ReLU activation function

4) Pooling Layer #2: Again, performs max pooling with a 2x2 filter and stride of 2

5) Dense Layer #1: 1,024 neurons, with dropout regularization rate of 0.4 (probability of 0.4 that any given element will be dropped during training)

6) Dense Layer #2 (Logits Layer): 10 neurons, one for each digit target class (0–9). 
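
The tensor shapes through this stack can be checked with a little arithmetic. The sketch below assumes 28x28 grayscale inputs and "same" padding in the convolutions (so they keep height and width), with unpadded 2x2 pooling; `conv_same` and `max_pool` are hypothetical helpers that only track shapes and do no actual computation.

```python
def conv_same(shape, filters):
    """'same' padding with stride 1 keeps height and width; channels become `filters`."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, pool=2, stride=2):
    """Unpadded max pooling shrinks each spatial side to (side - pool) // stride + 1."""
    height, width, channels = shape
    return ((height - pool) // stride + 1, (width - pool) // stride + 1, channels)


shape = (28, 28, 1)                 # one grayscale MNIST image
shape = conv_same(shape, 32)        # (28, 28, 32)
shape = max_pool(shape)             # (14, 14, 32)
shape = conv_same(shape, 64)        # (14, 14, 64)
shape = max_pool(shape)             # (7, 7, 64)

flat_size = shape[0] * shape[1] * shape[2]
print(shape, flat_size)             # (7, 7, 64) 3136
```

Under these assumptions the first dense layer receives 7 * 7 * 64 = 3136 values per image.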

Since we use TF for NNs, let's use some of its predefined layers.

**More details about standard layers here: https://www.tensorflow.org/tutorials/layers**

In [None]:
tf.logging.set_verbosity(tf.logging.INFO)
mnist = tf.contrib.learn.datasets.load_dataset("mnist")

In [None]:
# Load training and eval data
train_data = mnist.train.images # Returns np.array
train_labels = np.asarray(mnist.train.labels, dtype=np.int32)
eval_data = mnist.test.images # Returns np.array
eval_labels = np.asarray(mnist.test.labels, dtype=np.int32)

In [None]:
print('Train shape', train_data.shape, ', target shape', train_labels.shape)

### Data visualization

In [None]:
visible = train_data.reshape((train_data.shape[0], 28, 28))
fig, axes = plt.subplots(nrows=1, ncols=7, figsize=(20, 20))

for i, ax in enumerate(axes):
    ax.imshow(visible[i], cmap='gray')

In [None]:
# our model definition (TF best practices)
def cnn_model_fn(features, labels, mode):
    """Model function for CNN."""
    # Input Layer
    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

    # Convolutional Layer #1
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)

    # Pooling Layer #1
    pool1 = tf.layers.max_pooling2d(
        inputs=conv1, 
        pool_size=[2, 2], 
        strides=2)

    # Convolutional Layer #2 and Pooling Layer #2
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(
        inputs=conv2, 
        pool_size=[2, 2], 
        strides=2)

    # Dense Layer
    pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
    dense = tf.layers.dense(
        inputs=pool2_flat, 
        units=1024, 
        activation=tf.nn.relu)
    dropout = tf.layers.dropout(
        inputs=dense, 
        rate=0.4, 
        training=mode == tf.estimator.ModeKeys.TRAIN)

    # Logits Layer
    logits = tf.layers.dense(inputs=dropout, units=10)

    predictions = {
        # Generate predictions (for PREDICT and EVAL mode)
        "classes": tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
        # `logging_hook`.
        "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
    }

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode, predictions=predictions)

    # Calculate Loss (for both TRAIN and EVAL modes)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(
            learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(
            labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

More information about Estimators: 
- https://www.tensorflow.org/programmers_guide/estimators
- https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec
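
For intuition about the `probabilities` and `loss` entries of the model function, here is a NumPy sketch of softmax followed by cross-entropy with integer labels. It is an illustration rather than tensorflow's implementation: the function names are chosen for this example, and it assumes the per-example losses are simply averaged over the batch.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_softmax_cross_entropy(labels, logits):
    """Mean negative log-probability assigned to the true (integer) class."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


# Two examples, ten classes, all logits equal: the model is maximally unsure.
logits = np.zeros((2, 10))
labels = np.array([3, 7])

probs = softmax(logits)
loss = sparse_softmax_cross_entropy(labels, logits)
print(probs[0])   # uniform: 0.1 for every class
print(loss)       # log(10), about 2.3026
```

A freshly initialised 10-class classifier typically starts near this log(10) loss, which is a handy sanity check when reading the training logs.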

In [None]:
# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="/tmp/mnist_convnet_model")

In [None]:
hooks = None
if False:  # for console output
    tensors_to_log = {"probabilities": "softmax_tensor"}
    logging_hook = tf.train.LoggingTensorHook(
        tensors=tensors_to_log, every_n_iter=1000)
    hooks = [logging_hook]

In [None]:
%%time
# Train the model
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},  # data source
    y=train_labels,       # labels source
    batch_size=100,       # batch size
    num_epochs=None,      # passes over the data; None repeats it until `steps` is reached
    shuffle=True)         # shuffle flag
mnist_classifier.train(
    input_fn=train_input_fn,
    steps=2000,  # change here for better results/time, (~10min for 2k steps)
    hooks=hooks)

In [None]:
# Evaluate the model and print results
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)

## *Example 7 - tensorboard

TensorBoard is a suite of web applications for inspecting and understanding your TensorFlow runs and graphs.

Go to the console, activate the Python environment where tensorflow is installed, then run:

`tensorboard --logdir /tmp/mnist_convnet_model`

Now open `127.0.0.1:6006` in a browser to see TensorBoard.

More info: https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard

## *Example 8 - practice notes

In [None]:
NUM_EPOCHS = 10
for i in range(0, NUM_EPOCHS):
    (mnist_classifier
     .train(input_fn=train_input_fn, steps=2000, hooks=hooks)
     .evaluate(input_fn=eval_input_fn))

### If you want even more TF - https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment

**Feedback**
  * rate the <a href="https://goo.gl/forms/kYZuyAQLuwo8szce2">seminar</a>
  * leave a <a href="https://goo.gl/forms/zeZiu1fSgrpPGp6T2">review</a> of the lecture