## Example: 'Hello World'

In [1]:
import tensorflow as tf

#### Constant

In [2]:
hello_constant = tf.constant('Hello World')

In [3]:
with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)

b'Hello World'


In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a **tensor**.

##### Constant Vector

The values of a constant vector (`tf.constant(array)`) never change.

##### Session

A session is an environment for running a graph.

In [4]:
A = tf.constant(1234)
A

<tf.Tensor 'Const_1:0' shape=() dtype=int32>

In [5]:
B = tf.constant([123, 456, 789])
B

<tf.Tensor 'Const_2:0' shape=(3,) dtype=int32>

In [6]:
C = tf.constant([[123, 456, 789], [222, 333, 444]])
C

<tf.Tensor 'Const_3:0' shape=(2, 3) dtype=int32>
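
The `shape` values reported above follow directly from how deeply the input is nested. As a rough plain-Python sketch (no TensorFlow involved; `nested_shape` is a hypothetical helper, and it assumes the nesting is rectangular), the same tuples can be derived like this:

```python
def nested_shape(value):
    """Return the shape of a scalar or a regularly nested list as a tuple."""
    shape = []
    # Each level of list nesting contributes one dimension
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)

print(nested_shape(1234))                                  # ()
print(nested_shape([123, 456, 789]))                       # (3,)
print(nested_shape([[123, 456, 789], [222, 333, 444]]))    # (2, 3)
```

A scalar has an empty shape, a flat list has one dimension, and a list of equal-length lists has two.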

#### Placeholder
`tf.placeholder()` returns a tensor that gets its value from data passed to the `tf.Session.run()` function through `feed_dict`, allowing you to set the input right before the session runs.

In [7]:
x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
    print(output)

Hello World


In [8]:
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 1, z: 0.54})
    print(output)

Test String


#### Quiz: Solution

In [9]:
def run():
    output = None
    x = tf.placeholder(tf.int32)

    with tf.Session() as sess:
        # Feed the x tensor 123
        output = sess.run(x, feed_dict={x: 123})

    return output

## TensorFlow Math

#### Addition
`tf.add()` takes two numbers, two tensors, or one of each, and returns their sum as a tensor.

In [10]:
x = tf.add(5,2) # 7

#### Subtraction and Multiplication
The `x` tensor will evaluate to `6`, because `10 - 4 = 6`. The `y` tensor will evaluate to `10`, because `2 * 5 = 10`. That was easy!

In [11]:
x = tf.subtract(10, 4) # 6
y = tf.multiply(2, 5) # 10

#### Converting types
Convert between types to make certain operators work together.

    tf.subtract(tf.constant(2.0),tf.constant(1))  # Fails with ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int32:
    
That's because the constant `1` is an integer but the constant `2.0` is a floating-point value, and `tf.subtract` expects both operands to have the same type.

In cases like these, you can either make sure your data is all of the same type, or you can cast a value to another type. In this case, converting the `2.0` to an integer before subtracting, like so, will give the correct result:

In [12]:
tf.subtract(tf.cast(tf.constant(2.0), tf.int32), tf.constant(1)) # 1

<tf.Tensor 'Sub_1:0' shape=() dtype=int32>
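
Casting a float down to an integer is only harmless when the value is already whole, as `2.0` is here. The sketch below uses plain Python, with `int()` and `float()` standing in for `tf.cast` (an assumption made for illustration, not TensorFlow code), to compare the two directions of casting:

```python
a = 2.5   # float operand
b = 1     # integer operand

# Option 1: cast the float down to an integer (the fraction is discarded)
as_int = int(a) - b

# Option 2: cast the integer up to a float (no information is lost)
as_float = a - float(b)

print(as_int)     # 1
print(as_float)   # 1.5
```

When the float may carry a fractional part, casting the integer operand up to a float is usually the safer choice.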

#### Quiz: TF Math

In [13]:
import tensorflow as tf

x = tf.constant(10)
y = tf.constant(2)
z = tf.divide(x,y) - 1

with tf.Session() as sess:
    output = sess.run(z)
    print(output)

4.0


### TF Linear Function
        y = xW + b
`W` is a matrix of weights connecting two layers.

The output `y`, the input `x`, and the biases `b` are all vectors.
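
To make the shapes concrete, here is a minimal plain-Python sketch of `y = xW + b` for a single input row. The name `linear_row` and the numbers are made up for illustration; a real model would use `tf.matmul` as in the cells that follow.

```python
def linear_row(x, W, b):
    """Compute y = xW + b for a row vector x, a matrix W and a bias vector b."""
    n_inputs, n_outputs = len(W), len(W[0])
    assert len(x) == n_inputs and len(b) == n_outputs
    # Output j is the dot product of x with column j of W, plus bias j
    return [sum(x[i] * W[i][j] for i in range(n_inputs)) + b[j]
            for j in range(n_outputs)]

x = [1.0, 2.0, 3.0]        # 3 input features
W = [[1.0, 0.0],           # 3 x 2 weight matrix
     [0.0, 1.0],
     [1.0, 1.0]]
b = [0.5, -0.5]            # one bias per output

print(linear_row(x, W, b))   # [4.5, 4.5]
```

With 3 inputs and 2 outputs, `W` has shape 3 x 2 and `b` has one entry per output.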

#### Variable
The `tf.Variable` class creates a tensor with an initial value that can be modified - similar to a normal Python variable.
This tensor stores its state in the **session**, so the state of the tensor needs to be initialized manually.
* `tf.global_variables_initializer()` - returns an operation that initializes all TF variables from the graph.

In [16]:
x = tf.Variable(5)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

#### Initializing Weights
It is a good practice to initialize the weights (W) with **random numbers from a normal distribution**.

* **Randomizing** the weights helps keep the model from becoming stuck in the same place every time it is trained with gradient descent.
* A **normal distribution** prevents any one weight from overwhelming the other weights.
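
The cells below use `tf.truncated_normal`, which re-draws any value that falls more than 2 standard deviations from the mean. As a rough sketch of that idea in plain Python (the helper names `truncated_normal_sample` and `init_weights` are hypothetical, and this is not how TensorFlow implements it internally):

```python
import random

def truncated_normal_sample(rng, mean=0.0, stddev=1.0):
    """Draw one sample, re-drawing until it lies within 2 stddev of the mean."""
    while True:
        sample = rng.gauss(mean, stddev)
        if abs(sample - mean) <= 2 * stddev:
            return sample

def init_weights(n_features, n_labels, seed=0):
    """Build an n_features x n_labels matrix of truncated-normal samples."""
    rng = random.Random(seed)
    return [[truncated_normal_sample(rng) for _ in range(n_labels)]
            for _ in range(n_features)]

sketch_weights = init_weights(120, 5)
print(len(sketch_weights), len(sketch_weights[0]))                 # 120 5
print(max(abs(w) for row in sketch_weights for w in row) <= 2.0)   # True
```

Every weight starts small and random, so no single weight dominates at the beginning of training.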

In [18]:
n_features = 120
n_labels = 5

# tensor with random values from a normal distribution,
# whose magnitude is no more than 2 std from the mean
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))

In [193]:
# since the weights are already helping prevent the model from getting stuck,
# in practice we don't randomize the bias
bias = tf.Variable(tf.zeros(n_labels))

In [24]:
import tensorflow as tf

def get_weights(n_features, n_labels):
    """
    Return TensorFlow weights
    :param n_features: Number of features
    :param n_labels: Number of labels
    :return: TensorFlow weights
    """
    return tf.Variable(tf.truncated_normal((n_features, n_labels)))


def get_biases(n_labels):
    """
    Return TensorFlow bias
    :param n_labels: Number of labels
    :return: TensorFlow bias
    """
    return tf.Variable(tf.zeros(n_labels))


def linear(input, w, b):
    """
    Return linear function in TensorFlow
    :param input: TensorFlow input
    :param w: TensorFlow weights
    :param b: TensorFlow biases
    :return: TensorFlow linear function
    """
    # Linear Function (xW + b)
    # (input*w+b)
    return tf.add(tf.matmul(input, w), b)

In [195]:
from tensorflow.examples.tutorials.mnist import input_data

def mnist_features_labels(n_labels):
    """
    Gets the first <n> labels from the MNIST dataset
    :param n_labels: Number of labels to use
    :return: Tuple of feature list and label list
    """
    mnist_features = []
    mnist_labels = []

    mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

    # We're only looking at 10000 images
    for mnist_feature, mnist_label in zip(*mnist.train.next_batch(10000)):
        # Keep the sample only if its label is one of the first <n> classes
        if mnist_label[:n_labels].any():
            mnist_features.append(mnist_feature)
            mnist_labels.append(mnist_label[:n_labels])

    return mnist_features, mnist_labels


# Number of features (28*28 image is 784 features)
n_features = 784
# Number of labels
n_labels = 3

# Features and Labels
features = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Weights and Biases
w = get_weights(n_features, n_labels)
b = get_biases(n_labels)

# Linear Function xW + b
logits = linear(features, w, b)

# Training data
train_features, train_labels = mnist_features_labels(n_labels)

with tf.Session() as session:
    # Initialize session variables
    session.run(tf.global_variables_initializer())
    # Softmax
    prediction = tf.nn.softmax(logits)

    # Cross entropy
    # This quantifies how far off the predictions were.
    # You'll learn more about this in future lessons.
    cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

    # Training loss
    # You'll learn more about this in future lessons.
    loss = tf.reduce_mean(cross_entropy)

    # Rate at which the weights are changed
    # You'll learn more about this in future lessons.
    learning_rate = 0.08

    # Gradient Descent
    # This is the method used to train the model
    # You'll learn more about this in future lessons.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

    # Run optimizer and get loss
    _, l = session.run(
        [optimizer, loss],
        feed_dict={features: train_features, labels: train_labels})

# Print loss
print('Loss: {}'.format(l))

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Loss: 7.926753997802734


### Softmax
The softmax function squashes its inputs, typically called **logits** or **logit scores**, so that the outputs:
* lie between 0 and 1
* are normalized, so that they all sum to 1

The output is therefore equivalent to a **categorical probability distribution**.

It's the perfect output activation for a network predicting multiple classes.

In [37]:
import tensorflow as tf

output = None
logit_data = [2.0, 1.0, 0.1]
logits = tf.placeholder(tf.float32)

# Calculate the softmax of the logits
softmax =  tf.nn.softmax(logits)   

with tf.Session() as sess:
    # Feed in the logit data
    output = sess.run(softmax, feed_dict={logits: logit_data})

output

array([ 0.65900117,  0.24243298,  0.09856589], dtype=float32)
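
For comparison, the same numbers can be reproduced without TensorFlow. This is a minimal sketch in plain Python; the function name `softmax_plain` is made up here, and subtracting the maximum logit is a common numerical-stability trick rather than something the cell above does explicitly.

```python
import math

def softmax_plain(logits):
    """Exponentiate each logit and normalize so the outputs sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_plain([2.0, 1.0, 0.1])
print([round(p, 4) for p in probs])   # [0.659, 0.2424, 0.0986]
```

The largest logit receives the largest probability, and the three outputs sum to 1.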

### Cross Entropy

#### Reduce Sum and Natural Log

In [42]:
# takes an array of numbers and sums them together
a = tf.reduce_sum([1,2,3,4,5]) # 15

In [45]:
# Takes the natural log of a number
b = tf.log(100.0) # 4.60517

In [51]:
import tensorflow as tf

softmax_data = [0.7, 0.2, 0.1]
one_hot_data = [1.0, 0.0, 0.0]

softmax = tf.placeholder(tf.float32)
one_hot = tf.placeholder(tf.float32)

# Print cross entropy from session
cross_entropy = -tf.reduce_sum(tf.multiply(one_hot, tf.log(softmax)))

with tf.Session() as sess:
    print(sess.run(cross_entropy,
                   feed_dict={softmax: softmax_data,
                              one_hot: one_hot_data}))

0.356675
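
The quantity computed above is `-sum(L_i * ln(S_i))`, where `S` is the softmax output and `L` is the one-hot label. Because only one entry of `L` is non-zero, the result is just the negative log of the probability assigned to the correct class. A plain-Python sketch (the name `cross_entropy_plain` is hypothetical):

```python
import math

def cross_entropy_plain(softmax_probs, one_hot_label):
    """Return -sum(L_i * ln(S_i)) for probabilities S and one-hot label L."""
    return -sum(l * math.log(s)
                for s, l in zip(softmax_probs, one_hot_label)
                if l != 0.0)   # skip zero terms so ln(0) is never evaluated

result = cross_entropy_plain([0.7, 0.2, 0.1], [1.0, 0.0, 0.0])
print(round(result, 6))   # 0.356675
```

A more confident correct prediction (a probability closer to 1) gives a cross entropy closer to 0.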


### Mini-batching
A technique for training on subsets of the dataset instead of all the data at one time.
* you don't have to store the entire dataset in memory
* it is computationally inefficient, since the loss can't be calculated simultaneously across all samples

Useful when combined with SGD:
1. Randomly shuffle the data at the start of each epoch
2. Create mini-batches
3. Train the network weights on each mini-batch with gradient descent.
    * Since the batches are random, SGD is performed on each batch
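
The steps above can be sketched in plain Python. The helper `shuffled_batches` and the toy data are hypothetical; the point is only that features and labels are shuffled together and then sliced.

```python
import random

def shuffled_batches(features, labels, batch_size, rng):
    """Shuffle sample indices, then slice them into mini-batches."""
    assert len(features) == len(labels)
    indices = list(range(len(features)))
    rng.shuffle(indices)                                # step 1: shuffle
    result = []
    for start in range(0, len(indices), batch_size):    # step 2: batch
        chunk = indices[start:start + batch_size]
        result.append(([features[i] for i in chunk],
                       [labels[i] for i in chunk]))
    return result

rng = random.Random(0)
toy_features = ['F1', 'F2', 'F3', 'F4', 'F5']
toy_labels = ['L1', 'L2', 'L3', 'L4', 'L5']

for epoch in range(2):
    # Step 3 (the gradient-descent update per batch) would go in this loop
    epoch_batches = shuffled_batches(toy_features, toy_labels, 2, rng)
    print(epoch, [len(f) for f, _ in epoch_batches])    # batch sizes 2, 2, 1
```

Shuffling indices rather than the lists themselves keeps each feature paired with its own label.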

In [57]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

In [106]:
n_inputs = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz


In [189]:
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

In [190]:
weights = tf.Variable(tf.random_normal([n_inputs, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

#### Memory
Calculate the memory size of `train_features`, `train_labels`, `weights` and `bias` in bytes.
* Calculate memory required for stored data (ignore memory for overhead)

###### Single-precision floating-point format (float32)
A computer number format that occupies 4 bytes (32 bits) in computer memory
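
As a small sketch of the arithmetic used in this section, the helper below multiplies the number of elements by the size of one float32. The name `float32_memory` is made up for illustration, and overhead is ignored, as stated above.

```python
import struct

# '<f' is the standard-size 32-bit float format: 4 bytes
FLOAT32_BYTES = struct.calcsize('<f')

def float32_memory(shape):
    """Return the bytes needed to store a float32 tensor of the given shape."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * FLOAT32_BYTES

print(float32_memory((784, 10)))   # 31360 bytes for a 784 x 10 weight matrix
print(float32_memory((10,)))       # 40 bytes for a bias vector of length 10
```

Dividing by 1000 gives the kilobyte figures printed in the cells below.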

In [109]:
rows, columns = train_features.shape

In [110]:
train_features_memory = rows * columns * 4
print('train_features memory {} Kb'.format(train_features_memory * 0.001))

train_features memory 172480.0 Kb


In [111]:
rows, columns = train_labels.shape

train_labels_memory = rows * columns * 4
print('train_labels memory {} Kb'.format(train_labels_memory * 0.001))

train_labels memory 2200.0 Kb


In [112]:
rows, columns = (n_inputs, n_classes)

weights_memory = rows * columns * 4
print('weights_memory memory {} Kb'.format(weights_memory * 0.001))

weights_memory memory 31.36 Kb


In [113]:
bias_memory = n_classes * 4
print('bias_memory {} Kb'.format(bias_memory * 0.001))

bias_memory 0.04 Kb


#### TF Mini-batching
Divide data into batches
* Since batches can vary in size we take advantage of `tf.placeholder()`

The `None` dimension is a placeholder for the `batch_size`. At runtime, TF will accept any batch size greater than `0`.

In [114]:
features = tf.placeholder(tf.float32, [None, n_inputs])
labels = tf.placeholder(tf.float32, [None, n_classes])

In [122]:
import math

features = (50000, 400)
labels = (50000, 10)
batch_size = 128

batches = features[0] / batch_size

# round up: a partial last batch still counts as a batch
print('#batches: {}'.format(math.ceil(batches)))
print('last batch size: {}'.format((batches - int(batches)) * batch_size))

#batches: 391
last batch size: 80.0
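
The same counts can be obtained with integer arithmetic, which avoids floating-point rounding altogether. This is a minimal sketch; `batch_layout` is a hypothetical helper name.

```python
def batch_layout(n_samples, batch_size):
    """Return (number of batches, size of the last batch)."""
    full_batches, remainder = divmod(n_samples, batch_size)
    if remainder == 0:
        # Every batch is full (or there are no samples at all)
        last_size = batch_size if full_batches else 0
        return full_batches, last_size
    # The leftover samples form one extra, smaller batch
    return full_batches + 1, remainder

print(batch_layout(50000, 128))   # (391, 80)
print(batch_layout(256, 128))     # (2, 128)
```

For 50000 samples and a batch size of 128 there are 390 full batches plus one final batch of 80 samples.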


In [125]:
# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]

# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

In [186]:
import math

def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    output_batches = []
    sample_size = len(features)
    
    for start_index in range(0, sample_size, batch_size):
        end_index = start_index + batch_size
        batch = [features[start_index:end_index],
                 labels[start_index:end_index]]
        output_batches.append(batch)
    
    return output_batches
    
    

In [187]:
batches(3, example_features, example_labels)

[[[['F11', 'F12', 'F13', 'F14'],
   ['F21', 'F22', 'F23', 'F24'],
   ['F31', 'F32', 'F33', 'F34']],
  [['L11', 'L12'], ['L21', 'L22'], ['L31', 'L32']]],
 [[['F41', 'F42', 'F43', 'F44']], [['L41', 'L42']]]]

In [203]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
batch_size = 20

# Import MNIST data
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))
# logits: xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# loss and optimizer
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                            labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# calculate accuracy
pred = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(pred, tf.float32))

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    # train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer,
                 feed_dict={features: batch_features,
                            labels: batch_labels})
        
    # calc accuracy for test dataset
    test_accuracy = sess.run(accuracy,
                             feed_dict={features: test_features,
                                        labels: test_labels})


print('Test Accuracy: {}'.format(test_accuracy))

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz
Test Accuracy: 0.14920000731945038
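
The accuracy calculation in the cell above compares the index of the largest logit with the index of the `1` in each one-hot label, then averages the matches. A plain-Python sketch of that logic, with made-up toy data and hypothetical helper names:

```python
def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_plain(logits, one_hot_labels):
    """Fraction of rows whose largest logit matches the one-hot label."""
    correct = [argmax(row) == argmax(label)
               for row, label in zip(logits, one_hot_labels)]
    return sum(correct) / len(correct)

toy_logits = [[2.0, 1.0, 0.1],
              [0.3, 0.2, 4.0],
              [1.5, 2.5, 0.5],   # predicted class 1, but the label is class 0
              [0.9, 0.1, 0.2]]
toy_one_hot = [[1, 0, 0],
               [0, 0, 1],
               [1, 0, 0],
               [1, 0, 0]]

print(accuracy_plain(toy_logits, toy_one_hot))   # 0.75
```

Softmax is not needed for this step, because it preserves the ordering of the logits and therefore does not change the argmax.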


### Epochs
A single forward and backward pass of the entire dataset.
* used to increase the accuracy of the model without requiring more data
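
To see why repeated passes help, here is a toy sketch in plain Python: a single weight `w` is fitted to data generated by `y = 2x` using mini-batch gradient descent. All names and numbers are invented for illustration and are unrelated to the MNIST model below.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]      # generated by y = 2x
toy_batch_size = 2
toy_learn_rate = 0.01
w = 0.0                        # single weight to learn

def mean_squared_error(weight):
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

losses = []
for epoch in range(10):
    # One epoch: every mini-batch is visited exactly once
    for start in range(0, len(xs), toy_batch_size):
        batch = list(zip(xs[start:start + toy_batch_size],
                         ys[start:start + toy_batch_size]))
        # Gradient of the batch's mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= toy_learn_rate * grad
    losses.append(mean_squared_error(w))

print(losses[0] > losses[-1])   # True: later epochs reach a lower loss
```

The data never changes, yet each additional epoch moves `w` closer to 2 and lowers the loss.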

In [206]:
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np

n_input = 784
n_classes = 10
mnist = input_data.read_data_sets('datasets/ud70/mnist', one_hot=True)

train_features = mnist.train.images
valid_features = mnist.validation.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
valid_labels = mnist.validation.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

Extracting datasets/ud70/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud70/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud70/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud70/mnist/t10k-labels-idx1-ubyte.gz


In [221]:
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

logits = tf.add(tf.matmul(features, weights), bias)

learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                            labels=labels))

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

pred = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(pred, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

def print_stats(epoch, sess, last_features, last_labels):
    current_cost = sess.run(cost,
                            feed_dict={
                                features: last_features,
                                labels: last_labels
                            })
    
    valid_accuracy = sess.run(accuracy,
                              feed_dict={
                                  features: valid_features,
                                  labels: valid_labels
                              })
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch,
        current_cost,
        valid_accuracy))

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(epochs):
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate
            }
            
            sess.run(optimizer, feed_dict=train_feed_dict)
        print_stats(epoch, sess, batch_features, batch_labels)
    
    test_accuracy = sess.run(accuracy, 
                             feed_dict={
                                 features: test_features,
                                 labels: test_labels
                             })

print('Test Accuracy: {}'.format(test_accuracy))

Epoch: 0    - Cost: 14.0     Valid Accuracy: 0.0768
Epoch: 1    - Cost: 12.6     Valid Accuracy: 0.085
Epoch: 2    - Cost: 11.6     Valid Accuracy: 0.0968
Epoch: 3    - Cost: 10.8     Valid Accuracy: 0.109
Epoch: 4    - Cost: 10.1     Valid Accuracy: 0.124
Epoch: 5    - Cost: 9.5      Valid Accuracy: 0.138
Epoch: 6    - Cost: 8.95     Valid Accuracy: 0.152
Epoch: 7    - Cost: 8.46     Valid Accuracy: 0.168
Epoch: 8    - Cost: 8.02     Valid Accuracy: 0.182
Epoch: 9    - Cost: 7.63     Valid Accuracy: 0.199
Test Accuracy: 0.1964000016450882
