# TensorFlow

*TensorFlow* is a powerful open source software library for numerical computation, particularly well suited and fine-tuned for large-scale Machine Learning. Its basic principle is simple: you first define in Python a graph of computations to perform, and then TensorFlow takes that graph and runs it efficiently using optimized C++ code. Most importantly, it is possible to break up the graph into several chunks and run them in parallel across multiple CPUs or GPUs.

In [2]:
import tensorflow as tf

In [5]:
!conda list

# packages in environment at C:\Users\hheim\Anaconda3\envs\TensorFlow:
#
# Name                    Version                   Build  Channel
_tflow_select             2.1.0                       gpu    anaconda
absl-py                   0.9.0                    py36_0  
argon2-cffi               20.1.0                   pypi_0    pypi
astor                     0.8.1                    py36_0  
attrs                     19.3.0                   pypi_0    pypi
backcall                  0.2.0                      py_0  
blas                      1.0                         mkl  
bleach                    3.1.5                    pypi_0    pypi
ca-certificates           2020.6.24                     0  
certifi                   2020.6.20                py36_0  
cffi                      1.14.2                   pypi_0    pypi
colorama                  0.4.3                      py_0  
cudatoolkit               9.0                           1    anaconda
decorator                 4.4.2     



h5py                      2.7.1            py36he54a1c3_0    anaconda
hdf5                      1.10.1           vc14hb361328_0  [vc14]  anaconda
icc_rt                    2019.0.0             h0cc432a_1    anaconda
importlib-metadata        1.7.0                    py36_0  
intel-openmp              2020.1                      216  
ipykernel                 5.3.4            py36h5ca1d4c_0  
ipython                   7.16.1           py36h5ca1d4c_0  
ipython_genutils          0.2.0                    py36_0  
ipywidgets                7.5.1                    pypi_0    pypi
jedi                      0.17.2                   py36_0  
jinja2                    2.11.2                   pypi_0    pypi
jsonschema                3.2.0                    pypi_0    pypi
jupyter                   1.0.0                    pypi_0    pypi
jupyter-console           6.1.0                    pypi_0    pypi
jupyter_client            6.1.6                      py_0  
jupyter_core              4.6.3   

In [7]:
x = tf.Variable(3, name="x")
y = tf.Variable(4, name="y")
f = x*x*y + y + 2

The most important thing to understand is that this code does not actually perform any computation, even though it looks like it does (especially the last line). It just creates a computation graph. In fact, even the variables are not initialized yet. To evaluate this graph, you need to open a TensorFlow session and use it to initialize the variables and evaluate `f`.

A variable starts its life when its initializer is run, and it ends when the session is closed.
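To make the idea of deferred execution concrete, here is a minimal pure-Python sketch of a computation graph. It is not how TensorFlow is implemented; the `Node` class and the `run` function are hypothetical names invented for this illustration. Building `f` only records operations, and nothing is computed until `run(f)` is called, which plays the role of `sess.run(f)`.

```python
class Node:
    """A deferred computation: stores an operation and its inputs, computes nothing."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value

    def __add__(self, other):
        return Node("add", (self, _wrap(other)))

    def __mul__(self, other):
        return Node("mul", (self, _wrap(other)))


def _wrap(v):
    # Turn plain Python numbers into constant nodes
    return v if isinstance(v, Node) else Node("const", value=v)


def run(node):
    # Evaluate the graph recursively, starting from the requested node
    if node.op == "const":
        return node.value
    a, b = (run(i) for i in node.inputs)
    return a + b if node.op == "add" else a * b


x = Node("const", value=3)
y = Node("const", value=4)
f = x * x * y + y + 2   # builds a graph, computes nothing yet

result = run(f)         # only now is the expression evaluated
print(result)           # 42
```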

In [13]:
with tf.Session() as sess:
    sess.run(x.initializer)
    sess.run(y.initializer)
    res = sess.run(f)
    print(res)
    
tf.reset_default_graph()

42


Any node you create is automatically added to the default graph.

**TIP**

In Jupyter (or in a Python shell), it is common to run the same commands more than once while you are experimenting. As a result, you may end up with a default graph containing many duplicate nodes. One solution is to restart the Jupyter kernel (or the Python shell), but a more convenient solution is to just reset the default graph by running `tf.reset_default_graph()`.

If you want to manage several independent graphs, create a new `Graph` and temporarily make it the default graph inside a `with` block:

In [37]:
graph_1 = tf.Graph()

with graph_1.as_default():
    x1 = tf.Variable(1)

## Linear Regression with TensorFlow

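When a tensor is evaluated, the Python API returns it as a NumPy ndarray. TensorFlow also provides matrix operations such as `transpose()`, `matmul()`, and `matrix_inverse()`, so the Normal Equation can be computed either directly in NumPy or as a TensorFlow graph.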

In [83]:
import numpy as np

np.random.seed(1234)

X=np.random.rand(1000,3)
B=np.array([1,2,3]).reshape(3,1)
Y=X@B+np.random.randn(1000,1)*5
B_hat = np.linalg.inv(X.transpose()@X)@X.transpose()@Y


Xt = tf.constant(X, dtype=tf.float32, name='X') #Source operation
Yt = tf.constant(Y, dtype=tf.float32, name='Y') #Source operation
Xt_T = tf.transpose(Xt)
Betas = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(Xt_T,Xt)),Xt_T),Yt)

with tf.Session() as LR_sess:
    Betas_LR = Betas.eval()
    Betas_LR = Betas_LR.reshape(3,1)
    print("numpy: ")
    print(B_hat)
    print('tensorflow: ')
    print(Betas_LR)

numpy: 
[[1.36967319]
 [1.61575796]
 [3.21015672]]
tensorflow: 
[[1.3696747]
 [1.615756 ]
 [3.210157 ]]
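
As a cross-check of the closed-form solution used in this section, the following NumPy sketch computes the same estimate three ways: with an explicit inverse, with `np.linalg.solve`, and with `np.linalg.lstsq`. The data here is synthetic and noiseless (an assumption made so that the true coefficients are recovered exactly); the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3))
beta_true = np.array([[1.0], [2.0], [3.0]])
y = X @ beta_true                      # noiseless targets (assumption for this sketch)

# Normal Equation with an explicit inverse, as in the cell above
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Same system solved without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver working directly on X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_inv.ravel(), beta_solve.ravel(), beta_lstsq.ravel())
```

Solving the linear system instead of inverting `X.T @ X` is usually the numerically safer choice when the features are strongly correlated.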


## Batch Gradient Descent

1) The `random_uniform()` function creates a node in the graph that will generate a tensor containing random values, given its shape and value range, much like NumPy’s `rand()` function.

2) The `assign()` function creates a node that will assign a new value to a variable. In this case, it implements the Batch Gradient Descent step θ(next step) = θ – η∇θMSE(θ).

3) The main loop executes the training step over and over again (n_epochs times), and every 100 iterations it prints out the current Mean Squared Error (mse). You should see the MSE go down at every iteration.
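
The three steps above can be sketched in plain NumPy, without a graph or a session. This is only an illustration under assumed synthetic data (the seed, noise level, and variable names are not from the notebook); it uses the same update rule, with the gradient of the MSE written out as (2/m) Xᵀ(Xθ − y).

```python
import numpy as np

rng = np.random.default_rng(42)
m = 1000
X = rng.random((m, 3))
y = X @ np.array([[1.0], [2.0], [3.0]]) + rng.normal(size=(m, 1))

eta = 0.01                                   # learning rate
theta = rng.uniform(-5.0, 5.0, size=(3, 1))  # random initial theta
mse_history = []

for epoch in range(1000):
    error = X @ theta - y
    mse_history.append(float(np.mean(error ** 2)))
    gradients = 2 / m * X.T @ error          # gradient of the MSE w.r.t. theta
    theta = theta - eta * gradients          # Batch Gradient Descent step

print(mse_history[0], mse_history[-1])
```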

In [92]:
n_epochs = 1000
learning_rate = 0.01

theta = tf.Variable(tf.random_uniform([3, 1], minval=-5.0, maxval=5.0), name="theta") #Initialize random theta
y_pred = tf.matmul(Xt, theta, name="predictions") #Predictions to calculate MSE
error = y_pred - Yt
mse = tf.reduce_mean(tf.square(error), name="mse") #MSE
gradients = 2/1000 * tf.matmul(Xt_T, error) #Analytic gradient of the MSE: (2/m) * X^T (X theta - y), with m = 1000
#gradients = tf.gradients(mse, [theta])[0] --> alternatively, let TensorFlow compute d(mse)/d(theta) automatically
training_op = tf.assign(theta, theta - learning_rate * gradients) #Assign new value to theta

init = tf.global_variables_initializer() #Initialize all variables when called
saver = tf.train.Saver() #Saver node at the end of the construction phase (after all variable nodes are created)

with tf.Session() as sess:
    sess.run(init) #Initialize variables
    
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
            save_path = saver.save(sess, "/temp/LR_model.ckpt") #Save model
        sess.run(training_op) #One Gradient Descent step: this is what changes the value of theta
    best_theta = theta.eval()

Epoch 0 MSE = 29.205965
Epoch 100 MSE = 26.486671
Epoch 200 MSE = 26.17316
Epoch 300 MSE = 26.001715
Epoch 400 MSE = 25.879038
Epoch 500 MSE = 25.789934
Epoch 600 MSE = 25.725168
Epoch 700 MSE = 25.678078
Epoch 800 MSE = 25.643843
Epoch 900 MSE = 25.618942


In [94]:
with tf.Session() as sess:
    saver.restore(sess, "/temp/LR_model.ckpt") #Restore Model

INFO:tensorflow:Restoring parameters from /temp/LR_model.ckpt


## Artificial Neural Networks (ANNs)

### Training Linear Threshold Units (LTUs)

The Perceptron is fed one training instance at a time, and for each instance it makes its predictions. For every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction.
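
The learning rule just described can be sketched for a single LTU as follows. This is a minimal illustration, assuming a step activation that outputs 1 when the weighted sum is non-negative and the logical AND function as toy data; `train_perceptron` and `predict` are hypothetical helper names.

```python
import numpy as np

def predict(x, w, b):
    # Linear threshold unit: step function applied to the weighted sum
    return 1.0 if x @ w + b >= 0 else 0.0

def train_perceptron(X, t, eta=1.0, n_epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for x_i, t_i in zip(X, t):            # one training instance at a time
            error = t_i - predict(x_i, w, b)  # 0 if correct, +1 or -1 if wrong
            w = w + eta * error * x_i         # reinforce weights of active inputs
            b = b + eta * error
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)       # logical AND (linearly separable)

w, b = train_perceptron(X, t)
predictions = [predict(x_i, w, b) for x_i in X]
print(predictions)
```

Weights only change when a prediction is wrong, so once every instance is classified correctly the updates stop.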

### Backpropagation Algorithm

Backpropagation is Gradient Descent in which the gradients are computed with reverse-mode automatic differentiation (autodiff).

For each training instance, the algorithm feeds it to the network and computes the output of every neuron in each consecutive layer (this is the forward pass, just like when making predictions). Then it measures the network’s output error (i.e., the difference between the desired output and the actual output of the network), and it computes how much each neuron in the last hidden layer contributed to each output neuron’s error. It then proceeds to measure how much of these error contributions came from each neuron in the previous hidden layer, and so on until the algorithm reaches the input layer. This reverse pass efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward in the network.
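
The forward and reverse passes can be sketched in NumPy for a network with one hidden layer. This is an illustration under assumptions not taken from the notebook: random toy data, a logistic hidden layer, a linear output, an MSE loss, and no bias terms. The analytic gradient from the reverse pass is compared with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 instances, 3 inputs
y = rng.normal(size=(5, 1))
W1 = rng.normal(size=(3, 4))     # input -> hidden weights
W2 = rng.normal(size=(4, 1))     # hidden -> output weights

def loss(W1, W2):
    h = sigmoid(X @ W1)
    return float(np.mean((h @ W2 - y) ** 2))

# Forward pass: compute and keep every layer's output
h = sigmoid(X @ W1)
out = h @ W2

# Reverse pass: propagate the error gradient backward, layer by layer
d_out = 2 * (out - y) / out.size  # dLoss/d_out
grad_W2 = h.T @ d_out
d_h = d_out @ W2.T                # error contribution of each hidden neuron
d_z = d_h * h * (1 - h)           # through the logistic activation
grad_W1 = X.T @ d_z

# Finite-difference check on one weight of each matrix
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
numeric_W1 = (loss(W1p, W2) - loss(W1m, W2)) / (2 * eps)

W2p, W2m = W2.copy(), W2.copy()
W2p[0, 0] += eps
W2m[0, 0] -= eps
numeric_W2 = (loss(W1, W2p) - loss(W1, W2m)) / (2 * eps)

print(grad_W1[0, 0], numeric_W1)
print(grad_W2[0, 0], numeric_W2)
```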

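In order for this algorithm to work properly, the authors made a key change to the MLP’s architecture: they replaced the step function with the logistic function. The step function consists only of flat segments, so its gradient is zero wherever it is defined and Gradient Descent has nothing to work with, whereas the logistic function has a well-defined nonzero derivative everywhere.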
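
A short sketch makes the difference visible. It compares the derivative of the logistic function, σ(z)(1 − σ(z)), with a finite-difference estimate, and shows that the same estimate for the step function is zero away from the jump. The helper names are illustrative only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # analytic derivative of the logistic function

def step(z):
    return 1.0 if z >= 0 else 0.0

def numeric_grad(f, z, eps=1e-6):
    # Central finite difference
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = 0.5
logistic_slope = numeric_grad(sigmoid, z)
step_slope = numeric_grad(step, z)

print(sigmoid_grad(z), logistic_slope)  # nonzero and in agreement
print(step_slope)                       # 0.0: no signal for Gradient Descent
```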

In [10]:
#Simplest Case (One neuron)
import tensorflow as tf
import numpy as np
from tensorflow import keras

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error') #stochastic gradient descent

xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

model.fit(xs, ys, epochs=500)

model.predict([10.0])

Epoch 1/500
Epoch 2/500
...
Epoch 500/500


array([[18.98699]], dtype=float32)

In [11]:
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

class myCallback(tf.keras.callbacks.Callback): #Callback that stops training once the loss is low enough
  def on_epoch_end(self, epoch, logs=None): #Keras calls this at the end of every epoch
    logs = logs or {} #logs holds the metrics of the epoch that just finished
    #Note: the condition tests the training loss (< 0.4), not the accuracy mentioned in the message
    if logs.get('loss', float('inf')) < 0.4:
      print("\nReached 60% accuracy so cancelling training!")
      self.model.stop_training = True

callbacks = myCallback()
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
#Scale pixel values from [0, 255] to [0, 1]
training_images=training_images/255.0
test_images=test_images/255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(training_images, training_labels, epochs=5, callbacks=[callbacks])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
Epoch 1/5
Epoch 2/5
Reached 60% accuracy so cancelling training!


<tensorflow.python.keras.callbacks.History at 0x2758b5e4748>

### Convolution and Pooling

In image classification, many raw pixels contribute little to the classification problem. A convolution slides a small filter (for example 3X3) over the image and replaces each pixel with a weighted sum of its neighborhood, which emphasizes features such as edges. Pooling then shrinks the result: max pooling with a 2X2 window keeps only the largest value in each 2X2 block, so a 4X4 image, for example, becomes a 2X2 image.
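
Both operations can be sketched in a few lines of NumPy. This is a naive single-channel illustration, not the Keras implementation; `conv2d_valid` and `max_pool_2x2` are hypothetical helper names, and the averaging kernel is an arbitrary choice.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside the image
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(image):
    # Keep the largest value of each non-overlapping 2x2 block
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    return image[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

small = np.arange(16).reshape(4, 4)
pooled_small = max_pool_2x2(small)
print(pooled_small)                 # [[ 5  7] [13 15]]

image = np.zeros((28, 28))
kernel = np.ones((3, 3)) / 9.0      # simple averaging filter
convolved = conv2d_valid(image, kernel)
pooled = max_pool_2x2(convolved)
print(convolved.shape, pooled.shape)  # (26, 26) (13, 13)
```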

In [12]:
#When loading the dataset you need a single tensor >>> .reshape(n_images,28,28,1)

model2 = tf.keras.models.Sequential([tf.keras.layers.Conv2D(64, (3,3), activation='relu',input_shape=(28,28,1)),
                                     #64 filters, 3X3, 28X28, 1 color depth (grayscale)
                                     tf.keras.layers.MaxPooling2D(2,2),
                                     #Max in 2X2 survives
                                     tf.keras.layers.Conv2D(64, (3,3), activation='relu'), #input_shape is only needed on the first layer
                                     tf.keras.layers.MaxPooling2D(2,2),
                                     #Another layer for learning
                                     tf.keras.layers.Flatten(),
                                     tf.keras.layers.Dense(512, activation=tf.nn.relu),
                                     tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

#A 3X3 filter cannot be centered on border pixels, so each convolution trims 2 pixels from the height and width (28 --> 26)
model2.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 64)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1600)              0         
_________________________________________________________________
dense_5 (Dense)              (None, 512)               819712    
_________________________________________________________________
dense_6 (Dense)              (None, 10)                5130      
Total para
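
The parameter counts in the summary can be reproduced by hand: a convolutional layer has one weight per kernel cell and input channel, plus one bias, for each filter, and a dense layer has one weight per input plus one bias for each neuron. The helper functions below are illustrative names, not part of Keras.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    # (weights per filter + 1 bias) * number of filters
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_neurons):
    # (weights per neuron + 1 bias) * number of neurons
    return (n_inputs + 1) * n_neurons

counts = [
    conv_params(3, 3, 1, 64),        # conv2d:   grayscale input
    conv_params(3, 3, 64, 64),       # conv2d_1: 64 input channels
    dense_params(5 * 5 * 64, 512),   # dense after Flatten (1600 inputs)
    dense_params(512, 10),           # output layer
]
print(counts)       # [640, 36928, 819712, 5130]
print(sum(counts))
```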

### Image Generator

This example trains a classifier to tell horses from humans, using computer-generated images.

In [29]:
import os
import zipfile

local_zip = 'C:\\Users\\hheim\\Documents\\TensorFlow\\tmp\\horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('C:\\Users\\hheim\\Documents\\TensorFlow\\tmp\\horse-or-human')
zip_ref.close()

In [35]:
# Directory with our training horse pictures
train_horse_dir = 'C:\\Users\\hheim\\Documents\\TensorFlow\\tmp\\horse-or-human\\horses'

# Directory with our training human pictures
train_human_dir = 'C:\\Users\\hheim\\Documents\\TensorFlow\\tmp\\horse-or-human\\humans'

train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])

train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])

['horse01-0.png', 'horse01-1.png', 'horse01-2.png', 'horse01-3.png', 'horse01-4.png', 'horse01-5.png', 'horse01-6.png', 'horse01-7.png', 'horse01-8.png', 'horse01-9.png']
['human01-00.png', 'human01-01.png', 'human01-02.png', 'human01-03.png', 'human01-04.png', 'human01-05.png', 'human01-06.png', 'human01-07.png', 'human01-08.png', 'human01-09.png']


Note that because we are facing a two-class classification problem, i.e. a binary classification problem, we will end our network with a sigmoid activation, so that the output of our network will be a single scalar between 0 and 1, encoding the probability that the current image is class 1 (as opposed to class 0).
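
As a small illustration of that last layer, the sketch below applies the sigmoid to a raw score and evaluates the binary cross-entropy loss for a single example. The function names are illustrative; the formulas are the standard definitions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    # Loss for one example, where p is the predicted probability of class 1
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

p_neutral = sigmoid(0.0)      # 0.5: the network is undecided
p_confident = sigmoid(4.0)    # close to 1: strongly predicts class 1

print(p_neutral, p_confident)
print(binary_crossentropy(1, p_confident))  # small loss when the label is 1
print(binary_crossentropy(0, p_confident))  # large loss when the label is 0
```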

In [36]:
model_hh = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 300x300 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1 where 0 for 1 class ('horses') and 1 for the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model_hh.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 298, 298, 16)      448       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 149, 149, 16)      0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 147, 147, 32)      4640      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 73, 73, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 71, 71, 64)        18496     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 35, 35, 64)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 33, 33, 64)        36928     
__________
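
The output shapes in this summary follow a simple pattern: each 3x3 convolution without padding removes 2 pixels per dimension, and each 2x2 max pooling halves the size, rounding down. The sketch below traces the spatial size through the five convolution and pooling pairs defined above; the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    # 'valid' convolution: the kernel must fit entirely inside the image
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling, incomplete border blocks are dropped
    return size // pool

size = 300
trace = [size]
for _ in range(5):
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

print(trace)              # [300, 298, 149, 147, 73, 71, 35, 33, 16, 14, 7]
flattened = size * size * 64
print(flattened)          # 3136 values enter the first Dense layer
```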

In [41]:
from tensorflow.keras.optimizers import RMSprop

model_hh.compile(loss='binary_crossentropy', #loss function
              optimizer=RMSprop(lr=0.001), #Adam would have worked too
              metrics=['accuracy'])

Let's set up data generators that will read pictures in our source folders, convert them to float32 tensors, and feed them (with their labels) to our network. We'll have one generator for the training images and one for the validation images. Our generators will yield batches of images of size 300x300 and their labels (binary).

The `ImageDataGenerator` class allows you to instantiate generators of augmented image batches (and their labels) via `.flow(data, labels)` or `.flow_from_directory(directory)`. These generators can then be used with the Keras model methods that accept data generators as inputs: `fit`, `evaluate_generator`, and `predict_generator`.

Point the generator at the parent directory, not at the class subdirectories (i.e. `horses` and `humans`): each image is labelled automatically with the name of the subdirectory that contains it.
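
The labelling convention can be imitated with the standard library alone. The sketch below is not the Keras implementation; `infer_labels` is a hypothetical helper that assigns class indices to subdirectories in sorted order (which, to my understanding, is also the order `flow_from_directory` uses), and the temporary directory with two empty files stands in for the real dataset.

```python
import os
import tempfile

def infer_labels(root):
    # One class per subdirectory, indexed in sorted (alphabetical) order
    class_names = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_indices = {name: i for i, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for fname in sorted(os.listdir(os.path.join(root, name))):
            samples.append((os.path.join(name, fname), class_indices[name]))
    return class_indices, samples

with tempfile.TemporaryDirectory() as root:
    layout = {"horses": ["horse01-0.png"], "humans": ["human01-00.png"]}
    for cls, files in layout.items():
        os.makedirs(os.path.join(root, cls))
        for fname in files:
            open(os.path.join(root, cls, fname), "w").close()  # empty placeholder file
    class_indices, samples = infer_labels(root)

print(class_indices)   # {'horses': 0, 'humans': 1}
```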

In [42]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)

# Flow training images in batches of 128 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
        'C:\\Users\\hheim\\Documents\\TensorFlow\\tmp\\horse-or-human',  # This is the source directory for training images
        target_size=(300, 300),  # All images will be resized to 300x300
        batch_size=128,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary')

Found 1027 images belonging to 2 classes.


In [43]:
history = model_hh.fit(
      train_generator, #streams images from training directory
      steps_per_epoch=8, #8 batches of 128 images per epoch
      epochs=15,
      verbose=1)

#You can pass a second ImageDataGenerator as validation_data to .fit; Keras then reports the loss and accuracy
#on the validation set as well as on the training set

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [2]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from sklearn.preprocessing import StandardScaler
import numpy as np

#MNIST DATASET
(X_train, Y_train), (X_test, Y_test) = mnist.load_data() #28X28 Images (784 Features)

#Flatten each 28X28 image into a vector of 784 features
X_train_clean = X_train.reshape(-1, 784)
X_test_clean = X_test.reshape(-1, 784)
    
Y_train_clean = Y_train.reshape(60000,1).astype(int)
Y_test_clean = Y_test.reshape(10000,1).astype(int)

X_tr_std = StandardScaler().fit(X_train_clean).transform(X_train_clean.astype(float))
X_te_std = StandardScaler().fit(X_test_clean).transform(X_test_clean.astype(float))


In [3]:
Y_train_clean.dtype

dtype('int32')
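
The standardization applied to the pixel features can be sketched in NumPy as follows. This is an illustration on small random data, not the scikit-learn implementation; `fit_scaler` and `transform` are hypothetical helpers. Constant features (such as image borders that are always 0) get a scale of 1 to avoid division by zero, and the statistics are computed on the training set and then reused for the test set, which is the usual convention.

```python
import numpy as np

def fit_scaler(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # constant features: leave them unscaled
    return mean, std

def transform(X, mean, std):
    return (X - mean) / std

rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(100, 4)).astype(float)
X_train[:, 0] = 0.0              # a pixel that is always black
X_test = rng.integers(0, 256, size=(20, 4)).astype(float)
X_test[:, 0] = 0.0

mean, std = fit_scaler(X_train)              # statistics from the training set only
train_std = transform(X_train, mean, std)
test_std = transform(X_test, mean, std)      # same statistics applied to the test set

print(train_std.mean(axis=0).round(6))
print(train_std.std(axis=0).round(6))
```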

In [163]:
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_tr_std)

dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300, 100], #300 hidden layer --> 100 hidden layer Deep Neural Network
                                         n_classes=10, #soft_max (probability to each class)
                                         feature_columns=feature_columns)

dnn_clf.fit(x=X_tr_std, y=Y_train_clean, batch_size=50, steps=40000)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000231A2CAA8D0>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_train_distribute': None, '_eval_distribute': None, '_device_fn': None, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_protocol': None, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': 'C:\\Users\\hheim\\AppData\\Local\\Temp\\tmpi0k_9549'}
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.EstimatorSpec. You can use the `estimator_spec` method to create an equivalent one.
INFO:tensorflow:Crea

INFO:tensorflow:loss = 0.009592747, step = 6401 (0.656 sec)
INFO:tensorflow:global_step/sec: 153.082
INFO:tensorflow:loss = 0.004474271, step = 6501 (0.653 sec)
INFO:tensorflow:global_step/sec: 149.875
INFO:tensorflow:loss = 0.013879295, step = 6601 (0.668 sec)
INFO:tensorflow:global_step/sec: 152.896
INFO:tensorflow:loss = 0.0016356269, step = 6701 (0.652 sec)
INFO:tensorflow:global_step/sec: 151.92
INFO:tensorflow:loss = 0.008721442, step = 6801 (0.659 sec)
INFO:tensorflow:global_step/sec: 151.232
INFO:tensorflow:loss = 0.0149595365, step = 6901 (0.661 sec)
INFO:tensorflow:global_step/sec: 148.323
INFO:tensorflow:loss = 0.028747296, step = 7001 (0.675 sec)
INFO:tensorflow:global_step/sec: 132.099
INFO:tensorflow:loss = 0.010895155, step = 7101 (0.755 sec)
INFO:tensorflow:global_step/sec: 150.56
INFO:tensorflow:loss = 0.0010335713, step = 7201 (0.666 sec)
INFO:tensorflow:global_step/sec: 144.471
INFO:tensorflow:loss = 0.0017673171, step = 7301 (0.691 sec)
INFO:tensorflow:global_step/s

INFO:tensorflow:global_step/sec: 154.249
INFO:tensorflow:loss = 0.001320052, step = 14501 (0.651 sec)
INFO:tensorflow:global_step/sec: 148.553
INFO:tensorflow:loss = 5.713666e-05, step = 14601 (0.672 sec)
INFO:tensorflow:global_step/sec: 150.316
INFO:tensorflow:loss = 0.00023745406, step = 14701 (0.664 sec)
INFO:tensorflow:global_step/sec: 154.03
INFO:tensorflow:loss = 0.0008790253, step = 14801 (0.649 sec)
INFO:tensorflow:global_step/sec: 130.979
INFO:tensorflow:loss = 0.00087796716, step = 14901 (0.775 sec)
INFO:tensorflow:global_step/sec: 123.482
INFO:tensorflow:loss = 0.00055776484, step = 15001 (0.805 sec)
INFO:tensorflow:global_step/sec: 161.713
INFO:tensorflow:loss = 0.00081492757, step = 15101 (0.609 sec)
INFO:tensorflow:global_step/sec: 164.113
INFO:tensorflow:loss = 0.0023306936, step = 15201 (0.610 sec)
INFO:tensorflow:global_step/sec: 154.011
INFO:tensorflow:loss = 0.00077050464, step = 15301 (0.651 sec)
INFO:tensorflow:global_step/sec: 152.622
INFO:tensorflow:loss = 0.0010

INFO:tensorflow:loss = 0.000885486, step = 22401 (0.726 sec)
INFO:tensorflow:global_step/sec: 141.614
INFO:tensorflow:loss = 0.00027061233, step = 22501 (0.704 sec)
INFO:tensorflow:global_step/sec: 149.653
INFO:tensorflow:loss = 0.0002888206, step = 22601 (0.667 sec)
INFO:tensorflow:global_step/sec: 151.241
INFO:tensorflow:loss = 0.00029851747, step = 22701 (0.662 sec)
INFO:tensorflow:global_step/sec: 149.811
INFO:tensorflow:loss = 0.00061502727, step = 22801 (0.669 sec)
INFO:tensorflow:global_step/sec: 149.43
INFO:tensorflow:loss = 0.00074131414, step = 22901 (0.667 sec)
INFO:tensorflow:global_step/sec: 151.232
INFO:tensorflow:loss = 5.6412864e-05, step = 23001 (0.665 sec)
INFO:tensorflow:global_step/sec: 150.328
INFO:tensorflow:loss = 0.0005254321, step = 23101 (0.666 sec)
INFO:tensorflow:global_step/sec: 148.54
INFO:tensorflow:loss = 0.00014283547, step = 23201 (0.672 sec)
INFO:tensorflow:global_step/sec: 149.876
INFO:tensorflow:loss = 0.0018695235, step = 23301 (0.666 sec)
INFO:ten

INFO:tensorflow:global_step/sec: 147.663
INFO:tensorflow:loss = 7.9297635e-05, step = 30401 (0.676 sec)
INFO:tensorflow:global_step/sec: 145.321
INFO:tensorflow:loss = 0.00082235935, step = 30501 (0.685 sec)
INFO:tensorflow:global_step/sec: 152.653
INFO:tensorflow:loss = 0.000105793246, step = 30601 (0.657 sec)
INFO:tensorflow:global_step/sec: 150.317
INFO:tensorflow:loss = 0.0010191824, step = 30701 (0.663 sec)
INFO:tensorflow:global_step/sec: 151.014
INFO:tensorflow:loss = 0.00022966268, step = 30801 (0.662 sec)
INFO:tensorflow:global_step/sec: 153.367
INFO:tensorflow:loss = 0.00072786113, step = 30901 (0.652 sec)
INFO:tensorflow:global_step/sec: 150.1
INFO:tensorflow:loss = 0.00037969253, step = 31001 (0.665 sec)
INFO:tensorflow:global_step/sec: 153.081
INFO:tensorflow:loss = 9.659481e-05, step = 31101 (0.653 sec)
INFO:tensorflow:global_step/sec: 149.885
INFO:tensorflow:loss = 0.0011565719, step = 31201 (0.669 sec)
INFO:tensorflow:global_step/sec: 151.453
INFO:tensorflow:loss = 0.00

INFO:tensorflow:loss = 0.00052631187, step = 38301 (0.720 sec)
INFO:tensorflow:global_step/sec: 135.315
INFO:tensorflow:loss = 0.00035034603, step = 38401 (0.739 sec)
INFO:tensorflow:global_step/sec: 136.79
INFO:tensorflow:loss = 8.1963015e-05, step = 38501 (0.728 sec)
INFO:tensorflow:global_step/sec: 137.919
INFO:tensorflow:loss = 0.00034950898, step = 38601 (0.725 sec)
INFO:tensorflow:global_step/sec: 138.866
INFO:tensorflow:loss = 3.162105e-05, step = 38701 (0.725 sec)
INFO:tensorflow:global_step/sec: 138.118
INFO:tensorflow:loss = 0.0006412005, step = 38801 (0.728 sec)
INFO:tensorflow:global_step/sec: 140.43
INFO:tensorflow:loss = 0.0007560246, step = 38901 (0.705 sec)
INFO:tensorflow:global_step/sec: 143.445
INFO:tensorflow:loss = 0.00034432427, step = 39001 (0.699 sec)
INFO:tensorflow:global_step/sec: 137.351
INFO:tensorflow:loss = 0.00018933148, step = 39101 (0.730 sec)
INFO:tensorflow:global_step/sec: 134.408
INFO:tensorflow:loss = 0.00025392688, step = 39201 (0.740 sec)
INFO:t

DNNClassifier(params={'head': <tensorflow.contrib.learn.python.learn.estimators.head._MultiClassHead object at 0x00000231A2DD8828>, 'hidden_units': [300, 100], 'feature_columns': (_RealValuedColumn(column_name='', dimension=784, default_value=None, dtype=tf.float64, normalizer=None),), 'optimizer': None, 'activation_fn': <function relu at 0x0000023187437950>, 'dropout': None, 'gradient_clip_norm': None, 'embedding_lr_multipliers': None, 'input_layer_min_slice_size': None})

In [164]:
from sklearn.metrics import accuracy_score

y_pred = list(dnn_clf.predict(X_te_std))
accuracy_score(Y_test_clean, y_pred)

Instructions for updating:
Please switch to predict_classes, or set `outputs` argument.
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
  est = Estimator(...) -> est = SKCompat(Estimator(...))
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\hheim\AppData\Local\Temp\tmpi0k_9549\model.ckpt-40000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


0.9785

### Training a DNN Using Plain TensorFlow

If you want more control over the architecture of the network, you may prefer to use TensorFlow’s lower-level Python API. The first step is the construction phase, which builds the TensorFlow graph. The second step is the execution phase, where you actually run the graph to train the model.

**Sidenote**: we will need to use placeholder nodes. These nodes are special because they don’t perform any computation; they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training. If you don’t specify a value at runtime for a placeholder, you get an exception.

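**Sidenote 2**: `name_scope` adds the scope as a prefix to the names of operations (and of variables created with `tf.Variable`), but it is ignored by `tf.get_variable`. `variable_scope` adds the scope as a prefix to both operations and variables, including those created with `tf.get_variable`.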

In [1]:
import tensorflow as tf
import numpy as np

n_inputs = 28*28 # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X") #We don't know the shape of mini-batch
y = tf.placeholder(tf.int64, shape=(None), name="y") #Same as above

def neuron_layer(X, n_neurons, name, activation=None): #Create one fully connected layer of neurons
    with tf.name_scope(name):
        n_inputs = int(X.shape[1])
        stddev = 2 / np.sqrt(n_inputs) #Standard deviation used to initialize the weights
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev) #Random initial weights drawn from a truncated Gaussian
        W = tf.Variable(init, name="weights") #Weights matrix
        b = tf.Variable(tf.zeros([n_neurons]), name="biases") #One bias parameter per neuron
        z = tf.matmul(X, W) + b #Weighted sum of the inputs plus the bias, for every neuron and every instance
    if activation=="relu":
        return tf.nn.relu(z)
    else:
        return z
    
#The first hidden layer takes X as its input. The second takes the output of the first hidden layer as
#its input. And finally, the output layer takes the output of the second hidden layer as its input.

with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation="relu") #X as input
    hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation="relu") #Previous output matrix as input
    logits = neuron_layer(hidden2, n_outputs, "outputs") #Previous output matrix as input
    
#from tensorflow.contrib.layers import fully_connected
#fully_connected() achieves the same as neuron_layer()


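**Sidenote**: cross-entropy is the cost function associated with softmax (multinomial logistic) regression. $\Theta$ is the parameter matrix, with one parameter vector $\theta_k$ per class; $y_k^{(i)}$ is 1 if the target class of instance $i$ is $k$ and 0 otherwise, and $\hat{p}_k^{(i)}$ is the estimated probability that instance $i$ belongs to class $k$:

$J(\Theta) = -\frac{1}{m}\sum\limits_{i=1}^m \sum\limits_{k=1}^K y_k^{(i)}\log\left(\hat{p}_k^{(i)}\right)$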
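
As a rough illustration of the formula above, the following sketch computes the same quantity in plain Python (standard library only, not TensorFlow). The helper names `softmax` and `cross_entropy` and the toy batch are assumptions made for this example. Labels are integer class ids, as expected by the sparse variant of the loss.

```python
import math

def softmax(logits):
    """Turn one instance's raw scores (logits) into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean cross-entropy J over a batch; labels are integer class ids."""
    losses = []
    for logits, k in zip(batch_logits, labels):
        p_hat = softmax(logits)
        # y_k is 1 only for the true class k, so the inner sum keeps one term
        losses.append(-math.log(p_hat[k]))
    return sum(losses) / len(losses)

# Hypothetical toy batch: 2 instances, 3 classes
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [0, 1]
loss = cross_entropy(batch_logits, labels)
print(loss)
```

When every class receives the same score, each $\hat{p}_k$ equals $1/K$ and the loss equals $\log K$, which is a handy sanity check for an untrained model.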

In [2]:
with tf.name_scope("loss"):
    #sparse_softmax_cross... computes the cross entropy based on the “logits”, and it expects labels in the form of integers
    #ranging from 0 to the number of classes minus 1 --> 1D tensor containing the cross entropy for each instance
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits) #The previous cell is fed here
    loss = tf.reduce_mean(xentropy, name="loss") #Mean of 1d tensor
    
learning_rate = 0.01

#Train with Gradient Descent
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    #optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) --> alternative usually faster optimizer
    training_op = optimizer.minimize(loss)

#Accuracy score
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1) #True where the target class has the highest logit
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) #convert boolean to float
    
init = tf.global_variables_initializer()
saver = tf.train.Saver()
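
To make the evaluation step more concrete, here is a minimal plain-Python sketch of what `in_top_k` with `k=1` followed by `reduce_mean` computes. The function name `in_top_1` and the toy values are assumptions for illustration. Ties are broken by lowest index here, which may differ from TensorFlow's handling.

```python
def in_top_1(batch_logits, targets):
    """For each instance, check whether the target class has the highest logit."""
    correct = []
    for logits, target in zip(batch_logits, targets):
        predicted = max(range(len(logits)), key=lambda k: logits[k])
        correct.append(predicted == target)
    return correct

# Hypothetical toy batch: 3 instances, 3 classes
batch_logits = [[0.1, 2.3, -1.0], [1.5, 0.2, 0.9], [0.0, 0.4, 3.1]]
targets = [1, 2, 2]

correct = in_top_1(batch_logits, targets)
# Cast booleans to floats and average them, as the graph does
accuracy = sum(float(c) for c in correct) / len(correct)
print(correct, accuracy)
```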

In [5]:
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/")

n_epochs = 10
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size) 
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,
            y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_model_final.ckpt")

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
0 Train accuracy: 0.92 Test accuracy: 0.9108
1 Train accuracy: 0.96 Test accuracy: 0.9286
2 Train accuracy: 0.96 Test accuracy: 0.9377
3 Train accuracy: 0.98 Test accuracy: 0.9429
4 Train accuracy: 0.94 Test accuracy: 0.9484
5 Train accuracy: 1.0 Test accuracy: 0.9526
6 Train accuracy: 0.94 Test accuracy: 0.9545
7 Train accuracy: 0.96 Test accuracy: 0.9557
8 Train accuracy: 0.94 Test accuracy: 0.9587
9 Train accuracy: 0.94 Test accuracy: 0.9608


In [4]:
hidden1.name

'dnn/Relu:0'

### Vanishing/Exploding Gradients Problem

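With the previously common practice of initializing the parameters from a normal distribution $N(0,1)$, the gradients tended either to vanish or to explode during backpropagation. The proposed solution was a different initialization, which depends on the activation function and on the number of input and output connections of the layer ($n_{inputs}$ and $n_{outputs}$):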

<table>
<tr><th>Activation function</th><th>Uniform distribution $[-r, r]$</th><th>Normal distribution (mean 0)</th></tr>
<tr><td>Logistic</td><td>$r=\sqrt{\frac{6}{n_{inputs}+n_{outputs}}}$</td><td>$\sigma=\sqrt{\frac{2}{n_{inputs}+n_{outputs}}}$</td></tr>
<tr><td>Hyperbolic tangent</td><td>$r=4\sqrt{\frac{6}{n_{inputs}+n_{outputs}}}$</td><td>$\sigma=4\sqrt{\frac{2}{n_{inputs}+n_{outputs}}}$</td></tr>
<tr><td>ReLU</td><td>$r=\sqrt{2}\sqrt{\frac{6}{n_{inputs}+n_{outputs}}}$</td><td>$\sigma=\sqrt{2}\sqrt{\frac{2}{n_{inputs}+n_{outputs}}}$</td></tr>
</table>
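
The following sketch evaluates these initialization parameters for a given layer size. The function name `init_params` and the activation keys are assumptions for this example. It takes $\sigma = r/\sqrt{3}$ in every row, since a uniform distribution on $[-r, r]$ has variance $r^2/3$.

```python
import math

def init_params(n_inputs, n_outputs, activation="logistic"):
    """Return (r, sigma) for uniform [-r, r] or normal N(0, sigma^2) initialization."""
    # Multiplier that depends on the activation function
    scale = {"logistic": 1.0, "tanh": 4.0, "relu": math.sqrt(2.0)}[activation]
    fan_sum = n_inputs + n_outputs
    r = scale * math.sqrt(6.0 / fan_sum)
    sigma = scale * math.sqrt(2.0 / fan_sum)
    return r, sigma

# First hidden layer of the MNIST network: 784 inputs, 300 neurons, ReLU
r, sigma = init_params(784, 300, "relu")
print(r, sigma)
```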

### Batch Normalization

Batch Normalization (BN) is a technique that addresses the vanishing/exploding gradients problem, and more generally the problem that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change.

The technique consists of adding an operation in the model just before the activation function of each layer, simply zero-centering and normalizing the inputs, then scaling and shifting the result using two new parameters per layer (one for scaling, the other for shifting). In other words, this operation lets the model learn the optimal scale and mean of the inputs for each layer.

In order to zero-center and normalize the inputs, the algorithm needs to estimate the inputs’ mean and standard deviation. It does so by evaluating the mean and standard deviation of the inputs over the current mini-batch.
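
The operation described above can be sketched for a single feature in plain Python. This is a simplified illustration, not TensorFlow's implementation: the function name, the argument names `gamma` (scale) and `beta` (shift), and the small smoothing term `eps` that avoids division by zero are assumptions for this example. The running averages used at test time are left out.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    m = len(batch)
    mean = sum(batch) / m                          # mini-batch mean
    var = sum((x - mean) ** 2 for x in batch) / m  # mini-batch variance
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]
    # gamma and beta are the two parameters learned per layer
    return [gamma * x_hat + beta for x_hat in normalized]

# Hypothetical mini-batch of 4 values for one input feature
batch = [1.0, 2.0, 3.0, 4.0]
out = batch_norm(batch, gamma=1.0, beta=0.0)
print(out)
```

With `gamma=1.0` and `beta=0.0` the output has a mean of approximately 0 and a variance of approximately 1. Other values let the layer learn a different scale and offset.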

In [7]:
import tensorflow as tf
from tensorflow.contrib.layers import batch_norm
from tensorflow.contrib.layers import fully_connected

n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X") 

#Boolean: whether it should use the current mini-batch’s mean and standard deviation (during training) or the running
#averages that it keeps track of (during testing).
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

#batch_norm parameters
bn_params = {
    'is_training': is_training, #Placeholder defined above
    'decay': 0.99, #Decay rate of the exponential moving averages
    'updates_collections': None #None: update the running averages in place, right before batch normalization
                                #is performed during training (i.e., when is_training=True)
}

hidden1 = fully_connected(X, n_hidden1, scope="hidden1",normalizer_fn=batch_norm, normalizer_params=bn_params)
hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2",normalizer_fn=batch_norm, normalizer_params=bn_params)
logits = fully_connected(hidden2, n_outputs, activation_fn=None,scope="outputs",normalizer_fn=batch_norm,
                         normalizer_params=bn_params)

#same as
#with tf.contrib.framework.arg_scope(
#[fully_connected],
#normalizer_fn=batch_norm,
#normalizer_params=bn_params):
    #hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
    #hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
    #logits = fully_connected(hidden2, n_outputs, scope="outputs",activation_fn=None)
    
#When running the session you need
#sess.run(...,feed_dict={is_training : True, ...})