# Introduction to Deep Learning


Our goal is to help you gain beginning deep learning skills through Python coding examples using TensorFlow 2.x. We begin by explaining the concept of deep learning, what machine learning algorithms do, how deep learning differs from machine learning, and how to implement deep learning examples.

**Deep learning** is a class of machine learning algorithms that uses multiple (successive) layers to progressively extract higher level features from the raw input. That is, deep learning emphasizes learning successive layers of increasingly meaningful representations from the data. In image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits or letters or faces.

# Neural Networks

In deep learning, the successive layers are almost always learned by models called neural networks. **Neural networks** are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input.

A layer is the core building block in deep learning. A **layer** is a container that usually receives weighted input, transforms it with a set of mostly non-linear functions and then passes these values as output to the next layer.

A layer can be thought of as a data-processing module that acts as a filter for data. Data goes into a layer and it comes out in a more useful form. Layers extract representations out of the data fed into them. Of course, we hope that the representations are meaningful to help us solve the problem at hand. This is why deep learning takes a lot of practice and experimentation to reap benefits.
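
To make the idea of a layer concrete, here is a minimal NumPy sketch of a single layer: it receives an input vector, applies weights and a bias, and passes the result through a non-linear function. The names `dense_layer`, `weights`, and `bias`, the made-up numbers, and the choice of ReLU as the non-linearity are illustrative assumptions, not something defined in this chapter.

```python
import numpy as np

def dense_layer(inputs, weights, bias):
    """Weighted sum of the inputs followed by a ReLU non-linearity."""
    weighted = inputs @ weights + bias   # weighted input
    return np.maximum(weighted, 0.0)     # ReLU: negative values become 0

# three input values feeding a layer with two neurons (made-up numbers)
inputs = np.array([1.0, 2.0, 3.0])
weights = np.array([[0.5, -1.0],
                    [0.0,  1.0],
                    [1.0, -1.0]])
bias = np.array([0.5, 0.0])

layer_output = dense_layer(inputs, weights, bias)
print(layer_output)
```

The output has one value per neuron, and it is this transformed vector that would be handed to the next layer.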

A very common problem that deep learning is used to solve is the identification of digits 0 through 9. So, we can create a neural network composed of successive layers to help us automatically predict a digit from its image data.

For example, suppose we have an image of the digit 8 in our dataset. If our neural network is robust, it should correctly predict that the digit is 8 from the image data without human intervention! That is, the computer model (the neural network) can identify images of digits with a high degree of accuracy. Of course, humans can easily distinguish the digits 0 through 9, but the ability of a computer model to do this is amazing and at the heart of what deep learning is all about.

A neural network is a collection of **neurons** with **synapses** connecting them. The collection is organized into three main parts:

* the input layer
* the hidden layer
* the output layer

When training a neural network, data is initially passed to the **input layer**. The input layer then passes the data on to the first hidden layer, where each neuron applies an activation function. An **activation function** defines the output of a neuron given an input or set of inputs.

A **hidden layer** is a layer in between input layers and output layers where artificial neurons take in a set of weighted inputs and produce an output through an activation function. A network can have multiple hidden layers.

The **output layer** produces the result for given inputs. It is where the values from the last hidden layer are turned into the network's final prediction.

Neurons tend to be remarkably simple, with nothing but a floating point value, an input, and an output. That float is what we refer to as the **weight** of a neuron.

So, neurons take inputs from their previous layer, transform them to keep values within a manageable range with an activation function, and send the transformed inputs along with their weights to neurons in the next layer. Since values at the input layer are generally centered at zero and have already been appropriately scaled, they don’t need transformation with an activation function.
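
As a rough illustration of how an activation function keeps values within a manageable range, the sketch below defines two widely used activation functions in plain Python. The chapter does not say which activation functions it has in mind, so the choice of sigmoid and ReLU here is an assumption.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Pass positive values through and replace negatives with 0."""
    return max(0.0, x)

# compare the raw value with its two transformed versions
for value in [-10.0, 0.0, 10.0]:
    print(value, round(sigmoid(value), 4), relu(value))
```

However large or small the input, the sigmoid output stays between 0 and 1, while ReLU simply removes negative values.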

# Learning Representations from Data

Machine learning algorithms discover rules to execute a data processing task. So, to conduct machine learning we need three things:

1. input data points
2. examples of expected output
3. a way to measure whether the algorithm is performing well

Input data points are data of some kind. An example of input data could be pictures. Image recognition in deep learning requires pictures. Of course the deep learning models require numeric data, which means that the pictures must be transformed in some way. We will cover how to transform such data later in the chapter.

To make predictions from data in deep learning, we need examples of expected output. So, the data must contain a representation of each data example and what each data example represents. For example, when predicting digits, the data must contain representations of each digit and what the digit represents. Specifically, if an example from the data is the digit 9 we must have the representation of the digit 9 and a target value of 9. We will cover how to represent a digit and its target value later in the chapter.

Finally, we need to determine the distance between the algorithm's current output and its expected output. This distance is often called loss or error. The **loss** is used as feedback to adjust the way the algorithm works. This adjustment is called **learning**. For example, if our neural network model predicts that a digit is 3 but it is really an 8, our model has at least some loss (or error). That is, there is some distance between what the model predicted and its expected output.

# Goal of Machine Learning

The goal of machine learning models is to transform input data into **meaningful** outputs. This transformation is how the model learns from exposure to known examples of inputs and outputs. As such, the central problem in machine learning and deep learning is to *meaningfully transform data*. If we can meaningfully transform data, we can learn useful *representations* of the input data that get us closer to the expected output.

Representations of data offer a different way to look at it. For example, an image of a digit can be represented as a two-dimensional (2D) matrix of 1s and 0s.

Learning in the context of machine learning describes an automatic search process for better representations. Deep learning is a specific subfield of machine learning that emphasizes learning successive *layers* of increasingly meaningful representations.

Simply, we want to learn from our dataset so we can reliably predict from new (unseen) data!

# Control a Neural Network

To control the output of a neural network, we need to measure how far this output is from what we expected. So, we introduce a loss function for this purpose. The **loss function** takes the predictions of the network and the true target and computes a distance score. The **true target** is what we wanted the network to output, that is, what we expect. The **distance score** captures how well the network has done on this specific set of data.

Simply, the **distance score** measures the difference between our expected outcome and the actual outcome. This distance is commonly referred to as **loss**, **cost** or **error**. We want our neural network to minimize loss. That is, we want our neural network to minimize the difference between our predictions and the actual values.
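
A small sketch may help show what a distance score looks like in practice. The example below uses mean squared error, one common loss function; the chapter does not commit to a particular loss, so this choice and the sample numbers are assumptions.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared) / len(squared)

targets = [1.0, 0.0, 1.0]            # what we expect
good_predictions = [0.9, 0.1, 0.8]   # close to the targets
poor_predictions = [0.2, 0.9, 0.1]   # far from the targets

good_loss = mean_squared_error(targets, good_predictions)
poor_loss = mean_squared_error(targets, poor_predictions)
print(good_loss, poor_loss)
```

Predictions that sit close to the targets produce a small loss, and predictions that sit far away produce a large one.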

We can refer to what we want from our neural network in several ways:

* minimize loss
* minimize cost
* minimize error
* minimize the loss function
* minimize the cost function

The loss function score is used as a feedback signal to adjust the weights of the model in a direction that will lower loss. The higher the loss, the worse our model is performing. To lower the loss function score, we use an optimizer. An **optimizer** decides how the weights are updated during training, which affects both the speed of training and the performance of the resulting model. An optimizer relies on the backpropagation algorithm. The **backpropagation algorithm** is the *central algorithm* in deep learning.

# Backpropagation

A neural network propagates the signal of the input data forward through its parameters towards the moment of decision. It then backpropagates information about the error in reverse through the network so that it can alter the parameters. This happens step by step:

* The network makes a guess about data, using its parameters
* The network’s progress is measured with a loss function
* The error is backpropagated to adjust the wrong-headed parameters

**Backpropagation** takes the error associated with a wrong guess by a neural network and uses that error to adjust the neural network’s parameters in the direction of less error. Specifically, backpropagation starts with the final loss value (or error), works backwards from the top layers to the bottom layers, and applies the chain rule (from calculus) to compute the contribution that each parameter had in the loss value. The algorithm that uses these contributions (the gradients) to update the parameters is gradient descent. **Gradient descent** is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model. Since this is a beginning book, we won't delve deeper into how backpropagation works.
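
For readers who want to see gradient descent in action, here is a minimal sketch on a toy problem with a single parameter. The loss function, the starting value, and the learning rate are all assumptions chosen to keep the example small; a real network has many parameters and obtains its gradients through backpropagation.

```python
def loss(w):
    """A toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0               # starting guess for the parameter
learning_rate = 0.1   # size of each step (an assumed value)

for step in range(50):
    w = w - learning_rate * gradient(w)   # move against the gradient

print(round(w, 4), round(loss(w), 8))
```

Each step moves the parameter a little in the direction that lowers the loss, so after enough steps it settles very close to the minimum.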

# The Training Loop

When data initially enters a neural network, its output is far from ideal. So, loss is very high. But, with every example the network processes, the weights are adjusted a little in the correct direction. So, loss decreases. A **weight** refers to the strength of a connection between two nodes (or neurons). This cycle of repeatedly feeding the data into the network and adjusting weights is called the **training loop**.

By repeating the training loop a sufficient number of times, the network yields weights that minimize the loss function (or loss). A network with minimal loss is one with outputs that are very close to the targets. Such a network is called a **trained network**.
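
The training loop can be sketched in a few lines of plain Python. The example below is a deliberately tiny stand-in for a neural network: a model with one weight that learns to double its input. The data, the learning rate, and the number of epochs are assumptions made for illustration.

```python
# toy data: the target is always twice the input
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

weight = 0.0           # the single parameter to learn
learning_rate = 0.01   # assumed step size
losses = []

for epoch in range(200):
    epoch_loss = 0.0
    for x, y in zip(inputs, targets):
        prediction = weight * x                   # forward pass
        error = prediction - y
        epoch_loss += error ** 2                  # squared-error loss
        weight -= learning_rate * 2 * error * x   # nudge the weight
    losses.append(epoch_loss / len(inputs))

print(round(weight, 3), losses[0] > losses[-1])
```

The loss recorded for the first pass over the data is high, and it shrinks as the weight approaches 2, which is the value that maps every input to its target.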

When a neural network is running, it takes a data set, splits it into a bunch of tiny fragments, and disperses those fragments among all of the neurons contained within. The neurons take the data they receive, operate on it using the stored weight, and then pass on the resulting data to the output. At the end of processing, all of the outputs are aggregated to come to a conclusion. If the network is still being trained, this conclusion will be evaluated for correctness and then the weights of all the neurons involved will be adjusted slightly. These adjustments reduce the values of the ones that were wrong and increase the ones that were right.



# How Deep Learning Learns from Data

Deep learning learns through incremental, layer-by-layer activity where increasingly complex representations are developed. Also, these intermediate incremental representations are learned jointly. That is, each layer is automatically updated to follow both the representational needs of the layer above and the needs of the layer below. Learning through **successive layers** and **learning jointly** make deep learning vastly more successful than previous approaches to machine learning.

Before we can begin working with real deep learning examples, we need to explain the TensorFlow package and the workspace (Google Colab) that we will be working with throughout the book.

# TensorFlow 2.x

**TensorFlow** is a Python open source library for numerical computation that was created to facilitate machine learning and deep learning problem solving. TensorFlow bundles together machine learning and deep learning models and algorithms and makes them useful by way of a common programming environment.

In 2019, Google released a new version of their TensorFlow deep learning library (TensorFlow 2.x) that integrates the Keras API directly and promotes this interface as the default or standard interface for deep learning development on the platform. The integration between TensorFlow and Keras is commonly referred to as the **tf.keras** interface or API ('tf' is the abbreviation for 'TensorFlow').

**Keras** is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Keras is extremely popular for building and training deep learning models.

The Keras API was the obvious choice for integration with TensorFlow because it was clean and simple, which allowed standard deep learning models to be defined, fit, and evaluated in just a few lines of code. Another reason was that it allowed us to use popular deep learning mathematical libraries such as TensorFlow, Theano, and CNTK for backend computation. As a result, the power of these libraries could be harnessed through a very clean and simple interface.

The biggest change with TensorFlow 2.x is the integration of Keras layers and models to manage variables. Keras models and layers offer the convenient **variables** and **trainable_variables** properties to recursively gather up all dependent variables, which makes it easy to manage variables locally where they are being used.

TensorFlow 2.x was introduced to make TensorFlow users more productive. TensorFlow 2.x removes redundant APIs, makes APIs more consistent, and better integrates with the Python runtime with Eager execution.

**Eager execution** is an imperative programming environment that evaluates operations immediately, which makes TensorFlow a flexible platform for research and experimentation. It provides an intuitive interface that lets you structure your code naturally with ordinary Python data structures. It also allows you to quickly iterate on small models and small data.
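
The short sketch below gives a feel for eager execution, assuming TensorFlow 2.x is installed (as it is on Colab). Operations return concrete values right away, so they can be mixed freely with ordinary Python code. The variable names are illustrative.

```python
import tensorflow as tf

# eager execution is on by default in TensorFlow 2.x
print(tf.executing_eagerly())

total = tf.add(2, 3)   # evaluated immediately
print(total.numpy())

# ordinary Python control flow works directly on the results
squares = [int(tf.square(n).numpy()) for n in range(4)]
print(squares)
```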

# Google Colab

**Google Colab** (short for Google Colaboratory) is a cloud-based data science workspace similar to the Jupyter Notebook. Each **Colab** session is equipped with a virtual machine with about 13 GB of RAM and either a CPU, GPU, or TPU processor. The workspace allows you to use and share Jupyter notebooks with others without having to download, install, or run anything on your own computer other than a browser.

Colab is an exceptional tool for machine learning education and research because it is free to use, mimics a Jupyter notebook environment, and requires no setup to use. It also provides GPU and TPU support for complex machine learning models that require heavy duty computing resources. GPU support is very easy to set up and use. However, TPU support requires deeper expertise because it is much more difficult to set up and use.

Although Colab is based on the popular Jupyter open source project, the **interface** and **functionality** differ slightly. So, it may take you a minute to familiarize yourself with the product.

The first time we worked with Colab, we visited the main site at https://colab.research.google.com/notebooks/welcome.ipynb. The site offers a nice tutorial on using the product. But, you can browse and find numerous tutorials and YouTube videos on the topic to deepen your Colab skills. Actually, Jupyter Notebook aficionados should have little trouble adapting because Colab was developed based on the Jupyter project.

Colab works with most major browsers. But, it is most thoroughly tested with the latest versions of Chrome, Firefox and Safari.

Colaboratory notebooks are stored in Google Drive and can be shared as you would with Google Docs or Sheets. Simply click the Share button at the top right of any Colaboratory notebook, or follow the Google Drive file sharing instructions.

For **all** of the programming material covered, we will be working in the Colab environment. Of course, you are free to work in any environment you wish. But, for readers new to TensorFlow 2.x, we strongly recommend working within the Colab environment.

# Google Drive

Google Drive is a cloud-based file storage and synchronization service developed by Google. It allows us to store files on Google's servers, synchronize files across devices, and share files. Google Drive encompasses Google Docs, Google Sheets, and Google Slides, which are part of an office suite that permits collaborative editing of documents, spreadsheets, presentations, drawings, forms, and more. Files created and edited through the office suite are automatically saved in Google Drive. Fortunately for us, Google Drive offers 15 gigabytes of free storage.

## Connect Google Colab with Google Drive

It just takes a few simple steps to connect Colab with Google Drive:

1. Sign into your Google email account

2. Open a new browser tab and browse to **Google Colab**

3. Click the **Colab-Google** link

4. Click **Google Drive** in the pop-up window

All notebooks on your Google Drive account appear in the window. Unless you’ve worked with Colab in the past, no notebooks appear. Notebooks are saved on Google Drive in My Drive, inside the **Colab Notebooks** directory. This directory is *automatically* created when Colab is connected to Google Drive.

If you want to create a new notebook, click on **NEW NOTEBOOK**. Or, click on **CANCEL**, which takes you to the **Welcome To Colaboratory** screen. This screen offers the main menu for Google Colab as well as the table of contents that helps you get started.

The connection we just established between Google Colab and Google Drive is *persistent*. That is, we only need to establish this connection *once* unless browser history is cleared.

# Create a Notebook

Here are the steps to create a new Colab notebook:

1. Open **Google Colab** in a browser (if you haven't already done so).

2. Click the **File** tab (top-left)

3. Click **New notebook** from the drop-down list

Now, we are ready to begin working with Colab! We can just start typing our code and press the arrow symbol on the left to execute.

To create your first piece of code, add the following in the code cell:

In [1]:
string = 'Peter picked a pail of pickled peppers'
string

'Peter picked a pail of pickled peppers'

To execute code, click the *little arrow to the left*. The output from the code cell shows the contents of the string variable.

# GPU Hardware Accelerator

To vastly speed up processing, we use the GPU available from the Google Colab cloud service. Colab provides a free Tesla K80 GPU with about 12 GB of memory. It’s very easy to enable the GPU in a Colab notebook:

1. Click **Runtime** in the top left menu

2. Click **Change runtime type** from the drop-down menu

3. Choose **GPU** from the *Hardware accelerator* drop-down menu

4. Click **SAVE**

The GPU must be enabled in each notebook. But, it only has to be enabled once.

Test if the GPU is active:

In [2]:
import tensorflow as tf

# display tf version and test if GPU is active
tf.__version__, tf.test.gpu_device_name()

('2.3.0', '/device:GPU:0')

Import the *tensorflow* library and display the version of TensorFlow as well as the status of the GPU. If '/device:GPU:0' is displayed, the GPU is active. If an empty string ('') is displayed, no GPU was found and the regular CPU is active.

# Download a File from a URL

We can directly download a file from a URL with **tf.keras.utils.get_file**. This requires TensorFlow, which Colab already has preinstalled. Colab offers two TensorFlow versions: a 2.x version and a 1.x version. Colab currently uses TensorFlow 2.x by default, but users can easily switch to 1.x. Consult the following URL for more information:

https://colab.research.google.com/notebooks/tensorflow_version.ipynb#scrollTo=N2y2uqx9GfA5

## Download Data from a URL

In [3]:
# we need tensorflow, which includes the keras module
import tensorflow as tf

ds = 'auto-mpg.data'
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
dataset_path = tf.keras.utils.get_file(ds, url)
dataset_path

Downloading data from http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data


'/root/.keras/datasets/auto-mpg.data'

We just downloaded a dataset from the UCI Machine Learning Repository. The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine.

## Prepare the Dataset

As is, the dataset requires some preprocessing. For example, it is without feature headings.

In [4]:
# we need the pandas library
import pandas as pd

cols = ['MPG','Cylinders','Displacement','Horsepower','Weight',
        'Acceleration', 'Model Year', 'Origin']

raw_dataset = pd.read_csv(dataset_path, names=cols,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

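We begin by importing the pandas library. We continue by creating a variable to hold the feature names for each column. We then create a pandas DataFrame with the **read_csv()** function, telling it to split fields on spaces, to treat '?' as a missing value, and to ignore everything after a tab character.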
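
To see what each **read_csv()** argument contributes, the sketch below parses two sample rows from an in-memory string instead of the downloaded file. The rows are illustrative and only mimic the layout we assume the raw file has: space-separated values, a '?' for a missing value, and a tab before the car name. The names `sample` and `sample_df` are hypothetical.

```python
import io
import pandas as pd

# two illustrative rows in the assumed layout of the raw file
sample = (
    '18.0   8   307.0   130.0   3504.   12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00   ?       2046.   19.0   71  1\t"ford pinto"\n'
)

cols = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
        'Acceleration', 'Model Year', 'Origin']

sample_df = pd.read_csv(io.StringIO(sample), names=cols,
                        na_values='?',    # treat '?' as a missing value
                        comment='\t',     # ignore everything after a tab
                        sep=' ', skipinitialspace=True)

print(sample_df.shape)
print(sample_df['Horsepower'].isna().sum())
```

The car names after the tab are dropped, the eight numeric columns receive the names in `cols`, and the '?' entry becomes a missing value that can be handled during preprocessing.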

Let's see what we have.

In [5]:
raw_dataset.tail()

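| | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Origin |
|---|---|---|---|---|---|---|---|---|
| 393 | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 1 |
| 394 | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 2 |
| 395 | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 1 |
| 396 | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 1 |
| 397 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 1 |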


In [6]:
# create a copy
data = raw_dataset.copy()

# display some data
data.tail()

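| | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Origin |
|---|---|---|---|---|---|---|---|---|
| 393 | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 1 |
| 394 | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 2 |
| 395 | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 1 |
| 396 | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 1 |
| 397 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 1 |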


We create a copy of the original DataFrame in case we need the original data. We end by displaying the last five records with the **tail()** method.

# Colab Abends

An AbEnd (also abnormal end or abend) is an abnormal termination of software or a program crash. The term goes way back to an error message from IBM's OS/360 operating system, a predecessor of today's IBM z/OS.

Since Colab is free, we can't expect it to be perfect for our needs. We've noticed that when we run Colab for a long time (several hours) without pause or read a large dataset into memory and process said data, it has the tendency to crash (or abend). When this happens, you have two choices that we know of:

1. Restart runtime.

2. Close the program and restart it from scratch.

To restart runtime, click **Runtime** on the top menu, click **Restart runtime...**, and click **Yes** when prompted. Colab recommends this option. To restart from scratch, clear browser history first and then start Colab from scratch.

# Colab Strange Results

We've noticed that sometimes we get errors and other strange results when working with Colab. If you are getting unexpected errors or results, just restart the runtime (just like in the **Colab Abends** section) for the notebook you are working on, and rerun your notebook from the beginning.

Don't be shy to restart runtime. We've been working with Colab for quite some time and find that Colab can act strangely, especially when processing complex models and/or being used for a long time. Every time we've restarted runtime, it has acted as expected.

# Tensors

A **tensor** is a container for *numeric* data. Tensors can contain data within an arbitrary number of dimensions. That is, a tensor can be zero-dimensional (0D), one-dimensional (1D), two-dimensional (2D), three-dimensional (3D), and so on. Within the context of tensors, a dimension is often called an **axis**.

So, tensors are a generalization of matrices represented by n-dimensional arrays. The dimensionality of a tensor is often described by its number of axes. The number of axes represented by a tensor is called its **rank**. Tensors are defined by how many axes they have in total.

## Scalars (0D tensors)

A **scalar** is a tensor of only one number. For example, a NumPy float32 or float64 number is a scalar tensor (or scalar array).

It is easy to display the number of axes (or dimensionality) of a NumPy tensor with the *ndim* attribute. Let's look at an example of a NumPy scalar:

In [9]:
import numpy as np

# create a numpy scalar

scalar = np.array(9)
scalar

array(9)

In [10]:
# signal its rank

print (str(scalar.ndim) + 'D')

0D


## Vectors (1D tensors)

A **vector** is an array of numbers. So, a vector is a 1D tensor. A 1D tensor has exactly one axis.

Let's look at an example of a Numpy vector:

In [11]:
# create a numpy vector

vector = np.array([0, 1, 0, 0, 0, 0])
vector

array([0, 1, 0, 0, 0, 0])

In [12]:
# signal its rank

print (str(vector.ndim) + 'D')

1D


## Matrices (2D tensors)

A **matrix** is an array of vectors. So, a matrix is a 2D tensor. A 2D tensor (or matrix) has two axes. Its axes are generally referred to as *rows* and *columns*.

Let's look at an example of a Numpy matrix:

In [13]:
# create a numpy matrix

matrix = np.array([[0, 1, 0, 0, 0, 0],
                   [0, 0, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 0, 1]])
matrix

array([[0, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1]])

In [14]:
# signal its rank

print (str(matrix.ndim) + 'D')

2D


**Rows** are the entries from the first axis and **columns** are the entries from the second axis. So, the first row from our example is [0, 1, 0, 0, 0, 0] and the first column is [0, 0, 0, 0, 0].
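
Rows and columns can be pulled out of a NumPy matrix with indexing. The sketch below rebuilds the same 5 x 6 matrix under a different name (`grid`, chosen here only to keep the example self-contained) and extracts its first row and first column.

```python
import numpy as np

grid = np.array([[0, 1, 0, 0, 0, 0],
                 [0, 0, 1, 0, 0, 0],
                 [0, 0, 0, 1, 0, 0],
                 [0, 0, 0, 0, 1, 0],
                 [0, 0, 0, 0, 0, 1]])

first_row = grid[0]         # index along the first axis
first_column = grid[:, 0]   # every row, first entry along the second axis

print(first_row)
print(first_column)
```

The row has six entries (one per column), while the column has five entries (one per row).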

## 3D tensors and beyond

We can create a 3D tensor by packing 2D tensors (or matrices) into a new array. A 3D tensor can be visually interpreted as a cube of numbers.

Let's look at an example of a Numpy 3D tensor:

In [15]:
# create a 3D tensor

D3 = np.array([[[0, 1, 2]],
               [[3, 4, 5]],
               [[6, 7, 8]]])

# signal its rank

print (str(D3.ndim) + 'D')

3D


By packing 3D tensors into an array, you can create 4D tensors and so on. In deep learning, we generally manipulate tensors that are 0D to 4D. With video processing, we can go up to 5D.

# Key Attributes of a Tensor

1. rank
2. shape
3. data type

As discussed earlier, the **rank** of a tensor is its number of axes. A 1D tensor has one axis, a 2D tensor has two axes, and a 3D tensor has three axes.

The **shape** of a tensor is a tuple of integers that describes how many elements the tensor has along each axis. So, our 3D tensor has shape (3, 1, 3), our 2D matrix has shape (5, 6), our vector has shape (6,), and our scalar has an empty shape ().

Let's prove this with examples:

In [16]:
# 3 instances of 1 x 3 matrices

D3.shape

(3, 1, 3)

In [17]:
# 5 rows and 6 columns (or 5 x 6 matrix)

matrix.shape

(5, 6)

In [18]:
# 6 element vector

vector.shape

(6,)

In [19]:
# just a scalar number

scalar.shape

()

The **data type** is the type of data contained in the tensor. In NumPy and TensorFlow, the data type is exposed through the *dtype* attribute. Let's see the data types of our tensors:

In [20]:
# dtype of tensors

print (scalar.dtype)
print (vector.dtype)
print (matrix.dtype)
print (D3.dtype)

int64
int64
int64
int64
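
The three attributes can be read together from any NumPy array. As a sketch of the kind of 4D tensor mentioned earlier, the example below builds a hypothetical batch of 32 grayscale images of 28 x 28 pixels; the sizes and the name `images` are assumptions chosen for illustration.

```python
import numpy as np

# a hypothetical batch of 32 grayscale images, each 28 x 28 pixels with 1 channel
images = np.zeros((32, 28, 28, 1), dtype=np.float32)

print(images.ndim)    # rank: the number of axes
print(images.shape)   # number of elements along each axis
print(images.dtype)   # type of the stored values
```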


# Input Pipelines

An **input pipeline** is a sequence of data processing components that manipulate and apply data transformations. Pipelines are very common in machine learning and deep learning systems because these systems are data-rich. That is, they demand large volumes of data to perform. Input pipelines are the best way to transform large datasets because they break down processing into manageable components.

Each component of an input pipeline pulls in a large amount of data, processes it in some manner, and spits out the result. The next component pulls in the resultant data, processes it in another manner, and spits out its own output. The pipeline continues until all of its components have finished their work.
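
The idea of components feeding one another can be sketched with plain Python generators, without any TensorFlow machinery. The three component functions below are hypothetical and stand in for real steps such as reading, transforming, and filtering data.

```python
def read_numbers(limit):
    """First component: produce raw items one at a time."""
    for n in range(limit):
        yield n

def square(items):
    """Second component: transform each item it receives."""
    for item in items:
        yield item * item

def keep_even(items):
    """Third component: filter the transformed items."""
    for item in items:
        if item % 2 == 0:
            yield item

# chain the components; each one consumes the output of the previous one
pipeline = keep_even(square(read_numbers(6)))
result = list(pipeline)
print(result)
```

Because each component only handles one item at a time, the whole dataset never has to sit in memory at once, which is one reason pipelines suit large datasets.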

# The tf.data API

The tf.data API revolves around the concept of a dataset, which represents a sequence of data items. Before we create our first TensorFlow dataset, we need some data to build it from.

Let's create a simple 1D tensor:

In [21]:
# import tensorflow if starting from here
import tensorflow as tf

X = tf.range(5)
X

<tf.Tensor: shape=(5,), dtype=int32, numpy=array([0, 1, 2, 3, 4], dtype=int32)>

The result is a 1D tensor (a vector) with five *int32* values ranging from 0 to 4.

We can easily display actual values from the TensorFlow tensor we just created:

In [22]:
X.numpy()

array([0, 1, 2, 3, 4], dtype=int32)

We can also access individual elements:

In [23]:
# first element from tensor

X[0].numpy()

0

Or, slice multiple elements:

In [24]:
# 2nd, 3rd, and 4th elements from tensor

X[1:4].numpy()

array([1, 2, 3], dtype=int32)

# Function 'from_tensor_slices'

Let's convert the tensor to a TensorFlow dataset:

In [25]:
dataset = tf.data.Dataset.from_tensor_slices(X)
dataset

<TensorSliceDataset shapes: (), types: tf.int32>

The **from_tensor_slices** function takes a tensor and creates a **tf.data.Dataset** whose elements are all the slices of **X** (along the first dimension). Of course **X** only has one dimension! So, **dataset** contains five items, namely, tensors 0 to 4. Notice that the shape is '()', which indicates that the elements are scalars. We'll talk a lot more about TFDS data in the next chapter.
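
Slicing along the first dimension is easier to see with a tensor that has more than one axis. The sketch below, which assumes TensorFlow 2.x and uses the illustrative names `pairs` and `pairs_ds`, builds a dataset from a 3 x 2 tensor.

```python
import tensorflow as tf

# a 2D tensor with three rows and two columns
pairs = tf.constant([[1, 2],
                     [3, 4],
                     [5, 6]])

pairs_ds = tf.data.Dataset.from_tensor_slices(pairs)

# slicing happens along the first axis, so each element is one row
rows = [element.numpy().tolist() for element in pairs_ds]
print(rows)
print(pairs_ds.element_spec.shape)
```

The dataset holds three elements, and each element is a vector of two values rather than a scalar.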

# Iterate a tf.data.Dataset

We can simply iterate over the dataset items as so:

In [26]:
for item in dataset:
  print (item)

tf.Tensor(0, shape=(), dtype=int32)
tf.Tensor(1, shape=(), dtype=int32)
tf.Tensor(2, shape=(), dtype=int32)
tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor(4, shape=(), dtype=int32)


Alternatively, we could have created the TensorFlow dataset directly:

In [27]:
dataset = tf.data.Dataset.range(5)
for item in dataset:
  print (item)

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)


However, notice that the **dtype** is *int64* rather than *int32*.

# Tensors and Numpy

Tensors play nice with NumPy. We can create a tensor from a NumPy array and vice versa. We can even apply TensorFlow operations to NumPy arrays and NumPy operations to tensors. Let's create a NumPy array from the TensorFlow dataset that we just created.

In [28]:
# create a variable to hold a line break

br = '\n' # this is just a convenient way to include a line break

# import NumPy

import numpy as np

# technique 1

ls = []
for item in dataset:
  e = item.numpy()
  ls.append(e)

np_arr = np.asarray(ls, dtype=np.float32)
print (type(np_arr))
print (np_arr, br)  

# technique 2

ls = [item.numpy() for item in dataset]
np_arr = np.asarray(ls, dtype=np.float32)

print (type(np_arr))
print (np_arr)

<class 'numpy.ndarray'>
[0. 1. 2. 3. 4.] 

<class 'numpy.ndarray'>
[0. 1. 2. 3. 4.]


Use the **numpy()** method to convert each element of a TensorFlow dataset to a NumPy value, and collect the values into a NumPy array. The first technique uses a conventional Python loop. The second technique uses a list comprehension, which takes fewer lines of code.

We can convert the NumPy array back to a TensorFlow tensor with the **tf.constant** function.

In [29]:
tf_arr = tf.constant(np_arr)
tf_arr

<tf.Tensor: shape=(5,), dtype=float32, numpy=array([0., 1., 2., 3., 4.], dtype=float32)>

However, constants are immutable. That is, their values cannot be modified. So, we can use the **tf.Variable** class if we need to modify values.

In [30]:
# use the 'np_arr' array we just created

tf_arr = tf.Variable(np_arr)
tf_arr

<tf.Variable 'Variable:0' shape=(5,) dtype=float32, numpy=array([0., 1., 2., 3., 4.], dtype=float32)>
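
To show what being able to modify means in practice, here is a minimal sketch, assuming TensorFlow 2.x, that changes a variable in place with its **assign** and **assign_add** methods. The name `values` and the numbers are illustrative.

```python
import tensorflow as tf

values = tf.Variable([0.0, 1.0, 2.0, 3.0, 4.0])

values[0].assign(10.0)            # change a single element in place
values.assign_add(tf.ones([5]))   # add 1 to every element in place

print(values.numpy())
```

This kind of in-place update is what lets model weights be stored as variables and adjusted during training.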

# Chaining Transformations

We can apply transformations to a tf.data.Dataset by calling its transformation methods. Each method returns a **new** dataset, which allows us to chain transformations. Let's start with a single transformation.

Create a tf.data.Dataset and show its values:

In [31]:
dataset = tf.data.Dataset.range(5)

for item in dataset:
  print (item)

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)


We see that *dataset* contains values [0, 1, 2, 3, 4].

Use the **repeat()** transformation method to add data:

In [32]:
data_rep = dataset.repeat(3)

for item in data_rep:
  print (item)

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)


The new dataset contains three copies of the original. In deep learning, **repeat()** is commonly used to pass over the same training data several times (once per epoch) without collecting new data.
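
When **repeat()** is called without a count, the dataset cycles indefinitely. A small sketch, assuming we only want the first twelve elements, bounds the stream with **take()**. The names `endless` and `first_twelve` are illustrative.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(5)

# repeat() with no argument never runs out of data,
# so take() is used to stop after 12 elements
endless = dataset.repeat()
first_twelve = [int(item.numpy()) for item in endless.take(12)]

print(first_twelve)
```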

Now, let's chain transformations:

In [33]:
data_batch = dataset.repeat(3).batch(7)

for item in data_batch:
  print (item)

tf.Tensor([0 1 2 3 4 0 1], shape=(7,), dtype=int64)
tf.Tensor([2 3 4 0 1 2 3], shape=(7,), dtype=int64)
tf.Tensor([4], shape=(1,), dtype=int64)


What happened? The first transformation, **repeat(3)**, creates three copies of the original dataset. We chain the first transformation into the second with **batch(7)**, which creates batches of seven items from the dataset.

So, the new dataset contains three tensors. The first tensor contains \[0, 1, 2, 3, 4, 0, 1], the second tensor contains \[2, 3, 4, 0, 1, 2, 3], and the third tensor contains \[4]. By the time we get to the third batch, we run out of data.

We can drop the final batch like so:

In [34]:
data_drop = dataset.repeat(3).batch(7, drop_remainder=True)

for item in data_drop:
  print (item)

tf.Tensor([0 1 2 3 4 0 1], shape=(7,), dtype=int64)
tf.Tensor([2 3 4 0 1 2 3], shape=(7,), dtype=int64)


Dataset methods don't modify datasets; they create new ones. So, we can keep track of each dataset by giving it a different name.

We can create equal batches by choosing a batch size that divides the number of elements evenly, like so:

In [35]:
data_equal = dataset.repeat(3).batch(5)

for item in data_equal:
  print (item)

tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
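
The batch sizes seen in this section follow from simple integer arithmetic. The helper below, `batch_sizes`, is a hypothetical plain-Python function (not part of TensorFlow) that sketches the calculation for the three cases above.

```python
def batch_sizes(n_items, batch_size, drop_remainder=False):
  """Return the size of each batch for a dataset of n_items elements."""
  full, leftover = divmod(n_items, batch_size)
  sizes = [batch_size] * full
  # a partial final batch is kept unless drop_remainder is set
  if leftover and not drop_remainder:
    sizes.append(leftover)
  return sizes

n_items = 5 * 3  # five values repeated three times

print(batch_sizes(n_items, 7))
print(batch_sizes(n_items, 7, drop_remainder=True))
print(batch_sizes(n_items, 5))
```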


# Mapping Tensors

We can transform items in a dataset with the **map()** method. Let's look at an example:

In [36]:
# create a dataset

dataset = tf.data.Dataset.range(7)

# repeat and batch it

data_batch = dataset.repeat(3).batch(7)

# display the batched dataset

for row in data_batch:
  print (row)

# map() a function on it

data_map = data_batch.map(lambda x: x ** 2)

# display the first batch

print ()
for item in data_map.take(1):
  print (item)

tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int64)
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int64)
tf.Tensor([0 1 2 3 4 5 6], shape=(7,), dtype=int64)

tf.Tensor([ 0  1  4  9 16 25 36], shape=(7,), dtype=int64)


We create a new dataset with values \[0, 1, 2, 3, 4, 5, 6]. We chain the repeat transformation to the batch transformation. We square each item by mapping with a lambda function. Since we map after batching, the function receives a whole batch at a time and squares it element-wise. A **lambda function** is an anonymous function that can have any number of arguments, but can only have one expression. Instead of iterating the entire dataset, we can take one or more samples with the **take()** method. In our case, we just take the first sample.
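
A named function works with **map()** just as well as a lambda. The sketch below uses a hypothetical function `square` and also checks that, for an element-wise function like this one, mapping before batching gives the same first batch as mapping after batching.

```python
import tensorflow as tf

def square(x):
  # works on a scalar tensor or on a whole batch, element-wise
  return x ** 2

dataset = tf.data.Dataset.range(7)

# map after batching: square() receives tensors of seven values
map_after = dataset.repeat(3).batch(7).map(square)

# map before batching: square() receives one scalar at a time
map_before = dataset.map(square).repeat(3).batch(7)

first_after = next(iter(map_after)).numpy().tolist()
first_before = next(iter(map_before)).numpy().tolist()

print(first_after)
print(first_before)
```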

# Unbatch Data

What if we want to unbatch a dataset? Let's look at an example:

In [37]:
dataset = tf.data.Dataset.range(5)

data_batch = dataset.repeat(3).batch(7)

for item in data_batch.take(1):
  print (item)

print ()

data_unbatch = data_batch.unbatch()

for item in data_unbatch.take(7):
  print (item)

tf.Tensor([0 1 2 3 4 0 1], shape=(7,), dtype=int64)

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)


We create a dataset, apply a chaining transformation, and display the first tensor with values \[0, 1, 2, 3, 4, 0, 1]. We unbatch the dataset and display the first seven tensors. Notice that the first tensor from **data_batch** contains seven items and that each tensor from **data_unbatch** contains a single scalar value.
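
One reason to unbatch is to regroup the data with a different batch size. As a sketch under that assumption, the uneven batches of seven can be flattened and rebatched into even batches of five. The name `data_rebatch` is illustrative.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(5)
data_batch = dataset.repeat(3).batch(7)  # batch sizes 7, 7, and 1

# unbatch() flattens to scalars; batch(5) regroups them evenly
data_rebatch = data_batch.unbatch().batch(5)

rows = [row.numpy().tolist() for row in data_rebatch]
for row in rows:
  print(row)
```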

# Filter a tf.data.Dataset

We can also filter data with the **filter()** method:

In [38]:
# create a dataset

dataset = tf.data.Dataset.range(7)

# display the dataset

for row in dataset:
  print (row)

# apply a filter

data_filter = dataset.filter(lambda x: x < 6 and x > 3)

print ()
for item in data_filter:
  print (item)

tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(5, shape=(), dtype=int64)
tf.Tensor(6, shape=(), dtype=int64)

tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(5, shape=(), dtype=int64)


We see that **data_filter** only contains tensors with scalar values greater than 3 and less than 6.
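
The same predicate can be written with an explicit TensorFlow operation instead of the Python `and` keyword. The sketch below uses `tf.logical_and`, which states the tensor logic directly. The list `kept` is introduced only to display the result.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(7)

# keep values that are greater than 3 and less than 6
data_filter = dataset.filter(lambda x: tf.logical_and(x > 3, x < 6))

kept = [int(item.numpy()) for item in data_filter]
print(kept)
```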

# Shuffling Data

Deep learning algorithms tend to work best when instances in the training set are independent and identically distributed. A simple way to move closer to this is to shuffle instances with the **shuffle()** method.

Let's look at an example:

In [39]:
# create a dataset

dataset = tf.data.Dataset.range(10).repeat(3)
print ('dataset has', len(list(dataset)), 'elements')

dataset has 30 elements


Shuffle the dataset:

In [40]:
# shuffle the data, then group it into batches of 7

ds = dataset.shuffle(buffer_size=5).batch(7)

for item in ds:
  print (item)

tf.Tensor([1 2 4 3 8 0 9], shape=(7,), dtype=int64)
tf.Tensor([5 2 6 7 5 4 6], shape=(7,), dtype=int64)
tf.Tensor([3 9 8 1 1 0 4], shape=(7,), dtype=int64)
tf.Tensor([5 0 7 8 6 7 3], shape=(7,), dtype=int64)
tf.Tensor([9 2], shape=(2,), dtype=int64)


We get tensors of seven items because we set batch size to 7. Notice that the last tensor has only two elements. We have four tensors of size seven, which equals 28 elements. Since the dataset has 30 elements, we have two left over.

We set buffer size to 5. So, TensorFlow fills a buffer with the first five elements and randomly selects one of them. It then replaces the selected element with the next element from the dataset and repeats. Because **shuffle()** comes before **batch()** in the chain, the buffer holds individual elements rather than batches. The shuffled elements are then grouped into batches of seven.

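A larger buffer shuffles more thoroughly but uses more memory. A buffer size greater than or equal to the number of elements in the dataset gives a full uniform shuffle. So, it pays to experiment with different buffer sizes.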
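
To build intuition for the buffer, here is a plain-Python sketch of a fixed-size shuffle buffer. The function `buffered_shuffle` is hypothetical and is not TensorFlow's implementation. It only mimics the idea of picking a random element from the buffer and refilling the slot with the next input element.

```python
import itertools
import random

def buffered_shuffle(items, buffer_size, seed=None):
  """Yield items in an order produced by a fixed-size shuffle buffer."""
  rng = random.Random(seed)
  iterator = iter(items)
  # fill the buffer with the first buffer_size items
  buffer = list(itertools.islice(iterator, buffer_size))
  # for each remaining item, emit a random buffered item and
  # put the new item in its place
  for item in iterator:
    idx = rng.randrange(len(buffer))
    yield buffer[idx]
    buffer[idx] = item
  # input is exhausted, so drain the buffer in random order
  while buffer:
    yield buffer.pop(rng.randrange(len(buffer)))

shuffled = list(buffered_shuffle(range(10), buffer_size=5, seed=0))
in_order = list(buffered_shuffle(range(10), buffer_size=1, seed=0))

print(shuffled)
print(in_order)
```

With a buffer of size 1 the order never changes, and with a small buffer an element can only move a limited distance toward the front, which is why small buffers shuffle less thoroughly.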

Once shuffle is applied to a dataset, each dataset iteration creates a new shuffle:

In [41]:
# rerun to get a different shuffle

for item in ds:
  print (item)

tf.Tensor([4 5 1 2 3 8 0], shape=(7,), dtype=int64)
tf.Tensor([1 6 3 7 0 5 4], shape=(7,), dtype=int64)
tf.Tensor([7 9 6 2 2 8 3], shape=(7,), dtype=int64)
tf.Tensor([9 6 1 8 4 0 9], shape=(7,), dtype=int64)
tf.Tensor([5 7], shape=(2,), dtype=int64)
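
If the same order is wanted on every pass (for example, while debugging), the **shuffle()** method accepts `seed` and `reshuffle_each_iteration` arguments. The sketch below assumes a seed of 42, chosen arbitrarily, and compares two passes over the dataset.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10).repeat(3)

# reshuffle_each_iteration=False keeps the shuffled order fixed
# across iterations of the same dataset object
ds_fixed = dataset.shuffle(
    buffer_size=5, seed=42, reshuffle_each_iteration=False).batch(7)

first_pass = [row.numpy().tolist() for row in ds_fixed]
second_pass = [row.numpy().tolist() for row in ds_fixed]

print(first_pass == second_pass)
```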


# TensorFlow Math

TensorFlow provides several operations for math computations with the tf.math module. Peruse https://www.tensorflow.org/api_docs/python/tf/math for all possible math operations.

## Vector Tensors

Let's create some data:

In [42]:
# create data

v1 = np.array([0, 1, 4, 8, 16])
v2 = np.array([0, 3, 9, 27, 81])

Convert numpy arrays to tensor constants and add:

In [43]:
conv1 = tf.constant(v1)
conv2 = tf.constant(v2)

result = tf.add(conv1, conv2)
result

<tf.Tensor: shape=(5,), dtype=int64, numpy=array([ 0,  4, 13, 35, 97])>

Convert numpy arrays to tensor variables and add:

In [44]:
varv1 = tf.Variable(v1)
varv2 = tf.Variable(v2)

result = tf.add(varv1, varv2)
result

<tf.Tensor: shape=(5,), dtype=int64, numpy=array([ 0,  4, 13, 35, 97])>

Subtract tensor variables:

In [45]:
result = tf.subtract(varv2, varv1)
result

<tf.Tensor: shape=(5,), dtype=int64, numpy=array([ 0,  2,  5, 19, 65])>

Mix constants and variables:

In [46]:
result = tf.add(conv1, varv2)
result

<tf.Tensor: shape=(5,), dtype=int64, numpy=array([ 0,  4, 13, 35, 97])>

Test element-wise equality:

In [48]:
result = tf.equal(varv1, varv2)
result

<tf.Tensor: shape=(5,), dtype=bool, numpy=array([ True, False, False, False, False])>

Multiply:

In [49]:
result = tf.multiply(conv1, conv2)
result

<tf.Tensor: shape=(5,), dtype=int64, numpy=array([   0,    3,   36,  216, 1296])>

Divide:

In [50]:
result = tf.divide(conv2, 3)
result

<tf.Tensor: shape=(5,), dtype=float64, numpy=array([ 0.,  1.,  3.,  9., 27.])>
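
Tensors also support the standard Python arithmetic operators, which can be handy shorthand for the functions above. The sketch below recreates the two constants with an explicit `int64` dtype (an assumption made so the example behaves the same on every platform) and contrasts true division with floor division.

```python
import numpy as np
import tensorflow as tf

conv1 = tf.constant(np.array([0, 1, 4, 8, 16], dtype=np.int64))
conv2 = tf.constant(np.array([0, 3, 9, 27, 81], dtype=np.int64))

total = conv1 + conv2    # same result as tf.add
diff = conv2 - conv1     # same result as tf.subtract
prod = conv1 * conv2     # same result as tf.multiply
true_div = conv2 / 3     # true division converts int64 to float64
floor_div = conv2 // 3   # floor division keeps the integer dtype

print(total.numpy(), diff.numpy(), prod.numpy())
print(true_div.dtype, floor_div.dtype)
```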

## Matrix Tensors

In [51]:
# create data

m1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
m2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

m1.shape, m2.shape

((3, 3), (3, 3))

Convert numpy matrices to tensors and add:

In [52]:
conm1 = tf.constant(m1)
conm2 = tf.constant(m2)

result = tf.add(conm1, conm2)
result

<tf.Tensor: shape=(3, 3), dtype=int64, numpy=
array([[2, 0, 0],
       [0, 2, 0],
       [0, 0, 2]])>

Test element-wise equality:

In [53]:
result = tf.equal(conm1, conm2)
result

<tf.Tensor: shape=(3, 3), dtype=bool, numpy=
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])>
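
When a single yes-or-no answer is more useful than a matrix of booleans, the element-wise result can be collapsed with `tf.reduce_all`. This is a minimal sketch. The matrix `conm3` is a hypothetical third matrix added only to show a failing comparison.

```python
import numpy as np
import tensorflow as tf

conm1 = tf.constant(np.eye(3, dtype=np.int64))
conm2 = tf.constant(np.eye(3, dtype=np.int64))
conm3 = conm2 * 2  # differs from conm1 on the diagonal

# reduce_all() is True only if every element-wise comparison is True
same_12 = bool(tf.reduce_all(tf.equal(conm1, conm2)))
same_13 = bool(tf.reduce_all(tf.equal(conm1, conm3)))

print(same_12, same_13)
```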

## tf.data.Dataset Tensors

Create a dataset:

In [54]:
# create a dataset

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

Convert numpy matrix to a tf.data.Dataset:

In [55]:
dataset = tf.data.Dataset.from_tensor_slices(m)
dataset

<TensorSliceDataset shapes: (4,), types: tf.int64>

Display tensors:

In [56]:
for t in dataset:
  print (t)

tf.Tensor([1 2 3 4], shape=(4,), dtype=int64)
tf.Tensor([5 6 7 8], shape=(4,), dtype=int64)
tf.Tensor([ 9 10 11 12], shape=(4,), dtype=int64)


Transform tensor values:

In [57]:
squared_data = dataset.map(lambda x: x ** 2)

# display tensors

for item in squared_data:
  print (item)

tf.Tensor([ 1  4  9 16], shape=(4,), dtype=int64)
tf.Tensor([25 36 49 64], shape=(4,), dtype=int64)
tf.Tensor([ 81 100 121 144], shape=(4,), dtype=int64)
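
The function passed to **map()** can also reduce each row rather than transform it element-wise. As a sketch, the example below sums each row of the same matrix with `tf.reduce_sum`. The names `row_sums` and `sums` are illustrative.

```python
import numpy as np
import tensorflow as tf

m = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
dataset = tf.data.Dataset.from_tensor_slices(m)

# each dataset element is one row of m, so the result is one sum per row
row_sums = dataset.map(lambda row: tf.reduce_sum(row))

sums = [int(s.numpy()) for s in row_sums]
print(sums)
```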


# Save a Notebook

Although **Autosave** is implemented in Google Colab, there is a delay between the moment you execute a cell and when the save occurs. So, we recommend periodically saving.

1. Click **File** in the top left menu
2. Click **Save** in the drop-down menu 

# Download a Notebook to a Local Drive

Google Drive is an excellent place to store Colab notebooks. But, we also like to save notebooks to a local drive.

To download a notebook to a local drive:
1. Be sure to save the notebook
2. Click **File** in the top left menu
3. Click **Download .ipynb** in the drop-down menu

# Load a Notebook from a Local Drive

We always save the most current notebook to our local drive because we have plenty of extra storage. We use Google Drive as backup because we only have 15 GB of free space. If you work for a company, they may provide extra storage. In that case, you may want to use Google Drive for primary storage.

To load a notebook from a local drive:
1. Open **Google Colab**
2. On the pop-up menu, click **Upload**
3. Click **Choose File**
4. Locate the notebook on your local drive and open it