# Introduction to deep learning

**Key Terms**

- **Deep-learning model:**
    Also called an artificial neural network (ANN), a machine-learning model that is composed of more than one layer
- **Shallow model:**
    A machine-learning model with just a single layer

In previous lessons, we learned about various machine-learning algorithms. Machine learning resides at the heart of data science, and advances in machine learning foster the discipline's development. Some methods, like linear and logistic regression, have been around for many decades. Others, like random forests, support vector machines, and various boosting approaches, have been known since the 1990s and early 2000s.

However, the last decade has witnessed a revolution in machine learning, commonly called the *deep-learning revolution*. Deep learning, a subfield of machine learning, has achieved enormous success in areas such as computer vision (including object detection and image classification), machine translation, speech recognition, and more. By stacking many layers into very large models with millions (even billions) of parameters, deep-learning models have even surpassed human-level performance on some tasks. That being said, the ideas and methodologies that paved the way for this revolution have their roots in the 1980s and the 1990s.

What has made deep-learning models so remarkably successful? Two key reasons stand out:

1. **The sheer amount of available data.** To perform well, deep-learning models need a lot of training data. The worldwide spread of the internet and the exponential growth in data generation are among the main reasons behind their success. Researchers discovered that when deep-learning models are trained on sufficient amounts of data, they can achieve astounding results on some tasks. However, without enough computational resources, training these models would take a very long time.

2. **Increased computational capacity.** The greater computing power of today's hardware has helped researchers overcome the long training times of deep-learning models. This made large-scale experiments practical, and many such studies confirmed that deep-learning models can achieve great performance when they are trained on enough data.

We'll learn the fundamentals of deep learning and focus on the following topics:

* What artificial neural networks are and their basic structures
* What the TensorFlow library is, along with its high-level API, Keras. These libraries are widely used for building and training deep-learning models.
* What activation functions and loss functions are
* How to train deep artificial neural networks

To get started, you'll need to understand the difference between *deep models* and *shallow models*.

### What does *deep* mean?

Artificial neural networks are the essence of deep-learning models, and the terms *artificial neural networks*, *deep-learning models*, and *deep neural networks* are often used interchangeably. But to understand deep learning, it's important to know where the term *deep* comes from. The term also implies that there are *shallow* models.

We can think of a deep-learning model as a machine-learning model that is composed of more than one layer. A layer, in turn, is a mathematical block that takes some features as inputs and outputs another set of features by mathematically transforming those inputs. Perhaps the easiest layer to understand is linear regression, which takes inputs and outputs a single value. A more complex layer can be constructed by placing $N$ linear regressions side by side, so that when they are given the same $M$ input features, they output $N$ new features.
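A minimal sketch of such a layer, in plain Python with no deep-learning library assumed: each row of the hypothetical weight matrix `W` holds the coefficients of one linear regression and `b` holds their intercepts, so $M$ inputs become $N$ outputs. The function name `linear_layer` and all numbers are illustrative.

```python
from typing import List

def linear_layer(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Apply N linear regressions to the same M input features.

    W has N rows (one per regression) and M columns (one per input feature);
    b holds the N intercepts. Returns N new features.
    """
    if len(W) != len(b):
        raise ValueError("W and b must have the same number of rows (N)")
    outputs = []
    for weights, intercept in zip(W, b):
        if len(weights) != len(x):
            raise ValueError("each row of W must have M = len(x) weights")
        # One linear regression: weighted sum of the inputs plus an intercept.
        outputs.append(sum(w * xi for w, xi in zip(weights, x)) + intercept)
    return outputs

# M = 3 input features transformed into N = 2 new features.
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 2.0],
     [1.0, 1.0, 1.0]]
b = [0.0, -6.0]
print(linear_layer(x, W, b))  # [4.5, 0.0]
```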

As mentioned above, a deep-learning model consists of more than one layer. However, the layers must be organized so that the output of each layer is the input of the next. In this respect, a deep-learning model is different from simply ensembling several models together. This serial, layer-by-layer processing of the input is the defining feature of deep-learning models. In contrast, a shallow model (like linear regression, an SVM, or a random forest) has only a single layer. The figure below depicts what a deep-learning model looks like:

![Eng-multiple_layers.png](attachment:Eng-multiple_layers.png)

**Image Source:** Michael Nielsen, *Neural Networks and Deep Learning*
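The serial, layer-by-layer processing described above can be sketched by feeding each layer's output into the next. This is a minimal illustration with hand-picked, hypothetical weights. Between layers it applies a ReLU nonlinearity, an assumption borrowed from the later discussion of activation functions, because a stack of purely linear layers would collapse into a single linear layer.

```python
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights W, intercepts b)

def dense(x: Vector, W: List[List[float]], b: Vector) -> Vector:
    # N linear regressions applied to the same inputs (one per row of W).
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v: Vector) -> Vector:
    # Elementwise max(0, z); keeps the stack from collapsing into one linear map.
    return [max(0.0, z) for z in v]

def forward(x: Vector, layers: List[Layer]) -> Vector:
    # Serial processing: the output of each layer becomes the input of the next.
    h = x
    for i, (W, b) in enumerate(layers):
        h = dense(h, W, b)
        if i < len(layers) - 1:  # no activation after the final layer
            h = relu(h)
    return h

# Hypothetical 2 -> 3 -> 1 network with hand-picked weights.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0]),  # layer 1: 2 -> 3
    ([[1.0, 1.0, 1.0]], [0.5]),                                # layer 2: 3 -> 1
]
print(forward([2.0, 3.0], layers))  # [5.5]
```

Here the first layer turns the 2 inputs into 3 intermediate features, and only those features (not the raw inputs) reach the second layer.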

### What makes deep-learning models so successful?

Let's briefly consider what makes deep-learning models special for some tasks. The key insight is that their layered structure enables them to discover complex patterns, layer by layer. That is, the first layer discovers primitive patterns. The second layer discovers more complex patterns that combine those primitive ones. Similarly, each subsequent layer discovers useful combinations of the previous layer's discoveries, and so on. In an image model, for instance, early layers might respond to edges, later layers to shapes built from edges, and still later layers to parts of objects.

This insight leads to a somewhat different approach to feature engineering. Remember that feature engineering is one of the most crucial phases of a data science pipeline; for every model we built throughout this program, we engineered features carefully. However, some data types, like images, audio, or text, are so complex that handcrafting all the useful features is almost impossible. One of the crucial discoveries of the last decade is that deep-learning models are very successful at discovering useful features in complex data such as images, video, and text. Hence, much of the power of these models lies in their ability to learn features on their own, which, to some extent, frees researchers from the difficult task of devising useful features by hand. In machine learning, learning features automatically in this way is known as *representation learning*.

### When should we use deep-learning models?

To benefit most from deep learning, we need a good idea of when to use deep-learning models and when to use classical machine-learning models. When deciding whether to use a deep-learning model, keep the following points in mind:

1. Deep-learning models are data hungry: they perform best when trained on large amounts of data. The following figure gives an idea of how classical machine-learning models and deep-learning models scale with the amount of data.

![DS-why_dl.jpg](attachment:DS-why_dl.jpg)

2. The strength of deep-learning models lies in their ability to discover complex patterns in the data. If our data lends itself well to manual feature engineering, then classical machine-learning models are usually a good choice. However, if we work with very complex types of data, like images, video, speech, or text, then we should give deep-learning models a try.

3. Large deep-learning models with many parameters to estimate require a lot of computational power and time. That's why they are usually trained on graphics processing units (GPUs) instead of CPUs. If you don't have access to GPUs, classical machine-learning models may be the more practical choice.

---
# Architecture of artificial neural networks

Now let's look at the fundamental architecture of artificial neural networks (ANNs). This architecture is also known as a *multilayer perceptron* or *feedforward network*, and the ideas presented here constitute the backbone of most deep-learning models.

**Key Terms**

- **Multilayer perceptron:**
    Also called a feedforward network, an ANN architecture in which information moves in only one direction, with no loops or cycles between nodes
- **Neuron:**
    Also called a perceptron, a node in an ANN that takes some inputs and returns an output
- **Layer:**
    A collection of neurons that simultaneously take some features as inputs to process and then simultaneously output another set of features
- **Dense layer:**
    Also called a fully connected layer, a layer composed of multiple neurons, each of which receives inputs from all the neurons in the previous layer

The multilayer perceptron is one of the easiest types of neural network to understand conceptually. In many real-world tasks, however, other neural network architectures are preferred over it.
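To make the neuron and dense-layer definitions concrete, here is a minimal sketch of a single neuron: a weighted sum of its inputs plus a bias, passed through a step activation as in the classic perceptron. The weights are hand-picked, not learned, so that one hypothetical neuron computes logical AND and another computes logical OR; `dense_layer` (an illustrative name) feeds the same inputs to every neuron, as a dense layer does.

```python
from typing import List, Tuple

Neuron = Tuple[List[float], float]  # (weights, bias)

def step(z: float) -> int:
    # Step activation: fire (1) when the weighted input is positive.
    return 1 if z > 0 else 0

def neuron(x: List[float], weights: List[float], bias: float) -> int:
    # Weighted sum of the inputs plus a bias, passed through the activation.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(z)

def dense_layer(x: List[float], neurons: List[Neuron]) -> List[int]:
    # Every neuron in a dense layer receives *all* of the layer's inputs.
    return [neuron(x, w, b) for w, b in neurons]

# Hand-picked (not learned) parameters.
AND: Neuron = ([1.0, 1.0], -1.5)  # fires only when both inputs are 1
OR: Neuron = ([1.0, 1.0], -0.5)   # fires when at least one input is 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, dense_layer([x1, x2], [AND, OR]))
```

Training, discussed later, replaces these hand-picked weights with values estimated from data.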

---

# TensorFlow and Keras

**Key Terms**

- **Epoch:**
    A complete pass of the model through the training dataset

The TensorFlow framework was created and is backed by Google, and it has been extended by a large open-source community. When coding deep-learning models, we'll use *Keras*, the high-level API of TensorFlow. Keras hides TensorFlow's low-level data structures behind intuitive, easily integrated, and extensible building blocks. We'll implement an artificial neural network (ANN) using Keras.

**Note:** *Keras used to be a separate framework that could run on top of TensorFlow as well as other deep-learning frameworks, like Theano and CNTK. Since TensorFlow 2.0, however, Keras has been integrated into TensorFlow as its official high-level API.*
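The epoch concept is easiest to see in a bare training loop. The sketch below is plain Python, not Keras: it shuffles a hypothetical dataset's indices each epoch and walks through them in mini-batches, so one epoch is exactly one full pass over the training data. In Keras, the same ideas correspond to the `epochs` and `batch_size` arguments of `Model.fit`; when the dataset size is not a multiple of the batch size, the last batch is smaller, so each epoch takes `ceil(n_samples / batch_size)` steps. The function names here are illustrative.

```python
import math
import random
from typing import List

def iterate_minibatches(n_samples: int, batch_size: int,
                        rng: random.Random) -> List[List[int]]:
    # Shuffle the sample indices, then cut them into consecutive mini-batches.
    # The last batch is smaller when n_samples is not a multiple of batch_size.
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]

def train(n_samples: int, batch_size: int, epochs: int, seed: int = 0) -> List[int]:
    # Returns the number of mini-batch steps taken in each epoch.
    rng = random.Random(seed)
    steps_per_epoch = []
    for _ in range(epochs):
        batches = iterate_minibatches(n_samples, batch_size, rng)
        visited = []
        for batch in batches:
            # A real training step would update the model's weights here.
            visited.extend(batch)
        # One epoch: every training example was used exactly once.
        assert sorted(visited) == list(range(n_samples))
        steps_per_epoch.append(len(batches))
    return steps_per_epoch

print(train(n_samples=1000, batch_size=32, epochs=3))  # [32, 32, 32]
print(math.ceil(1000 / 32))  # 32
```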

---
# Activation and loss functions

**Key Terms**

- **Activation function:**
    A function that defines a neuron's output, given an input or set of inputs
- **Cost function:**
    Also called the loss function, a function that measures the performance of a machine-learning model for given data by quantifying the error between predicted values and expected values
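For illustration, here are three widely used activation functions written out in plain Python. Each is applied to a neuron's weighted input sum $z$. The choice of sigmoid, tanh, and ReLU is ours, since the definition above does not single out specific functions. The sigmoid is written in a numerically stable form so that very negative inputs don't overflow `math.exp`.

```python
import math

def sigmoid(z: float) -> float:
    # Squashes any real number into (0, 1); common for probability outputs.
    # Two branches avoid computing exp of a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def tanh(z: float) -> float:
    # Squashes into (-1, 1) and is centered at zero.
    return math.tanh(z)

def relu(z: float) -> float:
    # Passes positive values through unchanged and zeroes out negatives.
    return max(0.0, z)

# The activation is applied to a neuron's weighted input sum z = w . x + b.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```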

There are many loss functions, and choosing an appropriate one is essential when implementing a deep-learning model. Because classification and regression tasks address different types of problems, they require different types of loss functions.
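As a sketch of that difference, the block below writes out two standard loss functions: mean squared error, typical for regression, and binary cross-entropy, typical for binary classification with predicted probabilities. These are plain-Python illustrations; in Keras they correspond to the built-in losses `'mse'` and `'binary_crossentropy'`, which you would normally use instead.

```python
import math
from typing import List

def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    # Regression: average squared difference between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true: List[int], p_pred: List[float],
                         eps: float = 1e-12) -> float:
    # Binary classification: y_true holds 0/1 labels, p_pred holds the
    # predicted probabilities of class 1. Clipping with eps avoids log(0).
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # 2.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

Cross-entropy penalizes confident wrong probabilities heavily, which is why it suits classification, while squared error measures distance between real-valued targets and predictions.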

---
# Gradient descent and backpropagation algorithms

The gradient descent algorithm has several variants, such as batch, stochastic, and mini-batch gradient descent. It works together with the backpropagation algorithm, which efficiently computes the gradients that gradient descent needs in deep neural network architectures.
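As a minimal sketch of the basic idea, the code below fits a line $y = wx + b$ by repeatedly stepping $w$ and $b$ against the gradient of the mean squared error, scaled by a learning rate. This is full-batch gradient descent on hypothetical noise-free data; stochastic and mini-batch variants differ in computing each gradient from one example or a subset of examples. The learning rate and step count are illustrative choices.

```python
from typing import List, Tuple

def gradient_descent(xs: List[float], ys: List[float], lr: float = 0.05,
                     steps: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w*x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of L = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

If the learning rate is too large, the updates overshoot and diverge; if it is too small, convergence becomes very slow.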

**Key Terms**

- **Backpropagation:**
    A technique for training neural networks that computes the gradient of the loss function with respect to every weight, starting at the output layer and propagating backward through the network via the chain rule; gradient descent then uses these gradients to adjust the weights
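The sketch below illustrates backpropagation on a deliberately tiny, hypothetical network: one sigmoid hidden neuron feeding one linear output neuron, with squared-error loss. The backward pass applies the chain rule from the loss back toward the input, and the result is checked against finite-difference estimates; the final lines use the gradients for one gradient descent step. Parameter names and values are illustrative.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, float]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(p: Params, x: float, y: float) -> Tuple[float, Tuple[float, float]]:
    # Hidden neuron with sigmoid activation, then a linear output neuron.
    h = sigmoid(p["w1"] * x + p["b1"])
    y_hat = p["w2"] * h + p["b2"]
    loss = (y_hat - y) ** 2
    return loss, (h, y_hat)

def backprop(p: Params, x: float, y: float) -> Params:
    # Forward pass, caching the intermediate values the backward pass needs.
    _, (h, y_hat) = forward(p, x, y)
    # Backward pass: chain rule from the loss back toward the input.
    dL_dyhat = 2.0 * (y_hat - y)
    grads = {"w2": dL_dyhat * h, "b2": dL_dyhat}
    dL_dh = dL_dyhat * p["w2"]
    dL_dz1 = dL_dh * h * (1.0 - h)  # sigmoid'(z) = s(z) * (1 - s(z))
    grads["w1"] = dL_dz1 * x
    grads["b1"] = dL_dz1
    return grads

def numerical_grads(p: Params, x: float, y: float, eps: float = 1e-6) -> Params:
    # Central finite differences, used only to check backprop's answer.
    grads = {}
    for k in p:
        plus, minus = dict(p), dict(p)
        plus[k] += eps
        minus[k] -= eps
        grads[k] = (forward(plus, x, y)[0] - forward(minus, x, y)[0]) / (2 * eps)
    return grads

params = {"w1": 0.5, "b1": -0.2, "w2": 1.5, "b2": 0.1}
analytic = backprop(params, x=2.0, y=1.0)
numeric = numerical_grads(params, x=2.0, y=1.0)
for k in params:
    print(k, round(analytic[k], 6), round(numeric[k], 6))

# One gradient descent step using the backpropagated gradients.
lr = 0.1
params = {k: v - lr * analytic[k] for k, v in params.items()}
```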

One of the most important questions in deep learning is: how do we train deep neural networks?

As we've already learned, training a neural network means estimating its parameters. Even a modest-size neural network can have hundreds of thousands or even millions of parameters, so finding good values for all of them is a challenging task. The most common method for training neural networks, and in practice nearly the only one, is the gradient descent algorithm and its variants.
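To see how quickly parameter counts grow, the following sketch counts the parameters of a stack of dense layers: a layer mapping $M$ inputs to $N$ neurons has $M \times N$ weights plus $N$ biases. The layer sizes are hypothetical; for Keras `Dense` layers, `model.summary()` reports per-layer counts computed the same way.

```python
from typing import List

def count_dense_params(layer_sizes: List[int]) -> int:
    # A dense layer mapping M inputs to N neurons has an M x N weight matrix
    # plus N biases (intercepts), i.e. M * N + N parameters.
    return sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical network: 784 inputs (e.g., a flattened 28x28 image),
# hidden layers of 128 and 64 neurons, and 10 outputs.
print(count_dense_params([784, 128, 64, 10]))  # 109386

# Widening the hidden layers quickly pushes the count into the millions.
print(count_dense_params([784, 1024, 1024, 10]))  # 1863690
```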