# Deep Learning 

Deep learning (DL) is a branch of machine learning (ML) that uses artificial neural networks (ANNs), algorithms inspired by the neural networks of the brain. DL has grown in popularity because it can work directly on raw data, whereas classical ML usually needs some preprocessing steps before a viable model can be built. But what is machine learning, then?

### Machine Learning

Machine learning means programming computers to perform a task based on experience (examples) rather than explicit instructions. During training, we try to minimize the model's error on those examples so that the resulting model performs well.

More formally, we want to learn a function (model) f with parameters $\theta$ that produces the right output y for a given input x:

f<sub>$\theta$</sub>(x) = y

The parameters are found by minimizing an error $\epsilon$ between the model's output and the right output:

argmin<sub>$\theta$</sub> $\epsilon$(f<sub>$\theta$</sub>(x), y)
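As a minimal sketch of this idea, assume a one-parameter model f<sub>$\theta$</sub>(x) = $\theta$x, mean squared error as $\epsilon$, and plain gradient descent as the minimizer. None of these choices is prescribed by the definition above, and the data and the learning rate are made up for illustration:

```python
# Toy examples; the relationship y = 3x is an assumption of this sketch.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def f(theta, x):
    """Model f_theta(x): a line through the origin with slope theta."""
    return theta * x


def error(theta):
    """Mean squared error between the model's outputs and the right outputs."""
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def error_gradient(theta):
    """Derivative of the mean squared error with respect to theta."""
    return sum(2 * (f(theta, x) - y) * x for x, y in zip(xs, ys)) / len(xs)


theta = 0.0           # initial guess for the parameter
learning_rate = 0.01  # step size (hypothetical value)
for _ in range(500):
    # Move theta a small step in the direction that reduces the error.
    theta -= learning_rate * error_gradient(theta)

print(theta, error(theta))
```

After the loop, `theta` ends up very close to 3, the slope that produced the examples, and the error is close to zero.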

#### How is it different from traditional programming?

In traditional programming, systems are coded to react according to specific instructions. This is not the case in machine learning. Machine learning starts from the premise that writing specific instructions for every distinct situation is not feasible. Instead, the system learns from the provided data (examples) and reacts based on that knowledge, which takes the form of a learned model function.
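The contrast can be sketched with a deliberately tiny, hypothetical task: deciding whether a person counts as "tall". In the first function the programmer writes the rule; in the second, the rule (a threshold) is derived from labeled examples. All names and numbers here are made up for illustration:

```python
# Traditional programming: the rule is written by hand.
def is_tall_rule(height_cm):
    return height_cm > 180  # threshold chosen by the programmer


# Machine learning: the rule is derived from labeled examples.
def learn_threshold(values, labels):
    """Return the midpoint between neighbouring values that misclassifies the fewest examples."""
    ordered = sorted(values)
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def mistakes(threshold):
        return sum((v > threshold) != label for v, label in zip(values, labels))

    return min(candidates, key=mistakes)


heights = [150, 160, 165, 172, 178, 185, 190]
labels = [False, False, False, False, True, True, True]

threshold = learn_threshold(heights, labels)


def is_tall_learned(height_cm):
    return height_cm > threshold
```

With these examples the learned threshold is 175, so the two functions disagree on a height of 178: the hand-written rule reflects the programmer's opinion, the learned one reflects the data.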

#### How is it different from statistics?

Both aim to make predictions about natural phenomena, but there are some differences. Statistics helps humans understand the world and assumes that the process generating the data can be understood by humans. Machine learning, on the other hand, makes no particular effort to explain how the world works; it simply assumes that the data-generating process is unknown.

#### Types of machine learning

* Supervised Learning: learning a model f<sub>$\theta$</sub>(x) from labeled data (X, y), so that given a new input X it predicts the right output y.
Example: Given the attributes (X) and prices (y) of different houses and apartments, predict the price of an unseen house or apartment.

* Unsupervised Learning: learning a model f<sub>$\theta$</sub>(x) from unlabeled data (X) to explore the structure of the data and extract meaningful information.
Example: Given inputs X, find which ones are special, similar, or anomalous.

* Semi-Supervised Learning: learning a model f<sub>$\theta$</sub>(x) from a few labeled and many unlabeled examples.

* Reinforcement Learning: creating an agent that improves its performance over time as a result of interactions with the environment.
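The first two types can be contrasted in a few lines. The sketch below uses made-up numbers, fits the supervised example with closed-form simple linear regression, and flags anomalies in the unsupervised example with a two-standard-deviation rule. Both techniques are assumptions chosen for brevity, not the only options:

```python
from statistics import mean, pstdev

# Supervised: labeled pairs (size in square meters, price); values are made up.
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

# Closed-form simple linear regression: price = slope * size + intercept.
size_mean, price_mean = mean(sizes), mean(prices)
slope = sum((s - size_mean) * (p - price_mean) for s, p in zip(sizes, prices)) / sum(
    (s - size_mean) ** 2 for s in sizes
)
intercept = price_mean - slope * size_mean

# Predict the price of an unseen 100 square meter apartment.
predicted_price = slope * 100.0 + intercept

# Unsupervised: no labels, only inputs; look for values far from the rest.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0]
center, spread = mean(readings), pstdev(readings)
anomalies = [r for r in readings if abs(r - center) > 2 * spread]

print(predicted_price, anomalies)
```

The supervised part needs the prices to learn from; the unsupervised part only looks at the inputs themselves and singles out 25.0 as the anomalous reading.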

## Artificial Neural Networks

An artificial neural network (ANN) is a computing system consisting of a collection of connected components known as *neurons*, which are grouped into structures called *layers*. Each neuron-to-neuron connection sends a signal from one neuron to the next. The receiving neuron processes the signal and then sends messages to the downstream neurons in the network. Note that neurons are also known as *nodes*.

A very basic ANN architecture is nothing but logistic regression, one of the fundamental ML algorithms. Recall how logistic regression makes a classification:

* it builds a linear model by taking the dot product of the input vector **x** and the weight vector **w** and adding a bias term *w<sub>0</sub>*.
* the linear model *w<sub>0</sub>* + **wx** is then pushed through a sigmoid function to get a prediction between 0 and 1.
* as the last step, it uses log loss (a.k.a. cross-entropy) as the loss function and gradient descent as the optimizer to learn the weights.

Logistic Regression in mathematical notation:

y(x) = sigmoid(*w<sub>0</sub>* + **wx**) = sigmoid(*w<sub>0</sub>* + *w<sub>1</sub>x<sub>1</sub>* + *w<sub>2</sub>x<sub>2</sub>* + ... + *w<sub>p</sub>x<sub>p</sub>*), where p is the number of features.
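The three steps (linear model, sigmoid, log loss minimized by gradient descent) can be sketched in plain Python. This is an illustrative implementation, not a reference one: the toy data (the logical OR of two inputs), the learning rate, and the number of epochs are all assumptions:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w0, w, x):
    """y(x) = sigmoid(w0 + w . x)"""
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))


def log_loss(w0, w, X, y):
    """Average cross-entropy between predictions and 0/1 targets."""
    total = 0.0
    for x, target in zip(X, y):
        p = min(max(predict(w0, w, x), 1e-12), 1 - 1e-12)  # avoid log(0)
        total += target * math.log(p) + (1 - target) * math.log(1 - p)
    return -total / len(X)


def train(X, y, learning_rate=0.5, epochs=2000):
    """Learn the bias w0 and weights w with batch gradient descent on the log loss."""
    w0, w = 0.0, [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        errors = [predict(w0, w, x) - target for x, target in zip(X, y)]
        # Gradient of the log loss: mean of (prediction - target) * input.
        w0 -= learning_rate * sum(errors) / n
        w = [
            wj - learning_rate * sum(e * x[j] for e, x in zip(errors, X)) / n
            for j, wj in enumerate(w)
        ]
    return w0, w


# Toy data: the logical OR of two binary features.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]

w0, w = train(X, y)
predictions = [round(predict(w0, w, x)) for x in X]
print(predictions, log_loss(w0, w, X, y))
```

Rounding the sigmoid output at 0.5 turns the predicted probability into a class label; on this linearly separable toy data the learned model reproduces the targets.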

Logistic Regression in schematic representation:

<br>
<br>

(schema)

<br>
<br>


Looking closely at the schematic representation, one can see that logistic regression is nothing more than a 1-layer neural network. It is actually the building block of ANNs! Large ANN architectures can be created simply by combining logistic regression blocks. In other words, **artificial neural networks are nothing but a combination of logistic regression blocks.**
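To make the "combination of blocks" idea concrete, here is a small sketch in which three logistic regression units are wired together: two read the input, and a third reads their outputs. The weights are hand-picked for illustration (they are not learned), and the target function, exclusive OR, is an example chosen here because a single logistic regression unit cannot represent it:

```python
import math


def logistic_unit(w0, w, x):
    """One logistic regression block: sigmoid(w0 + w . x)."""
    return 1.0 / (1.0 + math.exp(-(w0 + sum(wi * xi for wi, xi in zip(w, x)))))


# Hand-picked (bias, weights) pairs; hypothetical values, not learned ones.
hidden_units = [
    (-10.0, [20.0, 20.0]),   # fires if at least one input is 1 (behaves like OR)
    (30.0, [-20.0, -20.0]),  # fires unless both inputs are 1 (behaves like NAND)
]
output_unit = (-30.0, [20.0, 20.0])  # fires only if both hidden units fire (AND)


def network(x):
    # Every hidden block reads the same input ...
    hidden = [logistic_unit(w0, w, x) for w0, w in hidden_units]
    # ... and the output block reads the hidden blocks' outputs.
    return logistic_unit(output_unit[0], output_unit[1], hidden)


inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
outputs = [round(network(x)) for x in inputs]
print(outputs)
```

Each unit on its own draws a single linear boundary; stacking them lets the network output 1 only when exactly one input is 1.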

To turn this into an ANN, it is enough to add one or more layers between the input and the output; these are called *hidden layers*. The weights between the nodes of adjacent layers form a weight matrix W<sup>ℓ</sup> per layer ℓ.

Each layer is computed in a 2-step process:

1. the product z of the weight matrix W<sup>ℓ</sup> and the layer's input (**x** for the first layer), plus the bias term, is calculated.
2. z is passed through a non-linear activation function, such as the sigmoid function; the output **a** is called the activation.

a<sup>0</sup> = **x**

a<sup>1</sup> = f(z<sup>1</sup>) = f(W<sup>1</sup>**x** + w<sub>0</sub><sup>1</sup>)

a<sup>2</sup> = f(z<sup>2</sup>) = f(W<sup>2</sup>a<sup>1</sup> + w<sub>0</sub><sup>2</sup>)

where f is the activation function and a<sup>0</sup> is just our input data. Depending on how many layers the network has, we end up with a different number of weight matrices and activations; the number of nodes in each layer determines their sizes.
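The layer-by-layer computation can be sketched as a short forward pass. The example assumes sigmoid as the activation function f, represents each layer as a (weight matrix, bias vector) pair, and uses arbitrary weight values; the function names are hypothetical:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, bias, a_prev):
    """Step 1: z = W a_prev + bias. Step 2: a = f(z), with f = sigmoid."""
    z = [sum(w * a for w, a in zip(row, a_prev)) + b for row, b in zip(W, bias)]
    return [sigmoid(z_i) for z_i in z]


def forward(x, layers):
    """Return all activations [a0, a1, ..., aL] for the given input x."""
    activations = [x]  # a0 is just the input data
    for W, bias in layers:
        activations.append(layer_forward(W, bias, activations[-1]))
    return activations


# A network with 3 inputs, 2 hidden nodes and 1 output node (arbitrary weights).
layers = [
    ([[0.1, -0.2, 0.3], [0.4, 0.0, -0.5]], [0.0, 0.1]),  # W1 is 2x3, bias has 2 entries
    ([[0.7, -0.6]], [0.2]),                               # W2 is 1x2, bias has 1 entry
]

activations = forward([1.0, 2.0, 3.0], layers)
print(activations)
```

Each weight matrix has one row per node in its own layer and one column per node in the previous layer, which is why the shapes follow directly from the layer sizes.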

A typical ANN has more layers. More layers give the model more flexibility but make it harder to train, which can require a lot of data and time, depending on the available computational power.

Moreover, more layers make the network trickier to train for another reason: after data is pushed through the network, the model runs backpropagation to update its weights, and the more layers the model has, the more of the backpropagated signal gets lost on its way through the layers.
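This loss of signal, often called the vanishing gradient problem, can be illustrated with a simplified setting: a chain of layers with a single sigmoid neuron each and a weight of 1. Both simplifications are assumptions of this sketch. By the chain rule, the derivative of the output with respect to the input is a product with one factor per layer, and the sigmoid's derivative is never larger than 0.25:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # at most 0.25, reached at z = 0


def gradient_scale(depth, weight=1.0, x=0.5):
    """d(output)/d(input) for a chain of `depth` one-neuron sigmoid layers."""
    a, scale = x, 1.0
    for _ in range(depth):
        z = weight * a
        scale *= weight * sigmoid_derivative(z)  # chain rule: one factor per layer
        a = sigmoid(z)
    return scale


scales = {depth: gradient_scale(depth) for depth in (1, 2, 5, 10)}
for depth, scale in scales.items():
    print(depth, scale)
```

With these settings every extra layer multiplies the gradient by a factor below 0.25, so the signal that reaches the first layers shrinks roughly exponentially with depth.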
