# Deep Learning
Deep learning is the process of applying deep neural network technologies to solve problems. Deep neural networks are neural networks with at least one hidden layer, although in common usage "deep" implies several. Like data mining, deep learning refers to a process; that process employs deep neural network architectures, which are particular types of machine learning algorithms.

<img src="https://www.kdnuggets.com/wp-content/uploads/data-science-puzzle-600.jpg">
As shown in the image above, deep learning is to data mining as (deep) neural networks are to machine learning (process versus architecture). Also visible is the fact that deep neural networks are heavily involved in contemporary artificial intelligence.

<img src="https://www.kdnuggets.com/wp-content/uploads/neural-networks-layers.jpg">

## Let's understand some terminology

### Artificial Neural Networks (ANNs)

Artificial neural networks are the machine learning architecture, originally inspired by the biological brain (particularly the neuron), by which deep learning is carried out. Artificial neural networks alone (the non-deep variety) have been around for a very long time. However, more recently devised neural network architectures include layers of hidden neurons (beyond simply the input and output layers), and this added level of complexity is what enables deep learning and provides a more powerful set of problem-solving tools.  
ANNs vary considerably in their architectures, and therefore there is no definitive neural network definition. The two generally cited characteristics of all ANNs are the possession of **adaptive weight sets** and **the capability of approximating non-linear functions of the inputs to neurons**.

### Feedforward Neural Network

Feedforward neural networks are the simplest form of neural network architecture, in which connections are non-cyclical. Information in a feedforward network advances in a single direction from the input nodes, through any hidden layers, to the output nodes; no cycles are present.
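
The single-direction flow can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names, the 2-3-1 layer sizes, the sigmoid activation and the weight values are all assumptions chosen for the example, not something prescribed by the text above.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def feedforward(inputs, layers):
    """Pass `inputs` through each (weights, biases) pair in order.

    Data only ever moves forward: the output of one layer becomes the
    input of the next, and nothing is fed back to an earlier layer.
    """
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical 2-3-1 network with arbitrary illustrative weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output layer
]
output = feedforward([1.0, 2.0], layers)
print(output)
```

The loop in `feedforward` is the whole architecture: there is no path by which a later layer's output can reach an earlier layer.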

### Recurrent Neural Network

In contrast to feedforward neural networks, the connections of recurrent neural networks form a directed cycle. This feedback allows for internal temporal state representation, which in turn allows sequence processing and, of note, provides the necessary capabilities for recognizing speech and handwriting.
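
As a rough sketch of what "internal temporal state" means, the example below runs a single recurrent unit over a short sequence. The scalar state, the `tanh` activation, the function names and the weight values are assumptions made for illustration; real recurrent layers use vectors and learned weight matrices.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: the new state depends on the input AND the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the unit, returning the state after each step."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # h is fed back into the next step
        states.append(h)
    return states

# The two sequences differ only in their first element, yet the later
# states differ too, because the first input is remembered in the state.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])
print(states_a)
print(states_b)
```

The term `w_h * h_prev` is the cycle: it is the only thing that distinguishes this unit from a feedforward neuron.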

### Activation Function

In neural networks, the activation function transforms a neuron's combined weighted inputs into its output, and in doing so shapes the network's output decision boundaries. Activation functions range from identity (linear) to sigmoid (logistic, or soft step) to hyperbolic tangent (tanh) and beyond. In order to employ backpropagation (see below), the network must utilize activation functions which are differentiable.
<img src="https://www.kdnuggets.com/wp-content/uploads/activation-functions.jpg">
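
The three activation functions named above, together with their derivatives, can be written out as follows. This is a minimal sketch; the function names and the `neuron_output` helper are hypothetical and exist only for this example.

```python
import math

def identity(z):
    return z

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

# Derivatives, which backpropagation relies on.
def identity_grad(z):
    return 1.0

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2

def neuron_output(inputs, weights, bias, activation):
    """Combine the weighted inputs, then apply the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

for name, fn in [("identity", identity), ("sigmoid", sigmoid), ("tanh", tanh)]:
    print(name, neuron_output([1.0, 2.0], [0.5, 0.25], 0.0, fn))
```

The same weighted sum produces a different output under each activation, which is why the choice of function affects what the network can represent.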

### Backpropagation

Backpropagation computes the gradients that gradient descent then uses to reduce the network's error. You compare the predictions of the neural network with the desired output and then compute the gradient of the error with respect to the weights of the neural network, propagating it backward from the output layer through the hidden layers. The negative of this gradient gives you a direction in weight space in which the error would become smaller.
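
To make the chain-rule bookkeeping concrete, here is a sketch for the smallest possible case: one input, one hidden neuron and one output neuron. The network shape, the sigmoid activations, the squared-error measure and all numeric values are assumptions chosen for the example. The analytic gradients are compared against a finite-difference estimate as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    h = sigmoid(w1 * x)  # hidden activation
    y = sigmoid(w2 * h)  # network prediction
    return h, y

def error(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2

def backward(x, target, w1, w2):
    """Gradients of the error with respect to w1 and w2 via the chain rule."""
    h, y = forward(x, w1, w2)
    delta_out = (y - target) * y * (1.0 - y)          # error signal at the output
    grad_w2 = delta_out * h
    delta_hidden = delta_out * w2 * h * (1.0 - h)     # signal passed back to hidden
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2

# Arbitrary illustrative values.
x, target, w1, w2 = 1.5, 1.0, 0.4, -0.6
grad_w1, grad_w2 = backward(x, target, w1, w2)

# Finite-difference check of both gradients.
eps = 1e-6
numeric_w1 = (error(x, target, w1 + eps, w2) - error(x, target, w1 - eps, w2)) / (2 * eps)
numeric_w2 = (error(x, target, w1, w2 + eps) - error(x, target, w1, w2 - eps)) / (2 * eps)
print(grad_w1, numeric_w1)
print(grad_w2, numeric_w2)
```

Note how `delta_hidden` is built from `delta_out`: the output error is reused on its way back through the network, which is where the name comes from.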

### Cost Function

When training a neural network, the correctness of the network's output must be assessed. As we know the expected correct output for the training data, the network's actual output can be compared against it. The cost function measures the difference between the actual and expected outputs. A cost of zero would signify that the network has been trained as well as would be possible on the training data; this would clearly be ideal.
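
One common choice of cost function is the mean squared error, sketched below. The text above does not name a specific cost function, so treating it as MSE is an assumption made for illustration, as are the function name and the sample values.

```python
def mean_squared_error(predicted, expected):
    """Average squared difference between actual and expected outputs."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(expected)

expected = [1.0, 0.0, 1.0]
perfect = mean_squared_error([1.0, 0.0, 1.0], expected)    # outputs match exactly
imperfect = mean_squared_error([0.5, 0.5, 0.5], expected)  # every output is off by 0.5
print(perfect, imperfect)
```

The cost is zero only when every actual output equals its expected value, and it grows as the outputs drift apart.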

### Gradient Descent

Gradient descent is an optimization algorithm used for finding local minima of functions. It works by repeatedly taking a small step in the direction opposite to the gradient. While it does not guarantee a global minimum, gradient descent is especially useful for functions which are difficult to solve analytically for precise solutions, for example by setting derivatives to zero and solving.
<img src="https://www.kdnuggets.com/wp-content/uploads/gradient-descent-step.jpg">
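
The sketch below applies gradient descent to a one-dimensional function with two minima, to show that the result depends on the starting point. The function `f`, the learning rate, the step count and the starting points are all assumptions picked for the example.

```python
def f(x):
    """Illustrative function with a local and a global minimum."""
    return x ** 4 - 2 * x ** 2 + 0.5 * x

def grad_f(x):
    """Derivative of f."""
    return 4 * x ** 3 - 4 * x + 0.5

def gradient_descent(grad, start, learning_rate=0.05, steps=200):
    """Repeatedly step against the gradient from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

right = gradient_descent(grad_f, start=0.5)   # settles in the shallower minimum
left = gradient_descent(grad_f, start=-0.5)   # settles in the deeper minimum
print(right, f(right))
print(left, f(left))
```

Both runs end where the gradient is close to zero, but only the run starting at `-0.5` reaches the lower of the two minima.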

### Convolutional Neural Network

Typically associated with computer vision and image recognition, Convolutional Neural Networks (CNNs) employ the mathematical concept of convolution to mimic the neural connectivity mesh of the biological visual cortex.  
Convolution can be thought of as a window sliding over a matrix representation of an image (see below). This allows for the loose mimicking of the overlapping tiling of the biological visual field.
<img src="http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif">
Implementation of this concept in the architecture of the neural network results in collections of neurons dedicated to processing image sections, at least when employed in computer vision. When utilized in some other domain, such as natural language processing, the same approach can be used, provided that the input (words, sentences, etc.) can be arranged in matrices and processed in a similar fashion.
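
The sliding-window idea can be sketched directly with nested lists. The function name, the 5x5 binary image and the 3x3 kernel are illustrative assumptions. As is common in CNN implementations, the kernel is applied without flipping (strictly speaking a cross-correlation), and no padding or stride is used.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # The window covers image[r:r+k_rows][c:c+k_cols].
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

Each output value depends only on a 3x3 patch of the image, and neighbouring patches overlap, which is the tiling behaviour described above.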

<a href="https://www.kdnuggets.com/2016/10/deep-learning-key-terms-explained.html">source</a>