# Artificial Neural Network

Traditionally, systems were made intelligent by using sophisticated algorithms
written by programmers.
For example, say you are interested in recognizing whether a photo contains a dog or
not. In the traditional Machine Learning (ML) setting, an ML practitioner or a subject
matter expert first identifies the features that need to be extracted from images. Then
they extract those features and pass them through a well-written algorithm that
deciphers the given features to tell whether the image is of a dog or
not. The following diagram illustrates this idea:

![img](./imgs/ANN0.png)

Neural networks offer the benefit of combining feature extraction (which otherwise
requires hand-tuning) with the use of those features for classification/regression in a
single shot, with little manual feature engineering. Both subtasks require only labeled
data (for example, which pictures are dogs and which are not) and a neural network
architecture. No human has to come up with rules to classify an image, which removes
most of the burden that traditional techniques impose on the programmer.

Notice that the main requirement is that we provide a considerable number of
examples for the task that needs a solution. A high-level view of how neural networks
are leveraged for the task of classification is as follows:

![img](./imgs/ANN1.png)

## The artificial neural network building blocks

An ANN is a collection of tensors (weights) and mathematical operations, arranged in
such a way as to loosely replicate the functioning of a human brain. It can be viewed
as a mathematical function that takes in one or more tensors as inputs and predicts one
or more tensors as outputs. The arrangement of operations that connects these inputs to
outputs is referred to as the architecture of the neural network, which we can
customize based on the task at hand, that is, based on whether the problem contains
structured (tabular) or unstructured (image, text, audio) data, which in turn determines
the list of input and output tensors.

An ANN is made up of the following:

- <b> Input layer: </b> This layer takes the independent variables as input.

- <b> Hidden (intermediate) layers: </b> These layers connect the input and output
layers while performing transformations on top of the input data. The hidden
layers contain nodes (units/circles in the following diagram) that transform
their input values into higher-/lower-dimensional values. More complex
representations are obtained by using various activation functions that
modify the values of the nodes of the intermediate layers.

- <b> Output layer: </b> This contains the values the input variables are expected to
result in.

With this in mind, the typical structure of a neural network is as follows:

![img](./imgs/ANN2.png)
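The layered structure can be sketched as a plain list of layer sizes. The snippet below is only an illustration: the helper names `layer_shapes` and `count_parameters` and the sizes `[3, 4, 1]` are hypothetical, and it assumes every node is connected to every node of the previous layer and has one bias of its own.

```python
def layer_shapes(layer_sizes):
    """Return (n_inputs, n_outputs) for each pair of consecutive layers."""
    return list(zip(layer_sizes[:-1], layer_sizes[1:]))


def count_parameters(layer_sizes):
    """Count the weights plus one bias per node in every non-input layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in layer_shapes(layer_sizes))


# Hypothetical sizes: 3 input variables, one hidden layer of 4 nodes, 1 output node.
sizes = [3, 4, 1]
print(layer_shapes(sizes))      # [(3, 4), (4, 1)]
print(count_parameters(sizes))  # (3*4 + 4) + (4*1 + 1) = 21
```

Describing a network by its layer sizes makes it easy to see how adding nodes or layers increases the number of adjustable values.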

The number of <b> nodes </b> (circles in the preceding diagram) in the output layer depends
on the task at hand and whether we are trying to predict a continuous variable or a
categorical variable. If the output is a continuous variable, the output layer has one
node. If the output is categorical with <i> m </i> possible classes, there will be <i> m </i> nodes
in the output layer. Let's zoom into one of the nodes/neurons and see what's happening.
A neuron transforms its inputs as follows:

![img](./imgs/ANN3.png)

In the preceding diagram, $x_1$, $x_2$, ..., $x_n$ are the input variables, $w_1$, $w_2$, ..., $w_n$
are the weights given to each of the input variables, and $w_0$ is the bias term
(similar to the bias in linear/logistic regression). The output value $a$ is
calculated as follows:

![img](./imgs/ANN4.png)
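Written out in the notation used here, and assuming the image shows the usual weighted-sum form of a neuron, the computation is:

$$a = f\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)$$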

As you can see, it is the sum of the products of <b> weight and input </b> pairs plus the
bias term, with an additional function <b> f </b> applied to that total. The function <b> f </b> is the
activation function, which is used to apply non-linearity on top of this sum of
products. More details on the activation functions will be provided in the next
section, on feedforward propagation. Further, higher non-linearity can be achieved by
having more than one hidden layer, each stacking many neurons.
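The computation of a single neuron can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the function names and input values are hypothetical, and the sigmoid is used only as one example of an activation function <b> f </b>, since the text has not yet fixed a particular choice.

```python
import math


def sigmoid(z):
    """One common activation function; used here only as an example of f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Return a = f(w_0 + sum_i w_i * x_i) for a single neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# Hypothetical values, chosen so that the weighted sum is exactly 0.
x = [1.0, 2.0]    # input variables x_1, x_2
w = [0.5, -0.25]  # weights w_1, w_2
w0 = 0.0          # bias term
print(neuron(x, w, w0))  # sigmoid(0.5*1.0 - 0.25*2.0 + 0.0) = sigmoid(0.0) = 0.5
```

Passing the activation as a parameter keeps the weighted sum separate from the non-linearity, so a different function can be swapped in without changing the rest.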

At a high level, a neural network is a collection of nodes where each node has an
adjustable float value and the nodes are interconnected as a graph to return outputs
in a format that is dictated by the architecture of the network. The network consists
of three main parts: the input layer, the hidden layer(s), and the output layer. Note
that you can have any <b> number (n) </b> of hidden layers, with the term <b> deep </b> learning
referring to networks with a greater number of hidden layers. Typically, more hidden
layers are needed when the neural network has to learn something complicated, such as
image recognition.
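To make the idea of stacking hidden layers concrete, here is a small sketch of passing inputs through a network with two hidden layers. Everything in it is an assumption made for illustration: the function names, the layer sizes, the random initial weights, and the use of a sigmoid activation in every layer, including the output layer.

```python
import math
import random


def sigmoid(z):
    """Example activation function; an assumption, not prescribed by the text."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(layer_sizes, seed=0):
    """Create random weights and zero biases for each non-input layer."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Pass inputs through every layer; each node computes f(bias + weighted sum)."""
    values = inputs
    for weights, biases in network:
        values = [
            sigmoid(b + sum(w * v for w, v in zip(node_weights, values)))
            for node_weights, b in zip(weights, biases)
        ]
    return values


# Hypothetical architecture: 4 inputs, two hidden layers (5 and 3 nodes), 2 output nodes.
net = init_network([4, 5, 3, 2])
outputs = forward(net, [0.1, 0.2, 0.3, 0.4])
print(len(net))      # 3 sets of weights: two hidden layers plus the output layer
print(len(outputs))  # 2, one value per output node
```

Making the network deeper only means adding more entries to the list of layer sizes; the forward pass itself does not change.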