<h2 style="text-align:center;color:#0F4C81;">Introduction to Deep Learning</h2>

<h3 style="color:#88B04B">What is Deep Learning?</h3>

Deep Learning is a branch of **Machine Learning (ML)** that uses **Artificial Neural Networks** to automatically learn patterns and features from data. Unlike traditional ML, which often requires manual feature engineering, deep learning models learn directly from raw data, such as images, text, and audio.

These models are called **"deep"** because they have multiple layers of interconnected "neurons" that progressively extract higher-level features from the input. Deep learning powers many modern AI systems, including **image recognition**, **speech-to-text**, **chatbots**, and **autonomous vehicles**.

In simple terms, deep learning lets machines "learn" in a way loosely similar to how humans do, by analyzing large amounts of data and finding patterns, but at a much larger scale.

Artificial Neural Networks (ANNs) are at the very core of Deep Learning. They are versatile, powerful, and scalable, making them ideal for tackling large and highly complex Machine Learning tasks such as classifying billions of images (e.g., Google Images), powering speech recognition services (e.g., Apple’s Siri), recommending the best videos to watch to hundreds of millions of users every day (e.g., YouTube), or learning to beat the world champion at the game of Go (DeepMind’s AlphaGo).

<h3 style="color:#88B04B;">Biological Neurons</h3>

Before we discuss artificial neurons, let’s take a quick look at a biological neuron. It is an unusual-looking cell mostly found in animal brains. It’s composed of a _cell body_ containing the nucleus and most of the cell’s complex components, many branching extensions called _dendrites_, plus one very long extension called the _axon_. The axon’s length may be just a few times longer than the cell body, or up to tens of thousands of times longer. Near its extremity the axon splits off into many branches called _telodendria_, and at the tip of these branches are minuscule structures called _synaptic terminals_ (or simply _synapses_), which are connected to the dendrites or cell bodies of other neurons.

Biological neurons produce short electrical impulses called _action potentials_ (APs, or just _signals_) which travel along the axons and make the synapses release chemical signals called _neurotransmitters_. When a neuron receives a sufficient amount of these neurotransmitters within a few milliseconds, it fires its own electrical impulses (actually, it depends on the neurotransmitters, as some of them inhibit the neuron from firing).

<div style="display:flex;justify-content:center;">
<img src="images/biological-neuron-1.png" style="width:500px;object-fit:cover;" />
</div>

Thus, individual biological neurons seem to behave in a rather simple way,
but they are organized in a vast network of billions, with each neuron
typically connected to thousands of other neurons. Highly complex
computations can be performed by a network of fairly simple neurons,
much like a complex anthill can emerge from the combined efforts of
simple ants. The architecture of biological neural networks (BNNs) is still
the subject of active research, but some parts of the brain have been
mapped, and it seems that neurons are often organized in consecutive
layers, especially in the cerebral cortex (i.e., the outer layer of your brain).

<div style="display:flex;justify-content:center;">
<img src="images/biological-neuron-2.png" style="width:600px;object-fit:cover;" />
</div>


<h3 style="color:#88B04B;">The Perceptron</h3>

The _Perceptron_ is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on an artificial neuron called a _threshold logic unit_ (TLU), or sometimes a _linear threshold unit_ (LTU). Its inputs and output are numbers (instead of binary on/off values), and each input connection is associated with a weight. The TLU first computes a linear function of its inputs: $z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b = w^Tx + b$. Then it applies a _step function_ to the result: $h_w(x) = \text{step}(z)$. So it’s almost like Logistic Regression, except that it uses a step function instead of the logistic function. Just like in Logistic Regression, the model parameters are the input weights $w$ and the bias term $b$.

<div style="display:flex;justify-content:center;">
<img src="images/threshold_logic_unit.png" style="width:500px;object-fit:cover;" />
</div>

The most common step function used in Perceptrons is the _Heaviside step function_. Sometimes the _sign_ function is used instead.

$$
\text{heaviside}(z) = \begin{cases} 0 \text{ if } z \lt  0 \\
                                    1 \text{ if } z \ge 0
\end{cases}
$$

$$
\text{sgn}(z) = \begin{cases} -1 \text{ if } z \lt  0 \\
                               0 \text{ if } z = 0 \\
                               1 \text{ if } z \gt 0
\end{cases}
$$
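
As a minimal sketch of the two steps described above (a linear function followed by a step function), the following NumPy code implements both step functions and a single TLU. The function names `heaviside`, `sgn`, and `tlu`, as well as the weights, bias, and inputs, are illustrative choices made for this example and do not come from any library.

```python
import numpy as np

def heaviside(z):
    """Heaviside step function: 0 if z < 0, otherwise 1."""
    return (np.asarray(z) >= 0).astype(int)

def sgn(z):
    """Sign function: -1 if z < 0, 0 if z == 0, 1 if z > 0."""
    return np.sign(z).astype(int)

def tlu(x, w, b, step=heaviside):
    """Threshold logic unit: computes z = w^T x + b, then applies a step function."""
    z = np.dot(w, x) + b
    return step(z)

# Hypothetical weights and bias, picked by hand for illustration only
w = np.array([1.0, -2.0])
b = 0.5

print(tlu(np.array([2.0, 1.0]), w, b))  # z = 2 - 2 + 0.5 = 0.5
print(tlu(np.array([0.0, 1.0]), w, b))  # z = 0 - 2 + 0.5 = -1.5
```

The first input gives a non-negative $z$, so the TLU outputs 1; the second gives a negative $z$, so it outputs 0. Passing `step=sgn` switches to the sign function.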

A single TLU can be used for simple linear binary classification. It computes a linear function of its inputs, and if the result exceeds a threshold, it outputs the positive class. Otherwise it outputs the negative class. This may remind you of Logistic Regression or linear SVM classification. You could, for example, use a single TLU to classify iris flowers based on petal length and width. Training such a TLU would require finding the right values for $w_1$, $w_2$ and $b$ (the training algorithm is discussed shortly).

A Perceptron is composed of one or more TLUs organized in a single layer,
where every TLU is connected to every input. Such a layer is called a _fully connected layer_, or a _dense_ layer. The inputs constitute the _input layer_. And since the layer of TLUs produces the final outputs, it is called the _output layer_.

<div style="display:flex;justify-content:center;">
<img src="images/perceptron_1.png" style="width:500px;object-fit:cover;" />
</div>

This Perceptron can classify instances simultaneously into three different
binary classes, which makes it a multilabel classifier. It may also be used
for multiclass classification.
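
Because every TLU in a dense layer is connected to every input, the outputs of the whole layer can be computed with a single matrix product: stack one weight vector per TLU as the columns of a matrix $W$ and apply the step function to $XW + b$. The sketch below assumes two inputs and three TLUs. The helper name `perceptron_layer` and all parameter values are hypothetical and chosen only to illustrate the computation.

```python
import numpy as np

def perceptron_layer(X, W, b):
    """Outputs of a fully connected layer of TLUs using the Heaviside step.

    X: (n_instances, n_inputs), W: (n_inputs, n_tlus), b: (n_tlus,)
    """
    Z = X @ W + b            # one linear function per TLU and per instance
    return (Z >= 0).astype(int)

# Hypothetical parameters: 2 inputs, 3 TLUs (one column of W per TLU)
W = np.array([[ 1.0, -1.0, 0.5],
              [-1.0,  1.0, 0.5]])
b = np.array([0.0, 0.0, -1.0])

# Two instances, one per row
X = np.array([[3.0, 1.0],
              [1.0, 3.0]])

outputs = perceptron_layer(X, W, b)
print(outputs)
```

Each row of `outputs` holds three independent binary decisions for one instance, and more than one of them can be 1 at the same time. That is what makes this a multilabel classifier.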


In [4]:
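import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

# Load the iris dataset as a DataFrame and keep the two petal features
iris = load_iris(as_frame=True)
X = iris.data[['petal length (cm)', 'petal width (cm)']].values
y = (iris.target == 0)  # binary target: True for Iris setosa

# Train a Perceptron with a single TLU on the binary task
per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

# Predict for two new flowers: [petal length, petal width] in cm
X_new = [[2, 0.5], [3, 1]]
y_pred = per_clf.predict(X_new)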

In [5]:
y_pred

array([ True, False])
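
To connect this result back to the TLU equations, the learned parameters can be read from the fitted model's `coef_` and `intercept_` attributes and the predictions recomputed by hand. The sketch below repeats the training code so that it runs on its own. Note that Scikit-Learn predicts the positive class when $z > 0$, so it can differ from the Heaviside definition above only when $z$ is exactly 0.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris(as_frame=True)
X = iris.data[["petal length (cm)", "petal width (cm)"]].values
y = (iris.target == 0)  # True for Iris setosa

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

w = per_clf.coef_[0]       # learned weights w_1 and w_2
b = per_clf.intercept_[0]  # learned bias term

X_new = np.array([[2, 0.5], [3, 1]])
z = X_new @ w + b          # linear function of the inputs
manual_pred = z > 0        # step function applied by hand

print(manual_pred)
print(per_clf.predict(X_new))
```

Both printed arrays should match, which shows that the trained model behaves like a single TLU with weights `w` and bias `b`.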