# Why Deep Learning?

## Preamble

In [None]:
import data_science_learning_paths
data_science_learning_paths.setup_plot_style(dark=True)

## Features make machine learning possible

Remember the **Iris** dataset? The petal and sepal measurements of each flower are a good example of features that allow observations to be separated into classes.

![](graphics/iris-data.png)

_Image credits: Principal Component Analysis by Sebastian Raschka_


In [None]:
iris_data = data_science_learning_paths.datasets.read_iris()

In [None]:
iris_data.head()

In [None]:
import seaborn

In [None]:
seaborn.pairplot(
    iris_data,
    vars=["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"],
    hue="species"
)

## Feature Engineering


> Coming up with features is difficult, time-consuming, requires expert knowledge. "Applied machine learning" is basically feature engineering.

— Andrew Ng, ML researcher, [Machine Learning and AI via Brain simulations](https://forum.stanford.edu/events/2011/2011slides/plenary/2011plenaryNg.pdf)

Good features are not always this easy to come by: they may be hidden in messy raw data. **Feature engineering is the process of mining the data for good features.** It requires an understanding of the data in order to create features that make machine learning algorithms work.



### Example: Feature Engineering for Titanic Survival Model

In [None]:
import pandas

In [None]:
data_path = "../.assets/data/titanic/titanic.csv"

In [None]:
titanic_data = pandas.read_csv(data_path)
titanic_data.head()

In [None]:
!cat ../.assets/data/titanic/titanic-documentation.txt

#### Feature Idea: Cabins and Decks

This is an example of using domain knowledge to come up with a feature. Some of the passengers on the list have a cabin number:

In [None]:
titanic_data["Cabin"].unique()

Do you notice a pattern? The cabin number starts with a letter. What could that mean? Let's have a look at the blueprints:

![](https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Titanic_cutaway_diagram.png/515px-Titanic_cutaway_diagram.png) 
*Cutaway diagram of the RMS Titanic - Source: Wikimedia Commons*

As we see, the letters refer to the ship's decks, from top to bottom. This gives us a promising hypothesis: Perhaps your chances of survival in the disaster depend on the placement of your cabin in the ship. We should test this hypothesis by deriving the deck as a new feature from the cabin number and providing it to our model.

#### Exercise: Cabins and Decks

Derive the deck as a feature from the cabin number!

In [None]:
# TODO: your code here
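
Try the exercise yourself first. One possible approach is sketched below on a handful of illustrative cabin strings (typed in by hand, not read from the file). It assumes that the deck is simply the first character of the cabin entry and that passengers without a cabin entry get a placeholder label; `cabins`, `deck` and `deck_filled` are names chosen for this sketch.

```python
import pandas

# Illustrative cabin entries, including a missing value and multi-cabin entries
cabins = pandas.Series(["C85", None, "B57 B59 B63 B66", "F G73", "E46"], name="Cabin")

# Assumption: the first character of the entry is the deck letter
deck = cabins.str[0]

# Missing cabin entries stay missing; here they get an explicit placeholder
deck_filled = deck.fillna("unknown")

print(deck_filled.tolist())
```

On the real data, the same idea would be applied to `titanic_data["Cabin"]`. Whether "unknown" is a useful category or whether missing decks should be handled differently is a modeling decision left open here.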



## High-dimensional Input Data: Where are the Features?

Machine learning on tabular data is the most common use case in industry data science projects. But what about machine learning on other types of data - time series, images, audio, video? They have one thing in common: Each item is a very high-dimensional data point. 

Imagine trying to classify objects in a photo on the basis of the raw measurements - the RGB values of each pixel. Surely we need some clever **featurization** before we can think about tackling this problem with ML.
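
To get a feeling for the numbers, here is a small sketch that counts the raw input dimensions of an image. The 28x28 grayscale shape matches the MNIST images used further down; the 224x224 RGB photo size is just an assumed example, and the arrays are filled with zeros instead of real pixel data.

```python
import numpy

# Stand-ins for image data: only the shapes matter here
small_grayscale_image = numpy.zeros((28, 28), dtype=numpy.uint8)
rgb_photo = numpy.zeros((224, 224, 3), dtype=numpy.uint8)  # assumed photo size

# Flattening turns each image into one long vector of raw measurements
small_vector = small_grayscale_image.reshape(-1)
photo_vector = rgb_photo.reshape(-1)

print(small_vector.shape, photo_vector.shape)
```

Even the tiny grayscale image is a point in a 784-dimensional space, and the modest photo already has more than 150,000 raw values, compared to four features per flower in the Iris data.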

![](https://www.codeproject.com/KB/cpp/1196024/classification_cat_dog.JPG)

###  Example: Handwriting Recognition

The [MNIST](https://en.m.wikipedia.org/wiki/MNIST_database) dataset is a famous benchmark for handwriting recognition performance. The task is to recognize a handwritten digit from a small grayscale image, i.e. classification with 10 classes.

In [None]:
from tensorflow import keras
import numpy

In [None]:
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

In [None]:
X_train.shape

In [None]:
numpy.unique(y_train)

In [None]:
n_classes = 10

In [None]:
import matplotlib.pyplot as plt

In [None]:
for i in range(5):
    plt.figure(figsize=(2,2))
    plt.imshow(X_train[i], cmap="binary")
    plt.title(y_train[i])
    plt.show()

### A (not yet deep) Network for Handwriting Recognition

As a preprocessing step before training, the grayscale pixel data is scaled to the interval $[0,1]$:

In [None]:
X_train, X_test = X_train / 255.0, X_test / 255.0


Here is the complete workflow for constructing the neural network, training it, and classifying the test images:

In [None]:
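net = keras.models.Sequential([
    keras.layers.Flatten(),  # 28x28 image -> vector of 784 values
    keras.layers.Dense(128, activation="relu"),  # hidden layer
    keras.layers.Dropout(0.2),  # randomly zeroes 20% of the activations during training
    keras.layers.Dense(n_classes, activation="softmax"),  # one probability per digit class
])
net.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integers, not one-hot vectors
    metrics=["accuracy"],
)
net.fit(
    X_train,
    y_train,
    epochs=10,
)
# for each test image, pick the class with the highest predicted probability
y_pred = numpy.argmax(net.predict(X_test), axis=-1)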

If you prefer a graphical summary - this one was made with the [Netron](https://github.com/lutzroeder/netron) application:

![](graphics/MNIST_classifier.svg)
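
To make the layers less opaque, here is a minimal NumPy sketch of what happens to a single image when the network predicts. It is not the Keras implementation: the weights are random instead of trained (so the predicted digit is meaningless), the input is random noise standing in for a scaled image, and all variable names are made up for this sketch. Dropout is left out because it is only active during training.

```python
import numpy

rng = numpy.random.default_rng(0)

# Stand-in for one scaled 28x28 image with values in [0, 1)
x = rng.random((28, 28))

# Random, untrained weights with the same shapes as the two dense layers
W1 = rng.normal(scale=0.05, size=(784, 128))
b1 = numpy.zeros(128)
W2 = rng.normal(scale=0.05, size=(128, 10))
b2 = numpy.zeros(10)

flat = x.reshape(-1)                         # Flatten: 28x28 -> 784
hidden = numpy.maximum(0.0, flat @ W1 + b1)  # Dense + ReLU: 784 -> 128
logits = hidden @ W2 + b2                    # Dense: 128 -> 10

# Softmax turns the 10 scores into probabilities that sum to 1
# (subtracting the maximum first keeps the exponentials numerically stable)
exp_scores = numpy.exp(logits - logits.max())
probabilities = exp_scores / exp_scores.sum()

# Same decision rule as numpy.argmax(net.predict(...), axis=-1)
predicted_digit = int(numpy.argmax(probabilities))
print(probabilities.round(3), predicted_digit)
```

Training consists of adjusting `W1`, `b1`, `W2` and `b2` so that the probability assigned to the correct digit becomes as large as possible across the training images.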

In [None]:
for i in range(10):
    plt.figure(figsize=(2,2))
    plt.imshow(X_test[i], cmap="binary")
    plt.title(f"pred: {y_pred[i]} actual: {y_test[i]}")
    plt.show()

## Deep Learning = Automated Feature Engineering

A neural network can **learn to extract relevant features from complex inputs**. _Adding more layers_ can enable it to perform this task better, and this is what _puts the "deep" in "deep learning"_. Each layer in the front part of the network acts as a _feature extractor_: early layers typically respond to simple patterns such as edges, later layers to increasingly complex shapes.
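
As a rough illustration of what "feature extractor" means for images, the following sketch slides a small filter over a toy image, which is the basic operation of a convolutional layer. The filter values here are picked by hand to respond to dark-to-bright edges; in a trained network they would be learned from the data. The helper `filter2d` and the toy image are made up for this sketch.

```python
import numpy

def filter2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kernel_height, kernel_width = kernel.shape
    out_height = image.shape[0] - kernel_height + 1
    out_width = image.shape[1] - kernel_width + 1
    out = numpy.zeros((out_height, out_width))
    for row in range(out_height):
        for col in range(out_width):
            patch = image[row:row + kernel_height, col:col + kernel_width]
            out[row, col] = numpy.sum(patch * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1)
image = numpy.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked 3x3 kernel: responds where brightness increases from left to right
kernel = numpy.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = filter2d(image, kernel)
print(feature_map)
```

The resulting feature map is zero in the uniform regions and large only around the vertical edge, so it encodes "where is an edge" instead of raw pixel brightness.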


![](graphics/convnet_architecture.jpg)

_[Source](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)_ 

![](graphics/cnn_feature_viz.jpg)

[_Source_](https://www.researchgate.net/publication/319622441_DeepFeat_A_Bottom_Up_and_Top_Down_Saliency_Model_Based_on_Deep_Features_of_Convolutional_Neural_Nets/figures?lo=1)

---
_This notebook is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright © 2018-2025 [Point 8 GmbH](https://point-8.de)_