# Introduction to Neural Networks

In essence, for each epoch and for each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error. It then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally tweaks the connection weights slightly to reduce the error (Gradient Descent step).
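
The three phases described above can be sketched in plain NumPy. This is only a minimal illustration, not how scikit-learn or Keras implement training: the toy data, the layer sizes, the learning rate, and the `train_step` helper are all assumptions made for the example, and it processes the four toy instances together rather than one at a time.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: 4 instances, 2 features, target = sum of the features.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [2.0]])

# One hidden layer with 3 sigmoid neurons, then one linear output neuron.
W1 = rng.normal(scale=0.5, size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1))
b2 = np.zeros(1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(X, y, W1, b1, W2, b2, learning_rate=0.1):
    """Run one forward pass, one reverse pass, and one Gradient Descent step."""
    # Forward pass: compute and keep the output of every layer.
    hidden = sigmoid(X @ W1 + b1)
    y_pred = hidden @ W2 + b2

    # Measure the error (mean squared error here).
    error = y_pred - y
    loss = np.mean(error ** 2)

    # Reverse pass: chain rule, from the output layer back to the first layer.
    grad_y_pred = 2 * error / len(X)
    grad_W2 = hidden.T @ grad_y_pred
    grad_b2 = grad_y_pred.sum(axis=0)
    grad_hidden = grad_y_pred @ W2.T
    grad_z1 = grad_hidden * hidden * (1 - hidden)  # sigmoid'(z) = s * (1 - s)
    grad_W1 = X.T @ grad_z1
    grad_b1 = grad_z1.sum(axis=0)

    # Gradient Descent step: nudge every parameter against its gradient.
    W1 = W1 - learning_rate * grad_W1
    b1 = b1 - learning_rate * grad_b1
    W2 = W2 - learning_rate * grad_W2
    b2 = b2 - learning_rate * grad_b2
    return loss, W1, b1, W2, b2


losses = []
for epoch in range(200):
    loss, W1, b1, W2, b2 = train_step(X, y, W1, b1, W2, b2)
    losses.append(loss)

print(f"loss at first epoch: {losses[0]:.4f}, at last epoch: {losses[-1]:.4f}")
```

The loss recorded at each epoch is measured before that epoch's weight update, so the first entry reflects the randomly initialized network.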


<img src="../img/derivatives_activation.png" width="80%">

*Derivatives: rate of change*


**Key Differences Between Perceptron and Neuron**

| Feature               | Perceptron                       | Neuron in MLP                   |
|-----------------------|-----------------------------------|----------------------------------|
| **Activation Function** | Step function                   | Nonlinear (e.g., ReLU, sigmoid) |
| **Output**            | Binary (0 or 1)                  | Continuous or nonlinear values  |
| **Usage**             | Single-layer models (linear tasks) | Multilayer networks (nonlinear tasks) |
| **Problem Solving**   | Only linear separability         | Handles nonlinear problems      |
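
To make the activation-function row concrete, here is a small sketch comparing a step function with sigmoid and ReLU. The function names and sample inputs are chosen for illustration only. It also estimates the slope of each function at `z = 1.0` with a finite difference: the step function is flat there, so it gives the reverse pass no gradient to work with, while the sigmoid and ReLU do.

```python
import numpy as np


def step(z):
    """Heaviside step function, as used by a classic perceptron."""
    return (z >= 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    return np.maximum(0.0, z)


z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("step:   ", step(z))
print("sigmoid:", sigmoid(z).round(3))
print("relu:   ", relu(z))


def slope(f, z0, eps=1e-6):
    """Central finite-difference estimate of the derivative of f at z0."""
    z0 = np.array(z0)
    return float((f(z0 + eps) - f(z0 - eps)) / (2 * eps))


slope_step = slope(step, 1.0)
slope_sigmoid = slope(sigmoid, 1.0)
slope_relu = slope(relu, 1.0)
print("slopes at z = 1.0:", slope_step, round(slope_sigmoid, 4), round(slope_relu, 4))
```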


## **Regression MLPs**

| Hyperparameter        | Typical Value                                                                 |
|-----------------------|------------------------------------------------------------------------------|
| # input neurons       | One per input feature (e.g., 28 x 28 = 784 for MNIST)                       |
| # hidden layers       | Depends on the problem. Typically 1 to 5.                                   |
| # neurons per hidden layer | Depends on the problem. Typically 10 to 100.                            |
| # output neurons      | One per prediction dimension (e.g., 2 outputs to predict 2 values)          |
| Hidden activation     | ReLU (or SELU)                                              |
| Output activation     | None, or ReLU/Softplus (if positive outputs), or Logistic (0 to 1)/Tanh (-1 to 1) (if bounded outputs) |
| Loss function         | MSE or MAE/Huber (if outliers)                                              |
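
The loss-function row is easier to judge with numbers. The sketch below implements MSE, MAE, and the Huber loss in NumPy and applies them to a hypothetical set of targets where the last one is an outlier. The data and the `delta=1.0` threshold are assumptions for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))


def huber(y_true, y_pred, delta=1.0):
    """Quadratic for errors up to delta, linear beyond it."""
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2
    linear = delta * (error - 0.5 * delta)
    return np.mean(np.where(error <= delta, quadratic, linear))


# Hypothetical targets; the last one is an outlier (absolute error of 10).
y_true = np.array([1.0, 2.0, 3.0, 14.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

print("MSE:  ", mse(y_true, y_pred))    # 25.125
print("MAE:  ", mae(y_true, y_pred))    # 2.75
print("Huber:", huber(y_true, y_pred))  # 2.4375
```

The single outlier accounts for almost all of the MSE, while MAE and Huber grow only linearly with it, which is why they are the usual suggestions when the training set contains outliers.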




In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)

mlp_reg = MLPRegressor(hidden_layer_sizes=[50, 50, 50], random_state=42)
pipeline = make_pipeline(StandardScaler(), mlp_reg)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_valid)
rmse = mean_squared_error(y_valid, y_pred) ** 0.5  # RMSE; the `squared` argument was deprecated in scikit-learn 1.4

print(rmse)

## **Classification MLPs**

| Hyperparameter            | Binary classification | Multilabel binary classification | Multiclass classification |
|---------------------------|-----------------------|-----------------------------------|---------------------------|
| Input and hidden layers   | Same as regression   | Same as regression               | Same as regression       |
| # output neurons          | 1                   | 1 per label                      | 1 per class              |
| Output layer activation   | Logistic            | Logistic                          | Softmax                  |
| Loss function             | Cross-entropy         | Cross-entropy                     | Cross-entropy             |
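
The difference between the logistic and softmax output activations can be seen on a single vector of raw output scores. The sketch below uses made-up scores and labels: softmax produces one probability distribution over exclusive classes, while the logistic function scores each label independently. The cross-entropy for each case follows.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    exp_z = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return exp_z / exp_z.sum(axis=-1, keepdims=True)


# Hypothetical raw scores from three output neurons.
logits = np.array([2.0, 1.0, 0.1])

# Multiclass: the probabilities compete and sum to 1.
multiclass_proba = softmax(logits)
# Multilabel: each output is an independent probability.
multilabel_proba = logistic(logits)

# Multiclass cross-entropy, assuming the true class is class 0.
true_class = 0
multiclass_loss = -np.log(multiclass_proba[true_class])

# Multilabel (binary) cross-entropy, assuming labels 0 and 2 are both present.
true_labels = np.array([1.0, 0.0, 1.0])
multilabel_loss = -np.mean(
    true_labels * np.log(multilabel_proba)
    + (1 - true_labels) * np.log(1 - multilabel_proba)
)

print("softmax: ", multiclass_proba.round(3), "sum =", multiclass_proba.sum().round(3))
print("logistic:", multilabel_proba.round(3), "sum =", multilabel_proba.sum().round(3))
print("losses:  ", multiclass_loss.round(3), multilabel_loss.round(3))
```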


In [19]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    iris.data, iris.target, test_size=0.1, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=0.1, random_state=42)

mlp_clf = MLPClassifier(hidden_layer_sizes=[5], max_iter=10_000,
                        random_state=42)
pipeline = make_pipeline(StandardScaler(), mlp_clf)
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_valid, y_valid)
accuracy

1.0

## **Simple image classifier with Keras**

In [3]:
import tensorflow as tf
from tensorflow import keras

tf.__version__
keras.__version__


'3.6.0'

In [29]:
import gzip
import numpy as np


# The Fashion MNIST dataset has 10 categories, like MNIST, but the images show clothing items

def load_idx(filepath):
    """Load IDX format data from a gzip file."""
    with gzip.open(filepath, 'rb') as f:
        # Read the file content
        data = f.read()
        # Magic number (first 4 bytes)
        magic = int.from_bytes(data[0:4], byteorder='big')
        # Number of items (next 4 bytes)
        num_items = int.from_bytes(data[4:8], byteorder='big')
        if magic == 2049:  # Labels
            return np.frombuffer(data[8:], dtype=np.uint8)
        elif magic == 2051:  # Images
            rows = int.from_bytes(data[8:12], byteorder='big')
            cols = int.from_bytes(data[12:16], byteorder='big')
            images = np.frombuffer(data[16:], dtype=np.uint8)
            return images.reshape(num_items, rows, cols)
        else:
            raise ValueError("Unknown magic number in file header!")

# Paths to the data files
train_images_path = '../data/train-images-idx3-ubyte.gz'
train_labels_path = '../data/train-labels-idx1-ubyte.gz'
test_images_path = '../data/t10k-images-idx3-ubyte.gz'
test_labels_path = '../data/t10k-labels-idx1-ubyte.gz'

# Load the data
X_train_full = load_idx(train_images_path)
y_train_full = load_idx(train_labels_path)
X_test = load_idx(test_images_path)
y_test = load_idx(test_labels_path)

# Normalize the image data: for simplicity, scale the pixel intensities
# to the 0-1 range by dividing them by 255

X_train_full = X_train_full / 255.0
X_test = X_test / 255.0

print("Train images shape:", X_train_full.shape)
print("Train labels shape:", y_train_full.shape)
print("Test images shape:", X_test.shape)
print("Test labels shape:", y_test.shape)


Train images shape: (60000, 28, 28)
Train labels shape: (60000,)
Test images shape: (10000, 28, 28)
Test labels shape: (10000,)


In [33]:
# hold out the first 5,000 training instances as a validation set

X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]



In [37]:
# create class names
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

In [41]:
# class label of a given item; class_names[y_train[10]] gives its name
y_train[10]

6

### My first Keras sequential model

In [43]:
model = keras.models.Sequential([
    keras.layers.Input(shape=[28, 28]),           # declares the input shape (Keras 3 warns if it is passed to Flatten instead)
    keras.layers.Flatten(),                       # converts each 28x28 image to a 1D array of 784 values
    keras.layers.Dense(300, activation="relu"),   # first hidden layer with 300 neurons
    keras.layers.Dense(100, activation="relu"),   # second hidden layer with 100 neurons
    keras.layers.Dense(10, activation="softmax")  # output layer: one neuron per class, softmax because the classes are exclusive
])


In [44]:
# get info about the model including output shape of each layer and the number of parameters
model.summary()
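
The parameter counts reported by `model.summary()` can be checked by hand: a dense layer has one weight per (input, neuron) pair plus one bias per neuron. The short sketch below does that arithmetic for the layer sizes used in this model. The helper name `dense_param_count` is made up for the example.

```python
# Layer widths of the model: 784 flattened inputs, then 300, 100, and 10 neurons.
layer_sizes = [28 * 28, 300, 100, 10]


def dense_param_count(n_inputs, n_neurons):
    """Weights (n_inputs x n_neurons) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


counts = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total = sum(counts)

print(counts)  # [235500, 30100, 1010]
print(total)   # 266610
```

The `Flatten` layer has no parameters of its own, so the first dense layer alone holds most of the model's parameters.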

In [45]:
# you can get all the layers of a model
model.layers

[<Flatten name=flatten, built=True>,
 <Dense name=dense, built=True>,
 <Dense name=dense_1, built=True>,
 <Dense name=dense_2, built=True>]

In [46]:
# you can call each layer by its index
model.layers[1].name

'dense'

In [47]:
# you can get the layer by its name
model.get_layer("dense_2").name

'dense_2'

In [52]:
# you can get all the weights and biases from each layer
weights, biases = model.layers[1].get_weights()

weights

array([[ 0.02308173,  0.02321342,  0.01780492, ..., -0.01716774,
         0.01247725,  0.0568724 ],
       [ 0.05130477,  0.04607778, -0.04728509, ...,  0.00962491,
        -0.01858232,  0.03471071],
       [ 0.02859914, -0.06492709,  0.02310194, ..., -0.02200221,
        -0.07220224, -0.0112334 ],
       ...,
       [-0.07249278,  0.02996396,  0.02114277, ...,  0.05384013,
        -0.03104062, -0.02192979],
       [-0.02263604,  0.06190917,  0.00712591, ...,  0.02354175,
        -0.00246117, -0.00628362],
       [ 0.02515146,  0.02241144, -0.04031318, ..., -0.07111613,
        -0.07353047,  0.05182727]], dtype=float32)
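
The weights printed above all fall within roughly ±0.074. That is consistent with the Keras default for `Dense` layers, which draws the kernel from a Glorot uniform distribution and sets the biases to zero. The NumPy sketch below reproduces only the bounds of that scheme for a layer with 784 inputs and 300 neurons. It is an illustration of the formula, not the Keras implementation, and the variable names are assumptions.

```python
import numpy as np

fan_in, fan_out = 28 * 28, 300  # inputs and neurons of the first Dense layer

# Glorot uniform: sample from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)).
limit = np.sqrt(6.0 / (fan_in + fan_out))

rng = np.random.default_rng(42)
weights_sketch = rng.uniform(-limit, limit, size=(fan_in, fan_out))
biases_sketch = np.zeros(fan_out)

print(weights_sketch.shape, biases_sketch.shape)  # (784, 300) (300,)
print(round(float(limit), 4))                     # 0.0744
```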

In [50]:
keras.utils.plot_model(model)

You must install pydot (`pip install pydot`) for `plot_model` to work.
