## **Task 5 - Two-Layer Perceptron**

Implement a two-layer perceptron and train it to approximate a given function $f(x)$, the probability density of the Laplace distribution. The function is given by:

\begin{equation}
    f(x) = \frac{1}{2b}e^{-\frac{|x-\mu|}{b}}
\end{equation}
where:
- Range of $x$: $[-8, 8]$
- Values of $\mu$ and $b$: $\mu = 0$ and $b = 1$
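
As a quick sanity check of the formula, the sketch below evaluates the density on a grid. With $\mu = 0$ and $b = 1$ the peak value should be $1/(2b) = 0.5$, and the area over $[-8, 8]$ should be close to $1 - e^{-8}$. The helper name `laplace_pdf` and the grid size are assumptions made for this illustration.

```python
import numpy as np

def laplace_pdf(x, mu=0.0, b=1.0):
    # Laplace density: 1/(2b) * exp(-|x - mu| / b)
    return np.exp(-np.abs(x - mu) / b) / (2.0 * b)

# grid over the task's range (grid size is an arbitrary choice)
x_grid = np.linspace(-8.0, 8.0, 16_001)
density = laplace_pdf(x_grid)

# largest value on the grid, expected near 1/(2b)
peak = density.max()

# trapezoidal rule written out by hand
dx = x_grid[1] - x_grid[0]
area = np.sum((density[:-1] + density[1:]) / 2.0) * dx

print(f"peak = {peak:.4f}, area on [-8, 8] = {area:.4f}")
```

The area is slightly below 1 because the tails outside $[-8, 8]$ are cut off.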

**Steps:**
1. Implement a two-layer perceptron to represent the function $f(x)$ for the given range of $x$ and the given values of $\mu$ and $b$.
2. Check the quality of the approximation by calculating the Mean Squared Error (MSE) and Mean Absolute Error (MAE) between the true values of the function and the values predicted by the network.
3. Plot the true function together with the function predicted by the network.
4. Investigate how the number of neurons in the hidden layer affects the quality of the approximation by changing its value and comparing the results.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
np.random.seed(0)

In [None]:
def laplace(x, mu=0, b=1):
    return 1/(2*b) * np.exp(-np.abs(x-mu)/b)

In [None]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

In [None]:
def visualize_function(x, y, title, xlabel='x', ylabel='f(x)'):
    plt.plot(x, y)
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.grid()
    plt.show()

In [None]:
x = np.linspace(-10, 10, 100)
y = laplace(x)

visualize_function(x, y, 'Laplace Distribution')

In [None]:
x = np.linspace(-10, 10, 100)
y = sigmoid(x)

visualize_function(x, y, 'Sigmoid Function')

$y_i$ - true value, $\hat{y}_i$ - predicted value, $n$ - number of samples

**Mean Squared Error (MSE)**:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

**Mean Absolute Error (MAE)**:

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
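
The two formulas can be sketched as vectorized NumPy helpers that average over all $n$ samples at once. The function names `mean_squared_error` and `mean_absolute_error` and the demo arrays are assumptions for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # average of squared differences over all n samples
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def mean_absolute_error(y_true, y_pred):
    # average of absolute differences over all n samples
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# small made-up example, not results from the network
y_true_demo = np.array([0.5, 0.2, 0.1, 0.0])
y_pred_demo = np.array([0.4, 0.2, 0.3, 0.1])

mse_demo = mean_squared_error(y_true_demo, y_pred_demo)
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)
print(f"MSE = {mse_demo:.4f}, MAE = {mae_demo:.4f}")
```

In this notebook the same helpers could be applied to `y_test_sorted` and `y_test_preds` once the test predictions have been computed.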


In [None]:
# for a single sample (n = 1)
def mse_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

def mae_loss(y_true, y_pred):
    return np.abs(y_true - y_pred)

In [None]:
def mse_loss_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true)

def mae_loss_derivative(y_true, y_pred):
    # subgradient of |y_true - y_pred| with respect to y_pred (0 when they are equal)
    return np.sign(y_pred - y_true)

The two-layer perceptron has two layers of trainable neurons (hidden and output), fed by an input layer:
1. input layer - contains the input features $x$.
2. hidden layer - contains neurons that process the inputs and pass the result to the output layer.
3. output layer - contains neurons that process the hidden layer's output and generate the result.

**Initialization of network parameters:**
- The weights of the output neurons are zeroed.
- The weights of the hidden layer neurons are initialized randomly from the uniform distribution $U(-1/\sqrt{we}, 1/\sqrt{we})$, where $we$ is the number of inputs to the neuron.
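
A minimal sketch of this initialization scheme, following the two rules above literally. The function name `init_params`, the use of `np.random.default_rng`, and the zero-initialized biases are assumptions; the text does not specify how the biases start.

```python
import numpy as np

def init_params(size_in, size_hidden, size_out, rng):
    # each hidden neuron receives `size_in` inputs, so we = size_in
    limit = 1.0 / np.sqrt(size_in)
    W1 = rng.uniform(-limit, limit, size=(size_hidden, size_in))
    b1 = np.zeros((size_hidden, 1))          # assumption: biases start at zero
    W2 = np.zeros((size_out, size_hidden))   # output weights are zeroed
    b2 = np.zeros((size_out, 1))             # assumption: biases start at zero
    return W1, b1, W2, b2

rng = np.random.default_rng(0)
W1, b1, W2, b2 = init_params(size_in=1, size_hidden=8, size_out=1, rng=rng)
print(W1.shape, b1.shape, W2.shape, b2.shape)
```

With zeroed output weights the first prediction is exactly $b_2$. The output weights still receive a non-zero gradient on the first update because the hidden activations $h$ are non-zero.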

**Forward propagation:**
\begin{equation}
    h = \sigma(W_1 \cdot x + b_1)
\end{equation}

\begin{equation}
    \hat{y} = W_2 \cdot h + b_2
\end{equation}


where:
- $W_1$ - hidden layer weights (applied to the input $x$)
- $b_1$ - hidden layer bias
- $\sigma$ - hidden layer activation function
- $W_2$ - output layer weights
- $b_2$ - output layer bias
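
The two equations can be sketched for a whole batch of inputs at once, which makes the array shapes explicit. The hidden size of 4, the random weights, and the function name `forward` are assumptions chosen only for this illustration, and $\sigma$ is taken to be the logistic sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # x has shape (size_in, n): one column per sample
    h = sigmoid(W1 @ x + b1)   # hidden activations, shape (size_hidden, n)
    y_hat = W2 @ h + b2        # linear output, shape (size_out, n)
    return h, y_hat

rng = np.random.default_rng(0)
W1 = rng.uniform(-1.0, 1.0, size=(4, 1))
b1 = np.zeros((4, 1))
W2 = rng.uniform(-1.0, 1.0, size=(1, 4))
b2 = np.zeros((1, 1))

x = np.linspace(-8.0, 8.0, 5).reshape(1, -1)   # 5 samples, 1 feature each
h, y_hat = forward(x, W1, b1, W2, b2)
print(h.shape, y_hat.shape)
```

The bias vectors have a trailing dimension of 1, so NumPy broadcasts them across the sample columns.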


**Backward propagation:**
The weights are updated based on the error made by the network. The error is computed from the difference between the true values and the network's predictions (MSE or MAE). Each weight is then moved in the direction opposite to the gradient of the cost function, according to the rule:

\begin{equation}
    W = W - \alpha \frac{\partial L}{\partial W}
\end{equation}

\begin{equation}
    b = b - \alpha \frac{\partial L}{\partial b}
\end{equation}


where:
- $L$ - cost function
- $\alpha$ - learning rate
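
For a single sample with the squared-error loss $L = (y - \hat{y})^2$, the gradients needed by these update rules follow from the chain rule. The derivation below assumes $\sigma$ is the logistic sigmoid and introduces $z_1 = W_1 \cdot x + b_1$, $\delta_1$, and $\delta_2$ as shorthand; these symbols are notation assumed here.

\begin{equation}
    \delta_2 = \frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \qquad \frac{\partial L}{\partial W_2} = \delta_2 \cdot h^T, \qquad \frac{\partial L}{\partial b_2} = \delta_2
\end{equation}

\begin{equation}
    \delta_1 = \frac{\partial L}{\partial z_1} = (W_2^T \cdot \delta_2) \odot \sigma'(z_1), \qquad \sigma'(z) = \sigma(z)(1 - \sigma(z))
\end{equation}

\begin{equation}
    \frac{\partial L}{\partial W_1} = \delta_1 \cdot x^T, \qquad \frac{\partial L}{\partial b_1} = \delta_1
\end{equation}

Here $\odot$ denotes element-wise multiplication.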


In [None]:
class Perceptron:
    def __init__(self, size_in, size_hidden, size_out):
        self.W1, self.W2, self.b1, self.b2 = self.initialize_parameters(size_in, size_hidden, size_out)

    def initialize_parameters(self, size_in, size_hidden, size_out):
        # hidden-layer weights: U(-1/sqrt(size_in), 1/sqrt(size_in))
        uniform_range = 1/np.sqrt(size_in)
        W1 = np.random.uniform(-uniform_range, uniform_range, (size_hidden, size_in))
        # NOTE: the output weights are drawn from the same range here rather than zeroed
        W2 = np.random.uniform(-uniform_range, uniform_range, (size_out, size_hidden))
        b1 = np.zeros((size_hidden, 1))
        b2 = np.zeros((size_out, 1))
        return W1, W2, b1, b2

    def forward(self, x):
        z1 = np.dot(self.W1, x) + self.b1
        a1 = sigmoid(z1)
        z2 = np.dot(self.W2, a1) + self.b2
        return z1, a1, z2

    def backpropagation(self, x, y, z1, a1, z2):
        dL_dz2 = mse_loss_derivative(y, z2)

        dL_dW2 = np.dot(dL_dz2, a1.T)
        dL_db2 = dL_dz2

        dL_da1 = np.dot(self.W2.T, dL_dz2)
        dL_dz1 = dL_da1 * sigmoid_derivative(z1)
        dL_dW1 = np.dot(dL_dz1, x.T)
        dL_db1 = dL_dz1

        return dL_dW1, dL_db1, dL_dW2, dL_db2

    def update_parameters(self, dW1, db1, dW2, db2, learning_rate):
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2

    def shuffle(self, x, y):
        indices = np.arange(x.shape[1])
        np.random.shuffle(indices)
        return x[:, indices], y[:, indices]

    def train(self, x, y, epochs, learning_rate):
        for i in range(epochs):
            epoch_loss = 0

            # shuffle samples
            x, y = self.shuffle(x, y)

            for j in range(x.shape[1]):
                x_sample = x[:, j].reshape(-1, 1)
                y_sample = y[:, j].reshape(-1, 1)
                z1, a1, z2 = self.forward(x_sample)
                dW1, db1, dW2, db2 = self.backpropagation(x_sample, y_sample, z1, a1, z2)
                self.update_parameters(dW1, db1, dW2, db2, learning_rate)

                # accumulate a plain float so the printed loss is a number, not a (1, 1) array
                epoch_loss += float(np.mean(mse_loss(y_sample, z2)))

            if i % 10 == 0:
                print(f'Epoch {i}, loss: {epoch_loss / x.shape[1]}')

    def predict(self, x):
        _, _, z2 = self.forward(x)
        return z2

In [None]:
mu = 0
b = 1
size_in = 1
size_hidden = 748
size_out = 1

# generate data
x_all = np.linspace(-8, 8, 10_000).reshape(1, -1)
y_all = laplace(x_all, mu, b)

# shuffle data
indices = np.arange(x_all.shape[1])
np.random.shuffle(indices)
x_all_shuffled = x_all[:, indices]
y_all_shuffled = y_all[:, indices]

# split into a training set (first 800 samples) and a test set (remaining 9,200)
x_train = x_all_shuffled[:, :800]
y_train = y_all_shuffled[:, :800]
x_test = x_all_shuffled[:, 800:]
y_test = y_all_shuffled[:, 800:]

x = x_train.reshape(1, -1)
y = y_train.reshape(1, -1)

model = Perceptron(size_in, size_hidden, size_out)
model.train(x, y, epochs=100, learning_rate=0.001)

In [None]:
def visualize_predictions(x, y_true, y_pred, title, xlabel='x', ylabel='f(x)'):
    plt.plot(x, y_true, label='True')
    plt.plot(x, y_pred, label='Predicted')
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    plt.grid()
    plt.show()

In [None]:
# forward() broadcasts over columns, so the whole test set can be predicted in one call
y_preds = model.predict(x_test)[0]

In [None]:
sorted_indices = np.argsort(x_test[0])
x_test_sorted = x_test[0][sorted_indices]
y_test_sorted = y_test[0][sorted_indices]
y_test_preds = np.array(y_preds)[sorted_indices]

In [None]:
visualize_predictions(x_test_sorted, y_test_sorted, y_test_preds, 'Predictions on Test Set')
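
Step 4 of the task, comparing hidden layer sizes, could be approached along the lines of the sketch below. It is a compact, self-contained re-implementation with per-sample gradient descent on the squared error, not the `Perceptron` class above. The function name `train_and_score`, the hidden sizes `(2, 8, 32)`, the dataset size, the epoch count, and the learning rate are all assumptions chosen to keep the run short.

```python
import numpy as np

def laplace(x, mu=0.0, b=1.0):
    return np.exp(-np.abs(x - mu) / b) / (2.0 * b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_and_score(size_hidden, x_train, y_train, x_test, y_test,
                    epochs=20, learning_rate=0.01, seed=0):
    """Train a 1-input, 1-output network; return (MSE, MAE) on the test set."""
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-1.0, 1.0, size=(size_hidden, 1))  # 1/sqrt(1) = 1 for one input
    b1 = np.zeros((size_hidden, 1))
    W2 = np.zeros((1, size_hidden))                     # output weights zeroed
    b2 = np.zeros((1, 1))

    for _ in range(epochs):
        # visit the training samples in a fresh random order each epoch
        for j in rng.permutation(x_train.size):
            x, y = x_train[j], y_train[j]       # scalars
            h = sigmoid(W1 * x + b1)            # (size_hidden, 1)
            y_hat = W2 @ h + b2                 # (1, 1)
            d2 = 2.0 * (y_hat - y)              # dL/dy_hat for squared error
            d1 = (W2.T @ d2) * h * (1.0 - h)    # back through the sigmoid
            W2 -= learning_rate * (d2 @ h.T)
            b2 -= learning_rate * d2
            W1 -= learning_rate * d1 * x
            b1 -= learning_rate * d1

    # predict the whole test set in one vectorized pass
    y_pred = (W2 @ sigmoid(W1 @ x_test.reshape(1, -1) + b1) + b2).ravel()
    mse = float(np.mean((y_test - y_pred) ** 2))
    mae = float(np.mean(np.abs(y_test - y_pred)))
    return mse, mae

# small shuffled dataset on the task's range
rng = np.random.default_rng(0)
x_all = rng.permutation(np.linspace(-8.0, 8.0, 400))
y_all = laplace(x_all)
x_train, y_train = x_all[:300], y_all[:300]
x_test, y_test = x_all[300:], y_all[300:]

results = {}
for size_hidden in (2, 8, 32):
    results[size_hidden] = train_and_score(size_hidden, x_train, y_train, x_test, y_test)
    mse, mae = results[size_hidden]
    print(f"hidden = {size_hidden:3d}: MSE = {mse:.5f}, MAE = {mae:.5f}")
```

Larger hidden layers generally need a smaller learning rate to stay stable, so a fair comparison may require tuning `learning_rate` and `epochs` separately for each size.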