<hr>
<h1 align="center">
  Introduction to Neural Networks
</h1>
<hr>


<h2>Mathematical Formalization</h2>

<h3>Dataset</h3>

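1. Purpose of the training, validation, and test sets:
   - Training: used to fit the model; its examples drive the weight updates.
   - Validation: used to tune hyperparameters and to detect overfitting (for example, for early stopping); it is never used to update the weights.
   - Test: used once, at the end, to estimate how the final model performs on unseen data.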


2.  Influence of the number of examples (N):
    - A larger N generally improves generalization, because the model sees more of the variation in the data.
    - A small N makes overfitting more likely: the model can memorize the training examples instead of learning patterns that transfer.
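The three-way split described above can be sketched as follows. This is a minimal illustration, not the notebook's own procedure: the helper name `split_indices`, the 80/10/10 proportions, and the seed are all assumptions made for the example.

```python
import torch

def split_indices(n_examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle 0..n_examples-1 and cut it into train/val/test index tensors."""
    gen = torch.Generator().manual_seed(seed)       # fixed seed for a reproducible split
    perm = torch.randperm(n_examples, generator=gen)
    n_val = int(n_examples * val_frac)
    n_test = int(n_examples * test_frac)
    n_train = n_examples - n_val - n_test
    train_idx = perm[:n_train]
    val_idx = perm[n_train:n_train + n_val]
    test_idx = perm[n_train + n_val:]
    return train_idx, val_idx, test_idx

# Hypothetical dataset of N = 1000 examples
train_idx, val_idx, test_idx = split_indices(1000)
```

Indexing the data tensors with these index tensors (for example `X[train_idx]`) yields disjoint subsets, so no validation or test example leaks into training.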


<h3>Network Architecture (Forward Phase)</h3>

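3. Importance of activation functions:
   - They introduce non-linearity, enabling the network to learn complex patterns. Without them, a stack of linear layers collapses into a single affine map, however many layers it has.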
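To see why the non-linearity matters, the sketch below checks numerically that two linear layers with nothing between them are equivalent to one linear layer, and that inserting `tanh` breaks this equivalence. The sizes and random weights are illustrative assumptions, not values from Figure 1.

```python
import torch

torch.manual_seed(0)
nx, nh, ny = 4, 8, 3            # illustrative sizes
x = torch.randn(5, nx)          # batch of 5 inputs, one per row

W1, b1 = torch.randn(nh, nx), torch.randn(nh)
W2, b2 = torch.randn(ny, nh), torch.randn(ny)

# Two stacked linear layers with no activation in between
two_layers = (x @ W1.t() + b1) @ W2.t() + b2

# The same map written as a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = x @ W.t() + b

# Inserting tanh between the layers breaks the equivalence
with_tanh = torch.tanh(x @ W1.t() + b1) @ W2.t() + b2

collapses = torch.allclose(two_layers, one_layer, atol=1e-3)
differs = not torch.allclose(with_tanh, one_layer, atol=1e-3)
```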


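4.  Sizes (nx, nh, ny) in Figure 1:
    - nx: input size, nh: hidden layer size, ny: output size.
    - nx and ny are fixed by the data and the task (number of input features and number of classes; for MNIST, nx = 784 and ny = 10), whereas nh is a hyperparameter to be chosen.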


5.  $y$ vs. $\hat{y}$:
    - $y$: ground-truth label.
    - $\hat{y}$: model prediction. The loss function measures the discrepancy between the two.


6.  SoftMax usage:
    - Converts the raw output scores $\tilde{y}$ into non-negative probabilities that sum to 1, which suits multi-class classification.


7.  Forward equations:
    - $\tilde{h} = W_h x + b_h$
    - $h = \tanh(\tilde{h})$
    - $\tilde{y} = W_y h + b_y$
    - $\hat{y} = \mathrm{SoftMax}(\tilde{y})$
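The SoftMax in the last forward equation can be sketched as below. Subtracting the row maximum before exponentiating is a common stabilization trick that leaves the result unchanged; the helper name `softmax_rows` and the example scores are assumptions made for illustration.

```python
import torch

def softmax_rows(scores):
    """Row-wise SoftMax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(dim=1, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=1, keepdim=True)

# The second row would overflow a naive exp(1000.0) in float32
scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1000.0, 1000.0, 1000.0]])
probs = softmax_rows(scores)
```

Each row of `probs` is non-negative and sums to 1, and the result agrees with PyTorch's built-in `torch.softmax(scores, dim=1)`.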


<h3>Cost Function</h3>

<h3>Learning Method</h3>

<h2>Implementation</h2>

In [65]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

In [68]:
def init_params(nx, nh, ny):
    params = {
        "Wh": torch.randn(nh, nx) * 0.3,
        "bh": torch.zeros(nh),
        "Wy": torch.randn(ny, nh) * 0.3,
        "by": torch.zeros(ny),
    }
    for key in params:
        params[key].requires_grad = True

    return params

In [70]:
def forward(params, X):
    # Hidden layer: affine map followed by tanh (rows of X are examples)
    h_pre = torch.mm(X, params["Wh"].t()) + params["bh"]
    h = torch.tanh(h_pre)
    # Output layer: affine map followed by a numerically stable SoftMax
    y_pre = torch.mm(h, params["Wy"].t()) + params["by"]
    y_hat = torch.softmax(y_pre, dim=1)
    # Cache the input and the intermediate values that backward() needs
    outputs = {"X": X, "h_pre": h_pre, "h": h, "y_pre": y_pre}
    return y_hat, outputs

In [73]:
def loss_accuracy(Yhat, Y):
    # Y is one-hot, so argmax recovers the class index
    _, Y_pred = torch.max(Yhat, 1)
    _, Y_true = torch.max(Y, 1)
    # Mean cross-entropy over the batch; 1e-10 guards against log(0)
    loss = -torch.mean(torch.sum(Y * torch.log(Yhat + 1e-10), dim=1))
    accuracy = torch.mean((Y_pred == Y_true).float())
    return loss, accuracy

In [74]:
def backward(params, outputs, Y):
    X, h = outputs["X"], outputs["h"]
    with torch.no_grad():  # gradients are derived by hand, so autograd tracking is unnecessary
        Yhat = torch.softmax(outputs["y_pre"], dim=1)
        # Gradient of the mean cross-entropy w.r.t. y_pre (hence the division by batch size)
        grad_y_pre = (Yhat - Y) / Y.size(0)
        grad_Wy = torch.mm(grad_y_pre.t(), h)
        grad_by = torch.sum(grad_y_pre, dim=0)
        grad_h = torch.mm(grad_y_pre, params["Wy"])
        grad_h_pre = grad_h * (1 - h ** 2)  # derivative of tanh is 1 - tanh^2
        grad_Wh = torch.mm(grad_h_pre.t(), X)
        grad_bh = torch.sum(grad_h_pre, dim=0)

    return {"Wy": grad_Wy, "by": grad_by, "Wh": grad_Wh, "bh": grad_bh}
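A hand-written backward pass is easy to get subtly wrong, so it can help to compare it against autograd. The sketch below is self-contained and rebuilds a small network with made-up sizes and random data. It assumes the loss is the batch-mean cross-entropy, which is why the hand-derived gradient is divided by the batch size `n`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, nx, nh, ny = 5, 4, 6, 3                       # small illustrative sizes
X = torch.randn(n, nx)
Y = F.one_hot(torch.randint(0, ny, (n,)), num_classes=ny).float()

params = {
    "Wh": (torch.randn(nh, nx) * 0.3).requires_grad_(),
    "bh": torch.zeros(nh, requires_grad=True),
    "Wy": (torch.randn(ny, nh) * 0.3).requires_grad_(),
    "by": torch.zeros(ny, requires_grad=True),
}

# Forward pass and mean cross-entropy loss, tracked by autograd
h = torch.tanh(X @ params["Wh"].t() + params["bh"])
Yhat = torch.softmax(h @ params["Wy"].t() + params["by"], dim=1)
loss = -torch.mean(torch.sum(Y * torch.log(Yhat), dim=1))
loss.backward()                                   # fills params[k].grad

# Hand-derived gradients of the same loss
with torch.no_grad():
    grad_y_pre = (Yhat - Y) / n
    manual = {
        "Wy": grad_y_pre.t() @ h,
        "by": grad_y_pre.sum(dim=0),
    }
    grad_h_pre = (grad_y_pre @ params["Wy"]) * (1 - h ** 2)
    manual["Wh"] = grad_h_pre.t() @ X
    manual["bh"] = grad_h_pre.sum(dim=0)

# Largest absolute disagreement per parameter; values near zero mean the two agree
max_errors = {k: (manual[k] - params[k].grad).abs().max().item() for k in params}
```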

In [75]:
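def sgd(params, grads, eta):
    # Parameters are leaf tensors with requires_grad=True, so in-place updates must run under no_grad()
    with torch.no_grad():
        for key in params:
            params[key] -= eta * grads[key]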

In [78]:
def train_neural_network(
    X_train, Y_train, X_test, Y_test, nx, nh, ny, n_epochs=1000, batch_size=32, eta=0.01
):
    params = init_params(nx, nh, ny)
    for epoch in range(n_epochs):
        indices = torch.randperm(X_train.size(0))
        X_train = X_train[indices]
        Y_train = Y_train[indices]
        for i in range(0, X_train.size(0), batch_size):
            X_batch = X_train[i : i + batch_size]
            Y_batch = Y_train[i : i + batch_size]
            Yhat, outputs = forward(params, X_batch)
            loss, accuracy = loss_accuracy(Yhat, Y_batch)
            grads = backward(params, outputs, Y_batch)
            sgd(params, grads, eta)
        if epoch % 100 == 0:
            Yhat_test, _ = forward(params, X_test)
            test_loss, test_accuracy = loss_accuracy(Yhat_test, Y_test)
            print(
                f"Epoch {epoch}: Train Loss = {loss.item():.4f}, Test Accuracy = {test_accuracy.item():.4f}"
            )

    return params
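`train_neural_network` passes `Y_train` and `Y_test` to `loss_accuracy`, which takes an argmax over each row of `Y` and therefore expects one-hot targets. As a sketch, assuming integer class labels as the starting point (the toy `logits` and `labels` below are made up for illustration), `torch.nn.functional.one_hot` builds such targets, and the cross-entropy formula used in `loss_accuracy` can be compared with PyTorch's built-in `F.cross_entropy`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, ny = 6, 3
logits = torch.randn(n, ny)                       # stand-in for the pre-SoftMax scores
labels = torch.randint(0, ny, (n,))               # integer class labels in [0, ny)
Y = F.one_hot(labels, num_classes=ny).float()     # one-hot targets, shape (n, ny)

# Same formula as in loss_accuracy: mean cross-entropy with a small epsilon
Yhat = torch.softmax(logits, dim=1)
manual_loss = -torch.mean(torch.sum(Y * torch.log(Yhat + 1e-10), dim=1))

# Built-in reference; it takes raw scores and integer labels, not probabilities
reference_loss = F.cross_entropy(logits, labels)
```

The two values should agree up to the effect of the `1e-10` epsilon and floating-point rounding.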

<h3>Application to MNIST</h3>

In [93]:
def train_svm_on_circles():
    np.random.seed(42)
    def make_circles(n_samples=1000, noise=0.1, factor=0.5):
        n = n_samples // 2
        theta = np.linspace(0, 2*np.pi, n)
        inner_x = factor * np.cos(theta)
        inner_y = factor * np.sin(theta)
        inner_labels = np.zeros(n)
        outer_x = np.cos(theta)
        outer_y = np.sin(theta)
        outer_labels = np.ones(n)
        X = np.vstack([
            np.column_stack([inner_x, inner_y]),
            np.column_stack([outer_x, outer_y])
        ])
        y = np.concatenate([inner_labels, outer_labels])
        X += np.random.normal(0, noise, X.shape)
        
        return X, y

    X, y = make_circles(n_samples=1000, noise=0.1)
    
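    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    import matplotlib.pyplot as plt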
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    svm = SVC(kernel='rbf')
    svm.fit(X_train, y_train)
    
    y_pred = svm.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    plt.figure(figsize=(10, 5))
    
    plt.subplot(121)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.RdYlBu)
    plt.title('Training Data')
    
    plt.subplot(122)
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    plt.contourf(xx, yy, Z, alpha=0.4)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)
    plt.title(f'SVM Decision Boundary (Accuracy: {accuracy:.2f})')
    
    plt.tight_layout()
    plt.show()
    
    print(f"SVM Accuracy: {accuracy:.4f}")
    
    return svm



In [92]:
def main():
    """
    Main function to run all implementations
    """
    print("Training Neural Network on MNIST:")
    mnist_model = train_mnist_neural_network()
    
    print("\nTraining SVM on Circles Dataset:")
    svm_model = train_svm_on_circles()

if __name__ == "__main__":
    main()

Training Neural Network on MNIST:
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)>

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)>



RuntimeError: Error downloading train-images-idx3-ubyte.gz