Implement **L1, L2, and ElasticNet** regularization techniques on a single-layered neural network with four input nodes and a bias. Assume the input vector has the values [0.5, 1.5, 1.0, 0.5] and a bias of -1, with no activation function at the output node. The learned parameter vector W = [0.1, 0.2, 0.3, 0.4] should return the expected result of "1.75."


a) Write a Python function to estimate the total loss using L1, L2, and ElasticNet regularization separately. The loss function should be defined as the mean squared error between the predicted output and the expected result. Implement regularization terms for L1, L2, and ElasticNet with regularization strengths (alpha) of 0.01, 0.05, and 0.1, respectively, and **compute the total loss for each regularization technique**.


In [25]:
import numpy as np

# Defining all the values

X = np.array([0.5, 1.5, 1.0, 0.5])
bias = -1
expected_result = 1.75
W = np.array([0.1, 0.2, 0.3, 0.4])


def mse_loss(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

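# L1 (Lasso) penalty: alpha * sum(|w|); the bias is not penalized
def l1_regularization(W, alpha):
    return alpha * np.sum(np.abs(W))

# L2 (Ridge) penalty: 0.5 * alpha * sum(w^2)
def l2_regularization(W, alpha):
    return 0.5 * alpha * np.sum(W ** 2)

# ElasticNet penalty: (1 - rho) weights the L1 part and rho weights the L2 part
def elasticnet_regularization(W, alpha, rho):
    return alpha * ((1 - rho) * np.sum(np.abs(W)) + 0.5 * rho * np.sum(W ** 2))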



# Total loss = MSE data loss + the selected regularization penalty
def total_loss(X, W, bias, expected_result, alpha, regularization_type, rho=None):
    # Linear output node (no activation function)
    y_pred = np.dot(X, W) + bias
    loss = mse_loss(y_pred, expected_result)

    if regularization_type == 'L1':
        regularization_term = l1_regularization(W, alpha)
    elif regularization_type == 'L2':
        regularization_term = l2_regularization(W, alpha)
    elif regularization_type == 'ElasticNet':
        regularization_term = elasticnet_regularization(W, alpha, rho)
    else:
        raise ValueError("Invalid regularization type")

    # Return directly instead of shadowing the function name with a local variable
    return loss + regularization_term



alpha_values = [0.01, 0.05, 0.1]
regularization_types = ['L1', 'L2', 'ElasticNet']
rho = 0.5  # ElasticNet mixing value: equal weight on the L1 and L2 parts

for alpha in alpha_values:
    for regularization_type in regularization_types:
        loss = total_loss(X, W, bias, expected_result, alpha, regularization_type, rho)
        print(f"Total loss with {regularization_type} regularization (alpha={alpha}): {loss}")


Total loss with L1 regularization (alpha=0.01): 3.6199999999999997
Total loss with L2 regularization (alpha=0.01): 3.6115
Total loss with ElasticNet regularization (alpha=0.01): 3.61575
Total loss with L1 regularization (alpha=0.05): 3.6599999999999997
Total loss with L2 regularization (alpha=0.05): 3.6174999999999997
Total loss with ElasticNet regularization (alpha=0.05): 3.63875
Total loss with L1 regularization (alpha=0.1): 3.71
Total loss with L2 regularization (alpha=0.1): 3.625
Total loss with ElasticNet regularization (alpha=0.1): 3.6675
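
The question pairs the strengths with the techniques "respectively", which can also be read as one alpha per technique (L1 with 0.01, L2 with 0.05, ElasticNet with 0.1). The following is a minimal sketch under that reading; it also separates the data loss from the penalty. The `penalties` dictionary is introduced here for illustration, and `rho = 0.5` is the same assumed mixing value used in the cell above, not something the question specifies.

```python
import numpy as np

X = np.array([0.5, 1.5, 1.0, 0.5])
W = np.array([0.1, 0.2, 0.3, 0.4])
bias = -1
expected_result = 1.75
rho = 0.5  # assumed mixing value, as in the cell above

# Linear output node: dot(X, W) + bias is about -0.15
y_pred = np.dot(X, W) + bias
# Squared error for the single sample is about 3.61
data_loss = (y_pred - expected_result) ** 2

l1_norm = np.sum(np.abs(W))   # sum of absolute weights
l2_sq = np.sum(W ** 2)        # sum of squared weights

# One alpha per technique, following the "respectively" reading
penalties = {
    "L1": 0.01 * l1_norm,
    "L2": 0.5 * 0.05 * l2_sq,
    "ElasticNet": 0.1 * ((1 - rho) * l1_norm + 0.5 * rho * l2_sq),
}
totals = {name: data_loss + penalty for name, penalty in penalties.items()}

for name in penalties:
    print(f"{name}: penalty={penalties[name]:.4f}, total={totals[name]:.4f}")
```

Under this pairing the totals match the corresponding rows of the output above (L1 at alpha=0.01, L2 at alpha=0.05, ElasticNet at alpha=0.1). In every case the data loss of about 3.61 dominates the penalty.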


### **Compare** the effects of L1, L2, and ElasticNet regularization techniques by analyzing their impact on the total loss and the learned parameter vector. Discuss the strengths and weaknesses of each regularization technique in terms of controlling model complexity and preventing overfitting.

---




**L1 (Lasso) regularization technique:**


L1 promotes sparsity: it can drive some weights to exactly zero, which acts as built-in feature selection. The penalty grows linearly with the regularization strength, so the total loss rises from 3.62 at alpha=0.01 to 3.71 at alpha=0.1.

L1 works well on high-dimensional data with many features where most of them are irrelevant.

It remains effective on large datasets, but it may remove useful features too aggressively.
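
One way to see why L1 produces exact zeros is the soft-thresholding step used by proximal gradient methods. The sketch below is illustrative only: the helper `soft_threshold` and the threshold of 0.15 are hypothetical choices, not part of the assignment code.

```python
import numpy as np

def soft_threshold(w, threshold):
    """Shrink each weight toward zero by `threshold`; weights smaller than it become exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

W = np.array([0.1, 0.2, 0.3, 0.4])
threshold = 0.15  # hypothetical value, chosen so that one weight falls below it

W_sparse = soft_threshold(W, threshold)
print(W_sparse)
```

The smallest weight (0.1) is set to exactly zero, while the others are reduced by the same fixed amount. This is how L1 ends up deleting weak features.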



**L2 (Ridge) regularization technique:**

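The L2 penalty grows with the square of the weights, so the total loss increases fastest for large weights. For the small weights used here it adds less than L1 does (3.625 versus 3.71 at alpha=0.1).

It can help training converge faster because the penalty is differentiable everywhere and gives smooth updates.

It helps prevent overfitting and does not remove weights completely; it only shrinks them toward zero.

It may not work well when only a few features matter, because it keeps every feature in the model instead of selecting among them.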
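
The shrinking behaviour of L2 can be sketched by applying gradient steps to the penalty term alone. This assumes the `0.5 * alpha * sum(w^2)` form used in part a), whose gradient is `alpha * w`; the learning rate and the number of steps are hypothetical values picked for illustration.

```python
import numpy as np

W = np.array([0.1, 0.2, 0.3, 0.4])
alpha = 0.1
learning_rate = 0.5  # hypothetical value for illustration
steps = 50           # hypothetical value for illustration

w = W.copy()
for _ in range(steps):
    # Gradient of 0.5 * alpha * sum(w^2) is alpha * w, so each step
    # multiplies every weight by (1 - learning_rate * alpha)
    w -= learning_rate * alpha * w

print(w)       # all weights are smaller but none is exactly zero
print(w / W)   # every weight shrank by the same factor
```

Every weight is scaled by the same factor, so their relative sizes are preserved and none of them reaches zero. L2 therefore controls weight magnitude without selecting features.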



**ElasticNet regularization technique:**

This is a combination of both L1 and L2.

The total loss increases based on the combined effect of the L1 and L2 penalties. With rho = 0.5 it falls between the L1 and L2 totals at every alpha (for example, 3.6675 at alpha=0.1).

It can handle situations where L1 alone might be too aggressive by also penalizing large weights with L2 regularization.

Suitable for datasets with correlated features.

May still discard potentially useful features if the L1 penalty dominates.
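
How the mix between the two penalties changes the weight vector can be sketched with a single proximal step on the ElasticNet penalty from part a). The helper `elasticnet_prox`, the step size, and `alpha = 0.25` are hypothetical values chosen for illustration. As in part a), `(1 - rho)` weights the L1 part and `rho` weights the L2 part.

```python
import numpy as np

def elasticnet_prox(w, step, alpha, rho):
    """One proximal step for alpha * ((1 - rho) * |w| + 0.5 * rho * w^2)."""
    l1_weight = alpha * (1 - rho)
    l2_weight = alpha * rho
    # L1 part: soft-threshold, which can set small weights to exactly zero
    shrunk = np.sign(w) * np.maximum(np.abs(w) - step * l1_weight, 0.0)
    # L2 part: uniform multiplicative shrinkage
    return shrunk / (1.0 + step * l2_weight)

W = np.array([0.1, 0.2, 0.3, 0.4])
results = {}
for rho in (0.0, 0.5, 1.0):
    results[rho] = elasticnet_prox(W, step=1.0, alpha=0.25, rho=rho)
    zeros = int(np.sum(results[rho] == 0))
    print(f"rho={rho}: {results[rho]} (zeroed weights: {zeros})")
```

With rho = 0 (pure L1) two weights are zeroed, with rho = 0.5 only one is, and with rho = 1 (pure L2) none are. The more the L1 part dominates, the more weights are discarded.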





## c)

Implement **a multi-layer perceptron** (MLP) to approximate the XOR function using only numpy without using any built-in datasets or libraries. Design a network with one hidden layer containing two neurons and an output layer. Initialize the weights and biases randomly. Train the network using backpropagation for the XOR function with input values (0,0),(0,1),(1,0),(1,1) and target outputs 0,1,1,0, respectively.






In [23]:
import numpy as np

#sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# derivative of the sigmoid, written in terms of the sigmoid output:
# x must already be sigmoid(z), so that sigmoid'(z) = x * (1 - x)
def sigmoid_derivative(x):
    return x * (1 - x)



X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

y = np.array([[0],
              [1],
              [1],
              [0]])




# Fixed seed for reproducibility; weights and biases are drawn uniformly from [0, 1)
np.random.seed(1)
input_neurons = 2
hidden_neurons = 2
output_neurons = 1

weights_input_hidden = np.random.uniform(size=(input_neurons, hidden_neurons))
biases_hidden = np.random.uniform(size=(1, hidden_neurons))

weights_hidden_output = np.random.uniform(size=(hidden_neurons, output_neurons))
biases_output = np.random.uniform(size=(1, output_neurons))

learning_rate = 0.5
epochs = 10000

# Training loop: forward pass followed by backpropagation
for epoch in range(epochs):

    # Forward pass through the hidden and output layers
    hidden_layer_input = np.dot(X, weights_input_hidden) + biases_hidden
    hidden_layer_output = sigmoid(hidden_layer_input)

    output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + biases_output
    predicted_output = sigmoid(output_layer_input)

    # Output-layer delta: error scaled by the sigmoid slope at the output
    error = y - predicted_output
    d_predicted_output = error * sigmoid_derivative(predicted_output)

    # Propagate the delta back to the hidden layer
    error_hidden_layer = d_predicted_output.dot(weights_hidden_output.T)
    d_hidden_layer = error_hidden_layer * sigmoid_derivative(hidden_layer_output)

    # Updates are added because error is defined as (target - prediction)
    weights_hidden_output += hidden_layer_output.T.dot(d_predicted_output) * learning_rate
    biases_output += np.sum(d_predicted_output, axis=0, keepdims=True) * learning_rate

    weights_input_hidden += X.T.dot(d_hidden_layer) * learning_rate
    biases_hidden += np.sum(d_hidden_layer, axis=0, keepdims=True) * learning_rate

# final predictions
print("Final predictions:")
print(predicted_output)

Final predictions:
[[0.01931569]
 [0.98331818]
 [0.9833515 ]
 [0.01725879]]
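
The update rules in the training loop correspond to gradient descent on the squared error `0.5 * sum((y - prediction)^2)`. A finite-difference check is one way to confirm that the backpropagated gradient is correct. The sketch below reuses the same seed and initialization order as the cell above. The helper names `forward` and `loss`, the short weight names, and the step `eps` are assumptions introduced for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Same seed and draw order as the training cell
np.random.seed(1)
W1 = np.random.uniform(size=(2, 2))
b1 = np.random.uniform(size=(1, 2))
W2 = np.random.uniform(size=(2, 1))
b2 = np.random.uniform(size=(1, 1))

def forward(W1, b1, W2, b2):
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    return hidden, output

def loss(W1, b1, W2, b2):
    _, output = forward(W1, b1, W2, b2)
    return 0.5 * np.sum((y - output) ** 2)

# Analytic gradient of the loss with respect to W1, using the same deltas as the loop
hidden, output = forward(W1, b1, W2, b2)
d_output = (y - output) * output * (1 - output)
d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)
analytic_grad_W1 = -(X.T @ d_hidden)  # the loop adds X.T @ d_hidden, i.e. it descends

# Central finite differences, one entry of W1 at a time
eps = 1e-6
numeric_grad_W1 = np.zeros_like(W1)
for i in range(W1.shape[0]):
    for j in range(W1.shape[1]):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric_grad_W1[i, j] = (loss(W_plus, b1, W2, b2) - loss(W_minus, b1, W2, b2)) / (2 * eps)

max_diff = np.max(np.abs(analytic_grad_W1 - numeric_grad_W1))
print("Largest difference between analytic and numeric gradient:", max_diff)
```

A very small difference indicates that the deltas computed in the loop match the true gradient of the squared error. The same check can be repeated for the other weight and bias arrays.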


Given **a convolutional neural network (CNN)** architecture with one convolutional layer followed by a max-pooling layer and a fully connected layer, compute the total number of parameters in the network. Assume the input image size is 32×32, the convolutional layer has 16 filters of size 3×3, the max-pooling layer has a pool size of 2×2, and the fully connected layer has 256 neurons. (10 Marks)



In [22]:

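# Convolutional layer: 16 filters of size 3x3, one bias per filter
conv_filters = 16
conv_filter_size = 3
conv_input_channels = 3  # assumption: RGB input (the question does not state the channel count)
conv_bias = 1

conv_params = conv_filters * (conv_filter_size * conv_filter_size * conv_input_channels + conv_bias)

# The max-pooling layer has no trainable parameters

# Fully connected layer: 256 neurons, one bias per neuron
fc_neurons = 256
# assumption: 'same' padding keeps the conv output at 32x32, and 2x2 pooling
# with stride 2 halves it to 16x16 across the 16 feature maps
fc_input_size = 16 * 16 * 16
fc_bias = 1

fc_params = fc_neurons * (fc_input_size + fc_bias)

total_params = conv_params + fc_params
print("Total number of parameters in the network:", total_params)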


Total number of parameters in the network: 1049280
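
The total depends on two choices the question leaves open: the number of input channels and the padding of the convolution. The sketch below wraps the same arithmetic in a hypothetical helper, `cnn_param_count`, to show how the count changes under other assumptions. It assumes a convolution stride of 1 and a pooling stride equal to the pool size.

```python
def cnn_param_count(input_size=32, in_channels=3, filters=16, kernel=3,
                    pool=2, fc_neurons=256, padding="same"):
    """Count trainable parameters for conv -> max-pool -> fully connected."""
    # Each filter has kernel*kernel*in_channels weights plus one bias
    conv_params = filters * (kernel * kernel * in_channels + 1)
    # 'same' padding keeps the spatial size; 'valid' shrinks it by kernel - 1
    conv_out = input_size if padding == "same" else input_size - kernel + 1
    # Pooling has no parameters; it only reduces the spatial size
    pooled = conv_out // pool
    # Each fully connected neuron sees the flattened feature maps plus one bias
    fc_params = fc_neurons * (pooled * pooled * filters + 1)
    return conv_params + fc_params

print(cnn_param_count())                  # RGB input, 'same' padding: 1049280
print(cnn_param_count(in_channels=1))     # grayscale input: 1048992
print(cnn_param_count(padding="valid"))   # no padding: 922304
```

The fully connected layer dominates the total in every case, so the padding assumption matters far more than the channel count.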
