<a href="https://colab.research.google.com/github/joshuahurd515/ai-and-data-science-work/blob/main/Project1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Below is my implementation of a multilayer perceptron (MLP).

In [None]:
import numpy as np
import pandas as pd

class MLP:
    def __init__(self, num_inputs, num_hidden, num_outputs, learning_rate=0.005):
        ##initialize the class with the variables below
        self.num_inputs = num_inputs
        self.num_hidden = num_hidden
        self.num_outputs = num_outputs
        self.learning_rate = learning_rate

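        ##weights start uniformly in [-1, 1]; biases start at 1, one per unit in the layer
        self.W1 = np.random.uniform(-1, 1, (num_inputs, num_hidden))
        self.b1 = np.ones((1, num_hidden))
        self.W2 = np.random.uniform(-1, 1, (num_hidden, num_outputs))
        self.b2 = np.ones((1, num_outputs))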

    def sigmoid(self, x):
        ##sigmoid activation, used for both the hidden layer and the output layer
        ##the input is clipped to [-500, 500] so np.exp does not overflow
        return 1 / (1 + np.exp((-np.clip(x, -500, 500))))

    def sigmoid_derivative(self, a):
        ##derivative of the sigmoid, written in terms of its output a = sigmoid(z)
        return a * (1 - a)

    def mse_loss(self, y_true, y_pred):
        ##using mean of square errors as my loss function
        return np.mean(np.square(y_true - y_pred))

    def forward(self, X):
        ##forward pass: inputs -> hidden layer -> output layer

        ##z1 is the hidden-layer logit: X times W1 (the input-to-hidden weights) plus the bias b1
        ##a1 is the hidden activation: the logit passed through the sigmoid function
        ##z2 and a2 repeat the same two steps for the output layer
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, y_pred):
        ##backpropagation: compute the gradient terms for the hidden-to-output weights (W2, b2)
        ##  and then for the input-to-hidden weights (W1, b1)
        error = y - y_pred
        delta = error * self.sigmoid_derivative(y_pred)
        dW2 = np.dot(self.a1.T, delta)
        db2 = np.sum(delta, axis=0, keepdims=True)
        delta_hidden = np.dot(delta, self.W2.T) * self.sigmoid_derivative(self.a1)
        dW1 = np.dot(X.T, delta_hidden)
        db1 = np.sum(delta_hidden, axis=0, keepdims=True)

        ##once the gradient terms are computed, use them to update the weights and biases
        self.W1 += self.learning_rate * dW1
        self.b1 += self.learning_rate * db1
        self.W2 += self.learning_rate * dW2
        self.b2 += self.learning_rate * db2

    def train(self, X, y, epochs=200):
        ##each epoch runs a forward pass on the full training set and then backpropagates the error
        for epoch in range(epochs):
            y_pred = self.forward(X)
            self.backward(X, y, y_pred)

    def predict(self, X):
        ##run the forward pass and take the most activated output unit as the prediction
        y_pred = self.forward(X)
        return np.argmax(y_pred, axis=1) + 1  # index of the largest output, shifted to the 1-based class labels

##load the wine data set (the file has no header row)
data = pd.read_csv("wine.data", header=None)

##separate the features (columns 1 onward) from the class labels (column 0)
X = data.iloc[:, 1:].values
y = data.iloc[:, 0].values

#one-hot encode the target variable
num_classes = len(np.unique(y))
y_onehot = np.zeros((len(y), num_classes))
y_onehot[np.arange(len(y)), y-1] = 1

##normalize the data (disabled in the final run; see the note at the end)
##X = (X - np.mean(X, axis=0)) / np.std(X, axis=0)

#shuffle data (seed(None) reseeds from the OS, so every run produces a different split)
np.random.seed(None)
shuffle_idx = np.random.permutation(len(y))
X, y_onehot = X[shuffle_idx], y_onehot[shuffle_idx]

#define number of folds
num_folds = 5
fold_size = len(y) // num_folds

#initialize list to store accuracies
accuracies = []

#perform cross validation
for fold in range(num_folds):
    #split data into training and validation sets
    start_idx = fold * fold_size
    end_idx = (fold + 1) * fold_size
    val_X, val_y = X[start_idx:end_idx], y_onehot[start_idx:end_idx]
    train_X = np.concatenate((X[:start_idx], X[end_idx:]), axis=0)
    train_y = np.concatenate((y_onehot[:start_idx], y_onehot[end_idx:]), axis=0)

    #initialize multi layer perceptron
    num_inputs = X.shape[1]
    num_hidden = 8
    num_outputs = num_classes
    learning_rate = 0.005
    mlp = MLP(num_inputs, num_hidden, num_outputs, learning_rate)

    #train MLP
    mlp.train(train_X, train_y, epochs=200)

    #test multi layer perceptron on validation set and calculate accuracy
    val_y_pred = mlp.predict(val_X)
    val_y_true = np.argmax(val_y, axis=1) + 1
    accuracy = sum(val_y_pred == val_y_true) / len(val_y_true)
    accuracies.append(accuracy)

#print the accuracy of each fold and the mean accuracy across folds
print("Individual Accuracies: ", accuracies)
print("Mean of Individual Accuracies: ", np.mean(accuracies))

Individual Accuracies:  [0.4857142857142857, 0.45714285714285713, 0.2857142857142857, 0.37142857142857144, 0.37142857142857144]
Mean of Individual Accuracies:  0.3942857142857143


**Outputs from running the code:**

Above are the outputs from running the code. In this run the mean accuracy is about 39%, and from run to run it generally falls between roughly 30% and 40%, which is close to what guessing among three classes would give.
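
One possible reason for the near-chance accuracy on unnormalized data, offered here as an assumption rather than something verified on this run, is sigmoid saturation. When a feature has values in the hundreds or thousands, the logit becomes very large, the sigmoid output rounds to exactly 0 or 1, and the `x * (1 - x)` factor in `sigmoid_derivative` becomes 0, so the weights barely update. The sketch below illustrates this with invented numbers (`x_raw`, `x_scaled`, and `w` are hypothetical and not taken from `wine.data`).

```python
import numpy as np

def sigmoid(x):
    # Same clipped sigmoid as in the MLP class above
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

def sigmoid_derivative(a):
    # Derivative expressed in terms of the sigmoid output a
    return a * (1 - a)

# Hypothetical weights drawn from the same [-1, 1] range used for W1
w = np.array([[0.5], [-0.3], [0.4]])

# Hypothetical raw row: the third feature is on a much larger scale
x_raw = np.array([[13.0, 2.0, 1000.0]])
a_raw = sigmoid(x_raw @ w + 1)           # logit is dominated by the large feature
grad_raw = sigmoid_derivative(a_raw)     # saturated: this factor is exactly 0.0

# Hypothetical row after standardization: all features on a similar scale
x_scaled = np.array([[0.2, -0.5, 1.1]])
a_scaled = sigmoid(x_scaled @ w + 1)
grad_scaled = sigmoid_derivative(a_scaled)  # clearly non-zero, so learning can proceed

print("raw:   ", a_raw, grad_raw)
print("scaled:", a_scaled, grad_scaled)
```

If the same effect is present in the real data, it would be consistent with the network staying near its random initialization for all 200 epochs.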

The odd thing is that when I normalized the data, the accuracy was very high, around 97%. I suspect I may have done something wrong, because the individual accuracies from the 5-fold cross-validation were sometimes 100%, which did not make much sense to me.
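
The commented-out normalization line in the code computes the mean and standard deviation over the whole data set, validation rows included. One way to rule that out as the source of the high accuracy is to compute the statistics from the training fold only and reuse them on the validation fold. The helper below is a minimal sketch of that idea; `standardize_fold` is a hypothetical name, and the demo arrays are synthetic rather than the wine data.

```python
import numpy as np

def standardize_fold(train_X, val_X):
    # Statistics come from the training rows only, so nothing leaks from validation
    mean = train_X.mean(axis=0)
    std = train_X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (train_X - mean) / std, (val_X - mean) / std

# Synthetic stand-in for one fold: two features on very different scales
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(50, 2))
demo_val = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(10, 2))

train_scaled, val_scaled = standardize_fold(demo_train, demo_val)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Inside the cross-validation loop this would be called as `train_X, val_X = standardize_fold(train_X, val_X)` just before `mlp.train(...)`. Whether the accuracy stays high under this stricter setup has not been checked here.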


**Model Details:**

The above model implements a multilayer perceptron for classifying the wine data set. The MLP has an input layer with one node per feature column in the data set and a hidden layer with eight nodes. It also has an output layer with one node per class in the target. The model is trained with forward and backward propagation and uses the sigmoid activation function on both the hidden layer and the output layer (a softmax is not applied to the output). The loss function I chose is the mean of squared errors. Lastly, the model is evaluated with 5-fold cross-validation, and the average accuracy across the folds is reported.
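
For comparison, the sketch below shows what a softmax output would look like next to the sigmoid that `forward` applies to `z2` in the code above. The `softmax` function and the example logits are illustrative assumptions and are not part of the `MLP` class.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    z_shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

def sigmoid(z):
    # Same clipped sigmoid as in the MLP class
    return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

# Hypothetical output-layer logits for two samples and three classes
z2 = np.array([[2.0, 1.0, 0.1],
               [0.5, 0.5, 0.5]])

softmax_out = softmax(z2)   # each row sums to 1, so it reads as class probabilities
sigmoid_out = sigmoid(z2)   # each unit is squashed independently; rows need not sum to 1

print(softmax_out.sum(axis=1))
print(sigmoid_out.sum(axis=1))
```

Both functions are increasing in each logit, so `np.argmax` picks the same class from either output for the same logits. The difference shows up in training, where the gradients flowing back from the output layer are not the same.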

(Note: As mentioned above, I did not normalize the data in the final result, because the accuracy of around 97% that I got with normalization did not make sense to me. Without normalization, the final average accuracy is around 34-43%.)