<a href="https://colab.research.google.com/github/rahibvk/AI-Based-Web-Application-Development-and-Deployment/blob/main/student_27_notebook_rahib_vadakke_koleth.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
df = pd.read_csv('student_27_dataset.csv')

In [None]:
# Check for missing values
print(df.isnull().sum())

numeric_1        0
numeric_2        0
numeric_3        0
binary_1         0
binary_2         0
categorical_1    0
Personal_ID      0
target           0
dtype: int64


In [None]:
# Impute missing values (mean imputation for numeric columns).
# The check above found no missing values, so this is a no-op here; it is kept as a safeguard.
for col in ['numeric_1', 'numeric_2', 'numeric_3']:
    df[col] = df[col].fillna(df[col].mean())
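The imputation above computes each column mean on the full dataset before the train/test split. Because the check above found no missing values, this changes nothing here, but on data with gaps it would let test-set values influence the training features. A minimal leakage-safe sketch, assuming scikit-learn's `SimpleImputer` and a small hypothetical frame standing in for `student_27_dataset.csv`, fits the imputer on the training split only:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real dataset (not from the notebook)
toy = pd.DataFrame({
    "numeric_1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 8.0],
    "numeric_2": [10.0, np.nan, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    "target":    [0, 1, 0, 1, 0, 1, 0, 1],
})
X_toy = toy.drop(columns="target")
y_toy = toy["target"]
Xtr, Xte, ytr, yte = train_test_split(X_toy, y_toy, test_size=0.25, random_state=0)

imputer = SimpleImputer(strategy="mean")
Xtr_imp = imputer.fit_transform(Xtr)  # column means learned from training rows only
Xte_imp = imputer.transform(Xte)      # test gaps filled with the training means

print(imputer.statistics_)  # per-column training means
```

In a full workflow the imputer would usually sit inside a `Pipeline` with the scaler and model, so the same rule holds automatically during cross-validation.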


In [None]:
# Use one-hot encoding for categorical column 'categorical_1'
df = pd.get_dummies(df, columns=['categorical_1'], drop_first=True)
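With `drop_first=True`, the first level (in sorted order) becomes the baseline and is represented by all-zero indicator columns, which removes the one column that would otherwise be fully determined by the others. A small sketch on a hypothetical three-level column (the real levels of `categorical_1` are not shown in the notebook):

```python
import pandas as pd

# Hypothetical three-level category
toy = pd.DataFrame({"categorical_1": ["a", "b", "c", "a"]})

full = pd.get_dummies(toy, columns=["categorical_1"])
reduced = pd.get_dummies(toy, columns=["categorical_1"], drop_first=True)

print(list(full.columns))     # ['categorical_1_a', 'categorical_1_b', 'categorical_1_c']
print(list(reduced.columns))  # ['categorical_1_b', 'categorical_1_c']
print(reduced.loc[0].tolist())  # level 'a' is encoded as all zeros/False
```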

In [None]:
# Drop irrelevant columns (e.g., 'Personal_ID')
df.drop(['Personal_ID'], axis=1, inplace=True)

In [None]:
# Define features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

In [None]:
# Split the data into 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
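The split above is not stratified. In this run the test set happens to be balanced (14 samples per class in the classification report), but that is not guaranteed in general. Passing `stratify=y` keeps the class proportions the same in both splits. A sketch on hypothetical imbalanced labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 30 of class 0 and 10 of class 1 (25% positives)
X_demo = np.arange(40).reshape(-1, 1)
y_demo = np.array([0] * 30 + [1] * 10)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(y_tr.mean(), y_te.mean())  # 0.25 0.25
```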

In [None]:
# Initialize the StandardScaler
scaler = StandardScaler()


In [None]:
# Scale the training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
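Fitting the scaler on the training split only, as above, keeps test-set statistics out of preprocessing. A quick check on hypothetical random data shows what this means: the scaled training features have mean 0 and standard deviation 1, while the test features are transformed with the training statistics and are only approximately standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical feature matrices drawn with mean 5 and std 2
X_tr = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_te = rng.normal(loc=5.0, scale=2.0, size=(25, 3))

scaler_demo = StandardScaler()
X_tr_s = scaler_demo.fit_transform(X_tr)  # learns mean_ and scale_ from training rows
X_te_s = scaler_demo.transform(X_te)      # reuses the training statistics

print(X_tr_s.mean(axis=0).round(6), X_tr_s.std(axis=0).round(6))  # approximately 0s and 1s
print(X_te_s.mean(axis=0).round(3))  # near 0, but not exactly
```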

In [None]:
# Initialize the MLP with two hidden layers: 50 neurons and 25 neurons
mlp = MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=1000, random_state=42)

In [None]:
# Train the MLP on the scaled training data
mlp.fit(X_train_scaled, y_train)
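`hidden_layer_sizes=(50, 25)` fixes the shapes of the weight matrices that `fit()` learns. A sketch on hypothetical synthetic data (`make_classification` with 6 features, standing in for the notebook's features) inspects them through the fitted model's `coefs_` attribute. For a binary target, scikit-learn uses a single output unit with a logistic activation.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-in data: 200 samples, 6 features, binary target
X_syn, y_syn = make_classification(n_samples=200, n_features=6, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=1000, random_state=42)
model.fit(X_syn, y_syn)

for i, W in enumerate(model.coefs_):
    print(f"layer {i} -> {i + 1}: weights {W.shape}")
# layer 0 -> 1: weights (6, 50)
# layer 1 -> 2: weights (50, 25)
# layer 2 -> 3: weights (25, 1)
print("output activation:", model.out_activation_)  # logistic
print("iterations run:", model.n_iter_)
```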

In [None]:
# Make predictions on the test data
y_pred = mlp.predict(X_test_scaled)

In [None]:
# Calculate and print the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 39.29%


In [None]:
# Print classification report
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.41      0.50      0.45        14
           1       0.36      0.29      0.32        14

    accuracy                           0.39        28
   macro avg       0.39      0.39      0.39        28
weighted avg       0.39      0.39      0.39        28



In [None]:
# Print confusion matrix
print(confusion_matrix(y_test, y_pred))

[[ 7  7]
 [10  4]]
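The metrics in the report can be recovered by hand from this confusion matrix, where rows are true classes and columns are predicted classes (scikit-learn's convention). A small sketch using the counts printed above:

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[7, 7],
               [10, 4]])

acc = np.trace(cm) / cm.sum()               # (7 + 4) / 28
precision = np.diag(cm) / cm.sum(axis=0)    # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)       # correct / all truly in that class

print(round(acc, 4))        # 0.3929
print(precision.round(2))   # [0.41 0.36]
print(recall.round(2))      # [0.5  0.29]
```

The model predicts class 0 for 17 of the 28 test samples, which is why class 0 has the higher recall.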


In [None]:
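# Example of an alternative MLP configuration (larger hidden layers, tanh activation).
# Note: learning_rate='adaptive' only takes effect with solver='sgd'; with the default
# solver ('adam') it is ignored. No random_state is set, so results vary between runs.
mlp_tuned = MLPClassifier(hidden_layer_sizes=(100, 50), activation='tanh', learning_rate='adaptive', max_iter=1000)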
mlp_tuned.fit(X_train_scaled, y_train)

In [None]:
# Evaluate the tuned model
y_pred_tuned = mlp_tuned.predict(X_test_scaled)
accuracy_tuned = accuracy_score(y_test, y_pred_tuned)
print(f'Tuned Model Accuracy: {accuracy_tuned * 100:.2f}%')

Tuned Model Accuracy: 42.86%
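Comparing configurations by test-set accuracy uses up the test set as a selection tool, and with only 28 test samples the gap between 42.86% and 39.29% is a single prediction (12 versus 11 correct). A common alternative, sketched here with hypothetical synthetic data and an illustrative grid, is to choose hyperparameters by cross-validation on the training split with `GridSearchCV` and evaluate on the test set once at the end:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; in the notebook this would be X and y
X_syn, y_syn = make_classification(n_samples=200, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# The scaler inside the pipeline is refit on each CV training fold
pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=42))

# Illustrative grid; keys follow the "<step name>__<parameter>" convention
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(50, 25), (100, 50)],
    "mlpclassifier__activation": ["relu", "tanh"],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_tr, y_tr)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(search.score(X_te, y_te), 3))
```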


# Theoretical Questions for Student 27


**Question 1:** Explain the difference between removing rows with missing values and imputing them. When would you choose one method over the other?

*Your answer:*
Removing rows with missing values (listwise deletion) discards every row in which at least one column is missing. It is simple, but it can throw away useful data, especially when many rows contain missing values.

Use case: Remove rows when the dataset is large, missing values are rare, and they occur at random, so that dropping the affected rows does not noticeably change the analysis or the model's performance.

Imputing missing values, as done in the code above, means replacing them with an estimate: either a simple statistic (mean, median, or mode) or a prediction from a more sophisticated method such as a regression or machine learning model.

Use case: Impute when a substantial share of rows have missing values and dropping them would discard too much information. Imputation keeps every row, which can help model performance, although the filled-in values are estimates rather than observed data.

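For the numeric columns, the code uses mean imputation, replacing each missing value with its column mean (the check found no missing values in this dataset, so the step changed nothing here). Mean imputation keeps all rows, but it shrinks the column's variance and can introduce bias if the values are not missing at random. The means should also be computed on the training split only, so that test data does not leak into preprocessing.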
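A toy sketch of the two options with pandas (hypothetical data, not the notebook's): `dropna()` loses rows, while `fillna()` with the column mean keeps them but shrinks the column's spread, because every filled value sits exactly at the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with two gaps
toy = pd.DataFrame({"numeric_1": [2.0, np.nan, 4.0, 6.0, np.nan],
                    "target": [0, 1, 0, 1, 1]})

dropped = toy.dropna()                                         # row removal
imputed = toy.fillna({"numeric_1": toy["numeric_1"].mean()})   # mean imputation

print(len(toy), len(dropped), len(imputed))  # 5 3 5
print(imputed["numeric_1"].tolist())         # [2.0, 4.0, 4.0, 6.0, 4.0]
print(round(toy["numeric_1"].std(), 3), round(imputed["numeric_1"].std(), 3))  # 2.0 1.414
```

Here both dropped rows have `target == 1`, so row removal also shifts the class balance, a small example of the bias that arises when values are not missing at random.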


**Question 2:** What is the role of backpropagation in training a Multilayer Perceptron?

*Your answer:*
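Backpropagation is the algorithm that computes the gradients needed to train a Multilayer Perceptron (MLP). It determines how much each weight and bias contributed to the prediction error, so that an optimizer can adjust them to reduce that error. Training repeats the following steps: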

1. **Forward pass:** The input data passes through the network layer by layer to produce an output (the prediction).
2. **Error calculation:** A loss function measures the error between the predicted output and the actual target.
3. **Backward pass (backpropagation):** Using the chain rule of calculus, the error is propagated backward through the network, layer by layer, to compute the gradient of the loss (its partial derivatives) with respect to every weight and bias.
4. **Weight update:** An optimization algorithm such as gradient descent moves each weight in the direction opposite to its gradient, scaled by a learning rate, so that the loss decreases.
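For a single training example, the four steps can be written compactly. The notation is assumed here for illustration: layer $l$ has weights $W^{(l)}$, biases $b^{(l)}$, pre-activation $z^{(l)}$ and activation $a^{(l)}$, with input $a^{(0)} = x$; $L$ is the output layer, $\mathcal{L}$ the loss, $\sigma$ the activation function and $\eta$ the learning rate.

$$
\begin{aligned}
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\big(z^{(l)}\big) \\
\delta^{(L)} &= \frac{\partial \mathcal{L}}{\partial z^{(L)}}, \qquad
\delta^{(l)} = \Big[\big(W^{(l+1)}\big)^{\top} \delta^{(l+1)}\Big] \odot \sigma'\big(z^{(l)}\big) \\
\frac{\partial \mathcal{L}}{\partial W^{(l)}} &= \delta^{(l)} \big(a^{(l-1)}\big)^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)} \\
W^{(l)} &\leftarrow W^{(l)} - \eta \, \frac{\partial \mathcal{L}}{\partial W^{(l)}}, \qquad
b^{(l)} \leftarrow b^{(l)} - \eta \, \frac{\partial \mathcal{L}}{\partial b^{(l)}}
\end{aligned}
$$

The first line is the forward pass, the second and third lines are the backward pass (each $\delta^{(l)}$ reuses $\delta^{(l+1)}$ through the chain rule), and the last line is the gradient descent update.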

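In short, backpropagation lets the MLP learn from its errors: repeating the forward pass, backward pass, and weight update over many iterations gradually reduces the loss and improves the predictions. Without an efficient way to compute these gradients, gradient-based training of a multilayer network would be impractical. In the notebook above, `mlp.fit()` runs this loop internally, using scikit-learn's default Adam optimizer for the weight updates.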
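A from-scratch sketch of these steps in NumPy, for a hypothetical network with one hidden layer of tanh units, a sigmoid output and binary cross-entropy loss, trained by full-batch gradient descent on toy data. It illustrates the technique only; it is not how scikit-learn's `MLPClassifier` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 points, label 1 when x0 + x1 > 0
X_bp = rng.normal(size=(200, 2))
y_bp = (X_bp[:, 0] + X_bp[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer of 8 tanh units, one sigmoid output unit
W1 = rng.normal(scale=0.5, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros((1, 1))
lr = 0.5
n = X_bp.shape[0]
losses = []

for epoch in range(500):
    # 1. Forward pass
    h = np.tanh(X_bp @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

    # 2. Error calculation: mean binary cross-entropy
    eps = 1e-12
    loss = -np.mean(y_bp * np.log(p + eps) + (1 - y_bp) * np.log(1 - p + eps))
    losses.append(loss)

    # 3. Backward pass (chain rule), gradients averaged over the batch
    dz2 = (p - y_bp) / n          # dLoss/d(output pre-activation) for sigmoid + BCE
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dh = dz2 @ W2.T
    dz1 = dh * (1.0 - h ** 2)     # tanh'(z) = 1 - tanh(z)^2
    dW1 = X_bp.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # 4. Weight update: plain gradient descent
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

train_acc = np.mean((p > 0.5) == y_bp)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy: {train_acc:.2f}")
```

Averaging the gradients over the batch keeps the step size independent of the number of samples; scikit-learn's `adam` solver instead uses mini-batches and adaptive per-weight step sizes.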
