<a href="https://colab.research.google.com/github/gamidirohan/MachineLearning-Lab/blob/main/Lab07.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A2.

Importing necessary libraries

In [7]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Perceptron
from sklearn.metrics import classification_report
import pickle
import os

Class Labels

In [8]:
class_labels = ["acrostic", "ballad", "epigram", "haiku", "limerick", "sestina", "sonnet", "villanelle"]

Evaluate model

In [9]:
def evaluate_model(model, X_train, X_test, y_train, y_test, class_labels):
    train_accuracy = model.score(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)
    print(f"Train Accuracy: {train_accuracy:.2f}")
    print(f"Test Accuracy: {test_accuracy:.2f}")

    # Generate classification report
    y_pred = model.predict(X_test)
    report = classification_report(y_test, y_pred, target_names=class_labels, zero_division=1)
    print("Classification Report:")
    print(report)

Save model as a .pkl file

In [10]:
def save_model(model, model_file):
    with open(model_file, 'wb') as f:
        pickle.dump(model, f)
    print(f"Model saved as {model_file}")
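A saved model is only useful if it can be read back later. The sketch below shows a matching loader. `load_model` is a hypothetical helper that does not appear in this notebook, and the dictionary is a stand-in for a fitted estimator so that the snippet runs without training anything. Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_model(model_file):
    """Hypothetical counterpart to save_model: read a pickled object back."""
    with open(model_file, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted estimator (assumption: any picklable object
# round-trips the same way a scikit-learn model does).
dummy_model = {"name": "mlp", "hidden_layer_sizes": (200,)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dummy_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(dummy_model, f)
    restored = load_model(path)

print(restored == dummy_model)
```

A pickled scikit-learn model should be loaded with the same library version that wrote it, so it may be worth recording the version alongside the file.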

Load Embeddings from .csv

In [11]:
# Load the dataset into a DataFrame
data_df = pd.read_csv("poems_data.csv")

# Drop rows with missing values
data_df.dropna(inplace=True)

# Extract features and target variable
X = data_df.drop(columns=['label']).values
y = data_df['label'].values

Splitting data into Train and Test datasets

In [12]:
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
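To make the 80/20 split concrete, the sketch below works out the resulting set sizes. It assumes the cleaned dataset holds 775 rows, a figure inferred from the 155-sample support in the classification reports and not stated anywhere in this notebook. `split_sizes` is a hypothetical helper, and rounding the test count up reflects my understanding of how `train_test_split` treats a fractional `test_size`.

```python
import math


def split_sizes(n_samples, test_size):
    """Hypothetical helper: sizes of the train and test sets for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # test count is rounded up
    n_train = n_samples - n_test               # the remainder goes to training
    return n_train, n_test


# Assumption: 775 rows remain after dropna() (inferred, not shown in the notebook).
n_train, n_test = split_sizes(775, 0.2)
print(f"train: {n_train}, test: {n_test}")
```

With only 13 to 24 test samples per class, passing `stratify=y` to `train_test_split` would be one way to keep class proportions the same in both sets.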

Parameter Grid for MLP

In [13]:
mlp_param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (150,), (200,)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'learning_rate': ['constant', 'adaptive'],
    'learning_rate_init': [0.001, 0.01, 0.1]
}

RandomizedSearchCV for MLP

In [14]:
mlp_random_search = RandomizedSearchCV(
    estimator=MLPClassifier(),
    param_distributions=mlp_param_grid,
    n_iter=10,
    scoring='accuracy',
    cv=5,
    verbose=2,
    random_state=42,
    n_jobs=-1
)
mlp_random_search.fit(X_train, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits
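The log line above follows directly from `n_iter` and `cv`. The sketch below repeats the MLP grid so it runs on its own and counts how much of the search space a 10-candidate random search covers. The variable names `n_combinations`, `coverage` and `total_fits` are illustrative only.

```python
from math import prod

# Same values as mlp_param_grid in this notebook, repeated so the snippet is self-contained.
mlp_param_grid = {
    'hidden_layer_sizes': [(50,), (100,), (150,), (200,)],
    'activation': ['relu', 'tanh'],
    'solver': ['adam', 'sgd'],
    'learning_rate': ['constant', 'adaptive'],
    'learning_rate_init': [0.001, 0.01, 0.1],
}

n_iter = 10  # candidates sampled by RandomizedSearchCV
cv = 5       # folds per candidate

# Size of the full grid: the product of the number of options per parameter.
n_combinations = prod(len(values) for values in mlp_param_grid.values())
coverage = n_iter / n_combinations
total_fits = n_iter * cv

print(f"Possible combinations: {n_combinations}")
print(f"Sampled by the search: {n_iter} ({coverage:.1%})")
print(f"Cross-validation fits: {total_fits}")
```

Because only a fraction of the grid is sampled, the reported best parameters are the best among the sampled candidates, not necessarily the best in the grid.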




Print MLP best Parameters

In [15]:
print("Best parameters for MLP:")
print(mlp_random_search.best_params_)

Best parameters for MLP:
{'solver': 'adam', 'learning_rate_init': 0.001, 'learning_rate': 'adaptive', 'hidden_layer_sizes': (200,), 'activation': 'relu'}


Evaluate MLP with best parameters

In [16]:
mlp_model = mlp_random_search.best_estimator_
print("Evaluating MLP...")
evaluate_model(mlp_model, X_train, X_test, y_train, y_test, class_labels)

Evaluating MLP...
Train Accuracy: 1.00
Test Accuracy: 0.85
Classification Report:
              precision    recall  f1-score   support

    acrostic       0.95      0.79      0.86        24
      ballad       0.58      0.85      0.69        13
     epigram       0.76      0.80      0.78        20
       haiku       0.77      0.91      0.83        22
    limerick       1.00      0.94      0.97        18
     sestina       0.90      0.90      0.90        21
      sonnet       1.00      0.73      0.85        15
  villanelle       0.95      0.86      0.90        22

    accuracy                           0.85       155
   macro avg       0.86      0.85      0.85       155
weighted avg       0.87      0.85      0.86       155
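The f1-score column is the harmonic mean of precision and recall, and the macro average is the unweighted mean over classes. The sketch below checks this against the rounded numbers in the report above. `f1_from` is a hypothetical helper, and because the inputs are already rounded to two decimals the results only approximate what scikit-learn computed.

```python
def f1_from(precision, recall):
    """Hypothetical helper: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the MLP classification report.
ballad_f1 = f1_from(0.58, 0.85)

per_class_f1 = [0.86, 0.69, 0.78, 0.83, 0.97, 0.90, 0.85, 0.90]
macro_f1 = sum(per_class_f1) / len(per_class_f1)

print(f"ballad F1 from precision and recall: {ballad_f1:.3f}")
print(f"macro-averaged F1: {macro_f1:.3f}")
```

Both values land close to the 0.69 and 0.85 shown in the report. The ballad row also shows why accuracy alone can mislead: recall is high, but a precision of 0.58 means many poems from other classes were labelled as ballads.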



Save MLP model as .pkl

In [17]:
mlp_model_file = "mlp_model.pkl"
save_model(mlp_model, mlp_model_file)

Model saved as mlp_model.pkl


Defining parameter grid for Perceptron

In [18]:
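perceptron_param_grid = {
    # Note: alpha scales the regularization term and only has an effect when
    # a penalty is set; Perceptron() defaults to penalty=None.
    'alpha': [0.0001, 0.001, 0.01, 0.1],
    'max_iter': [1000, 2000, 3000],
    'tol': [1e-3, 1e-4, 1e-5]
}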

Perform RandomizedSearchCV for Perceptron

In [19]:
perceptron_random_search = RandomizedSearchCV(
    estimator=Perceptron(),
    param_distributions=perceptron_param_grid,
    n_iter=10,
    scoring='accuracy',
    cv=5,
    verbose=2,
    random_state=42,
    n_jobs=-1
)
perceptron_random_search.fit(X_train, y_train)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


Printing best Perceptron Parameters

In [20]:
print("Best parameters for Perceptron:")
print(perceptron_random_search.best_params_)

Best parameters for Perceptron:
{'tol': 1e-05, 'max_iter': 3000, 'alpha': 0.1}


Evaluate Perceptron with best Parameters

In [21]:
perceptron_model = perceptron_random_search.best_estimator_
print("Evaluating Perceptron...")
evaluate_model(perceptron_model, X_train, X_test, y_train, y_test, class_labels)

Evaluating Perceptron...
Train Accuracy: 0.31
Test Accuracy: 0.26
Classification Report:
              precision    recall  f1-score   support

    acrostic       1.00      0.00      0.00        24
      ballad       0.38      0.23      0.29        13
     epigram       0.15      0.65      0.24        20
       haiku       1.00      0.05      0.09        22
    limerick       0.33      0.06      0.10        18
     sestina       1.00      0.00      0.00        21
      sonnet       0.29      0.87      0.43        15
  villanelle       0.91      0.45      0.61        22

    accuracy                           0.26       155
   macro avg       0.63      0.29      0.22       155
weighted avg       0.68      0.26      0.21       155
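The acrostic and sestina rows show a precision of 1.00 next to a recall of 0.00. That combination can only occur when the model never predicted the class at all: precision is then 0/0, and `zero_division=1` in `evaluate_model` reports it as 1. The sketch below uses the rounded numbers from the report to show how much this inflates the macro-averaged precision. The `precision` function and the variable names are illustrative, not part of the notebook.

```python
def precision(true_positives, predicted_positives, zero_division=1.0):
    """Precision = TP / (TP + FP), with a fallback when nothing was predicted."""
    if predicted_positives == 0:
        return zero_division
    return true_positives / predicted_positives


# A class that is never predicted gets whatever the fallback says.
print(precision(0, 0, zero_division=1.0))
print(precision(0, 0, zero_division=0.0))

# Rounded per-class precision copied from the Perceptron report, in label order.
reported = [1.00, 0.38, 0.15, 1.00, 0.33, 1.00, 0.29, 0.91]
never_predicted = {0, 5}  # acrostic and sestina: recall 0.00, precision 1.00

macro_as_reported = sum(reported) / len(reported)
macro_with_zero = sum(
    0.0 if i in never_predicted else p for i, p in enumerate(reported)
) / len(reported)

print(f"macro precision with zero_division=1: {macro_as_reported:.3f}")
print(f"macro precision with zero_division=0: {macro_with_zero:.3f}")
```

The first figure is close to the 0.63 in the report, while the second is far lower, which is more in line with the 0.26 accuracy. The haiku row is different: its precision of 1.00 is genuine, since a recall of 0.05 means at least one haiku was predicted correctly.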



Save Perceptron model as .pkl

In [22]:
perceptron_model_file = "perceptron_model.pkl"
save_model(perceptron_model, perceptron_model_file)

Model saved as perceptron_model.pkl


# A3.

Import necessary libraries

In [23]:
!pip install catboost
!pip install XlsxWriter

from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from catboost import CatBoostClassifier
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

Collecting catboost
  Downloading catboost-1.2.5-cp310-cp310-manylinux2014_x86_64.whl (98.2 MB)
Installing collected packages: catboost
Successfully installed catboost-1.2.5
Collecting XlsxWriter
  Downloading XlsxWriter-3.2.0-py3-none-any.whl (159 kB)
Installing collected packages: XlsxWriter
Successfully installed XlsxWriter-3.2.0


Apply MinMax scaling to the input data, since MultinomialNB does not accept negative feature values

In [24]:
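def scale_MinMax(X_train, X_test):
    scaler = MinMaxScaler()
    # Fit on the training data only, so no test-set statistics leak into the scaler
    X_train_scaled = scaler.fit_transform(X_train)
    # Reuse the training min/max for the test data
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled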
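Min-max scaling maps each feature with x' = (x - min) / (max - min), where min and max come from the training data. The plain-Python sketch below applies that formula to one made-up feature column to show a side effect: test values outside the training range end up outside [0, 1], and can even be negative. `min_max_scale` and the sample values are invented for illustration.

```python
def min_max_scale(values, lo, hi):
    """Apply x' = (x - lo) / (hi - lo), with lo and hi taken from the training data."""
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values for a single embedding dimension (not from poems_data.csv).
train_feature = [-2.0, 0.0, 1.0, 2.0]
test_feature = [-3.0, 0.5, 3.0]  # first and last fall outside the training range

lo, hi = min(train_feature), max(train_feature)
train_scaled = min_max_scale(train_feature, lo, hi)
test_scaled = min_max_scale(test_feature, lo, hi)

print("train:", train_scaled)
print("test: ", test_scaled)
```

If out-of-range test values turn out to matter, `MinMaxScaler(clip=True)` is one option for clamping transformed values to the training range.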

Initializing all classifiers

In [25]:
classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "XGBoost": XGBClassifier(use_label_encoder=True, eval_metric='mlogloss'),
    "CatBoost": CatBoostClassifier(logging_level='Silent')
}

Train and evaluate all classifiers

In [26]:
results = {}

X_train_scaled, X_test_scaled = scale_MinMax(X_train, X_test)

for clf_name, clf in classifiers.items():
    print(f"Tuning hyperparameters for {clf_name}...")
    clf.fit(X_train_scaled, y_train)
    train_accuracy = clf.score(X_train_scaled, y_train)
    test_accuracy = clf.score(X_test_scaled, y_test)
    print(f"Train Accuracy: {train_accuracy:.2f}")
    print(f"Test Accuracy: {test_accuracy:.2f}")

    # Generate classification report
    print(f"Classification Report for {clf_name}:")
    report = classification_report(y_test, clf.predict(X_test_scaled), target_names=class_labels, zero_division=1)
    print(report)

    # Store results
    results[clf_name] = {
        "Train Accuracy": train_accuracy,
        "Test Accuracy": test_accuracy,
        "Classification Report": report
    }

Tuning hyperparameters for Naive Bayes...
Train Accuracy: 0.75
Test Accuracy: 0.68
Classification Report for Naive Bayes:
              precision    recall  f1-score   support

    acrostic       0.70      0.58      0.64        24
      ballad       0.53      0.69      0.60        13
     epigram       0.71      0.25      0.37        20
       haiku       0.63      0.86      0.73        22
    limerick       0.94      0.94      0.94        18
     sestina       0.65      0.81      0.72        21
      sonnet       0.57      0.87      0.68        15
  villanelle       0.86      0.55      0.67        22

    accuracy                           0.68       155
   macro avg       0.70      0.69      0.67       155
weighted avg       0.71      0.68      0.67       155

Tuning hyperparameters for Support Vector Machine...
Train Accuracy: 0.97
Test Accuracy: 0.81
Classification Report for Support Vector Machine:
              precision    recall  f1-score   support

    acrostic       0.90     

Create a DataFrame to tabulate the results

In [27]:
results_df = pd.DataFrame(results)
print("\nResults Summary:")
print(results_df)


Results Summary:
                                                             Naive Bayes  \
Train Accuracy                                                  0.746774   
Test Accuracy                                                   0.683871   
Classification Report                precision    recall  f1-score   ...   

                                                  Support Vector Machine  \
Train Accuracy                                                  0.967742   
Test Accuracy                                                   0.812903   
Classification Report                precision    recall  f1-score   ...   

                                                           Decision Tree  \
Train Accuracy                                                       1.0   
Test Accuracy                                                   0.483871   
Classification Report                precision    recall  f1-score   ...   

                                                           Random F