
The main aim of this lab is to practice the **pipeline** technique and the **MultilayerPerceptron** algorithm.

*   **Deadline: 23:59, 06/5/2024**



# Import libraries

In [None]:

from sklearn.pipeline import Pipeline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression,LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from prettytable import PrettyTable
from sklearn import svm, datasets
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn import set_config
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
from sklearn.metrics import recall_score
from sklearn.neural_network import MLPClassifier

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')
%cd '/content/gdrive/MyDrive/ML/lab6'

Mounted at /content/gdrive
/content/gdrive/MyDrive/ML/lab6


# Task 1. With **iris** dataset
*  Apply **pipeline** including preprocessing steps (i.e., **StandardScaler**, **SimpleImputer**, **feature selection**, **KBinsDiscretizer**, …) and classification algorithms (i.e., **Random forest, kNN, Naïve Bayes**).


In [None]:
iris = datasets.load_iris()

X = iris.data
y = iris.target
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)

iris_pipeline = Pipeline([
    ('scl', StandardScaler()),
    ('pca', PCA(n_components=2)),
    ('clf', KNeighborsClassifier())
])

iris_pipeline.fit(X_train,y_train)
y_pred = iris_pipeline.predict(X_test)
accuracy = accuracy_score(y_test,y_pred)
accuracy

0.9666666666666667
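The cell above covers scaling, PCA and kNN only. As a minimal sketch of the other steps named in the task, the pipeline below chains `SimpleImputer`, `StandardScaler` and `SelectKBest`, then swaps in each of the three classifiers. The step names, `k=2`, the `random_state` values and the `_ir` variable suffix are illustrative assumptions. `f_classif` is used instead of the imported `chi2` because `chi2` requires non-negative features, which standardized data does not provide.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_ir, y_ir = load_iris(return_X_y=True)
X_tr_ir, X_te_ir, y_tr_ir, y_te_ir = train_test_split(
    X_ir, y_ir, test_size=0.2, random_state=0, stratify=y_ir
)

# The three classifiers requested in the task (seed chosen for illustration).
classifiers = {
    "Random forest": RandomForestClassifier(random_state=0),
    "kNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

iris_scores = {}
for name, clf in classifiers.items():
    pipe = Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),  # no-op on iris, kept to show the step
        ("scaler", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=2)),  # keep the 2 best features
        ("clf", clf),
    ])
    pipe.fit(X_tr_ir, y_tr_ir)
    iris_scores[name] = accuracy_score(y_te_ir, pipe.predict(X_te_ir))

for name, score in iris_scores.items():
    print(f"{name}: {score:.3f}")
```

Because every preprocessing step lives inside the pipeline, each one is fitted on the training split only and then reapplied unchanged to the test split.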

# Task 2. With **fashion** dataset
*   2.1. Apply **MultilayerPerceptron** classification with 1 hidden layer having 10 nodes

In [None]:
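def getScore(estimator,name,X_train,X_test,y_train,y_test,fit=True):
  """Return [name, accuracy, macro precision, macro recall, macro F1] on the test set."""
  # Skip training when the estimator has already been fitted (fit=False).
  if fit:
    estimator.fit(X_train,y_train)
  y_pred = estimator.predict(X_test)
  accuracy = accuracy_score(y_test,y_pred)
  # average="macro" is the unweighted mean of the per-class scores.
  pre = precision_score(y_test,y_pred,average="macro")
  recall = recall_score(y_test,y_pred,average="macro")
  f1 = f1_score(y_test,y_pred,average="macro")
  return [name,accuracy,pre,recall,f1]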
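`getScore` reports macro-averaged metrics, which give every class the same weight regardless of how many samples it has. The toy example below (the labels are made up for illustration) shows how macro precision is the plain mean of the per-class precisions, and how a class that is never predicted contributes 0. Passing `zero_division=0` makes that choice explicit instead of triggering a warning.

```python
import numpy as np
from sklearn.metrics import precision_score

# Made-up labels: class 2 exists in the ground truth but is never predicted.
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 0, 0])

# One precision value per class; class 2 has no predictions, so it scores 0.
per_class = precision_score(y_true_demo, y_pred_demo, average=None, zero_division=0)

# Macro precision is the unweighted mean of those per-class values.
macro = precision_score(y_true_demo, y_pred_demo, average="macro", zero_division=0)

print(per_class)
print(macro, per_class.mean())
```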

In [None]:
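fashion_train = pd.read_csv('fashion_train.csv')
fashion_test = pd.read_csv('fashion_test.csv')

# Column "y" holds the class label; the remaining columns are used as features.
X_train = fashion_train.drop(columns="y")
y_train = fashion_train[["y"]]
X_test = fashion_test.drop(columns="y")
y_test = fashion_test[["y"]]

# (10) is just the int 10, which scikit-learn treats as one hidden layer of 10 units.
MP = MLPClassifier(max_iter=1000,hidden_layer_sizes=(10))
MP.fit(X_train, y_train.values.ravel())

table = PrettyTable(["algo","Accuracy","Precision","Recall","F1"])
# MP is already fitted above, so fit=False avoids training it a second time.
table.add_row(getScore(MP, MP, X_train, X_test, y_train.values.ravel(), y_test.values.ravel(), fit=False))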
print(table)

+-----------------------------------------------------+----------+--------------------+--------------------+---------------------+
|                         algo                        | Accuracy |     Precision      |       Recall       |          F1         |
+-----------------------------------------------------+----------+--------------------+--------------------+---------------------+
| MLPClassifier(hidden_layer_sizes=10, max_iter=1000) |  0.092   | 0.1091182364729459 | 0.1008695652173913 | 0.01843671827997847 |
+-----------------------------------------------------+----------+--------------------+--------------------+---------------------+
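An accuracy of 0.092 would be near chance level if the dataset has ten balanced classes (an assumption about the CSV). One possible cause is that the MLP receives unscaled pixel values, which tends to make gradient-based training unstable. The sketch below shows how scaling could be combined with the same one-hidden-layer network in a pipeline. It uses `load_digits` as a stand-in because the fashion CSV files are not available here, so the printed score says nothing about the fashion data. The `_dg` names and `random_state=0` are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in image data (8x8 digits), NOT the fashion dataset from the lab.
X_dg, y_dg = load_digits(return_X_y=True)
X_tr_dg, X_te_dg, y_tr_dg, y_te_dg = train_test_split(
    X_dg, y_dg, test_size=0.2, random_state=0, stratify=y_dg
)

scaled_mlp = Pipeline([
    ("scaler", StandardScaler()),  # zero mean, unit variance per feature
    ("mlp", MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)),
])
scaled_mlp.fit(X_tr_dg, y_tr_dg)

digits_accuracy = scaled_mlp.score(X_te_dg, y_te_dg)
print(f"accuracy on the stand-in data: {digits_accuracy:.3f}")
```

The same `Pipeline` object can be passed to `getScore` in place of a bare classifier, since it exposes `fit` and `predict`.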




*   2.2. Apply **MultilayerPerceptron** algorithm with the following settings (the first hidden layer has 250 neurons, the second one has 100 neurons).

In [None]:
MP_2 = MLPClassifier(max_iter=1000,hidden_layer_sizes=(250,100))
MP_2.fit(X_train,y_train.values.ravel())
table.add_row(getScore(MP_2,MP_2,X_train,X_test,y_train.values.ravel(),y_test.values.ravel(),fit=False))
print(table)

+-------------------------------------------------------------+----------+--------------------+--------------------+---------------------+
|                             algo                            | Accuracy |     Precision      |       Recall       |          F1         |
+-------------------------------------------------------------+----------+--------------------+--------------------+---------------------+
|     MLPClassifier(hidden_layer_sizes=10, max_iter=1000)     |  0.092   | 0.1091182364729459 | 0.1008695652173913 | 0.01843671827997847 |
| MLPClassifier(hidden_layer_sizes=(250, 100), max_iter=1000) |  0.747   | 0.7591937820373404 | 0.7459921528096172 |  0.7329244288936508 |
+-------------------------------------------------------------+----------+--------------------+--------------------+---------------------+


*   2.3. Find the best hyperparameters using **GridSearchCV**

In [None]:
MP_param = {
      'hidden_layer_sizes': [(100, 60), (100, 80), (200, 100, 150)],
      'activation': ['tanh', 'relu']
}

In [None]:
grid_fashion = GridSearchCV(estimator=MLPClassifier(max_iter=10000),param_grid=MP_param,n_jobs=-1)
grid_fashion.fit(X_train,y_train.values.ravel())
grid_fashion.best_estimator_
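`best_estimator_` shows only the winning configuration. To see how the other candidates compared, a fitted `GridSearchCV` also exposes `best_params_`, `best_score_` (the mean cross-validated score of the winner) and `cv_results_`. The sketch below runs a deliberately small search on synthetic data so that it finishes quickly. The data, the grid values, `cv=3` and the `toy_` names are assumptions for illustration, not the lab's settings.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Small synthetic binary problem, used only to keep the search fast.
X_toy, y_toy = make_classification(n_samples=200, n_features=10, random_state=0)

toy_grid = {
    "hidden_layer_sizes": [(10,), (20,)],
    "activation": ["tanh", "relu"],
}
toy_search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid=toy_grid,
    cv=3,
)
toy_search.fit(X_toy, y_toy)

print(toy_search.best_params_)
print(round(toy_search.best_score_, 3))

# One row per parameter combination, best rank first.
results = pd.DataFrame(toy_search.cv_results_)
ranked = results.sort_values("rank_test_score")
print(ranked[["params", "mean_test_score", "std_test_score", "rank_test_score"]])
```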

*   2.4. Compare the **MultilayerPerceptron** using the best hyperparameters in 2.3 and other classification algorithms (i.e., Random forest, kNN, Naïve Bayes) in terms of accuracy, precision, recall, and F1

In [None]:
table2 = PrettyTable(["algo","Accuracy","Precision","Recall","F1"])
table2.add_row(getScore(RandomForestClassifier(),RandomForestClassifier(),X_train,X_test,y_train.values.ravel(),y_test.values.ravel()))
table2.add_row(getScore(KNeighborsClassifier(),KNeighborsClassifier(),X_train,X_test,y_train.values.ravel(),y_test.values.ravel()))
table2.add_row(getScore(GaussianNB(),GaussianNB(),X_train,X_test,y_train.values.ravel(),y_test.values.ravel()))
table2.add_row(getScore(grid_fashion,grid_fashion.best_estimator_,X_train,X_test,y_train.values.ravel(),y_test.values.ravel(),fit=False))
print(table2)

+-------------------------------------------------------------------------------+----------+--------------------+--------------------+--------------------+
|                                      algo                                     | Accuracy |     Precision      |       Recall       |         F1         |
+-------------------------------------------------------------------------------+----------+--------------------+--------------------+--------------------+
|                            RandomForestClassifier()                           |  0.806   | 0.8004866103681323 | 0.8024226650797782 | 0.7985592476960566 |
|                             KNeighborsClassifier()                            |  0.761   | 0.7769873089533864 |  0.76181486566761  | 0.7569379032729887 |
|                                  GaussianNB()                                 |  0.556   | 0.5788628371304589 | 0.559496772854223  | 0.5256907025966638 |
| MLPClassifier(alpha=0.05, hidden_layer_sizes=(200, 100, 150), 

# Task 3. With **breast cancer** dataset

*   3.1. Apply **GridSearchCV** to **MultilayerPerceptron** to find the best hyperparameters (the setting of hyperparameters chosen by students)

In [None]:
cancer = datasets.load_breast_cancer()

X = cancer.data
y = cancer.target

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)

grid_cancer = GridSearchCV(estimator=MLPClassifier(max_iter=10000),param_grid=MP_param,n_jobs=-1)
grid_cancer.fit(X_train,y_train)
grid_cancer.best_estimator_
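Since this lab is also about pipelines, the search can be run over a whole `Pipeline` rather than a bare classifier. Parameters of a step are addressed as `<step name>__<parameter>`. With the scaler inside the pipeline, it is refitted on the training folds of every cross-validation split, so no statistics from the validation fold leak into preprocessing. The following is a sketch under assumed settings: the step names, grid values, `cv=3`, `random_state=0` and the `_bc` names are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_bc, y_bc = load_breast_cancer(return_X_y=True)
X_tr_bc, X_te_bc, y_tr_bc, y_te_bc = train_test_split(
    X_bc, y_bc, test_size=0.2, random_state=0, stratify=y_bc
)

pipe_bc = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
])

# Keys use the "<step>__<parameter>" convention to reach into the pipeline.
grid_bc = {
    "mlp__hidden_layer_sizes": [(20,), (50, 20)],
    "mlp__alpha": [1e-4, 1e-2],
}

search_bc = GridSearchCV(pipe_bc, param_grid=grid_bc, cv=3)
search_bc.fit(X_tr_bc, y_tr_bc)

test_accuracy_bc = search_bc.score(X_te_bc, y_te_bc)
print(search_bc.best_params_)
print(f"held-out accuracy: {test_accuracy_bc:.3f}")
```

`search_bc` can be scored directly because `GridSearchCV` refits the best pipeline on the full training split by default (`refit=True`).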

*   3.2. Compare the **MultilayerPerceptron** using the best hyperparameters in 3.1 and other classification algorithms (i.e., Random forest, kNN, Naïve Bayes) in terms of accuracy, precision, recall, and F1

In [None]:
table3 = PrettyTable(["algo","Accuracy","Precision","Recall","F1"])
table3.add_row(getScore(RandomForestClassifier(),RandomForestClassifier(),X_train,X_test,y_train,y_test))
table3.add_row(getScore(KNeighborsClassifier(),KNeighborsClassifier(),X_train,X_test,y_train,y_test))
table3.add_row(getScore(GaussianNB(),GaussianNB(),X_train,X_test,y_train,y_test))
table3.add_row(getScore(grid_cancer,grid_cancer.best_estimator_,X_train,X_test,y_train,y_test,fit=False))
print(table3)

+-------------------------------------------------------------------------+--------------------+--------------------+--------------------+--------------------+
|                                   algo                                  |      Accuracy      |     Precision      |       Recall       |         F1         |
+-------------------------------------------------------------------------+--------------------+--------------------+--------------------+--------------------+
|                         RandomForestClassifier()                        | 0.9385964912280702 | 0.9405054405054405 | 0.9239864864864865 | 0.9313666465984347 |
|                          KNeighborsClassifier()                         | 0.9298245614035088 | 0.9175324675324675 | 0.9344594594594595 | 0.9246031746031746 |
|                               GaussianNB()                              | 0.9473684210526315 | 0.954059829059829  | 0.9307432432432432 | 0.9407894736842105 |
| MLPClassifier(alpha=0.05, hidden_layer

# Task 4. With **mobile price classification** dataset


*   4.1. Build your own Neural Network using **MultilayerPerceptron**  



In [None]:
mobile = pd.read_csv('mobile.csv')

X = mobile.drop(columns="price_range")
y = mobile[["price_range"]]
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2)

mobile_MP = MLPClassifier(max_iter=10000,hidden_layer_sizes=(200,100,20))
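# Flatten the single-column DataFrame to 1-D so fit() does not raise a DataConversionWarning.
mobile_MP.fit(X_train, y_train.values.ravel())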

table4 = PrettyTable(["algo","Accuracy","Precision","Recall","F1"])
table4.add_row(getScore(mobile_MP,mobile_MP,X_train,X_test,y_train,y_test,fit=False))
print(table4)



+------------------------------------------------------------------+----------+--------------------+--------------------+--------------------+
|                               algo                               | Accuracy |     Precision      |       Recall       |         F1         |
+------------------------------------------------------------------+----------+--------------------+--------------------+--------------------+
| MLPClassifier(hidden_layer_sizes=(200, 100, 20), max_iter=10000) |  0.5975  | 0.6357120131196807 | 0.6026656189555126 | 0.5921816003886656 |
+------------------------------------------------------------------+----------+--------------------+--------------------+--------------------+
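When building a network by hand, it can help to check what was actually constructed and how training went. A fitted `MLPClassifier` exposes `n_layers_`, `coefs_`, `out_activation_`, `n_iter_` and `loss_curve_` for this. The sketch below uses synthetic data as a hypothetical stand-in for `mobile.csv`. The 20 features and 4 classes are assumptions, as are `max_iter=500` and `random_state=0`.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-in for mobile.csv: 20 features, 4 classes (assumed shape).
X_syn, y_syn = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    n_classes=4, random_state=0,
)

net = MLPClassifier(hidden_layer_sizes=(200, 100, 20), max_iter=500, random_state=0)
net.fit(X_syn, y_syn)

# n_layers_ counts the input layer, the three hidden layers and the output layer.
print("layers:", net.n_layers_)
# One weight matrix per pair of consecutive layers.
print("weight shapes:", [w.shape for w in net.coefs_])
print("output activation:", net.out_activation_)
print("iterations run:", net.n_iter_)
print("first / last training loss:", net.loss_curve_[0], net.loss_curve_[-1])
```

Plotting `net.loss_curve_` with matplotlib is a quick way to judge whether `max_iter` was large enough for the loss to flatten out.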


*   4.2. Apply **GridSearchCV** to **MultilayerPerceptron** to find the best hyperparameters (the setting of hyperparameters chosen by students)

In [None]:
grid_mobile = GridSearchCV(estimator=MLPClassifier(max_iter=10000),param_grid=MP_param,n_jobs=-1)
grid_mobile.fit(X_train,y_train.values.ravel())
grid_mobile.best_estimator_

# Finally,
Save a copy to your GitHub. Remember to rename the notebook.