<a href="https://colab.research.google.com/github/luferIPCA/MIA-MLA-24-25/blob/main/6_Modelling_Classification_MultiLabel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Masters' in Applied Artificial Intelligence
## Machine Learning Algorithms Course

Notebooks for the MLA course

by [*lufer*](mailto:lufer@ipca.pt)

vers(2.0)

---



# ML Modelling - Part VI - Multi Label Classification Problems
\
**Contents**:

1.  **Create a Multi Label Classification ML Model**
2.  **L....***



This notebook explores the creation of Machine Learning models for Multi Label Classification Supervised Learning.

# Environment preparation


**Importing necessary Libraries**

In [None]:

!pip install scikit-multilearn

In [None]:
from skmultilearn.adapt import MLARAM #Adapted algorithm: Multi-label ARAM model
#see http://scikit.ml/api/skmultilearn.adapt.mlaram.html

from skmultilearn.problem_transform import BinaryRelevance, ClassifierChain, LabelPowerset #Transformation to be used

from sklearn.svm import SVC   #model to be used
from sklearn.model_selection import train_test_split  #prepare the dataset
from sklearn.metrics import hamming_loss              #metric to be used
import pandas as pd
import scipy


**Mounting Drive**

In [None]:

from google.colab import drive

# it will ask for your Google Drive credentials
drive.mount('/content/gDrive/', force_remount=True)

Mounted at /content/gDrive/


# 1 - Notes about Classification

**Note 1:**

Classification problems differ from regression problems primarily in their outputs. Classification problems involve categorizing data into discrete classes or labels, such as “spam” or “not spam” in email filtering models. In contrast, regression problems predict continuous, numerical outputs, like forecasting sales or temperatures.

**Note 2:**


**Classification:**
*   Simple
*   Multi-Class
*   Multi-Label

Where:

* In Simple (binary) Classification problems, the predicted value is one of the two existing classes;

* In Multi-Class Classification problems, the predicted value is one of the many existing classes;

* In Multi-Label Classification problems, the predicted value can be any combination of the existing labels, i.e., each instance (input) can be assigned multiple labels instead of just one.
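
The three cases above differ mainly in the shape of the target. The following is a minimal sketch of how each target could look; the tag names are hypothetical and only serve as an illustration.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Simple (binary) classification: one label per instance, two possible classes
y_binary = np.array([0, 1, 1, 0])

# Multi-class classification: one label per instance, more than two classes
y_multiclass = np.array([0, 2, 1, 2])

# Multi-label classification: each instance may carry several labels at once
# (hypothetical tags, for illustration only)
tags = [{"calm", "sad"}, {"happy"}, {"calm", "happy"}, set()]
mlb = MultiLabelBinarizer()
y_multilabel = mlb.fit_transform(tags)  # binary indicator matrix, one column per label

print(mlb.classes_)
print(y_multilabel)
```

The indicator matrix has one row per instance and one column per label, which is the format used for the multi-label targets later in this notebook.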

# 2 - Multi-Class Classification

The typical example is the Iris dataset, here using the LinearSVC model.

**Steps:**
1. Load the Iris dataset.
2. Split dataset into training and testing sets.
3. Train the LinearSVC model.
4. Evaluate


In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target  # Features and Labels

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features (always important for SVM performance)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Linear SVC model for multi-class classification
model = LinearSVC(random_state=42, dual=False)  # dual=False is recommended when n_samples > n_features
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

#Note:
#In the primal formulation of linear SVC (i.e., dual=False), the optimisation variable is
#of dimension n_features, whereas in the dual formulation (i.e., dual=True) the variable is of
#dimension n_samples.
#More importantly, the dual formulation requires the computation of an n_samples x n_samples matrix.
#For this reason, when n_samples > n_features it is better to use dual=False



Model Accuracy: 1.00


Try comparing with other multi-class classification models, such as:

* Random Forest Classifier
* SVC (Support Vector Classifier) with Kernels
* Logistic Regression (Multi-class)
* K-Nearest Neighbors (KNN)
* Others
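
One possible way to run such a comparison is sketched below. It uses 5-fold cross-validation on Iris instead of a single split; the hyperparameters and the names `candidates` and `cv_scores` are assumptions chosen for illustration, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Scale-sensitive models are wrapped in a pipeline so scaling is fitted per fold
candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVC (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

cv_scores = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X_iris, y_iris, cv=5)  # accuracy by default
    cv_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Cross-validation gives a less split-dependent estimate than one 80/20 split, which matters on a dataset as small as Iris.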

### 2.1 - Logistic Regression with OvR

OvR | OoR: One versus the Rest | One over the Rest

Each class is predicted against all the others combined!
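
To make the idea concrete, here is a small sketch using scikit-learn's `OneVsRestClassifier` on a synthetic dataset (the data and the names `X_demo`, `y_demo` and `ovr` are assumptions for illustration). It shows that one binary model is trained per class and that the class with the highest score wins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X_demo, y_demo = make_classification(n_samples=300, n_features=10, n_informative=3,
                                     n_classes=3, random_state=15)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X_demo, y_demo)

# One binary "this class vs. the rest" model per class
print(len(ovr.estimators_))

# One score column per class; the predicted class is the one with the highest score
scores = ovr.decision_function(X_demo[:5])
print(scores.shape)
same = np.array_equal(ovr.classes_[scores.argmax(axis=1)], ovr.predict(X_demo[:5]))
print(same)
```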

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

In [None]:
## create the dummy dataset with 3 categories
X, y = make_classification(n_samples=1000, n_features=10,n_informative=3, n_classes=3, random_state=15)

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30,random_state=42)

In [None]:
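#Explore one-versus-rest
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
#LogisticRegression(multi_class='ovr') is deprecated since scikit-learn 1.5;
#wrapping the model in OneVsRestClassifier is the suggested replacement
logistic=OneVsRestClassifier(LogisticRegression())
logistic.fit(X_train,y_train)
y_pred=logistic.predict(X_test)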



In [None]:
y_pred

array([2, 1, 2, 1, 1, 0, 0, 0, 2, 0, 2, 1, 2, 2, 2, 2, 2, 0, 0, 2, 2, 1,
       1, 1, 1, 0, 0, 0, 2, 1, 0, 2, 2, 1, 2, 0, 0, 2, 2, 1, 2, 2, 2, 1,
       2, 0, 1, 2, 0, 1, 0, 0, 0, 1, 1, 2, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0,
       2, 1, 0, 1, 0, 1, 2, 1, 2, 2, 1, 0, 1, 0, 1, 0, 1, 2, 2, 0, 1, 2,
       2, 1, 1, 2, 2, 0, 0, 0, 2, 2, 0, 1, 2, 1, 2, 1, 0, 2, 0, 2, 0, 1,
       2, 1, 2, 2, 1, 1, 1, 1, 2, 0, 2, 0, 1, 2, 0, 0, 2, 2, 2, 1, 2, 0,
       2, 2, 0, 0, 0, 2, 0, 2, 0, 1, 2, 1, 1, 2, 0, 0, 1, 1, 2, 2, 2, 1,
       2, 0, 2, 2, 2, 1, 0, 2, 0, 0, 2, 0, 2, 0, 0, 1, 2, 0, 1, 1, 1, 1,
       0, 2, 1, 0, 0, 1, 2, 2, 2, 2, 2, 0, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1,
       0, 0, 1, 2, 2, 0, 0, 2, 1, 2, 1, 0, 0, 2, 1, 1, 1, 2, 2, 1, 2, 1,
       0, 1, 0, 0, 1, 0, 2, 1, 0, 2, 2, 1, 1, 1, 2, 1, 1, 0, 1, 1, 0, 0,
       0, 0, 2, 0, 0, 2, 2, 2, 2, 0, 2, 0, 1, 2, 2, 2, 1, 0, 0, 1, 0, 2,
       1, 2, 0, 0, 0, 2, 2, 1, 2, 0, 1, 1, 0, 0, 0, 1, 0, 2, 2, 0, 2, 0,
       0, 0, 1, 1, 2, 0, 1, 2, 2, 0, 1, 2, 0, 2])

In [None]:
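#Note: scikit-learn metrics expect the order (y_true, y_pred).
#Accuracy is symmetric, but with (y_pred, y_test) as written here the precision and
#recall columns of the report are swapped, "support" counts predicted labels, and
#the rows of the confusion matrix are predicted labels (columns are true labels).
score=accuracy_score(y_pred,y_test)
print(score)
print(classification_report(y_pred,y_test))
print(confusion_matrix(y_pred,y_test))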

0.79
              precision    recall  f1-score   support

           0       0.82      0.87      0.84        97
           1       0.73      0.81      0.77        91
           2       0.82      0.71      0.76       112

    accuracy                           0.79       300
   macro avg       0.79      0.79      0.79       300
weighted avg       0.79      0.79      0.79       300

[[84  3 10]
 [10 74  7]
 [ 8 25 79]]


# 3 - Multi Label Classification Problems





**Strategies for supporting:**

*  Problem Transformation
*  Adapting Algorithms



## 3.1 - Dataset Preparation



### *Download Dataset*

This dataset contains music classification data described by several (77) technical audio attributes.


In [None]:
#Importing a real-world dataset prepared for multi-label classification

filePath='/content/gDrive/MyDrive/Colab Notebooks/MIA - ML - 2024-2025/Datasets/'
pd.set_option("display.precision", 2)
music = pd.read_csv(filePath+"Musica.csv")
music.shape

In [None]:
music.head()


We want to use the influencing attributes (72) to predict the target 6 categories (type of music).

In [None]:
len(music)
#answer: more than 50

### Understanding the Dataset

In [None]:
labelClassses = music.iloc[:,0:6].values
attributes = music.iloc[:,7:78].values
#attributes
#labelClassses

In [None]:
# Class labels : type of musics
music.columns[:6]

In [None]:
#Influencing Attributes
music.columns[7:78]

### Analysing initial dataset profile

In [None]:
#! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

In [None]:
#from pandas_profiling import ProfileReport

In [None]:
#profile = ProfileReport(music, title="Music Types", html={'style' : {'full_width':True}})
#send result to file
#profile.to_file(output_file=filePath+"MusicTypes.html")
#ATTENTION: it takes too long...

### Prepare the training dataset

In [None]:
X_train, X_test, y_train, y_test = train_test_split(attributes, labelClassses, test_size=0.3, random_state=0)


## 3.2 - Strategy A - Adapting Algorithm


We are going to explore the Multi-label ARAM model.

MLARAM - Multi-Label Adaptive Resonance Associative Map

>It is a neural network model based on *Adaptive Resonance Theory (ART)*. It is used for incremental learning, pattern recognition, and classification tasks.


>MLARAM is great for adaptive, real-time learning.

See: http://scikit.ml/api/skmultilearn.adapt.mlaram.html

In [None]:
import numpy as np
import scipy

scipy.ones = np.ones    #workaround: scipy.ones (a deprecated NumPy alias) is missing in recent SciPy releases

In [None]:
#create an instance of the MLARAM model
ann = MLARAM()
ann.fit(X_train, y_train)

In [None]:
#make predictions
pred1 = ann.predict(X_test)
#evaluate
print(f"Hamming Loss:  {hamming_loss(y_test,pred1)}")
#Hamming Loss is a loss function => the lower the value, the better the performance
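
As a reminder of what the metric measures: for N instances and L labels, the Hamming Loss is the fraction of the N x L label slots that are predicted wrongly. The small sketch below uses made-up indicator matrices (not the music data) to compare a manual computation with `hamming_loss`, and contrasts it with `accuracy_score`, which for multi-label targets counts only rows that match exactly.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Toy indicator matrices: 3 instances, 3 labels (illustrative values only)
y_true_demo = np.array([[1, 0, 1],
                        [0, 1, 0],
                        [1, 1, 0]])
y_pred_demo = np.array([[1, 0, 0],   # one wrong label
                        [0, 1, 0],   # all labels correct
                        [1, 0, 1]])  # two wrong labels

# Manual computation: wrong label slots / total label slots (3 out of 9 here)
manual_hl = (y_true_demo != y_pred_demo).mean()
sk_hl = hamming_loss(y_true_demo, y_pred_demo)

# Subset accuracy: only rows predicted entirely correctly count (1 out of 3 here)
subset_acc = accuracy_score(y_true_demo, y_pred_demo)

print(manual_hl, sk_hl, subset_acc)
```

Hamming Loss gives partial credit for partly correct rows, whereas subset accuracy does not, so the two metrics can rank models differently.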

## 3.3 - Strategy B: Problem Transformation

* In all cases we'll use the SVC (Support Vector Classifier) as the base classification model.

### **3.3.1 - Using Binary Relevance**

* Binary Relevance trains an individual binary classifier for each label, using the same attributes and treating the labels as independent.

* It prepares (transforms) the data so that it can be used by a normal classification model.
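
The transformation can be sketched by hand with plain scikit-learn. This is only an illustration of the idea on synthetic data (the names `X_syn`, `Y_syn` and `br_models` are assumptions), not the scikit-multilearn implementation.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic multi-label data: Y_syn is a binary indicator matrix with 4 label columns
X_syn, Y_syn = make_multilabel_classification(n_samples=300, n_features=20,
                                              n_classes=4, random_state=0)
Xs_train, Xs_test, Ys_train, Ys_test = train_test_split(X_syn, Y_syn,
                                                        test_size=0.3, random_state=0)

# One independent binary classifier per label column
br_models = [SVC().fit(Xs_train, Ys_train[:, j]) for j in range(Ys_train.shape[1])]

# Stack the per-label predictions back into an indicator matrix
br_pred = np.column_stack([m.predict(Xs_test) for m in br_models])
br_hl = hamming_loss(Ys_test, br_pred)

print(br_pred.shape)
print(f"Hamming Loss: {br_hl:.4f}")
```

Because each classifier only sees its own label column, any relationship between labels is ignored.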


In [None]:
# Binary Relevance with SVC
# Define a base classifier
binary = BinaryRelevance(classifier=SVC())
binary.fit(X_train, y_train)    #train
pred2 = binary.predict(X_test)  #predict
print(f"Hamming Loss:  {hamming_loss(y_test,pred2)}") #evaluate
#better performance?
#Yes! indeed the HL is lower!

### **3.3.2 - Using Classifier Chain**

* Chaining method used for multi-label classification when labels are not independent and may have dependencies.

* It also prepares (transforms) the data so that it can be used by a normal classification model.

* Transforms multi-label classification into a sequence of binary classification problems:
  * Train one classifier per label.
  * Pass the predictions of previous classifiers as additional features to the next classifiers.
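
The two steps above can be sketched by hand as follows. This is a simplified illustration on synthetic data with a fixed label order (the names `X_syn`, `Y_syn` and `chain_models` are assumptions); library implementations may differ in details.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_syn, Y_syn = make_multilabel_classification(n_samples=300, n_features=20,
                                              n_classes=4, random_state=0)
Xs_train, Xs_test, Ys_train, Ys_test = train_test_split(X_syn, Y_syn,
                                                        test_size=0.3, random_state=0)
n_labels = Ys_train.shape[1]

# Training: classifier j sees the original features plus the TRUE labels 0..j-1
chain_models = []
for j in range(n_labels):
    X_aug = np.hstack([Xs_train, Ys_train[:, :j]])
    chain_models.append(SVC().fit(X_aug, Ys_train[:, j]))

# Prediction: classifier j sees the features plus the PREDICTED labels 0..j-1
chain_pred = np.zeros((Xs_test.shape[0], n_labels), dtype=int)
for j, model in enumerate(chain_models):
    X_aug = np.hstack([Xs_test, chain_pred[:, :j]])
    chain_pred[:, j] = model.predict(X_aug)

chain_hl = hamming_loss(Ys_test, chain_pred)
print(f"Hamming Loss: {chain_hl:.4f}")
```

The label order matters: an early mistake is passed on as an input feature to every later classifier in the chain.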


In [None]:
# Create Classifier Chain using SVC as base classifier
chain = ClassifierChain(classifier=SVC())
#train the model
chain.fit(X_train, y_train)
#predict
pred3 = chain.predict(X_test)
#evaluate
print(f"Hamming Loss:  {hamming_loss(y_test,pred3)}")

In [None]:
#H2
#using Random Forest as base model
from sklearn.ensemble import RandomForestClassifier
#alias the scikit-learn chain so it does not shadow the skmultilearn ClassifierChain used above
from sklearn.multioutput import ClassifierChain as SkClassifierChain
# Define a base classifier (e.g., Random Forest)
base_model = RandomForestClassifier(n_estimators=100, random_state=42)
# Create Classifier Chain
chain_model = SkClassifierChain(base_model, order='random')
# Train the model
chain_model.fit(X_train, y_train)

# Predict
pred4 = chain_model.predict(X_test)

# Convert sparse matrix to NumPy array (if needed)
if scipy.sparse.issparse(pred4):
    pred4 = pred4.toarray()

#Note:
# y_test is a binary indicator matrix (only 0s and 1s).
# The scikit-learn ClassifierChain.predict also returns hard labels, but stored as floats (0.0 / 1.0);
# probabilities would only come from predict_proba.
# Thresholding at 0.5 and casting simply converts the predictions to integers (0s and 1s)
pred4 = (pred4 >= 0.5).astype(int)  # Apply threshold at 0.5

# Compute Hamming Loss
hl = hamming_loss(y_test, pred4)

# Print first 5 predictions
# print(pred4[:5])

print(f"Hamming Loss: {hl:.4f}")

# ATTENTION:
# Even if the HL is lower (better performance), the two results should not be compared directly,
# because the base models are different (SVC and RF)

### **3.3.3 - Using Label Powerset**

* Creates a new class label for each distinct combination of labels found in the training data
* Transforms the problem into a multi-class classification problem
* Instead of treating each label separately (like Binary Relevance) or chaining them (like Classifier Chain), Label Powerset treats each unique combination of labels as a single class.

>Works well when label dependencies are important

>More efficient than training separate classifiers for each label

>Not good for large numbers of labels (too many unique combinations → too many classes)
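
The transformation itself can be sketched with NumPy alone. The indicator matrix below is made up for illustration, and the names `combos` and `class_ids` are assumptions; the resulting class ids are what a normal multi-class model would be trained on.

```python
import numpy as np

# Toy indicator matrix: 5 instances, 3 labels (illustrative values only)
Y_demo = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1],
                   [1, 1, 0],
                   [0, 1, 0]])

# Each distinct row (label combination) becomes one class id
combos, class_ids = np.unique(Y_demo, axis=0, return_inverse=True)
class_ids = class_ids.ravel()  # guard: some NumPy versions return a 2-D inverse here

print(combos)      # the distinct label combinations
print(class_ids)   # one multi-class target per instance

# Decoding: map (predicted) class ids back to label combinations
Y_back = combos[class_ids]
```

Only combinations seen during training receive a class id, so a combination that first appears at prediction time can never be produced.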

In [None]:
#create instance of LabelPowerset, using SVC as base model
label = LabelPowerset(classifier = SVC())
#train
label.fit(X_train, y_train)
#predict
pred5 = label.predict(X_test)
#evaluate
print(f"Hamming Loss: {hamming_loss(y_test, pred5):.4f}")

## **3.4 - Comparing all strategies**

In [None]:
#Display all Hamming Loss results

data = {
    "MLARAM": hamming_loss(y_test,pred1),
    "Binary Relevance": hamming_loss(y_test,pred2),
    "Classifier Chain": hamming_loss(y_test,pred3),
    "Label Powerset": hamming_loss(y_test,pred5),  #pred5 holds the Label Powerset predictions (pred4 is the Random Forest chain)
    }

# Convert dictionary to DataFrame
df = pd.DataFrame(data, index=["All Hamming Loss"])

# Display DataFrame
#print(df)

from IPython.display import display
display(df)

## 3.5 - Final Remarks

* Use Classifier Chain if labels are dependent but you want a simple approach.
* Use Label Powerset if you strongly believe in label dependencies and have fewer unique label combinations.
* Use Binary Relevance for fast training when labels are mostly independent.

Explore the books:
*   *Machine Learning*, Tom M. Mitchell
*   *Mastering Machine Learning with Python in Six Steps*, Manohar Swamynathan

## 3.6 - Exercise

1. Implement a "pipeline" to process all three alternatives, Binary Relevance (BR), Classifier Chain (CC) and Label Powerset (LP), using SVC as the base model.
2. Compute Hamming Loss & Accuracy for comparison
3. Display the results in a pandas DataFrame (formatted as a table)
4. Try the same, but using Random Forest as the base model


In [None]:
#Solution here
#use the same previous splits: X_train, X_test, y_train, y_test


In [None]:
#End!