<a href="https://colab.research.google.com/github/luferIPCA/MIA-MLA-24-25/blob/main/5_Modelling_Classification_MultiLabel.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Masters' in Applied Artificial Intelligence
## Machine Learning Algorithms Course

Notebooks for the MLA course

by [*lufer*](mailto:lufer@ipca.pt)

vers(2.0)

---



# ML Modelling - Part V-II - Multi Label Classification Problems
\
**Contents**:

1.  **Create a Multi Label Classification ML Model**
2.  **L....***



This notebook explores the creation of Machine Learning models for Multi Label Classification Supervised Learning.

# Environment preparation


**Importing necessary Libraries**

In [6]:

!pip install scikit-multilearn

Collecting scikit-multilearn
  Downloading scikit_multilearn-0.2.0-py3-none-any.whl.metadata (6.0 kB)
Downloading scikit_multilearn-0.2.0-py3-none-any.whl (89 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m89.4/89.4 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: scikit-multilearn
Successfully installed scikit-multilearn-0.2.0


In [7]:
from skmultilearn.adapt import MLARAM #Adapted algotihm: Multi-label ARAM model
#see http://scikit.ml/api/skmultilearn.adapt.mlaram.html

from skmultilearn.problem_transform import BinaryRelevance, ClassifierChain, LabelPowerset #Transformation to be used

from sklearn.svm import SVC   #model to be used
from sklearn.model_selection import train_test_split  #prepare the dataset
from sklearn.metrics import hamming_loss              #meetric to be used
import pandas as pd
import scipy


**Mounting Drive**

In [8]:

from google.colab import drive

# it will ask for your google drive credentiaals
drive.mount('/content/gDrive/', force_remount=True)

Mounted at /content/gDrive/


# 1 - Multi Label Classification Problems


**Note :**

Classification problems are different than regression problems primarily in their outputs. Classification problems involve categorizing data into discrete classes or labels, such as “spam” or “not spam” in email filtering models. In contrast,regression problems predict continuous, numerical outputs, like orecasting sales or temperatures.

**Note 2:**


**Classification:**
*   Simple
*   Multi-Class
*   Multi-Label

Where:

* In Simple Classification, the predicted value can be one of the two existing classes;

* Int he Mult-Class Classification Problems, the predicted can be one of the many existing classes.

* In Multi Label Classification Problems the predicted value can be a combination of all existing labels, i.e., each instance (input) can be assigned multiple labels instead of just one.


**Strategies for supporting:**

*  Problem Transformation
*  Adapting Algorithms



## 2.1 - Dataset Preparation



### *Download Dataset*

This dataset has data about music classification according to several (77) audio technical attributes


In [9]:
#Importing a real world dataset preparaed for Regression

filePath='/content/gDrive/MyDrive/Colab Notebooks/MIA - ML - 2024-2025/Datasets/'
pd.set_option("display.precision", 2)
music = pd.read_csv(filePath+"Musica.csv")
music.shape

(592, 77)

In [6]:
music.head()


Unnamed: 0,amazed-suprised,happy-pleased,relaxing-clam,quiet-still,sad-lonely,angry-aggresive,Mean_Acc1298_Mean_Mem40_Centroid,Mean_Acc1298_Mean_Mem40_Rolloff,Mean_Acc1298_Mean_Mem40_Flux,Mean_Acc1298_Mean_Mem40_MFCC_0,...,Std_Acc1298_Std_Mem40_MFCC_10,Std_Acc1298_Std_Mem40_MFCC_11,Std_Acc1298_Std_Mem40_MFCC_12,BH_LowPeakAmp,BH_LowPeakBPM,BH_HighPeakAmp,BH_HighPeakBPM,BHSUM1,BHSUM2,BHSUM3
0,0,1,1,0,0,0,0.13,0.08,0.23,0.6,...,0.2,0.2,0.16,0.03,0.25,0.00847,0.24,0.14,0.06,0.11
1,1,0,0,0,0,1,0.38,0.36,0.17,0.85,...,0.09,0.09,0.03,0.18,0.29,0.157,0.27,0.19,0.15,0.2
2,0,1,0,0,0,1,0.54,0.36,0.15,0.79,...,0.2,0.11,0.14,0.1,0.14,0.0,0.59,0.11,0.03,0.12
3,0,0,1,0,0,0,0.17,0.24,0.25,0.44,...,0.24,0.22,0.24,0.02,0.22,0.117,0.21,0.06,0.13,0.09
4,0,0,0,1,0,0,0.35,0.16,0.1,0.13,...,0.72,0.57,0.41,0.02,0.76,0.0817,0.72,0.11,0.17,0.19


We want to use the influencing attributes (72) to predict the target 6 categories (type of music).

In [7]:
len(music)
#answer: more than 50

592

### Understanding the Dataset

In [10]:
labelClassses = music.iloc[:,0:6].values
attributes = music.iloc[:,7:78].values
#attributes
#DlabelClassses

In [16]:
# Class labels : type of musics
music.columns[:6]

Index(['amazed-suprised', 'happy-pleased', 'relaxing-clam', 'quiet-still',
       'sad-lonely', 'angry-aggresive'],
      dtype='object')

In [None]:
#Influencing Attributes
music.columns[7:78]

### Analysing initial dataset profile

In [None]:
#! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

In [24]:
#from pandas_profiling import ProfileReport

  from pandas_profiling import ProfileReport


In [None]:
#profile = ProfileReport(music, title="Music Types", html={'style' : {'full_width':True}})
#send result to file
#profile.to_file(output_file=filePath+"MusicTypes.html")
#ATTENTION: it takes to long...

### Prepare the trainning dataset

In [11]:
X_train, X_test, y_train, y_test = train_test_split(attributes, labelClassses, test_size=0.3, random_state=0)


## 2.2 - Strategy A - Adapting Algoritm


We are going to explore the Multi-label ARAM model.

MLARAM - Machine Learning Adaptive Resonance Associative Map

>Is a neural network model based on *Adaptive Resonance Theory (ART)*. It is used for incremental learning, pattern recognition, and classification tasks.


>MLARAM is great for adaptive, real-time learning.

See: http://scikit.ml/api/skmultilearn.adapt.mlaram.html

In [12]:
import numpy as np
import scipy

scipy.ones = np.ones    #scipy.ones was deprecated

In [13]:
#create an instance of the MLARAM model
ann = MLARAM()
ann.fit(X_train, y_train)

In [14]:
#make predictions
pred1 = ann.predict(X_test)
#evaluate
print(f"Hamming Loss:  {hamming_loss(y_test,pred1)}")
#Harmming Loss is a Lost Function =>t he lower value, the best performance

Hamming Loss:  0.24906367041198502


## 2.3 - Strategy B: Problem Transformation

* In all cases we'll explore the SVC (Support Vector Classifier) as base classification model.

### **2.3.1 - Using Binary Relevance**

* Binary Relevance explore an individual binary classification between attributes and each class.

* It prepares the data (transform the data) to be used by a normal classification model.


In [15]:
# Binary Relevance with SVC
# Define a base classifier
binary = BinaryRelevance(classifier=SVC())
binary.fit(X_train, y_train)    #train
pred2 = binary.predict(X_test)  #predict
print(f"Hamming Loss:  {hamming_loss(y_test,pred2)}") #evaluate
#better performance?
#Yes! indeed the HL is lower!

Hamming Loss:  0.199438202247191


### **2.3.1 - Using Classifier Chain**

* Ensemble method used for multi-label classification, where labels are not independent but may have dependencies.

* It also prepares the data (transform the data) to be used by a normal classification model.

* Transforms multi-label classification into a sequence of binary classification problems:
  * Train one classifier per label.
  * Pass the predictions of previous classifiers as additional features to the next classifiers.


In [16]:
# Create Classifier Chain using SVC as base classifier
chain = ClassifierChain(classifier=SVC())
#train the model
chain.fit(X_train, y_train)
#predict
pred3 = chain.predict(X_test)
#evaluate
print(f"Hamming Loss:  {hamming_loss(y_test,pred3)}")

Hamming Loss:  0.2340823970037453


In [41]:
#H2
#using Randon Forest base model
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain
# Define a base classifier (e.g., Random Forest)
base_model = RandomForestClassifier(n_estimators=100, random_state=42)
# Create Classifier Chain
chain_model = ClassifierChain(base_model, order='random')
# Train the model
chain_model.fit(X_train, y_train)

# Predict
pred4 = chain_model.predict(X_test)

# Convert sparse matrix to NumPy array (if needed)
if scipy.sparse.issparse(pred4):
    pred4 = pred4.toarray()

#Note:
#y_test is a binary indicator matrix (only 0s and 1s).
# pred4 is not in binary format—it contains continuous values (probabilities) instead of 0s and 1s.
# this happens because RandomForestClassifier outputs probabilities rather than hard label assignments.
# Convert predictions to binary format (0s and 1s)
pred4 = (pred4 >= 0.5).astype(int)  # Apply threshold at 0.5

# Compute Hamming Loss
hl = hamming_loss(y_test, pred4)

# Print first 5 predictions
# print(pred4[:5])

print(f"Hamming Loss: {hl:.4f}")

# ATTENTION:
# Even the HL is lower (performed better) it should not be compared, because the base model are different
# (SVC and RF)

Hamming Loss: 0.1910


### **2.3.1 - Using Label Powerset**

* Create a new class label for any common sequence of used classes
* Transforms the problem into a multi-class classification problem
* Instead of treating each label separately (like Binary Relevance) or chaining them (like Classifier Chain), Label Powerset treats each unique combination of labels as a single class.

>Works well when label dependencies are important

>✅ More efficient than training separate classifiers for each label

>🚫 Not good for large numbers of labels (too many unique combinations → too many classes)

In [42]:
#create instance of LabelPowerset, using SVC as base model
label = LabelPowerset(classifier = SVC())
#train
label.fit(X_train, y_train)
#predict
pred5 = label.predict(X_test)
#evaluate
print(f"Hamming Loss: {hamming_loss(y_test, pred5):.4f}")

Hamming Loss: 0.2210


## **2.4 - Comparing all strategies**

In [60]:
#Display all Hamming Loss results

data = {
    "MLARM": hamming_loss(y_test,pred1),
    "Binary Relevance": hamming_loss(y_test,pred2),
    "Classifier Chain": hamming_loss(y_test,pred3),
    "Label Powerset": hamming_loss(y_test,pred4),
    }

# Convert dictionary to DataFrame
df = pd.DataFrame(data, index=["All Hamming Loss"])

# Display DataFrame
#print(df)

from IPython.display import display
display(df)

Unnamed: 0,MLARM,Binary Relevance,Classifier Chain,Label Powerset
All Hamming Loss,0.25,0.2,0.23,0.19


## 2.5 - Final Remarks

* Use Classifier Chain if labels are dependent but you want a simple approach.
* Use Label Powerset if you strongly believe in label dependencies and have fewer unique label combinations.
* Use Binary Relevance for fast training when labels are mostly independent.

Explore the books:
*   *Machine Learning*, Tom M. Mitchel
*   *Mastering Machine Learning with Python in Six Steps*, M
Manohar Swamynathan

## 2.6 - Exercise

1. Implement a "pipeline" to process all the three alternatives, Binary Relevance (BR), Classifier Chain (CC) and Label Powerset, using SVC as the base model.
2. Computes Hamming Loss & Accuracy for comparison
3. Displays results in a pandas DataFrame (formatted as a table)
4. Try the same, but using Randon Forest as the base model


In [None]:
#Solution here
#use the same previous splits: X_train, X_test, y_train, y_test


In [None]:
#End!