<div style="line-height:0.5">
<h1 style="color:#F3F326 "> Multioutput classification 1 </h1>
</div>
<div style="line-height:1.5">
<h4> Example of multilabel classification using several different base classifiers. 
</h4>
<br>
<div style="margin-top: -50px;">
<span style="display: inline-block;">
    <h3 style="color: lightblue; display: inline;">Keywords:</h3> jaccard_score + zero_one_loss + zero_division=1 option 
</span>
</div>
</div>

In [56]:
%%script echo Skipping since already installed
!pip install -U scikit-learn

Skipping since already installed


In [57]:
import numpy as np

from sklearn.svm import SVC 
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss, jaccard_score, f1_score, classification_report, zero_one_loss

In [58]:
# Generate a synthetic dataset
X, y = make_multilabel_classification(n_samples=10000, n_features=20, n_classes=3, n_labels=2, random_state=42)
# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=98)
type(X_train), X_test[:6], y

(numpy.ndarray,
 array([[0., 1., 0., 0., 0., 3., 2., 8., 4., 1., 4., 0., 0., 5., 2., 0.,
         6., 5., 0., 5.],
        [2., 3., 0., 2., 0., 1., 2., 6., 3., 2., 7., 0., 0., 7., 2., 0.,
         6., 2., 0., 1.],
        [1., 0., 7., 0., 1., 3., 0., 1., 0., 2., 5., 2., 1., 1., 2., 2.,
         3., 4., 2., 4.],
        [3., 4., 3., 4., 1., 5., 3., 1., 4., 1., 4., 2., 1., 0., 5., 5.,
         6., 3., 1., 6.],
        [2., 3., 0., 0., 1., 1., 1., 1., 1., 2., 7., 0., 0., 4., 2., 1.,
         6., 5., 0., 2.],
        [0., 5., 1., 4., 2., 3., 0., 1., 2., 1., 6., 0., 0., 1., 2., 2.,
         3., 4., 0., 0.]]),
 array([[0, 1, 0],
        [0, 0, 0],
        [1, 1, 1],
        ...,
        [0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]))

In [59]:
# Sum the labels of each row (axis=1) and flag the samples that have no label at all
zero_rows = np.sum(y, axis=1) == 0
zero_row_indices = np.where(zero_rows)[0]

y[:10], zero_rows[:10], zero_row_indices[:2]

(array([[0, 1, 0],
        [0, 0, 0],
        [1, 1, 1],
        [1, 1, 0],
        [0, 1, 0],
        [1, 1, 0],
        [1, 1, 1],
        [0, 0, 0],
        [1, 1, 1],
        [0, 1, 1]]),
 array([False,  True, False, False, False, False, False,  True, False,
        False]),
 array([1, 7]))
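
The all-zero rows found above come from the generator itself: `make_multilabel_classification` has an `allow_unlabeled` parameter that defaults to True. As a minimal sketch (the smaller sample size and the `*_demo` variable names are illustrative assumptions, not part of this notebook), setting it to False should produce a dataset in which every sample carries at least one label:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification

# Same kind of problem as above, but smaller and without unlabeled samples
X_demo, y_demo = make_multilabel_classification(
    n_samples=500, n_features=20, n_classes=3, n_labels=2,
    allow_unlabeled=False, random_state=42,
)

# Count the rows whose label vector is entirely zero
labels_per_sample = y_demo.sum(axis=1)
n_unlabeled = int(np.sum(labels_per_sample == 0))
print(f"Samples without any label: {n_unlabeled}")
```

Whether unlabeled samples should be kept depends on the task; they are kept in this notebook, which is why the `zero_division` option matters later on.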

<h2 style="color:#F3F326 "> #1 with K-Nearest Neighbors </h2>

In [60]:
# Initialize the base classifier
knn = KNeighborsClassifier(n_neighbors=3)
# Create the MultiOutputClassifier
multi_target_knn = MultiOutputClassifier(knn, n_jobs=-1)

In [61]:
# Train the MultiOutputClassifier
multi_target_knn.fit(X_train, y_train)

# Predicting
predictions = multi_target_knn.predict(X_test)
predictions[:7]

array([[0, 1, 0],
       [0, 1, 0],
       [1, 0, 1],
       [1, 0, 1],
       [0, 1, 0],
       [1, 1, 1],
       [1, 1, 1]])
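
`MultiOutputClassifier` fits one independent copy of the base estimator per label column. The sketch below illustrates this on a small synthetic dataset by comparing the wrapper with a hand-written loop over the columns; the dataset size and the `*_demo`, `wrapped` and `manual_pred` names are assumptions made for the illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

X_demo, y_demo = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=3, n_labels=2, random_state=0
)
base = KNeighborsClassifier(n_neighbors=3)

# Wrapper: one fitted estimator per label, stored in `estimators_`
wrapped = MultiOutputClassifier(base).fit(X_demo, y_demo)
wrapped_pred = wrapped.predict(X_demo)

# Manual equivalent: fit a fresh clone on each label column separately
manual_pred = np.column_stack([
    clone(base).fit(X_demo, y_demo[:, j]).predict(X_demo)
    for j in range(y_demo.shape[1])
])

same = np.array_equal(wrapped_pred, manual_pred)
print(f"Fitted estimators: {len(wrapped.estimators_)}, identical predictions: {same}")
```

Because the labels are modelled independently, correlations between labels are not exploited by this wrapper.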

<h4 style="color:#F3F326 "> => Model evaluation with metrics #1 </h4>

In [62]:
# Accuracy 
accuracy = accuracy_score(y_test, predictions)
# Hamming Loss
hamming_loss_value = hamming_loss(y_test, predictions)
# Jaccard Score
jaccard_score_value = jaccard_score(y_test, predictions, average='samples', zero_division=1)
# F1 Score
f1_score_value = f1_score(y_test, predictions, average='micro', zero_division=1)
# Zero-one Loss
zero_one_loss_value = zero_one_loss(y_test, predictions)

print(f"Model Accuracy is: {accuracy * 100:.2f}%")
print(f"Hamming Loss is: {hamming_loss_value:.2f}")
print(f"Jaccard Score is: {jaccard_score_value:.2f}")
print(f"F1 Score is: {f1_score_value:.2f}")
print(f"Zero-one Loss is: {zero_one_loss_value:.2f}")

Model Accuracy is: 66.35%
Hamming Loss is: 0.15
Jaccard Score is: 0.81
F1 Score is: 0.85
Zero-one Loss is: 0.34
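
For multilabel targets, `accuracy_score` is the subset accuracy (a sample counts as correct only if all of its labels match), so it is the complement of the zero-one loss, as the 66.35% and 0.34 above suggest. The Hamming loss instead counts individual wrong labels. A small hand-made example (the arrays are invented for illustration) makes the relationship explicit:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, zero_one_loss

y_true_demo = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]])
y_pred_demo = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1]])

# Subset accuracy: fraction of rows where every label is predicted correctly
exact_match = np.all(y_true_demo == y_pred_demo, axis=1).mean()
# Hamming loss: fraction of individual label slots that are wrong
wrong_labels = (y_true_demo != y_pred_demo).mean()

acc = accuracy_score(y_true_demo, y_pred_demo)
zol = zero_one_loss(y_true_demo, y_pred_demo)
ham = hamming_loss(y_true_demo, y_pred_demo)

print(f"accuracy={acc:.3f}, zero-one loss={zol:.3f}, hamming loss={ham:.3f}")
```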


In [63]:
# Classification Report
report = classification_report(y_test, predictions, zero_division=1)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

           0       0.77      0.76      0.77       828
           1       0.88      0.94      0.91      1184
           2       0.84      0.88      0.86      1098

   micro avg       0.84      0.87      0.85      3110
   macro avg       0.83      0.86      0.84      3110
weighted avg       0.84      0.87      0.85      3110
 samples avg       0.89      0.92      0.85      3110
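
The averaged rows of the report differ in how they pool the labels. As a sketch on invented data: the macro average is the plain mean of the per-label scores, while the micro average is computed from true positives, false positives and false negatives pooled over all labels.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true_demo = np.array([[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])
y_pred_demo = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1]])

per_label = f1_score(y_true_demo, y_pred_demo, average=None)
macro = f1_score(y_true_demo, y_pred_demo, average="macro")
micro = f1_score(y_true_demo, y_pred_demo, average="micro")

# Pool the counts over every label slot to reproduce the micro average
tp = np.sum((y_true_demo == 1) & (y_pred_demo == 1))
fp = np.sum((y_true_demo == 0) & (y_pred_demo == 1))
fn = np.sum((y_true_demo == 1) & (y_pred_demo == 0))
micro_manual = 2 * tp / (2 * tp + fp + fn)

print("per-label F1:", np.round(per_label, 3))
print(f"macro={macro:.3f}, micro={micro:.3f}")
```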



<h3 style="color:#F3F326 "> Notes: </h3> 
<div style="margin-top: -20px;">
Metrics such as Jaccard, precision, recall and F-score can be undefined in multilabel classification. <br>
This happens fairly often in multilabel tasks, because the label space tends to be sparse and imbalanced. <br>
When a sample has all-zero labels (in the prediction, in the ground truth, or in both), the denominator of the formula can be zero: precision is TP / (TP + FP), recall is TP / (TP + FN), and Jaccard is |intersection| / |union|. <br>
In particular, the Jaccard score of a sample is undefined when both the true and the predicted label sets are empty, because their union is empty. <br>

Since no value can be computed in these cases, scikit-learn raises an 'UndefinedMetricWarning' and, by default, uses 0. <br> 
Setting the 'zero_division' option to 1 silences the warning and assigns a score of 1 to those cases instead.
</div>
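
To make the effect of `zero_division` concrete, here is a small sketch on invented data. The first sample has no labels in either the ground truth or the prediction, so its Jaccard score is undefined and is replaced by the chosen value; the second sample has an intersection of one label and a union of two.

```python
import numpy as np
from sklearn.metrics import jaccard_score

y_true_demo = np.array([[0, 0, 0], [1, 1, 0]])
y_pred_demo = np.array([[0, 0, 0], [1, 0, 0]])

# The undefined first sample is scored 1, the second 0.5
score_as_one = jaccard_score(y_true_demo, y_pred_demo, average="samples", zero_division=1)
# The undefined first sample is scored 0, the second 0.5
score_as_zero = jaccard_score(y_true_demo, y_pred_demo, average="samples", zero_division=0)

print(f"zero_division=1 -> {score_as_one:.2f}")
print(f"zero_division=0 -> {score_as_zero:.2f}")
```

The choice changes the reported score, not just the warning, so it is worth stating which value was used when comparing results.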

<h2 style="color:#F3F326 "> #2 with Decision Tree </h2>

In [64]:
# Initialize the base classifier
dt_classifier = DecisionTreeClassifier(random_state=42)
# Initialize the multi-output classifier
dt_multioutput = MultiOutputClassifier(dt_classifier)

# Train
dt_multioutput.fit(X_train, y_train)

In [65]:
y_pred = dt_multioutput.predict(X_test)
y_pred[:9]

array([[0, 1, 0],
       [0, 1, 0],
       [1, 0, 1],
       [1, 1, 1],
       [0, 1, 0],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 0],
       [1, 1, 1]])

<h4 style="color:#F3F326 "> => Model evaluation with metrics #2 </h4>

In [66]:
# Accuracy 
accuracy = accuracy_score(y_test, y_pred)
# Hamming Loss
hamming_loss_value = hamming_loss(y_test, y_pred)
# Jaccard Score
jaccard_score_value = jaccard_score(y_test, y_pred, average='samples', zero_division=1)
# F1 Score
f1_score_value = f1_score(y_test, y_pred, average='micro', zero_division=1)
# Zero-one Loss
zero_one_loss_value = zero_one_loss(y_test, y_pred)

print(f"Model Accuracy is: {accuracy * 100:.2f}%")
print(f"Hamming Loss is: {hamming_loss_value:.2f}")
print(f"Jaccard Score is: {jaccard_score_value:.2f}")
print(f"F1 Score is: {f1_score_value:.2f}")
print(f"Zero-one Loss is: {zero_one_loss_value:.2f}")

Model Accuracy is: 52.10%
Hamming Loss is: 0.21
Jaccard Score is: 0.71
F1 Score is: 0.80
Zero-one Loss is: 0.48


In [67]:
# Classification Report
report = classification_report(y_test, y_pred, zero_division=1)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

           0       0.69      0.69      0.69       828
           1       0.85      0.88      0.86      1184
           2       0.79      0.81      0.80      1098

   micro avg       0.79      0.80      0.80      3110
   macro avg       0.78      0.79      0.78      3110
weighted avg       0.79      0.80      0.80      3110
 samples avg       0.82      0.87      0.76      3110



<h2 style="color:#F3F326 "> #3 with Support Vector Machine </h2>

In [68]:
# Initialize the base classifier
svc_classifier = SVC(probability=True, random_state=42)

# Initialize the multi-output classifier
svc_multioutput = MultiOutputClassifier(svc_classifier)

# Train 
svc_multioutput.fit(X_train, y_train);  # trailing semicolon suppresses the cell output

In [69]:
# Predict with the SVC-based model fitted above
y_pred = svc_multioutput.predict(X_test)
y_pred[:9]

<h4 style="color:#F3F326 "> => Model evaluation with metrics #3 </h4>

In [70]:
# Accuracy 
accuracy = accuracy_score(y_test, y_pred)
# Hamming Loss
hamming_loss_value = hamming_loss(y_test, y_pred)
# Jaccard Score
jaccard_score_value = jaccard_score(y_test, y_pred, average='samples', zero_division=1)
# F1 Score
f1_score_value = f1_score(y_test, y_pred, average='micro', zero_division=1)
# Zero-one Loss
zero_one_loss_value = zero_one_loss(y_test, y_pred)

print(f"Model Accuracy is: {accuracy * 100:.2f}%")
print(f"Hamming Loss is: {hamming_loss_value:.2f}")
print(f"Jaccard Score is: {jaccard_score_value:.2f}")
print(f"F1 Score is: {f1_score_value:.2f}")
print(f"Zero-one Loss is: {zero_one_loss_value:.2f}")

In [71]:
# Classification Report
report = classification_report(y_test, y_pred, zero_division=1)
print("Classification Report:")
print(report)



<h2 style="color:#F3F326 "> #4 with Logistic Regression </h2>

In [72]:
# Initialize the base classifier
lr_classifier = LogisticRegression(solver='lbfgs', random_state=42)
# Initialize the multi-output classifier
lr_multioutput = MultiOutputClassifier(lr_classifier)
# Train the classifier
lr_multioutput.fit(X_train, y_train);

In [73]:
# Predict with the logistic-regression-based model fitted above
y_pred = lr_multioutput.predict(X_test)
y_pred[:9]

<h4 style="color:#F3F326 "> => Model evaluation with metrics #4 </h4>

In [74]:
# Accuracy 
accuracy = accuracy_score(y_test, y_pred)
# Hamming Loss
hamming_loss_value = hamming_loss(y_test, y_pred)
# Jaccard Score
jaccard_score_value = jaccard_score(y_test, y_pred, average='samples', zero_division=1)
# F1 Score
f1_score_value = f1_score(y_test, y_pred, average='micro', zero_division=1)
# Zero-one Loss
zero_one_loss_value = zero_one_loss(y_test, y_pred)

print(f"Model Accuracy is: {accuracy * 100:.2f}%")
print(f"Hamming Loss is: {hamming_loss_value:.2f}")
print(f"Jaccard Score is: {jaccard_score_value:.2f}")
print(f"F1 Score is: {f1_score_value:.2f}")
print(f"Zero-one Loss is: {zero_one_loss_value:.2f}")

In [75]:
# Classification Report
report = classification_report(y_test, y_pred, zero_division=1)
print("Classification Report:")
print(report)



<h2 style="color:#F3F326 "> #5 with Random Forest </h2>

In [76]:
# Initialize the base classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# Initialize the multi-output classifier
rf_multioutput = MultiOutputClassifier(rf_classifier)

# Train the classifier
rf_multioutput.fit(X_train, y_train);

In [77]:
# Predict with the random-forest-based model fitted above
y_pred = rf_multioutput.predict(X_test)
y_pred[:9]

<h4 style="color:#F3F326 "> => Model evaluation with metrics #5 </h4>

In [78]:
# Accuracy 
accuracy = accuracy_score(y_test, y_pred)
# Hamming Loss
hamming_loss_value = hamming_loss(y_test, y_pred)
# Jaccard Score
jaccard_score_value = jaccard_score(y_test, y_pred, average='samples', zero_division=1)
# F1 Score
f1_score_value = f1_score(y_test, y_pred, average='micro', zero_division=1)
# Zero-one Loss
zero_one_loss_value = zero_one_loss(y_test, y_pred)

print(f"Model Accuracy is: {accuracy * 100:.2f}%")
print(f"Hamming Loss is: {hamming_loss_value:.2f}")
print(f"Jaccard Score is: {jaccard_score_value:.2f}")
print(f"F1 Score is: {f1_score_value:.2f}")
print(f"Zero-one Loss is: {zero_one_loss_value:.2f}")

In [79]:
# Classification Report
report = classification_report(y_test, y_pred, zero_division=1)
print("Classification Report:")
print(report)



<h2 style="color:#F3F326 "> #6 with Gradient Boosting </h2>

In [80]:
# Initialize the base classifier
gb_classifier = GradientBoostingClassifier(n_estimators=100, random_state=42)
# Initialize the multi-output classifier
gb_multioutput = MultiOutputClassifier(gb_classifier)

# Train the classifier
gb_multioutput.fit(X_train, y_train);

In [81]:
# Predict with the gradient-boosting-based model fitted above
y_pred = gb_multioutput.predict(X_test)
y_pred[:9]

<h4 style="color:#F3F326 "> => Model evaluation with metrics #6 </h4>

In [82]:
# Accuracy 
accuracy = accuracy_score(y_test, y_pred)
# Hamming Loss
hamming_loss_value = hamming_loss(y_test, y_pred)
# Jaccard Score
jaccard_score_value = jaccard_score(y_test, y_pred, average='samples', zero_division=1)
# F1 Score
f1_score_value = f1_score(y_test, y_pred, average='micro', zero_division=1)
# Zero-one Loss
zero_one_loss_value = zero_one_loss(y_test, y_pred)

print(f"Model Accuracy is: {accuracy * 100:.2f}%")
print(f"Hamming Loss is: {hamming_loss_value:.2f}")
print(f"Jaccard Score is: {jaccard_score_value:.2f}")
print(f"F1 Score is: {f1_score_value:.2f}")
print(f"Zero-one Loss is: {zero_one_loss_value:.2f}")

In [83]:
# Classification Report
report = classification_report(y_test, y_pred, zero_division=1)
print("Classification Report:")
print(report)
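
The per-model cells above repeat the same fit, predict and score steps. One possible way to keep each model's predictions paired with its own metrics is to loop over the base classifiers. The following is only a sketch under assumptions: it uses a smaller synthetic dataset, a subset of the classifiers, and hypothetical names (`base_models`, `results`) that do not appear in the notebook.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_multilabel_classification(
    n_samples=1000, n_features=20, n_classes=3, n_labels=2, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=98)

# Hypothetical registry of base classifiers to compare
base_models = {
    "knn": KNeighborsClassifier(n_neighbors=3),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000, random_state=42),
}

results = {}
for name, base in base_models.items():
    model = MultiOutputClassifier(base).fit(X_tr, y_tr)
    model_pred = model.predict(X_te)  # predictions of this model only
    results[name] = {
        "subset_accuracy": accuracy_score(y_te, model_pred),
        "hamming_loss": hamming_loss(y_te, model_pred),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['subset_accuracy']:.3f}, "
          f"hamming={scores['hamming_loss']:.3f}")
```

Keeping the prediction variable local to the loop body avoids scoring one model with another model's predictions.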

