Multilabel classification is a machine learning task in which each input instance can be assigned several labels at once. The model makes one yes/no prediction per label, all simultaneously, so an instance can end up with any number of labels. In multiclass classification, by contrast, each instance receives exactly one label.

Example:

Movie	Genres
Movie 1	Action, Adventure
Movie 2	Drama, Romance
Movie 3	Comedy, Romance
Movie 4	Action, Drama, Thriller
Movie 5	Comedy
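Before training, genre sets like these are usually converted into a binary indicator matrix with one column per genre. The sketch below does this by hand in plain Python. The helper `binarize` is hypothetical. Its output should match what scikit-learn's `MultiLabelBinarizer`, used in the code further down, produces for this table: the columns are the distinct labels in sorted order.

```python
# Genre sets from the table above
labels = [
    ["Action", "Adventure"],
    ["Drama", "Romance"],
    ["Comedy", "Romance"],
    ["Action", "Drama", "Thriller"],
    ["Comedy"],
]


def binarize(label_sets):
    """Return (classes, matrix): one column per distinct label (sorted),
    with 1 where the row carries that label and 0 otherwise."""
    classes = sorted({label for row in label_sets for label in row})
    matrix = [[1 if c in row else 0 for c in classes] for row in label_sets]
    return classes, matrix


classes, Y = binarize(labels)
print(classes)  # → ['Action', 'Adventure', 'Comedy', 'Drama', 'Romance', 'Thriller']
for movie, row in enumerate(Y, start=1):
    print(f"Movie {movie}", row)
# Movie 4 → [1, 0, 0, 1, 0, 1]  (Action, Drama, Thriller)
```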

In this case, the multilabel classification task is to predict the genres for a new movie, given its features. The model needs to output multiple labels for each movie, indicating the genres that are relevant to it.

For instance, if you input the features of a new movie to the trained model, it might predict the following labels: Action, Drama, and Thriller. This means that the model believes the movie belongs to those genres based on its characteristics.
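One common way for a model to emit several labels at once is to score each label independently and keep every label whose score clears a threshold. A minimal sketch follows, assuming made-up per-genre scores and a threshold of 0.5 (both hypothetical). It reproduces the Action, Drama, Thriller prediction described above.

```python
# Hypothetical per-genre scores for one new movie (made up for illustration).
# Each genre is scored independently of the others.
scores = {
    "Action": 0.91,
    "Adventure": 0.22,
    "Comedy": 0.05,
    "Drama": 0.67,
    "Romance": 0.18,
    "Thriller": 0.74,
}


def predict_labels(scores, threshold=0.5):
    """Keep every label whose score is at least the threshold.

    Any number of labels can come back, including none."""
    return [label for label, score in scores.items() if score >= threshold]


print(predict_labels(scores))         # → ['Action', 'Drama', 'Thriller']

# A multiclass model would instead keep only the single best label
print(max(scores, key=scores.get))    # → Action
```

Because each label is decided on its own, raising the threshold trades recall for precision label by label. A threshold above every score (e.g. 0.95 here) returns an empty list.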

Multilabel classification is useful in various domains, such as text classification (e.g., assigning multiple tags to a document), image classification (e.g., identifying multiple objects in an image), and many other applications where multiple labels need to be assigned to each input instance.

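Classification report (each metric is computed separately for every label):

precision (p) = true positives / (true positives + false positives) = fraction of predicted positives that are actually positive

recall (r) = true positives / (true positives + false negatives) = fraction of actual positives that were predicted correctly

f1-score = 2pr / (p + r), the harmonic mean of precision and recall

support = number of actual occurrences of that class in the data being evaluated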
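These definitions can be computed by hand from binary indicator matrices, one column at a time. The plain-Python sketch below uses small made-up data for 4 documents and 3 labels. The helper `per_label_report` is hypothetical. It follows the formulas listed here rather than reproducing scikit-learn's `classification_report` exactly, and it uses 0.0 whenever a denominator is zero (e.g. a label that is never predicted).

```python
# Hypothetical ground truth and predictions: one row per document,
# one column per label, 1 = label present.
label_names = ["Action", "Comedy", "Drama"]
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 0, 1]]


def per_label_report(y_true, y_pred, names):
    """Return {label: (precision, recall, f1, support)} for 0/1 label matrices."""
    report = {}
    for j, name in enumerate(names):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        # Fall back to 0.0 when a denominator is zero
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn  # number of true occurrences of this label
        report[name] = (precision, recall, f1, support)
    return report


print(f"{'':>8} precision recall f1-score support")
for name, (p, r, f1, s) in per_label_report(y_true, y_pred, label_names).items():
    print(f"{name:>8} {p:9.2f} {r:6.2f} {f1:8.2f} {s:7d}")
# Action: 1.00 1.00 1.00 support 3
# Comedy: 0.00 0.00 0.00 support 1  (never predicted)
# Drama:  0.50 1.00 0.67 support 1  (one false positive)
```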

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import MultiLabelBinarizer

# Sample training data
documents = [
    "I love action movies and adventure.",
    "Romantic dramas always make me emotional.",
    "Comedy movies are my favorite, especially romantic comedies.",
    "Thriller movies keep me on the edge of my seat.",
    "I enjoy watching comedy films."
]

# Corresponding labels for each document
labels = [
    ['Action', 'Adventure'],
    ['Drama', 'Romance'],
    ['Comedy', 'Romance'],
    ['Action', 'Drama', 'Thriller'],
    ['Comedy']
]

# Vectorize the text data
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# Convert the labels to binary array format
label_binarizer = MultiLabelBinarizer()
Y = label_binarizer.fit_transform(labels)

# Train the multilabel classifier: OneVsRestClassifier fits one binary
# linear SVM per label column of Y
classifier = OneVsRestClassifier(SVC(kernel='linear'))
classifier.fit(X, Y)

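# Calculate accuracy on the training data. For multilabel targets,
# accuracy_score is subset accuracy: a document counts as correct only
# if its entire predicted label set matches the true one.
train_predictions = classifier.predict(X)
train_accuracy = accuracy_score(Y, train_predictions)
print(f"Training accuracy: {train_accuracy}")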

# Sample test data
test_documents = [
    "I like to watch action and adventure movies.",
    "I'm in the mood for a horror movie.",
    "I'm a fan of action-packed thrillers"
]

# Vectorize the test data
X_test = vectorizer.transform(test_documents)

# Predict the labels for the test data
predicted_labels = classifier.predict(X_test)

# Convert the predicted labels back to their corresponding categories
predicted_categories = label_binarizer.inverse_transform(predicted_labels)

# Print the predicted categories for each test document
for i, doc in enumerate(test_documents):
    print(f"Document: {doc}")
    print(f"Predicted categories: {', '.join(predicted_categories[i])}")
    print("----------")

# Generate a classification report on the training data (optimistic,
# since the model was fit on these same documents)
report = classification_report(Y, train_predictions, target_names=label_binarizer.classes_)
print("Classification Report:")
print(report)

Training accuracy: 1.0
Document: I like to watch action and adventure movies.
Predicted categories: Action, Adventure
----------
Document: I'm in the mood for a horror movie.
Predicted categories: Action
----------
Document: I'm a fan of action-packed thrillers
Predicted categories: Action
----------
Classification Report:
              precision    recall  f1-score   support

      Action       1.00      1.00      1.00         2
   Adventure       1.00      1.00      1.00         1
      Comedy       1.00      1.00      1.00         2
       Drama       1.00      1.00      1.00         2
     Romance       1.00      1.00      1.00         2
    Thriller       1.00      1.00      1.00         1

   micro avg       1.00      1.00      1.00        10
   macro avg       1.00      1.00      1.00        10
weighted avg       1.00      1.00      1.00        10
 samples avg       1.00      1.00      1.00        10
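The four average rows combine the per-label scores in different ways:

* **micro** pools true positives, false positives and false negatives over all labels before computing the metric.
* **macro** takes the unweighted mean of the per-label scores.
* **weighted** weights each label's score by its support.
* **samples** computes the metric for each document over its own label set, then averages over documents.

In the report above all four rows are 1.00 because the training predictions match the labels exactly. The plain-Python sketch below shows how the four rows diverge on imperfect predictions. It is restricted to F1 and uses made-up data. The helpers `f1_from_counts` and `averaged_f1` are hypothetical and are meant to illustrate the averaging schemes, not to replicate scikit-learn's edge-case handling.

```python
# Hypothetical worked example: 4 documents, 3 labels (1 = label present).
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 0, 1]]


def f1_from_counts(tp, fp, fn):
    # 2TP / (2TP + FP + FN) equals 2pr / (p + r); 0.0 when the denominator is zero
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def averaged_f1(y_true, y_pred):
    """Micro, macro, weighted and samples-averaged F1 for 0/1 label matrices."""
    n_labels = len(y_true[0])
    rows = list(zip(y_true, y_pred))

    # Per-label (column-wise) TP, FP, FN counts
    counts = []
    for j in range(n_labels):
        tp = sum(t[j] * p[j] for t, p in rows)
        fp = sum((1 - t[j]) * p[j] for t, p in rows)
        fn = sum(t[j] * (1 - p[j]) for t, p in rows)
        counts.append((tp, fp, fn))
    per_label = [f1_from_counts(*c) for c in counts]
    support = [tp + fn for tp, _, fn in counts]

    # micro: pool the counts over all labels, then compute a single F1
    micro = f1_from_counts(*(sum(c[k] for c in counts) for k in range(3)))
    # macro: plain mean of the per-label F1 scores
    macro = sum(per_label) / n_labels
    # weighted: per-label F1 weighted by each label's support
    weighted = sum(f * s for f, s in zip(per_label, support)) / sum(support)
    # samples: F1 of each document's predicted label set, averaged over documents
    per_doc = [
        f1_from_counts(
            sum(a * b for a, b in zip(t, p)),
            sum((1 - a) * b for a, b in zip(t, p)),
            sum(a * (1 - b) for a, b in zip(t, p)),
        )
        for t, p in rows
    ]
    samples = sum(per_doc) / len(per_doc)
    return {"micro": micro, "macro": macro, "weighted": weighted, "samples": samples}


for name, value in averaged_f1(y_true, y_pred).items():
    print(f"{name:>8} avg F1: {value:.2f}")
# micro 0.80, macro 0.56, weighted 0.73, samples 0.83
```

The macro average is the lowest here because it gives the rare, never-predicted second label the same weight as the frequent first one. Micro and weighted averages are dominated by frequent labels.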

