Change of category name #8

angrymeir · 2020-05-12T21:24:47Z

Description

The category names are changed during training, which results in a mismatch between the predicted category names and the true category names.

Example

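from pyss3 import SS3

text = ["Document 1", "Document 2"]
groundtruth = ["Label 1", "Label 2"]

# Train on the documents with their original (mixed-case) labels
clf = SS3()
clf.fit(text, groundtruth)

# The predicted labels come back lowercased
y_pred = clf.predict(text)
print(y_pred)  # ["label 1", "label 2"]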

Explanation

During training, the category names are modified by .lower() here.
When calling .predict(), the modified (lowercased) labels are returned here.

Why is this a problem

When calling .predict() with parameter labels=True (the default setting), the predicted category names have to be postprocessed for a direct comparison to the true category names.
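To illustrate the kind of postprocessing meant here, below is a minimal sketch in plain Python. It does not use pyss3 itself: `restore_label_case` is a hypothetical helper name, and the `y_pred` list simply stands in for what `.predict()` returns in the example above.

```python
def restore_label_case(predicted, original_labels):
    """Map lowercased predicted labels back to their original spelling."""
    # Lookup table from the lowercased name to the name used in the ground truth.
    lookup = {label.lower(): label for label in original_labels}
    # Fall back to the predicted name if no original spelling is known.
    return [lookup.get(label, label) for label in predicted]


groundtruth = ["Label 1", "Label 2"]
y_pred = ["label 1", "label 2"]  # stand-in for the output of clf.predict()

restored = restore_label_case(y_pred, groundtruth)
print(restored == groundtruth)
```

Note that this workaround is ambiguous when two category names differ only by case (e.g. "Spam" and "spam"), which is one more reason to avoid lowercasing inside the library.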

Fix

Remove .lower() :)
However, I'm not entirely sure about the consequences for the rest of the project.


sergioburdisso · 2020-05-15T14:16:26Z

Hi @angrymeir

Yes, you're totally right. Actually, I don't know why this "lower()ing" thing was added in the first place. I think it was added when this project was a prototype and was mostly used through the PySS3 Command Line Tool, to make things easier for the user when typing the category names.

But now it makes no sense to automatically convert category names to lower case; it should be the user's decision, not pyss3's. Once I finish working on Issue #5, I'll remove the lower() as you suggest and make sure that it does not negatively affect other parts of the library before releasing the new version (0.6.0), which will fully support multilabel classification. Speaking of which, I've just finished adding multilabel support to Evaluation.test() (0a897dd): it now supports the Hamming Loss metric along with all the previous ones, and also plots a binary confusion matrix for each possible label.
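For readers unfamiliar with the metric, here is a minimal sketch of the standard Hamming loss definition for multi-label predictions, in plain Python. It is not how Evaluation.test() computes it internally; the function name and the sample data are made up for illustration.

```python
def hamming_loss(y_true, y_pred, labels):
    """Fraction of (document, label) slots where prediction and truth disagree."""
    wrong = 0
    for true_set, pred_set in zip(y_true, y_pred):
        # Symmetric difference: labels present in exactly one of the two sets.
        wrong += len(set(true_set) ^ set(pred_set))
    return wrong / (len(y_true) * len(labels))


labels = ["sports", "politics", "tech"]
y_true = [["sports"], ["politics", "tech"]]
y_pred = [["sports", "tech"], ["politics"]]

# 2 wrong slots (one extra "tech", one missing "tech") out of 2 * 3 = 6
print(hamming_loss(y_true, y_pred, labels))
```

A value of 0 means every label of every document was predicted correctly. Since the comparison is done on the label names themselves, it also only works if predicted and true names match exactly, case included.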

PySS3 now fully supports multi-label classification! :)

- The ``load_from_files_multilabel()`` function was added to the ``Dataset`` class (7ece7ce, resolved #6)
- The ``Evaluation`` class now supports multi-label classification (#5)
  - Add multi-label support to ``train()/fit()`` (4d00476)
  - Add multi-label support to ``Evaluation.test()`` (0a897dd)
  - Add multi-label support to ``show_best and get_best()`` (ef2419b)
  - Add multi-label support to ``kfold_cross_validation()`` (aacd3a0)
  - Add multi-label support to ``grid_search()`` (925156d, 79f1e9d)
  - Add multi-label support to the 3D Evaluation Plot (42bbc65)
- The Live Test tool now supports multi-label classification as well (15657ee, b617bb7, resolved #9)
- Category names are no longer case-insensitive (4ec009a, resolved #8)

sergioburdisso added the enhancement New feature or request label May 13, 2020

sergioburdisso closed this as completed in 4ec009a May 17, 2020
