Change of category name #8

Closed
angrymeir opened this issue May 12, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@angrymeir
Collaborator

Description

The category names are changed during training, which results in a mismatch between the predicted category names and the true category names.

Example

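from pyss3 import SS3

text = ["Document 1", "Document 2"]
groundtruth = ["Label 1", "Label 2"]

clf = SS3()
clf.fit(text, groundtruth)  # train on the documents and their labels

y_pred = clf.predict(text)
print(y_pred)  # ["label 1", "label 2"] (lowercased, unlike groundtruth)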

Explanation

During training, the category names are modified by .lower() here.
When calling .predict(), the modified (lowercased) labels are returned here.

Why is this a problem

When calling .predict() with parameter labels=True (the default setting), the predicted category names have to be postprocessed for a direct comparison to the true category names.
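
Until this is fixed, the postprocessing mentioned above could look roughly like the sketch below. It is only an illustration: `restore_label_case` is a hypothetical helper, not part of PySS3, and it assumes that no two true category names differ only in case.

```python
def restore_label_case(predicted, original_labels):
    """Map lowercased predictions back to the original spelling of each label.

    Hypothetical helper, not part of PySS3.
    """
    lookup = {}
    for label in original_labels:
        key = label.lower()
        # Two labels that differ only in case cannot be told apart afterwards.
        if key in lookup and lookup[key] != label:
            raise ValueError(
                "ambiguous labels: %r and %r" % (lookup[key], label)
            )
        lookup[key] = label
    # Predictions without a known original spelling are returned unchanged.
    return [lookup.get(p, p) for p in predicted]


groundtruth = ["Label 1", "Label 2"]
y_pred = ["label 1", "label 2"]  # what predict() returns in the example above

restored = restore_label_case(y_pred, groundtruth)
print(restored)  # ['Label 1', 'Label 2']
```

This works, but every caller has to remember to do it before comparing predictions with the ground truth.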

Fix

Remove .lower() :)
However, I'm not entirely sure about the consequences for the rest of the project.
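
One consequence that seems likely: category names that differ only in case would become separate categories. The sketch below illustrates this with a toy category table; `CategoryIndex` and its `normalize_case` flag are made up for the illustration and are not how PySS3 stores its categories.

```python
class CategoryIndex:
    """Toy stand-in for a classifier's category table (hypothetical, not PySS3 code)."""

    def __init__(self, normalize_case):
        self.normalize_case = normalize_case
        self.names = []    # category names in insertion order
        self._index = {}   # stored name -> position in self.names

    def add(self, name):
        """Register a category name and return its index."""
        key = name.lower() if self.normalize_case else name
        if key not in self._index:
            self._index[key] = len(self.names)
            self.names.append(key)
        return self._index[key]


labels = ["Sports", "sports", "Politics"]

lowercased = CategoryIndex(normalize_case=True)   # behaviour with .lower()
preserved = CategoryIndex(normalize_case=False)   # behaviour without .lower()
for label in labels:
    lowercased.add(label)
    preserved.add(label)

print(lowercased.names)  # ['sports', 'politics']
print(preserved.names)   # ['Sports', 'sports', 'Politics']
```

So any code that relied on "Sports" and "sports" being merged would see an extra category after the change.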

sergioburdisso added the enhancement (New feature or request) label May 13, 2020
@sergioburdisso
Owner

sergioburdisso commented May 15, 2020

Hi @angrymeir

Yes, you're totally right. Actually, I don't know why this "lower()ing" thing was added in the first place. I think it dates from when this project was a prototype and was mostly used through the PySS3 Command Line Tool, to make typing the category names easier for the user.

But now it makes no sense to automatically convert category names to lower case; it should be the user's decision, not pyss3's. Once I finish working on Issue #5, I'll remove the lower() as you suggest and make sure that it does not negatively affect other parts of the library before releasing the new version (0.6.0), which will fully support multilabel classification. Speaking of which, I've just finished adding multilabel support to Evaluation.test() (0a897dd): it now supports the Hamming Loss metric along with all previous ones and also plots a binary confusion matrix for each possible label.

sergioburdisso added a commit that referenced this issue May 24, 2020
PySS3 now fully supports multi-label classification! :)

- The ``load_from_files_multilabel()`` function was added to the
  ``Dataset`` class (7ece7ce, resolved #6)

- The ``Evaluation`` class now supports multi-label classification (#5)
  - Add multi-label support to ``train()/fit()`` (4d00476)
  - Add multi-label support to ``Evaluation.test()`` (0a897dd)
  - Add multi-label support to ``show_best()`` and ``get_best()`` (ef2419b)
  - Add multi-label support to ``kfold_cross_validation()`` (aacd3a0)
  - Add multi-label support to ``grid_search()`` (925156d, 79f1e9d)
  - Add multi-label support to the 3D Evaluation Plot (42bbc65)

- The Live Test tool now supports multi-label classification as well
  (15657ee, b617bb7, resolved #9)

- Category names are no longer case-insensitive (4ec009a, resolved #8)