
Add multi-label support #13

Open
shaypal5 opened this issue Aug 10, 2020 · 4 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)


@shaypal5 (Owner)

Add support for providing multi-label targets in a scikit-learn-compliant format, using fastText's built-in support for multi-label scenarios under the hood.

@shaypal5 shaypal5 added enhancement New feature or request help wanted Extra attention is needed labels Aug 10, 2020
@shaypal5 shaypal5 self-assigned this Aug 10, 2020
@e412 commented Jul 11, 2022

Hi, is this implemented? I am having issues with the multilabel case. Transforming the labels with MultiLabelBinarizer leads to the error: ValueError: FastTextClassifier methods must get a one-dimensional numpy array as the y parameter.

What can I do?

Thank you very much.
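For context on that error: a multilabel binarizer such as scikit-learn's `MultiLabelBinarizer` turns each sample's label set into a row of 0/1 indicators, so `y` becomes two-dimensional, with shape (n_samples, n_classes), which the one-dimensional check rejects. The pure-Python sketch below mimics that encoding for illustration only; `binarize_labels` is a hypothetical helper, not part of skift or scikit-learn.

```python
from typing import List, Sequence, Tuple


def binarize_labels(label_sets: Sequence[Sequence[str]]) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper mimicking a multilabel binarizer.

    Returns the sorted class names and a 0/1 indicator matrix with one row
    per sample and one column per class, i.e. shape (n_samples, n_classes).
    """
    classes = sorted({label for labels in label_sets for label in labels})
    index = {label: i for i, label in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # mark every label present in this sample
        matrix.append(row)
    return classes, matrix


classes, y = binarize_labels([["sauce", "cheese"], ["food-safety", "acidity"], ["cheese"]])
print(classes)  # ['acidity', 'cheese', 'food-safety', 'sauce']
print(y)        # [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
```

Each row is a list rather than a single label, which is exactly what a "one-dimensional y" requirement cannot accept.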

@e412 commented Jul 11, 2022

Alternatively, do you have any other recommendation for how to cross-validate the results of fastText supervised training (multilabel)? I have been looking for a solution for weeks now... Any help is very much appreciated.

Kind Regards,
Eva

@shaypal5 (Owner, Author)

Hey Eva!

I'll try to help you as best I can. I don't have the time to implement it right now, but I can guide you through contributing the code yourself. :)

First, since the issue is still open, it shouldn't come as a surprise that this isn't implemented yet.

As you can see in this example file from the fastText tutorial on text classification, this is the format for multilabel problems:

__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
__label__food-safety __label__acidity Dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove How do I cover up the white spots on my cast iron stove?

So, very much like the multiclass format, just with multiple __label__ tags at the start of each line.
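A minimal sketch of writing and reading lines in this format, assuming the default `__label__` prefix, labels without whitespace, and all tags placed at the start of the line as in the examples above. `to_fasttext_line` and `parse_fasttext_line` are hypothetical helpers for illustration, not functions from skift or fastText.

```python
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(labels: Sequence[str], text: str) -> str:
    """Put every label, prefixed with __label__, in front of the text."""
    tags = " ".join(LABEL_PREFIX + label for label in labels)
    return f"{tags} {text}" if tags else text


def parse_fasttext_line(line: str) -> Tuple[List[str], str]:
    """Split a line into its leading __label__ tags and the remaining text."""
    tokens = line.split()
    labels: List[str] = []
    i = 0
    # Only the leading run of tags is read, matching the layout above.
    while i < len(tokens) and tokens[i].startswith(LABEL_PREFIX):
        labels.append(tokens[i][len(LABEL_PREFIX):])
        i += 1
    return labels, " ".join(tokens[i:])


line = to_fasttext_line(["sauce", "cheese"], "How much does potato starch affect a cheese sauce recipe?")
print(line)
# __label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?
print(parse_fasttext_line(line))
# (['sauce', 'cheese'], 'How much does potato starch affect a cheese sauce recipe?')
```

The only difference from a multiclass line is that the tag list can hold more than one entry.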

Two main areas of code in skift require adaptation for multilabel problems to be supported:

  1. The FtClassifierABC class must be adapted to also accept y arguments of shape (n_samples, n_outputs), as in sklearn, whose estimators document y as "array-like of shape (n_samples,) or (n_samples, n_outputs)". This includes methods such as _validate_y and fit.
  2. The util.dump_xy_to_fasttext_format() function must be adapted to properly dump multilabel targets in the format shown above.
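A rough sketch of both changes in plain Python, assuming `y` arrives either as a flat sequence of labels (multiclass) or as a 0/1 indicator matrix of shape (n_samples, n_outputs), like the output of `MultiLabelBinarizer`, with the column names passed separately. `validate_y` and `dump_xy_multilabel` are hypothetical stand-ins, not skift's actual `_validate_y` or `dump_xy_to_fasttext_format()`, whose signatures may differ; real code working on numpy arrays would more likely check `y.ndim in (1, 2)` than inspect list rows.

```python
import io
from typing import Optional, Sequence, TextIO

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def validate_y(y: Sequence, n_samples: int) -> int:
    """Hypothetical check: accept y of shape (n_samples,) or (n_samples, n_outputs).

    Returns 1 for a flat sequence of labels and 2 for a list of equal-length rows.
    """
    if len(y) != n_samples:
        raise ValueError("X and y must contain the same number of samples.")
    row_flags = [isinstance(row, (list, tuple)) for row in y]
    if not any(row_flags):
        return 1
    if not all(row_flags) or len({len(row) for row in y}) != 1:
        raise ValueError("A 2D y must consist of rows of equal length.")
    return 2


def dump_xy_multilabel(texts: Sequence[str], y: Sequence, out: TextIO,
                       classes: Optional[Sequence[str]] = None) -> None:
    """Hypothetical dumper: one fastText-format line per sample, labels first."""
    ndim = validate_y(y, len(texts))
    if ndim == 2 and classes is None:
        raise ValueError("classes must name the columns of a 2D indicator y.")
    for text, target in zip(texts, y):
        if ndim == 1:
            labels = [str(target)]
        else:
            # Keep the class name of every column flagged with a non-zero value.
            labels = [cls for cls, flag in zip(classes, target) if flag]
        tags = " ".join(LABEL_PREFIX + label for label in labels)
        clean = " ".join(text.split())  # collapse newlines/tabs: one line per sample
        out.write(f"{tags} {clean}\n" if tags else f"{clean}\n")


buf = io.StringIO()
dump_xy_multilabel(
    ["Dangerous pathogens capable of growing in acidic environments"],
    [[1, 0, 1, 0]],
    buf,
    classes=["acidity", "cheese", "food-safety", "sauce"],
)
print(buf.getvalue(), end="")
# __label__acidity __label__food-safety Dangerous pathogens capable of growing in acidic environments
```

Keeping the 1D branch means existing multiclass callers would see the same output as before, while a 2D y simply expands into several tags.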

@e412 commented Jul 12, 2022

Hi Shaypal,

thanks a lot for replying!

I already have my data in the correct format. Unfortunately, I don't think I am able to implement the feature by myself.

Do you by any chance have some experience performing cross-validation on the output of fastText supervised training? That is the reason I was looking into this wrapper class. I couldn't find much up-to-date information on validating fastText models.

Cheers
Eva
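On the cross-validation question, one option that doesn't depend on skift is plain k-fold over the labelled samples: train on k-1 folds, predict the held-out fold, and score the predicted label sets. The stdlib-only sketch below assumes a hypothetical `fit_predict(train, texts)` callback. With the `fasttext` Python package (assuming 0.9.x), that callback might write `train` to a temporary file in the format above, call `fasttext.train_supervised(input=path, loss="ova")`, and turn each `model.predict(text, k=-1, threshold=0.5)` result into a label set by stripping the `__label__` prefix; that wiring is an assumption and is left out so the sketch runs without fastText installed. Micro-averaged precision/recall is one reasonable multilabel metric, not the only one.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

Sample = Tuple[Set[str], str]  # (gold label set, text)
FitPredict = Callable[[List[Sample], List[str]], List[Set[str]]]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle range(n) and deal it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples: Sequence[Sample], fit_predict: FitPredict,
                   k: int = 5, seed: int = 0) -> List[Tuple[float, float]]:
    """Return micro-averaged (precision, recall) for each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        test = [samples[i] for i in fold]
        # fit_predict should train a fresh model on `train` only.
        predicted = fit_predict(train, [text for _, text in test])
        tp = sum(len(gold & pred) for (gold, _), pred in zip(test, predicted))
        n_pred = sum(len(pred) for pred in predicted)
        n_gold = sum(len(gold) for gold, _ in test)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        scores.append((precision, recall))
    return scores
```

Training a fresh model inside every fold keeps held-out texts out of training; averaging the per-fold scores then gives a single estimate to compare hyperparameter settings.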
