Support for string labels #14

Closed
michelole opened this issue Oct 12, 2020 · 3 comments · Fixed by #20
Labels
enhancement, good first issue, help wanted

Comments

@michelole

skift seems to expect integer labels: fitting on string labels works, but predicting then fails.

For instance, when running

from skift import FirstColFtClassifier
import pandas as pd
df = pd.DataFrame(
    data=[
        ['woof', 'a'],
        ['meow', 'b'],
        ['squick', 'c'],
    ],
    columns=['txt', 'lbl'],
)
sk_clf = FirstColFtClassifier(lr=0.3, epoch=10)
sk_clf.fit(df[['txt']], df['lbl'])
sk_clf.predict([['squick']])

I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-52a73258e761> in <module>
----> 1 sk_clf.predict([['squick']])

/usr/local/Caskroom/miniconda/base/envs/base/lib/python3.7/site-packages/skift/core.py in predict(self, X)
    165         return np.array([
    166             self._clean_label(res[0][0])
--> 167             for res in self._predict(X)
    168         ], dtype=np.float_)
    169 

/usr/local/Caskroom/miniconda/base/envs/base/lib/python3.7/site-packages/skift/core.py in <listcomp>(.0)
    165         return np.array([
    166             self._clean_label(res[0][0])
--> 167             for res in self._predict(X)
    168         ], dtype=np.float_)
    169 

/usr/local/Caskroom/miniconda/base/envs/base/lib/python3.7/site-packages/skift/core.py in _clean_label(ft_label)
    135     @staticmethod
    136     def _clean_label(ft_label):
--> 137         return int(ft_label[9:])
    138 
    139     def _predict_on_str_arr(self, str_arr, k=1):

ValueError: invalid literal for int() with base 10: 'c'
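
The traceback points at `_clean_label`, which slices off the first 9 characters and casts the rest to `int`. Nine is the length of fastText's default `__label__` prefix, so presumably the model returned `__label__c`. A minimal standalone sketch of that behaviour (the name `clean_label_int` and the sample inputs are made up for illustration and are not skift code):

```python
FT_PREFIX = "__label__"  # fastText's default label prefix, 9 characters long


def clean_label_int(ft_label):
    # Hypothetical stand-in mirroring the line in the traceback:
    # strip the prefix, then cast what is left to int.
    return int(ft_label[len(FT_PREFIX):])


# An integer label survives the round trip.
int_result = clean_label_int("__label__2")

# A string label does not: int('c') raises ValueError.
try:
    clean_label_int("__label__c")
    error_message = None
except ValueError as err:
    error_message = str(err)

print(int_result, error_message)
```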

This is a bit unexpected since neither sklearn nor fasttext requires integer labels.

I guess skift could handle that either by:

  • passing the string labels directly to fasttext (caveat: might require some cleaning)
  • automatically calling LabelEncoder (e.g. as in sklearn's code for LR)
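
As a rough sketch of the second option, assuming scikit-learn is installed: `LabelEncoder` maps the string labels to integers before fitting and maps integer predictions back afterwards. The `predicted_int` value below is a hypothetical stand-in for a classifier's output; this is not how skift currently works, nor necessarily how a fix would be implemented.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["a", "b", "c"]  # the string labels from the example above

encoder = LabelEncoder()
# Classes are sorted, then mapped to integer codes 0..n_classes-1.
int_labels = encoder.fit_transform(labels)

# Hypothetical stand-in for what an integer-only classifier would predict.
predicted_int = [int_labels[2]]

# Map the integer prediction back to the original string label.
predicted_str = encoder.inverse_transform(predicted_int)

print(int_labels.tolist(), predicted_str.tolist())
```

With this approach the wrapped model only ever sees integers, so the existing `int(...)` cast would keep working; the encoder would have to be stored on the estimator at fit time so that predict can invert it.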
@shaypal5 added the enhancement, good first issue, and help wanted labels on Oct 14, 2020
@shaypal5
Owner

Hmmm. Good point!

I would accept a PR solving this either way. Would you consider writing one? :)

@michelole
Author

Sure, let me just find some spare cycles...

@shaypal5
Owner

:)
