MLkNN breaks when sklearn >= 1.0 #250

Open
alexeyev opened this issue Oct 31, 2022 · 3 comments

Comments

@alexeyev

Hi, dear colleagues, thank you for your work.

This is a bug report.

With scikit-learn==0.24.0, this code (X_train is a dense 2D numpy array, y_train is a sparse scipy matrix with the same number of rows) works:

from skmultilearn.adapt import MLkNN

classifier = MLkNN(k=2)
classifier.fit(X=X_train, y=y_train)

With scikit-learn==1.0 and scikit-learn==1.1.3, however, I get:

    classifier.fit(X=X_train, y=y_train)
  File "C:\Users\<me>\AppData\Local\Programs\Python\Python38\lib\site-packages\skmultilearn\adapt\mlknn.py", line 218, in fit
    self._cond_prob_true, self._cond_prob_false = self._compute_cond(X, self._label_cache)
  File "C:\Users\<me>\AppData\Local\Programs\Python\Python38\lib\site-packages\skmultilearn\adapt\mlknn.py", line 165, in _compute_cond
    self.knn_ = NearestNeighbors(self.k).fit(X)
TypeError: __init__() takes 1 positional argument but 2 were given

Thanks.

@jamesee

jamesee commented Feb 20, 2023

I ran into the same problem.

My sklearn version: [screenshot taken 2023-02-20 at 6:44:54 PM]

The following example reproduces it:

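from skmultilearn.adapt import MLkNN
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV

# Synthetic multilabel data: 10000 samples, 20 features, 5 labels
x, y = make_multilabel_classification(n_samples=10000, n_features=20,
                                      n_classes=5, random_state=88)

# Hyperparameter grid: k neighbours and smoothing parameter s
parameters = {'k': range(1, 3), 's': [0.5, 0.7, 1.0]}

clf = GridSearchCV(MLkNN(), parameters, scoring='f1_macro')
clf.fit(x, y)

print(clf.best_params_, clf.best_score_)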

The stack trace is as follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[56], line 12
      9 parameters = {'k': range(1,3), 's': [0.5, 0.7, 1.0]}
     11 clf = GridSearchCV(MLkNN(), parameters, scoring='f1_macro')
---> 12 clf.fit(x, y)
     14 print (clf.best_params_, clf.best_score_)

File ~/miniconda3/envs/torch/lib/python3.9/site-packages/sklearn/model_selection/_search.py:875, in BaseSearchCV.fit(self, X, y, groups, **fit_params)
    869     results = self._format_results(
    870         all_candidate_params, n_splits, all_out, all_more_results
    871     )
    873     return results
--> 875 self._run_search(evaluate_candidates)
    877 # multimetric is determined here because in the case of a callable
    878 # self.scoring the return type is only known after calling
    879 first_test_score = all_out[0]["test_scores"]

File ~/miniconda3/envs/torch/lib/python3.9/site-packages/sklearn/model_selection/_search.py:1389, in GridSearchCV._run_search(self, evaluate_candidates)
   1387 def _run_search(self, evaluate_candidates):
   1388     """Search all candidates in param_grid"""
-> 1389     evaluate_candidates(ParameterGrid(self.param_grid))

File ~/miniconda3/envs/torch/lib/python3.9/site-packages/sklearn/model_selection/_search.py:852, in BaseSearchCV.fit.<locals>.evaluate_candidates(candidate_params, cv, more_results)
    845 elif len(out) != n_candidates * n_splits:
    846     raise ValueError(
    847         "cv.split and cv.get_n_splits returned "
    848         "inconsistent results. Expected {} "
    849         "splits, got {}".format(n_splits, len(out) // n_candidates)
    850     )
--> 852 _warn_or_raise_about_fit_failures(out, self.error_score)
    854 # For callable self.scoring, the return type is only know after
    855 # calling. If the return type is a dictionary, the error scores
    856 # can now be inserted with the correct key. The type checking
    857 # of out will be done in `_insert_error_scores`.
    858 if callable(self.scoring):

File ~/miniconda3/envs/torch/lib/python3.9/site-packages/sklearn/model_selection/_validation.py:367, in _warn_or_raise_about_fit_failures(results, error_score)
    360 if num_failed_fits == num_fits:
    361     all_fits_failed_message = (
    362         f"\nAll the {num_fits} fits failed.\n"
    363         "It is very likely that your model is misconfigured.\n"
    364         "You can try to debug the error by setting error_score='raise'.\n\n"
    365         f"Below are more details about the failures:\n{fit_errors_summary}"
    366     )
--> 367     raise ValueError(all_fits_failed_message)
    369 else:
    370     some_fits_failed_message = (
    371         f"\n{num_failed_fits} fits failed out of a total of {num_fits}.\n"
    372         "The score on these train-test partitions for these parameters"
   (...)
    376         f"Below are more details about the failures:\n{fit_errors_summary}"
    377     )

ValueError: 
All the 30 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
30 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/james/miniconda3/envs/torch/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/james/miniconda3/envs/torch/lib/python3.9/site-packages/skmultilearn/adapt/mlknn.py", line 218, in fit
    self._cond_prob_true, self._cond_prob_false = self._compute_cond(X, self._label_cache)
  File "/Users/james/miniconda3/envs/torch/lib/python3.9/site-packages/skmultilearn/adapt/mlknn.py", line 165, in _compute_cond
    self.knn_ = NearestNeighbors(self.k).fit(X)
TypeError: __init__() takes 1 positional argument but 2 were given

@ChristianSch
Member

Pull requests are welcome 🎉

@northern-64bit

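The problem is the following line:

self.knn_ = NearestNeighbors(self.k).fit(X)

scikit-learn 1.0 made the constructor arguments of NearestNeighbors keyword-only, so the line has to be changed to:

self.knn_ = NearestNeighbors(n_neighbors=self.k, n_jobs=self.n_jobs).fit(X)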
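
To illustrate why the positional call fails: a constructor declared with a bare `*` in its signature rejects every positional argument other than `self`. The class below is a hypothetical stand-in (it is not sklearn's code) that only mimics that signature shape, so the sketch runs without any third-party package.

```python
# Hypothetical stand-in for an estimator with a keyword-only constructor.
# It only mimics the signature shape; it is not sklearn's NearestNeighbors.
class KeywordOnlyNeighbors:
    def __init__(self, *, n_neighbors=5, n_jobs=None):
        self.n_neighbors = n_neighbors
        self.n_jobs = n_jobs


def try_positional(k):
    """Return the TypeError message raised by a positional call, or None."""
    try:
        KeywordOnlyNeighbors(k)  # same call shape as NearestNeighbors(self.k)
    except TypeError as exc:
        return str(exc)
    return None


positional_error = try_positional(2)              # positional call is rejected
by_keyword = KeywordOnlyNeighbors(n_neighbors=2)  # keyword call is accepted

print(positional_error)
print(by_keyword.n_neighbors)
```

The message printed for the positional call should have the same form as the TypeError in the tracebacks above, although the exact prefix varies between Python versions.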

Note that this has been fixed in https://github.com/scikit-multilearn-ng/scikit-multilearn-ng/.
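
If switching to the fork or editing the installed mlknn.py is not an option, one possible stopgap is to patch the name that mlknn.py looks up at call time. This is only a sketch under assumptions: it assumes mlknn.py imports the class with `from sklearn.neighbors import NearestNeighbors`, and `accept_leading_positional` is a hypothetical helper written for this example, not part of either library.

```python
def accept_leading_positional(factory, keyword="n_neighbors"):
    """Wrap `factory` so one leading positional argument is forwarded
    under the name `keyword` (hypothetical helper for this sketch)."""
    def wrapper(*args, **kwargs):
        if len(args) > 1:
            raise TypeError("expected at most one positional argument")
        if args:
            if keyword in kwargs:
                raise TypeError(f"got multiple values for argument {keyword!r}")
            kwargs[keyword] = args[0]
        return factory(**kwargs)
    return wrapper


try:
    # Assumption: mlknn.py binds the class at module level via
    # `from sklearn.neighbors import NearestNeighbors`.
    import skmultilearn.adapt.mlknn as mlknn_module
    from sklearn.neighbors import NearestNeighbors
except ImportError:
    mlknn_module = None  # libraries not installed; nothing to patch
else:
    # NearestNeighbors(self.k) inside mlknn.py now resolves to the wrapper,
    # which calls NearestNeighbors(n_neighbors=self.k).
    mlknn_module.NearestNeighbors = accept_leading_positional(NearestNeighbors)
```

The patch has to run before the first call to `MLkNN.fit`. It changes module state globally, so it is best treated as a temporary measure until a fixed release can be installed.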
