Error while fitting GridSearchCV #1959

Closed
cheparukhin opened this Issue May 13, 2013 · 14 comments

@cheparukhin
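from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.grid_search import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
# Assumed location of Scorer in this development snapshot;
# released versions expose make_scorer instead.
from sklearn.metrics import Scorer

svm = Pipeline([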
    ('chi2', SelectKBest(chi2)),
    ('svm', LinearSVC(class_weight='auto'))
])

vectorizer = TfidfVectorizer(input='filename')

classifier = Pipeline([
    ('vect', vectorizer),
    ('clf', OneVsRestClassifier(svm))
])

parameters = {
    'vect__min_df': (1, 2, 5),
    'vect__max_df': (0.5, 0.75, 1.0),
    'vect__ngram_range': ((1, 1), (1, 2), (1, 3)),
    'clf__estimator__chi2__k': (100, 1000, 'all'),
    'clf__estimator__svm__C': (1.0, 3.0, 10.0),
    'clf__estimator__svm__fit_intercept': (True, False)
}

data = ...
target = ...
hamming_loss = ...

grid = GridSearchCV(classifier, parameters, n_jobs=-1, verbose=1, scoring=Scorer(hamming_loss, greater_is_better=False))

grid.fit(data, target)

Traceback (most recent call last):
  File "main.py", line 140, in <module>
    grid.fit(data, target)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 687, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid), **params)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 456, in _fit
    parameter_iterator for train, test in cv)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 516, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 449, in retrieve
    raise exception_type(report)
TypeError: function takes exactly 5 arguments (1 given)

@amueller
Member

Thanks for the report. Which version is this? Could you please try with n_jobs=1?

@cheparukhin

This is the latest version from the repository (d6e9598). I don't have enough time to try it with n_jobs=1, sorry (it takes a while).

@amueller
Member

Then I misunderstood the error. I thought it would be on the first call to fit. Is it not?

@cheparukhin

No, it occurs somewhere in the middle. I suspect that some other error occurs, possibly not enough features for SelectKBest, and that it is being presented this way.

@amueller
Member

Then it will be very hard to investigate without having the data. Could you try to produce a minimal example?
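One way to narrow this down might be to walk the parameter grid serially and stop at the first combination that raises, so the real traceback is not wrapped by joblib. The following is only a sketch in plain Python: `iter_param_combinations`, `find_failing_params`, and `fit_one` are hypothetical names, and `fit_one` stands in for whatever sets the parameters on the pipeline and fits it on one fold.

```python
import itertools
import traceback


def iter_param_combinations(param_grid):
    """Yield one dict per combination (a plain Cartesian product)."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def find_failing_params(param_grid, fit_one):
    """Call fit_one(params) serially.

    Returns (params, traceback text) for the first failure,
    or None if every combination succeeds.
    """
    for params in iter_param_combinations(param_grid):
        try:
            fit_one(params)
        except Exception:
            return params, traceback.format_exc()
    return None


if __name__ == "__main__":
    # Toy stand-in for a real fit: fails when k exceeds a made-up feature count.
    def fit_one(params):
        if params["k"] != "all" and params["k"] > 500:
            raise ValueError("k=%r exceeds the number of features" % params["k"])

    grid = {"k": (100, 1000, "all"), "C": (1.0, 3.0)}
    failure = find_failing_params(grid, fit_one)
    if failure is not None:
        print(failure[0])
        print(failure[1])
```

The failing combination can then be fitted on its own, possibly on a subset of the data, to get a small reproducible case.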

@jnothman
Member

If that traceback is complete, it looks like it's a problem in joblib, which seems to assume that exceptions can be constructed with a single argument (see parallel.py):

>>> from sklearn.externals.joblib.my_exceptions import _mk_exception
>>> class MyException(object):
...     def __init__(self, a, b, c, d):
...         pass
...
>>> _mk_exception(MyException)[0]('this is the report')
TypeError: __init__() takes exactly 5 arguments (2 given)

The "1 given" makes me think I've got it wrong, or the exception in this case is a function (not a bound method).

@cheparukhin

Sorry, I wasn't able to reproduce the error in a minimalistic example.

@jnothman
Member

Looking at the error message again, it has to be a function, not an exception class, which means that if this is indeed the source of error, someone's very strangely raising a function as an exception...?

@amueller
Member

@jnothman I think the real error is hidden by joblib.
@cheparukhin If you cannot produce a minimal example and cannot run it with a single job, then I have no idea how we could help you.

@amueller
Member

To be more precise: joblib hides the place where the error occurs.

@jnothman
Member

I think you must be right Andy... The exception type is retrieved from sys.exc_info, and I can't believe it could be a function.
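The masking effect discussed here can be imitated without joblib. The sketch below is an assumption-laden simplification, not joblib's actual code: `run_in_worker`, `reraise_in_parent`, and `MultiArgError` are hypothetical names. It captures the exception type from `sys.exc_info()` together with a text report, then tries to rebuild the exception from the report string alone.

```python
import sys
import traceback


class MultiArgError(Exception):
    """Hypothetical exception whose constructor requires four arguments."""

    def __init__(self, a, b, c, d):
        Exception.__init__(self, a, b, c, d)


def run_in_worker(func):
    """Mimic a worker: catch the error, keep its type and a text report."""
    try:
        func()
    except Exception:
        exc_type = sys.exc_info()[0]
        return exc_type, traceback.format_exc()
    return None


def reraise_in_parent(captured):
    """Mimic the parent: rebuild the exception from the report string alone."""
    exc_type, report = captured
    raise exc_type(report)


def failing_task():
    raise MultiArgError(1, 2, 3, 4)


captured = run_in_worker(failing_task)
# sys.exc_info() hands back the exception class, not a function.
print(isinstance(captured[0], type))

try:
    reraise_in_parent(captured)
except TypeError as exc:
    # Constructing the exception fails, so the original error is masked.
    print("masked by:", exc)
```

In this sketch the original report is still available in `captured[1]`; only the attempt to re-raise with the original type loses it.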

@amueller
Member

Closing this as we can't reproduce and don't have a useful trace.

@amueller closed this Jul 18, 2014

@vhermecz
vhermecz commented Oct 1, 2014

I also ran into this and managed to log the exception_type variable. It is <class 'sklearn.externals.joblib.my_exceptions.JoblibUnicodeDecodeError'>.

And indeed, UnicodeDecodeError requires five arguments:

>>> UnicodeDecodeError(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: function takes exactly 5 arguments (1 given)
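A possible defensive pattern, shown here only as a sketch and not as what joblib does or should do: try to build the exception from the report string, and fall back to a generic wrapper when the constructor rejects a single argument. `WrappedWorkerError` and `rebuild_exception` are hypothetical names introduced for illustration.

```python
class WrappedWorkerError(Exception):
    """Hypothetical generic wrapper for types that cannot be rebuilt."""


def rebuild_exception(exc_type, report):
    """Try exc_type(report); fall back to a generic wrapper on TypeError."""
    try:
        return exc_type(report)
    except TypeError:
        return WrappedWorkerError("%s: %s" % (exc_type.__name__, report))


# UnicodeDecodeError needs (encoding, object, start, end, reason).
ok = UnicodeDecodeError("ascii", b"\xff", 0, 1, "ordinal not in range(128)")
print(type(ok).__name__)

# One report string is enough for ValueError but not for UnicodeDecodeError.
print(type(rebuild_exception(ValueError, "report")).__name__)
print(type(rebuild_exception(UnicodeDecodeError, "report")).__name__)
```

With the fallback, the name of the original exception type and the report text both survive in the wrapper's message instead of being replaced by an unrelated TypeError.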

@jnothman
Member
jnothman commented Oct 1, 2014

@vhermecz can you please report this at the joblib repo?
