linear_model giving AttributeError: 'numpy.float64' object has no attribute 'exp' #3142

Closed
vm-wylbur opened this Issue · 18 comments

6 participants

Patrick Ball

sklearn 0.14.1; it happens only with a particular dataset I'm using, so I'm not sure how to provide reproducible data, sorry.

Per the stacktrace, I've tracked the problem to the lines in linear_model/base.py. I think that LinearClassifierMixin.decision_function is returning an array of dtype=object, which makes np.exp() fail. None of the values in the array look to my eye like anything other than floats. Casting the array explicitly to float (as the commented line shows) allows predict_proba to exponentiate.

There might be something happening in decision_function such that it returns a non-float result, but I can't spot it. Thanks -- PB.

from sklearn.linear_model import LogisticRegression
clf_lr = LogisticRegression(penalty='l1')
full = clf_lr.fit(X, Y)
probs = full.predict_proba(X)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-72-2db71751c630> in <module>()
      2 clf_lr = LogisticRegression(penalty='l1')
      3 full = clf_lr.fit(X, Y)
----> 4 probs = full.predict_proba(X)

/Users/pball/miniconda3/lib/python3.3/site-packages/sklearn/linear_model/logistic.py in predict_proba(self, X)
    120             where classes are ordered as they are in ``self.classes_``.
    121         """
--> 122         return self._predict_proba_lr(X)
    123 
    124     def predict_log_proba(self, X):

/Users/pball/miniconda3/lib/python3.3/site-packages/sklearn/linear_model/base.py in _predict_proba_lr(self, X)
    237         prob = self.decision_function(X)
    238         # PB hack
--> 239         # prob = prob.astype(float)
    240         prob *= -1
    241         np.exp(prob, prob)

AttributeError: 'numpy.float64' object has no attribute 'exp'
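
For what it's worth, a minimal sketch of the failure mode (standalone, assuming nothing beyond NumPy): np.exp on an object-dtype array falls back to calling an exp() method on each element, and numpy.float64 scalars have no such method.

    import numpy as np

    # An object-dtype array whose elements are plain float64 scalars.
    arr = np.array([np.float64(0.5), np.float64(1.5)], dtype=object)

    # The object-dtype fallback calls elem.exp() on each element, so this raises
    # AttributeError: 'numpy.float64' object has no attribute 'exp'.
    np.exp(arr)

    # Casting to a native float dtype restores the vectorised ufunc path.
    np.exp(arr.astype(np.float64))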
Patrick Ball

Similar problem with GaussianNB. In utils/extmath.py, both vmax and arr are turning up as dtype=object. I cast them to floats like this:

    vtmp = (arr - vmax).astype(float)
    # out = np.log(np.sum(np.exp(arr - vmax), axis=0))
    out = np.log(np.sum(np.exp(vtmp), axis=0))

numpy complains that I'm casting between incompatible kinds. Even with that change, I get the same error (AttributeError: 'numpy.float64' object has no attribute 'exp') in sklearn/naive_bayes.py at line 99, which I paper over with this at line 83:

        return (jll - np.atleast_2d(log_prob_x).T).astype(float)

The data are all floats when they go in; I've got no idea what's up.
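
One quick sanity check, as a sketch (X being whatever array or DataFrame gets passed to fit), is to confirm the dtype NumPy actually sees:

    import numpy as np

    X_arr = np.asarray(X)
    print(X_arr.dtype)        # expect float64; dtype('O') means object
    print(X_arr.dtype.kind)   # 'f' for floats, 'O' for object

    # Force a numeric view; this raises if any element is not convertible.
    X_float = X_arr.astype(np.float64)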

Patrick Ball

Now in QDA:

Traceback (most recent call last):
  File "src/train.py", line 97, in <module>
    probs = full.predict_proba(X)[:, 1]
  File "/Users/pball/miniconda3/lib/python3.3/site-packages/sklearn/qda.py", line 221, in predict_proba
    likelihood = np.exp(values - values.max(axis=1)[:, np.newaxis])
AttributeError: 'numpy.float64' object has no attribute 'exp'

yet another monkey patch:

        # compute the likelihood of the underlying gaussian models
        # up to a multiplicative constant.
        norm_values = (values - values.max(axis=1)[:, np.newaxis]).astype(float)
        # likelihood = np.exp(values - values.max(axis=1)[:, np.newaxis])
        likelihood = np.exp(norm_values)
        # compute posterior probabilities
        return likelihood / likelihood.sum(axis=1)[:, np.newaxis]

And it runs. But curiously, numpy/linalg/linalg.py is giving a ton of DeprecationWarnings about implicitly casting between incompatible kinds around this line:

  u, s, vt = gufunc(a, signature=signature, extobj=extobj)
jnothman
Owner
Patrick Ball

@jnothman changing the .astype(float) to .astype(np.float64) per the StackOverflow note makes more sense. I've done it and it runs with sensible answers. thx.

jnothman
Owner

So if I understand the numpy issue correctly, this comes about from overflowing integer (numpy.int64) precision rather than float precision. Have you identified where that's happening? Is it within scikit-learn code or yours?

Patrick Ball

@jnothman In each case (LogisticRegression, GNB, QDA) the error occurs when the class's predict_proba method is operating on the return value of self.decision_function. I suppose my data could be passing something weird to decision_function, but I'm not sure what it would be (I'd be happy to run tests if there are ideas in that direction). I'm giving decision_function a float64 array of shape (1593082, 22), taken from a pandas/numpy structure that was saved to disk by pandas in hdf5 (not sure if any of that is relevant, but FWIW). Whatever I'm passing works fine in decision_function with safe_sparse_dot (for example) but crashes np.exp() in _predict_proba_lr. I'm inclined to think that the dtype might be checked or recast by decision_function when it gets X. Sensible?
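
One way to test that hunch from the calling side, as a sketch (X and Y as in the snippet above): coerce the input to float64 up front, so anything non-numeric fails loudly before fit rather than deep inside predict_proba.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Fail early if the data cannot be represented as float64.
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y)

    clf_lr = LogisticRegression(penalty='l1')
    full = clf_lr.fit(X, Y)
    probs = full.predict_proba(X)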

Arnaud Joly arjoly added the Bug label
Gael Varoquaux
Patrick Ball

Ok, will do. I'm a git/github n00b, but will try to put together tweaks+test later this week.

jnothman
Owner

It should come with a test case, but what we need is a minimal example of data/model that breaks, for which the code snippets above don't suffice. Then we might be able to work out what the best solution is...

Patrick Ball

Ok, found it, and it was bad data. Several boolean columns became the strings 'True'/'False' in the write/read round trip through hdf5. Some classifiers (LogisticRegression, GNB, QDA) were willing to fit the model with these as independent variables (that doesn't make sense to me), but then the predict_proba method crashed. Checking that every column is float fixes the issue. It's a weird error message, though. Thanks for your help.
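
For the record, a sketch of that check, assuming the features live in a pandas DataFrame named df (the name is illustrative):

    import numpy as np

    # Columns that came back from HDF5 with object dtype
    # (e.g. booleans turned into the strings 'True'/'False').
    bad_cols = df.columns[df.dtypes == object]
    print(list(bad_cols))

    # Map the stringified booleans back to floats, then confirm every
    # column is numeric before handing the values to scikit-learn.
    for col in bad_cols:
        df[col] = df[col].map({'True': 1.0, 'False': 0.0})

    assert all(np.issubdtype(dt, np.floating) for dt in df.dtypes)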

Olivier Grisel
Owner

It's weird; I thought the line X, y = check_arrays(X, y, sparse_format='dense') in the fit method of GNB, for instance, would raise an exception on such data.

jnothman
Owner

Had you expected the _assert_all_finite call to catch it?

Olivier Grisel
Owner

My bad, for some reason I thought check_arrays would reject arrays with an object dtype by default, but this is not the case. The object dtype is actually useful for inputs that are arrays of string objects.

Input validation routines should probably be improved to output more informative error messages, but I cannot think of a simple and clean way to do it.
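
For illustration, a sketch of the kind of check that could give a clearer message (a hypothetical helper, not the validation API that was eventually adopted):

    import numpy as np

    def _check_numeric(X):
        """Return X as a float64 array, or raise an informative error."""
        X = np.asarray(X)
        if X.dtype.kind in 'biufc':   # bool, int, uint, float, complex
            return X.astype(np.float64, copy=False)
        try:
            return X.astype(np.float64)
        except (TypeError, ValueError):
            raise TypeError(
                "Expected numeric data, got dtype=%r; check for string-valued "
                "columns such as 'True'/'False'." % X.dtype)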

Andreas Mueller amueller added this to the 0.15.1 milestone
Andreas Mueller
Owner

How about, if force_array=True, we also ensure that the dtype is numeric?

Andreas Mueller
Owner

We need to refactor the input validation before we can do this. I think we should refactor the input validation anyhow.

Gael Varoquaux
Andreas Mueller
Owner

We should probably make a list of things that we want to check and possible combinations. See #3440.

Andreas Mueller
Owner

Closed by #4057.

Andreas Mueller amueller closed this