sklearn 0.19.1 OverflowError: signed integer is greater than maximum #10937

Closed
xiaokc opened this issue Apr 8, 2018 · 6 comments

@xiaokc

xiaokc commented Apr 8, 2018

Hi, I augmented my data set in order to update my model, and this error occurs after it has been running for a period of time. I saw issue #9147
and also #6183.
I upgraded from 0.18 to 0.19.1, but that doesn't fix it. Now I don't know how to deal with it. Any thoughts? Thanks very much.

Description

  File "/usr/lib64/python2.7/site-packages/sklearn/feature_extraction/text.py", line 1361, in fit
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/usr/lib64/python2.7/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/usr/lib64/python2.7/site-packages/sklearn/feature_extraction/text.py", line 805, in _count_vocab
    indptr.append(len(j_indices))
OverflowError: signed integer is greater than maximum
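
For readers trying to understand the failure mode: the traceback ends in an `append` on an index buffer, and this kind of `OverflowError` is what Python's standard `array` module raises when a value does not fit the array's C integer type. The sketch below reproduces that mechanism in isolation. It assumes the buffer behaves like `array.array("i")`, which the traceback suggests but does not prove, and it does not call scikit-learn at all.

```python
import array

# Assumption: the index buffer is a signed C-int typed array, as in array.array("i").
indptr = array.array("i")

# Largest value the platform's signed C int can hold (2**31 - 1 when itemsize is 4).
max_c_int = 2 ** (8 * indptr.itemsize - 1) - 1

indptr.append(max_c_int)  # still fits

try:
    # One past the limit, standing in for len(j_indices) on a very large corpus.
    indptr.append(max_c_int + 1)
    overflowed = False
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)
```

In other words, the error probably appears once the running count of stored entries exceeds what a signed C int can represent, which would explain why it only shows up after the data set has grown.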

@rth
Member

rth commented Apr 8, 2018

This was fixed after 0.19.1 was released, so you would need to use the development version.

Once this large document-term matrix is created, several algorithms will still fail to work with 64-bit indexed sparse arrays; cf. #2969.
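
As a rough way to tell in advance whether a matrix would need 64-bit indices, one can compare the expected number of stored entries and the vocabulary size against the 32-bit signed limit. This is only a back-of-the-envelope sketch: the helper name `fits_int32_indices` and its parameters are hypothetical, it uses plain Python arithmetic rather than any scikit-learn or SciPy API, and it assumes a CSR-style layout where row pointers count stored entries and column indices go up to the number of columns minus one.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def fits_int32_indices(n_nonzero, n_columns):
    """Return True if CSR-style index arrays would fit in 32-bit signed ints.

    Hypothetical helper: the row-pointer array stores values up to n_nonzero,
    and the column-index array stores values up to n_columns - 1.
    """
    return max(n_nonzero, n_columns - 1) <= INT32_MAX


# A matrix with 3 billion stored term counts would need 64-bit indices.
small_ok = fits_int32_indices(n_nonzero=1_000_000, n_columns=50_000)
large_ok = fits_int32_indices(n_nonzero=3_000_000_000, n_columns=50_000)
print(small_ok, large_ok)
```

If the check fails, downstream estimators that only accept 32-bit indexed sparse input may still reject the matrix even when vectorization itself succeeds.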

@qinhanmin2014
Member

@xiaokc Thanks for the issue. I think it's resolved in #9147; please try current master.
Please reopen with a standalone snippet if the problem still exists.

@ghost

ghost commented Apr 21, 2018

scikit 0.19.1

@qinhanmin2014
Member

@KeL3vRa See above comments. It's resolved in master and will be included in the next release.

@skasturi

@qinhanmin2014, can you please let me know whether this has been released?

@jnothman
Member

Yes, the fix has been released.
