Segmentation fault with SVC with probability = True and degree 3 #15008
Comments
Please provide self-contained example code, including imports and data (if possible), so that other contributors can just run it and reproduce your issue. Ideally your example code should be minimal. |
There were known segfaults in the polynomial SVC kernel in the past (#6687) that should have been resolved, but maybe not all of them were. |
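Dummy code:
import numpy as np
from sklearn.svm import SVC
# Random data at the size that triggers the crash: 2.1 million rows, 768 features.
features = np.random.rand(2100000, 768)
# Integer labels drawn from [100000, 500000), so up to 400000 distinct classes.
target = np.random.randint(low=100000, high=500000, size=2100000)
svc = SVC(kernel='poly', gamma=100, C=10, degree=3, decision_function_shape='ovo', probability=True, random_state=42, verbose=True)
svc.fit(features, target) |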
Any update on this? Are you able to reproduce the issue? |
Yes, I can reproduce the segfault with the above example. The backtrace is,
More investigation is needed, though, to understand what exactly is wrong in |
Given that it segfaults on random data with different initial seeds, my guess is that it has more to do with the large size of the data. |
We also tried this on an r5a.24xlarge EC2 instance (96 cores, 768 GiB) and got the same issue. |
I don't mean an issue with the hardware, but that 2 million * 768 will overflow for an int32. So maybe some index needs to be changed from |
@rth that would be my hunch as well. |
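For anyone following along, here is a quick check of the sizes involved, in plain Python (its ints are arbitrary precision, so nothing overflows in the check itself). It only does arithmetic on the shape from the example above. `INT32_MAX` and the variable names are mine, and the 8 bytes per value assumes float64 features, which is what `np.random.rand` returns. It does not show where the crash actually happens.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

n_rows = 2_100_000
n_features = 768
bytes_per_value = 8  # assumption: float64 features

n_elements = n_rows * n_features          # flat index range of the feature matrix
n_bytes = n_elements * bytes_per_value    # byte offset range of the same matrix

print(f"elements: {n_elements:,} (fits in int32: {n_elements <= INT32_MAX})")
print(f"bytes:    {n_bytes:,} (fits in int32: {n_bytes <= INT32_MAX})")
print(f"approx. size: {n_bytes / 2**30:.1f} GiB")
```

By this count the flat element index (1,612,800,000) is still under the int32 limit, at roughly 75% of it, while a byte offset into the same array is well past it. Any index that multiplies the row count by something larger than the feature count would also exceed it.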
There are also 400000 classes in the synthetic dataset. Does it still error with, say, 2 or 10? It didn't segfault for me immediately with fewer classes, but I don't have enough RAM to actually try...
The stdout before the segfault was,
so I imagine it means that the first 5 classes went through fine. |
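On the number of classes: SVC handles multiclass problems with a one-vs-one scheme, so the number of class pairs grows as k*(k-1)/2. Below is a small sketch of how that count compares to the int32 range for a few values of k. The helper name `n_pairwise` is mine, and whether such a count is actually held in a 32-bit int anywhere in the failing code path is an assumption, not something established in this thread.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def n_pairwise(k):
    """Number of one-vs-one class pairs for k classes."""
    return k * (k - 1) // 2


# 400_000 is the upper bound on distinct labels in the synthetic target.
for k in (2, 10, 65_536, 65_537, 400_000):
    pairs = n_pairwise(k)
    print(f"{k:>7} classes -> {pairs:>14,} pairs, fits in int32: {pairs <= INT32_MAX}")
```

With 2 or 10 classes the pair count is tiny. It first exceeds the int32 range at 65,537 classes, and at 400,000 classes it is about 80 billion.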
I believe I have a fix for this. Just wondering if that code above should be added to the tests? It takes quite a while to run. |
I am using SVC with predict_proba and the hyperparameters below on a large data set, and it fails with a segmentation fault. The same code works well on a smaller data set.
I don't think it's a memory issue; we are using 96 GB of RAM. I also monitored the system with htop during the run, and it gave no impression of a memory issue.
Hyperparameters:
SVC(kernel='poly',gamma=100,C=10,degree=3,decision_function_shape='ovo',probability=True,random_state=42,verbose=True)
Number of features: 768
Number of rows: 2.1 million