In scikit-learn version 0.23.2, calling the predict() method of a maliciously crafted SVM model can result in a segmentation fault. Such models can be introduced via pickle, JSON, or any other model persistence format. The behaviour is triggered when one of the members of the _n_support array has a very large value, for example 1000000, when calling libsvm.predict().
Steps/Code to Reproduce
from sklearn import svm
from sklearn import datasets

if __name__ == '__main__':
    X, y = datasets.load_iris(return_X_y=True)
    clf = svm.SVC()
    clf.fit(X, y)
    clf._n_support = 1000000
    y_pred = clf.predict(X)
Expected Results
predict() should not fail.
Actual Results
Segmentation fault, this is a debugger trace
Built with OpenMP: True
It's expected that manually changing a private attribute will lead to wrong results, or even a segfault, as is the case here.
May I ask why you want to change the number of SVs, and why you expected it to work? Changing it likely leads the program to overshoot indexing in an array somewhere. For ref, there are 7 SVs for class
The snippet above is changing a private attribute (it has a leading underscore).
By definition, these aren't supposed to be changed by the user. When they're used by the estimator, it's fair to assume that they are properly set to a correct value.
We can't safeguard against every single misuse, and this is a misuse.
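For readers who load models from sources they do not fully control, a user-side sanity check before calling predict() is one option. The following is only a minimal sketch, not something scikit-learn provides: the helper name `n_support_is_consistent` is hypothetical, and it assumes that for a fitted SVC the per-class counts in `_n_support` sum to the number of rows in `support_vectors_`.

```python
from collections.abc import Iterable


def n_support_is_consistent(n_support, n_support_vectors):
    """Return True if the per-class counts are non-negative and sum to the SV total.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # A bare scalar (as assigned in the reproduction snippet) is rejected outright.
    if not isinstance(n_support, Iterable):
        return False
    try:
        counts = [int(c) for c in n_support]
    except (TypeError, ValueError):
        return False
    # Negative counts can never describe a valid model.
    if any(c < 0 for c in counts):
        return False
    # Assumption: the counts must add up to the total number of support vectors.
    return sum(counts) == int(n_support_vectors)
```

With a fitted classifier `clf`, the call might look like `n_support_is_consistent(clf._n_support, clf.support_vectors_.shape[0])`. A check like this only catches this one kind of inconsistency and does not cover other ways a model could be tampered with.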
Consider the common scenario in which a model is trained on a research machine and then transferred to a live environment for production use. If a malicious actor is able to get hold of the model and change it, they should not be able to trigger a segmentation fault. Please bear in mind that segfaults may have security implications beyond crashing the process.
This is where it's out of scope here: we can't guard against everything. We have a responsibility to provide safe code when that code is used within the limits of a normal use case, but that's pretty much it. Private attributes shouldn't be modified, and it's up to users to make sure that the estimator isn't maliciously altered.
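One way a user could check that a serialized estimator has not been altered in transit is to attach a keyed hash to the serialized bytes and verify it before loading. This is a sketch using only the Python standard library; the function names `sign_model_bytes` and `verify_model_bytes` are made up for illustration, and it assumes a secret key that is shared between the training and production machines and is not available to the attacker.

```python
import hashlib
import hmac


def sign_model_bytes(model_bytes, key):
    """Return a hex HMAC-SHA256 tag for the serialized model (hypothetical helper)."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()


def verify_model_bytes(model_bytes, key, expected_tag):
    """Return True only if the tag matches the bytes under the given key."""
    actual_tag = sign_model_bytes(model_bytes, key)
    # compare_digest performs a constant-time comparison of the two tags.
    return hmac.compare_digest(actual_tag, expected_tag)
```

On the training side the tag would be computed over something like the output of `pickle.dumps(clf)` and shipped alongside the file; on the production side the bytes would be passed to `pickle.loads` only after `verify_model_bytes` returns True.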
I might go out on a limb and use a poor analogy, but when I buy a car, I can't complain that it breaks if I replace the steering wheel with a potato.
If you really want to prevent segfaults, maybe you can intercept the
To reuse your example: if someone switches the steering wheel for a potato, the car should not completely break or fail unrecoverably the moment you turn it on, and if it does, that would be considered a security risk, would it not? You would not just tell the driver to wear a helmet instead, would you?
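On the question of failing unrecoverably, one general way to keep a crash in native code from taking down the calling process is to run the risky call in a child interpreter and inspect its exit status. This is a sketch of that general technique, not something proposed in this thread; `run_isolated` is a hypothetical name, and the `code` argument is assumed to be Python source that loads the model and calls predict().

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run Python source in a child interpreter and return (ok, returncode).

    A crash in the child (for example a segfault) shows up as a non-zero
    return code instead of terminating the parent process.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.returncode
```

The parent can then decide how to react to a failed run, for instance by logging the return code and rejecting the model, at the cost of starting a new interpreter for each isolated call.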