Segmentation fault on SVMLIB #18891

Closed
pabloec20 opened this issue Nov 21, 2020 · 11 comments · Fixed by #21336
Labels
Bug: triage Reported bugs that are not confirmed

Comments

@pabloec20

pabloec20 commented Nov 21, 2020

Description:

In scikit-learn version 0.23.2, calling the predict() method on a maliciously crafted SVM model can result in a segmentation fault. Such models can be introduced via pickle, JSON, or any other model persistence format. The behaviour is triggered when one of the elements of the _n_support array has a very large value, for example 1000000, when calling libsvm.predict().

Tested environment:

Ubuntu 9.3.0-17ubuntu1~20.04
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
Numpy version: '1.19.2'
Sklearn.version: '0.23.2'

Steps/Code to Reproduce

from sklearn import svm
from sklearn import datasets


if __name__ == '__main__':
    X,y = datasets.load_iris(return_X_y=True)
    clf = svm.SVC()
    clf.fit(X, y)
    clf._n_support[0] = 1000000
    y_pred = clf.predict(X)

Expected Results

not to fail

Actual Results

Segmentation fault, this is a debugger trace

Thread 1 "python3-dbg" received signal SIGSEGV, Segmentation fault.
0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
(gdb) bt
#0 0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#1 0x00007fffd717504f in svm_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#2 0x00007fffd7163e57 in copy_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#3 0x00007fffd716bb3a in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#4 0x00007fffd716d47d in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#5 0x000000000043663e in cfunction_call_varargs (func=0x7fffd76e9dd0, args=0x7fffd40fc1d0, kwargs=0x7fffd40fbdd0) at ../Objects/call.c:742
#6 0x000000000043920c in PyCFunction_Call (func=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:772
#7 0x000000000043708b in _PyObject_MakeTpCall (callable=callable@entry=0x7fffd76e9dd0, args=args@entry=0x18433a0, nargs=<optimized out>, keywords=keywords@entry=0x7fffd7821de0) at ../Objects/call.c:159
#8 0x00000000004ebe5b in _PyObject_Vectorcall (kwnames=0x7fffd7821de0, nargsf=9223372036854775816, args=0x18433a0, callable=0x7fffd76e9dd0) at ../Include/cpython/abstract.h:125
#9 call_function (kwnames=0x7fffd7821de0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x954500) at ../Python/ceval.c:4963
#10 _PyEval_EvalFrameDefault (f=0x1843210, throwflag=<optimized out>) at ../Python/ceval.c:3515
#11 0x00000000004df2ef in PyEval_EvalFrameEx (f=

Versions

System:
python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
executable: /usr/bin/python3
machine: Linux-5.4.0-53-generic-x86_64-with-glibc2.29

Python dependencies:
pip: 20.0.2
setuptools: 45.2.0
sklearn: 0.23.2
numpy: 1.19.2
scipy: 1.5.2
Cython: None
pandas: 1.1.2
matplotlib: 3.3.2
joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True

@pabloec20 pabloec20 added the Bug: triage Reported bugs that are not confirmed label Nov 21, 2020
@carnil

carnil commented Nov 22, 2020

This issue was assigned CVE-2020-28975

@NicolasHug
Member

clf._n_support[0] = 1000000

It's expected that manually changing a private attribute will lead to wrong results, or even segfault as is the case here.

May I ask why you want to change the number of SVs, and why you expected it to work? Changing it likely leads the program to overshoot indexing in an array somewhere. For ref, there are 7 SVs for class 0.
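
The overshoot described above relates to an invariant of a fitted SVC: the per-class counts in the public `n_support_` attribute sum to the number of rows in `support_vectors_`. A minimal sketch that inspects this on the iris model from the report follows. The exact counts depend on the scikit-learn version and hyperparameters, so no expected output is shown.

```python
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Per-class support vector counts, one entry per class.
n_support = clf.n_support_
# Total number of support vectors actually stored in the model.
n_stored = clf.support_vectors_.shape[0]

print("support vectors per class:", n_support)
print("stored support vectors:", n_stored)

# On an untampered model the two agree. A count larger than what is stored
# is the kind of mismatch that can lead native code to index out of bounds.
counts_match = int(n_support.sum()) == n_stored
print("counts consistent:", counts_match)
```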

@pabloec20
Author

It should not segfault but fail gracefully.

@NicolasHug
Member

The snippet above is changing a private attribute (it has a leading underscore).

By definition, these aren't supposed to be changed by the user. When they're used by the estimator, it's fair to assume that they are properly set to a correct value.

We can't safeguard against every single misuse, and this is a misuse.

@pabloec20
Author

Consider the common scenario in which a model is trained on a research machine and then transferred to a live environment for production use. If a malicious actor is able to get hold of the model and change it, they should not be able to trigger a segmentation fault. Please bear in mind that segfaults may have security implications beyond crashing the process.
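
For the transfer scenario described here, one possible mitigation is to record a digest of the serialized model on the training machine and check it again before loading in production. The sketch below uses only the standard library. The helper names `file_sha256` and `verify_model_file` are hypothetical, and the approach assumes the expected digest reaches the production side through a channel the attacker cannot modify.

```python
import hashlib
import hmac


def file_sha256(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large model files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path, expected_digest):
    """Raise ValueError if the file does not match the recorded digest."""
    actual = file_sha256(path)
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(actual, expected_digest):
        raise ValueError(f"model file {path} failed its integrity check")
```

This only detects that the file changed after the digest was recorded. It does not validate the model's contents, so it complements rather than replaces checks inside the library.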

@NicolasHug
Member

if a malicious actor

This is where it's out of scope: we can't guard against everything. We have a responsibility to provide safe code when that code is used within the limits of a normal use case, but that's pretty much it. Private attributes shouldn't be modified, and it's up to users to make sure that the estimator isn't maliciously altered.

I might go out on a limb and use a poor analogy, but when I buy a car, I can't complain that it breaks if I replace the steering wheel with a potato.

If you really want to prevent segfaults, maybe you can intercept the SIGSEGV?
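
On intercepting SIGSEGV: a Python-level signal handler generally cannot recover from a fault raised inside native code, so a more practical variant is to run the risky call in a child process and inspect how it exited. The sketch below assumes a POSIX system, where a negative return code indicates termination by a signal. `run_isolated` is a hypothetical helper, not part of scikit-learn.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh Python interpreter and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    crashed = proc.returncode < 0
    sig_name = signal.Signals(-proc.returncode).name if crashed else None
    return proc.returncode, sig_name, proc.stdout
```

The parent process survives a crash in the child and can log `sig_name` or fall back to another model. The cost is the overhead of starting an interpreter and passing data across the process boundary.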

@pabloec20
Author

pabloec20 commented Nov 26, 2020

To reuse your example: if someone swaps the steering wheel for a potato, the car should not completely break or fail unrecoverably the moment you turn it on, and if it did, that would be considered a security risk, would it not? You would not just tell the driver to wear a helmet instead, would you?

@jnothman
Member

jnothman commented Nov 27, 2020 via email

@pabloec20
Author

There are other ways to export a model which are not via pickle, and this does not require pickle at all.

@bkreider

This is where it's out of scope here: we can't guard against everything.

Isn't the model loader improperly validating the model and then segfaulting? Is pickle the only way to load a model?

The proof of concept exploit code is just using the private attribute to shorten the example. Perhaps it should have used a comment there describing "loading a model with a modified n value"?

This high CVE score is tripping up people trying to use scikit learn in regulated and secure environments.

@rth
Member

rth commented Oct 14, 2021

This high CVE score is tripping up people trying to use scikit learn in regulated and secure environments.

Hmm, yes, see https://nvd.nist.gov/vuln/detail/CVE-2020-28975, and those scores really are used for enterprise deployment. We could add a few more sanity checks in

def predict(np.ndarray[np.float64_t, ndim=2, mode='c'] X,
to make that CVE go away, even if it would be difficult to be fully fail-proof. A PR would be welcome.
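
As an illustration of the kind of sanity check suggested here, the sketch below validates per-class support vector counts against the number of support vectors the model stores. It is a hypothetical pure-Python helper written for this thread, not the change that was merged in #21336, and it covers only this one inconsistency.

```python
def check_support_counts(n_support, n_support_vectors):
    """Validate per-class support vector counts against the stored total.

    Hypothetical helper: `n_support` is a sequence of per-class counts and
    `n_support_vectors` is the number of support vectors held by the model.
    """
    counts = [int(c) for c in n_support]
    if any(c < 0 for c in counts):
        raise ValueError(f"negative support vector count in {counts}")
    if sum(counts) != int(n_support_vectors):
        raise ValueError(
            f"support vector counts {counts} sum to {sum(counts)}, "
            f"but the model stores {n_support_vectors} support vectors"
        )
```

With a fitted SVC, this could be called as `check_support_counts(clf.n_support_, clf.support_vectors_.shape[0])` before `predict`, so that a tampered count raises a Python exception instead of reaching native code.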
