Segmentation fault on SVMLIB #18891

Closed
pabloec20 opened this issue Nov 21, 2020 · 9 comments

@pabloec20 pabloec20 commented Nov 21, 2020

Description:

In scikit-learn version 0.23.2, calling the predict() method of a maliciously crafted SVM model can result in a segmentation fault. Such models can be introduced via pickle, JSON, or any other model persistence format. The behaviour is triggered when one of the members of the _n_support array has a very large value, for example 1000000, at the time libsvm.predict() is called.

Tested environment:

Ubuntu 9.3.0-17ubuntu1~20.04
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
Numpy version: '1.19.2'
Sklearn.version: '0.23.2'

Steps/Code to Reproduce

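from sklearn import svm
from sklearn import datasets


if __name__ == '__main__':
    X, y = datasets.load_iris(return_X_y=True)
    clf = svm.SVC()
    clf.fit(X, y)
    # Corrupt the private per-class support vector count.
    clf._n_support[0] = 1000000
    # WARNING: on 0.23.2 this call crashes the interpreter with SIGSEGV.
    y_pred = clf.predict(X)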

Expected Results

not to fail

Actual Results

Segmentation fault. This is the debugger trace:

Thread 1 "python3-dbg" received signal SIGSEGV, Segmentation fault.
0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
(gdb) bt
#0 0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#1 0x00007fffd717504f in svm_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#2 0x00007fffd7163e57 in copy_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#3 0x00007fffd716bb3a in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#4 0x00007fffd716d47d in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#5 0x000000000043663e in cfunction_call_varargs (func=0x7fffd76e9dd0, args=0x7fffd40fc1d0, kwargs=0x7fffd40fbdd0) at ../Objects/call.c:742
#6 0x000000000043920c in PyCFunction_Call (func=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:772
#7 0x000000000043708b in _PyObject_MakeTpCall (callable=callable@entry=0x7fffd76e9dd0, args=args@entry=0x18433a0, nargs=<optimized out>, keywords=keywords@entry=0x7fffd7821de0) at ../Objects/call.c:159
#8 0x00000000004ebe5b in _PyObject_Vectorcall (kwnames=0x7fffd7821de0, nargsf=9223372036854775816, args=0x18433a0, callable=0x7fffd76e9dd0) at ../Include/cpython/abstract.h:125
#9 call_function (kwnames=0x7fffd7821de0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x954500) at ../Python/ceval.c:4963
#10 _PyEval_EvalFrameDefault (f=0x1843210, throwflag=<optimized out>) at ../Python/ceval.c:3515
#11 0x00000000004df2ef in PyEval_EvalFrameEx (f=

Versions

System:
python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
executable: /usr/bin/python3
machine: Linux-5.4.0-53-generic-x86_64-with-glibc2.29

Python dependencies:
pip: 20.0.2
setuptools: 45.2.0
sklearn: 0.23.2
numpy: 1.19.2
scipy: 1.5.2
Cython: None
pandas: 1.1.2
matplotlib: 3.3.2
joblib: 0.16.0
threadpoolctl: 2.1.0

Built with OpenMP: True


@carnil carnil commented Nov 22, 2020

This issue was assigned CVE-2020-28975

Member

@NicolasHug NicolasHug commented Nov 22, 2020

clf._n_support[0] = 1000000

It's expected that manually changing a private attribute will lead to wrong results, or even segfault as is the case here.

May I ask why you want to change the number of SVs, and why you expected it to work? Changing it likely leads the program to overshoot indexing in an array somewhere. For ref, there are 7 SVs for class 0.
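For readers following along, the relationship between the per-class counts and the stored support vectors can be inspected through the public attributes. This is only a sketch of the invariant the comment above alludes to: it assumes the public `n_support_` and `support_vectors_` attributes of a fitted `SVC`, and it deliberately does not touch the private `_n_support` array.

```python
from sklearn import datasets, svm

# Fit the same default SVC as in the report.
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Number of support vectors per class, and the total number stored.
n_per_class = clf.n_support_
total = clf.support_vectors_.shape[0]
print("support vectors per class:", n_per_class)
print("total support vectors:", total)

# In an untampered model the per-class counts add up to the total.
# libsvm relies on this when it walks the support vector array.
consistent = int(n_per_class.sum()) == total
print("counts consistent:", consistent)
```

Setting one count to 1000000 breaks this invariant, which is presumably why the native code ends up reading past the end of its arrays.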

@NicolasHug NicolasHug closed this Nov 22, 2020
Author

@pabloec20 pabloec20 commented Nov 22, 2020

It should not segfault but fail gracefully.

Member

@NicolasHug NicolasHug commented Nov 22, 2020

The snippet above is changing a private attribute (it has a leading underscore).

By definition, these aren't supposed to be changed by the user. When they're used by the estimator, it's fair to assume that they are properly set to a correct value.

We can't safe-guard against every single misuse, and this is a misuse.

Author

@pabloec20 pabloec20 commented Nov 22, 2020

Consider the common scenario in which a model is trained on a research machine and then transferred to a live environment for production use. If a malicious actor is able to get hold of the model and change it, they should not be able to trigger a segmentation fault. Please bear in mind that segfaults may have security implications beyond crashing the process.
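As an illustration of what "failing gracefully" could look like on the user side, here is a minimal sketch of a pre-flight check that raises a Python exception instead of letting the native code run. The helper name `check_support_counts` is hypothetical and not part of scikit-learn; it only assumes the object exposes `n_support_` and `support_vectors_`, and it covers this one inconsistency, not every way a model could be corrupted.

```python
import numpy as np


def check_support_counts(model) -> None:
    """Raise ValueError if the per-class support vector counts are implausible.

    Hypothetical helper: it only compares n_support_ with the number of rows
    in support_vectors_ and is not a complete model validator.
    """
    n_support = np.asarray(model.n_support_)
    n_vectors = model.support_vectors_.shape[0]

    if n_support.ndim != 1 or (n_support < 0).any():
        raise ValueError("n_support_ must be a 1-D array of non-negative counts")

    if int(n_support.sum()) != n_vectors:
        raise ValueError(
            f"n_support_ sums to {int(n_support.sum())} "
            f"but the model stores {n_vectors} support vectors"
        )
```

A caller would run `check_support_counts(clf)` after loading a model and before the first `predict()`, and treat a `ValueError` as a sign that the file should not be trusted.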

Member

@NicolasHug NicolasHug commented Nov 22, 2020

if a malicious actor

This is where it's out of scope here: we can't guard against everything. We have a responsibility to provide safe code when that code is used under the limits of what's a normal use-case, but that's pretty much it. Private attributes shouldn't be modified, and it's up to users to make sure that the estimator isn't maliciously altered.
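One way a user might take on that responsibility is to authenticate the serialized model before loading it. The sketch below uses only the Python standard library; the function names and the idea of storing a tag next to the model file are assumptions for illustration, not something scikit-learn provides.

```python
import hashlib
import hmac


def fingerprint(payload: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag for the serialized model bytes."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_untampered(payload: bytes, key: bytes, expected_tag: str) -> bool:
    """Compare tags in constant time; False means the bytes were changed."""
    return hmac.compare_digest(fingerprint(payload, key), expected_tag)
```

The tag would be computed on the training machine and checked in production before the bytes are deserialized. This only helps if the key itself is kept out of the attacker's reach.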

I might go out on a limb and use a poor analogy, but when I buy a car, I can't complain that it breaks if I replace the steering wheel with a potato.

If you really want to prevent segfaults, maybe you can intercept the SIGSEGV?
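Handling SIGSEGV inside the crashing Python process is generally unreliable, so a more practical reading of this suggestion is to isolate the risky call in a child process and inspect its exit status. The following is a sketch under that assumption, for POSIX systems only; `run_isolated` and `crashed_with_segfault` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def run_isolated(source: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run Python source in a separate interpreter so a crash cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def crashed_with_segfault(result: subprocess.CompletedProcess) -> bool:
    """On POSIX, a child killed by a signal reports the negated signal number."""
    return result.returncode == -signal.SIGSEGV
```

The parent could pass in a small script that loads the model and calls `predict()`, then fall back or raise if `crashed_with_segfault` returns true. The cost is process start-up time and having to ship inputs and outputs between processes.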

Author

@pabloec20 pabloec20 commented Nov 26, 2020

To reuse your example: if someone swaps the steering wheel for a potato, the car should not break completely and unrecoverably the moment you turn it on, and if it did, that would be considered a security risk, would it not? You would not just tell the driver to wear a helmet instead, would you?

Member

@jnothman jnothman commented Nov 27, 2020

Author

@pabloec20 pabloec20 commented Nov 30, 2020

There are other ways to export a model which are not via pickle, and this does not require pickle at all.
