Segmentation fault on SVMLIB #18891
This issue was assigned CVE-2020-28975.
It's expected that manually changing a private attribute will lead to wrong results, or even a segfault, as is the case here. May I ask why you want to change the number of SVs, and why you expected that to work? Changing it likely makes the program overshoot an array index somewhere. For reference, there are 7 SVs for class
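A simplified pure-Python illustration of how an inflated count can overshoot an array. It assumes, as LIBSVM's one-vs-one prediction appears to do, that the predictor derives each class's start offset from the per-class support-vector counts (the role `_n_support` plays) and then reads a flat per-support-vector array without bounds checks. The function names are hypothetical and this is not scikit-learn or LIBSVM code; in C the out-of-range read is undefined behaviour (here, a segfault), whereas Python raises `IndexError`.

```python
def class_offsets(n_support):
    """Start index of each class's block of support vectors (running sum)."""
    starts = [0]
    for count in n_support[:-1]:
        starts.append(starts[-1] + count)
    return starts


def touch_support_vectors(n_support, kernel_values):
    """Read every support-vector slot that the per-class counts claim exists.

    kernel_values stands in for the flat per-SV array the predictor fills
    before voting. Returns the largest index that was read.
    """
    last = -1
    for start, count in zip(class_offsets(n_support), n_support):
        for k in range(count):
            _ = kernel_values[start + k]  # unchecked read in C; IndexError here
            last = start + k
    return last


kernel_values = [0.0] * 5  # pretend the model really stores 5 SVs
print(touch_support_vectors([2, 3], kernel_values))  # consistent counts: 4
try:
    touch_support_vectors([2, 1000000], kernel_values)  # tampered count
except IndexError as exc:
    print("out of bounds:", exc)
```

With consistent counts the highest index read is the last stored support vector; a tampered count walks straight past the end of the array.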
It should not segfault but fail gracefully.
The snippet above is changing a private attribute (it has a leading underscore). By definition, these aren't supposed to be changed by the user. When they're used by the estimator, it's fair to assume that they are set correctly. We can't safeguard against every single misuse, and this is a misuse.
Consider the common scenario where a model is trained on a research machine and then transferred to a live environment for production use. If a malicious actor is able to get hold of the model and change it, they should not be able to trigger a segmentation fault. Please bear in mind that segfaults may have security implications beyond crashing the process.
This is where it's out of scope: we can't guard against everything. We have a responsibility to provide safe code when that code is used within the limits of a normal use case, but that's pretty much it. Private attributes shouldn't be modified, and it's up to users to make sure that the estimator isn't maliciously altered. I might go out on a limb with a poor analogy, but when I buy a car, I can't complain that it breaks if I replace the steering wheel with a potato. If you really want to prevent segfaults, maybe you can intercept the
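One way a user can make a native crash "fail gracefully" from the caller's point of view, independent of any fix in scikit-learn, is to run prediction on an untrusted model in a separate process and inspect its exit status. This is a swapped-in technique, not necessarily what the comment above had in mind; `run_isolated` is a hypothetical helper, and the signal-based exit code assumes a POSIX system. A real deployment would run the actual load-and-predict code in the child instead of the simulated crash.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh Python interpreter and report how it ended.

    Returns (status, returncode), where status is "ok", "error" or "crashed".
    On POSIX, a child killed by a signal has returncode == -signum.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok", proc.returncode
    if proc.returncode < 0:
        return "crashed", proc.returncode  # e.g. -signal.SIGSEGV
    return "error", proc.returncode  # ordinary uncaught Python exception


if __name__ == "__main__":
    # Simulate the native crash without scikit-learn: the child sends
    # SIGSEGV to itself (POSIX only). The parent process keeps running.
    status, rc = run_isolated("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)")
    print(status, rc == -signal.SIGSEGV)
```

The design trade-off is cost: each call pays for a new interpreter, so this suits validating a model once at load time better than per-request prediction.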
|
To reuse your example: if someone swaps the steering wheel for a potato, the car should not break down completely and unrecoverably the moment you turn it on, and if it did, that would be considered a security risk, would it not? You would not just tell the driver to wear a helmet instead, would you?
Pickles are unsafe in any case. If you submit a pull request to avoid the segfault, we can consider it.
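Since pickles are unsafe to load from untrusted sources, a common mitigation is to authenticate the serialized model before unpickling it, so that a file altered in transit or at rest is rejected. The sketch below uses HMAC-SHA256 from the standard library; `dump_signed` and `load_verified` are hypothetical helpers, and key management is assumed to be handled elsewhere. It only detects tampering by someone who does not hold the key; it does not make unpickling arbitrary data safe, and it is not a scikit-learn feature.

```python
import hashlib
import hmac
import pickle


def dump_signed(obj, key):
    """Serialize obj and return (payload, tag), tag = HMAC-SHA256(key, payload)."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def load_verified(payload, tag, key):
    """Unpickle payload only if its HMAC tag matches; otherwise raise ValueError."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the expected tag
    if not hmac.compare_digest(expected, tag):
        raise ValueError("model file failed integrity check; refusing to load")
    return pickle.loads(payload)


key = b"replace-with-a-secret-key-of-32b"  # hypothetical; load from a secret store
payload, tag = dump_signed({"n_support": [2, 3]}, key)
model = load_verified(payload, tag, key)  # accepted: bytes unchanged
```

Any change to the bytes, including an edited `_n_support` value, changes the tag, so `load_verified` refuses the file before `pickle.loads` ever runs.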
There are other ways to export a model besides pickle, and this issue does not require pickle at all.
Isn't the model loader improperly validating the model and then segfaulting? Is pickle the only way to load a model? The proof-of-concept exploit code just uses the private attribute to shorten the example. Perhaps it should have used a comment there describing "loading a model with a modified n value"? This high CVE score is tripping up people trying to use scikit-learn in regulated and secure environments.
Hmm, yes: https://nvd.nist.gov/vuln/detail/CVE-2020-28975, and those scores really are used for enterprise deployment. We could add a few more sanity checks in scikit-learn/sklearn/svm/_libsvm.pyx (line 279 in commit b43f057).
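A minimal sketch of the kind of sanity check being discussed, written in plain Python rather than Cython: before handing the arrays to the C predictor, confirm that the per-class counts are consistent with the support vectors actually stored. `check_n_support` and its parameters are hypothetical, this is not the patch scikit-learn adopted, and the invariant that the counts sum to the number of stored support vectors is assumed to hold for classifiers (regression and one-class models may use these fields differently).

```python
def check_n_support(n_support, n_support_vectors, n_classes):
    """Raise ValueError unless the per-class support-vector counts are consistent.

    n_support: per-class support-vector counts (sequence of ints)
    n_support_vectors: number of rows in the stored support-vector array
    n_classes: number of classes the classifier was fitted on
    """
    counts = [int(c) for c in n_support]
    if len(counts) != n_classes:
        raise ValueError(f"expected {n_classes} per-class counts, got {len(counts)}")
    if any(c < 0 for c in counts):
        raise ValueError(f"negative support-vector count in {counts}")
    if sum(counts) != n_support_vectors:
        raise ValueError(
            f"per-class counts sum to {sum(counts)}, "
            f"but the model stores {n_support_vectors} support vectors"
        )


check_n_support([2, 3], n_support_vectors=5, n_classes=2)  # passes silently
try:
    check_n_support([2, 1000000], n_support_vectors=5, n_classes=2)
except ValueError as exc:
    print("rejected:", exc)
```

Checking the sum, rather than only capping each entry, catches both inflated and deflated counts, since either would make the start offsets disagree with the stored array.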
|
Description:
In scikit-learn version 0.23.2, calling the predict() method on a maliciously crafted SVM model can result in a segmentation fault. Such models can be introduced via pickle, JSON, or any other model persistence format. The behaviour is triggered when one of the members of the _n_support array has a very large value (for example 1000000) when libsvm.predict() is called.
Tested environment:
Ubuntu 9.3.0-17ubuntu1~20.04
Python 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0] on linux
NumPy version: '1.19.2'
sklearn.__version__: '0.23.2'
Steps/Code to Reproduce
Expected Results
Not to fail.
Actual Results
Segmentation fault. This is the debugger trace:
Thread 1 "python3-dbg" received signal SIGSEGV, Segmentation fault.
0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
(gdb) bt
#0 0x00007fffd7174df7 in svm_predict_values () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#1 0x00007fffd717504f in svm_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#2 0x00007fffd7163e57 in copy_predict () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#3 0x00007fffd716bb3a in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#4 0x00007fffd716d47d in ?? () from /home/pablo/.local/lib/python3.8/site-packages/sklearn/svm/_libsvm.cpython-38-x86_64-linux-gnu.so
#5 0x000000000043663e in cfunction_call_varargs (func=0x7fffd76e9dd0, args=0x7fffd40fc1d0, kwargs=0x7fffd40fbdd0) at ../Objects/call.c:742
#6 0x000000000043920c in PyCFunction_Call (func=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:772
#7 0x000000000043708b in _PyObject_MakeTpCall (callable=callable@entry=0x7fffd76e9dd0, args=args@entry=0x18433a0, nargs=<optimized out>, keywords=keywords@entry=0x7fffd7821de0) at ../Objects/call.c:159
#8 0x00000000004ebe5b in _PyObject_Vectorcall (kwnames=0x7fffd7821de0, nargsf=9223372036854775816, args=0x18433a0, callable=0x7fffd76e9dd0) at ../Include/cpython/abstract.h:125
#9 call_function (kwnames=0x7fffd7821de0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x954500) at ../Python/ceval.c:4963
#10 _PyEval_EvalFrameDefault (f=0x1843210, throwflag=<optimized out>) at ../Python/ceval.c:3515
#11 0x00000000004df2ef in PyEval_EvalFrameEx (f=
Versions
System:
python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
executable: /usr/bin/python3
machine: Linux-5.4.0-53-generic-x86_64-with-glibc2.29
Python dependencies:
pip: 20.0.2
setuptools: 45.2.0
sklearn: 0.23.2
numpy: 1.19.2
scipy: 1.5.2
Cython: None
pandas: 1.1.2
matplotlib: 3.3.2
joblib: 0.16.0
threadpoolctl: 2.1.0
Built with OpenMP: True