[MRG+2] modify disadvantage #8521
Conversation
SVM can work effectively when the number of features is much greater than the number of samples. But to avoid the over-fitting that often occurs in this situation, choosing an appropriate kernel (model selection) is important.
LGTM.
@@ -28,7 +28,8 @@ The advantages of support vector machines are:

 The disadvantages of support vector machines include:

 - If the number of features is much greater than the number of
-  samples, the method is likely to give poor performances.
+  samples, avoid over-fitting in choosing :ref:`svm_kernels` and regularization
Did you respect the 80 character line length?
Looks like it's 84 characters, is that a major issue?
Codecov Report

@@           Coverage Diff           @@
##           master    #8521   +/-  ##
======================================
  Coverage   95.48%   95.48%
======================================
  Files         342      342
  Lines       60913    60913
======================================
  Hits        58160    58160
  Misses       2753     2753

Continue to review the full report at Codecov.
LGTM as well.
Congrats @Ellen-Co2!
Reference Issue
Fixes #8450
What does this implement/fix? Explain your changes.
In the case of high dimensionality, SVM can still work effectively, but the over-fitting issue still needs to be considered, because the VC dimension may be close to infinite in such cases; thus the choice of kernel and control of the regularization factor "C" are essential.
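To illustrate the point above, here is a minimal sketch (not part of this PR, using synthetic data) of tuning both the kernel and the regularization factor C when the number of features far exceeds the number of samples:

```python
# Minimal sketch: with n_features >> n_samples, searching over the kernel
# and the regularization parameter C helps control over-fitting.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(40, 500)          # 40 samples, 500 features (synthetic)
y = rng.randint(0, 2, size=40)  # synthetic binary labels

# Smaller C means stronger regularization.
grid = GridSearchCV(
    SVC(),
    param_grid={"kernel": ["linear", "rbf"], "C": [0.01, 0.1, 1, 10]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```

The grid search selects the kernel/C pair with the best cross-validated score, rather than relying on the default settings.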
Any other comments?
To test for over-fitting, cross-validation or a larger hold-out set can be useful. Check some discussions regarding dimensionality here