
MRG describe SVM probability calibration #1820

wants to merge 1 commit into from

2 participants

scikit-learn member

After I caught a colleague using SVC.predict_proba where decision_function would have been fine, answering some SO/MO questions, and reading a bunch of stuff on how SVM probabilities work, I tried to improve the docs accordingly.

I'd like a quick check to see if I'm not saying anything stupid here.

@ogrisel ogrisel commented on the diff Mar 28, 2013
@@ -196,6 +195,30 @@ this:
+.. _scores_probabilities:
+Scores and probabilities
+The :class:`SVC` method ``decision_function`` gives per-class scores
+for each sample (or a single score per sample in the binary case).
+When the constructor option ``probability`` is set to ``True``,
+class membership probability estimates
+(from the methods ``predict_proba`` and ``predict_log_proba``) are enabled.
+In the binary case, the probabilities are calibrated using Platt's method:
+logistic regression on the SVM's scores,
+fit by an additional cross-validation on the training data.
+Needless to say, this is an expensive operation for large datasets.
+In the multiclass case, this is extended as per Wu et al. (2004).
scikit-learn member
ogrisel added a note Mar 28, 2013

I would be even more explicit and add something like:

If uncalibrated confidence estimates are sufficient (e.g. to compare pairwise predictions on the same model), it is advised to leave probability set to False and use the decision_function method instead.
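
A minimal sketch of this advice, assuming a toy binary problem built with `make_classification` (the dataset and variable names are illustrative, not part of the PR). It keeps `probability=False` and uses the raw `decision_function` scores both to rank samples and to recover the predicted labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary dataset (assumption: any two-class data would do here).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# probability=False (the default) skips the extra calibration step at fit time.
clf = SVC(kernel="linear", probability=False)
clf.fit(X, y)

# In the binary case there is a single score per sample.
scores = clf.decision_function(X)

# Uncalibrated scores are enough to rank samples by confidence
# in the positive class (classes_[1]).
ranking = np.argsort(-scores)

# The sign of the score selects the predicted class.
pred_from_scores = clf.classes_[(scores > 0).astype(int)]
```

The scores are not probabilities, so they are only comparable between samples scored by the same fitted model.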

scikit-learn member
larsmans added a note Mar 28, 2013

Good point. I also forgot to note that the probabilities can be inconsistent with the predictions because of the bias term in Platt scaling. Will fix.
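
One way to check for this inconsistency on a given model is sketched below. The noisy toy dataset and the variable names are assumptions for illustration. Whether any disagreement actually shows up depends on the data, so the sketch only counts it:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Noisy toy binary dataset (assumption: label noise makes borderline cases likelier).
X, y = make_classification(
    n_samples=200, n_features=5, flip_y=0.2, random_state=0
)

# probability=True enables predict_proba at the cost of an internal
# cross-validation during fit.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)

pred = clf.predict(X)          # based on the SVM decision values
proba = clf.predict_proba(X)   # based on the calibrated probability model

# Label implied by the most probable class.
pred_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# Count samples where the two disagree (may be zero on some datasets).
n_disagree = int(np.sum(pred != pred_from_proba))
print("samples where predict and argmax(predict_proba) differ:", n_disagree)
```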

scikit-learn member
ogrisel commented Mar 28, 2013

Apart from the previous remark, +1 for merging. Thanks for digging up the references.

scikit-learn member

Pushed as b1a97de.

@larsmans larsmans closed this Mar 30, 2013