New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unclear docstring for random_state in OCSVM #9497

Closed
albertcthomas opened this Issue Aug 4, 2017 · 11 comments

Comments

Projects
None yet
4 participants
@albertcthomas
Contributor

albertcthomas commented Aug 4, 2017

Description

Following the discussion on the mailing list about the random_state of the One-Class SVM it might be good to clarify a few things.

  • The random_state parameter in OneClassSVM is defined by its docstring as:
    "The seed of the pseudo random number generator to use when shuffling the data. [...]"
    If this refers to the shuffling used for probability estimation then I think it is incorrect as there is no probability estimation for the One-Class SVM. If this refers to something else it might be good to clarify it in the docstring or in the User Guide. But from the Libsvm paper it seems that the underlying SMO implementation is not random?

  • Same issue for the random_state parameter of LinearSVC? (LinearSVC does not seem to provide probability estimation). Does random_state control the seed of the random number generator used in the underlying implementation? As written in the doc:
    "The underlying LinearSVC implementation uses a random number generator to select features when fitting the model. It is thus not uncommon, to have slightly different results for the same input data. If that happens, try with a smaller tol parameter."

Other comments

The way randomness is handled for SVC and nuSVC could deserve a bit more explanation in the doc as well. The docstring of random_state is the same as the one used for the OneClassSVM. However the dosctring of random_seed in sklearn.libsvm.fit clearly states that this parameter is used for probability estimation:
"Seed for the random number generator used for probability estimates. 0 by default".
We should maybe do the same and specifies if the underlying LibSVM implementation is random or not?

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Aug 4, 2017

Member

thanks for the report.

Member

amueller commented Aug 4, 2017

thanks for the report.

@shivamgargsya

This comment has been minimized.

Show comment
Hide comment
@shivamgargsya

shivamgargsya Aug 4, 2017

Contributor

working on it.

Contributor

shivamgargsya commented Aug 4, 2017

working on it.

@shivamgargsya

This comment has been minimized.

Show comment
Hide comment
@shivamgargsya

shivamgargsya Aug 6, 2017

Contributor

@albertcthomas in case of OneClassSVM It seems random_seed is not used for OneClassSVM implementation it can be checked at PREFIX(model) *PREFIX(train) function in svm/src/libsvm/svm.cpp file. @amuller random_seed variable can be dropped for OneClassSVM .

Contributor

shivamgargsya commented Aug 6, 2017

@albertcthomas in case of OneClassSVM It seems random_seed is not used for OneClassSVM implementation it can be checked at PREFIX(model) *PREFIX(train) function in svm/src/libsvm/svm.cpp file. @amuller random_seed variable can be dropped for OneClassSVM .

@albertcthomas

This comment has been minimized.

Show comment
Hide comment
@albertcthomas

albertcthomas Aug 7, 2017

Contributor

Thanks @shivamgargsya. Just to clarify, from what I understand of what's being done in svm.cpp, the random seed is used at the beginning of PREFIX(model) *PREFIX(train) to initialize the random number generator whatever the chosen type of SVM:

if(param->random_seed >= 0)
{
    srand(param->random_seed);
}

The random generator seems, however, to be only used in svm_binary_svc_probability and svm_svr_probability (still in svm.cpp), i.e. only for probability estimation as indicated in the docstring of libsvm.fit. If we confirm this, the random_state parameter of the OneClassSVM is indeed useless.

Contributor

albertcthomas commented Aug 7, 2017

Thanks @shivamgargsya. Just to clarify, from what I understand of what's being done in svm.cpp, the random seed is used at the beginning of PREFIX(model) *PREFIX(train) to initialize the random number generator whatever the chosen type of SVM:

if(param->random_seed >= 0)
{
    srand(param->random_seed);
}

The random generator seems, however, to be only used in svm_binary_svc_probability and svm_svr_probability (still in svm.cpp), i.e. only for probability estimation as indicated in the docstring of libsvm.fit. If we confirm this, the random_state parameter of the OneClassSVM is indeed useless.

@shivamgargsya

This comment has been minimized.

Show comment
Hide comment
@shivamgargsya

shivamgargsya Aug 8, 2017

Contributor

@albertcthomas thanks for looking into the svm implementation.Also simimilar behaviour is in for LinearSVC.

Contributor

shivamgargsya commented Aug 8, 2017

@albertcthomas thanks for looking into the svm implementation.Also simimilar behaviour is in for LinearSVC.

@albertcthomas

This comment has been minimized.

Show comment
Hide comment
@albertcthomas

albertcthomas Aug 24, 2017

Contributor

After a discussion with @agramfort I confirm that the underlying SMO implementation of the OneClass SVM is not random. And as there is no probability estimation for OCSVM, the random_state parameter is useless.

Contributor

albertcthomas commented Aug 24, 2017

After a discussion with @agramfort I confirm that the underlying SMO implementation of the OneClass SVM is not random. And as there is no probability estimation for OCSVM, the random_state parameter is useless.

@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Aug 24, 2017

Member
Member

agramfort commented Aug 24, 2017

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Aug 29, 2017

Member

indeed. I think LinearSVC has randomization in the optimization though.

Member

amueller commented Aug 29, 2017

indeed. I think LinearSVC has randomization in the optimization though.

@albertcthomas

This comment has been minimized.

Show comment
Hide comment
@albertcthomas

albertcthomas Aug 31, 2017

Contributor

LinearSVC is indeed random (when dual=True) as explained for LogisticRegression (see comment). And this randomness is controlled by random_state. However, the docstring of random_state in SVC and LinearSVC is exactly the same whereas in SVC it controls the randomness of the probability estimation. Should we make this clearer? or we consider that as long as setting the same value to random_state leads to the same results this is OK?

Contributor

albertcthomas commented Aug 31, 2017

LinearSVC is indeed random (when dual=True) as explained for LogisticRegression (see comment). And this randomness is controlled by random_state. However, the docstring of random_state in SVC and LinearSVC is exactly the same whereas in SVC it controls the randomness of the probability estimation. Should we make this clearer? or we consider that as long as setting the same value to random_state leads to the same results this is OK?

@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Aug 31, 2017

Member
Member

agramfort commented Aug 31, 2017

@albertcthomas

This comment has been minimized.

Show comment
Hide comment
@albertcthomas

albertcthomas Sep 11, 2017

Contributor

Closing as fixed in PR #9703

Contributor

albertcthomas commented Sep 11, 2017

Closing as fixed in PR #9703

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment