
Add example to compare classifiers #171

Merged
merged 9 commits into pyRiemann:master from qbarthelemy:example_classif on Jun 1, 2022

Conversation

qbarthelemy
Member

This PR adds an example comparing several Riemannian classifiers on low-dimensional synthetic datasets of SPD matrices, adapted from https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

@gabelstein

@qbarthelemy
Member Author

Display of the current example:
[figure: comp]

@agramfort
Member

Decision functions are a bit weird, no? For example, I am surprised that the kNN decision function is not more irregular.

@gabelstein
Contributor

Thanks, looks good! Will include it. @qbarthelemy

@gabelstein
Contributor

On closer inspection, this will probably be very confusing, as we can only display two dimensions of the three-dimensional space. It could be alleviated with a 3D plot, but such a plot couldn't show decision boundaries properly. I'll look into another way of including an example.

@qbarthelemy
Member Author

> Decision functions are a bit weird, no? For example, I am surprised that the kNN decision function is not more irregular.

Good catch! Digging into the code, I discovered that the current version of KNN does not implement predict_proba(): it only inherits it from MDM, which has only one center per class.

> Will include it

OK, but do not merge this branch into yours. Wait for the merge, then rebase your branch on the latest master.

> On closer inspection, this will probably be very confusing, as we can only display two dimensions of the three-dimensional space. It could be alleviated with a 3D plot, but such a plot couldn't show decision boundaries properly. I'll look into another way of including an example.

The previous example plots the decision boundary on the horizontal 2D plane passing through the mean of the third coordinate.
3D decision boundaries are not easy to show, but I will try a new display; a sketch of the slicing idea is given below.
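For reference, here is a minimal sketch of that slicing approach, not the example's actual code: the toy data generator, the choice of MDM, and the grid ranges are all illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from pyriemann.classification import MDM

rng = np.random.RandomState(42)

def random_spd(scale, n):
    """n random 2x2 SPD matrices around scale * identity."""
    M = rng.randn(n, 2, 2) * 0.3
    return scale * np.eye(2) + M @ M.transpose(0, 2, 1)

# toy training set: two classes of 2x2 SPD matrices
X_train = np.concatenate([random_spd(1.0, 20), random_spd(2.0, 20)])
y_train = np.array([0] * 20 + [1] * 20)
clf = MDM().fit(X_train, y_train)

# grid over the two diagonal coefficients; the off-diagonal coefficient is
# fixed (here at 0, close to the training mean), giving a horizontal 2D slice
xx, yy = np.meshgrid(np.linspace(0.5, 3.0, 50), np.linspace(0.5, 3.0, 50))
a, b = xx.ravel(), yy.ravel()
c = np.zeros_like(a)
X_grid = np.stack([np.stack([a, c], axis=-1),
                   np.stack([c, b], axis=-1)], axis=-2)  # (2500, 2, 2), all SPD

probs = clf.predict_proba(X_grid)[:, 1].reshape(xx.shape)
plt.contourf(xx, yy, probs, alpha=0.8)
plt.xlabel("first diagonal coefficient")
plt.ylabel("second diagonal coefficient")
plt.show()
```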

@qbarthelemy
Member Author

qbarthelemy commented May 2, 2022

I use RandomState to generate data, but figures are not reproducible from one run to another.
[figure: comp_3d]

pyriemann/classification.py
```python
# reviewed excerpt: for each test matrix m, sum the (softmax) probabilities
# of its k nearest neighbors that belong to class ll
for il, ll in enumerate(self.classes_):
    prob[m, il] = np.sum(
        probas[m, neighbors_classes[m, 0:self.n_neighbors] == ll]
    )
```

Member Author

Yes, I had checked: the methods are very similar.

The difference is the definition of probabilities from distances:

  • pyRiemann uses a softmax of negative squared distances (this formula is derived from the Riemannian Gaussian distribution, as explained in #100);
  • sklearn uses the reciprocal of distances, then divides by their sum (so that probabilities sum to 1), but I don't know the origin of this formula.
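To make the difference concrete, here is a toy numeric sketch of the two formulas (not code from either library):

```python
import numpy as np

d = np.array([0.5, 1.0, 2.0])  # toy distances, one per class

# pyRiemann: softmax of negative squared distances
p_softmax = np.exp(-d ** 2) / np.sum(np.exp(-d ** 2))

# sklearn: reciprocal of distances, normalized to sum to 1
p_recip = (1 / d) / np.sum(1 / d)

print(p_softmax)  # approx. [0.67, 0.32, 0.02]
print(p_recip)    # approx. [0.57, 0.29, 0.14]
# note: p_recip is singular when some d == 0, whereas p_softmax is not
```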

Member

Before committing to a choice that differs from sklearn, I would run a tiny benchmark on Gaussian data following an LDA model, to see which approach leads to the best-calibrated probabilities.

Member Author

> I would run a tiny benchmark

Nice idea! But who is "I": you or me? ;-)

Member Author

Joking aside, I don't see how to do this benchmark, because the inputs of sklearn are multivariate vectors (generated by a mixture of multivariate Gaussian distributions), while the inputs of pyRiemann are covariance matrices. So the results can't be compared.

Member Author

> I would do it in sklearn using Euclidean data. Basically, replace the predict_proba in sklearn and see what works best in sklearn. Then copy this in pyRiemann.

I plotted the log-probabilities of kNN applied to bivariate Gaussian distributions:

  • weights="max_lik": computing the softmax of negative squared distances, equivalent to a Euclidean Gaussian model (implementation added in this branch);
  • weights="distance": computing the reciprocal of distances, equivalent to a power-law model (classical sklearn implementation).

Results are really close, but I think the new option is better, because:

  • it is derived from a Gaussian prior, which is more coherent than a power-law prior;
  • it naturally handles the case where we classify a point at zero distance from one or more training points, whereas the reciprocal computation has a singularity at 0.

Moreover, this new option could be added to sklearn; a benchmark sketch is given below the figure.

[figure: test_knn_comparison]
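Here is a minimal sketch of such a benchmark in plain sklearn, under one assumption: KNeighborsClassifier accepts a callable for weights, and since predict_proba normalizes the weighted neighbor votes, exp(-d**2) weights reproduce the softmax of negative squared distances.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def max_lik_weights(dist):
    """Weights reproducing the softmax of negative squared distances.

    predict_proba normalizes the weighted neighbor votes, so these weights
    yield softmax(-d**2) over the k neighbors. The row-wise shift is only
    for numerical stability and cancels out in the normalization.
    """
    d2 = dist ** 2
    return np.exp(-(d2 - d2.min(axis=1, keepdims=True)))

# bivariate Gaussian blobs, as in the comparison above
X, y = make_blobs(n_samples=500, centers=2, n_features=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for name, weights in [("distance", "distance"), ("max_lik", max_lik_weights)]:
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights)
    knn.fit(X_train, y_train)
    proba = knn.predict_proba(X_test)[:, 1]
    # Brier score (lower is better) avoids the log-loss singularity at p = 0
    print(name, "Brier score:", brier_score_loss(y_test, proba))
```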

Member Author

@agramfort, can we merge the branch?

doc/whatsnew.rst
@sylvchev
Member

> I use RandomState to generate data, but figures are not reproducible from one run to another.

I had a look into that, but did not find a reason. Are the covariances the same, or is it only the resulting classification that differs?

@sylvchev
Member

Good catch for the missing RandomState in make_gaussian_blobs!
The doc is nicely generated, but is it possible to use a different random seed to get a more interpretable example on the 3rd line? See below the artifact generated by the GH Action: the 3rd dataset seems to show only the blue class, or indecisive probabilities, in the contourf plot. In the 3D example you posted above, the partition of the space seems more conclusive. (A minimal seeding check is sketched after the figure.)

[figure: sphx_glr_plot_classifier_comparison_001]
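For reference, a minimal seeding sketch; the parameter names (n_matrices, n_dim, random_state) are assumed from the pyRiemann API of that period.

```python
from pyriemann.datasets import make_gaussian_blobs

# same seed -> identical SPD matrices, hence reproducible figures
X1, y1 = make_gaussian_blobs(n_matrices=50, n_dim=2, random_state=42)
X2, y2 = make_gaussian_blobs(n_matrices=50, n_dim=2, random_state=42)
assert (X1 == X2).all() and (y1 == y2).all()
```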

@agramfort
Member

agramfort commented May 18, 2022 via email

@agramfort
Member

thx @qbarthelemy!

Sorry, it had slipped through the cracks.

@sylvchev
Member

sylvchev commented Jun 1, 2022

LGTM
Kudos @qbarthelemy

@sylvchev sylvchev merged commit b3f09bd into pyRiemann:master Jun 1, 2022
@qbarthelemy qbarthelemy deleted the example_classif branch June 1, 2022 14:34