[MRG] multilabel accuracy with the Jaccard index #1795

Merged
merged 2 commits into scikit-learn:master from arjoly:metrics-jaccard

6 participants

@arjoly
Owner

This PR intends to bring multilabel accuracy and zero-one loss based on the Jaccard index.

For reference, see section 7.1.1 of Mining Multi-label Data and the Wikipedia entry on the Jaccard index. A minimal per-sample sketch of the proposed computation is given after the lists below.

TODO list:

  • Add multilabel accuracy based on the Jaccard similarity score
  • Write narrative doc for accuracy based on the Jaccard similarity score
  • Update what's new

This was removed from the PR scope:

  • Add multilabel zero-one loss based on the Jaccard distance
  • Write narrative doc for multilabel zero-one loss based on the Jaccard distance
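
A minimal sketch of the per-sample computation this PR proposes, assuming list-of-label-sets input (the helper name and the plain-Python averaging are illustrative only, not the final API):

    import numpy as np

    def jaccard_per_sample(y_true, y_pred):
        # Jaccard similarity |A & B| / |A | B| for each pair of label sets;
        # an empty union is scored 1.0 (lim_{x->0} x/x = 1).
        scores = []
        for true, pred in zip(y_true, y_pred):
            true, pred = set(true), set(pred)
            union = true | pred
            scores.append(len(true & pred) / float(len(union)) if union else 1.0)
        return np.mean(scores)

    # e.g. jaccard_per_sample([(1,), (3,)], [(1, 2), ()]) == (0.5 + 0.0) / 2 == 0.25
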
@jaquesgrobler

Had a quick look..looks cool so far. Nice work

@arjoly
Owner

Had a quick look..looks cool so far. Nice work

Thanks for your encouragement! :-)

sklearn/metrics/metrics.py
@@ -939,16 +965,44 @@ def accuracy_score(y_true, y_pred, normalize=True):
y_true = lb.transform(y_true)
y_pred = lb.transform(y_pred)
- if is_label_indicator_matrix(y_true):
- score = (y_pred != y_true).sum(axis=1) == 0
+ if similarity == "subset":
+ if is_label_indicator_matrix(y_true):
+ score = (y_pred != y_true).sum(axis=1) == 0
+ else:
+ score = np.array([len(set(true) ^ set(pred)) == 0
+ for pred, true in zip(y_pred, y_true)])
+
+ elif similarity == "jaccard":
@mblondel Owner

Why don't you create a new dedicated function?

@arjoly Owner
arjoly added a note

In the multilabel literature, subset accuracy and Jaccard index accuracy are often simply called accuracy.
They will give the same score in the multiclass case (see the small sketch below).

And I think that it would reduce the amount of redundant code.

Note: the Hamming loss metric could be integrated into the accuracy score function with the Hamming similarity.
Note 2: I was inspired to do this by the design of the precision, recall, F-score function.
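
For illustration only (hypothetical helper names, not the PR's code): in the single-label multiclass case each sample's label set is a singleton, so the Jaccard coefficient is 1 on a correct prediction and 0 otherwise, which is exactly the subset accuracy:

    def subset_acc(y_true, y_pred):
        return sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))

    def jaccard_acc(y_true, y_pred):
        # each multiclass label is treated as the singleton set {label}
        scores = [len({t} & {p}) / float(len({t} | {p}))
                  for t, p in zip(y_true, y_pred)]
        return sum(scores) / len(scores)

    y_true, y_pred = [0, 1, 2, 3], [0, 2, 1, 3]
    assert subset_acc(y_true, y_pred) == jaccard_acc(y_true, y_pred) == 0.5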

@larsmans Owner

I too would prefer a new function, say jaccard_accuracy. Redundant code isn't the first worry: it's consistent and easy-to-use APIs. The implementation can always be changed later.

@arjoly Owner
arjoly added a note

Thanks a lot for the feedback! I will do as you propose.

@arjoly
Owner

What do you think of the interface of accuracy_score and zero_one_loss for multilabel data, with an argument to specify the distance or similarity function?

Given the new interface, I would like to merge the hamming loss function into the accuracy_score and zero_one_loss functions.
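
(For context, the interface being discussed would roughly have looked like the hypothetical sketch below; the PR ultimately kept a dedicated jaccard_similarity_score function instead.)

    # Hypothetical signature only -- never merged in this form.
    def accuracy_score(y_true, y_pred, normalize=True, similarity="subset"):
        # similarity="subset"  -> exact-match (subset) accuracy
        # similarity="jaccard" -> per-sample Jaccard index, averaged over samples
        raise NotImplementedError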

@arjoly
Owner

I have written the narrative doc and finished implementing the new Jaccard metric.
Reviews are welcome.

@arjoly
Owner

I am not able to retrieve the Travis log :-(

@arjoly
Owner

Travis should be happy now.

@arjoly
Owner

@larsmans @mblondel there is now only one function, jaccard_similarity_score.

@arjoly
Owner

I've rebased on top of master.

@arjoly
Owner

Reviews are welcome.

@jaquesgrobler

Nice! I'll have a read through this evening. Thanks

@arjoly
Owner

Thanks :-)

@jaquesgrobler

Read through this. Very nice PR!
Thanks for doing this!
I'm +1, though I'm sure you'll get some more reviews from the others.
Great work, once again :)

@arjoly
Owner

Thanks for the review! :-)

@glouppe glouppe commented on the diff
doc/modules/model_evaluation.rst
((14 lines not shown))
+
+The Jaccard similarity coefficient between two label sets :math:`y` and
+:math:`\hat{y}` is defined as
+
+.. math::
+
+ J(y, \hat{y}) = \frac{|y \cap \hat{y}|}{|y \cup \hat{y}|}.
+
+::
+
+ >>> import numpy as np
+ >>> from sklearn.metrics import jaccard_similarity_score
+ >>> y_pred = [0, 2, 1, 3]
+ >>> y_true = [0, 1, 2, 3]
+ >>> jaccard_similarity_score(y_true, y_pred)
+ 0.5
@glouppe Owner
glouppe added a note

If y_pred and y_true are sets, shouldn't the score be 1.0 in this case? (Both y_pred and y_true contain the same labels, modulo their order, but this shouldn't matter since we are talking about sets, should it?)

@glouppe Owner
glouppe added a note

Okay, I get it: y_true[i] and y_pred[i] are the sets to be compared. I think this should be made clearer then: jaccard_similarity_score computes the average (?) Jaccard similarity (as you define it above) over several pairs of sets.
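
Working glouppe's example through (illustration only, each multiclass label read as a singleton set, as in the doctest above):

    pairs = zip([{0}, {1}, {2}, {3}], [{0}, {2}, {1}, {3}])
    per_pair = [len(a & b) / float(len(a | b)) for a, b in pairs]  # [1.0, 0.0, 0.0, 1.0]
    mean = sum(per_pair) / len(per_pair)                           # 0.5, not 1.0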

@arjoly Owner
arjoly added a note

Improved the doc in 7267e3c.

@jaquesgrobler Owner

Reads easier now :+1:

doc/modules/model_evaluation.rst
@@ -72,11 +73,11 @@ Accuracy score
---------------
The :func:`accuracy_score` function computes the
`accuracy <http://en.wikipedia.org/wiki/Accuracy_and_precision>`_, the fraction
-(default) or the number of correct predictions. In multilabel classification,
-the function returns the subset accuracy:
-the entire set of labels for a sample must be entirely correct
-or the sample has an accuracy of zero.
-(See the Hamming loss for a more forgiving evaluation metric.)
+(default) or the number of correct predictions.
+
+In multilabel classification, the function returns the subset accuracy: the
+entire set of labels for a sample must be entirely correct or the sample has an
+accuracy of zero.
@glouppe Owner
glouppe added a note

I would rephrase this as "In multilabel classification, the function returns the subset accuracy: if the entire set of predicted labels for a sample strictly match with the true set of labels, then the subset accuracy is 1.0, otherwise it is 0.0."

doc/modules/model_evaluation.rst
@@ -283,6 +284,48 @@ and with a list of labels format:
over samples, the Hamming loss is always between zero and one.
+Jaccard similarity coefficient score
+------------------------------------
+
+The :func:`jaccard_similarity_score` function computes the average (default)
+or sum of `Jaccard similarity coefficients, also called
+Jaccard index <http://en.wikipedia.org/wiki/Jaccard_index>`_, between
@glouppe Owner
glouppe added a note

I would put the link on "Jaccard similarity coefficients" only.

doc/modules/model_evaluation.rst
@@ -283,6 +284,48 @@ and with a list of labels format:
over samples, the Hamming loss is always between zero and one.
+Jaccard similarity coefficient score
+------------------------------------
+
+The :func:`jaccard_similarity_score` function computes the average (default)
+or sum of `Jaccard similarity coefficients, also called
+Jaccard index <http://en.wikipedia.org/wiki/Jaccard_index>`_, between
+pairs of label set.
@glouppe Owner
glouppe added a note

Typo: sets

sklearn/metrics/metrics.py
((89 lines not shown))
+ y_true = lb.transform(y_true)
+ y_pred = lb.transform(y_pred)
+
+ if is_label_indicator_matrix(y_true):
+ try:
+ # oddly, we may get an "invalid" rather than a "divide"
+ # error here
+ old_err_settings = np.seterr(divide='ignore',
+ invalid='ignore')
+ score = (np.sum(np.logical_and(y_pred == pos_label,
+ y_true == pos_label),
+ axis=1) /
+ np.sum(np.logical_or(y_pred == pos_label,
+ y_true == pos_label),
+ axis=1))
+
@glouppe Owner
glouppe added a note

It may not be serious, but y_pred == pos_label and y_true == pos_label are computed twice.

sklearn/metrics/metrics.py
((99 lines not shown))
+ y_true == pos_label),
+ axis=1) /
+ np.sum(np.logical_or(y_pred == pos_label,
+ y_true == pos_label),
+ axis=1))
+
+ # If there is no label, it results in a Nan instead, we set
+ # the jaccard to 1: lim_{x->0} x/x = 1
+ score[np.isnan(score)] = 1.0
+
+ finally:
+ np.seterr(**old_err_settings)
+ else:
+ score = np.array([len(set(true) & set(pred)) /
+ len(set(true) | set(pred))
+ if set(true) | set(pred)
@glouppe Owner
glouppe added a note

Same here, set(true) | set(pred) is computed twice.

@glouppe
Owner

Besides my comments above, this looks good to me. +1 after those are fixed.

@arjoly
Owner

If everything is ok, I can squash and push.

@arjoly arjoly merged commit be842f2 into scikit-learn:master

1 check failed

default: The Travis build is in progress
@arjoly arjoly deleted the arjoly:metrics-jaccard branch
@arjoly
Owner

Thanks a lot for the review!!!

@arjoly
Owner

See <https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/1800/changes>

Changes:

[arnaud.v.joly] ENH more pythonic way to treat list of list of labels

[arnaud.v.joly] ENH add jaccard similarity score metrics

------------------------------------------
[...truncated 2265 lines...]
sklearn.tests.test_preprocessing.test_scaler_without_centering ... ok
Check that StandardScaler.fit does not change input ... ok
sklearn.tests.test_preprocessing.test_scale_sparse_with_mean_raise_exception ... ok
sklearn.tests.test_preprocessing.test_scale_function_without_centering ... ok
Check warning when scaling integer data ... ok
sklearn.tests.test_preprocessing.test_normalizer_l1 ... ok
sklearn.tests.test_preprocessing.test_normalizer_l2 ... ok
Check that invalid arguments yield ValueError ... ok
sklearn.tests.test_preprocessing.test_binarizer ... ok
sklearn.tests.test_preprocessing.test_label_binarizer ... ok
sklearn.tests.test_preprocessing.test_label_binarizer_set_label_encoding ... ok
sklearn.tests.test_preprocessing.test_label_binarizer_multilabel ... ok
Check that invalid arguments yield ValueError ... ok
Test OneHotEncoder's fit and transform. ... ok
Test LabelEncoder's transform and inverse_transform methods ... ok
Test fit_transform ... ok
Test LabelEncoder's transform and inverse_transform methods with ... ok
Check that invalid arguments yield ValueError ... ok
sklearn.tests.test_preprocessing.test_label_binarizer_iris ... ok
Check that LabelBinarizer can handle an unlabeled sample ... ok
Test that KernelCenterer is equivalent to StandardScaler ... ok
sklearn.tests.test_preprocessing.test_fit_transform ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature_coo ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature_csc ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature_csr ... ok
sklearn.tests.test_preprocessing.test_balance_weights ... ok
QDA classification. ... ok
sklearn.tests.test_qda.test_qda_priors ... ok
sklearn.tests.test_qda.test_qda_store_covariances ... ok
sklearn.tests.test_random_projection.test_invalid_jl_domain ... ok
sklearn.tests.test_random_projection.test_input_size_jl_min_dim ... ok
Check basic properties of random matrix generation ... ok
Check some statical properties of Gaussian random matrix ... ok
Check some statical properties of sparse random matrix ... ok
sklearn.tests.test_random_projection.test_sparse_random_projection_transformer_invalid_density ... ok
sklearn.tests.test_random_projection.test_random_projection_transformer_invalid_input ... ok
sklearn.tests.test_random_projection.test_try_to_transform_before_fit ... ok
sklearn.tests.test_random_projection.test_too_many_samples_to_find_a_safe_embedding ... ok
sklearn.tests.test_random_projection.test_random_projection_embedding_quality ... ok
sklearn.tests.test_random_projection.test_SparseRandomProjection_output_representation ... ok
sklearn.tests.test_random_projection.test_correct_RandomProjection_dimensions_embedding ... ok
sklearn.tests.test_random_projection.test_warning_n_components_greater_than_n_features ... ok
Doctest: sklearn._NoseTester.test ... ok

======================================================================
FAIL: sklearn.metrics.tests.test_metrics.test_multilabel_representation_invariance
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
   self.test(*self.arg)
 File "<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/metrics/tests/test_metrics.py",> line 947, in test_multilabel_representation_invariance
   % name)
 File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/numpy/testing/utils.py", line 265, in assert_almost_equal
   raise AssertionError(msg)
AssertionError: 
Items are not equal:
jaccard_similarity_score failed representation invariance  between list of list of labels format and dense binary indicator format.
ACTUAL: 0.3562091503267974
DESIRED: 0.29738562091503268
raise AssertionError('\nItems are not equal:\njaccard_similarity_score failed representation invariance between list of list of labels format and dense binary indicator format.\n ACTUAL: 0.3562091503267974\n DESIRED: 0.29738562091503268')


======================================================================
FAIL: sklearn.metrics.tests.test_metrics.test_multilabel_jaccard_similarity_score
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
   self.test(*self.arg)
 File "<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/metrics/tests/test_metrics.py",> line 1103, in test_multilabel_jaccard_similarity_score
   assert_equal(1, jaccard_similarity_score(y1, y2, pos_label=10))
AssertionError: 1 != 0.0
raise self.failureException, \
         (None or '%r != %r' % (1, 0.0))


Name                                             Stmts   Miss  Cover   Missing
------------------------------------------------------------------------------
sklearn                                             34      4    88%   27, 52, 64-65
sklearn.__check_build                               18      3    83%   24, 45-46
sklearn.__check_build.setup                          9      2    78%   17-18
sklearn._build_utils                                17      4    76%   18, 22, 27-28
sklearn.base                                       136      3    98%   58, 76, 84
sklearn.cluster                                     11      0   100%   
sklearn.cluster._feature_agglomeration              22      2    91%   64, 70
sklearn.cluster.affinity_propagation_               90      6    93%   145-146, 167-169, 264
sklearn.cluster.dbscan_                             46      0   100%   
sklearn.cluster.hierarchical                       154      0   100%   
sklearn.cluster.k_means_                           381      5    99%   95, 360, 953, 1234, 1237
sklearn.cluster.mean_shift_                         78      5    94%   103, 120, 154-156
sklearn.cluster.setup                               18      2    89%   40-41
sklearn.cluster.spectral                           117     14    88%   140-142, 154, 253-256, 258-261, 408-411, 413-416, 442, 476
sklearn.covariance                                   6      0   100%   
sklearn.covariance.empirical_covariance_            62      0   100%   
sklearn.covariance.graph_lasso_                    201     18    91%   134, 156, 172-176, 202, 310, 335, 341, 344, 436-437, 480, 499-501, 503-505
sklearn.covariance.outlier_detection                38      2    95%   71, 100
sklearn.covariance.robust_covariance               211     19    91%   124, 134, 139-141, 147, 228, 322-323, 331, 380-386, 566, 574-579, 645
sklearn.covariance.shrunk_covariance_              127      5    96%   182, 184-188
sklearn.cross_validation                           428      1    99%   1294
sklearn.datasets                                    47      0   100%   
sklearn.datasets.base                              136     14    90%   455-464, 498-505
sklearn.datasets.california_housing                 37     24    35%   27-29, 64-101
sklearn.datasets.covtype                            50     23    54%   19-20, 69-78, 84-93, 100-104
sklearn.datasets.lfw                               157    135    14%   53-55, 66-105, 113-163, 178-208, 260-276, 294-334, 342, 402-429, 439
sklearn.datasets.mlcomp                             47     40    15%   11-13, 56-103
sklearn.datasets.mldata                             80      7    91%   15-19, 151-153
sklearn.datasets.olivetti_faces                     41     26    37%   32-35, 89-116
sklearn.datasets.samples_generator                 301     37    88%   116, 120, 123, 524, 573-593, 731, 1121-1124, 1276-1303
sklearn.datasets.setup                              14      2    86%   22-23
sklearn.datasets.species_distributions              72     55    24%   46-48, 69-82, 98-108, 125-135, 210-257
sklearn.datasets.svmlight_format                    96      6    94%   127, 238, 334, 339, 347-348
sklearn.datasets.twenty_newsgroups                 121     88    27%   71-92, 133-139, 143, 148-195, 225-271
sklearn.decomposition                                8      0   100%   
sklearn.decomposition.dict_learning                289     28    90%   87, 91-92, 305-306, 308, 417, 445, 450-451, 453, 476, 478, 481, 571, 581, 605, 647, 797, 1117-1133
sklearn.decomposition.factor_analysis               77      3    96%   138, 165-166
sklearn.decomposition.fastica_                     153     17    89%   79-83, 124-128, 239, 264, 271-275, 287-288, 312-314, 329
sklearn.decomposition.kernel_pca                    82      2    98%   172, 258
sklearn.decomposition.nmf                          202      8    96%   104, 254, 373, 393-395, 418, 540
sklearn.decomposition.pca                          153      5    97%   49, 62-63, 244, 329
sklearn.decomposition.sparse_pca                    59      0   100%   
sklearn.dummy                                      110      0   100%   
sklearn.ensemble                                    15      0   100%   
sklearn.ensemble.base                               25      1    96%   76
sklearn.ensemble.forest                            307     13    96%   77, 131-132, 148-149, 236, 306, 411, 446-449, 478, 593-594
sklearn.ensemble.gradient_boosting                 348     13    96%   53, 123, 188, 215, 368, 419, 546, 551, 595, 604, 607-608, 614
sklearn.ensemble.partial_dependence                159    104    35%   54, 56, 237-388
sklearn.ensemble.setup                              10      2    80%   16-17
sklearn.ensemble.weight_boosting                   272     24    91%   100-104, 143, 193, 230, 239-240, 379, 531-532, 595-596, 622, 675, 701-703, 806, 912, 987, 989, 1001, 1018
sklearn.externals                                    1      0   100%   
sklearn.externals.joblib                            10      0   100%   
sklearn.externals.joblib._compat                     4      2    50%   7-8
sklearn.externals.joblib.disk                       51     11    78%   28, 84-88, 94, 103-107
sklearn.externals.joblib.format_stack              227     46    80%   34-35, 50, 55, 63, 67-70, 133-135, 146, 148, 168-173, 193-197, 201-205, 208, 215-224, 246-247, 282-286, 299-300, 323, 345-346, 363-367, 372-375, 405, 412
sklearn.externals.joblib.func_inspect              117     12    90%   73, 98-102, 109-110, 138-139, 185, 224, 228
sklearn.externals.joblib.hashing                    91     18    80%   22, 54-55, 71, 88-99, 144, 155-159, 195
sklearn.externals.joblib.logger                     73     10    86%   29, 42, 68, 78, 94, 99, 115, 121, 138-139
sklearn.externals.joblib.memory                    235     20    91%   19-20, 58, 130, 159-160, 257, 289-290, 300, 353, 355, 375-376, 395-396, 406, 476, 522, 547
sklearn.externals.joblib.my_exceptions              42      3    93%   43, 70-71
sklearn.externals.joblib.numpy_pickle              170     20    88%   21-27, 86, 115, 122-125, 196-197, 243-246, 272-273, 290, 298
sklearn.externals.joblib.parallel                  219     21    90%   18-19, 28-29, 38-40, 53, 103, 123, 330-331, 398-400, 450, 456, 469-471, 477, 506
sklearn.externals.setup                              6      0   100%   
sklearn.externals.six                              170     64    62%   34-40, 54-56, 92-94, 107-115, 190, 195-201, 205-213, 228-230, 234-237, 240, 245, 260, 264, 272-284, 298-308, 319-320
sklearn.feature_extraction                           5      0   100%   
sklearn.feature_extraction.dict_vectorizer          98      3    97%   228, 246-247
sklearn.feature_extraction.hashing                  39      1    97%   97
sklearn.feature_extraction.image                   147      0   100%   
sklearn.feature_extraction.setup                    10      0   100%   
sklearn.feature_extraction.stop_words                1      0   100%   
sklearn.feature_extraction.text                    414     15    96%   104-105, 232, 400, 405-406, 595, 601, 767, 832, 891, 900, 941-944, 1049
sklearn.feature_selection                           13      0   100%   
sklearn.feature_selection.rfe                      102      7    93%   120, 129, 142, 149, 216, 219, 358
sklearn.feature_selection.selector_mixin            46      5    89%   45, 57, 88, 99-102
sklearn.feature_selection.univariate_selection     180      7    96%   226, 300, 315, 376, 382, 433, 584
sklearn.gaussian_process                             5      0   100%   
sklearn.gaussian_process.correlation_models         78     28    64%   48, 53, 91, 96, 129-148, 178-185, 221, 227, 271, 277
sklearn.gaussian_process.gaussian_process          335    105    69%   21, 309, 318, 320, 324, 329, 344, 349, 355, 361, 373-381, 426, 459-467, 495-520, 582-586, 597-598, 604-610, 616-623, 680-682, 688, 722-724, 742-744, 750-799, 812, 828, 834, 845, 848, 853-856, 868, 871, 876
sklearn.gaussian_process.regression_models          19      0   100%   
sklearn.grid_search                                227      6    97%   284, 369, 411, 440, 668-670
sklearn.hmm                                        452     34    92%   296-297, 402, 437-439, 521, 524, 695, 704-711, 737, 749, 800-801, 931, 995, 999, 1003, 1008, 1017, 1103, 1160, 1167, 1188, 1199-1201
sklearn.isotonic                                    54      1    98%   54
sklearn.kernel_approximation                       153      4    97%   247, 256, 454, 485
sklearn.lda                                         93     11    88%   91-95, 123, 130, 139, 141-142, 161
sklearn.linear_model                                14      0   100%   
sklearn.linear_model.base                          128      2    98%   264, 294
sklearn.linear_model.bayes                         126      8    94%   178-184, 210, 423
sklearn.linear_model.coordinate_descent            345     29    92%   138-139, 209-210, 262, 315, 513, 708-709, 730, 740-741, 766-771, 885, 888-890, 925, 980-982, 1192-1193, 1308-1309, 1363, 1382
sklearn.linear_model.least_angle                   362     18    95%   165-166, 285-286, 293-302, 403, 518, 787-790
sklearn.linear_model.logistic                       11      0   100%   
sklearn.linear_model.omp                           192     15    92%   87-88, 180-181, 279, 288, 376, 379, 384, 541-544, 547-548, 557
sklearn.linear_model.passive_aggressive             25      0   100%   
sklearn.linear_model.perceptron                      5      0   100%   
sklearn.linear_model.randomized_l1                 174      8    95%   69, 100, 122, 142, 573, 591, 600-601
sklearn.linear_model.ridge                         269     21    92%   134, 141-158, 511, 516, 537, 575-579, 591, 678, 885
sklearn.linear_model.setup                          19      2    89%   41-42
sklearn.linear_model.stochastic_gradient           309     13    96%   64-65, 136-137, 307-310, 350, 396-401, 833-836
sklearn.manifold                                     5      0   100%   
sklearn.manifold.isomap                             46      0   100%   
sklearn.manifold.locally_linear                    218     22    90%   45, 47, 147, 176, 271, 274, 283, 286, 289, 310, 333-334, 360, 384-389, 437, 482-483
sklearn.manifold.mds                               102     21    79%   79-80, 98-107, 122, 124-128, 229-233, 345, 378, 385-388
sklearn.manifold.spectral_embedding                144     17    88%   197-200, 270-282, 306, 387, 459
sklearn.metrics                                      8      0   100%   
sklearn.metrics.cluster                             14      0   100%   
sklearn.metrics.cluster.setup                       14      2    86%   22-23
sklearn.metrics.cluster.supervised                 110      0   100%   
sklearn.metrics.cluster.unsupervised                27      0   100%   
sklearn.metrics.metrics                            387     18    95%   139, 142, 669-671, 673-675, 681-683, 968-972, 1847-1848
sklearn.metrics.pairwise                           177      4    98%   173, 532, 636, 796
sklearn.metrics.scorer                              33      2    94%   72-74
sklearn.metrics.setup                               13      2    85%   20-21
sklearn.mixture                                      5      0   100%   
sklearn.mixture.dpgmm                              344     23    93%   211-216, 219, 222, 251, 266, 367-370, 460, 465, 502, 685, 692, 737-740
sklearn.mixture.gmm                                256     29    89%   244, 261-268, 299, 301, 303, 357-358, 419, 421, 440, 474, 566, 590, 619, 621, 624, 627, 637, 640, 645-648, 666
sklearn.multiclass                                 165      9    95%   53, 79, 126, 258, 277-281, 542
sklearn.naive_bayes                                124      6    95%   160, 230, 238, 246, 267, 440
sklearn.neighbors                                    7      0   100%   
sklearn.neighbors.base                             225      9    96%   65, 74, 90, 118, 122, 126, 152, 255, 463
sklearn.neighbors.classification                    62      0   100%   
sklearn.neighbors.graph                             10      0   100%   
sklearn.neighbors.nearest_centroid                  50      2    96%   90, 156
sklearn.neighbors.regression                        32      0   100%   
sklearn.neighbors.setup                             10      0   100%   
sklearn.neighbors.unsupervised                       7      0   100%   
sklearn.pipeline                                   138      6    96%   81, 91, 188, 272, 334, 341
sklearn.pls                                        202     24    88%   63-64, 98-99, 227, 231, 238, 242, 244, 247, 250, 366-372, 402-404, 834, 837, 842
sklearn.preprocessing                              389      5    99%   62, 430, 439, 456, 1027
sklearn.qda                                         75      0   100%   
sklearn.random_projection                          114      0   100%   
sklearn.semi_supervised                              2      0   100%   
sklearn.semi_supervised.label_propagation          112      3    97%   130, 135, 171
sklearn.setup                                       56      4    93%   68-70, 83-84
sklearn.svm                                          4      0   100%   
sklearn.svm.base                                   278      8    97%   129, 278, 307, 351, 494, 554, 624, 718
sklearn.svm.bounds                                  35      0   100%   
sklearn.svm.classes                                 24      0   100%   
sklearn.svm.setup                                   26      4    85%   54-55, 83-84
sklearn.tree                                         6      0   100%   
sklearn.tree.export                                 45      8    82%   71-76, 100, 127, 133
sklearn.tree.setup                                  14      2    86%   22-23
sklearn.tree.tree                                  180      5    97%   78, 215, 242, 246, 349
sklearn.utils                                      112      3    97%   321, 354, 358
sklearn.utils._csgraph                              21      3    86%   65-66, 69
sklearn.utils.arpack                               627    308    51%   307, 312, 315, 364-368, 429, 431, 433, 439-450, 453, 455, 463-497, 500, 503, 509, 516, 536, 542-543, 545-547, 553, 555, 578, 586, 629, 631, 633, 638-679, 682, 685, 691, 706, 718, 728, 734-735, 737, 739, 745-748, 771, 794-803, 810-834, 841-842, 848, 852, 859-875, 880, 886-887, 906, 933-945, 948-953, 963-984, 987, 990, 993-998, 1009, 1023, 1032-1046, 1184, 1186-1191, 1196, 1202, 1205, 1214-1253, 1423-1440, 1443, 1445-1450, 1455, 1462, 1470-1480, 1484, 1494-1495, 1499-1529, 1563-1599, 1603
sklearn.utils.bench                                  3      0   100%   
sklearn.utils.class_weight                          19      0   100%   
sklearn.utils.extmath                              134     11    92%   46-49, 54, 113-115, 190-192, 306, 369
sklearn.utils.fixes                                142     33    77%   25-26, 29, 33-38, 47, 72-73, 78-83, 93, 108-110, 116, 131, 141, 160, 167, 178, 196-197, 207-209, 217, 228-229, 231
sklearn.utils.graph                                 75      5    93%   52, 66, 72, 112, 131
sklearn.utils.multiclass                            20      0   100%   
sklearn.utils.setup                                 26      2    92%   70-71
sklearn.utils.sparsetools                            1      0   100%   
sklearn.utils.sparsetools.csgraph                   61     34    44%   17-19, 29, 33-34, 38-50, 54, 58-63, 67-71, 77-80
sklearn.utils.sparsetools.setup                     11      2    82%   16-17
sklearn.utils.validation                           105      1    99%   179
------------------------------------------------------------------------------
TOTAL                                            17190   1979    88%   
----------------------------------------------------------------------
Ran 1855 tests in 293.749s

FAILED (SKIP=16, failures=2)
<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/cluster/k_means_.py>:1161: RuntimeWarning: init_size=3 should be larger than k=8. Setting it to 3*k
 init_size=init_size)
[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.0s finished
<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/qda.py>:158: RuntimeWarning: divide by zero encountered in log
 + np.log(self.priors_))
make: *** [test-coverage] Error 1
Build step 'Custom Python Builder' marked build as failure
Archiving artifacts
Skipping Cobertura coverage report as build was not UNSTABLE or better ...

@arjoly
Owner

I'm looking for the bug.

@amueller
Owner

Did you see the Jenkins failure?

@arjoly
Owner

Yes, see the PR.
I will submit a patch soon.

@arjoly
Owner

Oddly, with Python 2.6 and NumPy 1.3, I got:

In [1]: import numpy as np

In [2]: from __future__ import division

In [3]: a = np.array([2, 0, 0])

In [4]: b = np.array([4, 1, 0])

In [5]: a / b
Out[5]: array([ 0.5,  0. ,  0. ])

instead of

Out[5]: array([ 0.5,  0. ,  nan ])

patch is coming.
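
One way to sidestep the old-NumPy division quirk (a sketch only, not necessarily the patch that was actually pushed) is to cast to float before dividing, so that 0 / 0 reliably produces nan:

    import numpy as np

    a = np.array([2, 0, 0])
    b = np.array([4, 1, 0])

    old_err_settings = np.seterr(divide='ignore', invalid='ignore')
    try:
        # Casting to float first makes 0.0 / 0.0 a genuine nan on every
        # NumPy version, so the nan -> 1.0 convention of the metric still works.
        score = a.astype(np.float64) / b.astype(np.float64)
        score[np.isnan(score)] = 1.0   # lim_{x->0} x/x = 1
    finally:
        np.seterr(**old_err_settings)

    # score is now array([ 0.5,  0. ,  1. ])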

@arjoly
Owner

For reference, this is a new step towards solving #558.

Commits on May 6, 2013
  1. @arjoly

    ENH more pythonic way to treat list of list of labels

arjoly authored and committed
  2. @arjoly

    ENH add jaccard similarity score metrics

arjoly authored and committed
1  doc/modules/classes.rst
@@ -691,6 +691,7 @@ Classification metrics
metrics.fbeta_score
metrics.hamming_loss
metrics.hinge_loss
+ metrics.jaccard_similarity_score
metrics.matthews_corrcoef
metrics.precision_recall_curve
metrics.precision_recall_fscore_support
85 doc/modules/model_evaluation.rst
@@ -58,9 +58,10 @@ And some also work in the multilabel case:
.. autosummary::
:template: function.rst
- accuracy_score
- hamming_loss
- zero_one_loss
+ accuracy_score
+ hamming_loss
+ jaccard_similarity_score
+ zero_one_loss
Some metrics might require probability estimates of the positive class,
@@ -72,11 +73,11 @@ Accuracy score
---------------
The :func:`accuracy_score` function computes the
`accuracy <http://en.wikipedia.org/wiki/Accuracy_and_precision>`_, the fraction
-(default) or the number of correct predictions. In multilabel classification,
-the function returns the subset accuracy:
-the entire set of labels for a sample must be entirely correct
-or the sample has an accuracy of zero.
-(See the Hamming loss for a more forgiving evaluation metric.)
+(default) or the number of correct predictions.
+
+In multilabel classification, the function returns the subset accuracy: if
+the entire set of predicted labels for a sample strictly match with the true
+set of labels, then the subset accuracy is 1.0, otherwise it is 0.0.
If :math:`\hat{y}_i` is the predicted value of
the :math:`i`-th sample and :math:`y_i` is the corresponding true value,
@@ -99,15 +100,15 @@ where :math:`1(x)` is the `indicator function
>>> accuracy_score(y_true, y_pred, normalize=False)
2
- In the multilabel case with binary indicator format:
+In the multilabel case with binary indicator format:
- >>> accuracy_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.zeros((2, 2)))
- 0.0
+ >>> accuracy_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.ones((2, 2)))
+ 0.5
- and with a list of labels format:
+and with a list of labels format:
- >>> accuracy_score([(1, 2), (3,)], [(1, 2), tuple()])
- 0.5
+ >>> accuracy_score([(1,), (3,)], [(1, 2), tuple()])
+ 0.0
.. topic:: Example:
@@ -283,6 +284,48 @@ and with a list of labels format:
over samples, the Hamming loss is always between zero and one.
+Jaccard similarity coefficient score
+------------------------------------
+
+The :func:`jaccard_similarity_score` function computes the average (default)
+or sum of `Jaccard similarity coefficients
+<http://en.wikipedia.org/wiki/Jaccard_index>`_, also called Jaccard index,
+between pairs of label sets.
+
+The Jaccard similarity coefficient of the :math:`i`-th samples
+with a ground truth label set :math:`y_i` and a predicted label set
+:math:`\hat{y}_i` is defined as
+
+.. math::
+
+ J(y_i, \hat{y}_i) = \frac{|y_i \cap \hat{y}_i|}{|y_i \cup \hat{y}_i|}.
+
+In binary and multiclass classification, the Jaccard similarity coefficient
+score is equal to the classification accuracy.
+
+::
+
+ >>> import numpy as np
+ >>> from sklearn.metrics import jaccard_similarity_score
+ >>> y_pred = [0, 2, 1, 3]
+ >>> y_true = [0, 1, 2, 3]
+ >>> jaccard_similarity_score(y_true, y_pred)
+ 0.5
+ >>> jaccard_similarity_score(y_true, y_pred, normalize=False)
+ 2
+
+In the multilabel case with binary indicator format:
+
+ >>> jaccard_similarity_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.ones((2, 2)))
+ 0.75
+
+and with a list of labels format:
+
+ >>> jaccard_similarity_score([(1,), (3,)], [(1, 2), tuple()])
+ 0.25
+
+
+
.. _precision_recall_f_measure_metrics:
Precision, recall and F-measures
@@ -690,6 +733,7 @@ then the 0-1 loss :math:`L_{0-1}` is defined as:
where :math:`1(x)` is the `indicator function
<http://en.wikipedia.org/wiki/Indicator_function>`_.
+
>>> from sklearn.metrics import zero_one_loss
>>> y_pred = [1, 2, 3, 4]
>>> y_true = [2, 2, 3, 4]
@@ -698,15 +742,16 @@ where :math:`1(x)` is the `indicator function
>>> zero_one_loss(y_true, y_pred, normalize=False)
1
- In the multilabel case with binary indicator format:
+In the multilabel case with binary indicator format:
- >>> zero_one_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.zeros((2, 2)))
- 1.0
+ >>> zero_one_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.ones((2, 2)))
+ 0.5
- and with a list of labels format:
+and with a list of labels format:
+
+ >>> zero_one_loss([(1,), (3,)], [(1, 2), tuple()])
+ 1.0
- >>> zero_one_loss([(1, 2), (3,)], [(1, 2), tuple()])
- 0.5
.. topic:: Example:
5 doc/whats_new.rst
@@ -27,8 +27,9 @@ Changelog
guide for details and examples.
- :func:`metrics.accuracy_score`, :func:`metrics.zero_one_loss` support
- multi-label classification and a new metric :func:`metrics.hamming_loss`
- is added with multi-label support by `Arnaud Joly`_.
+ multi-label classification and two new metrics :func:`metrics.hamming_loss`
+ and :func:`metrics.jaccard_similarity_score`
+ are added with multi-label support by `Arnaud Joly`_.
- Speed and memory usage improvements in
:class:`feature_extraction.text.CountVectorizer` and
4 sklearn/metrics/__init__.py
@@ -11,9 +11,10 @@
confusion_matrix,
explained_variance_score,
f1_score,
- hamming_loss,
fbeta_score,
+ hamming_loss,
hinge_loss,
+ jaccard_similarity_score,
matthews_corrcoef,
mean_squared_error,
mean_absolute_error,
@@ -66,6 +67,7 @@
'hinge_loss',
'homogeneity_completeness_v_measure',
'homogeneity_score',
+ 'jaccard_similarity_score',
'matthews_corrcoef',
'mean_squared_error',
'mean_absolute_error',
221 sklearn/metrics/metrics.py
@@ -90,7 +90,7 @@ def _check_1d_array(y1, y2, ravel=False):
It convert 1d arrays (y1 and y2) of various shape to a common shape
representation. Note that ``y1`` and ``y2`` should have the same number of
- element.
+ elements.
Parameters
----------
@@ -299,7 +299,7 @@ def average_precision_score(y_true, y_score):
References
----------
.. [1] `Wikipedia entry for the Average precision
- <http://en.wikipedia.org/wiki/Information_retrieval#Average_precision>`_
+ <http://en.wikipedia.org/wiki/Information_retrieval#Average_precision>`_
See also
--------
@@ -721,7 +721,7 @@ def confusion_matrix(y_true, y_pred, labels=None):
References
----------
- .. [2] `Wikipedia entry for the Confusion matrix
+ .. [1] `Wikipedia entry for the Confusion matrix
<http://en.wikipedia.org/wiki/Confusion_matrix>`_
Examples
@@ -797,8 +797,7 @@ def zero_one_loss(y_true, y_pred, normalize=True):
See also
--------
- accuracy_score : Compute the accuracy score
- hamming_loss : Compute the average Hamming loss
+ accuracy_score, hamming_loss, jaccard_similarity_score
Examples
--------
@@ -810,17 +809,21 @@ def zero_one_loss(y_true, y_pred, normalize=True):
>>> zero_one_loss(y_true, y_pred, normalize=False)
1
- In the multilabel case with binary indicator format
- >>> zero_one_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.zeros((2, 2)))
- 1.0
+ In the multilabel case with binary indicator format:
- and with a list of labels format
- >>> zero_one_loss([(1, 2), (3,)], [(1, 2), tuple()])
+ >>> zero_one_loss(np.array([[0.0, 1.0], [1.0, 1.0]]), np.ones((2, 2)))
0.5
+ and with a list of labels format:
+
+ >>> zero_one_loss([(1,), (3,)], [(1, 2), tuple()])
+ 1.0
+
+
"""
y_true, y_pred = check_arrays(y_true, y_pred, allow_lists=True)
- score = accuracy_score(y_true, y_pred, normalize=normalize)
+ score = accuracy_score(y_true, y_pred,
+ normalize=normalize)
if normalize:
return 1 - score
@@ -880,9 +883,149 @@ def zero_one(y_true, y_pred, normalize=False):
###############################################################################
# Multiclass score functions
###############################################################################
+
+def jaccard_similarity_score(y_true, y_pred, normalize=True, pos_label=1):
+ """Jaccard similarity coefficient score
+
+ The Jaccard index [1], or Jaccard similarity coefficient, defined as
+ the size of the intersection divided by the size of the union of two label
+ sets, is used to compare set of predicted labels for a sample to the
+ corresponding set of labels in ``y_true``.
+
+ Parameters
+ ----------
+ y_true : array-like or list of labels or label indicator matrix
+ Ground truth (correct) labels.
+
+ y_pred : array-like or list of labels or label indicator matrix
+ Predicted labels, as returned by a classifier.
+
+ normalize : bool, optional (default=True)
+ If ``False``, return the sum of the Jaccard similarity coefficient
+ over the sample set. Otherwise, return the average of Jaccard
+ similarity coefficient.
+
+ pos_label : int, 1 by default
+ It is used to infer what is a positive label in the label indicator
+ matrix format.
+
+ Returns
+ -------
+ score : float
+ If ``normalize == True``, return the average Jaccard similarity
+ coefficient, else it returns the sum of the Jaccard similarity
+ coefficient over the sample set.
+
+ The best performance is 1 with ``normalize == True`` and the number
+ of samples with ``normalize == False``.
+
+ See also
+ --------
+ accuracy_score, hamming_loss, zero_one_loss
+
+ Notes
+ -----
+ In binary and multiclass classification, this function is equivalent
+ to the ``accuracy_score``. It differs in the multilabel classification
+ problem.
+
+ References
+ ----------
+ .. [1] `Wikipedia entry for the Jaccard index
+ <http://en.wikipedia.org/wiki/Jaccard_index>`_
+
+
+ Examples
+ --------
+ >>> import numpy as np
+ >>> from sklearn.metrics import jaccard_similarity_score
+ >>> y_pred = [0, 2, 1, 3]
+ >>> y_true = [0, 1, 2, 3]
+ >>> jaccard_similarity_score(y_true, y_pred)
+ 0.5
+ >>> jaccard_similarity_score(y_true, y_pred, normalize=False)
+ 2
+
+ In the multilabel case with binary indicator format:
+
+ >>> jaccard_similarity_score(np.array([[0.0, 1.0], [1.0, 1.0]]),\
+ np.ones((2, 2)))
+ 0.75
+
+ and with a list of labels format:
+
+ >>> jaccard_similarity_score([(1,), (3,)], [(1, 2), tuple()])
+ 0.25
+
+ """
+ y_true, y_pred = check_arrays(y_true, y_pred, allow_lists=True)
+
+ # Compute accuracy for each possible representation
+ if is_multilabel(y_true):
+
+ # Handle mix representation
+ if type(y_true) != type(y_pred):
+ labels = unique_labels(y_true, y_pred)
+ lb = LabelBinarizer()
+ lb.fit([labels.tolist()])
+ y_true = lb.transform(y_true)
+ y_pred = lb.transform(y_pred)
+
+ if is_label_indicator_matrix(y_true):
+ try:
+ # oddly, we may get an "invalid" rather than a "divide"
+ # error here
+ old_err_settings = np.seterr(divide='ignore',
+ invalid='ignore')
+ y_pred_pos_label = y_pred == pos_label
+ y_true_pos_label = y_true == pos_label
+ score = (np.sum(np.logical_and(y_pred_pos_label,
+ y_true_pos_label),
+ axis=1) /
+ np.sum(np.logical_or(y_pred_pos_label,
+ y_true_pos_label),
+ axis=1))
+
+ # If there is no label, it results in a Nan instead, we set
+ # the jaccard to 1: lim_{x->0} x/x = 1
+ score[np.isnan(score)] = 1.0
+ finally:
+ np.seterr(**old_err_settings)
+
+ else:
+ score = np.empty(len(y_true))
+ for i, (true, pred) in enumerate(zip(y_pred, y_true)):
+ true_set = set(true)
+ pred_set = set(pred)
+ size_true_union_pred = len(true_set | pred_set)
+ # If there is no label, it results in a Nan instead, we set
+ # the jaccard to 1: lim_{x->0} x/x = 1
+ if size_true_union_pred == 0:
+ score[i] = 1.
+ else:
+ score[i] = (len(true_set & pred_set) /
+ size_true_union_pred)
+
+ else:
+ y_true, y_pred = check_arrays(y_true, y_pred)
+
+ # Handle mix shape
+ y_true, y_pred = _check_1d_array(y_true, y_pred, ravel=True)
+ score = y_true == y_pred
+
+ if normalize:
+ return np.mean(score)
+ else:
+ return np.sum(score)
+
+
def accuracy_score(y_true, y_pred, normalize=True):
"""Accuracy classification score.
+ In multilabel classification, this function computes subset accuracy:
+ the set of labels predicted for a sample must *exactly* match the
+ corresponding set of labels in y_true.
+
Parameters
----------
y_true : array-like or list of labels or label indicator matrix
@@ -898,18 +1041,21 @@ def accuracy_score(y_true, y_pred, normalize=True):
Returns
-------
score : float
- The fraction of correct predictions in ``y_pred``.
- The best performance is 1.
+ If ``normalize == True``, return the correctly classified samples
+ (float), else it returns the number of correctly classified samples
+ (int).
+
+ The best performance is 1 with ``normalize == True`` and the number
+ of samples with ``normalize == False``.
See also
--------
- zero_one_loss : zero-one classification loss
+ jaccard_similarity_score, hamming_loss, zero_one_loss
Notes
-----
- In multilabel classification, this function computes subset accuracy:
- the set of labels predicted for a sample must *exactly* match the
- corresponding set of labels in y_true.
+ In binary and multiclass classification, this function is equal
+ to the ``jaccard_similarity_score`` function.
Examples
--------
@@ -924,18 +1070,20 @@ def accuracy_score(y_true, y_pred, normalize=True):
In the multilabel case with binary indicator format:
- >>> accuracy_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.zeros((2, 2)))
- 0.0
+ >>> accuracy_score(np.array([[0.0, 1.0], [1.0, 1.0]]), np.ones((2, 2)))
+ 0.5
and with a list of labels format:
- >>> accuracy_score([(1, 2), (3,)], [(1, 2), tuple()])
- 0.5
+
+ >>> accuracy_score([(1,), (3,)], [(1, 2), tuple()])
+ 0.0
"""
y_true, y_pred = check_arrays(y_true, y_pred, allow_lists=True)
# Compute accuracy for each possible representation
if is_multilabel(y_true):
+
# Handle mix representation
if type(y_true) != type(y_pred):
labels = unique_labels(y_true, y_pred)
@@ -947,13 +1095,8 @@ def accuracy_score(y_true, y_pred, normalize=True):
if is_label_indicator_matrix(y_true):
score = (y_pred != y_true).sum(axis=1) == 0
else:
- # numpy 1.3 : it is required to perform a unique before setxor1d
- # to get unique label in numpy 1.3.
- # This is needed in order to handle redundant labels.
- # FIXME : check if this can be simplified when 1.3 is removed
- score = np.array([np.size(np.setxor1d(np.unique(pred),
- np.unique(true))) == 0
- for pred, true in zip(y_pred, y_true)])
+ score = np.array([len(set(true) ^ set(pred)) == 0
+ for pred, true in zip(y_pred, y_true)])
else:
y_true, y_pred = check_arrays(y_true, y_pred)
@@ -1644,21 +1787,22 @@ def hamming_loss(y_true, y_pred, classes=None):
See Also
--------
- zero_one_loss : Zero-one classification loss
+ accuracy_score, jaccard_similarity_score, zero_one_loss
Notes
-----
In multiclass classification, the Hamming loss correspond to the Hamming
distance between ``y_true`` and ``y_pred`` which is equivalent to the
- ``zero_one_loss`` function.
+ subset ``zero_one_loss`` function.
In multilabel classification, the Hamming loss is different from the
- zero-one loss. The zero-one loss considers the entire set of labels for a
- given sample incorrect if it does entirely match the true set of labels.
- Hamming loss is more forgiving in that it penalizes the individual labels.
+ subset zero-one loss. The zero-one loss considers the entire set of labels
+ for a given sample incorrect if it does entirely match the true set of
+ labels. Hamming loss is more forgiving in that it penalizes the individual
+ labels.
- The Hamming loss is upperbounded by the zero-one loss. When normalized
- over samples, the Hamming loss is always between 0 and 1.
+ The Hamming loss is upperbounded by the subset zero-one loss. When
+ normalized over samples, the Hamming loss is always between 0 and 1.
References
----------
@@ -1706,12 +1850,7 @@ def hamming_loss(y_true, y_pred, classes=None):
if is_label_indicator_matrix(y_true):
return np.mean(y_true != y_pred)
else:
- # numpy 1.3 : it is required to perform a unique before setxor1d
- # to get unique label in numpy 1.3.
- # This is needed in order to handle redundant labels.
- # FIXME : check if this can be simplified when 1.3 is removed
- loss = np.array([np.size(np.setxor1d(np.unique(pred),
- np.unique(true)))
+ loss = np.array([len(set(pred) ^ set(true))
for pred, true in zip(y_pred, y_true)])
return np.mean(loss) / np.size(classes)
356 sklearn/metrics/tests/test_metrics.py
@@ -17,7 +17,8 @@
assert_almost_equal,
assert_not_equal,
assert_array_equal,
- assert_array_almost_equal)
+ assert_array_almost_equal,
+ assert_greater)
from sklearn.metrics import (accuracy_score,
@@ -31,6 +32,7 @@
fbeta_score,
hamming_loss,
hinge_loss,
+ jaccard_similarity_score,
matthews_corrcoef,
mean_squared_error,
mean_absolute_error,
@@ -45,21 +47,86 @@
zero_one_loss)
from sklearn.externals.six.moves import xrange
-ALL_METRICS = [accuracy_score,
- lambda y1, y2: accuracy_score(y1, y2, normalize=False),
- hamming_loss,
- zero_one_loss,
- lambda y1, y2: zero_one_loss(y1, y2, normalize=False),
- precision_score,
- recall_score,
- f1_score,
- lambda y1, y2: fbeta_score(y1, y2, beta=2),
- lambda y1, y2: fbeta_score(y1, y2, beta=0.5),
- matthews_corrcoef,
- mean_absolute_error,
- mean_squared_error,
- explained_variance_score,
- r2_score]
+ALL_METRICS = {
+ "accuracy_score": accuracy_score,
+ "unormalized_accuracy_score":
+ lambda y1, y2: accuracy_score(y1, y2, normalize=False),
+
+ "hamming_loss": hamming_loss,
+
+ "jaccard_similarity_score": jaccard_similarity_score,
+ "unormalized_jaccard_similarity_score":
+ lambda y1, y2: jaccard_similarity_score(y1, y2, normalize=False),
+
+ "zero_one_loss": zero_one_loss,
+ "unnormalized_zero_one_loss":
+ lambda y1, y2: zero_one_loss(y1, y2, normalize=False),
+
+ "precision_score": precision_score,
+ "recall_score": recall_score,
+ "f1_score": f1_score,
+ "f2_score": lambda y1, y2: fbeta_score(y1, y2, beta=2),
+ "f0.5_score": lambda y1, y2: fbeta_score(y1, y2, beta=0.5),
+ "matthews_corrcoef_score": matthews_corrcoef,
+
+ "mean_absolute_error": mean_absolute_error,
+ "mean_squared_error": mean_squared_error,
+ "explained_variance_score": explained_variance_score,
+ "r2_score": r2_score}
+
+METRICS_WITH_NORMALIZE_OPTION = {
+ "accuracy_score ": lambda y1, y2, normalize:
+ accuracy_score(y1, y2, normalize=normalize),
+ "jaccard_similarity_score": lambda y1, y2, normalize:
+ jaccard_similarity_score(y1, y2, normalize=normalize),
+ "zero_one_loss": lambda y1, y2, normalize:
+ zero_one_loss(y1, y2, normalize=normalize),
+}
+
+MULTILABELS_METRICS = {
+ "accuracy_score": accuracy_score,
+ "unormalized_accuracy_score":
+ lambda y1, y2: accuracy_score(y1, y2, normalize=False),
+
+ "hamming_loss": hamming_loss,
+
+ "jaccard_similarity_score": jaccard_similarity_score,
+ "unormalized_jaccard_similarity_score":
+ lambda y1, y2: jaccard_similarity_score(y1, y2, normalize=False),
+
+ "zero_one_loss": zero_one_loss,
+ "unnormalized_zero_one_loss":
+ lambda y1, y2: zero_one_loss(y1, y2, normalize=False),
+
+}
+
+SYMETRIC_METRICS = {
+ "accuracy_score": accuracy_score,
+ "unormalized_accuracy_score":
+ lambda y1, y2: accuracy_score(y1, y2, normalize=False),
+
+ "hamming_loss": hamming_loss,
+
+ "jaccard_similarity_score": jaccard_similarity_score,
+ "unormalized_jaccard_similarity_score":
+ lambda y1, y2: jaccard_similarity_score(y1, y2, normalize=False),
+
+ "zero_one_loss": zero_one_loss,
+ "unnormalized_zero_one_loss":
+ lambda y1, y2: zero_one_loss(y1, y2, normalize=False),
+
+ "f1_score": f1_score,
+ "matthews_corrcoef_score": matthews_corrcoef,
+ "mean_absolute_error": mean_absolute_error,
+ "mean_squared_error": mean_squared_error}
+
+NOT_SYMETRIC_METRICS = {
+ "precision_score": precision_score,
+ "recall_score": recall_score,
+ "f2_score": lambda y1, y2: fbeta_score(y1, y2, beta=2),
+ "f0.5_score": lambda y1, y2: fbeta_score(y1, y2, beta=0.5),
+ "explained_variance_score": explained_variance_score,
+ "r2_score": r2_score}
def make_prediction(dataset=None, binary=False):
@@ -585,25 +652,19 @@ def test_losses():
with warnings.catch_warnings(record=True):
# Throw deprecated warning
assert_equal(zero_one(y_true, y_pred), 11)
- assert_almost_equal(zero_one(y_true, y_pred, normalize=True),
- 11 / float(n_samples), 2)
assert_almost_equal(zero_one_loss(y_true, y_pred),
11 / float(n_samples), 2)
assert_equal(zero_one_loss(y_true, y_pred, normalize=False), 11)
- assert_almost_equal(zero_one_loss(y_true, y_true), 0.0, 2)
- assert_almost_equal(zero_one_loss(y_true, y_true, normalize=False), 0, 2)
+ assert_almost_equal(zero_one_loss(y_true, y_true), 0.0, 2)
assert_almost_equal(hamming_loss(y_true, y_pred),
2 * 11. / (n_samples * n_classes), 2)
assert_equal(accuracy_score(y_true, y_pred),
1 - zero_one_loss(y_true, y_pred))
- assert_equal(accuracy_score(y_true, y_pred, normalize=False),
- n_samples - zero_one_loss(y_true, y_pred, normalize=False))
-
- with warnings.catch_warnings(record=True):
+ with warnings.catch_warnings(True):
# Throw deprecated warning
assert_equal(zero_one_score(y_true, y_pred),
1 - zero_one_loss(y_true, y_pred))
@@ -648,31 +709,23 @@ def test_symmetry():
"""Test the symmetry of score and loss functions"""
y_true, y_pred, _ = make_prediction(binary=True)
- # Symmetric metric
- for metric in [accuracy_score,
- lambda y1, y2: accuracy_score(y1, y2, normalize=False),
- zero_one_loss,
- lambda y1, y2: zero_one_loss(y1, y2, normalize=False),
- hamming_loss,
- f1_score,
- matthews_corrcoef,
- mean_squared_error,
- mean_absolute_error]:
+ # We shouldn't forget any metrics
+ assert_equal(set(SYMETRIC_METRICS).union(set(NOT_SYMETRIC_METRICS)),
+ set(ALL_METRICS))
+
+ assert_equal(set(SYMETRIC_METRICS).intersection(set(NOT_SYMETRIC_METRICS)),
+ set([]))
+ # Symmetric metric
+ for name, metric in SYMETRIC_METRICS.items():
assert_equal(metric(y_true, y_pred),
metric(y_pred, y_true),
- msg="%s is not symetric" % metric)
+ msg="%s is not symetric" % name)
# Not symmetric metrics
- for metric in [precision_score,
- recall_score,
- lambda y1, y2: fbeta_score(y1, y2, beta=0.5),
- lambda y1, y2: fbeta_score(y1, y2, beta=2),
- explained_variance_score,
- r2_score]:
-
+ for name, metric in NOT_SYMETRIC_METRICS.items():
assert_true(metric(y_true, y_pred) != metric(y_pred, y_true),
- msg="%s seems to be symetric" % metric)
+ msg="%s seems to be symetric" % name)
# Deprecated metrics
with warnings.catch_warnings(record=True):
@@ -693,7 +746,7 @@ def test_sample_order_invariance():
y_true_shuffle, y_pred_shuffle = shuffle(y_true, y_pred,
random_state=0)
- for metric in ALL_METRICS:
+ for name, metric in ALL_METRICS.items():
assert_almost_equal(metric(y_true, y_pred),
metric(y_true_shuffle, y_pred_shuffle),
@@ -715,7 +768,7 @@ def test_format_invariance_with_1d_vectors():
y1_row = np.reshape(y1_1d, (1, -1))
y2_row = np.reshape(y2_1d, (1, -1))
- for metric in ALL_METRICS:
+ for name, metric in ALL_METRICS.items():
measure = metric(y1, y2)
@@ -851,11 +904,6 @@ def test_multioutput_regression_invariance_to_dimension_shuffling():
def test_multilabel_representation_invariance():
-
- MULTILABELS_METRICS = [hamming_loss,
- zero_one_loss,
- accuracy_score]
-
# Generate some data
n_classes = 4
n_samples = 50
@@ -887,7 +935,7 @@ def test_multilabel_representation_invariance():
y1_shuffle_binary_indicator = lb.transform(y1_shuffle)
y2_shuffle_binary_indicator = lb.transform(y2_shuffle)
- for metric in MULTILABELS_METRICS:
+ for name, metric in MULTILABELS_METRICS.items():
measure = metric(y1, y2)
# Check representation invariance
@@ -896,30 +944,30 @@ def test_multilabel_representation_invariance():
err_msg="%s failed representation invariance "
"between list of list of labels format "
"and dense binary indicator format."
- % metric)
+ % name)
# Check invariance with redundant labels with list of labels
assert_almost_equal(measure,
metric(y1, y2_redundant),
err_msg="%s failed rendundant label invariance"
- % metric)
+ % name)
assert_almost_equal(measure,
metric(y1_redundant, y2_redundant),
err_msg="%s failed rendundant label invariance"
- % metric)
+ % name)
assert_almost_equal(measure,
metric(y1_redundant, y2),
err_msg="%s failed rendundant label invariance"
- % metric)
+ % name)
# Check shuffling invariance with list of labels
assert_almost_equal(measure,
metric(y1_shuffle, y2_shuffle),
err_msg="%s failed shuffling invariance "
"with list of list of labels format."
- % metric)
+ % name)
# Check shuffling invariance with dense binary indicator matrix
assert_almost_equal(measure,
@@ -927,7 +975,7 @@ def test_multilabel_representation_invariance():
y2_shuffle_binary_indicator),
err_msg="%s failed shuffling invariance "
" with dense binary indicator format."
- % metric)
+ % name)
# Check invariance with mix input representation
assert_almost_equal(measure,
@@ -937,7 +985,7 @@ def test_multilabel_representation_invariance():
"invariance: y_true in list of list of "
"labels format and y_pred in dense binary"
"indicator format"
- % metric)
+ % name)
assert_almost_equal(measure,
metric(y1_binary_indicator,
@@ -946,15 +994,15 @@ def test_multilabel_representation_invariance():
"invariance: y_true in dense binary "
"indicator format and y_pred in list of "
"list of labels format."
- % metric)
+ % name)
-def test_multilabel_zero_one_loss():
+def test_multilabel_zero_one_loss_subset():
# Dense label indicator matrix format
- y1 = np.array([[0.0, 1.0, 1.0],
- [1.0, 0.0, 1.0]])
- y2 = np.array([[0.0, 0.0, 1.0],
- [1.0, 0.0, 1.0]])
+ y1 = np.array([[0, 1, 1],
+ [1, 0, 1]])
+ y2 = np.array([[0, 0, 1],
+ [1, 0, 1]])
assert_equal(0.5, zero_one_loss(y1, y2))
assert_equal(0.0, zero_one_loss(y1, y1))
@@ -964,20 +1012,9 @@ def test_multilabel_zero_one_loss():
assert_equal(1.0, zero_one_loss(y1, np.zeros(y1.shape)))
assert_equal(1.0, zero_one_loss(y2, np.zeros(y1.shape)))
- assert_equal(1, zero_one_loss(y1, y2, normalize=False))
- assert_equal(0, zero_one_loss(y1, y1, normalize=False))
- assert_equal(0, zero_one_loss(y2, y2, normalize=False))
- assert_equal(2, zero_one_loss(y2, np.logical_not(y2), normalize=False))
- assert_equal(2, zero_one_loss(y1, np.logical_not(y1), normalize=False))
- assert_equal(2, zero_one_loss(y1, np.zeros(y1.shape), normalize=False))
- assert_equal(2, zero_one_loss(y2, np.zeros(y1.shape), normalize=False))
-
# List of tuple of label
- y1 = [(1, 2,),
- (0, 2,)]
-
- y2 = [(2,),
- (0, 2,)]
+ y1 = [(1, 2,), (0, 2,)]
+ y2 = [(2,), (0, 2,)]
assert_equal(0.5, zero_one_loss(y1, y2))
assert_equal(0.0, zero_one_loss(y1, y1))
@@ -985,19 +1022,13 @@ def test_multilabel_zero_one_loss():
assert_equal(1.0, zero_one_loss(y2, [(), ()]))
assert_equal(1.0, zero_one_loss(y2, [tuple(), (10, )]))
- assert_equal(1, zero_one_loss(y1, y2, normalize=False))
- assert_equal(0, zero_one_loss(y1, y1, normalize=False))
- assert_equal(0, zero_one_loss(y2, y2, normalize=False))
- assert_equal(2, zero_one_loss(y2, [(), ()], normalize=False))
- assert_equal(2, zero_one_loss(y2, [tuple(), (10, )], normalize=False))
-
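The subset zero-one arithmetic asserted here is easy to reproduce by hand. A minimal sketch, not part of the patch, for the list-of-tuples example above:

y_true = [{1, 2}, {0, 2}]
y_pred = [{2}, {0, 2}]
errors = sum(t != p for t, p in zip(y_true, y_pred))  # first sample differs -> 1
print(errors / float(len(y_true)))                    # 0.5, matching the assert above

A sample counts as an error unless its whole label set is predicted exactly, which is why the second sample contributes nothing.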
def test_multilabel_hamming_loss():
# Dense label indicator matrix format
- y1 = np.array([[0.0, 1.0, 1.0],
- [1.0, 0.0, 1.0]])
- y2 = np.array([[0.0, 0.0, 1.0],
- [1.0, 0.0, 1.0]])
+ y1 = np.array([[0, 1, 1],
+ [1, 0, 1]])
+ y2 = np.array([[0, 0, 1],
+ [1, 0, 1]])
assert_equal(1 / 6., hamming_loss(y1, y2))
assert_equal(0.0, hamming_loss(y1, y1))
@@ -1008,11 +1039,9 @@ def test_multilabel_hamming_loss():
assert_equal(0.5, hamming_loss(y2, np.zeros(y1.shape)))
# List of tuple of label
- y1 = [(1, 2,),
- (0, 2,)]
+ y1 = [(1, 2,), (0, 2,)]
- y2 = [(2,),
- (0, 2,)]
+ y2 = [(2,), (0, 2,)]
assert_equal(1 / 6., hamming_loss(y1, y2))
assert_equal(0.0, hamming_loss(y1, y1))
@@ -1023,12 +1052,12 @@ def test_multilabel_hamming_loss():
classes=np.arange(11)), 2)
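For reference, a minimal sketch (not part of the patch) of the Hamming loss value asserted above: it is the fraction of individual label assignments that disagree.

import numpy as np

y1 = np.array([[0, 1, 1], [1, 0, 1]])
y2 = np.array([[0, 0, 1], [1, 0, 1]])
print(np.mean(y1 != y2))  # 1 differing entry out of 6 -> 0.1666...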
-def test_multilabel_accuracy_score():
+def test_multilabel_accuracy_score_subset_accuracy():
# Dense label indicator matrix format
- y1 = np.array([[0.0, 1.0, 1.0],
- [1.0, 0.0, 1.0]])
- y2 = np.array([[0.0, 0.0, 1.0],
- [1.0, 0.0, 1.0]])
+ y1 = np.array([[0, 1, 1],
+ [1, 0, 1]])
+ y2 = np.array([[0, 0, 1],
+ [1, 0, 1]])
assert_equal(0.5, accuracy_score(y1, y2))
assert_equal(1.0, accuracy_score(y1, y1))
@@ -1038,27 +1067,128 @@ def test_multilabel_accuracy_score():
assert_equal(0.0, accuracy_score(y1, np.zeros(y1.shape)))
assert_equal(0.0, accuracy_score(y2, np.zeros(y1.shape)))
- assert_equal(1, accuracy_score(y1, y2, normalize=False))
- assert_equal(2, accuracy_score(y1, y1, normalize=False))
- assert_equal(2, accuracy_score(y2, y2, normalize=False))
- assert_equal(0, accuracy_score(y2, np.logical_not(y2), normalize=False))
- assert_equal(0, accuracy_score(y1, np.logical_not(y1), normalize=False))
- assert_equal(0, accuracy_score(y1, np.zeros(y1.shape), normalize=False))
- assert_equal(0, accuracy_score(y2, np.zeros(y1.shape), normalize=False))
-
# List of tuple of label
- y1 = [(1, 2,),
- (0, 2,)]
-
- y2 = [(2,),
- (0, 2,)]
+ y1 = [(1, 2,), (0, 2,)]
+ y2 = [(2,), (0, 2,)]
assert_equal(0.5, accuracy_score(y1, y2))
assert_equal(1.0, accuracy_score(y1, y1))
assert_equal(1.0, accuracy_score(y2, y2))
assert_equal(0.0, accuracy_score(y2, [(), ()]))
- assert_equal(1, accuracy_score(y1, y2, normalize=False))
- assert_equal(2, accuracy_score(y1, y1, normalize=False))
- assert_equal(2, accuracy_score(y2, y2, normalize=False))
- assert_equal(0, accuracy_score(y2, [(), ()], normalize=False))
+
+def test_multilabel_jaccard_similarity_score():
+ # Dense label indicator matrix format
+ y1 = np.array([[0.0, 1.0, 1.0],
+ [1.0, 0.0, 1.0]])
+ y2 = np.array([[0.0, 0.0, 1.0],
+ [1.0, 0.0, 1.0]])
+
+ # size(y1 \inter y2) = [1, 2]
+ # size(y1 \union y2) = [2, 2]
+
+ assert_equal(0.75, jaccard_similarity_score(y1, y2))
+ assert_equal(1.0, jaccard_similarity_score(y1, y1))
+
+ assert_equal(1.0, jaccard_similarity_score(y2, y2))
+ assert_equal(0.0, jaccard_similarity_score(y2, np.logical_not(y2)))
+ assert_equal(0.0, jaccard_similarity_score(y1, np.logical_not(y1)))
+ assert_equal(0.0, jaccard_similarity_score(y1, np.zeros(y1.shape)))
+ assert_equal(0.0, jaccard_similarity_score(y2, np.zeros(y1.shape)))
+
+ # With a given pos_label
+ assert_equal(0.75, jaccard_similarity_score(y1, y2, pos_label=0))
+ assert_equal(0.5, jaccard_similarity_score(y2, np.zeros(y1.shape),
+ pos_label=0))
+ assert_equal(1, jaccard_similarity_score(y1, y2, pos_label=10))
+
+ # List of tuple of label
+ y1 = [(1, 2,), (0, 2,)]
+ y2 = [(2,), (0, 2,)]
+
+ assert_equal(0.75, jaccard_similarity_score(y1, y2))
+ assert_equal(1.0, jaccard_similarity_score(y1, y1))
+ assert_equal(1.0, jaccard_similarity_score(y2, y2))
+ assert_equal(0.0, jaccard_similarity_score(y2, [(), ()]))
+
+ # |y3 inter y4 | = [0, 0, 0]
+ # |y3 union y4 | = [2, 2, 3]
+ y3 = [(0,), (1,), (3,)]
+ y4 = [(4,), (4,), (5, 6)]
+ assert_almost_equal(0, jaccard_similarity_score(y3, y4))
+
+ # |y5 inter y6 | = [0, 1, 1]
+ # |y5 union y6 | = [2, 1, 3]
+ y5 = [(0,), (1,), (2, 3)]
+ y6 = [(1,), (1,), (2, 0)]
+
+ assert_almost_equal((1 + 1. / 3) / 3, jaccard_similarity_score(y5, y6))
+
+
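A minimal sketch, not part of the patch, spelling out the per-sample Jaccard arithmetic that the comments above summarise:

import numpy as np

# dense indicator case: |intersection| = [1, 2], |union| = [2, 2]
y1 = np.array([[0, 1, 1], [1, 0, 1]])
y2 = np.array([[0, 0, 1], [1, 0, 1]])
inter = np.logical_and(y1, y2).sum(axis=1)
union = np.logical_or(y1, y2).sum(axis=1)
print(np.mean(inter / union.astype(float)))  # (1/2 + 2/2) / 2 = 0.75

# list-of-labels case used for y5 / y6
y5 = [{0}, {1}, {2, 3}]
y6 = [{1}, {1}, {2, 0}]
scores = [float(len(t & p)) / len(t | p) for t, p in zip(y5, y6)]  # [0, 1, 1/3]
print(sum(scores) / len(scores))  # (1 + 1/3) / 3 = 0.444...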
+def test_normalize_option_binary_classification():
+ # Test in the binary case
+ y_true, y_pred, _ = make_prediction(binary=True)
+ n_samples = y_true.shape[0]
+
+ for name, metrics in METRICS_WITH_NORMALIZE_OPTION.items():
+ measure = metrics(y_true, y_pred, normalize=True)
+ assert_greater(measure, 0,
+ msg="We failed to test correctly the normalize option")
+ assert_almost_equal(metrics(y_true, y_pred, normalize=False)
+ / n_samples, measure)
+
+
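These tests rely on the contract that normalize=False returns the un-normalised sum (here, the count of correct samples), so dividing by n_samples recovers the normalize=True value. A minimal sketch with accuracy_score, not part of the patch:

from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
count = accuracy_score(y_true, y_pred, normalize=False)  # 3 correctly classified samples
print(count / float(len(y_true)))                        # 0.75
print(accuracy_score(y_true, y_pred))                    # 0.75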
+def test_normalize_option_multiclass_classification():
+ # Test in the multiclass case
+ y_true, y_pred, _ = make_prediction(binary=False)
+ n_samples = y_true.shape[0]
+
+ for name, metrics in METRICS_WITH_NORMALIZE_OPTION.items():
+ measure = metrics(y_true, y_pred, normalize=True)
+ assert_greater(measure, 0,
+ msg="We failed to test correctly the normalize option")
+ assert_almost_equal(metrics(y_true, y_pred, normalize=False)
+ / n_samples, measure)
+
+
+def test_normalize_option_multilabel_classification():
+ # Test in the multilabel case
+ n_classes = 4
+ n_samples = 100
+ _, y_true = make_multilabel_classification(n_features=1,
+ n_classes=n_classes,
+ random_state=0,
+ n_samples=n_samples)
+ _, y_pred = make_multilabel_classification(n_features=1,
+ n_classes=n_classes,
+ random_state=1,
+ n_samples=n_samples)
+
+ # Make sure there is at least one sample with an empty label set
+ y_true += ([], )
+ y_pred += ([], )
+ n_samples += 1
+
+ lb = LabelBinarizer().fit([range(n_classes)])
+ y_true_binary_indicator = lb.transform(y_true)
+ y_pred_binary_indicator = lb.transform(y_pred)
+
+ for name, metrics in METRICS_WITH_NORMALIZE_OPTION.items():
+ # List of list of labels
+ measure = metrics(y_true, y_pred, normalize=True)
+ assert_greater(measure, 0,
+ msg="We failed to test correctly the normalize option")
+ assert_almost_equal(metrics(y_true, y_pred, normalize=False)
+ / n_samples, measure,
+ err_msg="Failed with %s" % name)
+
+ # Indicator matrix format
+ measure = metrics(y_true_binary_indicator,
+ y_pred_binary_indicator, normalize=True)
+ assert_greater(measure, 0,
+ msg="We failed to test correctly the normalize option")
+ assert_almost_equal(metrics(y_true_binary_indicator,
+ y_pred_binary_indicator, normalize=False)
+ / n_samples,
+ measure,
+ err_msg="Failed with %s" % name)