[MRG] multilabel accuracy with the jaccard index #1795

Merged
merged 2 commits into from May 6, 2013

Owner

arjoly commented Mar 20, 2013

This PR adds multilabel accuracy and zero-one loss based on the Jaccard index.

For reference, see section 7.1.1 of Mining Multi-label Data and the Wikipedia entry on Jaccard index.

TODO list:

  • Add multilabel accuracy based on jaccard similarity score
  • write narrative doc for accuracy based on jaccard similarity score
  • Update what's new?

This was removed from the PR scope:

  • Add multilabel zero-one loss based on jaccard distance
  • write narrative doc for multilabel zero-one loss based on jaccard distance
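
For readers unfamiliar with the metric, here is a minimal sketch of the Jaccard index between two label sets and the matching Jaccard distance. It is not the code in this PR: the helper names are hypothetical, and treating two empty label sets as a perfect match is an assumption.

```python
def jaccard_similarity(true_labels, pred_labels):
    """Jaccard index |A & B| / |A | B| between two label sets."""
    true_set, pred_set = set(true_labels), set(pred_labels)
    union = true_set | pred_set
    if not union:
        # Assumption: two empty label sets count as a perfect match.
        return 1.0
    return len(true_set & pred_set) / len(union)


def jaccard_distance(true_labels, pred_labels):
    """Jaccard distance, the loss counterpart of the similarity."""
    return 1.0 - jaccard_similarity(true_labels, pred_labels)


# One shared label out of three distinct labels overall.
print(jaccard_similarity([1, 2], [2, 3]))
print(jaccard_distance([1, 2], [1, 2]))  # identical sets -> 0.0
```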
Owner

jaquesgrobler commented Mar 20, 2013

Had a quick look..looks cool so far. Nice work

Owner

arjoly commented Mar 20, 2013

Had a quick look..looks cool so far. Nice work

Thanks for your encouragement! :-)

sklearn/metrics/metrics.py
+ score = np.array([len(set(true) ^ set(pred)) == 0
+ for pred, true in zip(y_pred, y_true)])
+
+ elif similarity == "jaccard":
@mblondel

mblondel Mar 21, 2013

Owner

Why don't you create a new dedicated function?

@arjoly

arjoly Mar 22, 2013

Owner

In the multilabel literature, subset accuracy and Jaccard index accuracy are often both simply called accuracy.
They give the same score in the multiclass case.

And I think that it would reduce the amount of redundant code.

Note: the hamming loss metric could be integrated into the accuracy score function as a hamming similarity.
Note 2: I was inspired to do this by the design of the precision, recall, f-score function.
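
The claim that both scores coincide in the multiclass case can be checked with a small sketch. The two helpers below are hypothetical pure-Python stand-ins, not the functions from this PR, and scoring an empty union as 1.0 is an assumption.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set matches exactly."""
    matches = [set(t) == set(p) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


def jaccard_accuracy(y_true, y_pred):
    """Mean Jaccard index over the (true, predicted) label-set pairs."""
    scores = []
    for t, p in zip(y_true, y_pred):
        t, p = set(t), set(p)
        union = t | p
        # Assumption: an empty union is scored as a perfect match.
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Multiclass: every sample carries exactly one label, so each pair
# scores either 1 or 0 under both metrics.
multiclass_true = [[0], [1], [2]]
multiclass_pred = [[0], [2], [2]]

# Multilabel: partial overlaps earn partial credit under Jaccard only.
multilabel_true = [[0, 1], [2], [1, 3]]
multilabel_pred = [[0], [2], [1, 2]]

print(subset_accuracy(multiclass_true, multiclass_pred),
      jaccard_accuracy(multiclass_true, multiclass_pred))
print(subset_accuracy(multilabel_true, multilabel_pred),
      jaccard_accuracy(multilabel_true, multilabel_pred))
```

With single-label samples the intersection is either the whole union or empty, which is why the two scores agree there and diverge only once samples carry several labels.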

@larsmans

larsmans Apr 18, 2013

Owner

I too would prefer a new function, say jaccard_accuracy. Redundant code isn't the first worry: it's consistent and easy to use APIs. The implementation can always be changed later.

@arjoly

arjoly Apr 18, 2013

Owner

Thanks a lot for the feedback ! I will do as you propose.

Owner

arjoly commented Mar 26, 2013

What do you think of the interface of accuracy_score and zero_one_loss for multilabels with an argument to specify the distance or similarity function?

Given the new interface, I would like to merge the hamming loss function into the accuracy_score and zero_one_loss functions.

Owner

arjoly commented Mar 28, 2013

I have written the narrative doc and finished implementing the new jaccard metric.
Reviews are welcome.

Owner

arjoly commented Mar 29, 2013

I am not able to retrieve the travis log :-(

Owner

arjoly commented Apr 5, 2013

Travis should be happy now.

Owner

arjoly commented Apr 19, 2013

@larsmans @mblondel now there is only one function jaccard_similarity_score.

Owner

arjoly commented May 1, 2013

I've rebased on top of master

Owner

arjoly commented May 2, 2013

Reviews are welcome.

Owner

jaquesgrobler commented May 2, 2013

Nice! I'll have a read through this evening. Thanks

Owner

arjoly commented May 2, 2013

Thanks :-)

Owner

jaquesgrobler commented May 3, 2013

Read through this. Very nice PR..
Thanks for doing this!
I'm +1, though I'm sure you'll get some more reviews from the others
Great work, once again :)

Owner

arjoly commented May 6, 2013

Thanks for the review ! :-)

+ >>> y_pred = [0, 2, 1, 3]
+ >>> y_true = [0, 1, 2, 3]
+ >>> jaccard_similarity_score(y_true, y_pred)
+ 0.5
@glouppe

glouppe May 6, 2013

Owner

If y_pred and y_true are sets, shouldn't the score be 1.0 in this case? (Both y_pred and y_true contain the same labels, modulo their order, but this shouldn't matter since we are talking about sets, should it?)

@glouppe

glouppe May 6, 2013

Owner

Okay, I get it: y_true[i] and y_pred[i] are the sets to be compared. I think this should be made clearer then: jaccard_similarity_score computes the average (?) Jaccard similarity (as you define it above) over several pairs of sets.
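
The two readings of the doctest above can be contrasted in a short sketch. It is plain Python that mirrors the semantics described in this thread, not the library implementation, and `pair_score` is a hypothetical helper.

```python
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]


def pair_score(true, pred):
    """Jaccard index between the singleton sets {true} and {pred}."""
    true_set, pred_set = {true}, {pred}
    return len(true_set & pred_set) / len(true_set | pred_set)


# Reading 1: y_true[i] and y_pred[i] are compared sample by sample,
# and the per-sample scores are averaged.
per_sample = [pair_score(t, p) for t, p in zip(y_true, y_pred)]
average = sum(per_sample) / len(per_sample)

# Reading 2: the whole lists are treated as two sets.
whole = len(set(y_true) & set(y_pred)) / len(set(y_true) | set(y_pred))

print(per_sample)  # [1.0, 0.0, 0.0, 1.0]
print(average)     # 0.5, the value shown in the doctest
print(whole)       # 1.0, the value expected under reading 2
```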

@arjoly

arjoly May 6, 2013

Owner

Improved the doc in 7267e3c

@jaquesgrobler

jaquesgrobler May 6, 2013

Owner

Reads easier now 👍

doc/modules/model_evaluation.rst
+
+In multilabel classification, the function returns the subset accuracy: the
+entire set of labels for a sample must be entirely correct or the sample has an
+accuracy of zero.
@glouppe

glouppe May 6, 2013

Owner

I would rephrase this as "In multilabel classification, the function returns the subset accuracy: if the entire set of predicted labels for a sample strictly matches the true set of labels, then the subset accuracy is 1.0, otherwise it is 0.0."

doc/modules/model_evaluation.rst
+
+The :func:`jaccard_similarity_score` function computes the average (default)
+or sum of `Jaccard similarity coefficients, also called
+Jaccard index <http://en.wikipedia.org/wiki/Jaccard_index>`_, between
@glouppe

glouppe May 6, 2013

Owner

I would put the link on "Jaccard similarity coefficients" only.

doc/modules/model_evaluation.rst
+The :func:`jaccard_similarity_score` function computes the average (default)
+or sum of `Jaccard similarity coefficients, also called
+Jaccard index <http://en.wikipedia.org/wiki/Jaccard_index>`_, between
+pairs of label set.
@glouppe

glouppe May 6, 2013

Owner

Typo: sets

sklearn/metrics/metrics.py
+ np.sum(np.logical_or(y_pred == pos_label,
+ y_true == pos_label),
+ axis=1))
+
@glouppe

glouppe May 6, 2013

Owner

It may not be serious, but y_pred == pos_label and y_true == pos_label are computed twice.
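
One way to avoid the repeated comparisons is to build each boolean mask once and reuse it, sketched below for the dense binary indicator format. The function name is hypothetical, and using `np.logical_and` for the intersection is an assumption, since only the union line is visible in the excerpt above.

```python
import numpy as np


def intersection_and_union(y_true, y_pred, pos_label=1):
    """Per-sample intersection and union sizes for indicator matrices."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Each comparison is evaluated exactly once and then reused.
    true_mask = y_true == pos_label
    pred_mask = y_pred == pos_label

    intersection = np.sum(np.logical_and(true_mask, pred_mask), axis=1)
    union = np.sum(np.logical_or(true_mask, pred_mask), axis=1)
    return intersection, union


inter, union = intersection_and_union([[1, 0, 1], [0, 0, 0]],
                                      [[1, 1, 0], [0, 0, 0]])
print(inter, union)
```

The second sample has an empty union, so the caller still has to decide how to score it before dividing.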

sklearn/metrics/metrics.py
+ else:
+ score = np.array([len(set(true) & set(pred)) /
+ len(set(true) | set(pred))
+ if set(true) | set(pred)
@glouppe

glouppe May 6, 2013

Owner

Same here, set(true) | set(pred) is computed twice.

Owner

glouppe commented May 6, 2013

Besides my comments above, this looks good to me. +1 after those are fixed.

Owner

arjoly commented May 6, 2013

If everything is ok, I can squash and push.

Owner

jaquesgrobler commented May 6, 2013

👍

@arjoly arjoly merged commit be842f2 into scikit-learn:master May 6, 2013

1 check was pending

default The Travis build is in progress

@arjoly arjoly deleted the arjoly:metrics-jaccard branch May 6, 2013

Owner

arjoly commented May 6, 2013

Thanks a lot for the review !!!

Owner

arjoly commented May 6, 2013


See <https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/1800/changes>

Changes:

[arnaud.v.joly] ENH more pythonic way to treat list of list of labels

[arnaud.v.joly] ENH add jaccard similarity score metrics

------------------------------------------
[...truncated 2265 lines...]
sklearn.tests.test_preprocessing.test_scaler_without_centering ... ok
Check that StandardScaler.fit does not change input ... ok
sklearn.tests.test_preprocessing.test_scale_sparse_with_mean_raise_exception ... ok
sklearn.tests.test_preprocessing.test_scale_function_without_centering ... ok
Check warning when scaling integer data ... ok
sklearn.tests.test_preprocessing.test_normalizer_l1 ... ok
sklearn.tests.test_preprocessing.test_normalizer_l2 ... ok
Check that invalid arguments yield ValueError ... ok
sklearn.tests.test_preprocessing.test_binarizer ... ok
sklearn.tests.test_preprocessing.test_label_binarizer ... ok
sklearn.tests.test_preprocessing.test_label_binarizer_set_label_encoding ... ok
sklearn.tests.test_preprocessing.test_label_binarizer_multilabel ... ok
Check that invalid arguments yield ValueError ... ok
Test OneHotEncoder's fit and transform. ... ok
Test LabelEncoder's transform and inverse_transform methods ... ok
Test fit_transform ... ok
Test LabelEncoder's transform and inverse_transform methods with ... ok
Check that invalid arguments yield ValueError ... ok
sklearn.tests.test_preprocessing.test_label_binarizer_iris ... ok
Check that LabelBinarizer can handle an unlabeled sample ... ok
Test that KernelCenterer is equivalent to StandardScaler ... ok
sklearn.tests.test_preprocessing.test_fit_transform ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature_coo ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature_csc ... ok
sklearn.tests.test_preprocessing.test_add_dummy_feature_csr ... ok
sklearn.tests.test_preprocessing.test_balance_weights ... ok
QDA classification. ... ok
sklearn.tests.test_qda.test_qda_priors ... ok
sklearn.tests.test_qda.test_qda_store_covariances ... ok
sklearn.tests.test_random_projection.test_invalid_jl_domain ... ok
sklearn.tests.test_random_projection.test_input_size_jl_min_dim ... ok
Check basic properties of random matrix generation ... ok
Check some statical properties of Gaussian random matrix ... ok
Check some statical properties of sparse random matrix ... ok
sklearn.tests.test_random_projection.test_sparse_random_projection_transformer_invalid_density ... ok
sklearn.tests.test_random_projection.test_random_projection_transformer_invalid_input ... ok
sklearn.tests.test_random_projection.test_try_to_transform_before_fit ... ok
sklearn.tests.test_random_projection.test_too_many_samples_to_find_a_safe_embedding ... ok
sklearn.tests.test_random_projection.test_random_projection_embedding_quality ... ok
sklearn.tests.test_random_projection.test_SparseRandomProjection_output_representation ... ok
sklearn.tests.test_random_projection.test_correct_RandomProjection_dimensions_embedding ... ok
sklearn.tests.test_random_projection.test_warning_n_components_greater_than_n_features ... ok
Doctest: sklearn._NoseTester.test ... ok

======================================================================
FAIL: sklearn.metrics.tests.test_metrics.test_multilabel_representation_invariance
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
   self.test(*self.arg)
 File "https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/metrics/tests/test_metrics.py", line 947, in test_multilabel_representation_invariance
   % name)
 File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/numpy/testing/utils.py", line 265, in assert_almost_equal
   raise AssertionError(msg)
AssertionError: 
Items are not equal:
jaccard_similarity_score failed representation invariance  between list of list of labels format and dense binary indicator format.
ACTUAL: 0.3562091503267974
DESIRED: 0.29738562091503268
raise AssertionError('\nItems are not equal:\njaccard_similarity_score failed representation invariance between list of list of labels format and dense binary indicator format.\n ACTUAL: 0.3562091503267974\n DESIRED: 0.29738562091503268')


======================================================================
FAIL: sklearn.metrics.tests.test_metrics.test_multilabel_jaccard_similarity_score
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/home/slave/virtualenvs/cpython-2.6/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
   self.test(*self.arg)
 File "https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/metrics/tests/test_metrics.py", line 1103, in test_multilabel_jaccard_similarity_score
   assert_equal(1, jaccard_similarity_score(y1, y2, pos_label=10))
AssertionError: 1 != 0.0
raise self.failureException, \
         (None or '%r != %r' % (1, 0.0))


Name                                             Stmts   Miss  Cover   Missing
------------------------------------------------------------------------------
sklearn                                             34      4    88%   27, 52, 64-65
sklearn.__check_build                               18      3    83%   24, 45-46
sklearn.__check_build.setup                          9      2    78%   17-18
sklearn._build_utils                                17      4    76%   18, 22, 27-28
sklearn.base                                       136      3    98%   58, 76, 84
sklearn.cluster                                     11      0   100%   
sklearn.cluster._feature_agglomeration              22      2    91%   64, 70
sklearn.cluster.affinity_propagation_               90      6    93%   145-146, 167-169, 264
sklearn.cluster.dbscan_                             46      0   100%   
sklearn.cluster.hierarchical                       154      0   100%   
sklearn.cluster.k_means_                           381      5    99%   95, 360, 953, 1234, 1237
sklearn.cluster.mean_shift_                         78      5    94%   103, 120, 154-156
sklearn.cluster.setup                               18      2    89%   40-41
sklearn.cluster.spectral                           117     14    88%   140-142, 154, 253-256, 258-261, 408-411, 413-416, 442, 476
sklearn.covariance                                   6      0   100%   
sklearn.covariance.empirical_covariance_            62      0   100%   
sklearn.covariance.graph_lasso_                    201     18    91%   134, 156, 172-176, 202, 310, 335, 341, 344, 436-437, 480, 499-501, 503-505
sklearn.covariance.outlier_detection                38      2    95%   71, 100
sklearn.covariance.robust_covariance               211     19    91%   124, 134, 139-141, 147, 228, 322-323, 331, 380-386, 566, 574-579, 645
sklearn.covariance.shrunk_covariance_              127      5    96%   182, 184-188
sklearn.cross_validation                           428      1    99%   1294
sklearn.datasets                                    47      0   100%   
sklearn.datasets.base                              136     14    90%   455-464, 498-505
sklearn.datasets.california_housing                 37     24    35%   27-29, 64-101
sklearn.datasets.covtype                            50     23    54%   19-20, 69-78, 84-93, 100-104
sklearn.datasets.lfw                               157    135    14%   53-55, 66-105, 113-163, 178-208, 260-276, 294-334, 342, 402-429, 439
sklearn.datasets.mlcomp                             47     40    15%   11-13, 56-103
sklearn.datasets.mldata                             80      7    91%   15-19, 151-153
sklearn.datasets.olivetti_faces                     41     26    37%   32-35, 89-116
sklearn.datasets.samples_generator                 301     37    88%   116, 120, 123, 524, 573-593, 731, 1121-1124, 1276-1303
sklearn.datasets.setup                              14      2    86%   22-23
sklearn.datasets.species_distributions              72     55    24%   46-48, 69-82, 98-108, 125-135, 210-257
sklearn.datasets.svmlight_format                    96      6    94%   127, 238, 334, 339, 347-348
sklearn.datasets.twenty_newsgroups                 121     88    27%   71-92, 133-139, 143, 148-195, 225-271
sklearn.decomposition                                8      0   100%   
sklearn.decomposition.dict_learning                289     28    90%   87, 91-92, 305-306, 308, 417, 445, 450-451, 453, 476, 478, 481, 571, 581, 605, 647, 797, 1117-1133
sklearn.decomposition.factor_analysis               77      3    96%   138, 165-166
sklearn.decomposition.fastica_                     153     17    89%   79-83, 124-128, 239, 264, 271-275, 287-288, 312-314, 329
sklearn.decomposition.kernel_pca                    82      2    98%   172, 258
sklearn.decomposition.nmf                          202      8    96%   104, 254, 373, 393-395, 418, 540
sklearn.decomposition.pca                          153      5    97%   49, 62-63, 244, 329
sklearn.decomposition.sparse_pca                    59      0   100%   
sklearn.dummy                                      110      0   100%   
sklearn.ensemble                                    15      0   100%   
sklearn.ensemble.base                               25      1    96%   76
sklearn.ensemble.forest                            307     13    96%   77, 131-132, 148-149, 236, 306, 411, 446-449, 478, 593-594
sklearn.ensemble.gradient_boosting                 348     13    96%   53, 123, 188, 215, 368, 419, 546, 551, 595, 604, 607-608, 614
sklearn.ensemble.partial_dependence                159    104    35%   54, 56, 237-388
sklearn.ensemble.setup                              10      2    80%   16-17
sklearn.ensemble.weight_boosting                   272     24    91%   100-104, 143, 193, 230, 239-240, 379, 531-532, 595-596, 622, 675, 701-703, 806, 912, 987, 989, 1001, 1018
sklearn.externals                                    1      0   100%   
sklearn.externals.joblib                            10      0   100%   
sklearn.externals.joblib._compat                     4      2    50%   7-8
sklearn.externals.joblib.disk                       51     11    78%   28, 84-88, 94, 103-107
sklearn.externals.joblib.format_stack              227     46    80%   34-35, 50, 55, 63, 67-70, 133-135, 146, 148, 168-173, 193-197, 201-205, 208, 215-224, 246-247, 282-286, 299-300, 323, 345-346, 363-367, 372-375, 405, 412
sklearn.externals.joblib.func_inspect              117     12    90%   73, 98-102, 109-110, 138-139, 185, 224, 228
sklearn.externals.joblib.hashing                    91     18    80%   22, 54-55, 71, 88-99, 144, 155-159, 195
sklearn.externals.joblib.logger                     73     10    86%   29, 42, 68, 78, 94, 99, 115, 121, 138-139
sklearn.externals.joblib.memory                    235     20    91%   19-20, 58, 130, 159-160, 257, 289-290, 300, 353, 355, 375-376, 395-396, 406, 476, 522, 547
sklearn.externals.joblib.my_exceptions              42      3    93%   43, 70-71
sklearn.externals.joblib.numpy_pickle              170     20    88%   21-27, 86, 115, 122-125, 196-197, 243-246, 272-273, 290, 298
sklearn.externals.joblib.parallel                  219     21    90%   18-19, 28-29, 38-40, 53, 103, 123, 330-331, 398-400, 450, 456, 469-471, 477, 506
sklearn.externals.setup                              6      0   100%   
sklearn.externals.six                              170     64    62%   34-40, 54-56, 92-94, 107-115, 190, 195-201, 205-213, 228-230, 234-237, 240, 245, 260, 264, 272-284, 298-308, 319-320
sklearn.feature_extraction                           5      0   100%   
sklearn.feature_extraction.dict_vectorizer          98      3    97%   228, 246-247
sklearn.feature_extraction.hashing                  39      1    97%   97
sklearn.feature_extraction.image                   147      0   100%   
sklearn.feature_extraction.setup                    10      0   100%   
sklearn.feature_extraction.stop_words                1      0   100%   
sklearn.feature_extraction.text                    414     15    96%   104-105, 232, 400, 405-406, 595, 601, 767, 832, 891, 900, 941-944, 1049
sklearn.feature_selection                           13      0   100%   
sklearn.feature_selection.rfe                      102      7    93%   120, 129, 142, 149, 216, 219, 358
sklearn.feature_selection.selector_mixin            46      5    89%   45, 57, 88, 99-102
sklearn.feature_selection.univariate_selection     180      7    96%   226, 300, 315, 376, 382, 433, 584
sklearn.gaussian_process                             5      0   100%   
sklearn.gaussian_process.correlation_models         78     28    64%   48, 53, 91, 96, 129-148, 178-185, 221, 227, 271, 277
sklearn.gaussian_process.gaussian_process          335    105    69%   21, 309, 318, 320, 324, 329, 344, 349, 355, 361, 373-381, 426, 459-467, 495-520, 582-586, 597-598, 604-610, 616-623, 680-682, 688, 722-724, 742-744, 750-799, 812, 828, 834, 845, 848, 853-856, 868, 871, 876
sklearn.gaussian_process.regression_models          19      0   100%   
sklearn.grid_search                                227      6    97%   284, 369, 411, 440, 668-670
sklearn.hmm                                        452     34    92%   296-297, 402, 437-439, 521, 524, 695, 704-711, 737, 749, 800-801, 931, 995, 999, 1003, 1008, 1017, 1103, 1160, 1167, 1188, 1199-1201
sklearn.isotonic                                    54      1    98%   54
sklearn.kernel_approximation                       153      4    97%   247, 256, 454, 485
sklearn.lda                                         93     11    88%   91-95, 123, 130, 139, 141-142, 161
sklearn.linear_model                                14      0   100%   
sklearn.linear_model.base                          128      2    98%   264, 294
sklearn.linear_model.bayes                         126      8    94%   178-184, 210, 423
sklearn.linear_model.coordinate_descent            345     29    92%   138-139, 209-210, 262, 315, 513, 708-709, 730, 740-741, 766-771, 885, 888-890, 925, 980-982, 1192-1193, 1308-1309, 1363, 1382
sklearn.linear_model.least_angle                   362     18    95%   165-166, 285-286, 293-302, 403, 518, 787-790
sklearn.linear_model.logistic                       11      0   100%   
sklearn.linear_model.omp                           192     15    92%   87-88, 180-181, 279, 288, 376, 379, 384, 541-544, 547-548, 557
sklearn.linear_model.passive_aggressive             25      0   100%   
sklearn.linear_model.perceptron                      5      0   100%   
sklearn.linear_model.randomized_l1                 174      8    95%   69, 100, 122, 142, 573, 591, 600-601
sklearn.linear_model.ridge                         269     21    92%   134, 141-158, 511, 516, 537, 575-579, 591, 678, 885
sklearn.linear_model.setup                          19      2    89%   41-42
sklearn.linear_model.stochastic_gradient           309     13    96%   64-65, 136-137, 307-310, 350, 396-401, 833-836
sklearn.manifold                                     5      0   100%   
sklearn.manifold.isomap                             46      0   100%   
sklearn.manifold.locally_linear                    218     22    90%   45, 47, 147, 176, 271, 274, 283, 286, 289, 310, 333-334, 360, 384-389, 437, 482-483
sklearn.manifold.mds                               102     21    79%   79-80, 98-107, 122, 124-128, 229-233, 345, 378, 385-388
sklearn.manifold.spectral_embedding                144     17    88%   197-200, 270-282, 306, 387, 459
sklearn.metrics                                      8      0   100%   
sklearn.metrics.cluster                             14      0   100%   
sklearn.metrics.cluster.setup                       14      2    86%   22-23
sklearn.metrics.cluster.supervised                 110      0   100%   
sklearn.metrics.cluster.unsupervised                27      0   100%   
sklearn.metrics.metrics                            387     18    95%   139, 142, 669-671, 673-675, 681-683, 968-972, 1847-1848
sklearn.metrics.pairwise                           177      4    98%   173, 532, 636, 796
sklearn.metrics.scorer                              33      2    94%   72-74
sklearn.metrics.setup                               13      2    85%   20-21
sklearn.mixture                                      5      0   100%   
sklearn.mixture.dpgmm                              344     23    93%   211-216, 219, 222, 251, 266, 367-370, 460, 465, 502, 685, 692, 737-740
sklearn.mixture.gmm                                256     29    89%   244, 261-268, 299, 301, 303, 357-358, 419, 421, 440, 474, 566, 590, 619, 621, 624, 627, 637, 640, 645-648, 666
sklearn.multiclass                                 165      9    95%   53, 79, 126, 258, 277-281, 542
sklearn.naive_bayes                                124      6    95%   160, 230, 238, 246, 267, 440
sklearn.neighbors                                    7      0   100%   
sklearn.neighbors.base                             225      9    96%   65, 74, 90, 118, 122, 126, 152, 255, 463
sklearn.neighbors.classification                    62      0   100%   
sklearn.neighbors.graph                             10      0   100%   
sklearn.neighbors.nearest_centroid                  50      2    96%   90, 156
sklearn.neighbors.regression                        32      0   100%   
sklearn.neighbors.setup                             10      0   100%   
sklearn.neighbors.unsupervised                       7      0   100%   
sklearn.pipeline                                   138      6    96%   81, 91, 188, 272, 334, 341
sklearn.pls                                        202     24    88%   63-64, 98-99, 227, 231, 238, 242, 244, 247, 250, 366-372, 402-404, 834, 837, 842
sklearn.preprocessing                              389      5    99%   62, 430, 439, 456, 1027
sklearn.qda                                         75      0   100%   
sklearn.random_projection                          114      0   100%   
sklearn.semi_supervised                              2      0   100%   
sklearn.semi_supervised.label_propagation          112      3    97%   130, 135, 171
sklearn.setup                                       56      4    93%   68-70, 83-84
sklearn.svm                                          4      0   100%   
sklearn.svm.base                                   278      8    97%   129, 278, 307, 351, 494, 554, 624, 718
sklearn.svm.bounds                                  35      0   100%   
sklearn.svm.classes                                 24      0   100%   
sklearn.svm.setup                                   26      4    85%   54-55, 83-84
sklearn.tree                                         6      0   100%   
sklearn.tree.export                                 45      8    82%   71-76, 100, 127, 133
sklearn.tree.setup                                  14      2    86%   22-23
sklearn.tree.tree                                  180      5    97%   78, 215, 242, 246, 349
sklearn.utils                                      112      3    97%   321, 354, 358
sklearn.utils._csgraph                              21      3    86%   65-66, 69
sklearn.utils.arpack                               627    308    51%   307, 312, 315, 364-368, 429, 431, 433, 439-450, 453, 455, 463-497, 500, 503, 509, 516, 536, 542-543, 545-547, 553, 555, 578, 586, 629, 631, 633, 638-679, 682, 685, 691, 706, 718, 728, 734-735, 737, 739, 745-748, 771, 794-803, 810-834, 841-842, 848, 852, 859-875, 880, 886-887, 906, 933-945, 948-953, 963-984, 987, 990, 993-998, 1009, 1023, 1032-1046, 1184, 1186-1191, 1196, 1202, 1205, 1214-1253, 1423-1440, 1443, 1445-1450, 1455, 1462, 1470-1480, 1484, 1494-1495, 1499-1529, 1563-1599, 1603
sklearn.utils.bench                                  3      0   100%   
sklearn.utils.class_weight                          19      0   100%   
sklearn.utils.extmath                              134     11    92%   46-49, 54, 113-115, 190-192, 306, 369
sklearn.utils.fixes                                142     33    77%   25-26, 29, 33-38, 47, 72-73, 78-83, 93, 108-110, 116, 131, 141, 160, 167, 178, 196-197, 207-209, 217, 228-229, 231
sklearn.utils.graph                                 75      5    93%   52, 66, 72, 112, 131
sklearn.utils.multiclass                            20      0   100%   
sklearn.utils.setup                                 26      2    92%   70-71
sklearn.utils.sparsetools                            1      0   100%   
sklearn.utils.sparsetools.csgraph                   61     34    44%   17-19, 29, 33-34, 38-50, 54, 58-63, 67-71, 77-80
sklearn.utils.sparsetools.setup                     11      2    82%   16-17
sklearn.utils.validation                           105      1    99%   179
------------------------------------------------------------------------------
TOTAL                                            17190   1979    88%   
----------------------------------------------------------------------
Ran 1855 tests in 293.749s

FAILED (SKIP=16, failures=2)
<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/cluster/k_means_.py>:1161: RuntimeWarning: init_size=3 should be larger than k=8. Setting it to 3*k
 init_size=init_size)
[Parallel(n_jobs=1)]: Done   1 jobs       | elapsed:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.0s finished
<https://jenkins.shiningpanda-ci.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/ws/sklearn/qda.py>:158: RuntimeWarning: divide by zero encountered in log
 + np.log(self.priors_))
make: *** [test-coverage] Error 1
Build step 'Custom Python Builder' marked build as failure
Archiving artifacts
Skipping Cobertura coverage report as build was not UNSTABLE or better ...

Owner

arjoly commented May 6, 2013

I'm looking for the bug.

Owner

amueller commented on be842f2 May 6, 2013

did you see the jenkins failure?

Owner

arjoly replied May 6, 2013

Yes, see the PR.
I will submit a patch soon.

Owner

arjoly commented May 6, 2013

Oddly, with Python 2.6 and NumPy 1.3, I got:

In [1]: import numpy as np

In [2]: from __future__ import division

In [3]: a = np.array([2, 0, 0])

In [4]: b = np.array([4, 1, 0])

In [5]: a / b
Out[5]: array([ 0.5,  0. ,  0. ])

instead of

Out[5]: array([ 0.5,  0. ,  nan ])

A patch is coming.
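
Since the result of 0 / 0 differs between NumPy versions here, one option is to stop depending on it and handle zero denominators explicitly. The sketch below is not the patch referred to above: `safe_ratio` is a hypothetical helper, and the fill value of 1.0 is an assumption chosen to match the failing assertion, which expects a score of 1.

```python
import numpy as np


def safe_ratio(numerator, denominator, empty_value=1.0):
    """Element-wise division that never evaluates x / 0."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)

    # Start from the fill value, then overwrite the well-defined entries.
    result = np.empty_like(numerator)
    result.fill(empty_value)

    nonzero = denominator != 0
    result[nonzero] = numerator[nonzero] / denominator[nonzero]
    return result


a = np.array([2, 0, 0])
b = np.array([4, 1, 0])
print(safe_ratio(a, b))
```

Because the division only touches entries with a nonzero denominator, the result does not depend on whether a given NumPy version returns 0 or nan for 0 / 0.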

Owner

arjoly commented May 7, 2013

For reference, this is another step toward solving #558.
