[MRG+1] Sparse multilabel target support in metrics #3395

Merged
merged 3 commits into scikit-learn:master from jnothman:sparse_multi_metrics on Aug 12, 2014

Conversation

6 participants
Owner

jnothman commented Jul 16, 2014

This introduces a series of helper classes that abstract away aggregation over multilabel structures. This enables efficient calculation in sparse and dense binary indicator matrices, while maintaining support for the deprecated sequences format.

Owner

arjoly commented Jul 16, 2014

Great! I will review this PR :-) when I find some time.

@jnothman jnothman changed the title from [MRG] Sparse multilabel target support to [MRG] Sparse multilabel target support in metrics Jul 16, 2014

sklearn/metrics/metrics.py
+ def _weight(self, X):
+ print(X)
+ if self.sample_weight is not None:
+ print(X * self.sample_weight[:, None])
@arjoly

arjoly Jul 16, 2014

Owner

They change the Travis output in an interesting way.

@jnothman

jnothman Jul 16, 2014

Owner

Fixed and hidden from history!

sklearn/metrics/metrics.py
+ return self._count_nnz(self.y_pred, axis, self.sample_weight)
+
+ @staticmethod
+ def _count_nnz(X, axis=None, sample_weight=None):
@vene

vene Jul 16, 2014

Owner

The axis=0/1 cases assume that X is of a specific sparse format, right? I guess this should be documented, even if they're private. Your sparse-fu is way over my head.

@arjoly

arjoly Jul 16, 2014

Owner

The sparse format is enforced in the constructor.

@vene

vene Jul 16, 2014

Owner

Oh, right. I'd still argue it's easy to miss: when reading scikit-learn code I usually ignore constructors assuming they don't change the input.

@jnothman

jnothman Jul 17, 2014

Owner

Done. And the sparse-fu is explained.

Owner

vene commented Jul 16, 2014

Travis tests are not really failing here, it's just that the output gets truncated. At the very least they work on my box.

sklearn/metrics/metrics.py
+ if labels is None:
+ labels = unique_labels(y_true, y_pred)
+ if binarize:
+ binarizer = MultiLabelBinarizer([labels])
@vene

vene Jul 16, 2014

Owner

This can (and I guess should) use the sparse_output=True param, which has been merged to master in the meantime.

@jnothman

jnothman Jul 16, 2014

Owner

Hmm... currently the binarize option isn't used (or tested, indeed), but you're right that it should produce sparse output. I can just remove it. Or, as @arjoly suggests, get rid of _SequencesMultilabelHelper and use binarization always. Perhaps that deserves a benchmark.
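[Editor's note: for reference, a minimal sketch of the sparse_output path under discussion; the example data is made up.]

from sklearn.preprocessing import MultiLabelBinarizer

y_seq = [(0, 2), (1,), (0, 1, 2)]            # deprecated sequences-of-sequences format
mlb = MultiLabelBinarizer(sparse_output=True)
Y = mlb.fit_transform(y_seq)                 # CSR binary indicator matrix
print(Y.toarray())
# [[1 0 1]
#  [0 1 0]
#  [1 1 1]]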

sklearn/metrics/metrics.py
+ if sample_weight is None:
+ return X.nnz
+ else:
+ return (np.diff(X.indptr) * sample_weight).sum()
@arjoly

arjoly Jul 16, 2014

Owner

np.dot(np.diff(X.indptr), sample_weight)?
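[Editor's note: a quick sanity check of that equivalence, my own sketch rather than the PR's code. For CSR, np.diff(X.indptr) gives the per-row nonzero counts, so the multiply-and-sum and the dot product agree.]

import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix(np.array([[1, 0, 1],
                            [0, 0, 0],
                            [1, 1, 1]]))
sample_weight = np.array([0.5, 2.0, 1.0])

row_nnz = np.diff(X.indptr)  # nonzeros per row: [2 0 3]
assert (row_nnz * sample_weight).sum() == np.dot(row_nnz, sample_weight)  # both 4.0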

sklearn/metrics/metrics.py
+ weights = np.repeat(sample_weight, np.diff(X.indptr))
+ return np.bincount(X.indices, minlength=X.shape[1],
+ weights=weights)
+ raise ValueError('Unsupported axis: {0}'.format(axis))
@arjoly

arjoly Jul 16, 2014

Owner

Nitpick, I would do:

else:
    raise ValueError(...)
@arjoly

arjoly Jul 16, 2014

Owner

I see other places in the code that could be rewritten with an else.
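[Editor's note: stepping back to the diff above, here is a small sketch (mine, not the PR's code) of the weighted per-column count it computes. Each stored value in row i contributes sample_weight[i] to its column's total.]

import numpy as np
import scipy.sparse as sp

X = sp.csr_matrix(np.array([[1, 0, 1],
                            [0, 0, 0],
                            [1, 1, 1]]))
sample_weight = np.array([0.5, 2.0, 1.0])

# one weight per stored value, ordered as in X.indices
weights = np.repeat(sample_weight, np.diff(X.indptr))
col_counts = np.bincount(X.indices, weights=weights, minlength=X.shape[1])
print(col_counts)  # [ 1.5  1.   1.5]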

sklearn/metrics/metrics.py
+ self.shape = y_true.shape
+ self.sample_weight = sample_weight
+
+ def _weight(self, X):
@arjoly

arjoly Jul 16, 2014

Owner

X doesn't seem to be appropriate in this case.

@jnothman

jnothman Jul 16, 2014

Owner

I have used X elsewhere here to represent a matrix. I'm happy to call it y or m, etc. Choose me a letter!

@arjoly

arjoly Jul 17, 2014

Owner

What do you think of Y to recall that this is an output matrix?

@vene

vene Jul 17, 2014

Owner

I would have suggested Y as well

sklearn/metrics/metrics.py
+ return self._weight(self.y_pred).sum(axis)
+
+
+class _SequencesMultilabelHelper(object):
@arjoly

arjoly Jul 16, 2014

Owner

Is there any point in not converting sequences of sequences to sparse matrices?

sklearn/metrics/tests/test_metrics.py
+
+
+def test_multilabel_helper():
+ return
@vene

vene Jul 16, 2014

Owner

This test passes just fine, you don't need to use such sneaky techniques to trick us :P

@jnothman

jnothman Jul 16, 2014

Owner

Lol. I'm glad you're in good spirits :) I seem to have had one I closed when pushing.

sklearn/metrics/tests/test_metrics.py
+ lb.fit([labels])
+ y_true_bi = lb.transform(y_true_ll)
+ y_pred_bi = lb.transform(y_pred_ll)
+ y_true_sp = sp.csr_matrix(y_true_bi)
@vene

vene Jul 16, 2014

Owner

Maybe make the test use the binarizer sparse output, to cover that code branch in an end-to-end way?

@jnothman

jnothman Jul 16, 2014

Owner

This test is clearly far from an end-to-end test. It's testing a helper.

sklearn/metrics/tests/test_metrics.py
@@ -1707,37 +1712,16 @@ def test_multilabel_representation_invariance():
if isinstance(metric, partial):
metric.__module__ = 'tmp'
metric.__name__ = name
- # Check warning for sequence of sequences
- measure = assert_warns(DeprecationWarning, metric, y1, y2)
@vene

vene Jul 16, 2014

Owner

This warning is no longer tested for

Coverage Status

Coverage decreased (-0.1%) when pulling f6bb3d9 on jnothman:sparse_multi_metrics into 2e7ef3c on scikit-learn:master.

Owner

jnothman commented Jul 17, 2014

A few comments:

  • There may be no benefit in handling the dense case at all. Maybe we should just convert everything to sparse matrices, though even then it is useful to have a helper that gives these operations names.
  • Perhaps this is worth benchmarking; or perhaps metric calculation is so small compared to other costs that we don't care. Is memory an issue here?
  • The sparse matrix code is very easy to adapt for either CSC or CSR (but not a mixture): just swap the axis argument from 1 to 0 and vice-versa, as the sketch below illustrates. LabelBinarizer and MultiLabelBinarizer output CSR, while OvR will probably output CSC. So one of the arguments will usually need to be converted.
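[Editor's note: the sketch referenced in the last bullet, my illustration. The same indptr trick counts per-row nonzeros on CSR and per-column nonzeros on CSC, which is why only the axis argument needs to change.]

import numpy as np
import scipy.sparse as sp

A = np.array([[1, 0, 1],
              [0, 1, 0]])
csr, csc = sp.csr_matrix(A), sp.csc_matrix(A)

print(np.diff(csr.indptr))  # per-row nonzero counts:    [2 1]
print(np.diff(csc.indptr))  # per-column nonzero counts: [1 1 1]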
Owner

jnothman commented Jul 17, 2014

FWIW, this is the benchmark without converting everything to a sparse matrix:

$ benchmarks/bench_multilabel_metrics.py --samples 10000 --classes 15 --density .2
Metric             csc       csr     dense sequences
accuracy         0.018     0.011     0.027     0.087
f1               0.023     0.011     0.051     0.137
f1-by-sample     0.032     0.012     0.056     0.105
hamming          0.027     0.013     0.047     0.082
jaccard          0.018     0.012     0.027     0.109

This is with everything converted to sparse matrices first:

$ benchmarks/bench_multilabel_metrics.py --samples 10000 --classes 15 --density .2
Metric             csc       csr     dense sequences
accuracy         0.018     0.011     0.072     0.293
f1               0.023     0.011     0.086     0.298
f1-by-sample     0.021     0.012     0.092     0.295
hamming          0.023     0.016     0.089     0.292
jaccard          0.022     0.013     0.067     0.310

We're dealing with small numbers, apart from the sequences case. That format is being deprecated, but it is substantially faster without binarizing, so seeing as the implementation is here and tested, we might as well keep it fast until the deprecation is complete. Making dense data sparse adds ~.04s, though we're currently using density settings that may favour sparse. We could experiment with varying those parameters, but I don't see the point.

Owner

arjoly commented Jul 17, 2014

Thanks for the benchmark!

Owner

vene commented Jul 17, 2014

Thanks Joel,
Looking at the benchmark, my thoughts re: converting to sparse are:

  • until complete deprecation, we should keep the sequences helper instead of converting.
  • If this is merged at the same time (rather, in the same release) as #3276 which makes LabelEncoder output sparse by default, we can do away with the dense helper and convert to sparse, since it doesn't seem to make a difference.
Owner

jnothman commented Jul 17, 2014

If I understand the state of play, #3276 only makes LabelBinarizer sparse by default for OvR, which is irrelevant here, because by the time it gets here the binarization is already inverted. Still, I'm happy to remove direct handling of dense matrices here if there's another +1 (and maybe with another benchmark at a positive indicator density where dense is appropriate).


Owner

arjoly commented Jul 19, 2014

Still, I'm happy to remove
direct handling of dense matrices here if there's another +1 (and maybe
with another benchmark for a positive indicator density where dense is
appropriate).

+1 for a benchmark!

Owner

GaelVaroquaux commented Jul 19, 2014

Beautiful implementation of a generic pattern. It certainly makes the code in metrics much more readable.

However, I am a bit worried that such patterns require some learning to be able to read the codebase, and will make it harder for people without a lot of expertise to maintain the codebase.

My gut feeling, and it's only a gut feeling, is that we could try to have a set of functions that implement the methods that you created, but as functions, not as methods. This means that the routing of the genericity would be done inside the function. I find it hard to know beforehand whether it is actually going to result in more readable code or not. Would you bear with me, and try to implement this approach? Maybe in a separate PR to compare (I don't know if a separate PR is a good or a bad idea).

Owner

jnothman commented Jul 20, 2014

As an intuition: as long as we are handling formats that require non-trivial type detection (as performed by type_of_target), we either add what may be substantial overhead by doing that detection in each function, or we pass the type to each function call, which I think will be much less pretty. If you still want me to implement it, I'll try to find time to do so. The current proposal is nice in ensuring that checking the type and coercing to the correct format (i.e. CSR) is done once.

The other option that avoids a helper class is to put all the metric implementations into a polymorphic class hierarchy, but I don't think this is really worth considering.


Owner

jnothman commented Jul 20, 2014

I think 3 classes with 2 out of 3 filled may make for an appropriate dense vs convert-to-sparse benchmark:

$ benchmarks/bench_multilabel_metrics.py --samples 10000 --classes 3 --density .7
Metric           dense     as csr
accuracy         0.006     0.030
f1               0.012     0.040
f1-by-sample     0.014     0.029
hamming          0.012     0.028
jaccard          0.008     0.028

This says nothing about memory; the conversion has a time cost, but runtime is still very small, so I could consider removing dense support.

@GaelVaroquaux, I could convert everything to sparse matrices and simplify the code a lot (unless we wanted a similar polymorphism for the multiclass case), particularly in using functions rather than methods of a helper class. This would mean slower sequences of sequences support, as shown above, but perhaps that is incentive for people to heed the DeprecationWarning.

Owner

jnothman commented Jul 20, 2014

@arjoly, damn your cruel refactoring...

Owner

jnothman commented Jul 20, 2014

Hopefully I edited that rebase correctly.

Coverage Status

Coverage increased (+0.01%) when pulling 5a7c600 on jnothman:sparse_multi_metrics into 41d02e0 on scikit-learn:master.

Owner

arjoly commented Jul 20, 2014

sorry :-(

Owner

arjoly commented Jul 20, 2014

This would mean slower sequences of sequences support, as shown above, but perhaps that is incentive for people to heed the DeprecationWarning.

I am fine with slower sequences of sequences.

Owner

jnothman commented Jul 20, 2014

@GaelVaroquaux, I think you'll much prefer this version where everything is calculated over CSR matrices... I am only concerned that the +, - and .multiply operations are a bit opaque, and really stand in for np.logical_* functions not supported by scipy.sparse until recently.
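[Editor's note: to make those stand-ins concrete, a small sketch (mine) of how arithmetic on binary indicator CSR matrices plays the role of logical operations.]

import numpy as np
import scipy.sparse as sp

y_true = sp.csr_matrix(np.array([[1, 0, 1],
                                 [0, 1, 0]]))
y_pred = sp.csr_matrix(np.array([[1, 1, 0],
                                 [0, 1, 0]]))

print(y_true.multiply(y_pred).nnz)  # AND: cells where both are 1 -> 2
print((y_true + y_pred).nnz)        # OR:  cells where either is 1 -> 4
print((y_true - y_pred).nnz)        # XOR: cells where they differ -> 2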

Coverage Status

Coverage decreased (-0.0%) when pulling cdcea67 on jnothman:sparse_multi_metrics into 8dab222 on scikit-learn:master.

sklearn/metrics/classification.py
@@ -90,6 +95,38 @@ def _check_clf_targets(y_true, y_pred):
return y_type, y_true, y_pred
+def _multilabel_to_csr(y_true, y_pred, y_type, labels=None):
@arjoly

arjoly Jul 20, 2014

Owner

I have the feeling that this could be merged with _check_clf_target

@jnothman

jnothman Jul 20, 2014

Owner

Yes, I'm somewhat ambivalent about this. The label ordering would need to be handled specially.

@arjoly

arjoly Jul 20, 2014

Owner

Would there be metrics in classification.py without sparse input support?

@jnothman

jnothman Jul 21, 2014

Owner

All multilabel metrics will support sparse input. There is no reason not to.

What I meant above is that I can put this into _check_clf_target, but _check_clf_target will need to take a new labels (or label_order) parameter for binarization (until sequences are deprecated)

sklearn/metrics/metrics.py
- DeprecationWarning)
+warnings.warn("sklearn.metrics.metrics is deprecated and will be removed in "
+ "0.18. Please import from sklearn.metrics",
+ DeprecationWarning)
@arjoly

arjoly Jul 20, 2014

Owner

What has changed here?

@jnothman

jnothman Jul 20, 2014

Owner

Oh, nothing relevant to this PR. PEP8 indentation. I should revert this.

@arjoly

arjoly Jul 20, 2014

Owner

No, it's good. Sorry, I had a hard time trying to figure out what the difference was.
Thanks for the PEP8 fix :-)
Thanks for pep8 :-)

Owner

arjoly commented Jul 20, 2014

I am only concerned that the +, - and .multiply operations are a bit opaque, and really stand in for np.logical_* functions not supported by scipy.sparse until recently.

It would be nice to add a comment about this in classification.py (a developer comment).

@@ -644,6 +644,9 @@ def test_multilabel_representation_invariance():
y1_binary_indicator = lb.transform(y1)
y2_binary_indicator = lb.transform(y2)
+ y1_sparse_indicator = sp.coo_matrix(y1_binary_indicator)
+ y2_sparse_indicator = sp.coo_matrix(y2_binary_indicator)
@arjoly

arjoly Jul 20, 2014

Owner

Do we want to test other sparse matrix formats here?

@@ -636,7 +637,6 @@ def test_multilabel_representation_invariance():
y2_shuffle = [shuffled(x) for x in y2]
# Let's have redundant labels
- y1_redundant = [x * rng.randint(1, 4) for x in y1]
y2_redundant = [x * rng.randint(1, 4) for x in y2]
@arjoly

arjoly Jul 20, 2014

Owner

Should we keep this line?

@arjoly

arjoly Jul 21, 2014

Owner

Sorry, lagging brain.

@jnothman

jnothman Jul 21, 2014

Owner

Indeed, from the reviews I've been seeing, it seems the sprint was very exhausting. Go take a long nap!


Owner

arjoly commented Jul 20, 2014

The code feels a lot simpler. I like this new version!

sklearn/metrics/classification.py
+ if y_type.startswith('multilabel'):
+ y_true, y_pred = _multilabel_to_csr(y_true, y_pred, y_type,
+ labels=classes)
+ return (y_true - y_pred).nnz / y_true.shape[0] / y_true.shape[1]
@arjoly

arjoly Jul 20, 2014

Owner

count_nonzero(y_true - y_pred, axis=None) / np.product(y_true.shape)?

@jnothman

jnothman Jul 20, 2014

Owner

If you'd rather.
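[Editor's note: a sanity check (my own sketch) that the sparse expression above matches the usual dense Hamming loss, i.e. the fraction of label assignments that differ.]

import numpy as np
import scipy.sparse as sp

yt = np.array([[1, 0, 1],
               [0, 1, 0]])
yp = np.array([[1, 1, 0],
               [0, 1, 0]])
yt_sp, yp_sp = sp.csr_matrix(yt), sp.csr_matrix(yp)

sparse_loss = (yt_sp - yp_sp).nnz / float(yt.shape[0] * yt.shape[1])
assert sparse_loss == np.mean(yt != yp)  # 2 differing cells out of 6 -> 1/3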

Owner

arjoly commented Jul 20, 2014

Given that we do everything with count_nonzero, +, - and .multiply: does it have a computational cost? Using these simple operations, it seems we generate dense sparse matrices.

Owner

jnothman commented Jul 20, 2014

What do you mean by dense sparse matrices? We don't densify the matrices: everything is O(n_nonzero) and vectorized. Calling the function has a cost, but it's negligible.

Owner

arjoly commented Jul 20, 2014

I am probably over-optimizing. Thanks for this PR. This is very good stuff :-)

Owner

jnothman commented Jul 21, 2014

I've fixed those things up, @arjoly, and rolled _multilabel_to_csr into _check_clf_targets (which, given your refactor, I have renamed to _check_targets), although this means an unnecessary call to unique_labels for sequences of sequences. WDYT?

Owner

jnothman commented Jul 21, 2014

Rebased.

Owner

arjoly commented Jul 21, 2014

# Master
(sklearn) ± python benchmarks/bench_multilabel_metrics.py --classes 4000 --samples 20000 --density 0.01
Metric           dense sequences
accuracy        20.851     0.814
f1              45.332    15.495
f1-by-sample    45.069    15.296
hamming         39.844     1.144
jaccard         21.153     0.913


# PR
(sklearn) ± python benchmarks/bench_multilabel_metrics.py --classes 4000 --samples 20000 --density 0.01
Metric             csc       csr     dense sequences
accuracy         0.261     0.084    26.684     4.477
f1               0.335     0.162    25.658     4.705
f1-by-sample     0.292     0.111    25.846     4.171
hamming          0.292     0.116    26.862     4.189
jaccard          0.332     0.146    26.152     4.111

Performance looks great! Though I see a high discrepancy for some metrics between master and this PR.

Owner

arjoly commented Jul 21, 2014

Awesome PR!!! Could you update the docstrings and the narrative docs to highlight your work?

Owner

arjoly commented Jul 21, 2014

The extra call to unique_labels is fine by me.

Owner

jnothman commented Jul 21, 2014

Could you update the docstrings and the narrative doc to highlight your work?

Fair point. I'd better go looking for things to change...

Owner

jnothman commented Jul 21, 2014

Do you think each docstring needs to specify "or sparse/dense label indicator matrix"?

Owner

arjoly commented Jul 21, 2014

I would do something in the spirit of what we have done for sparse one-vs-rest.

Owner

jnothman commented Jul 21, 2014

I see that there you use {array-like, sparse matrix}.

Currently, though, we say "array-like or label indicator matrix". What we want to say is something like: "1d array-like, or label indicator array / sparse matrix", i.e. "{array-like 1d, array of 1s and 0s, sparse matrix of 1s}".
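[Editor's note: for concreteness, one way that wording could render in a docstring. The function name is hypothetical; the phrasing is my reading of the convention under discussion.]

def some_multilabel_metric(y_true, y_pred):
    """Compute a multilabel metric.

    Parameters
    ----------
    y_true : 1d array-like, or label indicator array / sparse matrix
        Ground truth (correct) target values.

    y_pred : 1d array-like, or label indicator array / sparse matrix
        Estimated targets as returned by a classifier.
    """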

Coverage Status

Coverage increased (+0.0%) when pulling 744a7ad on jnothman:sparse_multi_metrics into 0d57c23 on scikit-learn:master.

Owner

arjoly commented Jul 21, 2014

The code is awesome!

Currently, though, we say "array-like or label indicator matrix". What we want to say is something like: "1d array-like, or label indicator array / sparse matrix". I.e. "{array-like 1d,array of 1s and 0s,sparse matrix of 1s}"

If it fits within 80 characters and includes the shape, I am +1 for the improved docstring.

Owner

jnothman commented Jul 24, 2014

Something else I now realise is missing here is sparse support in LRAP

Owner

arjoly commented Jul 25, 2014

Something else I now realise is missing here is sparse support in LRAP

While you are at it, I would be happy to see sparse input support for LRAP.

@jnothman jnothman referenced this pull request Jul 28, 2014

Open

Add sample weight support to more metrics #3450

6 of 7 tasks complete
Owner

jnothman commented Jul 30, 2014

Pushed changes to support sparse matrices in LRAP, and to improve documentation.
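[Editor's note: a usage sketch, assuming (per this PR) that LRAP now accepts a sparse indicator matrix for y_true; the data follows the toy example in the LRAP docstring.]

import numpy as np
import scipy.sparse as sp
from sklearn.metrics import label_ranking_average_precision_score

y_true = sp.csr_matrix(np.array([[1, 0, 0],
                                 [0, 0, 1]]))
y_score = np.array([[0.75, 0.5, 1.0],
                    [1.0, 0.2, 0.1]])
print(label_ranking_average_precision_score(y_true, y_score))  # (1/2 + 1/3)/2 ~ 0.416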

Owner

jnothman commented Jul 30, 2014

Assuming Travis is happy, I think this is where we want it to be. Votes for merge? @arjoly? @GaelVaroquaux?

Coverage Status

Coverage increased (+0.0%) when pulling 15be75e on jnothman:sparse_multi_metrics into 1b2833a on scikit-learn:master.

Owner

arjoly commented Jul 30, 2014

Thanks for the sparse LRAP support!!!

Owner

arjoly commented Jul 30, 2014

You get my +1

@jnothman jnothman changed the title from [MRG] Sparse multilabel target support in metrics to [MRG+1] Sparse multilabel target support in metrics Jul 30, 2014

Owner

jnothman commented Jul 30, 2014

Thanks @arjoly

Owner

arjoly commented Aug 4, 2014

A last reviewer?

Owner

arjoly commented Aug 4, 2014

It should probably be updated to take into account the sample weight support in the jaccard metric.

Owner

jnothman commented Aug 4, 2014

It should probably be updated to take into account the sample weight support in the jaccard metric.

What's to update?

Owner

arjoly commented Aug 5, 2014

Apparently nothing, sorry. I should have checked the implementation.

Owner

arjoly commented Aug 11, 2014

A last reviewer? (randomly pinging @ogrisel, @vene)

@larsmans larsmans merged commit 15be75e into scikit-learn:master Aug 12, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Owner

arjoly commented Aug 12, 2014

Thanks @larsmans and @jnothman !!! :-)
