[MRG+1] Added override of fit_transform to LabelBinarizer #7670

kgilliam125 · 2016-10-14T16:37:41Z

Reference Issue

What does this implement/fix? Explain your changes.

Added an override of the fit_transform method to the LabelBinarizer class and provided a
new docstring based on the fit and transform docstrings.

Any other comments?

To keep the original behavior, I'm assuming I should call the base class
fit_transform method? Let me know if you want it to do something else.

Also, I haven't had a chance to run tests on the change yet, but will do that when
I get home this afternoon.

kgilliam125 · 2016-10-14T17:18:41Z

sklearn/preprocessing/label.py

@@ -352,8 +352,7 @@ def fit_transform(self, y):
        Y : numpy array or CSR matrix of shape [n_samples, n_classes]
            Shape will be [n_samples, 1] for binary problems.
        """
-        self.fit(y)
-        return self.transform(y)
+        return super(fit_transform, self).fit_transform(y)


I made a mistake here: it should be return super(LabelBinarizer, self).fit_transform(y). I'm going to wait for feedback before committing the change, to avoid polluting the commit history in case there are additional issues.

Hi, I suppose this is the intended behaviour, since the main issue was making the docstring relevant to LabelBinarizer. So it would probably be okay to make the change, since it seems to be the cause of the failure. Also, the flake8 tests are failing. Maybe there are some PEP 8 issues too.

I'm not sure, but I think flake8 may fail if compile fails. I haven't used it very much. I get a connection error when I try to access the travis-ci report, so I can't look at it. That's what I get for making a change directly on GitHub.

Another potential issue is that I'm not directly matching the function signature from the base class in LabelBinarizer. The code should still work without doing that, but there are varied opinions on whether or not it's an acceptable thing to do. I'll leave it to you all to decide what you want me to do.

amueller · 2016-10-14T20:13:38Z

you should certainly be able to view the errors here: https://travis-ci.org/scikit-learn/scikit-learn/jobs/167694801

amueller · 2016-10-14T20:14:52Z

sklearn/preprocessing/label.py

+        Y : numpy array or CSR matrix of shape [n_samples, n_classes]
+            Shape will be [n_samples, 1] for binary problems.
+        """
+        return super(fit_transform, self).fit_transform(y)


super(LabelBinarizer, self)

You could also return(self.fit(y).transform(y)) though, if you like.
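The two spellings discussed in this thread can be compared with a minimal sketch. The classes below are hypothetical stand-ins written for illustration (they are not scikit-learn's TransformerMixin or LabelBinarizer), and the base method is assumed to simply chain fit and transform:

```python
class MixinSketch:
    """Hypothetical stand-in for the base class; not scikit-learn's code."""

    def fit_transform(self, X, y=None):
        # Assumed behaviour: fit, then transform the same data.
        return self.fit(X).transform(X)


class BinarizerSketch(MixinSketch):
    """Toy binarizer used only to exercise the two override styles."""

    def fit(self, y):
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        # One column per class, 1 where the label matches that class.
        return [[int(label == c) for c in self.classes_] for label in y]

    def fit_transform(self, y):
        # The first argument to super() is the class, not the method name.
        return super(BinarizerSketch, self).fit_transform(y)

    def fit_transform_chained(self, y):
        # Alternative from the review: chain fit and transform directly.
        return self.fit(y).transform(y)


class BrokenSketch(BinarizerSketch):
    def fit_transform(self, y):
        # Mirrors the original diff: the method name is not a name in scope.
        return super(fit_transform, self).fit_transform(y)


labels = ["a", "b", "c", "a"]
via_super = BinarizerSketch().fit_transform(labels)
via_chain = BinarizerSketch().fit_transform_chained(labels)

try:
    BrokenSketch().fit_transform(labels)
    broken_error = None
except NameError as exc:
    broken_error = type(exc).__name__
```

Under these assumptions both working variants return the same rows, while the version that passes the method name to super() fails with a NameError as soon as it is called.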

amueller · 2016-10-14T20:16:07Z

do you want to do the same for the other classes in that file?

kgilliam125 · 2016-10-14T20:18:45Z

@amueller Sure, I'll update the other class as well. Did you have an opinion about not matching the method signature to the base class?

I think I had a firewall issue with travis-ci on my side. I'm on a different network now and it's working fine. Thanks!

amueller · 2016-10-14T20:21:39Z

Oh you mean because the base class has fit_transform(X, y=None)? I would argue that's really odd and your change is good. Please add an entry into whatsnew that documents your change.
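As a rough illustration of the signature question, the sketch below uses two hypothetical classes (not scikit-learn's) to show what narrowing the override from (X, y=None) to (y) changes for callers. The class and variable names are assumptions made for this example only:

```python
import inspect


class BaseSketch:
    """Hypothetical base with the two-argument signature."""

    def fit_transform(self, X, y=None):
        return X


class NarrowedSketch(BaseSketch):
    """Hypothetical subclass whose override accepts only the labels."""

    def fit_transform(self, y):
        return y


base_params = list(inspect.signature(BaseSketch.fit_transform).parameters)
narrow_params = list(inspect.signature(NarrowedSketch.fit_transform).parameters)

# Generic code that passes both X and y no longer works with the override.
try:
    NarrowedSketch().fit_transform([0, 1], [0, 1])
    two_arg_call_ok = True
except TypeError:
    two_arg_call_ok = False
```

The single-argument call keeps working; only callers that pass a second positional argument are affected, which is the trade-off being weighed here.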

kgilliam125 · 2016-10-14T20:23:48Z

No problem, I'll leave it as is. I surfed through a few StackOverflow articles where people were arguing about it.

Do you want me to standardize the fit_transform overrides in this file? Since they just do self.fit(y).transform(y) it seems to me that calling the base class method would be more maintainable if the change is just for a docstring override.

amueller · 2016-10-14T20:28:56Z

I'm not entirely sure I understand your last point. Why is calling the base class more maintainable? I don't have strong feelings either way, though.

kgilliam125 · 2016-10-14T20:34:38Z

The only angle I had on that was that if there were a bug later on in fit_transform, then using super... to call the function would allow the bug to be fixed in one place rather than in the various places where self.fit(...).transform(...) has been called. Granted, since the base class fit_transform doesn't do very much outside of calling fit and transform, it probably doesn't matter.

Right now, it's probably best to go with using self.fit(...).transform(...) throughout so that the code stays consistent.

kgilliam125 · 2016-10-14T20:58:58Z

Both LabelEncoder and MultiLabelBinarizer already had overrides for fit_transform, so I left them unchanged.

amueller · 2016-10-14T21:00:10Z

great thanks

dalmia · 2016-11-12T00:58:24Z

@kgilliam125 Are you working on this now?

kgilliam125 · 2016-11-12T02:42:28Z

@dalmia From my end this one is finished unless I have any more feedback. I'm just waiting for it to be approved and merged.

jnothman · 2016-11-16T11:36:04Z

sklearn/preprocessing/label.py

+        Parameters
+        ----------
+        y : numpy array or sparse matrix of shape (n_samples,) or
+            (n_samples, n_classes) Target values. The 2-d matrix should only


The first line after : is treated specially. You can't continue it here at this indentation.

You can drop "numpy"

I think I might be missing what you mean... the indent level looks the same as in the other function definitions.

I'll drop numpy though.

the first line is treated specially. You've let it wrap onto the second. I think the first line can be continued, but not like this. Try rendering the docs and seeing how it looks

jnothman · 2016-11-16T11:37:02Z

sklearn/preprocessing/label.py

+        ----------
+        y : numpy array or sparse matrix of shape (n_samples,) or
+            (n_samples, n_classes) Target values. The 2-d matrix should only
+            contain 0 and 1, represents multilabel classification. Sparse


", represents" -> ", and represents"

jnothman · 2016-11-25T01:00:54Z


./sklearn/preprocessing/label.py:315:80: E501 line too long (82 > 79 characters)
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                               ^
./sklearn/preprocessing/label.py:315:83: W291 trailing whitespace
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                                  ^
./sklearn/preprocessing/label.py:316:80: E501 line too long (85 > 79 characters)
            Target values. The 2-d matrix should only contain 0 and 1, and represents
                                                                               ^
./sklearn/preprocessing/label.py:316:86: W291 trailing whitespace
            Target values. The 2-d matrix should only contain 0 and 1, and represents
                                                                                     ^
./sklearn/preprocessing/label.py:317:80: E501 line too long (87 > 79 characters)
            multilabel classification. Sparse matrix can be CSR, CSC, COO, DOK, or LIL.
                                                                               ^
./sklearn/preprocessing/label.py:334:80: E501 line too long (82 > 79 characters)
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                               ^
./sklearn/preprocessing/label.py:334:83: W291 trailing whitespace
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                                  ^

kgilliam125 · 2016-11-25T01:11:41Z

What about

y : array or sparse matrix
    Target Values. ...
    ... or LIL. Shape must be (n_samples,) or
    (n_samples, n_classes).

jnothman · 2016-11-25T01:16:13Z

I'm okay with that. Elsewhere I think we've used:

        y : array or sparse matrix of shape (n_samples,) or
            (n_samples, n_classes)
        Description

for wrapping the type spec. I can't remember if it renders correctly, but it's not hard to check.

kgilliam125 · 2016-11-25T02:17:10Z

Didn't render correctly for me; it wrapped to the next line like before. I can make it work using a line continuation though.

    y : array or sparse matrix of shape (n_classes,) \
or (n_samples, n_classes)
        Target Values. ...

I don't think that will cause issues with Python since it's in a docstring. Thoughts?
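A small sketch of why the backslash is harmless to Python: in a regular (non-raw) string literal, a backslash followed by a newline is a line continuation, so both characters are dropped and the docstring holds the type spec on a single line. The function name below is hypothetical and the docstring is abbreviated for illustration:

```python
def fit_transform_sketch(y):
    """Hypothetical docstring used only to show the continuation.

    Parameters
    ----------
    y : array or sparse matrix of shape (n_samples,) or \
(n_samples, n_classes)
        Target values.
    """
    return y


doc_lines = fit_transform_sketch.__doc__.splitlines()
# Pick out the parameter line that starts with "y :".
type_line = [line for line in doc_lines if line.lstrip().startswith("y :")][0]
```

Here type_line contains the whole shape description on one line, which is what the documentation tooling then receives. This relies on the docstring not being a raw string; in an r"""...""" literal the backslash would be kept.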

jnothman · 2016-11-25T02:36:58Z

If that works in rendering I'm fine with it.


jnothman · 2016-11-30T03:05:22Z

sklearn/preprocessing/label.py

-            contain 0 and 1, represents multilabel classification. Sparse
-            matrix can be CSR, CSC, COO, DOK, or LIL.
+        y : array or sparse matrix of shape (n_samples,) or \
+(n_samples, n_classes)


please indent this. It'll render just the same.

jnothman · 2016-11-30T03:05:59Z

sklearn/preprocessing/label.py

+        Parameters
+        ----------
+        y : array or sparse matrix of shape (n_samples,) or \
+(n_samples, n_classes)


please indent this

iirc it will actually change the rendering, but I'm still +1 on indenting. We should ask at numpydoc how this is supposed to work ^^

Agree that it looks better if it's at the same indent level as the preceding text. My concern is that it renders with spaces between or and (n_samples, n_classes). I'll check it when I get home tonight.
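The concern about extra spaces can be checked directly on the string. The sketch below uses a hypothetical function with an indented continuation line: the leading whitespace of that line stays in the docstring, and collapsing runs of whitespace (done by hand here, as an assumption about what a renderer effectively does) gives back the compact form:

```python
def indented_sketch(y):
    """Hypothetical docstring with an indented continuation line.

    y : array or sparse matrix of shape (n_samples,) or \
        (n_samples, n_classes)
    """
    return y


type_line = next(
    line
    for line in indented_sketch.__doc__.splitlines()
    if line.lstrip().startswith("y :")
)

# Collapse every run of whitespace to a single space.
collapsed = " ".join(type_line.split())
```

So the raw docstring does carry the extra spaces after "or"; whether they show up in the built documentation depends on the renderer, which is worth confirming by building the docs.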

jnothman · 2016-11-30T03:06:14Z

Otherwise LGTM

amueller

I'm ok with how it is right now though I'd appreciate the cosmetic changes I suggested 👍

amueller · 2016-11-30T20:37:17Z

sklearn/preprocessing/label.py

+        """Fit label binarizer and transform multi-class labels to binary
+        labels.
+
+        The output of transform is sometimes referred to by some authors as


I think you can remove the "by some authors". That seems implicit in "sometimes"...

amueller · 2016-11-30T20:37:49Z

sklearn/preprocessing/label.py

+
+        Returns
+        -------
+        Y : array or CSR matrix of shape [n_samples, n_classes]


you used tuples / parentheses everywhere else.

I standardized my usage to always describe shapes using brackets. There was one other place (the fit function) in LabelBinarizer that used parens in a similar fashion, so I changed it to use brackets as well.

jnothman · 2016-11-30T23:00:18Z

additional spaces should be insignificant in rendering in either TeX or HTML.


jnothman · 2016-12-05T03:25:21Z

So is that LGTM from you, @amueller?

amueller · 2016-12-06T21:02:42Z

yeah LGTM

…rn#7670)

* Added override of fit_transform to LabelBinarizer
* Updated fit_transform to call base class method
* Changed fit_transform for code consistency
* Removed whitespace on blank lines
* Fixed line wrap issues for doc gen.
* Used line cont. for term defs
* Standardized bracket usage, fixed line cont. indent level

kgilliam125 added 2 commits October 14, 2016 10:29

Added override of fit_transform to LabelBinarizer

9c7001d

Updated fit_transform to call base class method

655e00c

kgilliam125 commented Oct 14, 2016


amueller reviewed Oct 14, 2016


Changed fit_transform for code consistency

2b5bb7b

amueller changed the title ~~[WIP] Added override of fit_transform to LabelBinarizer~~ [MRG] Added override of fit_transform to LabelBinarizer Oct 14, 2016

Removed whitespace on blank lines

0b44729

jnothman added the Documentation label Nov 14, 2016

jnothman requested changes Nov 16, 2016


jnothman reviewed Nov 16, 2016


Fixed line wrap issues for doc gen.

a203c4a

jnothman approved these changes Nov 25, 2016


Used line cont. for term defs

2f7dd7f

jnothman requested changes Nov 30, 2016


jnothman changed the title ~~[MRG] Added override of fit_transform to LabelBinarizer~~ [MRG+1] Added override of fit_transform to LabelBinarizer Nov 30, 2016

amueller approved these changes Nov 30, 2016


Standardized bracket usage, fixed line cont. indent level

c9d75d8

amueller merged commit d39c273 into scikit-learn:master Dec 6, 2016

qinhanmin2014 mentioned this pull request Nov 11, 2017

LabelBinarizer fit_transform docstring is confusing #7238

Closed
