Skip to content

Conversation

kgilliam125
Copy link
Contributor

@kgilliam125 kgilliam125 commented Oct 14, 2016

Reference Issue

#7238

What does this implement/fix? Explain your changes.

Added an override of the fit_transform method to the LabelBinarizer class and provided a
new docstring based on the fit and transform docstrings.

Any other comments?

To keep the original behavior, I'm assuming I should call the base class
fit_transform method? Let me know if you want it to do something else.

Also, I haven't had a chance to run tests on the change yet, but will do that when
I get home this afternoon.

@@ -352,8 +352,7 @@ def fit_transform(self, y):
Y : numpy array or CSR matrix of shape [n_samples, n_classes]
Shape will be [n_samples, 1] for binary problems.
"""
self.fit(y)
return self.transform(y)
return super(fit_transform, self).fit_transform(y)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a mistake here, it should be return super(LabelBinarizer, self).fit_transform(y) . I'm going to wait for feedback before committing the change to avoid polluting the commit history in case of additional issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I suppose this is the intended behaviour since the main issue was the doc string to be relevant to Label Binarizer. So it would probably be okay to make the change since it seems the cause of the failure. Also flake8 tests is failing. Maybe there is some pep issues too.

Copy link
Contributor Author

@kgilliam125 kgilliam125 Oct 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I think flake8 may fail if compile fails. I haven't used it very much. I get a connection error when I try to access the travis-ci report, so I can't look at it. That's what I get for making a change directly on GitHub.

Another potential issue is that I'm not directly matching the function signature from the base class in LabelBinarizer. The code should still work without doing that, but there are varied opinions on whether or not it's an acceptable thing to do. I'll leave it to you all to decide what you want me to do.

@amueller
Copy link
Member

you should certainly be able to view the errors here: https://travis-ci.org/scikit-learn/scikit-learn/jobs/167694801

Y : numpy array or CSR matrix of shape [n_samples, n_classes]
Shape will be [n_samples, 1] for binary problems.
"""
return super(fit_transform, self).fit_transform(y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super(LabelBinarizer, self)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also return(self.fit(y).transform(y)) though, if you like.

@amueller
Copy link
Member

do you want to do the same for the other classes in that file?

@kgilliam125
Copy link
Contributor Author

kgilliam125 commented Oct 14, 2016

@amueller Sure, I'll update the other class as well. Did you have an opinion about not matching the method signature to the base class?

I think I had a firewall issue with travis-ci on my side. I'm on a different network now and it's working fine. Thanks!

@amueller
Copy link
Member

Oh you mean because the base class has fit_transform(X, y=None)? I would argue that's really odd and your change is good. Please add an entry into whatsnew that documents your change.

@kgilliam125
Copy link
Contributor Author

kgilliam125 commented Oct 14, 2016

No problem, I'll leave it as is. I surfed through a few StackOverflow articles where people were arguing about it.

Do you want me to standardize the fit_transform overrides in this file? Since they just do self.fit(y).transform(y) it seems to me that calling the base class method would be more maintainable if the change is just for a docstring override.

@amueller
Copy link
Member

I'm not entirely sure I understand your last point. Why is calling the base class more maintainable? I don't have strong feelings either way, though.

@kgilliam125
Copy link
Contributor Author

kgilliam125 commented Oct 14, 2016

The only angle I had on that, was that if there were a bug later on with fit_transform, then using super... to call the function would allow the bug to be fixed in one place rather than the various places where self.fit(...).transform(...) has been called. Granted, since the base class fit_transform doesn't do very much outside of calling fit and transform, it probably doesn't matter.

Right now, it's probably best to go with using self.fit(...).transform(...) throughout so that the code stays consistent.

@kgilliam125
Copy link
Contributor Author

Both LabelEncoder and MultiLabelBinarizer already had overrides for fit_transform, so I left them unchanged.

@amueller
Copy link
Member

great thanks

@amueller amueller changed the title [WIP] Added override of fit_transform to LabelBinarizer [MRG] Added override of fit_transform to LabelBinarizer Oct 14, 2016
@dalmia
Copy link
Contributor

dalmia commented Nov 12, 2016

@kgilliam125 Are you working on this now?

@kgilliam125
Copy link
Contributor Author

@dalmia From my end this one is finished unless I have any more feedback. I'm just waiting for it to be approved and merged.

Parameters
----------
y : numpy array or sparse matrix of shape (n_samples,) or
(n_samples, n_classes) Target values. The 2-d matrix should only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line after : is treated specially. You can't continue it here at this indentation.

You can drop "numpy"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might be missing what you mean... the indent level looks the same as in the other function definitions.

I'll drop numpy though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first line is treated specially. You've let it wrap onto the second. I think the first line can be continued, but not like this. Try rendering the docs and seeing how it looks

----------
y : numpy array or sparse matrix of shape (n_samples,) or
(n_samples, n_classes) Target values. The 2-d matrix should only
contain 0 and 1, represents multilabel classification. Sparse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

", represents" -> ", and represents"

@jnothman
Copy link
Member


./sklearn/preprocessing/label.py:315:80: E501 line too long (82 > 79 characters)
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                               ^
./sklearn/preprocessing/label.py:315:83: W291 trailing whitespace
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                                  ^
./sklearn/preprocessing/label.py:316:80: E501 line too long (85 > 79 characters)
            Target values. The 2-d matrix should only contain 0 and 1, and represents
                                                                               ^
./sklearn/preprocessing/label.py:316:86: W291 trailing whitespace
            Target values. The 2-d matrix should only contain 0 and 1, and represents
                                                                                     ^
./sklearn/preprocessing/label.py:317:80: E501 line too long (87 > 79 characters)
            multilabel classification. Sparse matrix can be CSR, CSC, COO, DOK, or LIL.
                                                                               ^
./sklearn/preprocessing/label.py:334:80: E501 line too long (82 > 79 characters)
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                               ^
./sklearn/preprocessing/label.py:334:83: W291 trailing whitespace
        y : array or sparse matrix of shape (n_samples,) or (n_samples, n_classes)
                                                                                  ^

@kgilliam125
Copy link
Contributor Author

What about

y : array or sparse matrix
    Target Values. ...
    ... or LIL. Shape must be (n_samples,) or
    (n_samples, n_classes).

@jnothman
Copy link
Member

I'm okay with that. Elsewhere I think we've used:

        y : array or sparse matrix of shape (n_samples,) or
            (n_samples, n_classes)
        Description

for wrapping the type spec. I can't remember if it renders correctly, but it's not hard to check.

@kgilliam125
Copy link
Contributor Author

Didn't render correctly for me; it wrapped to the next line like before. I can make it work using a line continuation though.

    y : array or sparse matrix of shape (n_classes,) \
or (n_samples, n_classes)
        Target Values. ...

I don't think that will cause issues with Python since it's in a docstring. Thoughts?

@jnothman
Copy link
Member

jnothman commented Nov 25, 2016 via email

contain 0 and 1, represents multilabel classification. Sparse
matrix can be CSR, CSC, COO, DOK, or LIL.
y : array or sparse matrix of shape (n_samples,) or \
(n_samples, n_classes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please indent this. It'll render just the same.

Parameters
----------
y : array or sparse matrix of shape (n_samples,) or \
(n_samples, n_classes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please indent this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc it will actually change the rendering, but I'm still +1 on indenting. We should ask at numpydoc how this is supposed to work ^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it looks better if it's at the same indent level as the preceding text. My concern is that it renders with spaces between or and (n_samples, n_classes). I'll check it when I get home tonight.

@jnothman
Copy link
Member

Otherwise LGTM

@jnothman jnothman changed the title [MRG] Added override of fit_transform to LabelBinarizer [MRG+1] Added override of fit_transform to LabelBinarizer Nov 30, 2016
Copy link
Member

@amueller amueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with how it is right now though I'd appreciate the cosmetic changes I suggested 👍

"""Fit label binarizer and transform multi-class labels to binary
labels.

The output of transform is sometimes referred to by some authors as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the "by some authors". That seems implicit in "sometimes"...


Returns
-------
Y : array or CSR matrix of shape [n_samples, n_classes]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you used tuples / parentheses everywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I standardized my usage to always describe shapes using brackets. There was one other place (the fit function) in LabelBinarizer that used parens in a similar fashion, so I changed it to use brackets as well.

@jnothman
Copy link
Member

jnothman commented Nov 30, 2016 via email

@jnothman
Copy link
Member

jnothman commented Dec 5, 2016

So is that LGTM from you, @amueller?

@amueller
Copy link
Member

amueller commented Dec 6, 2016

yeah LGTM

@amueller amueller merged commit d39c273 into scikit-learn:master Dec 6, 2016
sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
…rn#7670)

* Added override of fit_transform to LabelBinarizer

* Updated fit_transform to call base class method

* Changed fit_transform for code consistency

* Removed whitespace on blank lines

* Fixed line wrap issues for doc gen.

* Used line cont. for term defs

* Standardized bracket usage, fixed line cont. indent level
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
…rn#7670)

* Added override of fit_transform to LabelBinarizer

* Updated fit_transform to call base class method

* Changed fit_transform for code consistency

* Removed whitespace on blank lines

* Fixed line wrap issues for doc gen.

* Used line cont. for term defs

* Standardized bracket usage, fixed line cont. indent level
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…rn#7670)

* Added override of fit_transform to LabelBinarizer

* Updated fit_transform to call base class method

* Changed fit_transform for code consistency

* Removed whitespace on blank lines

* Fixed line wrap issues for doc gen.

* Used line cont. for term defs

* Standardized bracket usage, fixed line cont. indent level
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…rn#7670)

* Added override of fit_transform to LabelBinarizer

* Updated fit_transform to call base class method

* Changed fit_transform for code consistency

* Removed whitespace on blank lines

* Fixed line wrap issues for doc gen.

* Used line cont. for term defs

* Standardized bracket usage, fixed line cont. indent level
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…rn#7670)

* Added override of fit_transform to LabelBinarizer

* Updated fit_transform to call base class method

* Changed fit_transform for code consistency

* Removed whitespace on blank lines

* Fixed line wrap issues for doc gen.

* Used line cont. for term defs

* Standardized bracket usage, fixed line cont. indent level
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants