[MRG+1] Added override of fit_transform to LabelBinarizer #7670
@@ -352,8 +352,7 @@ def fit_transform(self, y): | |||
Y : numpy array or CSR matrix of shape [n_samples, n_classes] | |||
Shape will be [n_samples, 1] for binary problems. | |||
""" | |||
self.fit(y) | |||
return self.transform(y) | |||
return super(fit_transform, self).fit_transform(y) |
I made a mistake here, it should be return super(LabelBinarizer, self).fit_transform(y). I'm going to wait for feedback before committing the change to avoid polluting the commit history in case of additional issues.
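The mistake can be reproduced without scikit-learn. The sketch below uses hypothetical stand-in classes (TransformerMixinSketch, BinarizerBaseSketch, BrokenBinarizer, FixedBinarizer are not part of the library) and assumes only that the base class implements fit_transform as fit followed by transform. Inside a method body the bare name fit_transform is not in scope, so the original line fails with a NameError at call time.

```python
class TransformerMixinSketch:
    """Hypothetical stand-in for a mixin that provides fit_transform."""

    def fit_transform(self, y):
        return self.fit(y).transform(y)


class BinarizerBaseSketch(TransformerMixinSketch):
    """Hypothetical one-hot encoder; not the real LabelBinarizer."""

    def fit(self, y):
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        return [[int(label == c) for c in self.classes_] for label in y]


class BrokenBinarizer(BinarizerBaseSketch):
    def fit_transform(self, y):
        # Class attributes are not visible as bare names inside a method,
        # so looking up fit_transform here raises NameError.
        return super(fit_transform, self).fit_transform(y)


class FixedBinarizer(BinarizerBaseSketch):
    def fit_transform(self, y):
        # The first argument to super() must be the class itself.
        return super(FixedBinarizer, self).fit_transform(y)


try:
    BrokenBinarizer().fit_transform(["a", "b", "c"])
    broken_error = None
except NameError as exc:
    broken_error = exc

fixed_result = FixedBinarizer().fit_transform(["a", "b", "c", "a"])
```

Because the error only appears when the method is called, a static checker such as flake8 is likely to be the first thing to flag it as an undefined name.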
Hi, I suppose this is the intended behaviour, since the main issue was making the docstring relevant to LabelBinarizer. So it would probably be okay to make the change, since it seems to be the cause of the failure. Also, the flake8 tests are failing; maybe there are some PEP 8 issues too.
I'm not sure, but I think flake8 may fail if compile fails. I haven't used it very much. I get a connection error when I try to access the travis-ci report, so I can't look at it. That's what I get for making a change directly on GitHub.
Another potential issue is that I'm not directly matching the function signature from the base class in LabelBinarizer. The code should still work without doing that, but there are varied opinions on whether or not it's an acceptable thing to do. I'll leave it to you all to decide what you want me to do.
you should certainly be able to view the errors here: https://travis-ci.org/scikit-learn/scikit-learn/jobs/167694801 |
Y : numpy array or CSR matrix of shape [n_samples, n_classes] | ||
Shape will be [n_samples, 1] for binary problems. | ||
""" | ||
return super(fit_transform, self).fit_transform(y) |
super(LabelBinarizer, self)
You could also return(self.fit(y).transform(y)) though, if you like.
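The chained form works only because fit returns self. A minimal sketch with a hypothetical class (ChainableBinarizerSketch is illustrative and does not reproduce LabelBinarizer's actual output format) shows that chaining gives the same result as calling the two methods separately:

```python
class ChainableBinarizerSketch:
    """Hypothetical one-hot encoder used only to illustrate chaining."""

    def fit(self, y):
        self.classes_ = sorted(set(y))
        return self  # returning self is what makes fit(y).transform(y) possible

    def transform(self, y):
        return [[int(label == c) for c in self.classes_] for label in y]

    def fit_transform(self, y):
        return self.fit(y).transform(y)


labels = ["a", "b", "c", "a"]

# Two separate calls on one instance.
two_step = ChainableBinarizerSketch()
two_step.fit(labels)
separate = two_step.transform(labels)

# The chained form suggested in the review.
chained = ChainableBinarizerSketch().fit_transform(labels)
```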
do you want to do the same for the other classes in that file? |
@amueller Sure, I'll update the other class as well. Did you have an opinion about not matching the method signature to the base class? I think I had a firewall issue with travis-ci on my side. I'm on a different network now and it's working fine. Thanks! |
Oh you mean because the base class has |
No problem, I'll leave it as is. I surfed through a few StackOverflow articles where people were arguing about it. Do you want me to standardize the |
I'm not entirely sure I understand your last point. Why is calling the base class more maintainable? I don't have strong feelings either way, though. |
The only angle I had on that, was that if there were a bug later on with Right now, it's probably best to go with using |
Both |
great thanks |
@kgilliam125 Are you working on this now? |
@dalmia From my end this one is finished unless I have any more feedback. I'm just waiting for it to be approved and merged. |
Parameters | ||
---------- | ||
y : numpy array or sparse matrix of shape (n_samples,) or | ||
(n_samples, n_classes) Target values. The 2-d matrix should only |
The first line after : is treated specially. You can't continue it here at this indentation.
You can drop "numpy"
I think I might be missing what you mean... the indent level looks the same as in the other function definitions.
I'll drop numpy though.
the first line is treated specially. You've let it wrap onto the second. I think the first line can be continued, but not like this. Try rendering the docs and seeing how it looks
---------- | ||
y : numpy array or sparse matrix of shape (n_samples,) or | ||
(n_samples, n_classes) Target values. The 2-d matrix should only | ||
contain 0 and 1, represents multilabel classification. Sparse |
", represents" -> ", and represents"
|
What about
|
I'm okay with that. Elsewhere I think we've used:
for wrapping the type spec. I can't remember if it renders correctly, but it's not hard to check. |
Didn't render correctly for me; it wrapped to the next line like before. I can make it work using a line continuation though.
I don't think that will cause issues with Python since it's in a docstring. Thoughts? |
If that works in rendering I'm fine with it.
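For context on why the line continuation is harmless to Python: in a regular (non-raw) string literal, a backslash immediately followed by a newline is dropped, so the two source lines become one line in the stored docstring. The sketch below uses a hypothetical function (fit_transform_sketch) purely to inspect __doc__; how the documentation tool then renders that single line is not verified here.

```python
def fit_transform_sketch(y):
    """Hypothetical function used only to inspect the stored docstring.

    Parameters
    ----------
    y : array or sparse matrix of shape (n_samples,) or \
(n_samples, n_classes)
        Target values.
    """


# Find the parameter's type line as Python actually stores it.
type_line = next(
    line.strip()
    for line in fit_transform_sketch.__doc__.splitlines()
    if line.strip().startswith("y :")
)
print(type_line)
```

The continuation line starts at column zero here, so nothing extra is inserted between "or" and the second shape.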
…On 25 November 2016 at 13:17, Kyle Gilliam ***@***.***> wrote:
Didn't render correctly for me; it wrapped to the next line like before. I
can make it work using a line continuation though.
y : array or sparse matrix of shape (n_classes,) \
or (n_samples, n_classes)
Target Values. ...
I don't think that will cause issues with Python since it's in a
docstring. Thoughts?
|
contain 0 and 1, represents multilabel classification. Sparse | ||
matrix can be CSR, CSC, COO, DOK, or LIL. | ||
y : array or sparse matrix of shape (n_samples,) or \ | ||
(n_samples, n_classes) |
please indent this. It'll render just the same.
Parameters | ||
---------- | ||
y : array or sparse matrix of shape (n_samples,) or \ | ||
(n_samples, n_classes) |
please indent this
iirc it will actually change the rendering, but I'm still +1 on indenting. We should ask at numpydoc how this is supposed to work ^^
Agree that it looks better if it's at the same indent level as the preceding text. My concern is that it renders with spaces between or and (n_samples, n_classes). I'll check it when I get home tonight.
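The concern about extra spaces can be checked at the string level. When the continuation line is indented, its leading whitespace becomes part of the joined line in __doc__. The sketch below uses a hypothetical function (indented_sketch) and then collapses runs of whitespace the way HTML normally does; whether the documentation toolchain produces exactly this output is an assumption that would need checking against a real docs build.

```python
import re


def indented_sketch(y):
    """Hypothetical function with an indented continuation line.

    Parameters
    ----------
    y : array or sparse matrix of shape (n_samples,) or \
        (n_samples, n_classes)
        Target values.
    """


type_line = next(
    line.strip()
    for line in indented_sketch.__doc__.splitlines()
    if line.strip().startswith("y :")
)

# The indentation of the continuation line survives inside the joined line.
has_extra_spaces = "or  " in type_line

# Collapsing whitespace runs mimics how HTML treats consecutive spaces.
collapsed = re.sub(r"\s+", " ", type_line)
```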
Otherwise LGTM |
I'm ok with how it is right now though I'd appreciate the cosmetic changes I suggested 👍
"""Fit label binarizer and transform multi-class labels to binary | ||
labels. | ||
|
||
The output of transform is sometimes referred to by some authors as |
I think you can remove the "by some authors". That seems implicit in "sometimes"...
|
||
Returns | ||
------- | ||
Y : array or CSR matrix of shape [n_samples, n_classes] |
you used tuples / parentheses everywhere else.
I standardized my usage to always describe shapes using brackets. There was one other place (the fit function) in LabelBinarizer that used parens in a similar fashion, so I changed it to use brackets as well.
additional spaces should be insignificant in rendering in either TeX or
HTML.
…On 1 December 2016 at 08:50, Kyle Gilliam ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/preprocessing/label.py
<#7670>:
> @@ -304,18 +304,41 @@ def fit(self, y):
self.classes_ = unique_labels(y)
return self
+ def fit_transform(self, y):
+ """Fit label binarizer and transform multi-class labels to binary
+ labels.
+
+ The output of transform is sometimes referred to by some authors as
+ the 1-of-K coding scheme.
+
+ Parameters
+ ----------
+ y : array or sparse matrix of shape (n_samples,) or \
+(n_samples, n_classes)
Agree that it looks better if it's at the same indent level as the
preceding text. My concern is that it renders with spaces between or and (n_samples,
n_classes). I'll check it when I get home tonight.
|
So is that LGTM from you, @amueller? |
yeah LGTM |
…rn#7670) * Added override of fit_transform to LabelBinarizer * Updated fit_transform to call base class method * Changed fit_transform for code consistency * Removed whitespace on blank lines * Fixed line wrap issues for doc gen. * Used line cont. for term defs * Standardized bracket usage, fixed line cont. indent level
Reference Issue
#7238
What does this implement/fix? Explain your changes.
Added an override of the fit_transform method to the LabelBinarizer class and provided a new docstring based on the fit and transform docstrings.
Any other comments?
To keep the original behavior, I'm assuming I should call the base class fit_transform method? Let me know if you want it to do something else.
Also, I haven't had a chance to run tests on the change yet, but will do that when I get home this afternoon.