[MRG+2] MultiOutputClassifier #6127
Thanks a lot for the PR. Could you remove the main block from the test file? All tests are run by nose.
Could you change the docstrings of all `test_*` functions to comments, please?
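For illustration, here is a hypothetical test written in the requested style. The test name and data are assumptions, and it uses the released `sklearn.multioutput.MultiOutputClassifier` rather than the PR's code. Its description lives in a comment instead of a docstring; as far as I know, the usual motivation is that nose's verbose output shows a test's docstring in place of its function name. The file also has no `if __name__ == "__main__":` block, because the runner collects `test_*` functions on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier


def test_multi_output_predict_shape():
    # Check that predict returns one column per output.
    # (A comment, not a docstring, so the runner reports the function name.)
    X, y1 = make_classification(n_samples=50, random_state=0)
    y = np.column_stack((y1, 1 - y1))  # two binary outputs
    clf = MultiOutputClassifier(LogisticRegression()).fit(X, y)
    assert clf.predict(X).shape == (50, 2)


# No `if __name__ == "__main__":` block: the test runner collects
# test_* functions itself.
```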
@rvraghav93, I just had a chat with @hugobowne about making these changes and fixing the unit test failures. I might submit a PR soon, if that's OK?
You mean a PR to @hugobowne's branch, right? Raising another PR to scikit-learn is not necessary :)
Also, a few points: the file MultiOneRest is empty? This PR seems to contain only the tests. And I think the preferred filename would be … Please see if this approach could be followed instead?
I suppose the file MultiOneRest.py was committed as an executable and hence shows up in the diff as empty. I had faced this issue before :)
Ah, that's new to me!
Wacky issue! @rvraghav93, I think the preferred filename should be multi_one_vs_rest.py or something along those lines, as this PR deals specifically with '[one-versus-all] classification models'; in particular, it doesn't deal with regressors at all. Moreover, it should probably be generalized to deal with all classification models (this will be an easy extension). @rvraghav93, we had completed it before you suggested your approach. After fixing all necessary issues, I suggest we (i) generalize it to deal with all classification models and (ii) leave regressors for a different PR. Thoughts?
Yes, like @mblondel suggests here, we should have
Indeed. A regression meta-estimator could be done in a separate PR. And thanks for your patience!
sklearn/MultiOneVsRest.py (outdated)
I assume you did not mean to commit an empty file :P
@MechCoder I definitely didn't! The diff is empty for some wacky reason, but the file is not. Can you confirm this?
Try:
- `chmod -x`
- Add the file again and commit
- `git config core.fileMode false`
- Add the file again and commit, if needed (I think you won't need to)
- Squash all the commits
- Force push
Or you could just copy the code over to multioutput.py, remove this file and force push, because that is what you are ultimately going to do anyway.
Indeed :P
My thinking exactly, @MechCoder.
@hugobowne I've changed the title to WIP. Let us know once you finish up the
@hugobowne Thanks a lot for letting me know about this. I did pull the code from your branch, and I would be happy to help in any way I can. One question: even if I add commits to continue working on this PR, I would need to push them to this branch, and I don't have the access rights for that. Any help is appreciated! Thanks.
I have made some simple modifications and refactored the code a little. It is on this branch. Please do take a look, though I haven't added any new functionality. Thanks.
Thanks for the patience, all. A quick note on workflow: perhaps @MechCoder or @rvraghav93 could suggest best practice given the following. I won't have much time to contribute in the upcoming weeks, and @maniteja123 is going to work on the MultiOutputClassifier. In this case, is it (i) best for him to open PRs to my branch, or (ii) should I give him collaborator access to my branch so that I don't need to merge etc. (in which case this may all move more quickly)? Is there a common practice for this?
The common way to do this is a PR to your branch, as you suggested. But if you don't mind giving him access to your repository, you can go ahead, as it would indeed speed things up :)
I agree.
On Sat, Jan 9, 2016 at 10:57 PM, Raghav R V notifications@github.com
Manoj,
Hi all. I have just now merged @james-nichols's PR into my branch. I then tried to squash the commits but think I may have completely bungled it; I used this as a guide: http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html Thoughts? @rvraghav93 @MechCoder @maniteja123: I have given you collaborator rights to my sklearn fork, so please feel free to work on the branch. I would suggest that you shoot me an email when working on it, and I will do the same. My collaborator on the code, @MrChristophRivera, can also field questions when I'm unable to.
(force-pushed ba0db84 to bae109a)
OK, I just attempted to squash again. Let me know how it's looking. Apologies for the rookie errors!
sklearn/multioutput.py (outdated)
This paragraph looks great, but I think it belongs in an example rather than here.
I've left it here for now. Will make a note of it.
sklearn/tests/test_multioutput.py (outdated)
```python
forest_.fit(X, y[:, i])
assert_equal(list(forest_.predict(X)), list(predictions[:, i]))
assert_almost_equal(list(forest_.predict_proba(X)),
                    list(predict_proba[:, :, i]), decimal=1)
```
`decimal=1` is small; can't you go further? Can you get the exact result with an appropriate `random_state`?
I have changed it to `assert_array_equal` now, and the test passes.
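The reason an exact comparison can work: with an integer `random_state`, the meta-estimator's per-output clone and a separately fitted clone see the same seed and the same data, so they should build identical forests. A minimal sketch of that check, assuming the released scikit-learn API. The dataset, seeds and `n_estimators` are arbitrary choices. The snippet under review indexes `predict_proba[:, :, i]` as a 3-D array, whereas released `MultiOutputClassifier.predict_proba` returns a list with one `(n_samples, n_classes)` array per output, so this sketch indexes `probas[i]`.

```python
import numpy as np
from numpy.testing import assert_array_equal
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils import shuffle

# Three 3-class targets: the original labels plus two shuffled copies.
X, y1 = make_classification(n_samples=100, n_features=20, n_informative=10,
                            n_classes=3, random_state=1)
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
y = np.column_stack((y1, y2, y3))

# A fixed integer seed makes every clone of the forest deterministic.
forest = RandomForestClassifier(n_estimators=10, random_state=1)
multi = MultiOutputClassifier(forest).fit(X, y)
predictions = multi.predict(X)    # shape (n_samples, n_outputs)
probas = multi.predict_proba(X)   # list: one (n_samples, n_classes) per output

for i in range(y.shape[1]):
    forest_ = clone(forest)
    forest_.fit(X, y[:, i])
    # Exact equality instead of assert_almost_equal(..., decimal=1).
    assert_array_equal(forest_.predict(X), predictions[:, i])
    assert_array_equal(forest_.predict_proba(X), probas[i])
```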
@TomDLT I have made all the changes. I also went through the whole code looking for errors in the documentation or tests. Hopefully I have addressed all the comments.
sklearn/multioutput.py (outdated)
```python
def fit(self, X, y, sample_weight=None):
    """ Fit the model to data.
    Fits a seperate model for each output variable.
    Fit a seperate model for each output variable.
```
separate
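The docstring above states the core of the meta-estimator: `fit` trains a separate model for each output column. Here is a minimal sketch of that idea. It is not the code merged in this PR, and the class name `SimpleMultiOutputClassifier` is hypothetical. It clones the base estimator once per column of `y`, forwards `sample_weight` only when one is given, and stacks the per-column predictions.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LogisticRegression


class SimpleMultiOutputClassifier(BaseEstimator):
    """Hypothetical minimal meta-estimator: one cloned classifier per output."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        """Fit a separate model for each output variable."""
        y = np.asarray(y)
        if y.ndim != 2:
            raise ValueError("y must have shape (n_samples, n_outputs)")
        self.estimators_ = []
        for i in range(y.shape[1]):
            # Fresh, unfitted copy of the base estimator for output column i.
            est = clone(self.estimator)
            if sample_weight is None:
                est.fit(X, y[:, i])
            else:
                est.fit(X, y[:, i], sample_weight=sample_weight)
            self.estimators_.append(est)
        return self

    def predict(self, X):
        # Stack per-output predictions into shape (n_samples, n_outputs).
        return np.column_stack([est.predict(X) for est in self.estimators_])


rng = np.random.RandomState(0)
X = rng.randn(60, 4)
# Two binary targets, each driven by a different feature.
y = np.column_stack([(X[:, 0] > 0).astype(int), (X[:, 1] > 0).astype(int)])
clf = SimpleMultiOutputClassifier(LogisticRegression()).fit(X, y)
print(clf.predict(X).shape)  # (60, 2)
```

Cloning, rather than refitting one shared instance, keeps each output's fitted state independent and leaves the user's `estimator` parameter untouched.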
Can you squash into 2 commits?
Yeah, I should do something like
Yes, at the end you should have only two commits: @hugobowne's work and yours.
Just a doubt: when I rebase with
Just squash your last 12 into one: `git rebase -i HEAD~12`
I also have one local commit, so it should be 13, right?
(force-pushed 2ea42ca to 2c8dd4e)
Yes.
@TomDLT Please merge if you are happy! Thanks!
This looks really good to me! Just one detail:
Actually not for `META_ESTIMATORS`, but I am not sure whether we should add it to the common tests or to `test_multioutput.py`.
@maniteja123 Could you just add a test to check for `NotFittedError`?
(force-pushed 2c8dd4e to 29ee54a)
@MechCoder I added a simple test for `NotFittedError` when `predict`, `predict_proba` and `score` are called.
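A test of that kind might look like the sketch below. It is not the test from the PR. It assumes the released `sklearn.exceptions.NotFittedError` and uses a plain `try`/`except` rather than whichever assertion helpers the PR used. The helper name `check_not_fitted_error` and the data are hypothetical. It calls each method on a meta-estimator that was never fitted and records which ones raise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.multioutput import MultiOutputClassifier


def check_not_fitted_error():
    # Return the names of the methods that raised NotFittedError.
    X, y1 = make_classification(n_samples=20, random_state=0)
    y = np.column_stack((y1, 1 - y1))  # two outputs; only score() uses y
    # Deliberately never call fit() on this meta-estimator.
    moc = MultiOutputClassifier(
        RandomForestClassifier(n_estimators=5, random_state=0))
    calls = {
        "predict": lambda: moc.predict(X),
        "predict_proba": lambda: moc.predict_proba(X),
        "score": lambda: moc.score(X, y),
    }
    raised = []
    for name, call in calls.items():
        try:
            call()
        except NotFittedError:
            raised.append(name)
    return raised


print(check_not_fitted_error())
```

All three method names should be listed. A method missing from the list would mean it failed in some other way (or silently succeeded) before fitting.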
Merging with master. Thanks for your perseverance! 🍷 🍷
Thanks @maniteja123 and @hugobowne
We forgot to update whatsnew.rst for this. Could you do that?
Yeah, sure. Shall I push it to this branch itself?
And thank you so, so much @MechCoder @rvraghav93 @TomDLT and everyone else for all the help and for patiently bearing with my doubts, and sincere thanks to @hugobowne for letting me work on this. Sorry again for taking up so much of your time reviewing this multiple times.
Yes, please push it here. I'll cherry-pick it.
@MechCoder Sorry for the delay. This is the commit
TODO for this PR