
Conversation

hugobowne
Contributor

TODO for this PR

  • Check if meta of meta works (a few combinations)
  • Check the shapes of the outputs of all methods
  • Compare with multiple runs of single-output estimators
  • Check if the various errors are raised properly (when n_outputs at predict time is greater than (or not equal to?) n_outputs at fit time).
  • Check if NotFittedError is raised (or could we make the general tests take care of that? I think there is a test for meta-estimators.)
  • See if this passes all the meta estimators' tests
  • Make the tests pass ;)
  • Add documentation

@raghavrv
Member

raghavrv commented Jan 7, 2016

Thanks a lot for the PR. Could you remove the main block in the test file, as all tests are run by nose?



def test_multi_target_init_with_random_forest():
''' test if multi_target initilizes correctly
Member

Could you change the docstrings of all test* functions to comments please?

@james-nichols

@rvraghav93, I just had a chat with @hugobowne about doing these changes and fixing the unit test failures. Might submit a PR soon if that's ok...?

@raghavrv
Member

raghavrv commented Jan 7, 2016

You mean a PR to @hugobowne's branch, right? Raising another PR to scikit-learn is not necessary :)

@raghavrv
Member

raghavrv commented Jan 7, 2016

Also a few points -

The file MultiOneRest is empty? This PR seems to have only the tests.

And I think the preferred filename would be multioutput.py

Please see if this approach could be followed instead?

@maniteja123
Contributor

I suppose the file MultiOneRest.py was committed as an executable and hence shows in the diff as empty. I had faced this issue before :)

@raghavrv
Member

raghavrv commented Jan 7, 2016

Ah that's new to me!

@hugobowne
Contributor Author

wacky issue!

@rvraghav93, I think the preferred filename should be multi_one_vs_rest.py or something along these lines, as this PR deals specifically with '[one-versus-all] classification models' --

In particular, it doesn't deal with regressors at all. Moreover, it should probably be generalized to deal with all classification models (this will be an easy extension). @rvraghav93, we had completed it before you suggested your approach.

After fixing all necessary issues, I suggest we (i) generalize to deal with all classification models and (ii) leave regressors for a different PR.

Thoughts?

@raghavrv
Member

raghavrv commented Jan 7, 2016

Moreover, it should probably be generalized to deal with all classification models (this will be an easy extension).

Yes, like @mblondel suggests here, we should have MultiOutputClassifier and MultiOutputRegressor put inside multioutput.py. The MultiOutput* meta-estimators should support any estimator, including but not limited to the OneVsRestClassifier, which is itself a meta-estimator!

leave regressors for a different PR.

Indeed, the regression meta-estimator could be done in a separate PR, and partial_fit support in a subsequent one after that.

Thanks for your patience!
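For orientation only, here is a minimal sketch of how the proposed MultiOutputClassifier meta-estimator might be used once it lives in multioutput.py. The import path, constructor signature and toy data are assumptions based on the discussion above (and on the estimator that was eventually merged), not part of this PR's diff.

# Hypothetical usage sketch of the proposed MultiOutputClassifier meta-estimator.
# Import path and signature are assumptions taken from the discussion above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                    # 100 samples, 5 features
Y = rng.randint(0, 3, size=(100, 2))    # 2 target columns, 3 classes each

# One clone of the base estimator is fitted per column of Y.
clf = MultiOutputClassifier(RandomForestClassifier(random_state=0), n_jobs=1)
clf.fit(X, Y)
print(clf.predict(X).shape)             # expected: (100, 2), one prediction per target column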

@MechCoder MechCoder changed the title added MultiOneVsRest classifier & testing suite (Meta-estimators for multi-output learning #5824) [MRG] added MultiOneVsRest classifier & testing suite (Meta-estimators for multi-output learning #5824) Jan 7, 2016
@@ -0,0 +1 @@
"""MultiOneVsRestClassifier===========================This module includes several classes that extend base estimators to multi-target estimators. Most sklearn estimators use a response matrix to train a target functionwith a single output variable. I.e. typical estimators use the training set X to estimate a target function f(X) that predicts a single Y. The purpose of this class is to extend estimatorsto be able to estimate a series of target functions (f1,f2,f3...,fn)that are trained on a single X predictor matrix to predict a seriesof reponses (y1,y2,y3...,yn)."""#Author: Hugo Bowne-Anderson <hugobowne@gmail.com>#Author: Chris Rivera <chris.richard.rivera@gmail.com>#Author: Michael Williamson#License: BSD 3 clauseimport arrayimport numpy as npimport warningsimport scipy.sparse as spfrom sklearn.base import BaseEstimator, ClassifierMixinfrom sklearn.base import clone, is_classifierfrom sklearn.base import MetaEstimatorMixin, is_regressorfrom sklearn.preprocessing import LabelBinarizerfrom sklearn.metrics.pairwise import euclidean_distancesfrom sklearn.utils import check_random_statefrom sklearn.utils.validation import _num_samplesfrom sklearn.utils.validation import check_consistent_lengthfrom sklearn.utils.validation import check_is_fittedfrom sklearn.externals.joblib import Parallelfrom sklearn.externals.joblib import delayedfrom sklearn.multiclass import OneVsRestClassifierclass MultiOneVsRestClassifier(): """ Converts any classifer estimator into a multi-target classifier estimator. This class fits and predicts a series of one-versus-all models to response matrix Y, which has n_samples and p_target variables, on the predictor Matrix X with n_samples and m_feature variables. This allows for multiple target variable classifications. For each target variable (column in Y), a separate OneVsRestClassifier is fit. See the base OneVsRestClassifier Class in sklearn.multiclass for more details. Parameters ---------- estimator : estimator object An estimator object implementing `fit` & `predict_proba`. n_jobs : int, optional, default: 1 The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. Note that parallel processing only occurs if there is multiple classes within each target variable. It does each target variable in y in series. Attributes __________ estimator: Sklearn estimator: The base estimator used to constructe the model. """ def __init__(self, estimator=None, n_jobs=1): self.estimator = estimator self.n_jobs = n_jobs def fit(self, X, y): """ Fit the model to data. Creates a seperate model for each Response column. Parameters ---------- X : (sparse) array-like, shape = [n_samples, n_features] Data. y : (sparse) array-like, shape = [n_samples, p_targets] Multi-class targets. An indicator matrix turns on multilabel classification. Returns ------- self """ # check to see that the data is numeric # check to see that the X and y have the same number of rows. # Calculate the number of classifiers self._num_y = y.shape[1] ## create a dictionary to hold the estimators. self.estimators_ ={} for i in range(self._num_y): # init a new classifer for each and fit it. 
estimator = clone(self.estimator) #make a fresh clone ovr = OneVsRestClassifier(estimator,self.n_jobs) self.estimators_[i] = ovr.fit(X,y[:, i]) return self def predict(self, X): """Predict multi-class multiple target variable using a model trained for each target variable. Parameters ---------- X : (sparse) array-like, shape = [n_samples, n_features] Data. Returns ------- y : dict of [sparse array-like], shape = {predictors: n_samples} or {predictors: [n_samples, n_classes], n_predictors}. Predicted multi-class targets across multiple predictors. Note: entirely separate models are generated for each predictor. """ # check to see if the fit has been performed check_is_fitted(self, 'estimators_') results = {} for label, model_ in self.estimators_.iteritems(): results[label] = model_.predict( X) return(results) def predict_proba(self, X): """Probability estimates. This returns prediction probabilites for each class for each label in the form of a dictionary. Parameters ---------- X : array-like, shape = [n_samples, n_features] Returns ------- prob_dict (dict) A dictionary containing n_label sparse arrays with shape = [n_samples, n_classes]. Each row in the array contains the the probability of the sample for each class in the model, where classes are ordered as they are in `self.classes_`. """ # check to see whether the fit has occured. check_is_fitted(self, 'estimators_') results ={} for label, model_ in self.estimators_.iteritems(): results[label] = model_.predict_proba(X) return(results) def score(self, X, Y): """"Returns the mean accuracy on the given test data and labels. Parameters ---------- X : array-like, shape = [n_samples, n_features] Y : (sparse) array-like, shape = [n_samples, p_targets] Returns ------- scores (np.array) Array of p_target floats of the mean accuracy of each estimator_.predict wrt. y. """ check_is_fitted(self, 'estimators_') # Score the results for each function results =[] for i in range(self._num_y): estimator = self.estimators_[i] results.append(estimator.score(X,Y[:,i])) return results def get_params(self): '''returns the parameters of the estimator.''' return self.estimator.get_params() def set_params(self, params): """sets the params for the estimator.""" self.estimator.set_params(params) def __repr__(self): return 'MultiOneVsRestClassifier( %s )' %self.estimator.__repr__() @property def multilabel_(self): """returns a vector of whether each classifer is a multilabel classifier in tuple for """ return [(label, model_.multilabel_) for label, model_ in self.estimators_.iteritems()] @property def classes_(self): return [(label, model_.label_binarizer_) for label, model_ in self.estimators_.iteritems()]
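For orientation, a minimal usage sketch of the class as committed at this point. The import line is an assumption (the file name in this PR is MultiOneRest.py; adjust to wherever the module lives), and note that predict, predict_proba and score return a dict or list keyed per target column rather than an array, while the .iteritems() calls make the code Python 2 only.

# Hypothetical usage of the MultiOneVsRestClassifier defined above.
# The import path is an assumption based on the file name discussed in this PR;
# the code is Python 2 only because of the .iteritems() calls above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from MultiOneRest import MultiOneVsRestClassifier

rng = np.random.RandomState(0)
X = rng.rand(50, 4)
Y = rng.randint(0, 2, size=(50, 3))   # three binary targets

clf = MultiOneVsRestClassifier(LogisticRegression(), n_jobs=1)
clf.fit(X, Y)

preds = clf.predict(X)          # dict: {0: (50,) array, 1: ..., 2: ...}
probas = clf.predict_proba(X)   # dict of (50, n_classes) arrays
scores = clf.score(X, Y)        # list of per-target accuracies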
Member

I assume you did not mean to commit an empty file :P

Contributor Author

@MechCoder I definitely didn't! The diff is empty for some wacky reason but the file is not! Can you confirm this?

Member

Try:

  1. chmod -x
  2. Add the file again and commit
  3. git config core.fileMode false
  4. Add the file again and commit, if needed (I think you won't need to)
  5. Squash all the commits
  6. Force push

Member

Or you could just copy over the code to multioutput.py, remove this file and force push because that is what you are ultimately going to do anyway.

Member

Indeed :P

Contributor Author

My thinking exactly, @MechCoder.

@MechCoder MechCoder changed the title [MRG] added MultiOneVsRest classifier & testing suite (Meta-estimators for multi-output learning #5824) [WIP] added MultiOneVsRest classifier & testing suite (Meta-estimators for multi-output learning #5824) Jan 7, 2016
@MechCoder
Member

@hugobowne I've changed the title to WIP. Let us know once you finish up the MultiOutputClassifier (and if you need any help).

@maniteja123
Contributor

@hugobowne Thanks a lot for letting me know about this. I did pull the code from your branch. I would be happy if I can help in any way.

One doubt: even if I add a commit to this code, to continue working on this PR I need to push it to this branch, for which I won't have access rights. Any help is appreciated!

Thanks.

@maniteja123
Contributor

I have just made some simple modifications and refactored the code a little. It is at this branch. Please do look at it, though I haven't added any new functionality. Thanks.

@hugobowne
Contributor Author

Thanks for your patience, all. This is a quick note:
@maniteja123 thanks for jumping in and getting involved. If you're going to work on the code now, it is best for the moment to make a PR to my branch, I think. @james-nichols has just done so, and I hope to merge, commit here and squash when I get a moment early this coming week.

But for workflow here, generally, perhaps @MechCoder or @rvraghav93 could suggest best practice given the following: I won't have much time to contribute in the upcoming weeks and @maniteja123 is going to work on the MultiOutputClassifier -- in this case, is it (i) best for him to issue PRs to my branch, OR (ii) should I give him collaborator access to my branch so that I don't need to merge etc. (in which case this may all move more quickly)?

Is there a common practice for this?

@raghavrv
Member

The common way to do this is as a PR to your branch as you had suggested. But if you don't mind giving him access to your repository, you can go ahead as it would indeed speed things up :)

@MechCoder
Member

I agree


@hugobowne
Contributor Author

Hi all. I have just merged @james-nichols's PR into my branch. I then tried to squash the commits but think I may have completely bungled it -- I used this as a guide: http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html

Thoughts? @rvraghav93 @MechCoder

@maniteja123: I have given you collaborator rights to my sklearn fork, so please feel free to work on the branch -- I would suggest you shoot me an email when working on it, and I will do the same.

Collaborator on the code @MrChristophRivera can also field questions when I'm unable to.

@hugobowne hugobowne force-pushed the MultiOneVsRestClassifier branch from ba0db84 to bae109a Compare January 12, 2016 00:09
@hugobowne
Contributor Author

OK, I just attempted to squash again. Let me know how it's looking. Apologies for rookie errors!

@MechCoder MechCoder changed the title [WIP] added MultiOneVsRest classifier & testing suite (Meta-estimators for multi-output learning #5824) [MRG] added MultiOneVsRest classifier & testing suite (Meta-estimators for multi-output learning #5824) Jan 13, 2016
predicts a single Y. The purpose of this class is to extend estimators
to be able to estimate a series of target functions (f1,f2,f3...,fn)
that are trained on a single X predictor matrix to predict a series
of reponses (y1,y2,y3...,yn).
Member

this paragraph looks great but it should belong to an example and not here, I think

Contributor

Left it here as of now; will make a note of this point.

forest_.fit(X, y[:, i])
assert_equal(list(forest_.predict(X)), list(predictions[:, i]))
assert_almost_equal(list(forest_.predict_proba(X)),
list(predict_proba[:, :, i]), decimal=1)
Member

decimal=1 is small -- can't you go further?
Can you get the exact result with an appropriate random_state?

Contributor

I have changed it to assert_array_equal now and the test succeeds.
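For reference, a hedged sketch of the kind of exact check being discussed: because clone preserves random_state, a single-output forest seeded the same way should reproduce the corresponding column of the meta-estimator's predictions exactly. The dataset construction and the sklearn.multioutput import here are illustrative assumptions, not the merged test.

# Illustrative sketch (not the merged test): with a fixed random_state the
# per-column forest exactly reproduces the multi-output predictions.
import numpy as np
from numpy.testing import assert_array_equal
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

X, y1 = make_classification(n_samples=100, n_informative=5, random_state=0)
_, y2 = make_classification(n_samples=100, n_informative=5, random_state=1)
Y = np.column_stack([y1, y2])

multi = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X, Y)
predictions = multi.predict(X)

for i in range(Y.shape[1]):
    # A forest fitted on a single target column with the same seed should agree exactly.
    forest_ = RandomForestClassifier(random_state=0).fit(X, Y[:, i])
    assert_array_equal(forest_.predict(X), predictions[:, i])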

@maniteja123
Contributor

@TomDLT I have done all the changes. I also went through the whole code for any errors in the documentation or tests. Hopefully I have addressed all the comments.

@@ -45,18 +45,18 @@ def __init__(self, estimator, n_jobs=1):

def fit(self, X, y, sample_weight=None):
""" Fit the model to data.
Fits a seperate model for each output variable.
Fit a seperate model for each output variable.
Member

separate

@TomDLT
Member

TomDLT commented Mar 31, 2016

Can you squash into 2 commits?

@maniteja123
Contributor

Yeah, I should do something like git rebase -i HEAD~14 after the new commit, right?

@TomDLT
Member

TomDLT commented Mar 31, 2016

Yes, at the end you should have only two commits: hugobowne's work and yours.

@maniteja123
Contributor

Just a doubt: when I rebase with git rebase -i HEAD~14, I get all the commits in the PR. Shall I pick the first two (one by hugobowne and the other my first one) and squash the remaining? Sorry to bother you, but I have never worked in collaboration before and don't want to make any mistakes here.

@MechCoder
Member

Just squash your last 12 into one.

git rebase -i HEAD~12

@maniteja123
Contributor

I have one local commit also, so it should be 13, right?

@maniteja123 maniteja123 force-pushed the MultiOneVsRestClassifier branch from 2ea42ca to 2c8dd4e Compare March 31, 2016 13:17
@TomDLT
Member

TomDLT commented Mar 31, 2016

Shall I pick the first two (one by hugobowne and the other my first one) and squash the remaining?

yes

@MechCoder
Member

@TomDLT Please merge if you are happy! Thanks!

@TomDLT
Member

TomDLT commented Mar 31, 2016

This looks really good to me!


Just one detail:

NotFittedError (handled by common tests?)

Actually not for META_ESTIMATORS, but I am not sure if we should add it in common tests or in test_multioutput.py

@TomDLT TomDLT changed the title [MRG] MultiOutputClassifier [MRG+2] MultiOutputClassifier Mar 31, 2016
@MechCoder
Member

@maniteja123 Could you just add a test to check for NotFittedError?

@maniteja123 maniteja123 force-pushed the MultiOneVsRestClassifier branch from 2c8dd4e to 29ee54a Compare April 1, 2016 01:49
@maniteja123
Contributor

@MechCoder I added a simple test for NotFittedError when predict, predict_proba and score are called. Do I need to add it for all the other meta-estimators too? Oh sorry, just woke up. I have just added it to this meta-estimator.
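For context, a minimal sketch of what such a check might look like. The test function name, the assert_raises helper and the import paths are assumptions based on the sklearn testing conventions of that era, not the exact test that was added.

# Illustrative NotFittedError check (names and import paths are assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.utils.validation import NotFittedError
from sklearn.utils.testing import assert_raises


def test_multi_output_classification_not_fitted_error():
    X = np.random.RandomState(0).rand(10, 3)
    Y = np.zeros((10, 2), dtype=int)
    clf = MultiOutputClassifier(RandomForestClassifier())
    # predict, predict_proba and score must raise before fit is called.
    assert_raises(NotFittedError, clf.predict, X)
    assert_raises(NotFittedError, clf.predict_proba, X)
    assert_raises(NotFittedError, clf.score, X, Y)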

@MechCoder
Member

Merging with master. Thanks for your perseverance! 🍷 🍷

@MechCoder MechCoder merged commit 3078d7d into scikit-learn:master Apr 1, 2016
@raghavrv
Member

raghavrv commented Apr 1, 2016

Thanks @maniteja123 and @hugobowne

@MechCoder
Member

We forgot to update whatsnew.rst for this. Could you do that?

@maniteja123
Contributor

Yeah, sure. Shall I push it to this branch itself?

@maniteja123
Contributor

And thank you so, so much @MechCoder, @rvraghav93, @TomDLT and everyone else for all the help and for bearing patiently with my doubts, and sincere thanks to @hugobowne for letting me work on this. I am again sorry for taking so much of your time in reviewing this multiple times.

@MechCoder
Member

Yes, please push it here. I'll cherry-pick it.

@maniteja123
Contributor

@MechCoder sorry for the delay. This is the commit.
