MRG Feature stacker #1173

Merged
merged 11 commits into scikit-learn:master on Sep 30, 2012

Conversation

@amueller
Member

amueller commented Sep 20, 2012

This estimator provides a Y piece for the pipeline.
I used it to combine word n-grams and char n-grams into a single transformer.
Basically it just concatenates the outputs of several transformers into one large feature representation.

If you think this is helpful, I'll add some docs and an example.
With this, together with Pipeline, one can build arbitrarily complex graphs (with one source and one sink) of estimators in sklearn :)

TODO

  • tests
  • narrative documentation
  • example

Thanks to the awesome implementation of the BaseEstimator, grid search simply works - though with complicated graphs you get parameter names like feature_stacker__first_feature__feature_selection__percentile (more or less from my code ^^).
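
For illustration, here is a minimal sketch of the kind of usage described above, written against the FeatureUnion name the class ends up with later in this thread; the transformers, parameters and data names are assumptions, not code from this PR:

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in later releases

# Concatenate word n-gram and char n-gram features side by side.
features = FeatureUnion([
    ("words", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("chars", CountVectorizer(analyzer="char", ngram_range=(2, 5))),
])

clf = Pipeline([
    ("features", features),
    ("svm", LinearSVC()),
])

# Nested parameters follow the usual <step>__<param> naming convention.
param_grid = {
    "features__words__ngram_range": [(1, 1), (1, 2)],
    "features__chars__ngram_range": [(2, 4), (2, 5)],
    "svm__C": [0.1, 1.0, 10.0],
}
grid = GridSearchCV(clf, param_grid)
# grid.fit(documents, labels)  # documents and labels are placeholders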

@ogrisel

sklearn/linear_model/tests/test_randomized_l1.py
clf = RandomizedLogisticRegression(verbose=False, C=1., random_state=42,
                                   scaling=scaling, n_resampling=50, tol=1e-3)
feature_scores_sp = clf.fit(X_sp, y).scores_
assert_equal(feature_scores, feature_scores_sp)

@ogrisel

ogrisel Sep 20, 2012
Member

This hunk seems to be unrelated to this PR.

@amueller

amueller Sep 20, 2012
Member

Whoops, sorry, forked from the wrong branch. Just a sec.

@ogrisel
Member

ogrisel commented Sep 20, 2012

Very interesting. I want an example first! (then documentation and tests :)

@amueller
Member

amueller commented Sep 20, 2012

on it :)

@ogrisel
Member

ogrisel commented Sep 20, 2012

@amueller to avoid forking from non-master branches you should use something such as http://volnitsky.com/project/git-prompt/

@ogrisel

sklearn/pipeline.py
    features.append(trans.transform(X))
issparse = [sparse.issparse(f) for f in features]
if np.any(issparse):
    features = sparse.hstack(features).tocsr()

@ogrisel

ogrisel Sep 20, 2012
Member

Maybe the tocsr() can be avoided. A downstream model such as ElasticNet might prefer CSC, for instance.

@larsmans

larsmans Sep 21, 2012
Member

Then again, bugs crop up every now and then where estimators that are supposed to handle any sparse format turn out to only handle CSR. It's a good defensive strategy to produce CSR by default (and it's unfortunate that sparse.hstack doesn't do this already).

@amueller

amueller Sep 21, 2012
Member

I wrote this thing in the heat of the battle and I don't remember if there was a reason or if it was just a precaution. I'm inclined to think that I put it there because something, somewhere, broke.
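
For context, a rough sketch of the concatenation step being discussed here (an approximation of the PR's transform logic, not the actual code):

import numpy as np
from scipy import sparse

def stack_features(fitted_transformers, X):
    # Concatenate the outputs of already-fitted transformers side by side.
    features = [trans.transform(X) for trans in fitted_transformers]
    if any(sparse.issparse(f) for f in features):
        # sparse.hstack returns COO; converting to CSR is the defensive default
        # discussed above, although a downstream estimator such as ElasticNet
        # may prefer CSC.
        return sparse.hstack(features).tocsr()
    return np.hstack(features)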

@amueller
Member

amueller commented Sep 20, 2012

Yes, it should derive from TransformerMixin.
@larsmans can I interpret your comments as meaning you think this is a good thing to have?

@amueller
Member

amueller commented Sep 20, 2012

Added a toy example.

@ogrisel
Member

ogrisel commented Sep 20, 2012

I think such a feature stack should provide some way to do feature group normalization in one way or another. But this probably requires some experiments to find out which normalization pattern is useful on such a beast in practice.

Does anybody have practical experience or insight to share on this?

@larsmans
Member

larsmans commented Sep 20, 2012

GREAT idea! However, I don't like the name FeatureStacker much, as stacking implies putting things on top of each other, while this class concatenates things side-by-side.

I tried to find a "plumbing equivalent" of this class to keep with the pipeline metaphor, but I can't seem to find it. It's not quite a tee as it connects the various streams back together in the end. Maybe one of the other devs is more experienced with plumbing? :)

@ogrisel
Member

ogrisel commented Sep 20, 2012

BTW I think the example could be improved by using a less trivial dataset (e.g. the digits dataset) and showing that the cross-validated score of the best grid-searched parameter set for the pipeline with stacked features is better than that of a pipeline with the individual feature transformers used separately.

@ogrisel
Member

ogrisel commented Sep 20, 2012

@larsmans maybe FeatureConcatenator?

@ogrisel
Member

ogrisel commented Sep 20, 2012

FeatureUnion?

@ogrisel ogrisel closed this Sep 20, 2012

@larsmans larsmans reopened this Sep 20, 2012

@larsmans
Member

larsmans commented Sep 20, 2012

MultiTransformer?

@amueller
Member

amueller commented Sep 20, 2012

Glad you like it. The estimator, and even more so the example, are in a very rough state. I wasn't sure if there was interest, and I had to leave my desk without really testing the example. I'll try to polish it ASAP. Thanks for your suggestions. I don't think this exists in plumbing, btw; it's a T followed by a Y...
Andy

Lars Buitinck notifications@github.com wrote:

MultiTransformer?

Reply to this email directly or view it on GitHub:
#1173 (comment)

This message was sent from my Android phone with K-9 Mail.

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 21, 2012

FeatureUnion?

My favorite so far

@amueller
Member

amueller commented Sep 21, 2012

I also like FeatureUnion.
Other possibilities: FeatureBinder, FeatureAssembler, FeatureCombiner.
Or maybe move away from "feature"? TransformerUnion, TransformBinder, TransformerBundle?

Hm, I think I like TransformerBundle.

@ogrisel
Member

ogrisel commented Sep 21, 2012

+1 for FeatureAssembler or FeatureUnion or TransformerBundle

@larsmans
Member

larsmans commented Sep 21, 2012

+1 for TransformerBundle.

@amueller
Member

amueller commented Sep 21, 2012

In my application I found get_feature_names very helpful - I was using text data and some handcrafted features.
I fear this is hard to do in general. I thought about checking hasattr("get_feature_names") and otherwise just returning estimator_name_0, estimator_name_1, .... This might be a bit problematic, though, as I don't think there is a reliable method to get the output dimensionality of a transformer :-/

Oh, and @ogrisel, for the normalization: each feature group should be normalized separately, right?
This is "easily" possible by feeding the object pipelines of preprocessing and transformers. As normalization might be quite application specific, I think this solution is OK for the moment.
The code doesn't actually get too messy doing this.
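
A minimal sketch of that nesting, normalizing each feature group with its own sub-pipeline (the transformers chosen here are illustrative assumptions):

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import Normalizer
from sklearn.feature_extraction.text import CountVectorizer

# Each branch gets its own normalizer, so the word and char blocks are
# scaled independently before being concatenated.
features = FeatureUnion([
    ("words", Pipeline([
        ("vect", CountVectorizer(analyzer="word")),
        ("norm", Normalizer()),
    ])),
    ("chars", Pipeline([
        ("vect", CountVectorizer(analyzer="char", ngram_range=(2, 4))),
        ("norm", Normalizer()),
    ])),
])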

@amueller
Member

amueller commented Sep 21, 2012

Ugh, I just tried to work on the example and noticed that #1034 wasn't in master yet.
Without a good way to look at the grid search results, this PR is a lot less useful, I think.
Have to work on #1034 more :-/

@larsmans
Member

larsmans commented Sep 21, 2012

We might introduce an n_features_out_ attribute/property on all transformers that work on feature vectors. For now, supporting get_feature_names only when all underlying transformers do would be good enough, IMHO.
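
A sketch of that restricted delegation, prefixing each feature name with its transformer's name (assumed behavior for the method on the union class, not the merged code):

def get_feature_names(self):
    # Collect feature names from every sub-transformer, prefixed with its name.
    feature_names = []
    for name, trans in self.transformer_list:
        if not hasattr(trans, "get_feature_names"):
            raise AttributeError(
                "Transformer %s does not provide get_feature_names." % name)
        feature_names.extend(name + "__" + f for f in trans.get_feature_names())
    return feature_names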

@amueller
Member

amueller commented Sep 21, 2012

@larsmans ok, will do that. Should be easy enough.

@amueller
Member

amueller commented Sep 21, 2012

Having a bit of a hard time creating a good example :-/

@ogrisel
Member

ogrisel commented Sep 21, 2012

Have you been able to use this kind of tool successfully for your Kaggle contest? If so, we can stick to a simplistic toy example and explain in the narrative documentation which kinds of feature bundles have proven useful in practice on which kinds of problems (e.g. PCA features + raw TF-IDF for text classification).

@amueller
Member

amueller commented Sep 21, 2012

I can tell you how successful I was tomorrow ;)
It was definitely helpful to combine handcrafted features with word n-grams. Doing it with this estimator, I was still able to grid-search CountVectorizer parameters such as min_df, ngram_range, etc. So that definitely helped.

@mblondel

sklearn/pipeline.py
This estimator applies a list of transformer objects in parallel to the
input data, then concatenates the results. This is useful to combine
several feature extraction mechanisms into a single estimator.

@mblondel

mblondel Sep 21, 2012
Member

single feature representation?

@amueller

amueller Sep 21, 2012
Member

I prefer it the way it is, as getting the features out is not the important part; the important part is formulating it as an estimator.

@mblondel

mblondel Sep 21, 2012
Member

I misunderstood what you meant. Since you're talking about extraction mechanisms, it may be clearer to say "in a single transformer".

@amueller

amueller Sep 21, 2012
Member

agreed

@mblondel

sklearn/pipeline.py
X : array-like or sparse matrix, shape (n_samples, n_features)
    Input data, used to fit transformers.
"""
for name, trans in self.transformer_list:

@mblondel

mblondel Sep 21, 2012
Member

supporting n_jobs would be nice :)

@amueller

amueller Sep 21, 2012
Member

In principle +1. Are there any transformers that use n_jobs? I am always afraid of having it on the wrong abstraction level...

@mblondel

mblondel Sep 21, 2012
Member

Since it is embarrassingly parallel and each transformer can take time to fit, I think supporting n_jobs would make sense.

Are there any transformers that use n_jobs?

Not that I know of, but I hope that users have enough common sense to not enable n_jobs at two different levels :)

@amueller

amueller Sep 21, 2012
Member

So I see you are of the optimistic persuasion ;)
I'll add it.

@GaelVaroquaux

GaelVaroquaux Sep 21, 2012
Member

On Fri, Sep 21, 2012 at 10:54:16AM -0700, Mathieu Blondel wrote:

Not that I know of but I hope that users have enough common sense to not
enable n_jobs at two different levels :)

I dream of a world in which joblib had a 'grand central dispatch'
pattern, and thus multiple levels of parallelism would be possible. That
world might happen at some point...
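
A minimal sketch of the n_jobs idea with joblib (assumed structure; the helper names are hypothetical):

from joblib import Parallel, delayed
from sklearn.base import clone

def _fit_one(transformer, X, y):
    # Each transformer is fitted independently, so this is embarrassingly parallel.
    return clone(transformer).fit(X, y)

def fit_transformers(transformer_list, X, y=None, n_jobs=1):
    # Fit every (name, transformer) pair on X, optionally in parallel.
    fitted = Parallel(n_jobs=n_jobs)(
        delayed(_fit_one)(trans, X, y) for _, trans in transformer_list)
    return [(name, trans) for (name, _), trans in zip(transformer_list, fitted)]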

@mblondel

sklearn/pipeline.py
List of transformer objects to be applied to the data.
"""
def __init__(self, transformer_list):

@mblondel

mblondel Sep 21, 2012
Member

a transformer_weight option to give more importance to some transformers could be useful!

@amueller

amueller Sep 21, 2012
Member

hm ok, why not.
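
A sketch of how such per-transformer weights could be applied before concatenation (transformer_weights here is a hypothetical dict mapping transformer names to floats):

def weighted_outputs(transformer_list, X, transformer_weights=None):
    # Transform X with each fitted transformer and scale its output block.
    outputs = []
    for name, trans in transformer_list:
        Xt = trans.transform(X)
        if transformer_weights and name in transformer_weights:
            Xt = Xt * transformer_weights[name]  # works for dense and scipy sparse
        outputs.append(Xt)
    return outputs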

@mblondel
Member

mblondel commented Sep 21, 2012

Nice idea indeed!

@amueller
Member

amueller commented Sep 21, 2012

@mblondel any votes on the name?

@mblondel
Member

mblondel commented Sep 22, 2012

Some I like include FeatureAssembler, FeatureCombiner and FeatureUnion.

@amueller
Member

amueller commented Sep 22, 2012

Name votes:
FeatureAssembler II
FeatureCombiner I
FeatureUnion IIII
TransformerBundle III

(If I counted correctly, which is unlikely given my degree in math)
If no-one objects I'll rename to FeatureUnion and change the state of the PR to MRG.

@agramfort
Member

agramfort commented Sep 22, 2012

+1 for FeatureUnion

I would have thought of FeatureConcat (FeatureConcatenator?) but FeatureUnion is fine with me.

@amueller
Member

amueller commented Sep 23, 2012

Renamed, think this is good to go.

@amueller
Member

amueller commented Sep 26, 2012

Any more comments? (GitHub claims this cannot be merged, but I just rebased, so it should be a fast-forward merge.)

@ogrisel
Member

ogrisel commented Sep 26, 2012

This cannot be merged into master currently, but apart from that, +1 for merging :)

@GaelVaroquaux
Member

GaelVaroquaux commented Sep 29, 2012

LGTM. 👍 for merge. Thanks @amueller !

@amueller amueller merged commit d087830 into scikit-learn:master Sep 30, 2012

@vene
Member

vene commented Oct 28, 2012

Thank you for this convenient transformer. In my application I had to hack it a bit, and I wonder whether the feature I wanted could be more generally useful.

Basically, sometimes you want to concatenate the same feature extractor multiple times and have some of the parameters tied when grid searching.

In my case, I was learning a hyphenator, so my data points consist of two strings: the one to the left of the current position and the one to the right of it. For this I defined a ProjectionVectorizer with a column attribute that just says "I only work on X[:, column]" and concatenated two of these. Now, when grid searching, it is common sense to use the same n-gram range for both transformers, so the cleanest way to do this was this quick hack (no error handling):

class HomogeneousFeatureUnion(FeatureUnion):
    def set_params(self, **params):
        for key, value in params.iteritems():
            for _, transf in self.transformer_list:
                transf.set_params(**{key: value})

This can easily be extended to support both tied params and specific params. I'm not sure whether I overengineered this, but I still have the feeling that this might pop up in other people's applications, so I wanted to raise the question.
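
For instance, a hypothetical use of the hack above (ProjectionVectorizer is @vene's own class, not part of scikit-learn):

union = HomogeneousFeatureUnion([
    ("left", ProjectionVectorizer(column=0, analyzer="char")),
    ("right", ProjectionVectorizer(column=1, analyzer="char")),
])
# A single tied parameter is broadcast to both vectorizers, so a grid search
# over a parameter like union__ngram_range keeps the two branches in sync.
union.set_params(ngram_range=(1, 3))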
